Posted: 1 day ago
Platform: On-site
Full Time
Key Responsibilities:

- Data Pipeline Design & Development: Independently design, develop, and maintain scalable data pipelines and ETL processes to collect, clean, and transform large-scale datasets from multiple sources.
- Infrastructure Building & Optimization: Build and optimize data infrastructure, including data lakes, warehouses, and real-time data streaming solutions, ensuring they meet performance and scalability requirements.
- Data Quality & Integrity: Ensure data quality, integrity, and availability across all systems by independently troubleshooting, diagnosing, and resolving issues.
- Collaboration: Work closely with data scientists, analysts, and software engineers to ensure smooth integration between data pipelines and data models.
- Performance Monitoring & Optimization: Continuously monitor the performance of data pipelines and infrastructure, optimizing them for high performance and scalability.
- Data Storage Solutions: Design and implement data storage solutions, including database schemas, indexing strategies, and partitioning approaches, to ensure data accessibility.
- Cloud Infrastructure Management: Manage cloud-based environments (AWS, GCP, Azure) and implement infrastructure as code using tools like Terraform or CloudFormation.

Qualifications:

- Education: Bachelor's degree in Computer Science, Engineering, Information Systems, or a related field; a Master's degree is a plus.
- Experience: Minimum of 5 years of experience as a Data Engineer building and maintaining data pipelines and infrastructure.

Primary Skills:

- Programming: Strong expertise in Python, PySpark, and Spark/Scala for data processing tasks.
- SQL: Proficient in writing complex SQL queries for data manipulation and extraction.
- ETL Tools: Experience using ETL tools to build and orchestrate data workflows.
- Cloud Platforms: Hands-on experience with cloud services such as AWS, GCP, or Azure (S3, Redshift, BigQuery, Dataflow, Databricks, or Snowflake).
- Database Knowledge: Familiarity with relational databases (MySQL, PostgreSQL, SQL Server) and NoSQL databases (MongoDB, Cassandra, HBase).
- Version Control & CI/CD: Experience with Git and familiarity with CI/CD pipelines for managing data pipelines and infrastructure.

Preferred Skills:

- Azure Cloud: Knowledge of Azure services such as Delta Lake, ADLS, Event Hubs, and Cosmos DB.
- Real-time Data Processing: Experience with frameworks like Kafka Streams, Apache Flink, or Spark Streaming for real-time data processing.
- Machine Learning/Data Science Pipelines: Familiarity with machine learning or data science pipelines.
- Containerization & Orchestration: Experience with Docker and Kubernetes for containerizing and orchestrating data services.
- API Integration: Understanding of APIs and microservices architecture for integrating data across systems.
- Delta Lake: Familiarity with Delta Lake or similar technologies for efficient data lake architecture.
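For candidates gauging the day-to-day work, the sketch below illustrates the kind of PySpark batch ETL described under Key Responsibilities: extract raw files, clean them, and write a transformed, partitioned output. It is a minimal illustration only; the paths, column names, and aggregation are hypothetical placeholders, not part of the actual stack for this role.

```python
# Illustrative sketch only: a minimal PySpark batch ETL job.
# All paths and column names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read raw CSV files from a (hypothetical) landing zone.
raw = spark.read.option("header", True).csv("s3://landing-zone/orders/*.csv")

# Clean: drop duplicates and rows missing the key, normalise types.
clean = (
    raw.dropDuplicates(["order_id"])
       .dropna(subset=["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
)

# Transform: a simple daily aggregate per customer.
daily = (
    clean.groupBy(F.to_date("order_ts").alias("order_date"), "customer_id")
         .agg(F.sum("amount").alias("daily_spend"),
              F.count("*").alias("order_count"))
)

# Load: write partitioned Parquet to a (hypothetical) data-lake path.
daily.write.mode("overwrite").partitionBy("order_date") \
     .parquet("s3://data-lake/curated/daily_customer_spend/")

spark.stop()
```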
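Likewise, a minimal Spark Structured Streaming sketch of the real-time processing mentioned under Preferred Skills, assuming a hypothetical Kafka broker, topic, and event schema (the Kafka connector package must be on the Spark classpath):

```python
# Illustrative sketch only: consume JSON events from Kafka and land them
# as Parquet with checkpointing. Broker, topic, and schema are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("events_stream").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("value", DoubleType()),
])

# Read the stream from Kafka and parse the JSON payload.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "events")
         .load()
         .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
         .select("e.*")
)

# Append the parsed stream to a (hypothetical) lake path with a checkpoint.
query = (
    events.writeStream.format("parquet")
          .option("path", "s3://data-lake/raw/events/")
          .option("checkpointLocation", "s3://data-lake/checkpoints/events/")
          .outputMode("append")
          .start()
)
query.awaitTermination()
```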
Clifyx Technology
Andhra Pradesh, India
Experience: Not specified
Salary: Not disclosed