Posted: 1 day ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

Key Responsibilities

• Data Pipeline Design & Development: Independently design, develop, and maintain scalable data pipelines and ETL processes to collect, clean, and transform large-scale datasets from multiple sources.
• Infrastructure Building & Optimization: Build and optimize data infrastructure, including data lakes, warehouses, and real-time data streaming solutions, ensuring they meet performance and scalability requirements.
• Data Quality & Integrity: Ensure data quality, integrity, and availability across all systems by independently troubleshooting, diagnosing, and resolving issues.
• Collaboration: Work closely with data scientists, analysts, and software engineers to ensure smooth integration between data pipelines and data models.
• Performance Monitoring & Optimization: Continuously monitor the performance of data pipelines and infrastructure, optimizing them for high performance and scalability.
• Data Storage Solutions: Design and implement data storage solutions, including database schemas, indexing strategies, and partitioning approaches to ensure data accessibility.
• Cloud Infrastructure Management: Manage cloud-based environments (AWS, GCP, Azure) and implement infrastructure as code using tools like Terraform or CloudFormation.

Qualifications

• Education: Bachelor's degree in Computer Science, Engineering, Information Systems, or a related field. A Master's degree is a plus.
• Experience: Minimum of 5 years of experience as a Data Engineer building and maintaining data pipelines and infrastructure.

Primary Skills

• Programming: Strong expertise in Python, PySpark, and Spark/Scala for data processing tasks.
• SQL: Proficiency in writing complex SQL queries for data manipulation and extraction.
• ETL Tools: Experience using ETL tools to build and orchestrate data workflows.
• Cloud Platforms: Hands-on experience with cloud services such as AWS, GCP, and Azure (S3, Redshift, BigQuery, Dataflow, Databricks, or Snowflake).
• Database Knowledge: Familiarity with relational databases (MySQL, PostgreSQL, SQL Server) and NoSQL databases (MongoDB, Cassandra, HBase).
• Version Control & CI/CD: Experience with Git and familiarity with CI/CD pipelines for managing data pipelines and infrastructure.

Preferred Skills

• Azure Cloud: Knowledge of Azure cloud services such as Delta Lake, ADLS, Event Hubs, and Cosmos DB.
• Real-time Data Processing: Experience with frameworks such as Kafka Streams, Apache Flink, or Spark Streaming.
• Machine Learning/Data Science Pipelines: Familiarity with machine learning or data science pipelines.
• Containerization & Orchestration: Experience with Docker and Kubernetes for containerizing and orchestrating data services.
• API Integration: Understanding of APIs and microservices architecture for integrating data across systems.
• Delta Lake: Familiarity with Delta Lake or similar technologies for efficient data lake architecture.
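As a rough illustration of the extract-transform-load work described above, here is a minimal, hypothetical pipeline sketch in Python. It is not from the posting: the table name, field names, and the data-quality rule (drop records with a missing amount) are all invented for illustration, and SQLite stands in for a real warehouse.

```python
import sqlite3


def extract():
    # Stand-in for reading from an API, file, or upstream database.
    return [
        {"id": 1, "name": "  Alice ", "amount": "120.50"},
        {"id": 2, "name": "Bob", "amount": None},  # dirty record
        {"id": 3, "name": "Carol", "amount": "99.50"},
    ]


def transform(rows):
    # Clean and type-cast; drop records that fail validation.
    cleaned = []
    for row in rows:
        if row["amount"] is None:
            continue  # example data-quality rule: amount is required
        cleaned.append(
            {
                "id": row["id"],
                "name": row["name"].strip(),
                "amount": float(row["amount"]),
            }
        )
    return cleaned


def load(rows, conn):
    # Load into a target table; SQLite stands in for a warehouse.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (id, name, amount) VALUES (:id, :name, :amount)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract()), conn)
    print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
    # → (2, 220.0): the record with a missing amount was dropped
```

In a production setting the same shape would typically be expressed with PySpark transformations and an orchestrator (e.g. an ETL tool scheduling the extract/transform/load steps), but the separation of stages is the same.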

Clifyx Technology

Technology

Innovation City

50-100 Employees

484 Jobs

Key People

• Jane Doe, CEO
• John Smith, CTO
