Posted: 2 weeks ago
Remote
Full Time
Job Title: Data Engineer
Experience: 6-8 years
Technology Stack: PySpark, AWS EMR, Apache Airflow

Job Overview
We are seeking an experienced Data Engineer to join our team in an offshore capacity. The ideal candidate will have 6-8 years of hands-on experience building, deploying, and maintaining scalable data pipelines and processing frameworks using PySpark, AWS EMR, and Apache Airflow. The role involves collaborating with cross-functional teams, understanding business needs, and designing robust solutions for large-scale data processing.

Key Responsibilities
- Design, develop, and maintain efficient, scalable data pipelines for batch and real-time processing.
- Use PySpark to process and transform large datasets, ensuring high performance and optimized workflows.
- Build and manage data workflows with Apache Airflow, ensuring reliable scheduling and execution of ETL pipelines.
- Set up and manage AWS EMR clusters for big data processing, ensuring efficient scaling, cost optimization, and high availability.
- Develop automated solutions for data extraction, transformation, and loading (ETL) across various sources and sinks.
- Collaborate with data architects, analysts, and other stakeholders to gather requirements and ensure smooth integration of data solutions.
- Monitor and troubleshoot data pipelines so that the system runs efficiently and without disruptions.
- Optimize complex queries, algorithms, and processing logic to meet performance and scalability requirements.
- Perform data validation and quality checks to ensure the accuracy and consistency of the data.
- Stay up to date with advancements in big data technologies and cloud infrastructure, and suggest process improvements.

Required Skills
- 6-8 years of experience in data engineering, with strong expertise in data pipeline design and big data processing.
- Proficiency in PySpark for distributed data processing.
- Experience with AWS EMR for big data processing and cluster management.
- Hands-on experience with Apache Airflow for orchestration and scheduling of data workflows.
- Solid understanding of data warehousing concepts, ETL processes, and data integration.
- Strong experience with SQL for querying and optimizing large datasets.
- Familiarity with other AWS services such as Lambda, RDS, and Glue is a plus.
- Strong troubleshooting, debugging, and problem-solving skills.
- Ability to work independently in an offshore setup, collaborating effectively with remote teams.

Preferred Skills
- Experience with data lakes, Redshift, or other cloud-based data storage and processing systems.
- Understanding of data security and privacy best practices for handling sensitive data.
- Familiarity with machine learning concepts and data science workflows.

Education
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

Additional Information
This position is offshore (India) and requires remote collaboration with teams based in other regions. It offers the opportunity to work on challenging data engineering projects with a global team.
Virtusa
Chennai
25.0 - 30.0 Lacs P.A.
Hyderabad, Pune, Bengaluru
10.0 - 20.0 Lacs P.A.
Chennai
0.5 - 0.6 Lacs P.A.
Hyderabad, Chennai, Bengaluru
9.5 - 15.0 Lacs P.A.
Bengaluru
7.0 - 17.0 Lacs P.A.
Hyderabad
15.0 - 30.0 Lacs P.A.
Pune
15.0 - 30.0 Lacs P.A.
Chennai, Bengaluru
15.0 - 20.0 Lacs P.A.
Hyderabad, Chennai, Bengaluru
10.0 - 19.0 Lacs P.A.
Hyderābād
2.51046 - 7.5 Lacs P.A.