Data Engineer - Python/PySpark

6-8 years

Hyderabad, Telangana, India

Posted: 3 days ago | Platform: LinkedIn

Skills Required

Python, PySpark, Apache Spark, SQL, ETL, AWS, Apache Airflow, ML model deployment, SageMaker, data architecture, Agile, data governance, Kafka, DevOps, Terraform, NoSQL, DynamoDB, MongoDB, data security and compliance

Work Mode

On-site

Job Type

Full Time

Job Description

Key Responsibilities

- Design, develop, and optimize large-scale data pipelines using PySpark and Apache Spark (a minimal illustrative sketch appears after this description).
- Build scalable and robust ETL workflows leveraging AWS services such as EMR, S3, Lambda, and Glue.
- Collaborate with data scientists, analysts, and other engineers to gather requirements and deliver clean, well-structured data solutions.
- Integrate data from various sources, ensuring high data quality, consistency, and reliability.
- Manage and schedule workflows using Apache Airflow (a minimal DAG sketch also follows).
- Work on ML model deployment pipelines using tools such as SageMaker and Anaconda.
- Write efficient and optimized SQL queries for data processing and validation.
- Develop and maintain technical documentation for data pipelines and architecture.
- Participate in Agile ceremonies, sprint planning, and code reviews.
- Troubleshoot and resolve issues in production environments with minimal supervision.

Required Skills and Qualifications

- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 6-8 years of experience in data engineering with a strong focus on Python, PySpark, SQL, and AWS (EMR, EC2, S3, Lambda, Glue).
- Experience in developing and orchestrating pipelines using Apache Airflow.
- Familiarity with SageMaker for ML deployment and Anaconda for environment management.
- Proficiency in working with large datasets and optimizing Spark jobs.
- Experience in building data lakes and data warehouses on AWS.
- Strong understanding of data governance, data quality, and data lineage.
- Excellent documentation and communication skills.
- Comfortable working in a fast-paced Agile environment.
- Experience with Kafka or other real-time streaming platforms.
- Familiarity with DevOps practices and tools (e.g., Terraform, CloudFormation).
- Exposure to NoSQL databases such as DynamoDB or MongoDB.
- Knowledge of data security and compliance standards (GDPR, HIPAA).

What We Offer

- Work with cutting-edge technologies in a collaborative and innovative environment.
- Opportunity to influence large-scale data infrastructure.
- Competitive salary, benefits, and professional development support.
- Be part of a growing team solving real-world data challenges.

(ref:hirist.tech)
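To make the pipeline responsibility concrete, here is a minimal sketch of the kind of PySpark ETL job described above: read raw data from S3, apply basic quality checks, and write partitioned Parquet. The bucket names, columns, and paths are hypothetical placeholders, not details from the posting.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Read raw JSON events from S3 (bucket and prefix are hypothetical).
raw = spark.read.json("s3://example-raw-bucket/orders/")

# Basic quality gate: require a primary key, deduplicate, normalize types.
clean = (
    raw.dropna(subset=["order_id"])
       .dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
)

# Write partitioned Parquet to a curated zone for downstream consumers.
(clean.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3://example-curated-bucket/orders/"))

spark.stop()
```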
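Likewise, a minimal Apache Airflow DAG for the scheduling responsibility might look like the following (Airflow 2.x syntax); the DAG id, schedule, and spark-submit command are assumptions for illustration.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Daily DAG that submits the PySpark job sketched above; the script path
# and deploy mode are placeholders.
with DAG(
    dag_id="orders_etl_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="run_orders_etl",
        bash_command="spark-submit --deploy-mode cluster orders_etl.py",
    )
```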
