
86 AWS EMR Jobs - Page 4

JobPe aggregates job listings for easy access, but applications are submitted directly on the original job portal.

5.0 - 10.0 years

3 - 7 Lacs

Bengaluru

Work from Office

Job Title: EMR_Spark SME
Experience: 5-10 Years
Location: Bangalore

Technical Skills:
- 5+ years of experience in big data technologies with hands-on expertise in AWS EMR and Apache Spark.
- Proficiency in Spark Core, Spark SQL, and Spark Streaming for large-scale data processing.
- Strong experience with data formats (Parquet, Avro, JSON) and data storage solutions (Amazon S3, HDFS).
- Solid understanding of distributed systems architecture and cluster resource management (YARN).
- Familiarity with AWS services (S3, IAM, Lambda, Glue, Redshift, Athena).
- Experience in scripting and programming languages such as Python, Scala, and Java.
- Knowledge of containerization and orchestration (Docker, Kubernetes) is a plus.

Responsibilities:
- Architect and develop scalable data processing solutions using AWS EMR and Apache Spark.
- Optimize and tune Spark jobs for performance and cost efficiency on EMR clusters.
- Monitor, troubleshoot, and resolve issues related to EMR and Spark workloads.
- Implement best practices for cluster management, data partitioning, and job execution.
- Collaborate with data engineering and analytics teams to integrate Spark solutions with broader data ecosystems (S3, RDS, Redshift, Glue, etc.).
- Automate deployments and cluster management using infrastructure-as-code tools like CloudFormation, Terraform, and CI/CD pipelines.
- Ensure data security and governance in EMR and Spark environments in compliance with company policies.
- Provide technical leadership and mentorship to junior engineers and data analysts.
- Stay current with new AWS EMR features and Spark versions to recommend improvements and upgrades.

Requirements and Skills:
- Performance tuning and optimization of Spark jobs.
- Problem-solving skills with the ability to diagnose and resolve complex technical issues.
- Strong experience with version control systems (Git) and CI/CD pipelines.
- Excellent communication skills to explain technical concepts to both technical and non-technical audiences.

Qualification: B.Tech, BE, BCA, MCA, M.Tech, or an equivalent technical degree from a reputed college.

Certifications: AWS Certified Solutions Architect – Associate/Professional; AWS Certified Data Analytics – Specialty.
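For candidates preparing for this kind of role, the sketch below shows one common way to submit a tuned Spark step to an already-running EMR cluster with boto3 and command-runner.jar. It is illustrative only: the cluster ID, S3 paths, region, and tuning values are placeholders, not details from this posting.

```python
import boto3

# Hypothetical cluster ID and S3 locations; replace with real values.
CLUSTER_ID = "j-XXXXXXXXXXXXX"
SCRIPT_URI = "s3://my-bucket/jobs/sessionize.py"

emr = boto3.client("emr", region_name="ap-south-1")

# Submit a spark-submit step to the running cluster via command-runner.jar.
response = emr.add_job_flow_steps(
    JobFlowId=CLUSTER_ID,
    Steps=[
        {
            "Name": "sessionize-clickstream",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit",
                    "--deploy-mode", "cluster",
                    # Example tuning knobs: let YARN scale executors and
                    # right-size shuffle parallelism for the workload.
                    "--conf", "spark.dynamicAllocation.enabled=true",
                    "--conf", "spark.sql.shuffle.partitions=400",
                    SCRIPT_URI,
                    "--input", "s3://my-bucket/raw/clickstream/",
                    "--output", "s3://my-bucket/curated/sessions/",
                ],
            },
        }
    ],
)
print("Submitted step:", response["StepIds"][0])
```

Dynamic allocation is one of the simpler cost levers on EMR, since idle executors are released back to YARN between stages.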

Posted 2 months ago

Apply

5.0 - 10.0 years

5 - 10 Lacs

Gurgaon / Gurugram, Haryana, India

On-site

Technology Leadership: Should be able to independently design, implement, and deliver complex Data Warehousing/Data Lake, Cloud Data Management, and Data Integration project assignments.

Technical Design and Development Expertise:
- Any of the ETL tools (Informatica, IICS, Matillion, DataStage), plus hosting technologies such as the AWS stack (Redshift, EC2), is mandatory.
- Any of the BI tools among Tableau, Qlik, Power BI, and MSTR.
- Informatica MDM, Customer Data Management.
- Expert knowledge of SQL, with the ability to performance-tune complex SQL queries in traditional and distributed RDBMS systems, is a must.
- Experience across Python, PySpark, and Unix/Linux shell scripting.

Project Management: A must-have. Should be able to create simple to complex project plans in Microsoft Project and think ahead about potential risks and mitigation plans.

Task Management: Should be able to onboard the team on the project plan and delegate tasks to accomplish milestones as planned. Should be comfortable discussing and prioritizing work items with team members in an onshore-offshore model.

Client Relationship: Manage client communication and expectations independently or with the support of the reporting manager. Should be able to deliver results back to the client as per plan.

Education: Bachelor's or equivalent; PG Diploma in Management.

Work Experience - we are hiring for the following roles across data management tech stacks:
- ETL - Snowflake/AWS/IICS: 5-8 years of experience in ETL tools (IICS, Redshift, Snowflake). Strong experience in AWS/Snowflake technologies (Redshift, Synapse, Snowflake). Experienced in running an end-to-end ETL project and interacting with users globally. Good knowledge of DW architectural principles and of ETL mapping, transformation, workflow design, and batch script development.
- Python/PySpark: Expert in Python, able to efficiently use Python data-science and math packages such as NumPy, Pandas, and scikit-learn, or a Python web framework. Deep experience developing data processing tasks in PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations. Prior experience with Redshift, Synapse, or Snowflake.
- AWS Infra Architect: 10-15 years of experience as an AWS Cloud Infrastructure administrator, AWS Cloud Architect, or Solution Architect, including 2-3 years as an AWS Cloud Architect. Hands-on experience debugging AWS services such as EC2, EMR, S3, Redshift, and Lambda. Hands-on experience with container orchestration tools (ECS/EKS) and with creating infrastructure using IaC (CloudFormation/Terraform).
- Data Modeler: 8+ years of experience in commercial data modeling and data entity definition for developing business insights for life sciences organizations. Prior experience in client management, having worked across a variety of projects from data engineering to data operations to help improve and run clients' business processes and operations, implementing cutting-edge automation technologies.
- Azure ADF: 5+ years of relevant experience delivering customer-focused information management solutions across Data Lakes, Enterprise Data Warehouses, and Enterprise Data Integration projects, primarily in the MS Azure cloud using Data Factory and Databricks.
- Snowflake Architect: 10+ years of overall EDW (ETL, BI projects)/Cloud Architecture experience, plus software development experience using object-oriented languages. Expertise in advanced Snowflake concepts such as setting up resource monitors, RBAC controls, virtual warehouse sizing, query performance tuning, zero-copy clone, and time travel, and understanding of how to use these features.
- Business Analyst - Patient Specialty Services: 8-10 years of extensive experience working on patient-level datasets. Fair understanding of patient data processing within a HIPAA environment, such as patient data aggregation, tokenization, etc.
- MDM - Informatica/Reltio: 5-8 years of experience, with hands-on experience working on MDM projects. Hands-on experience with industry data quality tools such as Informatica IDQ and IBM Data Quality.
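As an illustration of the Python/PySpark profile described above (reading from external sources, merging, enriching, and loading into a target destination), here is a minimal sketch. The S3 paths, column names, and join key are hypothetical and stand in for whatever the actual lake layout would be.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical S3 locations; adjust to the real lake layout.
ORDERS_PATH = "s3://my-lake/raw/orders/"
CUSTOMERS_PATH = "s3://my-lake/reference/customers/"
TARGET_PATH = "s3://my-lake/curated/orders_enriched/"

spark = SparkSession.builder.appName("orders-enrichment").getOrCreate()

# Read from external sources.
orders = spark.read.parquet(ORDERS_PATH)
customers = spark.read.parquet(CUSTOMERS_PATH)

# Merge and enrich: attach customer attributes and derive new columns.
enriched = (
    orders.join(customers, on="customer_id", how="left")
          .withColumn("revenue", F.col("quantity") * F.col("unit_price"))
          .withColumn("order_date", F.to_date("order_ts"))
)

# Load into the target destination, partitioned for downstream query pruning.
enriched.write.mode("overwrite").partitionBy("order_date").parquet(TARGET_PATH)
```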

Posted 2 months ago

Apply

5.0 - 9.0 years

5 - 9 Lacs

Noida, Uttar Pradesh, India

On-site

Technology Leadership: Should be able to independently design, implement, and deliver complex Data Warehousing/Data Lake, Cloud Data Management, and Data Integration project assignments.

Technical Design and Development Expertise:
- Any of the ETL tools (Informatica, IICS, Matillion, DataStage), plus hosting technologies such as the AWS stack (Redshift, EC2), is mandatory.
- Any of the BI tools among Tableau, Qlik, Power BI, and MSTR.
- Informatica MDM, Customer Data Management.
- Expert knowledge of SQL, with the ability to performance-tune complex SQL queries in traditional and distributed RDBMS systems, is a must.
- Experience across Python, PySpark, and Unix/Linux shell scripting.

Project Management: A must-have. Should be able to create simple to complex project plans in Microsoft Project and think ahead about potential risks and mitigation plans.

Task Management: Should be able to onboard the team on the project plan and delegate tasks to accomplish milestones as planned. Should be comfortable discussing and prioritizing work items with team members in an onshore-offshore model.

Client Relationship: Manage client communication and expectations independently or with the support of the reporting manager. Should be able to deliver results back to the client as per plan.

Education: Bachelor's or equivalent; PG Diploma in Management.

Work Experience - we are hiring for the following roles across data management tech stacks:
- ETL - Snowflake/AWS/IICS: 5-8 years of experience in ETL tools (IICS, Redshift, Snowflake). Strong experience in AWS/Snowflake technologies (Redshift, Synapse, Snowflake). Experienced in running an end-to-end ETL project and interacting with users globally. Good knowledge of DW architectural principles and of ETL mapping, transformation, workflow design, and batch script development.
- Python/PySpark: Expert in Python, able to efficiently use Python data-science and math packages such as NumPy, Pandas, and scikit-learn, or a Python web framework. Deep experience developing data processing tasks in PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations. Prior experience with Redshift, Synapse, or Snowflake.
- AWS Infra Architect: 10-15 years of experience as an AWS Cloud Infrastructure administrator, AWS Cloud Architect, or Solution Architect, including 2-3 years as an AWS Cloud Architect. Hands-on experience debugging AWS services such as EC2, EMR, S3, Redshift, and Lambda. Hands-on experience with container orchestration tools (ECS/EKS) and with creating infrastructure using IaC (CloudFormation/Terraform).
- Data Modeler: 8+ years of experience in commercial data modeling and data entity definition for developing business insights for life sciences organizations. Prior experience in client management, having worked across a variety of projects from data engineering to data operations to help improve and run clients' business processes and operations, implementing cutting-edge automation technologies.
- Azure ADF: 5+ years of relevant experience delivering customer-focused information management solutions across Data Lakes, Enterprise Data Warehouses, and Enterprise Data Integration projects, primarily in the MS Azure cloud using Data Factory and Databricks.
- Snowflake Architect: 10+ years of overall EDW (ETL, BI projects)/Cloud Architecture experience, plus software development experience using object-oriented languages. Expertise in advanced Snowflake concepts such as setting up resource monitors, RBAC controls, virtual warehouse sizing, query performance tuning, zero-copy clone, and time travel, and understanding of how to use these features.
- Business Analyst - Patient Specialty Services: 8-10 years of extensive experience working on patient-level datasets. Fair understanding of patient data processing within a HIPAA environment, such as patient data aggregation, tokenization, etc.
- MDM - Informatica/Reltio: 5-8 years of experience, with hands-on experience working on MDM projects. Hands-on experience with industry data quality tools such as Informatica IDQ and IBM Data Quality.

Posted 2 months ago

Apply

5.0 - 8.0 years

5 - 8 Lacs

Gurgaon / Gurugram, Haryana, India

On-site

Technology Leadership: Should be able to independently design, implement, and deliver complex Data Warehousing/Data Lake, Cloud Data Management, and Data Integration project assignments.

Technical Design and Development Expertise:
- Any of the ETL tools (Informatica, IICS, Matillion, DataStage), plus hosting technologies such as the AWS stack (Redshift, EC2), is mandatory.
- Any of the BI tools among Tableau, Qlik, Power BI, and MSTR.
- Informatica MDM, Customer Data Management.
- Expert knowledge of SQL, with the ability to performance-tune complex SQL queries in traditional and distributed RDBMS systems, is a must.
- Experience across Python, PySpark, and Unix/Linux shell scripting.

Project Management: A must-have. Should be able to create simple to complex project plans in Microsoft Project and think ahead about potential risks and mitigation plans.

Task Management: Should be able to onboard the team on the project plan and delegate tasks to accomplish milestones as planned. Should be comfortable discussing and prioritizing work items with team members in an onshore-offshore model.

Client Relationship: Manage client communication and expectations independently or with the support of the reporting manager. Should be able to deliver results back to the client as per plan.

Education: Bachelor's or equivalent; PG Diploma in Management.

Work Experience - we are hiring for the following roles across data management tech stacks:
- ETL - Snowflake/AWS/IICS: 5-8 years of experience in ETL tools (IICS, Redshift, Snowflake). Strong experience in AWS/Snowflake technologies (Redshift, Synapse, Snowflake). Experienced in running an end-to-end ETL project and interacting with users globally. Good knowledge of DW architectural principles and of ETL mapping, transformation, workflow design, and batch script development.
- Python/PySpark: Expert in Python, able to efficiently use Python data-science and math packages such as NumPy, Pandas, and scikit-learn, or a Python web framework. Deep experience developing data processing tasks in PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations. Prior experience with Redshift, Synapse, or Snowflake.
- AWS Infra Architect: 10-15 years of experience as an AWS Cloud Infrastructure administrator, AWS Cloud Architect, or Solution Architect, including 2-3 years as an AWS Cloud Architect. Hands-on experience debugging AWS services such as EC2, EMR, S3, Redshift, and Lambda. Hands-on experience with container orchestration tools (ECS/EKS) and with creating infrastructure using IaC (CloudFormation/Terraform).
- Data Modeler: 8+ years of experience in commercial data modeling and data entity definition for developing business insights for life sciences organizations. Prior experience in client management, having worked across a variety of projects from data engineering to data operations to help improve and run clients' business processes and operations, implementing cutting-edge automation technologies.
- Azure ADF: 5+ years of relevant experience delivering customer-focused information management solutions across Data Lakes, Enterprise Data Warehouses, and Enterprise Data Integration projects, primarily in the MS Azure cloud using Data Factory and Databricks.
- Snowflake Architect: 10+ years of overall EDW (ETL, BI projects)/Cloud Architecture experience, plus software development experience using object-oriented languages. Expertise in advanced Snowflake concepts such as setting up resource monitors, RBAC controls, virtual warehouse sizing, query performance tuning, zero-copy clone, and time travel, and understanding of how to use these features.
- Business Analyst - Patient Specialty Services: 8-10 years of extensive experience working on patient-level datasets. Fair understanding of patient data processing within a HIPAA environment, such as patient data aggregation, tokenization, etc.
- MDM - Informatica/Reltio: 5-8 years of experience, with hands-on experience working on MDM projects. Hands-on experience with industry data quality tools such as Informatica IDQ and IBM Data Quality.

Posted 2 months ago

Apply

8.0 - 11.0 years

35 - 37 Lacs

Kolkata, Ahmedabad, Bengaluru

Work from Office

Dear Candidate,

We are hiring a Cloud Architect to design and oversee scalable, secure, and cost-efficient cloud solutions. Great for architects who bridge technical vision with business needs.

Key Responsibilities:
- Design cloud-native solutions using AWS, Azure, or GCP
- Lead cloud migration and transformation projects
- Define cloud governance, cost control, and security strategies
- Collaborate with DevOps and engineering teams for implementation

Required Skills & Qualifications:
- Deep expertise in cloud architecture and multi-cloud environments
- Experience with containers, serverless, and microservices
- Proficiency in Terraform, CloudFormation, or equivalent
- Bonus: Cloud certification (AWS/Azure/GCP Architect)

Soft Skills:
- Strong troubleshooting and problem-solving skills
- Ability to work independently and in a team
- Excellent communication and documentation skills

Note: If interested, please share your updated resume and a preferred time for a discussion. If shortlisted, our HR team will contact you.

Kandi Srinivasa
Delivery Manager
Integra Technologies

Posted 2 months ago

Apply

8.0 - 11.0 years

35 - 37 Lacs

Kolkata, Ahmedabad, Bengaluru

Work from Office

Dear Candidate,

We are looking for a Cloud Data Engineer to build cloud-based data pipelines and analytics platforms.

Key Responsibilities:
- Develop ETL workflows using cloud data services
- Manage data storage, lakes, and warehouses
- Ensure data quality and pipeline reliability

Required Skills & Qualifications:
- Experience with BigQuery, Redshift, or Azure Synapse
- Proficiency in SQL, Python, or Spark
- Familiarity with data lake architecture and batch/streaming processing

Soft Skills:
- Strong troubleshooting and problem-solving skills
- Ability to work independently and in a team
- Excellent communication and documentation skills

Note: If interested, please share your updated resume and a preferred time for a discussion. If shortlisted, our HR team will contact you.

Kandi Srinivasa
Delivery Manager
Integra Technologies
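The "data quality and pipeline reliability" responsibility above is often implemented as fail-fast validation between pipeline stages. The sketch below shows one minimal approach in PySpark; the staging path, key column, and threshold are hypothetical, not part of this posting.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

# Hypothetical staging dataset produced by an upstream ETL step.
df = spark.read.parquet("s3://my-lake/staging/daily_events/")

row_count = df.count()
null_keys = df.filter(F.col("event_id").isNull()).count()

# Fail fast so the orchestrator marks the pipeline run as failed
# instead of silently loading bad data downstream.
if row_count == 0:
    raise ValueError("DQ check failed: staging dataset is empty")
if null_keys / row_count > 0.01:
    raise ValueError(f"DQ check failed: {null_keys} rows have a null event_id")

print(f"DQ checks passed: {row_count} rows, {null_keys} null keys")
```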

Posted 2 months ago

Apply

5 - 8 years

5 - 15 Lacs

Pune, Chennai

Work from Office

• SQL: 2-4 years of experience
• Spark: 1-2 years of experience
• NoSQL Databases: 1-2 years of experience
• Database Architecture: 2-3 years of experience
• Cloud Architecture: 1-2 years of experience
• Experience in a programming language such as Python
• Good understanding of ETL (Extract, Transform, Load) concepts
• Good analytical and problem-solving skills
• Inclination for learning and self-motivation
• Knowledge of ticketing tools such as JIRA/SNOW
• Good communication skills to interact with customers on issues and requirements

Good to Have:
• Knowledge/experience in Scala

Posted 2 months ago

Apply

2 - 3 years

0 - 0 Lacs

Thiruvananthapuram

Work from Office

Role Proficiency: Acts under very minimal guidance to develop error-free code, and to test and document applications.

Outcomes:
- Understand the application's features and component design, and develop them in accordance with user stories/requirements.
- Code, debug, test, and document; communicate product/component/feature development stages.
- Independently develop optimized code with appropriate approaches and algorithms, following standards and security guidelines.
- Effectively interact with customers and articulate their input.
- Optimize efficiency, cost, and quality by identifying opportunities for automation/process improvements and agile delivery models.
- Mentor Developer I - Software Engineering to become more effective in their role.
- Learn the technology, business domain, and system domain as recommended by the project/account.
- Set FAST goals and provide feedback on the FAST goals of mentees.

Measures of Outcomes:
- Adherence to engineering processes and standards (coding standards)
- Adherence to schedule/timelines
- Adherence to SLAs where applicable
- Number of defects post delivery
- Number of non-compliance issues
- Reduction in recurrence of known defects
- Quick turnaround of production bugs
- Meeting the defined productivity standards for the project
- Completion of applicable technical/domain certifications
- Completion of all mandatory training requirements

Outputs Expected:
- Configure: Follow the configuration process.
- Test: Create and conduct unit testing.
- Domain relevance: Develop features and components with a good understanding of the business problem being addressed for the client.
- Manage defects: Raise, fix, and retest defects.
- Estimate: Estimate the time, effort, and resource dependence for one's own work.
- Mentoring: Mentor junior developers in the team; set FAST goals and provide feedback on the FAST goals of mentees.
- Document: Create documentation for one's own work.
- Manage knowledge: Consume and contribute to project-related documents, SharePoint libraries, and client universities.
- Status reporting: Report the status of assigned tasks and comply with project-related reporting standards/processes.
- Release: Adhere to the release management process.
- Design: Understand the design/LLD and link it to requirements/user stories.
- Code: Develop code, with guidance, for the above.

Skill Examples:
- Explain and communicate the design/development to the customer.
- Perform and evaluate test results against product specifications.
- Develop user interfaces, business software components, and embedded software components.
- Manage and guarantee high levels of cohesion and quality.
- Use data models.
- Estimate the effort and time required for one's own work.
- Perform and evaluate tests in the customer's or target environment.
- Team player with good written and verbal communication abilities.
- Proactively ask for and offer help.

Knowledge Examples:
- Appropriate software programs/modules
- Technical designing
- Programming languages
- DBMS, operating systems, and software platforms
- Integrated development environments (IDEs)
- Agile methods
- Knowledge of the customer domain and sub-domain where the problem is solved

Additional Comments - Responsibilities and Skills:
- Manage incident response and root cause analysis, and ensure high system availability.
- Oversee support for Hadoop, Spark, Hive, PySpark, Snowflake, and AWS EMR.
- Maintain Python Flask APIs, Scala applications, and Airflow workflows.
- Optimize SQL/HQL queries and manage shell/bash scripts.
- Develop monitoring and alerting systems, and provide detailed reporting.
- 3+ years in production support/data engineering, with team leadership.
- Expertise in Hadoop, Spark, Hive, PySpark, SQL, HQL, Python, Scala, and Python Flask APIs.
- Proficiency in Unix/Linux, shell/bash scripting, Snowflake, and AWS EMR.
- Experience with Airflow and incident management.
- Strong problem-solving and communication skills.

Required Skills: Python, PySpark, Airflow
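Since the role calls for maintaining Airflow workflows with monitoring and alerting, here is a minimal sketch of a scheduled DAG with retries and a failure callback, assuming Airflow 2.x. The DAG name, schedule, alert hook, and spark-submit command are illustrative placeholders rather than details from this posting.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator


def notify_on_failure(context):
    # Placeholder alert hook; wire this to Slack, PagerDuty, or email in a real setup.
    print(f"Task {context['task_instance'].task_id} failed, paging on-call.")


default_args = {
    "owner": "data-support",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
    "on_failure_callback": notify_on_failure,
}

with DAG(
    dag_id="emr_hive_refresh",            # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",        # daily 02:00 run
    catchup=False,
    default_args=default_args,
) as dag:
    run_spark_job = BashOperator(
        task_id="run_spark_job",
        bash_command="spark-submit s3://my-bucket/jobs/refresh_marts.py",
    )
```

Retries plus a failure callback cover the common support pattern of automatic recovery first, human escalation second.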

Posted 2 months ago

Apply

4 - 9 years

12 - 16 Lacs

Hyderabad

Work from Office

As a Data Engineer, you will develop, maintain, evaluate, and test big data solutions, and will be involved in developing data solutions using the Spark framework with Python or Scala on Hadoop and the AWS Cloud Data Platform.

Responsibilities:
- Build data pipelines to ingest, process, and transform data from files, streams, and databases.
- Process data with Spark, Python, PySpark, Scala, and Hive, HBase, or other NoSQL databases on Cloud Data Platforms (AWS) or HDFS.
- Develop efficient software code for multiple use cases leveraging the Spark framework with Python or Scala and big data technologies built on the platform.
- Develop streaming pipelines.
- Work with Hadoop/AWS ecosystem components to implement scalable solutions that meet ever-increasing data volumes, using big data/cloud technologies such as Apache Spark, Kafka, and cloud computing services.

Required education: Bachelor's Degree
Preferred education: Master's Degree

Required technical and professional expertise:
- Total of 5-7+ years of experience in Data Management (DW, DL, Data Platform, Lakehouse) and data engineering.
- Minimum 4+ years of experience in big data technologies, with extensive data engineering experience in Spark with Python or Scala.
- Minimum 3 years of experience on Cloud Data Platforms on AWS; exposure to streaming solutions and message brokers such as Kafka.
- Experience in AWS EMR / AWS Glue / Databricks, AWS Redshift, and DynamoDB.
- Good to excellent SQL skills.

Preferred technical and professional experience:
- Certification in AWS and Databricks, or Cloudera Spark Certified Developer.
- AWS S3, Redshift, and EMR for data storage and distributed processing.
- AWS Lambda, AWS Step Functions, and AWS Glue to build serverless, event-driven data workflows and orchestrate ETL processes.
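To illustrate the streaming-pipeline experience this posting asks for, the sketch below shows a minimal Spark Structured Streaming job that reads from Kafka and lands parsed events on S3 with checkpointing. The broker address, topic, schema, and paths are assumptions, and the spark-sql-kafka connector package must be available on the cluster.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

# Hypothetical event schema for the Kafka message payload.
schema = StructType([
    StructField("device_id", StringType()),
    StructField("temperature", DoubleType()),
])

# Read the raw stream and parse the JSON payload from the Kafka value column.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "sensor-events")
         .load()
         .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
         .select("e.*")
)

# Land the parsed stream on S3; the checkpoint location lets the job
# resume from the last committed Kafka offsets after a restart.
query = (
    events.writeStream.format("parquet")
          .option("path", "s3://my-lake/raw/sensor_events/")
          .option("checkpointLocation", "s3://my-lake/checkpoints/sensor_events/")
          .start()
)
query.awaitTermination()
```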

Posted 2 months ago

Apply

8 - 13 years

12 - 22 Lacs

Gurugram

Work from Office

Data & Information Architecture Lead | 8 to 15 years | Gurgaon

Summary: An excellent opportunity for Data Architect professionals with expertise in Data Engineering, Analytics, AWS, and databases.

Location: Gurgaon

Your Future Employer: A leading financial services provider specializing in delivering innovative and tailored solutions to meet the diverse needs of its clients, offering a wide range of services including investment management, risk analysis, and financial consulting.

Responsibilities:
- Design and optimize the architecture of the end-to-end data fabric, inclusive of the data lake, data stores, and EDW, in alignment with EA guidelines and standards for cataloging and maintaining data repositories.
- Undertake detailed analysis of information management requirements across all systems, platforms, and applications to guide the development of information management standards.
- Lead the design of the information architecture across multiple data types, working closely with business partners/consumers, the MIS team, the AI/ML team, and other departments to design, deliver, and govern future-proof data assets and solutions.
- Design and ensure delivery excellence for (a) large and complex data transformation programs, (b) small and nimble data initiatives to realize quick gains, and (c) engagements with OEMs and partners to bring in the best tools and delivery methods.
- Drive data domain modeling, data engineering, and data resiliency design standards across the microservices and analytics application fabric for autonomy, agility, and scale.

Requirements:
- Deep understanding of the data and information architecture discipline, processes, concepts, and best practices.
- Hands-on expertise in building and implementing data architecture for large enterprises.
- Proven architecture modeling skills; strong analytics and reporting experience.
- Strong data design, management, and maintenance experience.
- Strong experience with data modeling tools.
- Extensive experience with cloud-native lake technologies, e.g. AWS native lake solutions.

Posted 2 months ago

Apply

5 - 10 years

20 - 35 Lacs

Hyderabad, Pune, Bengaluru

Hybrid

EPAM has a presence across 40+ countries globally, with 55,000+ professionals and numerous delivery centers. Key locations are North America, Eastern Europe, Central Europe, Western Europe, APAC, and the Middle East, with development centers in India (Hyderabad, Pune, and Bangalore).

Location: Gurgaon/Pune/Hyderabad/Bengaluru/Chennai
Work Mode: Hybrid (2-3 days in office per week)

Job Description:
- 5-14 years of experience in Big Data and related technologies
- Expert-level understanding of distributed computing principles
- Expert-level knowledge of and experience in Apache Spark
- Hands-on programming with Python
- Proficiency with Hadoop v2, MapReduce, HDFS, and Sqoop
- Experience building stream-processing systems using technologies such as Apache Storm or Spark Streaming
- Good understanding of Big Data querying tools such as Hive and Impala
- Experience integrating data from multiple sources such as RDBMS (SQL Server, Oracle), ERP, and files
- Good understanding of SQL queries, joins, stored procedures, and relational schemas
- Experience with NoSQL databases such as HBase, Cassandra, and MongoDB
- Knowledge of ETL techniques and frameworks
- Performance tuning of Spark jobs
- Experience with native cloud data services (AWS/Azure)
- Ability to lead a team efficiently
- Experience designing and implementing Big Data solutions
- Practitioner of the Agile methodology

WE OFFER:
- Opportunity to work on technical challenges that may have impact across geographies
- Vast opportunities for self-development: online university, global knowledge-sharing opportunities, and learning through external certifications
- Opportunity to share your ideas on international platforms
- Sponsored Tech Talks & Hackathons
- Possibility to relocate to any EPAM office for short- and long-term projects
- Focused individual development
- Benefit package: health benefits, medical benefits, retirement benefits, paid time off, flexible benefits
- Forums to explore passions beyond work (CSR, photography, painting, sports, etc.)

Posted 2 months ago

Apply