
280 Apache Spark Jobs - Page 9

JobPe aggregates job listings for easy access, but applications are submitted directly on the original job portal.

5.0 - 8.0 years

15 - 25 Lacs

Kolkata, Chennai, Bengaluru

Hybrid

Global Gen AI Developer

Enabling a software-defined, electrified future. Visteon is a technology company that develops and builds innovative digital cockpit and electrification products at the leading edge of the mobility revolution. Founded in 2000, Visteon brings decades of automotive intelligence combined with Silicon Valley speed to apply global insights that help transform the software-defined vehicle of the future for many of the world's largest OEMs. The company employs 10,000 people in 18 countries around the globe.

Mission of the Role: Facilitate enterprise machine learning and artificial intelligence solutions using the latest technologies Visteon is adopting globally.

Key Objectives of this Role: The primary goal of the Global ML/AI Developer is to leverage advanced machine learning and artificial intelligence techniques to develop innovative solutions that drive Visteon's strategic initiatives. By collaborating with cross-functional teams and stakeholders, this role identifies opportunities for AI-driven improvements, designs and implements scalable ML models, and integrates these models into existing systems to enhance operational efficiency. By following development best practices, fostering a culture of continuous learning, and staying abreast of AI advancements, the Global ML/AI Developer ensures that all AI solutions align with organizational goals, support data-driven decision-making, and continuously improve Visteon's technological capabilities.

Qualification, Experience and Skills: 6-8 years

Technical Skills: Expertise in machine learning frameworks (e.g., TensorFlow, PyTorch), programming languages (e.g., Python, R, SQL), and data processing tools (e.g., Apache Spark, Hadoop). Proficiency in developing, training, and deploying ML models, including supervised and unsupervised learning, deep learning, and reinforcement learning. Strong understanding of data engineering concepts, including data preprocessing, feature engineering, and data pipeline development. Experience with cloud platforms (preferably Microsoft Azure) for deploying and scaling ML solutions.

Business Acumen: Strong business analysis skills and the ability to translate complex technical concepts into actionable business insights and recommendations.

Key Behaviors: Innovation: continuously seeks out new ideas, technologies, and methodologies to improve AI/ML solutions and drive the organization forward. Attention to Detail: pays close attention to all aspects of the work, ensuring accuracy and thoroughness in data analysis, model development, and documentation. Effective Communication: clearly and effectively communicates complex technical concepts to non-technical stakeholders, ensuring understanding and alignment across the organization.

Posted 1 month ago

Apply

5.0 - 10.0 years

25 - 35 Lacs

Chennai

Hybrid

Data Software Engineer

Job Description:
1. 5-12 years of experience in Big Data and data-related technologies
2. Expert-level understanding of distributed computing principles
3. Expert-level knowledge of and experience in Apache Spark
4. Hands-on programming with Python
5. Proficiency with Hadoop v2, MapReduce, HDFS, Sqoop
6. Experience building stream-processing systems using technologies such as Apache Storm or Spark Streaming (a minimal Structured Streaming sketch follows below)
7. Experience with messaging systems such as Kafka or RabbitMQ
8. Good understanding of Big Data querying tools such as Hive and Impala
9. Experience integrating data from multiple sources such as RDBMS (SQL Server, Oracle), ERP, and files
10. Good understanding of SQL queries, joins, stored procedures, and relational schemas
11. Experience with NoSQL databases such as HBase, Cassandra, MongoDB
12. Knowledge of ETL techniques and frameworks
13. Performance tuning of Spark jobs
14. Experience with native cloud data services: AWS, Azure Databricks, or GCP
15. Ability to lead a team efficiently
16. Experience designing and implementing Big Data solutions
17. Practitioner of Agile methodology
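The stream-processing requirement above (item 6) can be illustrated with a short sketch. This is a minimal, hypothetical example of consuming a Kafka topic with Spark Structured Streaming, not code from any specific employer's stack; the broker address, topic name, and sink paths are placeholders, and it assumes the spark-sql-kafka connector package is available to the Spark session.

```python
# Minimal sketch: read a Kafka topic with Spark Structured Streaming and land it as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "orders")                     # placeholder topic
    .load()
    .select(col("key").cast("string"), col("value").cast("string"))
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "/tmp/orders_out")                 # placeholder sink path
    .option("checkpointLocation", "/tmp/orders_chk")   # required for fault tolerance
    .start()
)
query.awaitTermination()
```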

Posted 1 month ago

Apply

10.0 - 15.0 years

12 - 17 Lacs

Bengaluru

Work from Office

Position Purpose

The Regulatory Reporting team is ramping up to about 20 members to address the ambitious project of standardizing the activity for the APAC region. This long-term project involves the reporting of Balance Sheet, P&L and Capital, with a more recent and intense focus on liquidity, especially the Basel III LCR requirement. Our team captures reporting requirements from users located all across APAC, comes up with efficient propositions, implements solutions, and follows up to guarantee user comfort in the long run. To operate efficiently, the team is made up of complementary profiles: business analysts understand the logic from the users and develop the reports, data managers are the reference for data sourcing, SMEs develop complex logic components and enhance the solution framework, and project managers orchestrate the work and communicate on progress. Collective discussions on design and strategy, and an agile approach to the project, ensure the best compromise between long-term robustness and immediate usability of the solution. This position is for a Team Leader/Manager who will work on business solutions and contribute to people management.

Responsibilities

Direct Responsibilities - Regional Regulatory Reporting Project:
Hands-on development of both FE (Angular/TypeScript) and BE (Spring Boot/Java)
Good understanding of database concepts (Oracle)
Growth mindset and knowledge of SDLC cycles
Experience working with an onshore team (raising key questions/challenges and willingness to learn)
Strong communication skills with a proactive attitude
Organized, flexible, able to prioritize team success, point out issues, and train the team on their weaknesses
Strong in writing tests for FE and BE; follows TDD closely
Assist in defining platform architectures
Design and develop robust, performant software components to support the regulatory reporting platform
Maintain/build the platform following BNPP coding/quality standards and technology strategy
Assess risk and impact to AS-IS and TO-BE prior to implementation
Prepare documentation and share knowledge with the team
Participate in code review and improvements, testing, and support
Support system testing (e.g. SIT, UAT, STG)
Prepare system documentation (e.g. technical/functional spec, user guide)
Provide support to resolve production incidents

System Enhancement & Production Support (secondary responsibility):
Provide support to resolve production incidents raised by end users or by IT
Identify the root cause of incidents by applying sound error-tracing techniques
Help the Program Manager by providing expertise on the complexity of proposed designs, estimation of workload/timelines for our team as well as other contributors, and insight on issues faced and related remediation
Provide support to users and APS on incidents and user queries for timely resolution

Contributing Responsibilities:
Engage regional stakeholders; ensure operational objectives and oversight
Establish processes, governance, analysis, and work practices to achieve objectives and ensure delivery
Review activities and work performed by team members
Evaluate, establish, and execute controls on the functional and technical processes
Team management and development: capacity planning, hiring, work allocation, monitoring, skill evaluation, development, training, coaching
Anticipate changes to business processes and demands from stakeholders and events
Set up processes and ensure compliance with OPC, Risk Management, and Quality requirements

Technical & Behavioral Competencies
Knowledge of functional banking
Technical (Mandatory): Front end (Angular/TypeScript), back end (Spring Boot/Java), Oracle (good to have), Apache Spark
Functional: Financial markets, banking fundamentals, regulatory reporting

Skills Referential
Behavioural Skills: Ability to collaborate / teamwork; organizational skills; ability to synthesize/simplify; client focused
Transversal Skills: Ability to manage a project; ability to inspire others and generate people's commitment; ability to develop others and improve their skills; ability to set up relevant performance indicators; ability to manage/facilitate a meeting, seminar, committee, or training

Education Level: Bachelor's degree or equivalent
Experience Level: At least 10 years

Posted 1 month ago

Apply

10.0 - 15.0 years

12 - 18 Lacs

Bengaluru

Work from Office

Position Purpose

The role is split between architectural/project-based tasks and third-line support of the infrastructure hosting CIB SSC Data services. The team is responsible for the CIB SSC Data platform across EMEA, AMER and APAC, with an application support team (APS) based in Chennai and Mumbai, India, acting as a single, global team.

Responsibilities

Direct Responsibilities:
Provide architectural expertise, thereby driving change and improvement in technology and process
Contribute to the development of the Data technology roadmap
Liaise with CIB architecture teams in undertaking architectural reviews
Stay abreast of emerging technologies and technology trends
Work with Project Managers in defining, designing, documenting and deploying new functionality for existing and new applications
Liaise with the application developers in low-level diagnosis of ongoing issues
Provide subject matter expertise on core technologies to all business application teams
Contribute to the improvement of process and technology used within BNP Paribas
Problem Management, Change Management, Incident Management

Contributing Responsibilities:
Perform pre-assigned tasks (Change the Bank or Run the Bank) to accomplish the function's responsibilities
Work cooperatively with the other members of the team
Ensure adherence to processes and procedures
Request improvement of knowledge (training) when needed
Apply own initiative within the levels of acceptable risk; whenever in doubt, escalate and seek advice and guidance
Apply the guidelines and principles of a user-service mentality and behaviour
Escalate risks and issues to the manager of the team
Minimise operational failure, including but not exclusively the risk of fraud, by helping to devise and implement sufficient regular controls
Ensure appropriate escalation to management and/or Permanent Control (or Compliance as appropriate) as soon as an issue is identified
Provide a direct contribution to the BNPP operational permanent control framework

Technical & Behavioral Competencies

Essential:
Strong working knowledge of Linux and Windows systems
Experience in installation, configuration, documentation and administration of multiple pre-production and production platforms in both virtual and physical environments
Strong DevOps mentality (Ansible, Artifactory, Bitbucket/Git, Docker, Kubernetes, Apache Spark, Jenkins)
Excellent security knowledge: vulnerability management, hardening, web access firewall requirements, DMZ architecture
Web infrastructure technologies (load balancing, firewalls, proxies, networking, SSO, LDAP, SAML, Kerberos)
Great understanding of ETL processes
Experience with scheduling tools such as AutoSys, Argo, etc.
Server and application monitoring: Dynatrace, ELK

Nice-to-have skills:
Data visualisation tools: Tableau, Power BI
Dataiku, Quantexa, Python
MS SQL Server, Oracle, MongoDB
Infrastructure standards for servers, networks and storage
Good understanding of the Web Services model
Previous use of ServiceNow as a service desk management product
Knowledge of process and quality management, ITIL, and Agile principles including Scrum and Kanban

Personal Attributes: strong attention to detail; structured and methodical; very strong analytical skills; team oriented, adding good team spirit; clear communicator in both written and oral forms; ability to operate with demanding senior IT management; a can-do mindset; good interpersonal and communication skills

Specific Qualifications: Bachelor's degree in Engineering or MCA

Skills Referential
Behavioural Skills: Ability to collaborate / teamwork; ability to deliver / results driven; attention to detail / rigor; decision making
Transversal Skills: Analytical ability; ability to develop others and improve their skills; ability to develop and adapt a process; ability to develop and leverage networks; ability to understand, explain and support change

Education Level: Bachelor's degree or equivalent
Experience Level: At least 10 years
Other/Specific Qualifications: Any relevant certification matching the expected skill set

Posted 1 month ago

Apply

4.0 - 7.0 years

8 - 12 Lacs

Bengaluru

Work from Office

As a Data Scientist, you will navigate uncharted territories with us, discovering new paths to creating solutions for our users. You will be at the forefront of interesting challenges and solve unique customer problems in an untapped market. But wait, there's more to us. Our team is huge on having a well-rounded personal and professional life. When we aren't nose-deep in data, you will most likely find us belting Summer of '69 at the nearest karaoke bar, or debating who the best Spider-Man is: Maguire, Garfield, or Holland? You tell us.

About the Role
Love deep data? Love discussing solutions instead of problems? Then you could be our next Data Scientist. In a nutshell, your primary responsibility will be enhancing the productivity and utilisation of the generated data. You will also work closely with business stakeholders, transform scattered pieces of information into valuable data, and share and present your insights with peers.

What you will do
Develop models and run experiments to infer insights from hard data
Improve our product usability and identify new growth opportunities
Understand reseller preferences to provide them with the most relevant products
Design discount programs to help our resellers sell more
Help resellers better recognise end-customer preferences to improve their revenue
Use data to identify bottlenecks that will help our suppliers meet their SLA requirements
Model seasonal demand to predict key organisational metrics
Mentor junior data scientists in the team

What you will need
Bachelor's/Master's degree in computer science (or similar degrees)
4-7 years of experience as a Data Scientist in a fast-paced organization, preferably B2C
Familiarity with Neural Networks, Machine Learning, etc.
Familiarity with tools like SQL, R, Python, etc.
Strong understanding of Statistics and Linear Algebra
Strong understanding of hypothesis/model testing and ability to identify common model testing errors
Experience designing and running A/B tests and drawing insights from them (a minimal significance-test sketch follows below)
Proficiency in machine learning algorithms
Excellent analytical skills to fetch data from reliable sources to generate accurate insights
Experience in tech and product teams is a plus

Bonus points for:
Experience working on personalization or other ML problems
Familiarity with Big Data tech stacks like Apache Spark, Hadoop, Redshift
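The A/B testing requirement above can be illustrated with a minimal sketch of a two-sample proportions z-test using statsmodels; the conversion counts and sample sizes below are made-up illustration values, not data from any real experiment.

```python
# Minimal A/B test sketch: compare conversion rates of control vs. variant.
from statsmodels.stats.proportion import proportions_ztest

conversions = [310, 355]   # converted users in control / variant (hypothetical)
samples = [10000, 10000]   # users exposed in each group (hypothetical)

z_stat, p_value = proportions_ztest(count=conversions, nobs=samples)
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
# A p-value below 0.05 would indicate the difference in conversion rate is
# statistically significant at the 5% level.
```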

Posted 1 month ago

Apply

5.0 - 9.0 years

10 - 20 Lacs

Hyderabad, Chennai, Bengaluru

Hybrid

Role & responsibilities
Must have excellent knowledge of Apache Spark and Python programming experience
Deep experience in developing data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations (a minimal PySpark sketch follows below)
Experience in deployment and operationalizing the code; knowledge of scheduling tools like Airflow, Control-M, etc. is preferred
Working experience with cloud technology architectures such as the AWS ecosystem, Google Cloud, BigQuery, etc. is an added advantage
Understanding of Unix/Linux and shell scripting
Data modelling experience using advanced statistical analysis, unstructured data processing
Experience building APIs for provisioning data to downstream systems by leveraging different frameworks
Hands-on project experience with IDEs such as Jupyter Notebook, Zeppelin, or PyCharm
Hands-on experience with AWS S3 filesystem operations
Good knowledge of Hadoop, Hive and the Cloudera/Hortonworks Data Platform
Experience handling CDC operations for huge volumes of data
Should understand and have operating experience with Agile delivery methodologies
Should have hands-on experience in data validation and writing unit test cases
Should have experience integrating PySpark with downstream and upstream applications through batch/real-time interfaces
Should have experience in fine-tuning processes and troubleshooting performance issues
Should have demonstrated expertise in developing design documents such as HLD, LLD, etc.
Should have experience leading requirements gathering and developing solution architecture for data migration/integration initiatives
Should have experience handling client interactions at different phases of projects
Should have experience leading a team in a project or a module
Should be well versed with the onsite/offshore model and its challenges
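As a rough illustration of the PySpark tasks described above (read from an external source, merge, enrich, and load to a target), here is a minimal sketch; the S3 paths, column names, and join key are hypothetical.

```python
# Minimal PySpark sketch: read, join/enrich, and load to a target location.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("enrichment-sketch").getOrCreate()

orders = spark.read.parquet("s3a://bucket/raw/orders/")        # placeholder source
customers = spark.read.parquet("s3a://bucket/dim/customers/")  # placeholder source

enriched = (
    orders.join(customers, on="customer_id", how="left")       # merge / enrich
          .withColumn("order_value", F.col("quantity") * F.col("unit_price"))
)

# Load into the target destination, partitioned for downstream consumption.
enriched.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3a://bucket/curated/orders_enriched/"
)
```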

Posted 1 month ago

Apply

5.0 - 8.0 years

8 - 16 Lacs

Bengaluru, Delhi / NCR, Mumbai (All Areas)

Work from Office

Must-have skills: Apache Spark. Good-to-have skills: NA. Educational qualification: minimum 15 years of full-time education. Share CV on: neha.mandal@mounttalent.com

Project Role: Application Developer
Project Role Description: Design, build and configure applications to meet business process and application requirements.

Key Responsibilities: As a Software Development Engineer you will be responsible for analyzing, designing, coding and testing multiple components of application code using Apache Spark across one or more clients. Your typical day will involve performing maintenance, enhancements and/or development work using Google BigQuery, Python and PySpark.

Technical Experience:
Design, develop and maintain Apache Spark applications using Google BigQuery, Python and PySpark (a minimal BigQuery-from-Spark sketch follows below)
Analyze, design, code and test multiple components of application code across one or more clients
Perform maintenance, enhancements and/or development work using Apache Spark
Collaborate with cross-functional teams to identify and resolve technical issues and ensure timely delivery of high-quality software solutions

Professional Attributes:
Proficiency in Apache Spark
Experience with Google BigQuery, Python and PySpark
Strong understanding of software engineering principles and best practices
Experience with software development methodologies such as Agile and Scrum

Additional Information: minimum 15 years of full-time education
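As a rough sketch of the BigQuery-plus-PySpark work mentioned above, the example below reads a BigQuery table into a Spark DataFrame. It assumes the open-source spark-bigquery connector is on the cluster classpath and that GCP credentials are configured; the project, dataset, and table names are placeholders.

```python
# Minimal sketch: read a BigQuery table into PySpark via the spark-bigquery connector.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-read-sketch").getOrCreate()

events = (
    spark.read.format("bigquery")
    .option("table", "my-project.analytics.events")   # placeholder project.dataset.table
    .load()
)

events.groupBy("event_type").count().show()
```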

Posted 1 month ago

Apply

10.0 - 12.0 years

12 - 14 Lacs

Hyderabad

Work from Office

About the Role: Grade Level (for internal use): 11

The Team: Our team is responsible for the design, architecture, and development of our client-facing applications using a variety of tools that are regularly updated as new technologies emerge. You will have the opportunity every day to work with people from a wide variety of backgrounds and will be able to develop a close team dynamic with coworkers from around the globe.

The Impact: The work you do will be used every single day; it's the essential code you write that provides the data and analytics required for crucial, daily decisions in the capital and commodities markets.

What's in it for you: Build a career with a global company. Work on code that fuels the global financial markets. Grow and improve your skills by working on enterprise-level products and new technologies.

Responsibilities: Solve problems, analyze and isolate issues. Provide technical guidance and mentoring to the team and help them adopt change as new processes are introduced. Champion best practices and serve as a subject matter authority. Develop solutions to develop/support key business needs. Engineer components and common services based on standard development models, languages and tools. Produce system design documents and lead technical walkthroughs. Produce high-quality code. Collaborate effectively with technical and non-technical partners. As a team member, continuously improve the architecture.

Basic Qualifications: 10-12 years of experience designing/building data-intensive solutions using distributed computing. Proven experience in implementing and maintaining enterprise search solutions in large-scale environments. Experience working with business stakeholders and users, providing research direction and solution design, and writing robust maintainable architectures and APIs. Experience developing and deploying search solutions in a public cloud such as AWS. Proficient programming skills in high-level languages: Java, Scala, Python. Solid knowledge of at least one machine learning research framework. Familiarity with containerization, scripting, cloud platforms, and CI/CD. 5+ years of experience with Python, Java, Kubernetes, and data and workflow orchestration tools. 4+ years of experience with Elasticsearch, SQL, NoSQL, Apache Spark, Flink, Databricks and MLflow. Prior experience operationalizing data-driven pipelines for large-scale batch and stream processing analytics solutions. Good to have: experience contributing to GitHub and open-source initiatives, research projects, and/or participation in Kaggle competitions. Ability to quickly, efficiently, and effectively define and prototype solutions with continual iteration within aggressive product deadlines. Strong communication and documentation skills for both technical and non-technical audiences.

Preferred Qualifications: Search technologies: query and indexing content for Apache Solr, Elasticsearch, etc. Proficiency in search query languages (e.g., Lucene Query Syntax) and experience with data indexing and retrieval. Experience with machine learning models and NLP techniques for search relevance and ranking. Familiarity with vector search techniques and embedding models (e.g., BERT, Word2Vec). Experience with relevance tuning using A/B testing frameworks. Big Data technologies: Apache Spark, Spark SQL, Hadoop, Hive, Airflow. Data science search technologies: personalization and recommendation models, Learn to Rank (LTR). Preferred languages: Python, Java. Database technologies: MS SQL Server platform, stored procedure programming experience using Transact-SQL. Ability to lead, train and mentor.

About S&P Global Market Intelligence: At S&P Global Market Intelligence, a division of S&P Global, we understand the importance of accurate, deep and insightful information. Our team of experts delivers unrivaled insights and leading data and technology solutions, partnering with customers to expand their perspective, operate with confidence, and make decisions with conviction.

What's In It For You - Our Purpose: Progress is not a self-starter. It requires a catalyst to be set in motion. Information, imagination, people, technology: the right combination can unlock possibility and change the world. Our world is in transition and getting more complex by the day. We push past expected observations and seek out new levels of understanding so that we can help companies, governments and individuals make an impact on tomorrow. At S&P Global we transform data into Essential Intelligence, pinpointing risks and opening possibilities. We Accelerate Progress.

Posted 1 month ago

Apply

5.0 - 7.0 years

16 - 27 Lacs

Bengaluru

Work from Office

We're Nagarro. We are a Digital Product Engineering company that is scaling in a big way! We build products, services, and experiences that inspire, excite, and delight. We work at scale across all devices and digital mediums, and our people exist everywhere in the world (18000+ experts across 38 countries, to be exact). Our work culture is dynamic and non-hierarchical. We're looking for great new colleagues. That's where you come in!

REQUIREMENTS:
Total experience 5+ years
Excellent knowledge of and experience in Big Data engineering
Strong hands-on experience with Apache Spark and Python
Solid experience with Hadoop, MapReduce, Hive, and SQL-like languages
Familiarity with GCP Pub/Sub, Kafka, and Trino
Experience building end-to-end data pipelines and integrating various data sources
Understanding of both relational (e.g., PostgreSQL) and NoSQL (e.g., MongoDB) databases
Experience with Git, CI/CD tools, and Agile development practices
Experience working on Google Cloud Platform (GCP), particularly with BigQuery, GCS, Airflow, and Kubernetes
Excellent problem-solving and analytical skills
Strong verbal and written communication abilities

RESPONSIBILITIES:
Writing and reviewing great quality code
Understanding the client's business use cases and technical requirements and converting them into a technical design that elegantly meets the requirements
Mapping decisions with requirements and translating the same to developers
Identifying different solutions and narrowing down the best option that meets the client's requirements
Defining guidelines and benchmarks for NFR considerations during project implementation
Writing and reviewing design documents explaining the overall architecture, framework, and high-level design of the application for the developers
Reviewing architecture and design on aspects like extensibility, scalability, security, design patterns, user experience, and NFRs, and ensuring that all relevant best practices are followed
Developing and designing the overall solution for defined functional and non-functional requirements, and defining the technologies, patterns, and frameworks to materialize it
Understanding and relating technology integration scenarios and applying these learnings in projects
Resolving issues raised during code review through exhaustive, systematic analysis of the root cause, and justifying the decisions taken
Carrying out POCs to make sure that the suggested design/technologies meet the requirements

Posted 1 month ago

Apply

2.0 - 5.0 years

4 - 7 Lacs

Hyderabad, Bengaluru

Hybrid

Key Skills & Responsibilities
Hands-on experience with AWS services: S3, Lambda, Glue, API Gateway, and SQS (a minimal Lambda/S3/SQS sketch follows below)
Strong data engineering expertise on AWS, with proficiency in Python, PySpark, and SQL
Experience in batch job scheduling and managing data dependencies across pipelines
Familiarity with data processing tools such as Apache Spark and Airflow
Ability to automate repetitive tasks and build reusable frameworks for improved efficiency
Provide RunOps / DevOps support, and manage the ongoing operation and monitoring of data services
Ensure high performance, scalability, and reliability of data workflows in cloud environments

Skills: AWS, S3, Lambda, Glue, API Gateway, SQS, Apache Spark, Airflow, SQL, PySpark, Python, DevOps support
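As an illustration of how the listed AWS services fit together, here is a minimal, hypothetical Lambda handler that reacts to an object landing in S3 and forwards a notification to SQS via boto3; the queue URL, event wiring, and field names are placeholders.

```python
# Minimal sketch: S3-triggered Lambda that validates the new object and notifies SQS.
import json
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.ap-south-1.amazonaws.com/123456789012/ingest-queue"  # placeholder

def handler(event, context):
    record = event["Records"][0]                   # S3 put-event payload
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    head = s3.head_object(Bucket=bucket, Key=key)  # basic existence/size check
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps(
            {"bucket": bucket, "key": key, "size": head["ContentLength"]}
        ),
    )
    return {"status": "queued", "key": key}
```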

Posted 1 month ago

Apply

5.0 - 10.0 years

8 - 14 Lacs

Navi Mumbai

Work from Office

Data Strategy and Planning: Develop and implement data architecture strategies that align with organizational goals and objectives. Collaborate with business stakeholders to understand data requirements and translate them into actionable plans.
Data Modeling: Design and implement logical and physical data models to support business needs. Ensure data models are scalable, efficient, and comply with industry best practices.
Database Design and Management: Oversee the design and management of databases, selecting appropriate database technologies based on requirements. Optimize database performance and ensure data integrity and security.
Data Integration: Define and implement data integration strategies to facilitate the seamless flow of information across systems.

Responsibilities:
Experience in data architecture and engineering
Proven expertise with the Snowflake data platform
Strong understanding of ETL/ELT processes and data integration
Experience with data modeling and data warehousing concepts
Familiarity with performance tuning and optimization techniques
Excellent problem-solving skills and attention to detail
Strong communication and collaboration skills

Required education: Bachelor's degree. Preferred education: Master's degree.

Required technical and professional expertise:
Cloud & Data Architecture: AWS, Snowflake
ETL & Data Engineering: AWS Glue, Apache Spark, Step Functions
Big Data & Analytics: Athena, Presto, Hadoop
Database & Storage: SQL, SnowSQL
Security & Compliance: IAM, KMS, Data Masking

Preferred technical and professional experience:
Cloud Data Warehousing: Snowflake (data modeling, query optimization)
Data Transformation: dbt (Data Build Tool) for ELT pipeline management
Metadata & Data Governance: Alation (data catalog, lineage, governance)

Posted 1 month ago

Apply

4.0 - 8.0 years

20 - 32 Lacs

Hyderabad, Gurugram

Work from Office

• Designing, developing & deploying cloud-based data platforms using AWS • Integrating & processing structured & unstructured data from various sources • Troubleshooting data platform issues. WhatsApp (ANUJ - 8249759636) for more details.

Posted 1 month ago

Apply

3.0 - 8.0 years

5 - 10 Lacs

Mumbai

Work from Office

This position participates in the support of batch and real-time data pipelines utilizing various data analytics processing frameworks, in support of data science practices for the Marketing and Finance business units. The position supports the integration of data from various data sources, performs extract, transform, load (ETL) data conversions, and facilitates data cleansing and enrichment. It also performs full systems life cycle management activities, such as analysis, technical requirements, design, coding, testing, and implementation of systems and application software, and contributes to synthesizing disparate data sources to support reusable and reproducible data assets.

Responsibilities
Supervises and supports data engineering projects and builds solutions by leveraging a strong foundational knowledge in software/application development
Develops and delivers data engineering documentation
Gathers requirements, defines the scope, and performs the integration of data for data engineering projects
Recommends analytic reporting products/tools and supports the adoption of emerging technology
Performs data engineering maintenance and support
Provides the implementation strategy and executes backup, recovery, and technology solutions to perform analysis
Uses ETL tool capabilities to pull data from various sources and load the transformed data into a database or business intelligence platform

Required Qualifications
Codes using programming languages used for statistical analysis and modeling, such as Python, Java, Scala, or C#
Strong understanding of database systems and data warehousing solutions
Strong understanding of the data interconnections between the organization's operational and business functions
Strong understanding of the data life cycle stages: data collection, transformation, analysis, storing data securely, and providing data accessibility
Strong understanding of the data environment and how to scale it for demands such as data throughput, increasing pipeline throughput, analyzing large amounts of data, real-time predictions, insights and customer feedback, data security, data regulations, and compliance
Strong knowledge of data structures, as well as data filtering and data optimization
Strong understanding of analytic reporting technologies and environments (e.g., Power BI, Looker, Qlik, etc.)
Strong understanding of a cloud services platform (e.g., GCP, Azure, or AWS) and all the data life cycle stages; Azure preferred
Understanding of distributed systems and the underlying business problem being addressed; guides team members on how their work will assist by performing data analysis and presenting findings to stakeholders
Bachelor's degree in MIS, mathematics, statistics, or computer science, an international equivalent, or equivalent job experience

Required Skills
3 years of experience with Databricks
Other required experience: SSIS/SSAS, Apache Spark, Python, R and SQL, SQL Server

Preferred Skills
Scala, Delta Lake, Unity Catalog, Azure Logic Apps, cloud services platforms (e.g., GCP, Azure, or AWS)

Posted 1 month ago

Apply

3.0 - 8.0 years

6 - 14 Lacs

Gurugram

Work from Office

Data Engineer: The ideal candidate will have strong expertise in Python, Apache Spark, and Databricks, along with experience in machine learning.

Posted 1 month ago

Apply

3.0 - 6.0 years

3 - 7 Lacs

Hyderabad, Bengaluru

Hybrid

Locations: Hyderabad & Bangalore. Work Mode: Hybrid. Interview Mode: Virtual (2 rounds). Type: Contract-to-Hire (C2H).

Key Skills & Responsibilities
Hands-on experience with AWS services: S3, Lambda, Glue, API Gateway, and SQS
Strong data engineering expertise on AWS, with proficiency in Python, PySpark, and SQL
Experience in batch job scheduling and managing data dependencies across pipelines
Familiarity with data processing tools such as Apache Spark and Airflow
Ability to automate repetitive tasks and build reusable frameworks for improved efficiency
Provide RunOps / DevOps support, and manage the ongoing operation and monitoring of data services
Ensure high performance, scalability, and reliability of data workflows in cloud environments

Skills: AWS, S3, Lambda, Glue, API Gateway, SQS, Apache Spark, Airflow, SQL, PySpark, Python, DevOps support

Posted 1 month ago

Apply

6.0 - 9.0 years

4 - 7 Lacs

Pune

Hybrid

Work Mode: Hybrid. Interview Mode: Virtual (2 rounds). Type: Contract-to-Hire (C2H).

Job Summary
We are looking for a skilled PySpark Developer with hands-on experience in building scalable data pipelines and processing large datasets. The ideal candidate will have deep expertise in Apache Spark, Python, and modern data engineering tools in cloud environments such as AWS.

Key Skills & Responsibilities
Strong expertise in PySpark and Apache Spark for batch and real-time data processing
Experience in designing and implementing ETL pipelines, including data ingestion, transformation, and validation
Proficiency in Python for scripting, automation, and building reusable components
Hands-on experience with scheduling tools like Airflow or Control-M to orchestrate workflows (a minimal Airflow DAG sketch follows below)
Familiarity with the AWS ecosystem, especially S3 and related file system operations
Strong understanding of Unix/Linux environments and shell scripting
Experience with Hadoop, Hive, and platforms like Cloudera or Hortonworks
Ability to handle CDC (Change Data Capture) operations on large datasets
Experience in performance tuning, optimizing Spark jobs, and troubleshooting
Strong knowledge of data modeling, data validation, and writing unit test cases
Exposure to real-time and batch integration with downstream/upstream systems
Working knowledge of Jupyter Notebook, Zeppelin, or PyCharm for development and debugging
Understanding of Agile methodologies, with experience in CI/CD tools (e.g., Jenkins, Git)

Preferred Skills
Experience in building or integrating APIs for data provisioning
Exposure to ETL or reporting tools such as Informatica, Tableau, Jasper, or QlikView
Familiarity with AI/ML model development using PySpark in cloud environments

Skills: PySpark, Apache Spark, Python, AWS S3, Airflow/Control-M, SQL, Unix/Linux, shell scripting, Hive, Hadoop, Cloudera, Hortonworks, CDC, ETL pipelines, data modeling, data validation, performance tuning, unit test cases, batch and real-time integration, API integration, CI/CD, Jenkins, Git, Jupyter Notebook, Zeppelin, PyCharm, Agile methodologies, Informatica, Tableau, Jasper, QlikView, AI/ML model development
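For the scheduling point above, here is a minimal, hypothetical Airflow DAG that submits a PySpark batch job once a day; the DAG id, script path, and spark-submit arguments are placeholders, and it assumes Airflow 2.x with the bundled BashOperator.

```python
# Minimal sketch: daily Airflow DAG that triggers a PySpark ETL job via spark-submit.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_orders_etl",            # placeholder DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="spark_submit_etl",
        # {{ ds }} passes the logical run date to the job for partitioned loads.
        bash_command="spark-submit --master yarn /opt/jobs/orders_etl.py {{ ds }}",
    )
```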

Posted 1 month ago

Apply

6.0 - 10.0 years

4 - 8 Lacs

Bengaluru

Hybrid

Role: PySpark Developer. Work Mode: Hybrid. Interview Mode: Virtual (2 rounds). Type: Contract-to-Hire (C2H).

Job Summary
We are looking for a skilled PySpark Developer with hands-on experience in building scalable data pipelines and processing large datasets. The ideal candidate will have deep expertise in Apache Spark, Python, and modern data engineering tools in cloud environments such as AWS.

Key Skills & Responsibilities
Strong expertise in PySpark and Apache Spark for batch and real-time data processing
Experience in designing and implementing ETL pipelines, including data ingestion, transformation, and validation
Proficiency in Python for scripting, automation, and building reusable components
Hands-on experience with scheduling tools like Airflow or Control-M to orchestrate workflows
Familiarity with the AWS ecosystem, especially S3 and related file system operations
Strong understanding of Unix/Linux environments and shell scripting
Experience with Hadoop, Hive, and platforms like Cloudera or Hortonworks
Ability to handle CDC (Change Data Capture) operations on large datasets
Experience in performance tuning, optimizing Spark jobs, and troubleshooting
Strong knowledge of data modeling, data validation, and writing unit test cases
Exposure to real-time and batch integration with downstream/upstream systems
Working knowledge of Jupyter Notebook, Zeppelin, or PyCharm for development and debugging
Understanding of Agile methodologies, with experience in CI/CD tools (e.g., Jenkins, Git)

Preferred Skills
Experience in building or integrating APIs for data provisioning
Exposure to ETL or reporting tools such as Informatica, Tableau, Jasper, or QlikView
Familiarity with AI/ML model development using PySpark in cloud environments

Skills: PySpark, Apache Spark, Python, AWS S3, Airflow/Control-M, SQL, Unix/Linux, shell scripting, Hive, Hadoop, Cloudera, Hortonworks, CDC, ETL pipelines, data modeling, data validation, performance tuning, unit test cases, batch and real-time integration, API integration, CI/CD, Jenkins, Git, Jupyter Notebook, Zeppelin, PyCharm, Agile methodologies, Informatica, Tableau, Jasper, QlikView, AI/ML model development

Posted 1 month ago

Apply

8.0 - 12.0 years

12 - 22 Lacs

Hyderabad, Secunderabad

Work from Office

Proficiency in SQL, Python, and data pipeline frameworks such as Apache Spark, Databricks, or Airflow. Hands-on experience with cloud data platforms (e.g., Azure Synapse, AWS Redshift, Google BigQuery). Strong understanding of data modeling, ETL/ELT, and data lake/warehouse/data mart architectures. Knowledge of Data Factory or AWS Glue. Experience in developing reports and dashboards using tools like Power BI, Tableau, or Looker.

Posted 1 month ago

Apply

7.0 - 12.0 years

0 - 0 Lacs

Kochi

Work from Office

Greetings from the TCS Recruitment Team!

Role: Databricks Lead / Databricks Solution Architect / Databricks ML Engineer
Years of experience: 7 to 18 years
Walk-in drive location: Kochi
Walk-in location details: Tata Consultancy Services, TCS Centre SEZ Unit, Infopark Kochi Phase 1, Infopark Kochi P.O, Kakkanad, Kochi - 682042, Kerala, India
Drive time: 9 am to 1:00 pm
Date: 21-Jun-25

Must have:
5+ years of experience in data engineering or related fields
At least 2-3 years of hands-on experience with Databricks (using Apache Spark, Delta Lake, etc.)
Solid experience working with big data technologies such as Hadoop, Spark, Kafka, or similar
Experience with cloud platforms (AWS, Azure, or GCP) and cloud-native data tools
Experience with machine learning frameworks and pipelines, particularly in Databricks
Experience with AI/ML model deployment, MLOps, and ML lifecycle management using Databricks and related tools (a minimal MLflow tracking sketch follows below)
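As a rough sketch of the ML lifecycle management mentioned above, the example below logs a toy model, its parameters, and a metric with MLflow tracking (the experiment-tracking API used on Databricks); the model, dataset, and values are purely illustrative.

```python
# Minimal sketch: track a toy model run with MLflow (params, metric, model artifact).
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="rf-baseline"):
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))

    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", acc)
    mlflow.sklearn.log_model(model, artifact_path="model")
```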

Posted 1 month ago

Apply

5.0 - 10.0 years

8 - 16 Lacs

Bhubaneswar, Bengaluru, Delhi / NCR

Work from Office

Project Role: Application Developer
Project Role Description: Design, build and configure applications to meet business process and application requirements.
Must-have skills: Apache Spark
Good-to-have skills: Oracle Procedural Language Extensions to SQL (PL/SQL)
Minimum 5 year(s) of experience is required
Educational Qualification: 15 years of full-time education

Summary: As an Application Developer, you will design, build, and configure applications to meet business process and application requirements. You will be responsible for ensuring that applications are developed and implemented efficiently and effectively while meeting the needs of the organization. Your typical day will involve collaborating with the team, making team decisions, engaging with multiple teams, and providing solutions to problems for your immediate team and across multiple teams. You will also contribute to key decisions and provide expertise in application development.

Roles & Responsibilities:
- Expected to be an SME
- Collaborate with and manage the team to perform
- Responsible for team decisions
- Engage with multiple teams and contribute to key decisions
- Provide solutions to problems for the immediate team and across multiple teams
- Design, build, and configure applications to meet business process and application requirements
- Ensure that applications are developed and implemented efficiently and effectively
- Contribute expertise in application development

Professional & Technical Skills:
- Must-have skills: proficiency in Apache Spark
- Good-to-have skills: experience with Oracle Procedural Language Extensions to SQL (PL/SQL), Google BigQuery
- Strong understanding of statistical analysis and machine learning algorithms
- Experience with data visualization tools such as Tableau or Power BI
- Hands-on experience implementing various machine learning algorithms such as linear regression, logistic regression, decision trees, and clustering algorithms
- Solid grasp of data munging techniques, including data cleaning, transformation, and normalization to ensure data quality and integrity

Additional Information:
- The candidate should have a minimum of 5 years of experience in Apache Spark
- This position is based at our Gurugram office
- 15 years of full-time education is required

Posted 1 month ago

Apply

12.0 - 22.0 years

30 - 45 Lacs

Chennai

Work from Office

Project Role Description: Design, build and configure applications to meet business process and application requirements.
Must-have skills: Apache Spark
Good-to-have skills: Google BigQuery, PySpark

Professional & Technical Skills:
- Must-have skills: proficiency in Apache Spark, PySpark, Google BigQuery
- Strong understanding of statistical analysis and machine learning algorithms
- Experience with data visualization tools such as Tableau or Power BI
- Hands-on experience implementing various machine learning algorithms such as linear regression, logistic regression, decision trees, and clustering algorithms
- Solid grasp of data munging techniques, including data cleaning, transformation, and normalization to ensure data quality and integrity

Additional Information:
- The candidate should have a minimum of 12 years of experience in Apache Spark

Posted 1 month ago

Apply

8.0 - 13.0 years

25 - 40 Lacs

Bengaluru

Work from Office

Must-Have Skills:
Azure Databricks / PySpark (hands-on)
SQL/PL-SQL (advanced level)
Snowflake: 2+ years
Spark / data pipeline development: 2+ years
Azure Repos / GitHub, Azure DevOps
Unix shell scripting
Cloud technology experience

Key Responsibilities:
1. Design, build, and manage data pipelines using Azure Databricks, PySpark, and Snowflake (a minimal PySpark-to-Snowflake sketch follows below).
2. Analyze and resolve production issues (Tier 2 support with weekend/on-call rotation).
3. Write and optimize complex SQL/PL-SQL queries.
4. Collaborate on low-level and high-level design for data solutions.
5. Document all project deliverables and support deployment.

Good to Have: Knowledge of Oracle, Qlik Replicate, GoldenGate, Hadoop; job scheduler tools like Control-M or Airflow
Behavioral: Strong problem-solving and communication skills
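For responsibility 1 above, here is a minimal sketch of reading a Snowflake table from PySpark (for example on Azure Databricks) using the Spark Snowflake connector; the connection options and table name are placeholders, and in practice credentials would come from a secret scope rather than literals.

```python
# Minimal sketch: read a Snowflake table into PySpark and query it with Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake-read-sketch").getOrCreate()

sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",   # placeholder account URL
    "sfUser": "etl_user",                          # placeholder credentials
    "sfPassword": "<from-secret-scope>",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
}

orders = (
    spark.read.format("snowflake")
    .options(**sf_options)
    .option("dbtable", "ORDERS")                   # placeholder table
    .load()
)

orders.createOrReplaceTempView("orders")
spark.sql("SELECT order_date, SUM(amount) AS total FROM orders GROUP BY order_date").show()
```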

Posted 1 month ago

Apply

3.0 - 5.0 years

15 - 17 Lacs

Pune

Work from Office

Performance Testing Specialist - Databricks Pipelines

Key Responsibilities:
- Design and execute performance testing strategies specifically for Databricks-based data pipelines.
- Identify performance bottlenecks and provide optimization recommendations across Spark/Databricks workloads.
- Collaborate with development and DevOps teams to integrate performance testing into CI/CD pipelines.
- Analyze job execution metrics, cluster utilization, memory/storage usage, and latency across various stages of data pipeline processing (a minimal instrumentation sketch follows below).
- Create and maintain performance test scripts, frameworks, and dashboards using tools like JMeter, Locust, or custom Python utilities.
- Generate detailed performance reports and suggest tuning at the code, configuration, and platform levels.
- Conduct root cause analysis for slow-running ETL/ELT jobs and recommend remediation steps.
- Participate in production issue resolution related to performance and contribute to RCA documentation.

Technical Skills (Mandatory):
- Strong understanding of Databricks, Apache Spark, and performance tuning techniques for distributed data processing systems.
- Hands-on experience in Spark (PySpark/Scala) performance profiling, partitioning strategies, and job parallelization.
- 2+ years of experience in performance testing and load simulation of data pipelines.
- Solid skills in SQL and Snowflake, and in analyzing performance via query plans and optimization hints.
- Familiarity with Azure Databricks, Azure Monitor, Log Analytics, or similar observability tools.
- Proficient in scripting (Python/shell) for test automation and pipeline instrumentation.
- Experience with DevOps tools such as Azure DevOps, GitHub Actions, or Jenkins for automated testing.
- Comfortable working in Unix/Linux environments and writing shell scripts for monitoring and debugging.

Good to Have:
- Experience with job schedulers like Control-M, Autosys, or Azure Data Factory trigger flows.
- Exposure to CI/CD integration for automated performance validation.
- Understanding of network/storage I/O tuning parameters in cloud-based environments.
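As a rough illustration of the kind of pipeline instrumentation described above, the sketch below times a Spark aggregation, reports partition counts, and prints the physical plan for tuning review; the table name, repartition target, and output path are placeholders.

```python
# Minimal sketch: instrument a Spark stage for performance testing on Databricks.
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("perf-probe-sketch").getOrCreate()

df = spark.read.table("curated.orders")              # placeholder input table
print("input partitions:", df.rdd.getNumPartitions())

start = time.perf_counter()
result = (
    df.repartition(64, "customer_id")                # candidate partitioning strategy
      .groupBy("customer_id")
      .count()
)
result.write.mode("overwrite").parquet("/tmp/perf_probe_out")  # forces execution
print(f"elapsed: {time.perf_counter() - start:.1f}s")

result.explain(mode="formatted")                     # physical plan for tuning review
```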

Posted 1 month ago

Apply

4.0 - 9.0 years

25 - 32 Lacs

Ahmedabad

Remote

Key Responsibilities:
Design and implement robust, scalable search architectures using Solr and Elasticsearch.
Write, optimize, and maintain complex search queries (including full-text, faceted, fuzzy, geospatial, and nested queries) using the Solr Query Parser and Elasticsearch DSL (a minimal Elasticsearch query sketch follows below).
Work with business stakeholders to understand search requirements and translate them into performant and accurate queries.
Build and manage custom analyzers, tokenizers, filters, and index mappings/schemas tailored to domain-specific search needs.
Develop and optimize indexing pipelines using Apache Spark for processing large-scale structured and unstructured datasets.
Perform query tuning and search relevance optimization based on precision, recall, and user engagement metrics.
Create and maintain query templates and search APIs for integration with enterprise applications.
Monitor, troubleshoot, and improve search performance and infrastructure reliability.
Conduct evaluations and benchmarking of search quality, query latency, and index refresh times.

Required Skills and Qualifications:
4 to 5 years of hands-on experience with Apache Solr and/or Elasticsearch in production environments.
Proven ability to write and optimize complex Solr queries (standard, dismax, edismax parsers) and Elasticsearch Query DSL, including: full-text search with analyzers; faceted and filtered search; Boolean and range queries; aggregations and suggesters; nested and parent/child queries.
Strong understanding of indexing principles, Lucene internals, and relevance scoring mechanisms (BM25, TF-IDF).
Proficiency with Apache Spark for custom indexing workflows and large-scale data processing.
Experience with document parsing and extraction (JSON, XML, PDFs, etc.) for search indexing.
Experience integrating search into web applications or enterprise software platforms.
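As an illustration of the query work described above, here is a minimal Elasticsearch Query DSL sketch (full-text match with fuzziness, a range filter, and a facet-style terms aggregation) using the official Python client, assumed to be the 8.x series; the index, fields, and endpoint are hypothetical.

```python
# Minimal sketch: fuzzy full-text search with a price filter and a brand facet.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # placeholder endpoint

resp = es.search(
    index="products",                          # placeholder index
    query={
        "bool": {
            "must": {
                "match": {"title": {"query": "wireless headphones", "fuzziness": "AUTO"}}
            },
            "filter": [{"range": {"price": {"lte": 200}}}],
        }
    },
    aggs={"by_brand": {"terms": {"field": "brand.keyword", "size": 10}}},
    size=20,
)

for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```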

Posted 1 month ago

Apply

9.0 - 12.0 years

7 - 12 Lacs

Hyderabad

Work from Office

Role Description: We are looking for a highly motivated, expert Senior Data Engineer who can own the design and development of complex data pipelines, solutions, and frameworks. The ideal candidate will design, develop, and optimize data pipelines, data integration frameworks, and metadata-driven architectures that enable seamless data access and analytics. The role requires deep expertise in big data processing, distributed computing, data modeling, and governance frameworks to support self-service analytics, AI-driven insights, and enterprise-wide data management.

Roles & Responsibilities:
Design, develop, and maintain scalable ETL/ELT pipelines to support structured, semi-structured, and unstructured data processing across the Enterprise Data Fabric.
Implement real-time and batch data processing solutions, integrating data from multiple sources into a unified, governed data fabric architecture.
Optimize big data processing frameworks using Apache Spark, Hadoop, or similar distributed computing technologies to ensure high availability and cost efficiency.
Work with metadata management and data lineage tracking tools to enable enterprise-wide data discovery and governance.
Ensure data security, compliance, and role-based access control (RBAC) across data environments.
Optimize query performance, indexing strategies, partitioning, and caching for large-scale data sets.
Develop CI/CD pipelines for automated data pipeline deployments, version control, and monitoring.
Implement data virtualization techniques to provide seamless access to data across multiple storage systems.
Collaborate with cross-functional teams, including data architects, business analysts, and DevOps teams, to align data engineering strategies with enterprise goals.
Stay up to date with emerging data technologies and best practices, ensuring continuous improvement of Enterprise Data Fabric architectures.

Must-Have Skills:
Hands-on experience with data engineering technologies such as Databricks, PySpark, Spark SQL, Apache Spark, AWS, Python, SQL, and Scaled Agile methodologies.
Proficiency in workflow orchestration and performance tuning of big data processing.
Strong understanding of AWS services.
Experience with Data Fabric, Data Mesh, or similar enterprise-wide data architectures.
Ability to quickly learn, adapt, and apply new technologies.
Strong problem-solving and analytical skills.
Excellent communication and teamwork skills.
Experience with the Scaled Agile Framework (SAFe), Agile delivery practices, and DevOps practices.

Good-to-Have Skills:
Deep expertise in the biotech and pharma industries.
Experience writing APIs to make data available to consumers.
Experience with SQL/NoSQL databases and vector databases for large language models.
Experience with data modeling and performance tuning for both OLAP and OLTP databases.
Experience with software engineering best practices, including but not limited to version control (Git, Subversion, etc.), CI/CD (Jenkins, Maven, etc.), automated unit testing, and DevOps.

Education and Professional Certifications:
9 to 12 years of Computer Science, IT, or related field experience.
AWS Certified Data Engineer preferred.
Databricks certification preferred.
Scaled Agile SAFe certification preferred.

Soft Skills:
Excellent analytical and troubleshooting skills.
Strong verbal and written communication skills.
Ability to work effectively with global, virtual teams.
High degree of initiative and self-motivation.
Ability to manage multiple priorities successfully.
Team-oriented, with a focus on achieving team goals.
Ability to learn quickly; organized and detail oriented.
Strong presentation and public speaking skills.

Posted 1 month ago

Apply

Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot


Apply to 20+ Portals in one click

Download Now

Download the Mobile App

Instantly access job listings, apply easily, and track applications.
