1.0 - 3.0 years
3 - 5 Lacs
New Delhi, Chennai, Bengaluru
Hybrid
Your day at NTT DATA
We are seeking an experienced Data Engineer to join our team in delivering cutting-edge Generative AI (GenAI) solutions to clients. The successful candidate will be responsible for designing, developing, and deploying data pipelines and architectures that support the training, fine-tuning, and deployment of LLMs for various industries. This role requires strong technical expertise in data engineering, strong problem-solving skills, and the ability to work effectively with clients and internal teams.

What you'll be doing
Key Responsibilities:
- Data Pipelines and Architecture: Design, develop, and manage data pipelines and architectures that support GenAI model training, fine-tuning, and deployment.
- Data Ingestion and Integration: Develop data ingestion frameworks to collect data from various sources, then transform and integrate it into a unified data platform for GenAI model training and deployment.
- GenAI Model Integration: Collaborate with data scientists to integrate GenAI models into production-ready applications, ensuring seamless model deployment, monitoring, and maintenance.
- Cloud Infrastructure Management: Design, implement, and manage cloud-based data infrastructure (e.g., AWS, GCP, Azure) to support large-scale GenAI workloads, ensuring cost-effectiveness, security, and compliance.
- Engineering Practices: Write scalable, readable, and maintainable code using object-oriented programming concepts in languages such as Python, and use libraries like Hugging Face Transformers, PyTorch, or TensorFlow.
- Performance Optimization: Optimize data pipelines, GenAI model performance, and infrastructure for scalability, efficiency, and cost-effectiveness.
- Data Security and Compliance: Ensure data security, privacy, and compliance with regulatory requirements (e.g., GDPR, HIPAA) across data pipelines and GenAI applications.
- Client Collaboration: Work with clients to understand their GenAI needs, design solutions, and deliver high-quality data engineering services.
- Innovation and R&D: Stay up to date with the latest GenAI trends, technologies, and innovations, applying research and development skills to improve data engineering services.
- Knowledge Sharing: Share knowledge, best practices, and expertise with team members, contributing to the growth and development of the team.

Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field (Master's recommended).
- Experience with vector databases (e.g., Pinecone, Weaviate, Faiss, Annoy) for efficient similarity search and storage of dense vectors in GenAI applications (see the sketch after this listing).
- 5+ years of experience in data engineering, with a strong emphasis on cloud environments (AWS, GCP, Azure, or cloud-native platforms).
- Proficiency in SQL, Python, and PySpark.
- Strong data architecture, data modeling, and data governance skills.
- Experience with big data platforms (Hadoop, Databricks, Hive, Kafka, Apache Iceberg), data warehouses (Teradata, Snowflake, BigQuery), and lakehouses (Delta Lake, Apache Hudi).
- Knowledge of DevOps practices, including Git workflows and CI/CD pipelines (Azure DevOps, Jenkins, GitHub Actions).
- Experience with GenAI frameworks and tools (e.g., TensorFlow, PyTorch, Keras).

Nice to have:
- Experience with containerization and orchestration tools such as Docker and Kubernetes.
- Experience integrating vector databases and implementing similarity search techniques, with a focus on GraphRAG, is a plus.
- Familiarity with API gateway and service mesh architectures.
- Experience with low-latency/streaming, batch, and micro-batch processing.
- Familiarity with Linux-based operating systems and REST APIs.
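The vector-database requirement above centers on dense-vector similarity search. As a rough illustration only (the dimensionality, random vectors, and index choice below are placeholder assumptions, not part of the role description), a minimal Faiss workflow might look like this:

```python
# Minimal similarity-search sketch with Faiss (illustrative only).
# Assumes `pip install faiss-cpu numpy`; the random embeddings stand in for
# vectors produced by an embedding model (e.g., a Hugging Face model).
import numpy as np
import faiss

dim = 384                                                 # embedding dimensionality (placeholder)
rng = np.random.default_rng(0)
doc_vectors = rng.random((1000, dim), dtype="float32")    # corpus embeddings
query_vector = rng.random((1, dim), dtype="float32")      # query embedding

index = faiss.IndexFlatL2(dim)   # exact L2 search; approximate indexes scale better for large corpora
index.add(doc_vectors)           # store dense vectors in the index

distances, ids = index.search(query_vector, 5)  # top-5 nearest documents
print(ids[0], distances[0])
```

In a production GenAI pipeline the same pattern applies, but with real document embeddings and a managed or persistent vector store such as Pinecone or Weaviate.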
Posted 1 week ago
8.0 - 12.0 years
0 Lacs
Mumbai, Maharashtra, India
On-site
Introduction
A career in IBM Consulting is rooted in long-term relationships and close collaboration with clients across the globe. You'll work with visionaries across multiple industries to improve the hybrid cloud and AI journey for the most innovative and valuable companies in the world. Your ability to accelerate impact and make meaningful change for your clients is enabled by our strategic partner ecosystem and our robust technology platforms across the IBM portfolio, including Software and Red Hat. Curiosity and a constant quest for knowledge serve as the foundation of success in IBM Consulting. In your role, you'll be encouraged to challenge the norm, investigate ideas outside of your role, and come up with creative solutions resulting in groundbreaking impact for a wide network of clients. Our culture of evolution and empathy centers on long-term career growth and development opportunities in an environment that embraces your unique skills and experience.

Your role and responsibilities
Role Overview: We are looking for an experienced Denodo SME to design, implement, and optimize data virtualization solutions using Denodo as the enterprise semantic and access layer over a Cloudera-based data lakehouse. The ideal candidate will lead the integration of structured and semi-structured data across systems, enabling unified access for analytics, BI, and operational use cases.

Key Responsibilities:
- Design and deploy the Denodo Platform for data virtualization over Cloudera, RDBMS, APIs, and external data sources.
- Define logical data models, derived views, and metadata mappings across layers (integration, business, presentation).
- Connect to Cloudera Hive, Impala, Apache Iceberg, Oracle, and other on-prem/cloud sources.
- Publish REST/SOAP APIs and JDBC/ODBC endpoints for downstream analytics and applications (see the sketch after this listing).
- Tune virtual views, caching strategies, and federation techniques to meet performance SLAs for high-volume data access.
- Implement Denodo smart query acceleration, usage monitoring, and access governance.
- Configure role-based access control (RBAC) and row/column-level security, and integrate with enterprise identity providers (LDAP, Kerberos, SSO).
- Work with data governance teams to align Denodo with enterprise metadata catalogs (e.g., Apache Atlas, Talend).

Required education: Bachelor's Degree
Preferred education: Master's Degree

Required technical and professional expertise:
- 8-12 years in data engineering, with 4+ years of hands-on experience on the Denodo Platform.
- Strong experience integrating RDBMS (Oracle, SQL Server), Cloudera CDP (Hive, Iceberg), and REST/SOAP APIs.
- Denodo Admin Tool, VQL, Scheduler, Data Catalog.
- SQL, shell scripting, basic Python (preferred).
- Deep understanding of query optimization, caching, memory management, and federation principles.
- Experience implementing data security, masking, and user access control in Denodo.
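Denodo publishes virtual views through standard JDBC/ODBC endpoints, so downstream consumers see them as ordinary SQL. Purely as an illustrative sketch (the DSN name, credentials, and view name below are hypothetical, and driver setup varies by Denodo version), a Python client querying a published view might look like this:

```python
# Illustrative only: querying a Denodo virtual view over ODBC.
# Assumes an ODBC DSN named "denodo_vdp" has already been configured against
# the Denodo server, and that a business view "bv_customer_orders" exists;
# both names are placeholders, not taken from the job description.
import pyodbc

conn = pyodbc.connect("DSN=denodo_vdp;UID=report_user;PWD=***")
cursor = conn.cursor()

# Federation, caching, and source push-down happen server-side in Denodo;
# consumers simply issue SQL against the published view.
cursor.execute(
    "SELECT customer_id, total_amount FROM bv_customer_orders WHERE order_date >= ?",
    "2024-01-01",
)
for row in cursor.fetchmany(10):
    print(row.customer_id, row.total_amount)

conn.close()
```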
Posted 2 weeks ago
4.0 - 9.0 years
10 - 20 Lacs
Hyderabad, Chennai, Bengaluru
Work from Office
JD:
• Good experience in Apache Iceberg, Apache Spark, and Trino (a short PySpark sketch follows this listing)
• Proficiency in SQL and data modeling
• Experience with an open data lakehouse using Apache Iceberg
• Experience with data lakehouse architecture built on Apache Iceberg and Trino
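As a rough sketch of the Iceberg-on-Spark skills this listing asks for (the catalog name, warehouse path, and table below are assumptions for illustration, and the Iceberg Spark runtime jar is assumed to be on the classpath), a PySpark session with an Iceberg catalog might be configured like this:

```python
# Illustrative PySpark + Apache Iceberg sketch; names and paths are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-demo")
    # Register an Iceberg catalog called "demo" backed by a filesystem warehouse path.
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "s3a://my-bucket/warehouse")  # placeholder path
    .getOrCreate()
)

# Create an Iceberg table and query it with plain SQL; Trino can read the same
# table through its own Iceberg connector, which is what makes the lakehouse open.
spark.sql("CREATE TABLE IF NOT EXISTS demo.db.events (id BIGINT, ts TIMESTAMP) USING iceberg")
spark.sql("INSERT INTO demo.db.events VALUES (1, current_timestamp())")
spark.sql("SELECT count(*) FROM demo.db.events").show()
```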
Posted 3 weeks ago
7.0 - 12.0 years
10 - 20 Lacs
Hyderabad
Remote
Job Title: Senior Data Engineer
Location: Remote
Job Type: Full-time
Experience Level: 7+ years

About the Role:
We are seeking a highly skilled Senior Data Engineer to join our team in building a modern data platform on AWS. You will play a key role in transitioning from legacy systems to a scalable, cloud-native architecture using technologies like Apache Iceberg, AWS Glue, Redshift, and Atlan for governance. This role requires hands-on experience across both legacy (e.g., Siebel, Talend, Informatica) and modern data stacks.

Responsibilities:
- Design, develop, and optimize data pipelines and ETL/ELT workflows on AWS.
- Migrate legacy data solutions (Siebel, Talend, Informatica) to modern AWS-native services.
- Implement and manage a data lake architecture using Apache Iceberg and AWS Glue (see the sketch after this listing).
- Work with Redshift for data warehousing, including performance tuning and modeling.
- Apply data quality and observability practices using Soda or similar tools.
- Ensure data governance and metadata management using Atlan (or other tools such as Collibra or Alation).
- Collaborate with data architects, analysts, and business stakeholders to deliver robust data solutions.
- Build scalable, secure, and high-performing data platforms supporting both batch and real-time use cases.
- Participate in defining and enforcing data engineering best practices.

Required Qualifications:
- 7+ years of experience in data engineering and data pipeline development.
- Strong expertise with AWS services, especially Redshift, Glue, S3, and Athena.
- Proven experience with Apache Iceberg or similar open table formats (such as Delta Lake or Hudi).
- Experience with legacy tools such as Siebel, Talend, and Informatica.
- Knowledge of data governance tools such as Atlan, Collibra, or Alation.
- Experience implementing data quality checks using Soda or an equivalent tool.
- Strong SQL and Python skills; familiarity with Spark is a plus.
- Solid understanding of data modeling, data warehousing, and big data architectures.
- Strong problem-solving skills and the ability to work in an Agile environment.
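The Iceberg-plus-Glue responsibility above typically amounts to registering an Iceberg catalog backed by the AWS Glue Data Catalog and S3. The following minimal PySpark sketch shows one plausible configuration; the bucket, database, and table names are placeholders, and it assumes the Iceberg Spark and AWS runtime jars are available to the job (in AWS Glue this is commonly enabled via the `--datalake-formats iceberg` job argument):

```python
# Illustrative sketch of an Iceberg table managed through the AWS Glue Data Catalog.
# Assumes the Iceberg Spark runtime and AWS bundle are on the classpath; all names
# below are placeholders rather than details from the job posting.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("glue-iceberg-demo")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-data-lake/warehouse")  # placeholder bucket
    .getOrCreate()
)

# Tables created this way show up in the Glue Data Catalog and are queryable from Athena.
spark.sql("""
    CREATE TABLE IF NOT EXISTS glue_catalog.analytics.orders (
        order_id BIGINT, customer_id BIGINT, order_ts TIMESTAMP
    ) USING iceberg
    PARTITIONED BY (days(order_ts))
""")
```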
Posted 3 weeks ago
4.0 - 7.0 years
10 - 14 Lacs
Noida
Work from Office
Location: Noida (in-office/hybrid; client site if required)
Type: Full-Time | Immediate Joiners Preferred

Must-Have Skills:
- GCP (BigQuery, Dataflow, Dataproc, Cloud Storage)
- PySpark / Spark
- Distributed computing expertise
- Apache Iceberg (preferred), Hudi, or Delta Lake

Role Overview:
Be part of a high-impact Data Engineering team focused on building scalable, cloud-native data pipelines. You'll support and enhance EMR platforms using DevOps principles, helping deliver real-time health alerts and diagnostics for platform performance.

Key Responsibilities:
- Provide data engineering support to EMR platforms
- Design and implement cloud-native, automated data solutions
- Collaborate with internal teams to deliver scalable systems
- Continuously improve infrastructure reliability and observability

Technical Environment:
- Databases: Oracle, MySQL, MSSQL, MongoDB
- Distributed Engines: Spark/PySpark, Presto, Flink/Beam
- Cloud Infra: GCP (preferred), AWS (nice-to-have), Terraform
- Big Data Formats: Iceberg, Hudi, Delta
- Tools: SQL, Data Modeling, Palantir Foundry, Jenkins, Confluence
- Bonus: Stats/math tools (NumPy, PyMC3), Linux scripting

Ideal for engineers with cloud-native, real-time data platform experience, especially those who have worked with EMR and modern lakehouse stacks (a brief PySpark sketch follows this listing).
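Purely as an illustrative sketch of the GCP-plus-PySpark combination listed above (the project, dataset, table, and bucket names are invented, and it assumes the spark-bigquery connector is available, as it is on standard Dataproc images):

```python
# Illustrative only: PySpark on Dataproc reading from BigQuery and landing results on GCS.
# All project/dataset/table and bucket names below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bq-to-gcs-demo").getOrCreate()

events = (
    spark.read.format("bigquery")
    .option("table", "my-project.analytics.raw_events")   # placeholder BigQuery table
    .load()
)

daily = (
    events.withColumn("event_date", F.to_date("event_ts"))
          .groupBy("event_date")
          .count()
)

# Land the aggregate on Cloud Storage; an Iceberg/Hudi/Delta writer could be used
# instead once the corresponding catalog is configured on the cluster.
daily.write.mode("overwrite").parquet("gs://my-bucket/curated/daily_event_counts")
```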
Posted 3 weeks ago
6 - 10 years
18 - 30 Lacs
Bengaluru
Remote
Design, develop, and optimize data pipelines and ETL/ELT workflows on AWS. Experience required in data engineering and data pipeline development; AWS services, especially Redshift, Glue, S3, and Athena; and Apache Iceberg or similar open table formats (such as Delta Lake or Hudi).
Required Candidate Profile: Legacy tools such as Siebel, Talend, and Informatica; data governance tools such as Atlan, Collibra, or Alation; data quality checks using Soda or equivalent (see the sketch after this listing); strong SQL and Python skills; Spark.
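For the Soda data quality requirement, a minimal sketch using the Soda Core Python API might look like the following; the data source name, configuration file, table, and checks are assumptions for illustration, not details from the listing:

```python
# Illustrative Soda Core scan; assumes `pip install soda-core-postgres` (or the package
# matching your warehouse) and that a data source "analytics_dw" is defined in
# configuration.yml. All names here are placeholders.
from soda.scan import Scan

scan = Scan()
scan.set_data_source_name("analytics_dw")               # placeholder data source
scan.add_configuration_yaml_file("configuration.yml")   # connection details live here

# SodaCL checks expressed inline; table and column names are placeholders.
scan.add_sodacl_yaml_str("""
checks for dim_customer:
  - row_count > 0
  - missing_count(customer_id) = 0
  - duplicate_count(customer_id) = 0
""")

exit_code = scan.execute()
print(scan.get_logs_text())
scan.assert_no_checks_fail()   # raise if any check failed
```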
Posted 2 months ago
8 - 13 years
20 - 35 Lacs
Gurgaon
Work from Office
Responsibilities:
- Technical Mastery: Demonstrate expertise in Core Java, algorithms, and multi-threading, with a robust understanding of JVM internals and hands-on experience in performance engineering.
- Data Platform Expertise: Join our dynamic data platform team, contributing your insights and skills to enhance our systems. Experience with at least one of the following table formats is required: Apache Hudi, Delta, or Apache Iceberg. Ideal candidates will have contributed to the architecture of data platforms.
- Open Source Contributions: While not mandatory, contributions to open-source projects like Apache Spark or Apache Trino/Presto are highly valued.
- Mentorship: Provide mentorship and guidance to junior engineers, sharing your knowledge and expertise to foster their growth and development.
- Maintaining Technical Excellence: Uphold high standards of technical proficiency and code quality in all aspects of your work.
- Metadata Services Layer: Take ownership of the critical metadata services layer, essential for our query engine's seamless operation (see the sketch after this listing).
- Table Format Support and Performance: Lead efforts in optimizing and supporting table formats such as Apache Hudi, Delta, and Apache Iceberg.
- Query Engine Optimisation: Dive into the internals of our query engine, focusing on performance enhancements like liquid clustering to deliver exceptional user experiences.
- Continuous Improvement: Engage in ongoing refinement and enhancement of the query engine's internals to ensure top-tier performance.

Requirements:
- Strong Work Ethic: We seek individuals with an unparalleled dedication to their craft and a drive to deliver excellence in all endeavors.
- Customer-Centric Approach: If you derive satisfaction from seeing your work positively impact end users, you'll thrive in our environment.
- Accountability: Take ownership of your responsibilities and demonstrate a high level of accountability for your work.
- Results-Oriented: Display a proactive attitude towards achieving goals and a willingness to go the extra mile to meet customer needs.
- Mentorship Skills: Demonstrate a passion for nurturing talent and supporting the professional growth of junior engineers within the team.

Join us in revolutionizing the way data is managed and processed.
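Although this role is Java- and JVM-centric, the metadata services layer it describes is essentially what Iceberg-style table formats expose through their snapshot and manifest metadata. As a small hedged illustration (the catalog and table names are placeholders, and it assumes an Iceberg catalog named "demo" is already configured, as in the earlier sketches), that metadata can be inspected like this:

```python
# Illustrative only: inspecting Apache Iceberg table metadata with Spark SQL.
# Assumes a Spark session with an Iceberg catalog named "demo" is already configured;
# catalog, table, and snapshot id below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-metadata-demo").getOrCreate()

# Every Iceberg table carries queryable metadata tables: snapshots, history, files, manifests.
spark.sql("""
    SELECT snapshot_id, committed_at, operation
    FROM demo.db.events.snapshots
    ORDER BY committed_at DESC
""").show(truncate=False)

# Time travel to an earlier snapshot via a read option; query engines use the same
# snapshot and manifest metadata to prune data files before scanning.
old_snapshot_id = 1234567890   # placeholder snapshot id
spark.read.option("snapshot-id", old_snapshot_id).table("demo.db.events").show(5)
```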
Posted 2 months ago