Posted: 1 day ago
On-site | Full Time
Own and manage the enterprise Data Lake infrastructure on AWS and Databricks, ensuring reliability, scalability, and governance.
Design, develop, and optimize data ingestion and transformation pipelines from MySQL to Kafka (CDC pipelines) and from Kafka to Databricks using Spark Structured Streaming (a minimal sketch follows this list).
Design and implement MapReduce-style jobs to process and transform large-scale datasets efficiently across distributed systems, optimizing these workflows for performance, fault tolerance, and scalability in big data environments (a sketch of the map/reduce pattern also follows this list).
Build robust batch and near real-time data pipelines capable of handling high-volume, high-velocity data efficiently.
Develop and maintain metadata-driven data processing frameworks, ensuring consistency, lineage, and traceability.
Implement and maintain strong observability and monitoring (logging, metrics, alerting) using Prometheus, Grafana, or equivalent tools (see the observability sketch after this list).
Work closely with Product, Regulatory, and Security teams to ensure compliance, privacy, and data quality across the data lifecycle.
Collaborate with cross-functional teams to build end-to-end data lakehouse solutions integrating multiple systems and data sources.
Apply best practices in code quality, CI/CD automation (Jenkins, GitHub Actions), and infrastructure as code (IaC) for deployment consistency.
Ensure system reliability and scalability through proactive monitoring, performance tuning, and fault-tolerant design.
Stay up to date with the latest technologies in data engineering, streaming, and distributed systems, and drive continuous improvements.
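For context, here is a minimal PySpark sketch of the Kafka-to-Databricks leg of such a CDC pipeline. It assumes Debezium-style change events; the broker address, topic, event schema, table name, and checkpoint path are illustrative placeholders, not details from this role:

```python
# Minimal sketch: consume MySQL CDC events from Kafka and land them in a
# bronze Delta table. On Databricks, the Kafka source and Delta format are
# available out of the box. All names and paths below are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, LongType, StringType

spark = SparkSession.builder.appName("mysql-cdc-ingest").getOrCreate()

# Assumed shape of a Debezium-style change event for an `orders` table.
event_schema = StructType([
    StructField("id", LongType()),
    StructField("status", StringType()),
    StructField("op", StringType()),        # c = create, u = update, d = delete
    StructField("updated_at", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "mysql.shop.orders")          # placeholder topic
       .option("startingOffsets", "latest")
       .load())

events = (raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

# Append raw change events; dedup/merge into silver tables happens downstream.
query = (events.writeStream
         .format("delta")
         .option("checkpointLocation", "/mnt/checkpoints/orders_cdc")  # placeholder
         .outputMode("append")
         .toTable("bronze.orders_cdc"))

query.awaitTermination()
```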
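Similarly, a minimal sketch of the map/reduce pattern, expressed here on Spark's RDD API in Python since the stack above centers on Spark; classic Hadoop MapReduce follows the same two phases. The input path and record layout are assumptions for illustration:

```python
# MapReduce-style aggregation: the map phase emits (key, value) pairs,
# the reduce phase combines values per key across the cluster.
from pyspark import SparkContext

sc = SparkContext(appName="mapreduce-style-aggregation")

# Assumed input: text lines like "2024-01-15,orders,42" (date, table, row_count).
lines = sc.textFile("s3://example-bucket/ingest-logs/*.csv")  # placeholder path

pairs = (lines.map(lambda line: line.split(","))
              .map(lambda f: (f[1], int(f[2]))))   # map: emit (table, row_count)

totals = pairs.reduceByKey(lambda a, b: a + b)     # reduce: sum counts per table

for table, count in totals.collect():
    print(table, count)

sc.stop()
```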
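And a minimal observability sketch using the Python prometheus_client library, exposing metrics for Prometheus to scrape and Grafana to chart. The metric names and the batch function are hypothetical:

```python
# Expose pipeline health metrics on an HTTP /metrics endpoint.
import time
from prometheus_client import Counter, Gauge, start_http_server

records_processed = Counter(
    "pipeline_records_processed_total",
    "Records processed by the ingestion pipeline",
)
batch_lag_seconds = Gauge(
    "pipeline_batch_lag_seconds",
    "Seconds between event time and processing time for the latest batch",
)

def process_batch():
    """Placeholder for real batch logic; returns (record_count, lag_seconds)."""
    return 1000, 4.2

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes this endpoint
    while True:
        count, lag = process_batch()
        records_processed.inc(count)
        batch_lag_seconds.set(lag)
        time.sleep(30)
```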
Strong programming expertise in one or more of Scala, Java, or Python, with hands-on Apache Spark development.
6-9 years of relevant experience.
Proven experience working with Kafka (Confluent or Apache) for building event-driven or CDC-based pipelines.
Hands-on experience with distributed data processing frameworks (Apache Spark, Databricks, or Flink) for large-scale data handling.
Solid understanding of Kubernetes for deploying and managing scalable, resilient data workloads (EKS experience preferred).
Practical experience with AWS Cloud Services such as S3, Lambda, EMR, Glue, IAM, and CloudWatch.
Experience designing and managing data lakehouse architectures using Databricks or similar platforms.
Familiarity with metadata-driven frameworks and principles of data governance, lineage, and cataloging.
Experience with CI/CD pipelines (Jenkins, GitHub Actions) for data workflow deployment and automation.
Experience with monitoring and alerting frameworks such as Prometheus, Grafana, or the ELK stack.
Strong problem-solving, communication, and collaboration skills.
Freshworks