Posted: 6 days ago
On-site
Full Time
Key Responsibilities
Ensure platform uptime and application health as per SLOs/KPIs
Monitor infrastructure and applications using ELK, Prometheus, Zabbix, etc.
Debug and resolve complex production issues, performing root cause analysis
Automate routine tasks and implement self-healing systems
Design and maintain dashboards, alerts, and operational playbooks
Participate in incident management, problem resolution, and RCA documentation
Own and update SOPs for repeatable processes
Collaborate with L3 and Product teams for deeper issue resolution
Support and guide L1 operations team
Conduct periodic system maintenance and performance tuning
Respond to user data requests and ensure timely resolution
Address and mitigate security vulnerabilities and compliance issues
Technical Skillset
Hands-on with Spark, Hive, Cloudera Hadoop, Kafka, Ranger
Strong Linux fundamentals and scripting (Python, Shell)
Experience with Apache NiFi, Airflow, YARN, and ZooKeeper
Proficient in monitoring and observability tools: ELK Stack, Prometheus, Loki
Working knowledge of Kubernetes, Docker, Jenkins CI/CD pipelines
Strong SQL skills (Oracle/Exadata preferred)
Familiarity with DataHub, DataMesh, and security best practices is a plus
Strong problem-solving and debugging mindset
Ability to work under pressure in a fast-paced environment
Excellent communication and collaboration skills
Ownership, customer orientation, and a bias for action
Kezan Consulting