Posted:3 weeks ago|
Platform:
On-site
Full Time
Key Responsibilities Ensure platform uptime and application health as per SLOs/KPIs Monitor infrastructure and applications using ELK, Prometheus, Zabbix, etc. Debug and resolve complex production issues, performing root cause analysis Automate routine tasks and implement self-healing systems Design and maintain dashboards, alerts, and operational playbooks Participate in incident management, problem resolution, and RCA documentation Own and update SOPs for repeatable processes Collaborate with L3 and Product teams for deeper issue resolution Support and guide L1 operations team Conduct periodic system maintenance and performance tuning Respond to user data requests and ensure timely resolution Address and mitigate security vulnerabilities and compliance issues Technical Skillset Hands-on with Spark, Hive, Cloudera Hadoop, Kafka, Ranger Strong Linux fundamentals and scripting (Python, Shell) Experience with Apache NiFi, Airflow, Yarn, and Zookeeper Proficient in monitoring and observability tools: ELK Stack, Prometheus, Loki Working knowledge of Kubernetes, Docker, Jenkins CI/CD pipelines Strong SQL skills (Oracle/Exadata preferred) Familiarity with DataHub, DataMesh, and security best practices is a plus Strong problem-solving and debugging mindset Ability to work under pressure in a fast-paced environment. Excellent communication and collaboration skills. Ownership, customer orientation, and a bias for actionKey Responsibilities Ensure platform uptime and application health as per SLOs/KPIs Monitor infrastructure and applications using ELK, Prometheus, Zabbix, etc. Debug and resolve complex production issues, performing root cause analysis Automate routine tasks and implement self-healing systems Design and maintain dashboards, alerts, and operational playbooks Participate in incident management, problem resolution, and RCA documentation Own and update SOPs for repeatable processes Collaborate with L3 and Product teams for deeper issue resolution Support and guide L1 operations team Conduct periodic system maintenance and performance tuning Respond to user data requests and ensure timely resolution Address and mitigate security vulnerabilities and compliance issues Technical Skillset Hands-on with Spark, Hive, Cloudera Hadoop, Kafka, Ranger Strong Linux fundamentals and scripting (Python, Shell) Experience with Apache NiFi, Airflow, Yarn, and Zookeeper Proficient in monitoring and observability tools: ELK Stack, Prometheus, Loki Working knowledge of Kubernetes, Docker, Jenkins CI/CD pipelines Strong SQL skills (Oracle/Exadata preferred)
Kezan Consulting
Upload Resume
Drag or click to upload
Your data is secure with us, protected by advanced encryption.
My Connections Kezan Consulting
Noida, Uttar Pradesh, India
4.0 - 9.0 Lacs P.A.
Pune, Maharashtra, India
4.0 - 9.0 Lacs P.A.
Gurgaon / Gurugram, Haryana, India
4.0 - 9.0 Lacs P.A.
Noida, Gurugram, Delhi / NCR
3.0 - 8.0 Lacs P.A.
Uttar Pradesh
7.0 - 12.0 Lacs P.A.
14.0 - 15.0 Lacs P.A.
New Delhi, Padampur
2.0 - 3.0 Lacs P.A.
10.0 - 11.0 Lacs P.A.
Hyderabad
4.0 - 7.0 Lacs P.A.
15.0 - 25.0 Lacs P.A.