3.0 - 9.0 years
3 - 7 Lacs
Navi Mumbai, Maharashtra, India
On-site
Responsibilities:
Lead a team of engineers managing a high-availability infrastructure environment using AWS Cloud, Kubernetes, Linux, CI/CD, and other open-source technologies.
Support daily operations with a focus on triage and root cause analysis, understanding the business impact of products, and practicing sustainable incident response with blameless postmortems.
Take a holistic approach to problem-solving during production events to optimize mean time to recovery.
Provide technical leadership and mentor junior team members on best practices and processes.
Analyse ITSM activities and provide feedback to infrastructure and development teams on operational gaps or resiliency concerns.
Assess the security of existing and proposed systems, recommending and implementing plans to resolve vulnerabilities.
Establish and recommend processes, policies, and standards for system use and services, and innovate on new methodologies to improve operations.
Understand business processes and production applications to troubleshoot issues and recommend short- and long-term resolutions.
Coordinate with internal groups to resolve recurrent problems, alerts, and escalated issues, ensuring clear communication.
Exhibit a sense of urgency to resolve issues and ensure SLAs and operational standards are met.
Communicate clearly, both in speech and in writing, with the ability to read and interpret complex information, talk with customers, and listen well.

Qualifications:
BSc/MSc IT/Comp, BCA, MCA, BE, BTech.
7+ years of SRE and system administration experience with large-scale cloud-native microservices platforms.
3+ years of hands-on experience managing and monitoring large-scale Kubernetes clusters in the public cloud, specifically AWS.
Strong communication and leadership skills to foster team collaboration and problem-solving.
Experience with infrastructure automation and scripting using Python and/or Bash.
Experience with Infrastructure-as-Code using Terraform, CloudFormation, Packer, etc.
Strong hands-on experience with monitoring tools such as Splunk, Dynatrace, Prometheus, Grafana, ELK stack, etc., to build observability for large-scale microservices deployments.
Excellent problem-solving, triaging, and debugging skills in large-scale distributed systems.
Experience managing cloud infrastructure and operations in strict security, compliance, and regulatory environments.
Experience with CI/CD frameworks and Pipeline-as-Code such as Jenkins, Spinnaker, GitLab, Argo, Artifactory, etc.
Proven skills to work effectively across teams and functions to influence the design, operations, and deployment of highly available software.
AWS Solutions Architect certification preferred.
Posted 1 day ago