Experience

9 - 14 years

Salary

12 - 19 Lacs

Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Role overview

The SRE lead will oversee the reliability, performance, and operational excellence of cloud and on-premises infrastructure, applications, and services. This role combines deep technical expertise with leadership responsibility, ensuring stability, security, and scalability across environments. The SRE lead will manage a team of engineers who provide support and will lead initiatives around automation, patching, monitoring, and FinOps optimization to ensure high availability and efficiency.

Key responsibilities

1. Infrastructure and VM management

  • Oversee VM provisioning, patching, scaling, and performance management.
  • Automate patching and log maintenance processes to minimize downtime.
  • Ensure monthly updates, backups, and system health checks.
  • Coordinate with business teams to schedule patching for minimal impact.

2. Application and CI/CD service reliability

  • Manage patching and updates for app services and associated components.
  • Administer Jenkins pipelines, jobs, and agent scaling.
  • Implement secure access controls and review them regularly.
  • Maintain Azure DevOps and pipeline governance for CI/CD stability.

3. Security and compliance

  • Support CSPM and vulnerability management, prioritizing high-severity remediation.
  • Respond to SOC/SIEM alerts by triaging and resolving incidents.
  • Manage PAM integrations, access controls, and compliance tracking.
  • Maintain DNS and certificate lifecycle management, including renewals and secure updates.

4. Monitoring and observability

  • Establish unified monitoring for infrastructure, applications, and performance metrics.
  • Create dashboards and alerting systems to proactively detect anomalies.
  • Provide incident response coverage and periodic service health reports.
  • Conduct post-mortem analyses and implement corrective actions.

5. Cloud and FinOps operations

  • Optimize cloud resource usage and cost through detailed FinOps reporting.
  • Identify savings opportunities via rightsizing and unused resource cleanup.
  • Generate monthly cost reports by application, service, and environment.
  • Collaborate with business and finance teams on budget forecasting and cost governance.

6. Performance and scalability

  • Continuously monitor infrastructure utilization and adjust resources dynamically.
  • Analyze performance data to drive improvements in reliability and efficiency.
  • Manage scaling of services and compute resources based on consumption trends.

7. Change and release management

  • Facilitate CAB meetings and manage the end-to-end change lifecycle.
  • Review and prioritize change requests based on risk and business impact.
  • Supervise production deployments and implement rollback strategies.
  • Conduct post-implementation evaluations and report on success metrics.

8. Support and maintenance

  • Lead the SRE team in providing L3 support for incidents and operational issues.
  • Maintain documentation, knowledge bases, and troubleshooting guides.
  • Implement preventive maintenance measures to enhance system stability.

Qualifications and experience

Essential

  • Bachelor’s degree in computer science, engineering, or equivalent experience.
  • 8+ years of IT operations experience, with at least 3 years in an SRE or DevOps leadership role.
  • Expertise in cloud environments (Azure preferred), including infrastructure automation, monitoring, and FinOps.
  • Hands-on experience with CI/CD tools (Jenkins, Azure DevOps).
  • Strong knowledge of scripting (PowerShell, Python, or Bash).
  • Deep understanding of networking, security, and system administration principles.

Preferred

  • Experience with CSPM tools and vulnerability management platforms.
  • Familiarity with SOC/SIEM tools (e.g., Microsoft Sentinel, Splunk).
  • Strong communication and stakeholder management skills.
  • ITIL, Azure Administrator, or DevOps Engineer certification.

Key competencies

  • Reliability mindset: designs systems for fault tolerance and operational excellence.
  • Automation-first approach: reduces manual effort through tooling and scripts.
  • Leadership: mentors engineers and coordinates cross-functional initiatives.
  • Analytical rigor: uses data-driven insights for optimization and cost control.
  • Collaboration: works closely with security, development, and infrastructure teams to ensure seamless delivery.

Advaiya Solutions

Information Technology and Services

Richmond
