Posted: 4 days ago | Platform: Naukri

Work Mode: Work from Office

Job Type: Full Time

Job Description

Role & responsibilities

Job Title:

Experience:

Location:

Job Summary:

Dev Lead with strong Site Reliability Engineering (SRE) experience.

Key Responsibilities:

  • Lead a cross-functional SRE/DevOps team responsible for the reliability, scalability, and performance of production systems.
  • Develop and implement monitoring and observability strategies using tools like Splunk, AppDynamics, and ThousandEyes.
  • Automate infrastructure deployment and configuration using Ansible and other IaC tools.
  • Utilize BigPanda for intelligent alerting and incident correlation.
  • Integrate and manage IPSoft/Amelia for automated L1 incident handling and resolution.
  • Design, develop, and maintain Python scripts for automation, data processing, and tool integration.
  • Define and track SLOs, SLIs, and SLAs in collaboration with product and operations teams.
  • Lead incident management processes and conduct postmortems to ensure continuous improvement.
  • Collaborate with development teams to enhance application reliability, deployment, and CI/CD processes.
  • Drive operational excellence, cost optimization, and security best practices.

Technical Skills Required:

  • Monitoring & Observability: Splunk, AppDynamics, ThousandEyes
  • Incident Management & Correlation: BigPanda, IPSoft/Amelia
  • Automation & Scripting: Python, Ansible, Shell scripting
  • DevOps Practices: CI/CD, version control (Git), infrastructure as code
  • Cloud Platforms: AWS, Azure, or GCP (preferred)
  • Containerization & Orchestration: Docker, Kubernetes (added advantage)
  • ITSM & Collaboration Tools: ServiceNow, Jira, Confluence

Required Qualifications:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related discipline.
  • 11+ years of experience in DevOps/SRE, with at least 3 years in a leadership role.
  • Proven experience managing 24x7 production systems with high availability and performance requirements.
  • Strong analytical, problem-solving, and incident response skills.
  • Excellent communication and leadership abilities.

Preferred Certifications (Optional):

  • AWS / Azure Certified DevOps Engineer
  • Splunk Core Certified Power User / Admin
  • AppDynamics Certified Associate Performance Analyst
  • Red Hat Certified Specialist in Ansible Automation
  • Python Certification (e.g., PCEP, PCAP)

Redberyl Tech Solutions

Software Development

Silicon Valley
