Lead Architect, SRE & Observability

10 - 15 years

22 - 27 Lacs

Posted: 3 days ago | Platform: Naukri


Work Mode

Work from Office

Job Type

Full Time

Job Description


Key Responsibilities:

  • Monitoring & Observability (CAMO Focus)
  • Architect and lead end-to-end observability strategies (logs, metrics, traces) across on-premises, private, and public cloud environments.
  • Manage and mature enterprise observability solutions across complex architectures.
  • Define standards for telemetry data collection, correlation, and alerting for distributed systems.
  • Collaborate with application and infrastructure teams to ensure instrumentation coverage and SLO/SLI definition.
  • Lead the migration and consolidation of legacy monitoring platforms to modern observability stacks.
  • Enable proactive problem detection, root cause analysis, and capacity forecasting using analytics and AI/ML insights.

  • Site Reliability Engineering (SRE Focus)
  • Define and implement SRE principles (SLIs/SLOs, error budgets, chaos testing, postmortems, etc.) across supported services.
  • Design and manage infrastructure automation, CI/CD pipelines, AI/ML solutions, runbooks, and self-healing systems.
  • Lead incident response coordination during major outages and drive post-incident analysis and systemic fixes.
  • Collaborate with DevOps, Cloud, and Security teams to enforce resiliency, observability, and reliability as core design principles.
  • Mentor junior SREs and CAMO engineers to grow technical and operational expertise.

  • Technical Skills:

  • Expertise in designing and implementing observability frameworks including logs, metrics, and traces across hybrid environments (on-premises, private cloud, public cloud).
  • Strong understanding of distributed systems, microservices architecture, and telemetry pipelines.
  • Proficiency in infrastructure automation and configuration management using tools like Terraform, Ansible, and scripting languages (Python, Shell, etc.).
  • Experience with CI/CD pipelines, incident response automation, and self-healing systems.
  • Familiarity with container orchestration platforms (e.g., Kubernetes) and virtualization technologies.

  • Functional Knowledge:
  • Experience in implementing cyber asset management and security observability principles.
  • Familiarity with AIOps, ITSM, and CAASM tools and configuration management databases (CMDBs).
  • Exposure to compliance and governance frameworks such as CIS and NIST, as they apply to cyber resilience, observability, and alerting.
  • Relevant certifications in observability, cloud platforms, SRE, or security domains.

  • Qualifications:
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  • 10-15 years of experience in IT Operations, SRE, DevOps, or Monitoring Engineering roles.
  • Strong expertise in modern observability platforms and telemetry pipelines.
  • Experience with hybrid environments including virtualization, container orchestration, and cloud platforms.
  • Proven track record in automation, telemetry governance, and infrastructure as code.
  • Excellent incident management, communication, and stakeholder engagement skills.

  • Interpersonal Skills

  • Communicates difficult concepts clearly and negotiates with others to persuade them to adopt a different point of view.

  • Additional Information
    Time Type: Full time
    Employee Type: Assignee / Regular
    Travel: Yes, 10% of the time
    Relocation Eligible: Yes

Applied Materials

Semiconductor Manufacturing

Santa Clara, CA
