Software Engineer - SRE

3 - 5 years

11 - 15 Lacs

Posted: 1 week ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

This role is not just about applying AI; it's about applying an engineering mindset and AI capabilities to reliability problems.
You should be comfortable writing clean, maintainable code and have an understanding of SRE principles such as observability, incident response, and automation. By combining software skills with practical knowledge of operational challenges, you'll help eliminate toil, drive proactive reliability improvements, and embed intelligence into day-to-day engineering workflows. Your efforts will directly contribute to unifying reliability efforts across teams, enabling consistent engineering standards, and fostering a shared accountability model for service health. By driving operational discipline and aligning reliability goals with business priorities, you will help create a culture where platform stability, developer productivity, and customer experience go hand in hand. These contributions will play a vital role in supporting the organization's broader strategy: enabling faster innovation, scalable growth, and a resilient technology foundation aligned with long-term business outcomes.
Key Responsibilities
Support initiatives to enhance SRE capabilities using AI/ML, ensuring strong foundations in reliability engineering and operational excellence
Leverage AI and machine learning technologies to architect and implement solutions that advance the overall SRE agenda: improving reliability, automation, observability, and operational efficiency across complex systems
Contribute to incident management, change management, and release processes, bringing structure, automation, and intelligent insights to drive stability, safety, and velocity
Participate in and drive key SRE practices and routines, including initiating and facilitating the SRE Community of Practice (CoP), aligning SLAs/SLOs, launching error budget governance, and enabling data-driven process improvements across reliability areas
Partner effectively with SREs, platform engineers, and data teams to develop production-grade, measurable, and reliable models and tools
Develop and maintain internal frameworks and tooling to accelerate AI/ML adoption across reliability use cases
Partner, understand, and assist in driving Zero-Touch Operations by enabling platforms to detect, analyze, and resolve issues autonomously
Utilize metrics, logs, and historical incident data to build actionable insights and reliability dashboards
Actively participate in on-call rotations, improving incident response processes and escalation management
Integrate security best practices into workflows and collaborate with security teams to ensure platform stability
Contribute significantly to shaping the AI-in-SRE strategy and mentor junior team members
Required Skills & Qualifications
3-5 years of experience as a software engineer or platform engineer, with a strong focus on building production-grade systems, developer tooling, or intelligent automation
LLM-Native Development Approach: Proficiency in leveraging LLM-powered tools (e.g., for research, code generation, or automation). Demonstrated experience building AI-assisted workflows or custom automations that enhance engineering efficiency, reduce manual effort, or accelerate operational tasks
Proficient in Python, Go, or equivalent, with strong software engineering fundamentals: testing, version control, CI/CD, and clean code practices
Understanding of core SRE principles (SLIs/SLOs, incident response, error budgets), with the ability to partner with SREs to productionize reliability tooling
Hands-on experience with cloud platforms (AWS, GCP, Azure), containers/orchestration (Docker, Kubernetes), and infrastructure-as-code patterns
Familiarity with observability and telemetry systems, including building or integrating with tools like Prometheus, OpenTelemetry, or the Elastic Stack
Comfortable working with Linux-based systems, debugging performance issues, and understanding systems-level behavior
Ability to translate operational pain points into intelligent, automated solutions using software, AI, and data-driven techniques
Preferred Qualifications
Advanced SRE Practice Exposure: Familiarity with operating in mature SRE environments, such as participating in production readiness reviews, chaos engineering exercises, capacity planning, error budget governance, and operational health reviews
Exposure to building AI-assisted tools using LLMs, vector databases, or prompt engineering techniques to streamline engineering or operational workflows is a strong plus
