Tech Manager

14 - 20 years

15 - 20 Lacs

Posted: 11 hours ago | Platform: Naukri

Work Mode

Hybrid

Job Type

Full Time

Job Description

So, what’s the role all about?

Site Reliability Engineering (SRE) Manager

How will you make an impact?

  • Build server-side software using Java.
  • Lead and mentor a team of SREs; support their career growth and ensure strong team performance.
  • Drive initiatives to improve availability, reliability, observability, and performance of applications and infrastructure.
  • Establish SLOs/SLAs and implement monitoring systems, dashboards, and alerting to measure and uphold system health.
  • Develop strategies for incident management, root cause analysis, and postmortem reporting.
  • Build scalable automation solutions for infrastructure provisioning, deployments, and system maintenance.
  • Collaborate with cross-functional teams to design fault-tolerant and cost-effective architectures.
  • Promote a culture of continuous improvement and reliability-first engineering.
  • Participate in capacity planning and infrastructure scaling.
  • Manage on-call rotations and ensure incident response processes are effective and well-documented.
  • Work in a fast-paced, fluid environment while managing and prioritizing multiple responsibilities.

Have you got what it takes?

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • 10+ years of overall experience in SRE/DevOps roles, with at least 2 years managing technical teams.
  • Proficiency in at least one programming language (e.g., Python, Go, Java, C#) and experience with scripting languages (e.g., Bash, PowerShell).
  • Deep understanding of cloud computing platforms (e.g., AWS), including the behavior and reliability constraints of prominent services (e.g., EC2, ECS, Lambda, DynamoDB).
  • Experience with infrastructure-as-code tools such as CloudFormation or Terraform.
  • Deep understanding of CI/CD concepts and experience with CI/CD tools such as Jenkins, GitLab CI/CD, or CircleCI.
  • Strong knowledge of containerization technologies (e.g., Docker, Kubernetes) and microservices architecture.
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK).
  • Working experience with the Grafana observability suite (Loki, Mimir, Tempo).
  • Experience implementing OpenTelemetry in a microservices environment.
  • Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
  • Experience with incident management and blameless postmortems, including driving incident response during outages and other critical incidents, through resolution and communication, in a cross-functional team setup.

Good to have skills:

  • Hands-on experience working with large Kubernetes clusters. Certification is an added plus.
  • Administration and/or development experience with standard monitoring and automation tools such as Splunk, Datadog, PagerDuty, and Rundeck.
  • Familiarity with configuration management tools like Ansible, Puppet, or Chef.
  • Certifications such as AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or equivalent.

NICE

Software / Technology

Bulgaria
