Site Reliability Engineer - 2

3 - 5 years

5 - 8 Lacs

Posted: 1 day ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

SRE-2

What You'll Do to Keep Our Engines Roaring

  • Be a Reliability Champion: Take ownership of the reliability, performance, and efficiency of critical services.
  • Automate, Automate, Automate: Design, develop, and implement robust automation solutions to eliminate toil, streamline operations, and improve system resilience.
  • Battle Incidents (and Win): Lead troubleshooting efforts for complex production incidents, perform in-depth root cause analysis, and implement sustainable preventative measures.
  • Sculpt Our Infrastructure: Actively contribute to the design, implementation, and optimization of our cloud infrastructure on AWS and GCP, leveraging your expertise in technologies like Kubernetes.
  • Enhance Observability: Implement and refine advanced monitoring, alerting, and logging solutions to gain deep insights into system behavior and predict potential issues.
  • Collaborate for Success: Partner closely with development teams to influence architectural decisions, ensuring reliability, scalability, and security are built in from the start.
  • Strengthen Our Security Posture: Implement and advocate for advanced security practices within our infrastructure and operational workflows.
  • Drive Efficiency: Analyze and optimize cloud infrastructure spend, identifying and implementing cost-saving opportunities.
  • Guide the Next Wave: Mentor and guide SRE-1 engineers, contributing to growth and knowledge sharing within the team.
  • Be Ready for Action: Participate in our on-call rotation, acting as a key point of escalation and resolution for critical issues.

What Makes You the Ideal Candidate

  • 3-5 years of hands-on experience in Site Reliability Engineering, DevOps, or a similar role with a strong focus on production systems.
  • Demonstrated expertise in Python or Go, with a proven track record of automating complex tasks.
  • Strong command of AWS and/or GCP cloud platforms.
  • In-depth experience with containerization and orchestration using Kubernetes (K8s, ArgoCD, Helm/Kustomize).
  • Experience with infrastructure as code tools like Terraform or Ansible is highly valued.
  • Solid understanding of and experience with monitoring and observability stacks (VictoriaMetrics, Prometheus, Grafana, ELK stack, etc.).
  • Deep knowledge of Linux/Unix systems internals and advanced networking concepts.
  • Proven ability to diagnose and resolve complex issues in large-scale distributed systems.
  • A strong understanding of Cloud Security and Information Security principles and best practices.
  • Experience with cloud cost analysis and optimization techniques.
  • Familiarity with CI/CD pipelines and GitOps methodologies.
  • Experience with messaging queues and distributed systems (Celery, Kafka) is a plus.
  • Excellent communication, collaboration, and problem-solving skills.
  • A desire to mentor and lead by example.

MoEngage

Marketing Technology

Mumbai

Bengaluru / Bangalore, Karnataka, India