Site Reliability Engineer

4 - 8 years

3 - 10 Lacs

Posted: 1 day ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

This role is for one of Weekday's clients.

Min Experience: 4 years

Location: Bengaluru

Job Type: Full-time

We are looking for a seasoned Site Reliability Engineer (SRE) to join our infrastructure team. As an SRE, you will play a critical role in ensuring the reliability, scalability, and performance of our systems, particularly across bare metal infrastructure and containerized environments. You will be responsible for bridging the gap between software development and operations by applying a software engineering mindset to system administration topics. This role is ideal for someone passionate about automation, observability, infrastructure as code, and production excellence.

Requirements

Key Responsibilities:

  • Design, build, and maintain scalable and reliable infrastructure across bare metal environments.
  • Develop and manage containerized services using Docker and orchestrate them using Kubernetes.
  • Leverage Terraform to implement and manage infrastructure as code, enabling consistent, repeatable deployments.
  • Create, maintain, and improve monitoring, alerting, and visualization systems using Grafana and other observability tools.
  • Collaborate closely with development teams to ensure new services are scalable, observable, and deployable.
  • Automate routine operational tasks to improve efficiency and reduce the risk of human error.
  • Troubleshoot complex production issues spanning applications, systems, networks, and services.
  • Participate in incident management, root cause analysis, and postmortem reviews to continuously improve system reliability.
  • Ensure high availability and performance of production systems and services.

Key Skills and Experience Required:

  • 4-8 years of hands-on experience in site reliability, DevOps, or infrastructure engineering roles.
  • Strong experience managing bare metal servers, including provisioning, configuration, and lifecycle management.
  • Deep understanding of Docker containers and orchestration using Kubernetes, including managing multi-node clusters in production environments.
  • Proficient in using Terraform for building and managing infrastructure across environments (cloud/on-prem).
  • Hands-on experience with Grafana for monitoring and visualization, along with Prometheus or other metrics tools.
  • Solid understanding of system internals (Linux), networking concepts, and distributed system patterns.
  • Experience with CI/CD pipelines and automating deployment workflows.
  • Proficiency in at least one scripting or programming language such as Python, Bash, or Go.
  • Familiarity with logging, alerting, and tracing tools and principles of observability.
  • Strong problem-solving and analytical skills, with the ability to work independently and as part of a team.

Good to Have:

  • Exposure to hybrid or multi-cloud environments.
  • Experience with performance tuning and capacity planning.
  • Background in security best practices for infrastructure.
  • Familiarity with configuration management tools like Ansible, Chef, or Puppet.
