Site Reliability Engineer

6 - 8 years

7 - 17 Lacs

Posted: 12 hours ago | Platform: Naukri


Work Mode

Hybrid

Job Type

Full Time

Job Description

Position Title:

Site Reliability Engineer (SRE), Ansible, and Linux Administrator

Role Overview:

Site Reliability Engineer (SRE)

Key Responsibilities:

Site Reliability Engineering (SRE):

  • Design, implement, and maintain highly available and scalable systems to ensure 99.9% uptime.
  • Monitor and improve system reliability, performance, and capacity planning.
  • Develop and maintain observability tools (e.g., Prometheus, Grafana, ELK Stack) to monitor system health and performance.
  • Respond to incidents, troubleshoot issues, and perform root cause analysis to prevent recurrence.
  • Automate repetitive operational tasks to reduce manual intervention and improve efficiency.
  • Collaborate with development teams to implement DevOps best practices and CI/CD pipelines.

Ansible Automation:

  • Develop and maintain Ansible playbooks for configuration management, application deployment, and infrastructure provisioning.
  • Automate repetitive tasks such as patch management, system updates, and application deployments.
  • Ensure Ansible configurations are version-controlled and follow best practices.
  • Troubleshoot and resolve issues with Ansible playbooks and deployments.
  • Optimize existing Ansible workflows to improve efficiency and reduce execution time.

Linux Administration:

  • Manage and maintain Linux-based systems (e.g., RHEL, Ubuntu, CentOS) in production, staging, and development environments.
  • Perform system updates, patching, and security hardening to ensure compliance with organizational policies.
  • Configure and manage services such as Apache, Nginx, MySQL, PostgreSQL, and Docker.
  • Manage user accounts, permissions, and access control on Linux systems.
  • Troubleshoot and resolve system-level issues, including performance bottlenecks and hardware failures.
  • Implement and maintain backup and disaster recovery solutions for Linux systems.

Required Skills and Qualifications:

Technical Skills:

  • SRE Expertise:
    • Strong understanding of SRE principles, including SLAs, SLOs, and error budgets.
    • Experience with monitoring tools like Prometheus, Grafana, ELK Stack, or Datadog.
    • Proficiency in incident management and root cause analysis.
  • Ansible:
    • Hands-on experience with Ansible for configuration management and automation.
    • Ability to write and debug complex Ansible playbooks and roles.
    • Knowledge of Ansible Tower/AWX is a plus.
  • Linux Administration:
    • Strong experience with Linux systems (RHEL, Ubuntu, CentOS).
    • Proficiency in shell scripting (Bash) and familiarity with Python for automation.
    • Experience with system performance tuning, security hardening, and troubleshooting.
  • DevOps Tools:
    • Familiarity with CI/CD tools like Jenkins, GitLab CI, or GitHub Actions.
    • Experience with containerization tools like Docker and orchestration platforms like Kubernetes.
  • Networking:
    • Basic understanding of networking concepts (DNS, load balancing, firewalls, etc.).

Soft Skills:

  • Strong problem-solving and analytical skills.
  • Excellent communication and collaboration abilities.
  • Ability to work in a fast-paced, dynamic environment.
  • Proactive and self-motivated with a focus on continuous improvement.

Preferred Qualifications:

  • Experience with cloud platforms (AWS, Azure, GCP) and infrastructure-as-code tools (Terraform, CloudFormation).
  • Knowledge of security best practices for Linux systems and automation tools.
  • Certification in Linux (e.g., RHCSA, RHCE) or Ansible (e.g., Red Hat Certified Specialist in Ansible Automation).

Education and Experience:

  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • 6+ years of experience in SRE, Linux Administration, and Ansible automation.

Purview Services

Data Management

San Francisco
