Associate Site Reliability Engineer Specialist

6 years

0 Lacs

Posted: 2 days ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

  • Summary of This Role
    • Manage DevOps tools such as Jenkins, Git, Docker, Kubernetes, and Terraform, and use these skills to support, build, and maintain Kubernetes clusters on-prem, in OCP, and in AWS. The role is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. Site reliability engineers bridge development and operations by applying a software engineering mindset to system administration topics. They split their time between operations/on-call duties and developing systems and software that increase site reliability and performance.
  • What Part Will You Play?
    • Participate in architecture and R&D discussions for new technology or processes to increase the performance and reliability of our systems.
    • Chaos engineering - you’re expected to think laterally about how our systems might fail in theory, design tests to demonstrate how they behave in practice, and then formulate and implement remediation plans, as appropriate.
    • Pushing our systems to their limits, and then coming up with designs for how to get them to the next performance tier.
    • Use practices from DevOps and GitOps to improve automation and processes so that self-service becomes possible.
    • Safeguarding reliability. Ensuring that our services are highly available, resilient against disasters, self-monitoring, and self-healing.
    • Running “game days” to test assumptions about reliability and learn what will break before it matters to customers.
    • Reviewing designs with an eye toward increasing the holistic stability of our platform and identifying potential risks.
    • Building systems to proactively monitor the health, performance and security of our production and non-production virtualized infrastructure.
    • Improving our monitoring and alerting systems to make sure engineers get paged when it matters (and don’t get paged when it doesn’t).
    • Troubleshooting systems and network issues, alongside our Technical Operations Team.
    • Mentoring other engineers in reliability-related skills.
    • Evolving our SDLC, practices, and tooling to account for Site Reliability considerations and best practices.
    • Developing runbooks and improving documentation.
  • What Are We Looking For in This Role?
  • Minimum Qualifications
    • BS in Computer Science, Information Technology, Business / Management Information Systems or related field
    • Typically 6+ years of experience programming in one or more languages and 4 years of experience with Unix/Linux systems internals and administration (e.g., filesystems, inodes, system calls) or networking (e.g., TCP/IP, routing, network topologies and hardware, SDN).
  • What Are Our Desired Skills and Capabilities?

Required

  • Basic familiarity with containerization tools like Docker.
  • Deep understanding of Kubernetes concepts, architecture, and best practices.
  • Familiarity with OpenShift Container Platform, its features, and how it extends Kubernetes.
  • Basic understanding of version control systems such as Git.
  • Basic knowledge of CI/CD concepts and tools (e.g., Jenkins, GitLab CI).
  • Basic understanding of Infrastructure as Code principles.
  • Basic knowledge of Linux operating systems.
  • Understanding of basic networking concepts and protocols.
  • Awareness of fundamental security practices and principles.
  • Basic understanding of securing applications and infrastructure.
  • Analytical skills to troubleshoot and resolve basic technical issues.
  • Ability to identify and escalate complex issues to senior team members.
  • Eagerness to learn new technologies and continuously improve technical skills.
  • Active participation in training sessions, workshops, and relevant certifications.

Preferred

  • Experience with cloud platforms (e.g., AWS, Azure, GCP) and their services.
  • Proficiency in scripting languages (e.g., Python, Bash, Groovy) and experience with automation tools (e.g., Ansible, Terraform, Salt).
  • Basic knowledge of monitoring and logging tools (e.g., Prometheus, Grafana).
  • Exposure to Kafka, NATS, and Vault.
