Site Reliability Engineer III

5 - 10 years

11 - 15 Lacs

Posted: 2 days ago | Platform: Naukri

Work Mode: Work from Office

Job Type: Full Time

Job Description

We are looking for someone who has:

  • Multi-cloud experience across both public and private clouds.
  • Strong knowledge of continuous delivery, testing, security practices, performance, and disaster recovery.
  • Experience supporting mission-critical, customer-facing systems in production environments, including incident management and response.

Responsibilities:

  • Collaborate with developers to promote the concept of reliability engineering during all phases of the SDLC to detect and correct performance issues early in the lifecycle.
  • Scope work on tooling and automation, monitoring, workflow management, data pipeline maintenance and improvement, and CI/CD. Assess gaps in existing monitoring tool capabilities and develop automated solutions to support the production infrastructure.
  • Establish and enhance infrastructure and application performance metrics; provide actionable reporting to proactively identify and address issues.
  • Run the production CI/CD infrastructure, monitoring availability and taking a holistic view of system health.
  • Perform proactive data analysis to identify problems before a service is impacted, and ad hoc data analysis to quickly identify the root cause of service-impacting issues as they arise. Define and implement alerting rules; manage, prioritize, and respond to alerts.

Knowledge, Skills, and Experience

  • Bachelor's (or higher-level) degree in one or more of these disciplines: Computer Science, Computer Engineering, or related fields.
  • 5+ years of professional experience in software engineering.
  • Experience setting up and using incident and on-call management systems.
  • Experience setting up and building tools to collect and visualize data (logs, metrics, alerts), building dashboards, alerting, and monitoring systems.
  • Experience with deploying secure infrastructure and services in one or more cloud environments such as AWS or Azure.
  • Experience with configuration management and deployment automation tools, such as Terraform, Ansible, Packer, etc.
  • Proficiency in scripting languages such as Python and Bash.
  • Experience with containers (Docker) and orchestration systems (Kubernetes).
  • Solid understanding of the Linux OS and systems administration skills.
  • Excellent analytical and troubleshooting skills.
  • Dynamic collaborator who thrives in diverse, geographically distributed locales.
  • Team player who demonstrates diplomacy and promotes sound ideas and concepts, paired with a desire to help others grow their skills.
  • Strong verbal and written communication skills.
  • Experience with NGINX technologies is a strong plus.

Fundamental competencies:

SYSTEM EXPERIENCE

  • Application Build and Deployment Processes (git*, automation pipelines, infrastructure as code, etc.)
  • Automated Application Delivery (load balancers, container orchestration, service mesh, high-availability architectures, frontend and backend technologies including databases, etc.)
  • Service Operation (Define, instrument, measure, and manage service level objectives. Experience with observability tooling including logging infrastructure, time series metrics databases, tracing systems, alert definitions, etc.)
  • Incident management (service restoration, root cause analysis, postmortem authorship, definition of roles and responsibilities, etc.)
  • Security awareness and competencies, including security as code.
  • Configuration management

OBSERVABILITY

  • Explores beyond the obvious to ensure Service Level Objectives (SLO) are met.
  • Understands and measures system behaviors to quickly and efficiently diagnose, identify, and address needs.
  • Proactively tests, automates, monitors outputs, and leverages signals to infer services and needs.
  • Uses data management to explore properties, patterns, and distributed tracing.

SOLUTIONIST

  • Constantly seeking ways to improve systems, making them more efficient and reducing toil.
  • Understands the difference between short-term tactical fixes and long-term strategic fixes.
  • Simplifies decisions and judgments by recognizing what to pay attention to and what to ignore; a proficient problem solver. Tenacious and resourceful with an inherent predisposition toward action; unafraid to try something new in the name of innovation.

FORWARD THINKING

  • Possesses an inherent bias toward innovation and stays abreast of developing ideas and technologies. Thoughtfully and strategically considers future needs and opportunities, and advocates positive change.
  • Demonstrates technological creativity and capacity.

COMMUNICATION AND COLLABORATION

  • Conveys information, vision, and strategy in an accurate and timely manner, adjusting to ensure understanding based on the audience. Actively listens; seeks to understand rather than respond. Proactively solicits and values diverse perspectives, ideas, and opinions.
