Site Reliability Engineer III

2 - 9 years

16 - 18 Lacs

Posted: 1 day ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

  • Collaborates with Software Engineering teams to design, develop, and implement features that enhance system resilience, scalability, and performance, proactively identifying and resolving system bottlenecks and failure points
  • Develops and refines sophisticated automation tools and frameworks, including advanced infrastructure as code (IaC) practices, to streamline operational workflows, deployment processes, and infrastructure management, ensuring high system efficiency
  • Engages in architectural design discussions, ensuring that advanced reliability, scalability, and performance considerations are integrated into strategic decision-making processes
  • Designs and executes chaos engineering experiments and advanced resiliency testing, analyzing the results to implement improvements that strengthen system robustness and recovery capabilities
  • Develops, optimizes, and maintains comprehensive disaster recovery plans and business continuity strategies, ensuring systems can recover quickly and effectively from complex and unexpected disruptions
  • Advocates for observability practices by promoting and implementing best practices such as error budgeting, service-level objectives (SLOs), and service-level indicators (SLIs), contributing to a culture of continuous improvement and reliability
  • Collaborates and co-creates effectively with product and business teams to align technology initiatives with business objectives
Qualifications
  • Bachelor's degree in Computer Science, Information Technology, Engineering, and/or comparable experience; advanced degree preferred
  • Knowledge of a modern observability stack: Splunk, Elasticsearch, Prometheus, Grafana
  • Knowledge of containerization technologies (e.g., Kubernetes, Docker) and microservices architecture
  • Knowledge of observability tools and methodologies, including experience with logging, monitoring, tracing, and performance analysis platforms
  • Knowledge of cloud-based Site Reliability Engineering (SRE) practices and experience with public cloud platforms such as AWS, Azure, or Google Cloud
We back you with benefits that support your holistic well-being so you can be and deliver your best.
This means caring for you and your loved ones' physical, financial, and mental health, as well as providing the flexibility you need to thrive personally and professionally:
  • Competitive base salaries
  • Bonus incentives
  • Support for financial well-being and retirement
  • Comprehensive medical, dental, vision, life insurance, and disability benefits (depending on location)
  • Flexible working model with hybrid, onsite or virtual arrangements depending on role and business need
  • Generous paid parental leave policies (depending on your location)
  • Free access to global on-site wellness centers staffed with nurses and doctors (depending on location)
  • Free and confidential counseling support through our Healthy Minds program
  • Career development and training opportunities

AMERICAN EXPRESS

Financial Services

New York, NY
