2 Chaos Engineering Jobs

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

5.0 - 7.0 years

5 - 9 Lacs

Mumbai, Bengaluru, Delhi / NCR

Work from Office

Key Responsibilities:

Chaos Engineering:
- Design and implement chaos engineering experiments to identify weaknesses in systems and applications.
- Develop and execute strategies to improve system resilience and reliability.
- Analyze experiment results, provide actionable insights, and drive remediation efforts.
- Collaborate with development, operations, and infrastructure teams to integrate chaos engineering practices.

Operational Acceptance:
- Develop and maintain comprehensive operational acceptance criteria for new and existing systems.
- Conduct thorough operational acceptance testing, ensuring systems meet all predefined criteria before go-live.
- Work closely with project managers, developers, and QA teams to align operational acceptance processes with project timelines and objectives.
- Document and communicate operational readiness findings, providing recommendations for improvement.

System Resilience and Reliability:
- Implement and manage strategies for continuous improvement of system resilience and reliability.
- Monitor and assess system performance, identifying potential risks and areas for enhancement.
- Lead initiatives to improve disaster recovery and business continuity plans.
- Stay updated with the latest industry trends and best practices in chaos engineering and operational acceptance.

Collaboration and Training:
- Educate and mentor team members on chaos engineering and operational acceptance methodologies.
- Foster a culture of resilience and reliability within the organization.
- Engage with external communities, attending conferences and participating in knowledge-sharing events.

Requirements:
- Extensive experience in chaos engineering, operational acceptance testing, and system resilience.
- Strong understanding of cloud platforms (AWS, Azure, GCP) and their resilience features.
- Proficiency in scripting and automation tools (Python, Bash, Terraform, etc.).
- Experience with monitoring and observability tools (Prometheus, Grafana, Splunk, etc.).
- Experience with chaos engineering tools such as Gremlin and Chaos Monkey.
- Excellent analytical and problem-solving skills.
- Strong communication and collaboration skills, with the ability to work effectively in cross-functional teams.
- Certifications in relevant fields (e.g., AWS Certified Solutions Architect, Azure DevOps Engineer) are a plus.

Location: Delhi NCR, Bangalore, Chennai, Pune, Kolkata, Ahmedabad, Mumbai, Hyderabad

Posted 3 weeks ago

4 - 8 years

5 - 15 Lacs

Chennai, Hyderabad

Work from Office

L2 Support Engineer (SRE Chaos Engineering)

Area: Private cloud, VMware, OpenStack, Kubernetes, Linux, Monitoring, Reliability Engineering

Define and implement practices in resiliency engineering, automation, observability, and chaos testing, while ingraining a proactive chaos culture that thinks reliability-first design.

Scope of work:
- Supervise a team of SREs, ensuring that the production applications the team supports are stable, reliable, and well documented.
- Own the end-to-end availability and performance of mission-critical services.
- Contribute to the design and architecture of the system.
- Analyze system architectures to identify single points of failure and other areas that may present a resiliency deficiency.
- Develop software to automate chaos and resiliency test cases that simulate failures in a system that performs financial data processing.
- Integrate chaos engineering with the CI/CD process.
- Establish a process to define a hypothesis around a steady state and to simulate real-world events.
- Execute Game Days on mission-critical applications.
- Identify top errors and reliability issues, and drive root-cause analysis to avoid repeat incidents.
- Ability to analyze and debug complex issues across tiers, from frontend to mid-tier to infrastructure.
- Hands-on experience with any chaos tool (Harness, Litmus, Gremlin, Chaos Monkey, ChaosBlade).
- Mindset to identify and explore chaotic situations and conduct formalized experiments.
- Experience with monitoring and logging tools (e.g., Datadog, ELK, Prometheus, Grafana).
- Experience with Kubernetes and Docker.
- Deep understanding of SRE concepts such as SLAs, SLOs, SLIs, and error budgets.
- Experience working on cross-department efforts by communicating and negotiating with multiple teams to accomplish goals.
- Expert at troubleshooting issues and bugs.
- Programming experience (Python/Go/shell).
- Experience in the financial domain (desirable).
- Prior SRE/DevOps experience (desirable).

Skill Set:
- Experience with OS platforms (Windows, Linux, CentOS, Ubuntu, etc.).
- Highly skilled Site Reliability Engineer to join our Technology team, working as part of a cross-functional product team to create elegant solutions to highly complex and intricate business challenges.
- Ability to prioritize and multitask.
- Excellent communication and interpersonal skills.

Posted 2 months ago
