Technical Lead

8 years

0 Lacs

Posted: 11 hours ago | Platform: Glassdoor

Work Mode

On-site

Job Type

Part Time

Job Description

SRE Job Description

At Chubb, we’re passionate about building software that solves problems. We count on our site reliability engineers (SREs) to give our users a rich feature set, high availability, and strong performance so they can pursue their missions. As we expand our customer deployments, we are seeking an experienced SRE to deliver insights from massive-scale data in real time. Specifically, we are looking for someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction.

Job Summary:

As a Site Reliability Engineer, you will play a critical role in enhancing the reliability, availability, and performance of our systems. You will collaborate closely with software engineers, product managers, and other stakeholders to drive performance improvements, automate processes, and resolve issues swiftly. You will be responsible for defining and tracking Service Level Objectives (SLOs) and Service Level Indicators (SLIs), and for managing error budgets to ensure optimal service reliability. A key part of your role will be to identify and reduce toil through automation and efficiency improvements. You will also use your understanding of application architecture to derive critical user journeys and ensure the system meets user expectations.

Key Responsibilities:

Design, implement, and manage highly reliable and scalable systems and services.
Define and track SLIs and SLOs for critical services, ensuring alignment with business objectives and customer requirements.
Monitor system performance using tools like AppDynamics and other monitoring solutions to ensure high availability and performance.
Manage error budgets effectively, working proactively to reduce incidents and improve service quality.
Identify, document, and reduce toil within operational processes through effective automation and optimization.
Develop and maintain tools for deployment, monitoring, and performance analysis, with a strong focus on reducing manual work.
Understand application architecture to derive critical user journeys and ensure system reliability aligns with user expectations.
Collaborate with software development teams to enhance the reliability and resilience of applications.
Participate in on-call rotations and incident response efforts, contributing to postmortems and root cause analysis.
Document system architecture, processes, and troubleshooting procedures.
Establish and track key performance metrics and recommend improvements based on data analysis.
Ensure adherence to security and compliance requirements for systems and applications.
Stay up to date with industry trends and emerging technologies, suggesting innovative solutions where appropriate.

Qualifications:

Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
8+ years of experience in Site Reliability Engineering or a similar role.
Strong experience with at least one cloud platform (AWS, Azure, or GCP) and with containers and container orchestration (Docker, Kubernetes).
Proficiency in Python and in other programming or scripting languages (e.g., Go, Bash).
Familiarity with front-end technologies (e.g., HTML, CSS, JavaScript, React, Angular, or Vue.js).
Good understanding of infrastructure, including operating systems, networking concepts, and distributed systems.
Hands-on experience with monitoring and logging tools such as AppDynamics and Splunk.
Familiarity with CI/CD tools and methodologies.
Excellent troubleshooting and problem-solving skills.
Strong communication and collaboration abilities.

Preferred Qualifications:

Experience with defining and managing SLOs, SLIs, and error budgets.
Experience with configuration management tools (e.g., Ansible, Chef, Puppet).
Knowledge of database systems (SQL and NoSQL) and their management.
Experience with site reliability best practices and methodologies, including incident management and postmortem processes.

Chubb