Architect - Site Reliability Engineering

4 - 8 years

0 Lacs

Posted: 2 days ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

The Architect, Site Reliability Engineering, plays a crucial role in providing technical leadership to support initiatives in cloud computing at Inspire. With a primary focus on enhancing efficiency, reducing toil, and increasing uptime and availability of Inspire's cloud platforms, you will collaborate with peers to influence cloud application and infrastructure design, improve production readiness reviews, streamline build/test/release automation, elevate observability practices, and fortify platform resiliency, scalability, and recovery capabilities. Your success in this role will stem from your ability to engage with diverse technical partners, employ data-driven problem-solving approaches, demonstrate self-motivation, and exhibit a commitment to continuous improvement.

In this position, your responsibilities will include:
- Involvement in the entire application and cloud services development lifecycle, from inception to refinement, ensuring well-designed and monitored software releases in collaboration with application and platform teams.
- Designing, motivating, guiding, and supporting the development of software, systems, and processes to enhance product reliability, organizational efficiency, and resource optimization.
- Advocating for reliability practices across the software development lifecycle through activities like architecture reviews, code reviews, platform creation, and capacity planning.
- Collaborating with senior engineering and testing team members to develop tools and recommend testing strategies for problem prevention, detection, and chaos testing.
- Enhancing SRE practices by establishing error budgets, refining SRE dashboards, and improving anomaly detection capabilities.
- Providing design recommendations for platform enhancements based on production incident analysis and root cause investigations.
- Improving service reliability through blameless post-incident reviews and leveraging automation tools to respond to or prevent future issues.
- Identifying automation opportunities, designing tools, and supporting their implementation to automate routine, time-consuming, or manual tasks.
- Periodically evaluating current SRE practices and tools to suggest improvements.
- Training, guiding, and mentoring teammates on SRE practices and principles.
- Developing strategies to ensure infrastructure scalability and elasticity, along with code-level debugging for escalated issues.

To be successful in this role, you should have:
- A minimum of 8 years of experience as a platform architect with expertise in containers, deployment architecture, benchmarking, design, and network engineering.
- At least 4 years of combined experience in DevOps, SRE, Systems, and/or software development roles.
- Hands-on experience in establishing and maturing SRE practices, programs, and roadmaps.
- Extensive knowledge of public cloud technologies, particularly Azure, and cloud-native architectures.
- Proficiency in Infrastructure-as-Code (IaC), DevOps, and CI/CD practices and tools like Terraform, GitLab, ArgoCD, and Jenkins.
- Familiarity with configuration management tools such as Ansible, Chef, and Packer.
- Expertise in container technology and orchestration, including Kubernetes and Docker.
- Experience with observability and monitoring practices and tools like OpenTelemetry, New Relic, Prometheus, Grafana, and more.
- Deep understanding of microservice architectures, application servers, networks, and databases.
- Excellent grasp of scalability processes and techniques.
- Strong communication and collaboration skills, with the ability to understand and improve complex systems.

In summary, this role requires a dedicated professional with a strong technical background, a proactive approach to problem-solving, and a commitment to enhancing reliability and efficiency across cloud platforms. If you are someone who thrives in a dynamic and collaborative environment, excels in technical challenges, and is passionate about driving continuous improvement, this opportunity at Inspire may be the perfect fit for you.
