SRE Architect

12 - 20 years

30.0 - 40.0 Lacs P.A.

Pune, Bengaluru, Noida

Posted: 2 months ago | Platform: Naukri


Skills Required

Architecture, Terraform, SRE

Work Mode

Hybrid

Job Type

Full Time

Job Description

Role & responsibilities

Core Skills

• 12 to 14 years of experience in Site Reliability Engineering, DevOps, or a related field, with at least 3 years in a senior or architect-level role.
• Strong expertise in system architecture, distributed systems, cloud computing (e.g., AWS, Azure, GCP), containerization (e.g., Docker, Kubernetes), and infrastructure as code (e.g., Terraform, Ansible).
• Proficiency in one or more programming/scripting languages (e.g., Python, Groovy, Shell, PowerShell, or similar).
• Strong background in DevOps practices and cloud technologies for ensuring the scalability, reliability, and security of cloud infrastructure.
• Experience with monitoring and observability tools (e.g., Dynatrace, Prometheus, Grafana, ELK stack, Datadog).
• Experience in integrating SRE with backend technologies such as databases and messaging systems.
• Strong understanding of software engineering principles and practices.
• Deep understanding of incident management, root cause analysis, and post-incident review processes.
• Involvement in setting strategic direction for SRE practices, leading technical initiatives, and promoting a culture of excellence in site reliability engineering.
• Excellent problem-solving and communication skills, and the ability to work collaboratively in a fast-paced and dynamic environment.
• Proven ability to lead technical projects, influence cross-functional teams, and drive change.
• Excellent verbal and written communication skills, with the ability to articulate complex technical concepts to both technical and non-technical audiences.
• Certifications in relevant technologies, such as Cloud Certified DevOps Architect or Cloud Operations Support Architect.

Key Responsibilities:

• Architecting Systems: Design and architect highly available, scalable, and resilient systems to meet the demands of our growing user base and evolving business needs.
• Reliability Engineering: Develop and implement strategies to improve system reliability, including incident management, monitoring, and automated remediation.
• Performance Optimization: Identify and address performance bottlenecks, optimize system performance, and ensure efficient resource utilization.
• Collaboration: Partner with development teams, product managers, and other stakeholders to integrate SRE practices into the development lifecycle and ensure alignment with business objectives.
• Automation: Drive automation initiatives to reduce manual intervention, increase efficiency, and improve system reliability.
• Incident Management: Lead post-incident reviews and root cause analysis, and develop strategies for preventing future incidents.
• Best Practices: Establish and enforce best practices for system design, monitoring, and incident management.
• Mentorship: Provide guidance and mentorship to junior SREs and engineering teams on SRE principles and practices.

Qualifications:

• Experience: 8+ years of experience in Site Reliability Engineering, DevOps, or a related field, with at least 3 years in a senior or architect-level role.
• Technical Skills:
  • Programming: Proficiency in one or more programming languages (e.g., Python, Go, Java, or similar).
  • Monitoring Tools: Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack, Datadog).
• Incident Response:
• Leadership: Proven ability to lead technical projects, influence cross-functional teams, and drive change.
• Communication:

Preferred Qualifications:

• Certifications: Relevant certifications (e.g., AWS Certified Solutions Architect, Google Professional Cloud Architect) are a plus.
• Experience: Previous experience in high-growth or high-availability environments.

Send your application to sneha.chhabria@infogain.com

IT Services and IT Consulting
Los Gatos CA +32
