Posted: 1 week ago
Work from Office
Full Time
Nexthink is looking for a Lead Site Reliability Engineer who is passionate about building and running a high-performance cloud platform and enabling best-in-class site reliability and operations practices. This role will support Nexthink operations globally. The candidate will drive the development of modern, cloud-native SRE processes and the management and operations for Nexthink's multi-tenant, microservices-based cloud platform. The platform has multiple instances deployed across the globe. This role involves working closely with cross-functional teams to integrate reliability and security into our systems, ensuring they meet standards. The ideal candidate will have extensive experience in both software engineering and systems administration, with a strong understanding of SRE concepts, requirements and security practices.

Leadership and Team Management: Lead, mentor, and develop a team of India-based Site Reliability Engineers. Foster a culture of continuous improvement, collaboration, and innovation.

Infrastructure Management: Oversee the design, deployment, and management of scalable and secure cloud infrastructure. Drive automation of infrastructure provisioning, configuration, and management using Infrastructure as Code (IaC) tools.

Monitoring and Performance: Develop and maintain comprehensive monitoring, logging, and alerting systems to ensure high availability and performance. Lead efforts in performance tuning and optimization for applications and infrastructure.

Security and Compliance: Ensure implementation and maintenance of security controls and best practices to achieve compliance with standards and certifications. Conduct and oversee regular security assessments, vulnerability scans, and penetration testing. Collaborate with the compliance team to prepare for and respond to audits.

Incident Management: Lead incident management efforts, ensuring rapid resolution and thorough root cause analysis. Develop and implement strategies for improving incident response and minimizing downtime.

Collaboration and Communication: Work closely with development, operations, and security teams to integrate reliability and security into the software development lifecycle. Communicate effectively with stakeholders, providing regular updates on system performance, reliability, and compliance status.

Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience). 5+ years of experience in site reliability engineering, DevOps, or a related role, with at least 2 years in a leadership position.
Bengaluru