Senior Site Reliability Engineer, Network Assurance Data Platform

Cisco ThousandEyes

5 - 9 years

0 Lacs

Karnataka

Posted: 4 months ago | Platform: Shine


Skills Required

AWS, Kubernetes, Python, Go, Terraform, Prometheus, Unix/Linux systems

Work Mode

On-site

Job Type

Full Time

Job Description

As a Senior Site Reliability Engineer (SRE) on the Network Assurance Data Platform team at Cisco ThousandEyes, you will be responsible for the reliability, scalability, and security of our cloud and big data platforms. Working with cross-functional teams, including software development, product management, and security, you will design, build, and maintain systems that operate at multi-region scale. Your work will directly support the success of our machine learning (ML) and AI initiatives by ensuring that the underlying infrastructure is robust, efficient, and aligned with operational excellence.

Your main responsibilities will include designing, building, and optimizing cloud and data infrastructure to ensure high availability, reliability, and scalability of big data and ML/AI systems. You will apply Site Reliability Engineering principles such as monitoring, alerting, error budgets, and fault analysis. Together with development, product management, and security teams, you will develop secure, scalable solutions that support ML/AI workloads and improve operational efficiency through automation. You will also troubleshoot complex technical issues in production environments, perform root cause analyses, and contribute to continuous improvement efforts. Finally, you will help shape the team's technical strategy and roadmap, balancing immediate needs with long-term goals, while mentoring peers and fostering a culture of learning and technical excellence.

Qualifications for this role include the ability to design and implement scalable, well-tested solutions with a focus on streamlining operations. Strong hands-on experience with cloud services (preferably AWS) and Infrastructure as Code skills (ideally with Terraform and Kubernetes) are required. Previous experience in AWS cost management, an understanding of Prometheus and its ecosystem, and the ability to write high-quality code in Python, Go, or an equivalent language are essential. A good understanding of Unix/Linux systems, including the kernel, system libraries, file systems, and client-server protocols, is expected. Experience building cloud, big data, and/or ML/AI infrastructure (e.g., EMR, Airflow, Comet ML, AWS SageMaker, Spark) is a bonus.

Cisco values diversity in its employees and believes that diverse teams are better equipped to solve problems, innovate, and create a positive impact. The company encourages candidates from all backgrounds to apply, even if they do not meet every listed qualification. Research shows that individuals from underrepresented groups may experience imposter syndrome and doubt the strength of their candidacy. Cisco aims to unlock the potential in all candidates and emphasizes that everyone has something valuable to offer.

More Jobs at Cisco ThousandEyes

ThousandEyes - Scale Specialist (India)

Karnataka

3 - 7 years

Salary: Not disclosed

Site Reliability Engineering Technical Leader

Karnataka

8 - 12 years

Salary: Not disclosed

Site Reliability Engineering Technical Leader, Network Assurance Data Platform

Karnataka

8 - 12 years

Salary: Not disclosed

Site Reliability Engineer, Network Assurance Data Platform

Karnataka

5 - 9 years

Salary: Not disclosed