5.0 - 9.0 years
0 Lacs
Karnataka
On-site
As a Site Reliability Engineer (SRE) on the Network Assurance Data Platform team at Cisco ThousandEyes, your role is crucial in ensuring the reliability, scalability, and security of our cloud and big data platforms. You will be responsible for designing, building, and maintaining systems operating at multi-region scale, collaborating with cross-functional teams to support machine learning (ML) and AI initiatives.
Your primary responsibilities will include designing, building, and optimizing cloud and data infrastructure to guarantee high availability, reliability, and scalability of big-data and ML/AI systems. You will implement Site Reliability Engineering (SRE) principles such as monitoring, alerting, error budgets, and fault analysis. Close collaboration with development, product management, and security teams is essential to create secure, scalable solutions that enhance operational efficiency through automation. Troubleshooting complex technical issues in production environments, conducting root cause analyses, and contributing to continuous improvement efforts will be part of your daily tasks. Furthermore, you will have the opportunity to shape the technical strategy and roadmap of the team, balancing immediate needs with long-term goals. Mentoring peers and fostering a culture of learning and technical excellence will also be a key aspect of your role.
To excel in this position, you should demonstrate the ability to design and implement scalable solutions with a focus on streamlining operations. Strong hands-on experience in cloud technologies, preferably AWS, is required, along with skills in Infrastructure as Code, specifically with Terraform and Kubernetes. Previous experience in AWS cost management and an understanding of Prometheus and its ecosystem, including Alertmanager, are preferred. Proficiency in writing high-quality code in languages like Python, Go, or equivalent is essential. Additionally, a good understanding of Unix/Linux systems, the kernel, system libraries, file systems, and client-server protocols is expected. Experience in building cloud, big data, and/or ML/AI infrastructure such as EMR, Airflow, Comet ML, AWS SageMaker, Spark, etc., would be a bonus.
At Cisco, we value diversity and believe that diverse teams are better equipped to solve problems, innovate, and create a positive impact. We welcome candidates from all backgrounds and encourage you to apply even if you do not meet every single qualification listed. Research shows that individuals from underrepresented groups may doubt the strength of their candidacy, but we believe that everyone has something valuable to offer. Join us in our mission to unlock potential and drive innovation in the digital assurance space.
Posted 3 days ago