Cloud Reliability Engineer II

ThoughtSpot

2 - 5 years

11 - 15 Lacs

Bengaluru

Posted: 6 days ago

Skills Required

change management, automation, operational excellence, problem management, incident management, continuous improvement, operations, information technology, analytics

Work Mode

Work from Office

Job Type

Full Time

Job Description

We are seeking a dedicated Cloud Reliability Engineer to champion the reliability, availability, and security of our production SaaS platform. In this role, you will act as the first line of defense for cloud infrastructure, balancing your time between core day-to-day production operations (incident management, change management, monitoring, and triage) and automation that reduces operational toil. You will play a pivotal role in maintaining customer trust by strictly adhering to SLAs and compliance processes while driving continuous improvement through code.

What You'll Do:

Operational Excellence & Incident Management

Monitoring & Triage: Proactively monitor cloud infrastructure health to ensure high availability and performance. Act as the primary owner for production alert monitoring, triage, and swift resolution.
Incident Response: Manage critical incidents and escalations from identification to resolution. Lead root cause analysis (RCA) and post-incident reviews to minimize Mean Time To Recovery (MTTR) and prevent recurrence.
Change & Release Management: Execute and track production upgrades, multi-tenant deployments, and change requests within defined SLAs, ensuring zero-downtime maintenance where possible.
Escalation Support: Handle escalated support cases and provide infrastructure support for field teams and other environments.
24/7 Availability: Participate in a shift-based schedule and on-call rotation to provide round-the-clock support for critical production systems.

Automation & Continuous Improvement

Task Automation: Utilize Python and Jenkins to script and automate repetitive operational tasks, reducing manual intervention and increasing efficiency.
Tooling Optimization: Assist in maintaining and optimizing monitoring, alerting, and CI/CD tools to streamline workflows.
Process Evolution: Identify opportunities to shift left on operations, transforming manual runbooks into automated self-healing mechanisms over time.

What You Bring:

2-5 years of professional experience in Cloud Operations, Site Reliability Engineering (SRE), or K8s administration.
Hands-on experience with public cloud platforms (AWS, GCP, or Azure) in a production environment.
Operational knowledge of Kubernetes (EKS, GKE, or AKS), including troubleshooting and cluster management.
Moderate proficiency in scripting and automation, specifically using Python and Jenkins.
Strong understanding of ITIL processes (Incident, Change, and Problem Management).
Demonstrated ability to prioritize tasks under pressure while maintaining strict SLAs.
Excellent collaboration skills to work effectively with Engineering, Product, and Support teams.
Bachelor's degree in Computer Science, Information Technology, or equivalent work experience.

Preferred Skills:

Experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation.
Familiarity with cloud-native observability tools (e.g., CloudWatch, Stackdriver, Prometheus, Grafana).
Strong Linux system administration and networking troubleshooting skills.
Background in supporting enterprise-grade SaaS platforms with strict compliance and security requirements.

Working Conditions:

Shift-Based Role: This position requires working in defined shifts to ensure global coverage.
On-Call: Regular participation in an on-call rotation is required.
Environment: Fast-paced, collaborative, and process-oriented environment with a strong focus on production stability.

ThoughtSpot

Software Development

Mountain View, California
