SRE Administrator

Taggd

8 - 12 years

0 Lacs

Bangalore

Posted: 1 day ago | Platform: Shine

Skills Required

Coding language, ITSM, ITIL, incident management, monitoring tools, container orchestration, cloud platforms, infrastructure as code, AI/ML use cases, CI/CD

Work Mode

On-site

Job Type

Full Time

Job Description

Job Title

Operations Engineer / Site Reliability Engineer (SRE)

Job Summary

We are looking for a dedicated and skilled Operations Engineer (SRE) to ensure the reliability, scalability, and performance of our enterprise systems and applications. In this hybrid role, you will blend software engineering and IT operations to build automated solutions for operational challenges, improve system health, minimize manual effort, and support continuous delivery. You will play a key role in monitoring, maintaining, and improving production infrastructure, and ensuring stable, high-quality service delivery.

Key Responsibilities

System Reliability & Infrastructure Stability
- Ensure high availability, performance, and stability of applications and infrastructure: servers, services, databases, network and other core components.
- Design, build, and maintain fault-tolerant, highly-available, and scalable infrastructure.
- Define, implement, and monitor Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs) to measure reliability, performance, latency, error rates, uptime, etc.
Monitoring, Alerting & Observability
- Implement and maintain robust monitoring, logging and alerting systems for infrastructure and applications to proactively detect issues before they impact users.
- Build dashboards and observability tooling to track system health metrics (latency, error rates, resource usage, throughput, etc.).
- Set alert thresholds and alerting workflows for critical infrastructure components and services.
Incident Management & Response
- Lead incident response for system outages or performance degradation: triage issues, coordinate with relevant teams, mitigate impact, restore service.
- Perform root-cause analysis (RCA) and post-incident reviews to understand failures and identify permanent fixes/preventive measures.
- Maintain incident runbooks, playbooks and documentation to support consistent and efficient incident handling.
Automation & Toil Reduction
- Automate routine operational tasks (deployment, configuration, infrastructure provisioning, scaling, backups, recovery, etc.) to minimize manual intervention and reduce errors.
- Develop and maintain Infrastructure-as-Code (IaC), configuration management, and automated deployment/CI-CD pipelines.
- Build internal tools or scripts to streamline operations, monitoring, alerting, deployments, and recovery.
Performance Optimization & Capacity Planning
- Monitor system performance, resource utilization, load, and growth trends to plan capacity and scaling requirements proactively.
- Optimize infrastructure, services, and configurations for performance, cost-efficiency, fault tolerance, and scalability.
- Collaborate with development teams to design and deploy systems with reliability and scalability in mind.
Collaboration & DevOps Integration
- Work closely with development, QA, and product teams to support deployments, ensure operability of applications, and incorporate reliability practices into the development lifecycle.
- Provide feedback on system design, performance, and operational best practices to help build reliable, maintainable systems.
- Contribute to documentation: system architecture, runbooks, troubleshooting guides, and standard operating procedures (SOPs).
Security, Compliance & Disaster Recovery
- Ensure infrastructure security and compliance, and follow best practices in configuration, access control, backups, and disaster-recovery planning.
- Plan and test disaster recovery and backup strategies to ensure business continuity.

Qualifications & Skills

- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field (or equivalent experience).
- Proven experience in SRE, system operations, infrastructure engineering, or related roles managing production-grade systems.
- Strong scripting/programming skills (e.g., Python, Bash, Go) to build automation tools and operational scripts.
- Experience with cloud platforms (AWS, GCP, Azure) or on-prem infrastructure; familiarity with containerization/orchestration (e.g., Docker, Kubernetes) is a plus.
- Familiarity with monitoring and observability tools: logging, metrics, dashboards, and alerting frameworks.
- Strong understanding of Linux/Unix systems, networking, load balancing, redundancy, failover, and system architecture.
- Good problem-solving, troubleshooting, and root-cause analysis skills, with the ability to diagnose, mitigate, and resolve critical production issues.
- Experience or comfort with CI/CD pipelines, Infrastructure-as-Code (IaC), configuration management, and automated deployments.
- Excellent collaboration and communication skills, with the ability to work across teams (development, QA, operations) and coordinate under pressure.
- Proactive mindset and commitment to reliability, operational excellence, automation, and continuous improvement.

