Job Description
Job Title: Site Reliability Engineer
Department: Engineering / Infrastructure
Reports To: SRE Manager / DevOps Lead
Location: Bangalore, India
Role Summary
The Site Reliability Engineer (SRE) will be responsible for ensuring the availability, performance, and scalability of critical systems. This role involves managing CI/CD pipelines, monitoring production environments, automating operations, and driving platform reliability improvements in collaboration with development and infrastructure teams.
Key Responsibilities
Manage alerts and monitoring of critical production systems.
Operate and enhance CI/CD pipelines and improve deployment and rollback strategies.
Work with central platform teams on reliability initiatives.
Automate testing, regression, and build tooling for operational efficiency.
Execute non-functional requirements (NFR) testing on production systems.
Plan and implement Debian version migrations with minimal disruption.
Required Qualifications & Skills
CI/CD and Packaging Tools: Hands-on experience with Jenkins, Docker, and JFrog for packaging and deployment.
Operating System Expertise: Experience in Debian OS migration and upgrade processes.
Monitoring Systems: Knowledge of Grafana, Nagios, and other observability tools.
Configuration Management: Proficiency with Ansible, Puppet, or Chef.
Version Control: Working knowledge of Git and related version control systems.
Kubernetes: Deep understanding of Kubernetes architecture, deployment pipelines, and debugging. Ability to deploy components with detailed insight into configuration parameters and system requirements, monitoring and alerting needs, performance tuning, and design for high availability and fault tolerance.
Networking: Understanding of TCP/IP, UDP, multicast, and broadcast. Experience with tcpdump and Wireshark for network diagnostics.
Linux & Databases: Strong skills in Linux tools and scripting. Familiarity with MySQL and NoSQL database systems.
Soft Skills
Strong problem-solving and analytical skills
Effective communication and collaboration with cross-functional teams
Ownership mindset and accountability
Adaptability to fast-paced and dynamic environments
Detail-oriented and proactive approach
Preferred Qualifications
Bachelor's degree in Computer Science, Engineering, or a related technical field
Certifications in Kubernetes (CKA/CKAD), Linux, or DevOps practices
Experience with cloud platforms (AWS, GCP, Azure)
Exposure to service mesh, observability stacks, or SRE toolkits
Key Relationships
Internal: DevOps, Infrastructure, Software Development, QA, Security Teams
External: Tool vendors, platform service providers (if applicable)
Role Dimensions
Impact on uptime and reliability of business-critical services
Ownership of CI/CD and production deployment processes
Contributor to cross-team reliability and scalability initiatives
Success Measures (KPIs)
System uptime and availability (SLA adherence)
Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) for incidents
Deployment success rate and rollback frequency
Automation coverage of operational tasks
Completion of OS migration and infrastructure upgrade projects
Competency Framework Alignment
Technical Mastery: Infrastructure, automation, CI/CD, Kubernetes, monitoring
Execution Excellence: Timely project delivery, process improvements
Collaboration: Cross-functional team engagement and support
Resilience: Problem-solving under pressure and incident response
Innovation: Continuous improvement of operational reliability and performance