Senior Site Reliability Engineer

6 - 10 years

30.0 - 33.0 Lacs P.A.

Chennai, Pune, Delhi, Mumbai, Bengaluru, Hyderabad, Kolkata

Posted: 2 months ago | Platform: Naukri

Skills Required

Patch management, Automation, Linux, Disaster recovery, Network security, Oracle, Troubleshooting, Release management, Monitoring, Identity management

Work Mode

Work from Office

Job Type

Full Time

Job Description

1. Cloud Infrastructure Management
Design, deploy, and maintain highly available, scalable, and secure cloud environments in Oracle Cloud Infrastructure (OCI) and AWS. Optimize cloud infrastructure for performance, cost, and security. Manage multi-cloud and hybrid cloud architectures, ensuring seamless integration.

2. Release Management & Automation
Oversee and streamline software release processes, ensuring minimal downtime. Develop and manage CI/CD pipelines for efficient code deployment and rollback. Automate infrastructure provisioning using Infrastructure-as-Code (IaC) tools (Terraform, Ansible, CloudFormation).

3. Incident Response & Root Cause Analysis (RCA)
Serve as the L3 escalation point for complex cloud infrastructure issues. Troubleshoot in real time and resolve critical system outages. Perform RCA and implement long-term fixes to prevent recurrence.

4. Monitoring, Observability & Performance Optimization
Implement and manage monitoring, logging, and alerting tools (Prometheus, Grafana, Splunk, ELK). Optimize cloud performance through proactive capacity planning and tuning. Track SLIs and ensure SLAs and SLOs are met by continuously improving system reliability.

5. Security, Compliance & Best Practices
Implement cloud security best practices, identity management, and network security controls. Ensure compliance with industry standards (SOC 2, ISO 27001, GDPR, HIPAA). Perform regular security audits, vulnerability assessments, and patch management.

6. Disaster Recovery & Business Continuity
Design and maintain disaster recovery (DR) and backup strategies. Conduct regular failover tests and drills to validate system resilience.

7. Collaboration & Knowledge Sharing
Work closely with development, DevOps, and security teams to improve system reliability. Document operational procedures, RCA reports, and best practices. Mentor and guide junior engineers on cloud technologies and SRE best practices.

8. Continuous Improvement & Innovation
Evaluate and implement emerging cloud technologies to enhance reliability. Identify opportunities for automation and operational efficiency improvements.

Information Technology
Redwood City