Site Reliability Engineering

7 - 12 years

15 - 30 Lacs

Bengaluru

Posted: 6 days ago | Platform: Naukri

Work Mode

Hybrid

Job Type

Full Time

Job Description

Required Skills:

  • Extensive experience in cloud architecture and management with AWS (including services such as EC2, S3, RDS, Lambda, CloudFormation)
  • Strong expertise in Site Reliability Engineering (SRE) principles, including automation, observability, and incident management
  • Proficiency in scripting and automation using Python, Shell scripting, or similar tools
  • Experience with infrastructure-as-code (IaC) tools such as CloudFormation, Terraform, or similar
  • Knowledge of containerization (Docker) and orchestration (Kubernetes)
  • Familiarity with monitoring, logging, and alerting tools such as CloudWatch, Prometheus, Grafana, ELK Stack, or Splunk
  • Strong understanding of system security best practices, threat detection, vulnerability management, and compliance standards

Preferred Skills:

  • Experience supporting automation/configuration management tools such as Ansible, Chef, or Puppet
  • Knowledge of CI/CD pipelines using Jenkins, GitLab CI, or Azure DevOps
  • Experience with microservices architecture and cloud-native design patterns
  • Familiarity with compliance standards such as ISO, SOC2, or GDPR

Overall Responsibilities

  • Lead the end-to-end management of enterprise systems environments, ensuring high availability, scalability, and security
  • Architect and implement cloud-based solutions, leveraging AWS services and best practices in cloud security and cost optimization
  • Drive automation initiatives to improve operational efficiency, incident response, and system reliability
  • Manage system health, conduct proactive monitoring, and perform capacity planning and upgrades
  • Oversee incident response, root cause analysis, and problem resolution to ensure continuous service delivery
  • Develop and implement security controls, vulnerability assessments, and compliance procedures
  • Mentor and develop technical teams, sharing knowledge on cloud technologies, SRE practices, and automation strategies
  • Collaborate with business and technology teams to plan future system enhancements and migrations
  • Maintain comprehensive documentation of architecture, configurations, runbooks, and operational procedures
  • Lead service continuity testing, penetration testing, and vulnerability management programs to meet regulatory and security standards

Technical Skills (By Category)

Cloud Architecture & Services:

  • Required: Deep expertise in AWS core services (EC2, S3, RDS, Lambda, CloudFormation)
  • Preferred: Multi-cloud experience (Azure, GCP), serverless architectures, and advanced cloud security implementation

SRE & Automation:

  • Required: Automation of deployment, scaling, and incident response processes
  • Preferred: Monitoring with Prometheus, Grafana, ELK Stack, or Splunk; scripting using Python and Shell

Containerization & Orchestration:

  • Required: Docker containerization; Kubernetes for orchestration and managing microservices
  • Preferred: Helm charts, service mesh tools like Istio, and advanced deployment strategies

Security & Compliance:

  • Required: Implementation of security best practices, vulnerability management, and threat detection
  • Preferred: Experience with security audits, compliance frameworks, and encryption standards

Experience Requirements

  • 10-12 years of proven experience in cloud infrastructure, site reliability, or enterprise systems management
  • Extensive experience designing, deploying, and managing AWS cloud architectures at scale
  • Strong background in SRE principles, automation, and incident management
  • Demonstrated leadership in managing cross-functional teams and guiding best practices in cloud operations
  • Experience with compliance, security, and vulnerability management (ISO, SOC2, and GDPR practices)
  • Industry domain background in finance, banking, or fintech is highly beneficial but not mandatory

Day-to-Day Activities

  • Architect, deploy, and manage cloud-based enterprise systems ensuring their high availability and resilience
  • Automate system provisioning, scaling, and incident responses to improve SLAs and reduce manual intervention
  • Monitor system health metrics, conduct capacity planning, and optimize resource utilization
  • Lead root cause analysis efforts, manage incident responses, and implement preventive measures
  • Conduct vulnerability assessments, coordinate penetration testing, and implement security controls
  • Collaborate with development teams to incorporate security and reliability best practices into deployment pipelines
  • Lead service continuity testing, disaster recovery planning, and compliance audits
  • Develop and maintain operational documentation, runbooks, and automation scripts
  • Mentor technical staff, promote a culture of continuous improvement, and share knowledge industry-wide

Qualifications

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field
  • AWS certifications (e.g., AWS Solutions Architect, DevOps Engineer) and security certifications (e.g., CISSP, CISA) are preferred
  • Extensive hands-on experience in cloud architecture, SRE practices, automation, and security for large enterprise systems

Professional Competencies

  • Strong analytical and problem-solving skills
  • Excellent leadership and mentorship abilities
  • Effective communication with technical and non-technical stakeholders
  • Strategic thinking with a focus on operational excellence, security, and cost efficiency
  • Adaptability to evolving technology stacks, industry standards, and regulatory requirements
  • Proactive and solution-oriented mindset, with a focus on continuous improvement

Synechron

Information Technology and Services

New York
