Site Reliability Engineering Manager

22 years

Posted: 2 days ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

About the Company


Trianz believes that companies around the world face three challenges in their digital transformation journeys: a shrinking ‘time to transform’ due to competition and AI, a lack of digital-ready talent, and uncertain economic conditions. To help clients leapfrog these challenges, Trianz has built IP and platforms that have transformed the adoption of cloud, data, analytics, insights, and AI. Specifically, the following Trianz platforms are changing the way companies approach transformation in various disciplines:


  • Concierto:

    A fully automated platform to Migrate, Manage, and Maximize multi-cloud and hybrid cloud environments. A zero-code SaaS platform, Concierto allows teams to migrate to AWS, Azure, and GCP and manage them efficiently from a single pane of glass. Visit www.concierto.cloud for more information.
  • Concierto Insights & Agentic AI:

    Built on the concept of ‘federated or distributed data’, Extrica revolutionizes how users access data anywhere in the company’s ecosystems. It productizes data and makes it available in a Netflix-like user experience while delivering BI and AI-powered insights. Visit www.extrica.io for more information.
  • Pulse:

    Recognizing that workforces will be distributed, mobile, and fluid, Trianz has built a ‘future of work’ digital workplace platform called Pulse. Visit www.trianz.com/Pulse.


About the Role


We are seeking a seasoned infrastructure leader to own and evolve our AWS cloud platform—the foundation that powers our business 24/7. In this role, you will lead a high-performing team of Cloud Ops and SRE engineers, driving operational excellence while shaping our cloud architecture strategy and security posture for scale. This role goes beyond operations; you will influence how we build, secure, and run our infrastructure, bridging the gap between reliability, innovation, and security. If you thrive on building resilient systems, mentoring technical teams, and making strategic architecture decisions that impact the entire organization, this is the role for you.


Responsibilities


  • Operational Excellence at Scale:

    • Lead a unified CloudOps/SRE team across L1/L2/L3 support, ensuring seamless 24x7 operations through structured shift rotations and escalation frameworks.
    • Drive incident management excellence, from first response through root cause analysis and continuous improvement.
    • Maintain and exceed operational KPIs: MTTA, MTTR, uptime SLAs, and availability objectives.
    • Oversee day-to-day operations across our AWS footprint: EC2, VPC, ELB/ALB, EKS/ECS, RDS/Aurora, S3, IAM, Lambda, CloudFront, and CloudWatch.
  • Architecture Leadership & Platform Evolution:

    • Provide architectural oversight for production workloads, guiding teams on scalable, cost-optimized, and secure AWS designs.
    • Review and approve architecture patterns, deployment topologies, and infrastructure standards.
    • Partner with Cloud Architects to establish guardrails, reference architectures, and reusable Infrastructure-as-Code modules.
    • Create feedback loops where operational insights directly influence design decisions, ensuring we build for observability, resilience, and efficiency.
    • Champion modernization initiatives: containerization, serverless adoption, and edge optimization strategies.
  • Security Posture & Compliance:

    • Own cloud security governance across IAM, network segmentation, encryption, logging, and compliance.
    • Drive continuous security monitoring using AWS Security Hub, GuardDuty, IAM Access Analyzer, Config, Inspector, and third-party CSPM tools.
    • Ensure automated remediation for vulnerabilities, misconfigurations, and security baseline drift.
    • Maintain compliance with SOC 2, ISO 27001, CIS Benchmarks, and customer-specific security requirements.
    • Lead operational security hygiene: identity lifecycle management, least privilege enforcement, secrets management, and patch compliance.
    • Coordinate cloud security incident response with tight CloudOps-SecOps integration.
  • Automation & Tooling Strategy:

    • Drive automation and tooling adoption across Monitoring & Observability (CloudWatch, Elastic Stack, distributed tracing), Logging & Analytics (CloudWatch Logs, ELK, OpenSearch), ITSM (ServiceNow, Jira Service Management), and IaC & Automation (CloudFormation, Terraform, Python, Shell scripting, GitOps workflows).
    • Build self-healing operations through automated provisioning, scaling, failover, and compliance checking.
  • Governance & Continuous Improvement:

    • Establish and refine operational playbooks, runbooks, SOPs, and change control frameworks.
    • Implement ITIL-aligned processes for change, problem, and incident management.
    • Drive continuous improvement through automation, operational analytics, and team feedback loops.
  • Strategic Partnership & Communication:

    • Collaborate with engineering, architecture, security, DevOps, and product teams to maintain platform reliability.
    • Provide executive-level insights on operational health, incident trends, risks, and improvement opportunities.
    • Influence business continuity planning, cloud cost governance, and the infrastructure roadmap.


Qualifications


  • 17–22 years in infrastructure/operations with at least 8 years leading cloud or production operations teams.
  • Proven track record managing 24x7 support teams of engineers in high-availability AWS environments.
  • Experience scaling teams and operations while maintaining quality and reliability.

Required Skills


  • Deep knowledge of AWS architecture, networking, security, and distributed systems design.
  • Strong understanding of cloud security posture management, identity governance, and compliance

Trianz

Consulting / IT Services

Irvine
