Site Reliability Engineer (SRE)

3 - 7 years

0 Lacs

Posted: 4 days ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

Role Overview:

At GlobalLogic, you will work as a Site Reliability Engineer (SRE) focusing on Platform Reliability & Operational Excellence. The role involves scaling a mission-critical safety and automation platform from a monolith to distributed, event-driven, microservice-based systems. Reliability, latency, and operational efficiency are essential to the platform, and you will play a key role in ensuring availability, performance, scalability, and production readiness across services.

Key Responsibilities:

- Define, implement, and iterate on SLIs/SLOs (latency, availability, errors, saturation); operationalize error budgets and trigger corrective action.
- Engineer end-to-end observability (metrics, logs, traces, events) leveraging Datadog for faster detection and root cause analysis.
- Automate infrastructure using Terraform, deployment workflows, self-healing mechanisms, and progressive delivery (canary/blue-green).
- Lead the incident lifecycle, including detection, triage, mitigation, coordination, communication, and high-quality post-incident reviews.
- Build and optimize CI/CD pipelines (GitHub Actions or equivalent) with reliability, rollback safety, and change quality controls.
- Perform capacity and performance engineering such as load modeling, autoscaling policies, and cost/efficiency tuning.
- Reduce toil through tooling, runbooks, proactive failure analysis, and chaos/fault injection.
- Partner with development teams on architectural reviews, production readiness, and security.
- Enforce least-privilege, secrets management, and infrastructure security practices.
- Improve alert quality to lower MTTR and alert fatigue, and champion reliability patterns like backpressure and circuit breaking.
- Support distributed systems debugging with emphasis on AI and contribute to governance of change management and release safety.
- Document playbooks, escalation paths, and evolving reliability standards, treating reliability as a product with a roadmap and continuous improvement.

Preferred Qualifications:

- 3+ years in SRE/Production Engineering/DevOps.
- Proficiency in one or more programming languages: Go, Python, TypeScript/Node.js, or Ruby.
- Strong understanding of Linux internals, networking fundamentals, Infrastructure as Code (Terraform), GitOps workflows, containers and orchestration, and observability tools.
- Experience in CI/CD design, incident management, distributed systems failure modes, chaos/fault injection, and performance/load testing.
- A degree in Computer Science, Engineering, or equivalent.

Additional Details:

At GlobalLogic, you will experience a culture of caring, continuous learning and development, interesting and meaningful work, balance and flexibility, and a high-trust organization. The company is known for engineering impact and collaboration with clients worldwide to create innovative digital products and experiences.

GlobalLogic

Software Development and Technology Consulting

New York
