Principal Site Reliability Engineer

14 - 19 years

50 - 55 Lacs

Posted: 4 hours ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Overview

About Business Unit:
At the core of all that Epsilon does is a team that sets the foundation of our IT infrastructure. The team drives innovation and efficiency through disruptive technology across Epsilon's platforms and business verticals. From being the first point of contact for infrastructure needs to final deployment, the team provides end-to-end solutions for our client-facing platforms. ETS supports all aspects of revenue-generating platforms for Epsilon and sets the architectural direction for our enterprise deployments. By embracing the latest technologies, such as Cloud, Automation, and Artificial Intelligence, the team is at the forefront of transforming our digital business and capturing new opportunities.

Why we are looking for you:
- You have experience in building world-class automation with a Site Reliability mindset for cloud and on-premise infrastructure
- You take a shift-left approach and have strong cloud-native experience
- You have solid experience in building products and platforms at scale
- You enjoy new challenges and are solution-oriented in complex infrastructure environments
- You like mentoring people and enabling collaboration of the highest order

What you will enjoy in this role:
- As part of the Epsilon CPTS team, the pace of the work matches the fast-evolving demands of Fortune 500 clients across the globe
- As part of an innovative team that's not afraid to do things differently, your ideas will come to life in building next-gen infrastructure that supports our Fortune 500 global customers
- You will implement a shift-left approach in our infrastructure life cycle practices
- An open and transparent environment that values innovation and efficiency

Responsibilities
- Lead SRE initiatives across a hybrid infrastructure (on-prem + AWS, Azure, GCP)
- Manage and optimize 11,000+ servers across Linux and Windows platforms
- Architect and support scalable, resilient AWS infrastructure (EKS, EC2, S3, RDS, Lambda, etc.)
- Administer Kubernetes clusters at scale; ensure health, upgrades, and secure deployments
- Drive infrastructure automation using Python, Shell, and Infrastructure as Code (Terraform, Ansible, Chef)
- Design and implement AI agents for observability, RCA, and incident triage using modern MLOps/DevOps paradigms
- Collaborate with development, IT Ops, Command Center, cloud, and platform teams to strengthen CI/CD, security posture, and SLA alignment

Qualifications
- BE/B.Tech (no correspondence course)
- 14+ years of experience in Platform/Cloud Engineering, SRE, DevOps
- Strong hands-on coding experience in Go, Python, Shell
- Strong expertise in Cloud, Kubernetes, Linux administration
- Hands-on experience with AWS services and Kubernetes
- Proficiency in IaC tools like Terraform, Crossplane, Ansible
- Extensive experience in delivering an efficient developer experience
- Familiarity with monitoring tools (Zabbix, PagerDuty, Grafana)

Epsilon Data Management

Advertising Services

Irving, Texas
