Platform Site Reliability Engineer

Net Connect

6 - 9 years

12 - 19 Lacs

Chennai, Bengaluru, Mumbai (All Areas)

Skills Required

Terraform, SRE, Site Reliability Engineering, AWS, Kubernetes, Jenkins, CI/CD, DevOps

Work Mode

Work from Office

Job Type

Full Time

Job Description

Location: Chennai, Bengaluru, Mumbai (All Areas)

Experience: 6 - 9 years

CTC: 12 - 19 Lacs

Notice Period:

Role Overview

Platform Site Reliability Engineer (SRE)

SLA/SLO targets

Key Roles & Responsibilities

Reliability Engineering & Availability

Own and drive SLIs, SLOs, SLAs, and error budgets for platform services.
Balance feature velocity with system reliability using error budget frameworks.
Design and implement high-availability architectures with redundancy, failover, and disaster recovery.
Lead initiatives to achieve and sustain 99.9%+ uptime for critical systems.
Perform capacity planning, forecasting, and scalability assessments.
Conduct chaos engineering experiments to proactively identify system weaknesses.

Infrastructure Automation & Platform Engineering

Build and maintain Infrastructure-as-Code (IaC) using Terraform (preferred), CloudFormation, Pulumi, or Ansible.
Develop automation and tooling using Python, Go, Bash, or similar languages to eliminate manual toil.
Implement self-healing systems and automated remediation workflows.
Design and maintain internal platform services and developer tools.
Implement GitOps workflows and declarative infrastructure management.
Automate infrastructure provisioning, configuration, and deployments.

Cloud & Kubernetes Engineering

Design and operate infrastructure across AWS, Azure, and GCP environments.
Build and manage Kubernetes clusters (EKS, AKS, GKE, self-managed).
Implement container orchestration best practices including HPA, VPA, cluster autoscaler.
Design and operate service mesh solutions (Istio, Linkerd, Consul).
Optimize container networking, security, storage, and scheduling.
Manage container registries (ECR, ACR, GCR, Harbor).

CI/CD & Release Engineering

Build and maintain CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, Azure DevOps, or CircleCI.
Implement deployment strategies such as blue-green, canary, and rolling deployments.
Manage artifact repositories (Nexus, Artifactory).
Implement deployment gates, approval workflows, and rollback mechanisms.
Integrate security scanning and secrets management into pipelines.

Incident Management & Operational Excellence

Participate in 24/7 on-call rotations for production systems.
Lead incident response, triage, and resolution during outages.
Conduct blameless postmortems and drive actionable improvements.
Track and improve MTTD and MTTR metrics.
Build and maintain runbooks, playbooks, and incident automation.

Observability & Performance Optimization

Deploy and manage monitoring, logging, and alerting platforms.
Design dashboards for system health, performance, and SLO tracking.
Configure intelligent alerts to minimize noise and alert fatigue.
Perform performance tuning across applications, databases, and infrastructure.
Implement caching strategies (Redis, Memcached, CDN) to improve performance.
Optimize infrastructure for cost efficiency and performance balance.

Security & Compliance

Implement cloud and infrastructure security best practices (least privilege, zero trust).
Manage secrets using Vault, AWS Secrets Manager, Azure Key Vault.
Implement network security (VPCs, firewalls, security groups, network policies).
Ensure compliance with SOC 2, ISO 27001, PCI-DSS, HIPAA, and internal standards.
Integrate security scanning and vulnerability management into CI/CD pipelines.

Collaboration & Leadership

Partner with development teams to improve service reliability.
Participate in architecture and design reviews.
Mentor junior engineers and promote SRE best practices.
Create and maintain documentation, diagrams, and knowledge bases.
Lead cross-functional reliability and platform improvement initiatives.

Required Skills & Experience

Core Technical Skills

Strong proficiency in Python, Go, Bash, Ruby, or Java.
Experience building production-grade automation and tooling.
Deep experience with AWS (EC2, EKS, RDS, VPC, IAM, CloudWatch).
Working knowledge of Azure and GCP infrastructure services.
Strong expertise in Kubernetes and Docker containerization.
Hands-on experience with Terraform (highly preferred).
Experience with configuration management tools (Ansible, Chef, Puppet).
Strong CI/CD experience with modern pipeline tools.

Systems & Networking

Solid understanding of distributed systems and microservices.
Knowledge of TCP/IP, DNS, load balancing, CDNs, and networking concepts.
Experience with relational and NoSQL databases.
Strong observability experience with Prometheus, Grafana, ELK, Datadog, or New Relic.

Professional Experience

6-9 years in SRE, Platform Engineering, Infrastructure, or DevOps roles.
3+ years managing high-availability production systems.
Proven track record of automation-driven reliability improvements.
Experience operating infrastructure at scale.

Soft Skills & Mindset

Strong problem-solving and analytical skills.
Ownership-driven mindset with accountability for system reliability.
Ability to remain calm and effective during high-severity incidents.
Excellent communication and collaboration skills.
Strong commitment to blameless culture and continuous improvement.

Certifications (Preferred)

AWS Certified Solutions Architect / DevOps Engineer
Azure Solutions Architect / DevOps Engineer
Google Cloud Professional Cloud Architect
Certified Kubernetes Administrator (CKA) / Certified Kubernetes Application Developer (CKAD)
HashiCorp Terraform Associate

Nice-to-Have Skills

Chaos engineering (Chaos Monkey, Gremlin, LitmusChaos).
FinOps and cloud cost optimization.
GitOps tools (ArgoCD, Flux).
Progressive delivery tools (Spinnaker, Flagger).
Serverless platforms and architectures.
Contributions to open-source SRE or infrastructure projects.

Education

Bachelor’s degree in Computer Science, Engineering, or a related field.
