Senior Site Reliability Engineer

Arcadia

10 years

0 Lacs

Chennai, Tamil Nadu, India

Posted: 7 hours ago

Skills Required

reliability technology software data grid saas learning engineering scaling kubernetes efficiency security devops drive automation automate aws design vpc iam terraform tuning troubleshooting helm packaging gitops jenkins scripting github python datadog metrics budgeting sizing database mysql postgresql replication vault audit monitoring networking development ai debugging architecture analysis code management zones communication workflow devsecops governance certifications compensation model recognition empathy teamwork diversity sponsorship

Work Mode

Remote

Job Type

Full Time

Job Description

Senior Site Reliability Engineer

Who we are

Arcadia is the technology company empowering energy innovators and consumers to fight the climate crisis. Our software and APIs are revolutionizing an industry held back by outdated systems and institutions by creating unprecedented access to the data and clean energy needed to make a decarbonized energy grid possible.

In 2014, Arcadia set out on its mission to break the fossil fuel monopoly and since then we have been knocking down the institutional barriers to unlock decarbonization. To date, we have connected hundreds of thousands of consumers and small businesses with high-quality clean energy options. Fast forward to today, and now, we’re thinking even bigger. We have launched Arcadia Platform, an industry-defining SaaS platform that empowers developers and energy innovators to deliver their own custom, personalized energy experiences, accelerating the transformation of the industry from an analog energy system into a digitized information network.

Tackling one of the world’s biggest challenges requires out-of-the-box thinking & diverse perspectives. We’re building a team of individuals from different backgrounds, industries, & educational experiences. If you share our passion for ushering in the era of the clean electron, we look forward to learning what you would uniquely bring to Arcadia! Visit www.arcadia.com.

HQ: Greenwood Village, Colorado

What we're looking for:

Senior Site Reliability Engineer (L3)

The ideal candidate is a self-starter and hands-on engineer who can dive deep into complex distributed systems, automate away manual processes, and proactively identify reliability gaps. They should have a proven track record of managing production-grade AWS infrastructure, Kubernetes clusters, CI/CD pipelines, and cloud security. They will collaborate daily with US-based engineering teams and cross-functional partners to ensure our platform remains scalable, secure, and cost-optimized as we continue to grow.

What you'll do:

Design, build, and maintain AWS infrastructure (EKS, VPC, RDS, IAM, CloudWatch, CloudTrail, GuardDuty, Load Balancers, S3, CloudFront) using Terraform and CloudFormation
Lead all aspects of Kubernetes operations, including cluster upgrades, performance tuning, CNI troubleshooting, workload scaling, Helm chart packaging, and GitOps deployments
Own and evolve our CI/CD ecosystem across Jenkins (Groovy scripting), GitHub Actions, AWS CodePipeline, ArgoCD, and FluxCD
Improve platform reliability by reducing operational toil through automation, scripting (Python/Bash), and proactive system hardening
Implement and enhance observability across Prometheus, Grafana, Loki, Tempo, Datadog, and CloudWatch, ensuring actionable alerting, dashboards, and metrics alignment with SLOs/SLIs
Drive FinOps initiatives, identifying cost inefficiencies and working with engineering teams to implement best practices, tagging standards, budgeting, and resource right-sizing
Manage database operations across MySQL and PostgreSQL, including backups, performance tuning, replication, and operational runbooks
Maintain and improve secret management using Vault, AWS Secrets Manager, and Parameter Store
Strengthen cloud security posture with IAM least privilege, CSPM reviews, audit readiness, GuardDuty/CloudTrail monitoring, and environment hardening
Troubleshoot complex production issues across networking, Kubernetes, compute, databases, and CI/CD systems
Collaborate daily with US-based teams for incident reviews, migrations, roadmap work, and platform enhancements
Contribute to development and adoption of AI-enabled tooling (e.g., automation, debugging assistants, MCP, RAG pipelines; good to have, not mandatory)
Document runbooks, architecture diagrams, SOPs, troubleshooting guides, and operational best practices
Participate in on-call rotations (if required) and drive post-incident analysis and long-term fixes

What will help you succeed:

Must-haves:

Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
8–10+ years of experience in SRE/DevOps/Cloud Engineering, with deep hands-on exposure to AWS and Kubernetes
Strong hands-on experience with:

Terraform & Infrastructure as Code
AWS core services (EKS, IAM, RDS, EC2, VPC, CloudWatch, CloudTrail, GuardDuty)
Jenkins + Groovy, GitHub Actions, ArgoCD, FluxCD
Kubernetes troubleshooting and operations
Prometheus/Grafana/Datadog observability stacks

Proven ability to operate in high-scale, high-uptime, multi-environment production systems
Experience building automation via Python/Bash and reducing operational toil
Strong understanding of incident management, root cause analysis, and reliability engineering principles
Experience working with globally distributed teams across multiple time zones
Excellent communication skills (must interact with US teams daily)
Ability to work independently with minimal supervision, take ownership, and drive initiatives end-to-end
A growth mindset, strong troubleshooting ability, and comfort with complex cloud-native environments

Nice-to-haves:

Experience with n8n (self-hosted) and workflow automation platforms
Exposure to LLMs, RAG, vector DBs, and MCP concepts
Experience with cloud security/DevSecOps tools (Trivy, Inspector, OPA, Kyverno)
Hands-on experience with FinOps platforms and cloud cost governance
Certifications in a related field (AWS, Kubernetes, Terraform, etc.)

Benefits

Competitive compensation and employee stock options
Hybrid/remote-first working model (India-based role, with global collaboration)
Flexible leave policy
Comprehensive medical insurance (self + family members)
Annual performance cycle + quarterly recognition awards
A supportive, diverse engineering culture grounded in empathy, teamwork, and innovation

Eliminating carbon footprints, eliminating carbon copies.

Here at Arcadia, we cultivate diversity, celebrate individuality, and believe unique perspectives are key to our collective success in creating a clean energy future. Arcadia is committed to equal employment opportunities regardless of race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, disability, genetic information, protected veteran status, or any status protected by applicable federal, state, or local law. While we are currently unable to consider candidates who will require visa sponsorship, we welcome applications from all qualified candidates eligible to work in India.

Thank you

More Jobs at Arcadia

Software Engineer II - Salesforce

Chennai, Tamil Nadu, India

Experience: Not specified

Salary: Not disclosed

Software Engineer II - Python (RUDE squad)

Chennai, Tamil Nadu, India

3.0 - 3.0 yrs

Salary: Not disclosed

Analytics Engineer III

Chennai, Tamil Nadu, India

3.0 - 3.0 yrs

Salary: Not disclosed

Senior Data Analyst - Revenue Ops

Chennai, Tamil Nadu, India

5.0 - 5.0 yrs

Salary: Not disclosed

Software Engineer II - Connector Engineering

Chennai, Tamil Nadu, India

Experience: Not specified

Salary: Not disclosed
