Site Reliability Engineer

2 years

25 - 30 Lacs

Thiruvananthapuram, Kerala, India

Posted: 21 hours ago | Platform: LinkedIn

Work Mode: Remote

Job Type: Full Time

Job Description

Experience: 2+ years

Salary: INR 2,500,000-3,000,000 per year (based on experience)

Expected Notice Period: 30 days

Shift: IST (GMT+05:30, Asia/Kolkata)

Opportunity Type: Remote

Placement Type: Full-time permanent position (payroll and compliance managed by Lyzr)

(Note: This is a requirement for one of Uplers' clients, Lyzr.)

What do you need for this opportunity?

Must-have skills: FinOps, AWS, Python, PowerShell, Bash, DevOps, SRE, System Engineering

Lyzr is looking for:

At Lyzr AI, this role sits at the heart of platform reliability and scale. You will own the availability, security, and performance of the mission-critical AI systems that power our customers' workloads, ensuring they run flawlessly at all times. As the final escalation point, you will blend deep technical expertise with SRE principles to build resilient, automated, and cost-efficient cloud infrastructure.

Roles & Responsibilities

System Ownership & Reliability

  • End-to-End Ownership: Own the health and lifecycle of production systems, ensuring high availability (HA) and meeting strict Service Level Objectives (SLOs).
  • Deep-Dive Debugging: Troubleshoot and resolve complex issues across infrastructure, application code, and networking layers. You will be the escalation point for hard-to-solve production incidents.
  • Incident Management: Lead Root Cause Analysis (RCA) processes for outages, driving permanent fixes and architectural changes to prevent recurrence.

Operational Excellence & Security

  • Disaster Recovery (DR): Design and manage DR strategies; conduct periodic failover drills to ensure business continuity.
  • Security & Compliance: Oversee OS patching, vulnerability scanning, and adherence to industry compliance standards (SOC 2/HIPAA/ISO). Maintain strict IAM policies and security groups.
  • Observability: Build and maintain comprehensive monitoring, logging, and alerting frameworks (CloudWatch, Prometheus, Datadog) to ensure early detection of anomalies.
  • Maintenance: Define and maintain backup/restore processes and routine maintenance windows with minimal downtime.

SRE & Automation

  • Eliminate Toil: Apply SRE principles to automate repetitive operational tasks, reducing manual intervention.
  • IaC & Tooling: Develop automation tools and manage infrastructure using Terraform or CloudFormation, along with scripting in Python, Go, or Bash.
  • Self-Healing Systems: Implement auto-remediation workflows where systems can detect and resolve common issues (e.g., restarting failed services, rotating bad nodes) without human intervention.
  • Performance Tuning: Optimize application runtime parameters, database queries, and system kernel settings for maximum throughput.

Cloud & Cost Optimization (FinOps)

  • AWS Management: Architect and manage a broad range of AWS services, including EC2, EKS/ECS, RDS, S3, Lambda, VPC, and Route 53.
  • Cost Efficiency: Actively monitor cloud spend and drive cost optimization initiatives, including rightsizing instances, managing Reserved and Spot Instances, and identifying idle resources to reduce waste.
  • Capacity Planning: Collaborate with engineering teams to forecast infrastructure needs, ensuring we scale to meet demand without over-provisioning.

Technical Qualifications

Must-Have Skills

  • Experience: 2-5 years in SRE, DevOps, or Systems Engineering roles with a strong focus on AWS.
  • Cloud Proficiency: Expert-level knowledge of AWS core services and architecture standards.
  • Scripting: Strong proficiency in Python or Shell/Bash for automation.
  • Cost Tools: Experience with AWS Cost Explorer, AWS Trusted Advisor, or third-party tools (e.g., CloudHealth) to drive financial efficiency.
  • Monitoring: Hands-on experience with tools like Grafana, Prometheus, ELK Stack, or Splunk.

Preferred Qualifications

  • Experience in Hybrid Cloud environments (AWS + On-Prem/Data Center).
  • Knowledge of container orchestration (Kubernetes/EKS).
  • Understanding of database administration and replication (PostgreSQL, MySQL, or DynamoDB).

Interview Process

  • R1: Technical Round
  • R2: Culture + Technical Round

How to apply for this opportunity?

  • Step 1: Click Apply, then register or log in on our portal.
  • Step 2: Complete the screening form and upload your updated resume.
  • Step 3: Increase your chances of getting shortlisted and meeting the client for the interview!

About Uplers:

Our goal is to make hiring reliable, simple, and fast. Our role is to help all our talents find and apply for relevant contractual onsite opportunities and progress in their careers. We will support any grievances or challenges you may face during the engagement. (Note: There are many more opportunities on the portal apart from this one. Depending on the assessments you clear, you can apply for them as well.) So, if you are ready for a new challenge, a great work environment, and an opportunity to take your career to the next level, don't hesitate to apply today. We are waiting for you!

Uplers | Digital Services | Ahmedabad
