We are seeking a Site Reliability Engineer (SRE) with strong DevOps expertise to ensure the reliability, availability, and performance of critical systems and services. This role bridges the gap between development and operations teams by employing automation, monitoring, and best practices to enhance system scalability, reduce downtime, and improve overall operational efficiency. The SRE will focus on optimizing development pipelines, managing infrastructure, and implementing proactive monitoring and alerting systems while upholding the principles of DevOps and reliability engineering.

Key Responsibilities:

1. Reliability Engineering: Design, implement, and maintain high-availability systems. Create and enforce Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs). Conduct root cause analysis for system failures and implement post-mortem processes to prevent recurrence.

2. DevOps Automation: Automate infrastructure provisioning, deployment pipelines, and operational processes. Build and maintain CI/CD pipelines using tools like Jenkins, GitHub Actions, or GitLab CI/CD. Develop Infrastructure as Code (IaC) using tools like Terraform, CloudFormation, or Ansible.

3. Monitoring and Incident Management: Implement robust monitoring, logging, and alerting solutions using tools like Datadog or Splunk. Establish proactive incident response processes and manage on-call rotations. Ensure effective documentation for incident handling and resolution.

4. Performance and Scalability: Optimize system performance through capacity planning and resource management. Enable horizontal scaling of services to handle increasing loads. Work closely with development teams to improve application resilience and performance.

5. Security and Compliance: Enforce security best practices in infrastructure and application development. Conduct vulnerability assessments and implement remediation measures. Ensure compliance with organizational and industry standards.

6. Collaboration and Culture: Act as a bridge between development and operations teams to foster a DevOps culture. Coach teams on best practices in reliability, automation, and DevOps. Advocate for a culture of ownership and continuous improvement.

Key Skills and Competencies:

Technical Skills: Expertise in cloud platforms like AWS, Azure, or GCP. Proficiency in Linux system administration and networking concepts. Strong programming/scripting skills (e.g., Python, Go, Bash). Understanding of Terraform creation and management. Familiarity with containerization and orchestration tools like Docker and Kubernetes. Knowledge of database management (SQL and NoSQL).

UST

www.ust.com

IT Services and IT Consulting

Aliso Viejo CA

10001 Employees

1845 Jobs

Key People

Kris Canekeratne

Co-Founder & CEO
Sandeep Reddy

President
Baskar Subramanian

Co-Founder & Chief Strategy Officer
Lynn C. Mclean

Chief Financial Officer

Lead SRE Engineer