Position Summary:
We are looking for a Senior Site Reliability Engineer (SMTS) to join our Cloud Infrastructure Engineering division in Bangalore. Cloud Infrastructure Engineering ensures the continuous availability of the technologies and systems that are the foundation of athenahealth's services. We are directly responsible for thousands of servers, petabytes of storage, and thousands of web requests per second, all while sustaining growth at a meteoric rate. We enable an operating system for the medical office that abstracts away administrative complexity, leaving doctors free to practice medicine.

The Team:
We are a group of Site Reliability Engineers who are passionate about reliability, automation, and scalability. We use an agile-based framework to execute our work, ensuring we are always focused on the most important and impactful needs of the business. We support systems in both private and public cloud and make data-driven decisions about which one best suits the needs of the business. We are relentless in automating away manual, repetitive work so we can focus on projects that help move the business forward.

Job Responsibilities:

Reliability and Availability:
Define, measure, and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for cloud services and infrastructure components.
Lead efforts to continuously improve system availability, fault tolerance, and disaster recovery capabilities.
Ensure proactive incident detection, efficient root cause analysis, and timely resolution of production incidents.
Participate in on-call duty in a 24x7 setup.

Automation and Infrastructure as Code (IaC):
Drive automation efforts to reduce manual intervention and streamline cloud infrastructure management.
Implement Infrastructure as Code (IaC) using tools like Terraform, AWS CloudFormation, and Ansible to provision, manage, and scale cloud resources.
Automate deployment, scaling, and monitoring processes to improve efficiency and reduce operational complexity.

Monitoring, Observability, and Performance Tuning:
Design and implement monitoring, logging, and alerting solutions to track cloud infrastructure health, performance, and security.
Use observability tools (e.g., Prometheus, Grafana, CloudWatch) to ensure continuous visibility into cloud infrastructure performance and capacity.
Identify bottlenecks and performance issues, and propose and implement improvements to ensure optimal resource usage.

Security and Compliance:
Ensure that cloud infrastructure is built with security best practices in mind and meets all relevant compliance and regulatory requirements.
Collaborate with security teams to implement security controls and risk mitigation strategies across cloud environments.
Regularly audit and review cloud infrastructure for security vulnerabilities and compliance gaps.

Collaboration and Cross-Functional Leadership:
Work closely with development, DevOps, and operations teams to ensure cloud infrastructure aligns with application and business requirements.
Lead and mentor a team of Site Reliability Engineers, promoting best practices and fostering a culture of operational excellence.
Act as a key technical point of contact for cloud-related infrastructure and operations issues.

Incident Management and Post-Mortem:
Lead the incident response efforts for cloud infrastructure-related issues, ensuring that all incidents are managed effectively.
Conduct post-incident reviews (PIRs) to identify root causes and implement preventive measures.
Continuously refine incident management processes to reduce downtime and improve recovery times.

Qualifications:
5-9 years of hands-on experience with cloud automation and configuration management tools (e.g., Terraform, AWS CloudFormation, Ansible) in a hybrid cloud setup.
5+ years of experience in a Site Reliability Engineering (SRE), Infrastructure Engineering, or DevOps role, with at least 3 years in a technical leadership capacity.
Deep knowledge of cloud services and technologies (e.g., EC2, S3, Lambda, Kubernetes).
Proficiency in scripting or programming languages (e.g., Python, Go, Bash).
Experience with monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, ELK stack).
Familiarity with Continuous Integration/Continuous Deployment (CI/CD) pipelines and cloud-native development practices.
Strong expertise in managing cloud infrastructure (AWS, Google Cloud, Azure) in production environments.
Experience with cloud-native architectures, microservices, and containerized environments (Kubernetes, Docker).
Proven experience in building and managing highly available, scalable, and fault-tolerant systems in the cloud.
Strong understanding of cloud networking, storage, compute services, on-premises infrastructure, and security best practices.

Senior Site Reliability Engineer – AWS & Kubernetes
