As a Site Reliability Engineer, you will play a crucial role in maintaining and optimizing cloud infrastructure, ensuring high availability, scalability, and performance. Leveraging your expertise in automation, Kubernetes, and CI/CD pipelines, you will drive reliability and efficiency across systems. This role offers the chance to collaborate with cross-functional teams, improve monitoring and incident response processes, and enhance system resilience. If you're passionate about cloud technologies, automation, and solving complex infrastructure challenges in a dynamic hybrid work environment, this is your opportunity to make a significant impact.

REQUIREMENTS

Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent work experience).
At least 2 years of proven experience as a DevOps Engineer, Site Reliability Engineer, or in a similar role.
Strong hands-on experience with infrastructure-as-code tools like Terraform, configuration management tools like Ansible, and version control systems like Git.
Proficiency in scripting languages such as Python, Bash, or Ruby for automation tasks.
In-depth knowledge of CI/CD concepts and experience with CI/CD tools like Jenkins, GitLab CI/CD, CircleCI, or GitHub Actions.
Extensive experience working with cloud platforms like AWS, Azure, or GCP.
Solid understanding of containerization technologies such as Docker and container orchestration tools like Kubernetes.
Familiarity with monitoring and logging solutions like Prometheus, Grafana, and the ELK stack.
Excellent problem-solving skills and the ability to troubleshoot complex issues across different technology stacks.
Strong communication and interpersonal skills to collaborate effectively with cross-functional teams.

WHAT YOU WILL DO

1. AWS Cloud Maintenance: Maintain and optimize AWS Cloud infrastructure to ensure scalability, reliability, and performance. Monitor AWS resources and services to identify and rectify potential issues before they impact the system.
2. Kubernetes Management: Manage and maintain Kubernetes clusters, ensuring high availability and performance. Implement best practices for container orchestration and scaling.
3. Incident Response: Participate in an on-call rotation to provide 24/7 support and respond to critical incidents promptly. Collaborate with cross-functional teams to troubleshoot and resolve system issues efficiently.
4. Bug Tracking and Resolution: Identify and document software and infrastructure bugs, working closely with development teams to prioritize and resolve them. Continuously improve monitoring and alerting systems to proactively detect issues.
5. Performance Optimization: Analyze system performance and implement optimizations to enhance reliability and reduce downtime.
6. Automation: Develop and maintain automation scripts and tools for provisioning, deployment, and monitoring.
7. Documentation: Create and update documentation for systems, processes, and incident response procedures.
8. Security and Compliance: Ensure security best practices are followed and participate in security audits and compliance initiatives.

Rheo Ai

rheo.ai

Artificial Intelligence

Innovate City


Reliability Engineer

