5 - 9 years

0 Lacs

Posted: 2 days ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

As a Site Reliability Engineer at our company, you will oversee critical cloud infrastructure for our global clients. Your main role will be to maintain and enhance multiple production environments and to ensure their seamless continuity.

**Key Responsibilities:**
- Monitoring system availability and ensuring overall system health.
- Providing proactive insights on system health and recommending optimizations to prevent future issues.
- Developing software and systems to manage platform infrastructure and applications.
- Enhancing reliability, quality, and time-to-market for our cloud and on-premises software solutions.
- Optimizing system performance to meet evolving customer needs and drive continual innovation.
- Providing primary operational support and engineering for large-scale distributed infrastructure and related applications.

**Qualifications Required:**
- 5+ years of experience supporting large-scale infrastructure and cloud systems.
- Proficiency in gathering and analyzing metrics for performance tuning and issue resolution.
- Experience collaborating with development teams to improve services through rigorous testing and release processes.
- Involvement in system design consulting, platform management, and capacity planning.
- Experience building sustainable systems and services through automation.
- Ability to balance feature development speed against reliability using defined service level objectives (SLOs).

In addition to the above, you should be proficient in automation technologies, particularly Terraform or Ansible. Strong knowledge of Linux, MySQL, and scripting languages such as Bash and Python is essential. Experience maintaining on-premises cloud solutions such as OpenStack and CloudStack, along with expertise in containers and container orchestration using Kubernetes, is also required. Familiarity with monitoring systems such as Prometheus and Nagios, and with implementing predictive analysis, is important. Extensive experience maintaining high-availability systems and ensuring business continuity is expected, as is a solid understanding of distributed systems, storage, networking, software-defined networking (SDN), and software-defined storage (SDS).

**Bonus Attributes:**
- Familiarity with CloudStack and Citrix CloudPlatform, or experience in related roles.
- Experience in a similar capacity at data centers or ISPs.
- Knowledge of GPU-based systems and virtualization techniques.
- Background in supporting AI/ML workloads.