5 - 9 years

0 Lacs

Posted: 3 days ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

We are seeking Site Reliability Engineers to oversee critical cloud infrastructure for our global clients. Your role involves maintaining, enhancing, and ensuring seamless continuity across multiple production environments.

Responsibilities:
- Monitoring system availability and ensuring overall system health.
- Providing proactive insights on system health and recommending optimizations to prevent future issues.
- Developing software and systems to manage platform infrastructure and applications.
- Enhancing reliability, quality, and time-to-market for our cloud and on-premises software solutions.
- Optimizing system performance to meet evolving customer needs and drive continual innovation.
- Offering primary operational support and engineering for large-scale distributed infrastructure and related applications.

Requirements:
This is a deeply technical role focused on enhancing and maintaining production systems. We will evaluate candidates based on the following criteria:
- 5+ years of experience in supporting large-scale infrastructure and cloud systems.
- Proficiency in gathering and analyzing metrics for performance tuning and issue resolution.
- Collaboration with development teams to enhance services through rigorous testing and release processes.
- Involvement in system design consulting, platform management, and capacity planning.
- Creation of sustainable systems and services through automation.
- Balancing feature development speed and reliability with defined service level objectives.

Technical Requirements:
- Proficiency in automation technologies, particularly Terraform or Ansible.
- Strong knowledge of Linux, MySQL, and scripting languages like Bash and Python.
- Experience in maintaining on-premises cloud solutions such as OpenStack, CloudStack, etc.
- Expertise in containers and container orchestration using Kubernetes.
- Familiarity with monitoring systems like Prometheus, Nagios, etc., and implementing predictive analysis.
- Extensive experience in maintaining high-availability systems and ensuring business continuity.
- Solid understanding of distributed systems, storage, networking, SDN, and SDS.

Bonus Attributes:
- Familiarity with CloudStack, Citrix CloudPlatform, and related roles.
- Experience in data centers or ISPs in a similar capacity.
- Knowledge of GPU-based systems and virtualization techniques.
- Background in supporting AI/ML workloads.
