Site Reliability Engineer (SRE)

4 - 8 years

4 - 7 Lacs

Posted: 6 days ago | Platform: Foundit

Skills Required

CI/CD

Work Mode

On-site

Job Type

Full Time

Job Description

  • Run the production environment by monitoring availability and taking a holistic view of system health.
  • Provide predictive insights into the health of the system and suggest measures to optimize and safeguard against future abnormalities.
  • Build software and systems to manage platform infrastructure and applications.
  • Improve reliability, quality, and time-to-market of our suite of cloud and on-prem software solutions.
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
  • Provide primary operational support and engineering for multiple large-scale distributed infrastructure and related applications.

Must Have Skill:

  • 5+ years of experience and a proven track record of maintaining and supporting large-scale infrastructure and cloud systems.
  • Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Participate in system design consulting, platform management, and capacity planning.
  • Create sustainable systems and services through automation and uplifts.
  • Balance feature development speed and reliability with well-defined service-level objectives.
  • In-depth and hands-on knowledge of automation technologies with extensive expertise in Terraform or Ansible.
  • In-depth and hands-on knowledge of Linux and MySQL, plus programming and scripting in Bash and Python or an alternative language.
  • In-depth knowledge of maintaining any on-prem cloud solution such as OpenStack / CloudStack / OpenNebula / vCloud.
  • In-depth and hands-on knowledge of containers and container orchestration using Kubernetes.
  • In-depth and hands-on knowledge of any monitoring system (Prometheus / Nagios / Zabbix / SolarWinds / ManageEngine etc.). Experience implementing correlation and predictive analysis in system monitoring.
  • Extensive hands-on experience implementing and maintaining high-availability systems, including ensuring backups and seamless business continuity.
  • Thorough conceptual knowledge of distributed systems, storage, networking, SDN, SDS.

Good to Have Skill:

  • Knowledge of CloudStack/Citrix CloudPlatform and involvement as an administrator / maintainer / committer / tester / support engineer.
  • Data centre or ISP experience in a similar role.
  • Knowledge of GPU-based systems, Nvidia BCM, GPU virtualisation techniques.
  • Experience supporting AI/ML workloads.

Qualification and Experience:

  • Relevant bachelor's degree

Yotta Infrastructure

IT Services and IT Consulting

Mumbai, Maharashtra
