Site Reliability Engineer

4 - 8 years

0 Lacs

Posted: 1 day ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

The Site Reliability Engineer (SRE) plays a crucial role in ensuring the availability, performance, and scalability of critical systems. Your responsibilities include managing CI/CD pipelines, monitoring production environments, automating operations, and collaborating with development and infrastructure teams to enhance platform reliability.

Key Responsibilities

- Manage alerts and monitoring of critical production systems.
- Enhance CI/CD pipelines and deployment strategies.
- Collaborate with central platform teams on reliability initiatives.
- Automate testing, regression, and operational tooling for efficiency.
- Conduct NFR testing on production systems.
- Implement Debian version migrations with minimal disruption.

Required Qualifications & Skills

- Proficiency in CI/CD tools like Jenkins, Docker, JFrog.
- Experience in Debian OS migration and upgrades.
- Knowledge of monitoring tools such as Grafana and Nagios.
- Familiarity with configuration management tools like Ansible, Puppet, or Chef.
- Working knowledge of Git and version control systems.
- Deep understanding of Kubernetes architecture and deployment pipelines.
- Proficiency in networking protocols and tools like TCP/IP, UDP, Wireshark.
- Strong skills in Linux, scripting, and databases like MySQL and NoSQL.

Soft Skills

- Strong problem-solving and analytical abilities.
- Effective communication and collaboration with cross-functional teams.
- Ownership mindset, accountability, and adaptability to dynamic environments.
- Detail-oriented approach and proactive problem-solving.

Preferred Qualifications

- Bachelor's degree in Computer Science or related field.
- Certifications in Kubernetes, Linux, or DevOps practices.
- Experience with cloud platforms like AWS, GCP, or Azure.
- Exposure to service mesh, observability stacks, or SRE toolkits.

Key Relationships

- Internal: DevOps, Infrastructure, Software Development, QA, Security Teams
- External: Tool vendors, platform service providers

Role Dimensions

- Impact on uptime and reliability of business-critical services.
- Ownership of CI/CD and deployment processes.
- Contribution to cross-team reliability and scalability initiatives.

Success Measures (KPIs)

- System uptime and availability (SLA adherence).
- Incident response metrics (MTTD, MTTR).
- Deployment success rate and automation coverage.
- Completion of OS migration and infrastructure upgrade projects.

Competency Framework Alignment

- Technical Mastery: Infrastructure, automation, CI/CD, Kubernetes, monitoring.
- Execution Excellence: Timely project delivery, process improvements.
- Collaboration: Cross-functional team engagement and support.
- Resilience: Problem solving under pressure, incident response.
- Innovation: Continuous improvement of operational reliability and performance.

Enterprise Minds, Inc

Information Technology

San Francisco
