Manager - Site Reliability Engineering (SRE)

5 - 12 years

0 Lacs

Posted: 1 month ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

About Us

Zycus, recognized by leading analyst firms in procurement technology, empowers teams to unlock deep value through its comprehensive Source-to-Pay (S2P) solutions. At the heart of our S2P solution is the Merlin Agentic Platform, which orchestrates intelligent AI agents to deliver simplified, efficient, and compliant processes. The Merlin Intake Agent offers business users unparalleled ease of use, increasing adoption rates and significantly reducing non-compliant spending. For procurement teams, the Merlin Autonomous Negotiation Agent handles tail spend autonomously, securing additional savings; the Merlin Contract Agent helps draft compliant contracts and reduces risks by actively monitoring them; and the Merlin AP Agent further enhances efficiency by automating invoice processing with exceptional speed and accuracy.

We Are An Equal Opportunity Employer:

Zycus is committed to providing equal opportunities in employment and creating an inclusive work environment. We do not discriminate against applicants on the basis of race, color, religion, gender, sexual orientation, national origin, age, disability, or any other legally protected characteristic. All hiring decisions will be based solely on qualifications, skills, and experience relevant to the job requirements.

Zycus is looking for a Site Reliability Engineer (SRE) with deep expertise in Kubernetes, automation, and Linux systems. The ideal candidate will have hands-on experience in deploying, administering, and optimizing large-scale production systems, with a strong focus on microservices architecture, ensuring automation, performance, and reliability across our SaaS platform.

Roles And Responsibilities:

  • System Reliability & Uptime: Ensure high availability, performance, and reliability of applications and infrastructure.
  • Kubernetes & Cluster Management: Deploy, administer, and maintain Kubernetes clusters, managing scaling, upgrades, and troubleshooting.
  • Microservices Management: Handle the deployment, monitoring, and scaling of microservices in distributed environments.
  • Incident Management: Respond to production incidents, perform root cause analysis, and implement long-term fixes to prevent recurrence.
  • Automation & Infrastructure as Code (IaC): Automate repetitive tasks, infrastructure provisioning, and deployment workflows using tools like Ansible and Terraform.
  • Monitoring & Observability: Implement and maintain monitoring tools (e.g., Prometheus, Grafana, Datadog) to track system health and application performance.
  • Performance Optimization: Analyze system performance, identify bottlenecks, and optimize resources for better efficiency.
  • Disaster Recovery & Backup: Design and implement backup and disaster recovery (DR) strategies for business continuity.
  • Capacity Planning: Forecast infrastructure needs based on performance trends and business growth to ensure scalability.
  • Security & Compliance: Ensure infrastructure and applications meet security standards and compliance requirements.
  • Collaboration with Dev & Ops Teams: Work closely with development and operations teams to improve deployment pipelines, release processes, and system reliability.
  • Documentation: Maintain clear and detailed documentation of systems, processes, and incident reports for knowledge sharing and compliance.
  • Continuous Improvement: Identify opportunities for improving system architecture, deployment strategies, and automation workflows.
  • Cloud Infrastructure Management: Manage cloud services (AWS, GCP, Azure) for resource optimization, cost management, and automation.
  • On-Call Support: Participate in on-call rotations to handle urgent production issues and ensure rapid recovery.

Job Requirement

  • Experience: 5 to 12 years
  • Technical skills as mentioned below:

Must Have:

  • Kubernetes Expertise: Hands-on experience with installing and provisioning Kubernetes clusters. Deep understanding of core Kubernetes components such as CRI, CNI, etcd, CoreDNS, and kube-proxy.
