Sustain Engineer

5 years

Posted: 2 weeks ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Overview

This role is responsible for the overall stability of our production applications and for the reliability, availability, scalability, and efficiency of our production systems and platforms. The Operations Engineer will collaborate with cross-functional teams (including Software Engineering, Service Reliability, Infrastructure, and Business Operations) to streamline processes, manage day-to-day operations, monitor system health, and resolve incidents quickly. The ideal candidate is skilled in problem-solving, process automation, and root cause analysis, and has a passion for operational excellence and continuous improvement.

Responsibilities

  • Monitor production systems, applications, and infrastructure to ensure high availability and performance.
  • Troubleshoot and resolve operational issues, providing timely escalation and communication to stakeholders.
  • Perform root cause analysis (RCA) and drive permanent fixes to recurring problems.
  • Manage RTS & TTS, configuration changes, and production rollouts with minimal impact.
  • Develop and maintain runbooks, standard operating procedures, and technical documentation.
  • Automate operational workflows, monitoring, and reporting using scripts and tools.
  • Collaborate with engineering teams to design for reliability, scalability, and operability.
  • Support incident response, disaster recovery, and business continuity processes.
  • Drive continuous improvement initiatives around system monitoring, alerting, and incident response.
  • Ensure compliance with IT controls, security policies, and audit requirements.

Qualifications

  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field (or equivalent experience).
  • 5+ years of experience in operations engineering, site reliability engineering, or systems administration.
  • Strong knowledge of Linux/Unix and/or Windows server environments.
  • Experience with monitoring and alerting tools (Grafana, Datadog, Splunk, Nagios).
  • Proficiency in at least one scripting/programming language (e.g., Python, Bash, PowerShell).
  • Familiarity with CI/CD pipelines (e.g., Jenkins), deployment automation, and configuration management.
  • Understanding of networking fundamentals (DNS, TCP/IP, load balancing, firewalls).
  • Hands-on experience with cloud platforms (AWS, Azure, GCP).
