Site Reliability Engineer - Network Operations Center

5 - 9 years

0 Lacs

Posted: 2 days ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

You will join our client's team as a Site Reliability Engineer, responsible for ensuring the reliability and uptime of critical services. The role focuses on Kubernetes administration, CentOS servers, Java application support, incident management, and change management. The ideal candidate has strong experience with ArgoCD for Kubernetes management, solid Linux skills, basic scripting knowledge, and familiarity with modern monitoring, alerting, and automation tools. We are looking for someone who is self-motivated, has excellent oral and written communication skills, and can work both independently and collaboratively.

Your main tasks will include monitoring, maintaining, and managing applications on CentOS servers to ensure high availability and performance. You will perform routine system and application maintenance, following SOPs to correct and prevent issues. You will respond to and manage active incidents, conduct post-mortem meetings, perform root cause analysis, and ensure timely resolution. You will also monitor production systems, applications, and overall performance, using tools to detect abnormal software behavior and to collect information that helps developers understand the root causes of problems. Other responsibilities include performing security checks, running meetings with business partners, writing and maintaining policy and procedure documents, writing scripts or code as needed to develop tools and services, and applying lessons from post-mortems to prevent new incidents.

Technical skills required for this role include 5+ years of experience working in a SaaS and cloud environment, administration of Kubernetes clusters with ArgoCD, Linux scripting for automation, experience with database systems such as MySQL and DB2, Linux administration skills, an understanding of change management procedures, on-call responsibilities, experience managing deployments with Jenkins, and familiarity with monitoring tools such as New Relic, Splunk, and Nagios. Experience with log aggregation tools such as Splunk, Loki, or Grafana, strong scripting knowledge in at least one language, and experience with API programming and integrating tools such as Jira, Slack, and xMatters/PagerDuty are preferred.

This is an exciting opportunity for a motivated individual with the right skill set to make a significant impact on our client's team.
