Site Reliability Engineer

3 - 7 years

0 Lacs

Posted: 20 hours ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

The Engineer Intmd Analyst position is responsible for various engineering activities, including designing, acquiring, and developing hardware, software, and network infrastructure in collaboration with the Technology team. The primary goal of this role is to ensure that quality standards are met within existing and planned frameworks.

We are looking for a motivated Site Reliability Engineer (SRE) to join our team. As an SRE, your role is crucial in ensuring the reliability, scalability, and performance of our production systems. You will collaborate with development and operations teams to automate tasks, enhance monitoring and observability, and drive continuous improvement in our infrastructure and processes. This position presents a unique opportunity to merge software engineering expertise with operational focus, directly impacting the availability and performance of our services.

Responsibilities:

- Automate repetitive tasks in the production environment using scripting languages like Python and configuration management tools to enhance efficiency and reduce manual effort. Measure the impact of automation on process improvement and man-hour savings.
- Develop and maintain monitoring and observability tools, integrating production applications with platforms like Splunk, ELK, AppDynamics, Evolven, or ITRS. Configure alerts and dashboards to proactively identify and address potential issues for comprehensive system visibility.
- Collaborate with development and operations teams to discover automation opportunities and reduce manual tasks, fostering a culture of automation and continuous improvement.
- Conduct thorough root cause analysis of production incidents, identifying patterns, and proposing solutions for permanent or temporary fixes. Proactively identify issues and implement preventive measures.
- Advocate for SRE best practices within the organization, pushing for enhancements in monitoring, alerting, automation, and incident response processes.
- Stay updated with the latest technologies and trends in SRE.

Qualifications:

- Bachelor's/Master's degree in Computer Science or a related field.
- 3+ years of experience in a Site Reliability Engineering hands-on role.
- Proficiency in Python or Angular.
- Experience with at least one database technology: MongoDB, Oracle, or other RDBMS.
- Proficiency in monitoring and observability tools like Splunk, ELK, AppDynamics, Evolven, ITRS, or similar platforms. Ability to configure alerts, dashboards, and onboard applications.
- Strong scripting and automation skills, with a track record of automating tasks in a production environment.
- Strong problem-solving, analytical, and debugging skills, with a focus on identifying patterns and root causes.
- Excellent communication and collaboration skills, with the ability to challenge existing practices and advocate for improvements.

Desired Skills:

- Experience with version control systems like GitHub or Bitbucket.
- Experience with containerization technologies like Docker, OpenShift, or Kubernetes.
