6 years

3 - 9 Lacs

Posted: 2 weeks ago | Platform: GlassDoor

Work Mode

On-site

Job Type

Part Time

Job Description

Skills

IT OPERATIONS

IT SERVICE MANAGEMENT

RCA

CHANGE MANAGEMENT

ITIL

SRE

AZURE

ANSIBLE

PUPPET

TERRAFORM

PYTHON

BASH

SPLUNK

DEVOPS

CI/CD INTEGRATION

Job Description

Role


SRE/OPS


Location

  • Chennai / Bangalore
  • Flexibility to travel for business trips


Experience:

  • 6+ years of experience


What awaits you? / Job profile


Our vision is to provide an overarching platform for AI-based quality assurance (AIQX) in the global production network, accelerating the end-to-end quality control cycle in vehicle manufacturing. We develop this platform in an international feature team, using state-of-the-art technologies and working in close cooperation with our users in the plants.

We are looking for an SRE to join our BMW teams of rock-solid specialists who develop and operate our AI-based quality assurance solution for BMW’s plants.

In this position, you will play an important role in maintaining and operating a highly complex platform, working in an international team and securing the quality of BMW products.

Our services mainly run on the Microsoft Azure cloud platform. This is the right position for you if you are a passionate SRE, preferably with a developer background, who is willing to take responsibility for our platform and to share knowledge and give guidance within the team. You should be excited about the latest technology, full of energy and ambition, hands-on, and not afraid of getting your hands dirty.


What should you bring along?


  • 6+ years of experience in IT operations or a similar role
  • Willing and able to travel internationally (twice a year)

Monitor and Operate IT Products:

  • Perform regular and sporadic operational tasks to ensure optimal performance of IT services
  • Own and maintain the Regular OPS Tasks list, refining sporadic tasks based on input from the Operations Experts (OE) network

Manage IT Service Continuity:

  • Prepare for and attend emergency exercises (EE), reviewing outcomes and deriving follow-up tasks
  • Communicate findings and improvements to the OE network

Manage Availability:

  • Participate in "Gamedays" and backup/restore test sessions, practicing and executing backup and restore processes.
  • Own the recovery and backup plan, reviewing success and identifying follow-up tasks.

Manage Capacity:

  • Monitor cluster capacity using prepared dashboards and coordinate with the DevOps team for any issues
  • Plan and execute capacity extensions as needed

Manage Service Configuration:

  • Oversee service configuration management using ITSM tools

Manage Events:

  • Observe dashboards and alerts, initiate root cause analysis (RCA), and create tasks for the DevOps team.
  • Provide proactive feedback and maintain monitoring and alerting solutions.

Manage Problems:

  • Conduct root cause analysis and manage known issues, creating Jira defects for further assistance if required

Enable Changes:

  • Create and sync changes with the team, assisting with releases and deployment plans.

Manage Service Requests and Incidents:

  • Observe and resolve service requests and incidents, creating Jira tasks for the DevOps team as necessary.

Manage Knowledge:

  • Create, use, and extend knowledge articles, ensuring availability and consistency.

In a future setup with teams around the world, you will take part in 24/7 on-call rotations and must be able to restore systems efficiently.


Must-have technical skills


  • Strong understanding of IT service management principles and practices
  • Proficiency in monitoring and management tools (e.g., dashboards, alerting systems)
  • Strong analytical and problem-solving abilities, particularly in IT service management
  • Experience in conducting root cause analysis (RCA) and managing known issues
  • Experience in performing regular and sporadic operational tasks to ensure optimal performance of IT services
  • Ability to manage IT service continuity, availability, and capacity effectively
  • Experience with change management processes, including creating and syncing changes with teams
  • Ability to plan and execute capacity extensions and backup/restore processes
  • Any additional responsibilities assigned in the Agile Working Model (AWM) Charter


Good-to-have technical skills

  • Experience with IT service management frameworks (e.g., ITIL, SRE practices)
  • Familiarity with cloud platforms (e.g., Azure) and their operational management
  • Experience with automation tools (e.g., Ansible, Puppet, Terraform) and scripting languages (e.g., Python, Bash) to streamline operational tasks
  • Understanding of DevOps methodologies and practices, including CI/CD (Continuous Integration/Continuous Deployment) processes
  • Knowledge of network protocols, configurations, and troubleshooting to support IT infrastructure
  • Understanding of IT security best practices and compliance requirements to ensure secure operations
  • Skills in data analysis and visualization tools (e.g., Splunk, Grafana) to interpret operational metrics and trends
  • Above-board work ethics
