Site Reliability Engineer

7 - 10 years

8 - 12 Lacs

Posted: 1 day ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Experience

  • Minimum 6 years of relevant work experience setting up AppDynamics in critical production environments
  • Experience working with AWS and on-prem hosted applications in a hybrid cloud
  • Experience implementing APM and RUM for end-to-end tracing and custom alerts with AppDynamics

Core Capabilities

  • Expert-level knowledge of AppDynamics integration with agents as well as APM and RUM
  • AWS proficiency with containers and CloudWatch is key
  • Ability to configure custom alerts and monitors with AppDynamics
  • Ability to build end-to-end observability using AppDynamics, from user interaction all the way into infrastructure
  • Good understanding of AppDynamics integration capabilities with other systems
  • Ability to build custom AppDynamics dashboards
  • Ansible or PowerShell knowledge is helpful
  • Ability to write SQL queries and use AppDynamics to observe database transactions

Qualification

AppDynamics official certification or alternative certification from Udemy, Coursera or other platforms

Role & Responsibilities:

  • Implement the entire observability solution using AppDynamics for .NET monolithic and Java-based microservices applications and their infrastructure
  • Implement AppDynamics RUM, APM setup and log consolidation
  • Build integrations for observability into on-prem and cloud-hosted applications using AppDynamics and ensure the deployment as well as continuous running of agents
  • Instrument and expose traces from monolithic .NET applications and Java microservices using AppDynamics libraries
  • Set up monitoring of database queries and performance of application transactions with AppDynamics
  • Consult and guide a team of observability engineers to implement the AppDynamics solution
  • Train a new team and hand over in-life maintenance of the AppDynamics solution built

Qualification

Experience

  • Minimum 7 years of work experience as an SRE (not traditional production support) covering integration platforms on cloud-based deployments
  • Coding / automation scripting experience in any programming language, particularly for the integration tier and middleware
  • Experience working as a DevOps Engineer or SRE on mission-critical applications and infrastructure
  • Working experience with GCP (Google Cloud), particularly with GKE, is important
  • Working with AppDynamics and Splunk for monitoring and setting up observability is key
  • CI/CD toolchains: setting up and running deployment pipelines and propagating changes across environments

Core Capabilities

  • Maintaining middleware such as Kafka (open source) and MQ as well as application servers (Tomcat)
  • Maintaining Hazelcast data storage platform clusters and Control-M job schedulers
  • GCP and private-cloud operational support / administration activities such as provisioning, capacity management, reliability management, monitoring, restoration, etc.
  • Kubernetes cluster management, monitoring and remediation
  • Knowledge of Docker is important
  • Automating deployments and scripting self-healing workflows based on telemetry
  • Defining SLIs and configuring SLOs, responding to threshold alerts and optimizing monitoring capability
  • Working with code as well as configuration artifacts to debug and fix issues that may arise
  • Knowledge of applying SRE practices to daily operations is key
  • Must be inclined to work on proof-of-concept solutions to optimize reliability, such as those incorporating AI models for event correlation and assisted triaging
  • Ability to work in shifts in office is mandatory; this is a 24/7 on-desk operation

Qualification

  • Computer Science and/or Engineering degrees are preferred
  • SRE Foundation certification by DevOps Institute or any other equivalent SRE certification by a recognized body is mandatory
  • CKA certification
  • GCP Cloud Digital Leader certification at a minimum is mandatory; Cloud Engineer level is a bonus
  • Hazelcast Platform Operations certification badge

Role & Responsibilities

  • Work in shifts as part of a 24/7 on-desk team that manages middleware and associated applications consumed globally, covering incident, change, event and problem management
  • Debug integrations and consumers at the code level
  • Work with CI/CD pipelines and automate new change rollouts
  • Change deployment and sanity testing are part of the scope
  • Set up and configure an observability product, preferably AppDynamics or Splunk, for end-to-end traceability and log analytics
  • Be the guardian to ensure high reliability of the applications, middleware, storage platforms, scheduler (and its jobs) and underlying cloud infrastructure
  • Define and set up SLIs as well as SLOs while continuously refining thresholds
  • Set up anomaly detection and auto-remediation workflows
  • Ensure all alerts and incidents within scope are actioned before breaching SLOs

Virtusa

Information Technology and Services

Southborough

bengaluru east, karnataka, india