Site Reliability Engineer II

4 years

0 Lacs

Posted: 2 days ago | Platform: LinkedIn


Work Mode

On-site

Job Type

Full Time

Job Description

Site Reliability Engineer


Responsibilities:

System Monitoring and Analysis:

  • Implement and maintain robust observability solutions to monitor system performance, identify bottlenecks, and ensure optimal operation.
  • Utilize tools to gather, analyze, and visualize key performance metrics.

Performance Optimization:

  • Proactively identify and address performance bottlenecks through in-depth analysis and optimization strategies.
  • Work closely with development teams to implement performance improvements and enhance overall system efficiency.

Capacity Planning:

  • Conduct capacity planning exercises based on observed patterns and future growth projections.
  • Collaborate with infrastructure and development teams to ensure adequate resources are available to meet system demands.

Automation and Scripting:

  • Develop and maintain automation scripts for routine tasks, enabling efficient monitoring and response procedures.
  • Implement automated processes for scaling and provisioning resources based on observed workload patterns.

Documentation:

  • Document system architecture, configurations, and observability best practices to facilitate knowledge transfer and onboarding for team members.
  • Keep documentation up-to-date to reflect changes in the system and its monitoring setup.

Collaboration with Development Teams:

  • Work closely with software engineers to integrate observability tools into the development lifecycle.
  • Provide guidance on building observable systems and assist in instrumenting applications for effective monitoring.

Continuous Improvement:

  • Stay informed about industry best practices and emerging technologies related to observability and performance engineering.
  • Drive continuous improvement initiatives to enhance the reliability and performance of systems.

Security and Compliance:

  • Collaborate with security teams to implement monitoring and observability measures that align with security requirements and compliance standards.
  • Participate in security incident response activities and contribute to ongoing security assessments.

Training and Knowledge Sharing:

  • Conduct training sessions for team members and other stakeholders on observability tools, best practices, and performance engineering concepts.
  • Foster a culture of knowledge sharing within the organization.

And other duties as assigned.


Required Work Experience:

  • 4+ years of experience with multiple APM tools, including extensive experience with Dynatrace.
  • 4+ years of experience executing software load and performance testing in an enterprise environment.
  • 3+ years of SRE experience.
  • Certifications in relevant technologies (e.g. AWS, DevOps, Kubernetes, Dynatrace, Azure, etc.).
  • Working experience with NeoLoad or an equivalent performance testing tool.
  • 1+ years of experience testing applications hosted in the cloud.
  • Working experience building CI/CD pipelines and using version control systems.
  • Working experience with scripting languages (e.g. Python, Bash, Go, etc.).
  • Excellent problem-solving and communication skills.
  • Ability to work collaboratively in a fast-paced, agile environment.


Preferred Work Experience:

  • Proficiency in programming/scripting languages such as Python, Go, or Bash.
  • Experience with infrastructure as code tools such as Terraform or CloudFormation.
  • Deep understanding of Linux systems administration and networking principles.
  • Experience with containerization and orchestration technologies such as Docker and Kubernetes.
  • Experience or familiarity with IIS, HTML, Java, JBoss.
  • Experience in Chaos Engineering
  • Programming experience using .NET, C, C++, Java, or another popular programming language. Perl/Python/JavaScript scripting experience may be considered equivalent.
  • Terraform and Ansible experience
  • Exposure to Splunk tools
  • Exposure to microservices


Knowledge

  • Site Reliability Engineering Principles
  • DevSecOps Principles
  • Agile (SAFe)
  • Healthcare industry
  • ITIL
  • ServiceNow
  • Jira/Confluence


Skills

  • Dynatrace/Prometheus/Grafana
  • NeoLoad/JMeter
  • Splunk
  • AWS/Azure/GCP
  • SAFe Agile
  • Strong communication skills (written/verbal)
  • Time management
  • Analytical problem solver
  • Self-starter
  • Results-oriented, with a proven ability to organize priorities

Quest Diagnostics

Hospitals and Health Care

Secaucus NJ
