Site Reliability Engineer

5 - 7 years

0 Lacs

Posted: 15 hours ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

Experience in GCP, Kubernetes, IaC (Terraform automation), Dynatrace or any enterprise APM tool, Prometheus, Grafana, automation tools, and scripting languages (Python/Ansible/Golang)

Programming languages: Python/Ansible/Golang, plus automation

Prior experience working with eCommerce, micro-frontend, and microservice applications

RESPONSIBILITIES

  • Automate and manage a highly available and scalable cloud environment that allows development teams to deploy and run their services.

  • Apply in-depth knowledge of Terraform (Infrastructure as Code) to create new Terraform files or modify existing ones according to Ford formats, in order to create new monitoring dashboards, alert policies, and SLAs.

  • Collaborate with engineering and architecture teams to evaluate and identify optimal cloud solutions that deliver scalability, high performance, and security.

  • Perform extensive log monitoring and analysis for both applications and deployment pipelines to keep the Cloud Run services up and running.

  • Create SLOs, SLAs, and SLIs with GCP, Grafana, and Dynatrace dashboards.

  • Support incident escalation and troubleshooting, and conduct blameless postmortems on incident resolution.

  • Ensure that data storage and processing functions run efficiently and in accordance with company security policies and cloud security best practices.

  • Collaborate with engineering teams to identify optimization strategies and help develop self-healing capabilities.

  • Develop a strong observability capability.

  • Regularly review performance analyses of existing systems and recommend improvements.

  • Participate in 24x7 on-call production support rotations and handle incident response to minimize disruptions.

QUALIFICATIONS

  • 4-year college degree in Computer Science or equivalent

  • 5 - 6 years of experience with Java, J2EE, NoSQL/SQL datastores, Spring Boot, GCP/AWS/Azure, and Docker/Kubernetes in the maintenance and development of multi-tier applications.

  • Proven work experience in designing, deploying, and operating mid- to large-scale public cloud environments.

    • Professional certification

    • Public cloud GCP is a must-have.

  • Proven work experience in provisioning Infrastructure as Code (IaC) using Terraform Enterprise or community edition.

  • Experience in package, config, and deployment management.

  • Strong knowledge of GitHub and DevOps (Tekton is an advantage).

  • Proficient in scripting and coding, including languages and frameworks such as Python, Node.js, and React.

  • Extensive knowledge of and hands-on experience with Dynatrace, Grafana, and Prometheus micro libraries.

  • Exposure to Cloud Monitoring and logging.

  • Experience with automation tools is a priority.
