Principal Site Reliability Engineer

Posted: 23 hours ago | Platform: Foundit

Work Mode: On-site

Job Type: Full Time

Job Description

Caring. Connecting. Growing together.

Primary Responsibilities:

  • Provision infrastructure using Terraform in cloud (Azure) environments
  • Manage and optimize Azure Cloud Infrastructure, building resilient, self-scaling systems
  • Ensure availability, performance, monitoring, and infrastructure provisioning for platforms spanning cloud (Azure) and on-premises technologies
  • Collaborate with Engineering and Technical Support teams to resolve critical issues
  • Automate repeatable tasks to reduce operational toil
  • Deploy applications using CI/CD tools and manage the full lifecycle: code repository, scanning, artifact management, compliance, deployment, and configuration
  • Partner with development teams to resolve platform-related roadblocks
  • Conduct post-mortems and drive continuous improvement after production incidents
  • Implement automation, self-healing, and real-time monitoring in production systems
  • Participate in cross-functional projects involving Engineering, Cloud, Networking, CI/CD, Monitoring, and Project Management
  • Stay current with emerging technologies and drive innovation
  • Enhance CI/CD pipelines with automated performance and load testing
  • Collaborate with DevOps and QA to integrate performance benchmarks into release gates
  • Cloud Architecture & Reliability:
    • Design and implement scalable, reliable cloud architectures
    • Drive innovation in SRE through AI and automation
    • Explore and implement AI-driven solutions for anomaly detection, incident prediction, and intelligent alerting
  • Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies in regard to flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so

Required Qualifications:

  • 10+ years in Software Engineering, DevOps, or SRE roles, with 3+ years in a principal or lead capacity
  • 5+ years of experience with CI/CD tooling (e.g., GitHub Actions)
  • 5+ years of experience with container orchestration in cloud platforms (Azure)
  • 5+ years of deep experience with observability and monitoring tools (Prometheus, Grafana)
  • 5+ years of experience with Docker and Kubernetes
  • 3+ years of hands-on experience with Terraform and Infrastructure as Code
  • Experience migrating legacy solutions to Azure/Cloud-hosted environments
  • Experience managing and migrating on-premises environments
  • Solid scripting and automation skills in Python and PowerShell (Python preferred)
  • Security & compliance:
    • Ability to strengthen infrastructure security posture across all environments
    • Ability to conduct regular security assessments and apply best practices

Optum

Hospitals and Health Care

Eden Prairie, MN