Posted: 2 weeks ago | Platform: LinkedIn

Work Mode

Remote

Job Type

Full Time

Job Description

Project Role :

Application Lead

Project Role Description :

Lead the effort to design, build and configure applications, acting as the primary point of contact.

Must have skills :

Site Reliability Engineering

Good to have skills :

Google Cloud Data Services, Microsoft Azure Analytics Services

Minimum 12 year(s) of experience is required

Educational Qualification :

15 years of full-time education

Job Title: SRE and Automation Architect
Location: [Insert Location or Remote]
Experience Level: 10+ Years
Employment Type: Full-Time
________________________________________
Job Summary:
We are looking for a seasoned Site Reliability Engineering (SRE) and Automation Architect to lead the design and implementation of highly available, reliable, and automated platforms and operations. The ideal candidate will bridge the gap between development and operations, driving infrastructure automation, observability, resiliency engineering, and SRE best practices at scale across multi-cloud and hybrid environments. This role requires deep technical expertise in cloud platforms (Azure/AWS/GCP), CI/CD pipelines, IaC, SLO/SLI implementation, and incident management automation.
________________________________________
Key Responsibilities:

Platform Reliability & Architecture:
  • Architect highly available, resilient, and self-healing systems and services.
  • Define and implement SLOs, SLIs, error budgets, and performance benchmarks across platforms.
  • Drive observability standards including logging, metrics, and distributed tracing.

Automation Strategy:
  • Lead design and implementation of end-to-end automation across infrastructure provisioning, configuration management, CI/CD pipelines, and incident response.
  • Build reusable IaC modules using tools like Terraform, Ansible, Pulumi, or Bicep.
  • Automate environment creation, scaling, patching, and compliance using scripts and DevOps toolchains.

DevOps & CI/CD:
  • Architect and maintain CI/CD pipelines using Azure DevOps and GitHub Actions.
  • Ensure secure and reliable software deployments by implementing automated testing, canary deployments, blue-green strategies, and rollback automation.

Monitoring & Incident Response:
  • Define standards for monitoring, alerting, and incident management using tools like Prometheus, Grafana, ELK, Datadog, Splunk, or Azure Monitor.
  • Build auto-remediation runbooks and event-driven workflows using platforms like StackStorm, Azure Logic Apps, PagerDuty, or OpsGenie.
  • Facilitate blameless post-mortems and continuous improvement processes.

Security, Compliance & Cost Optimization:
  • Integrate security checks and policy-as-code into automation and deployment pipelines (e.g., with OPA, Sentinel, or Azure Policy).
  • Optimize cost through right-sizing, autoscaling, and usage-based automation.

Collaboration & Leadership:
  • Act as the SRE and automation thought leader across development, infrastructure, and operations teams.
  • Mentor engineers and advocate for modern SRE principles such as toil reduction, error budgeting, and release engineering.
  • Collaborate with architecture teams to align reliability with business and technical goals.
________________________________________
Required Skills & Experience:
  • 10+ years of experience in infrastructure, DevOps, or SRE roles, with at least 3 years in an architect-level role
  • Deep expertise in cloud platforms: Azure (preferred), AWS, or GCP
  • Strong experience with IaC (Terraform, ARM/Bicep, Ansible) and automation scripting (Python, Bash, PowerShell)
  • Hands-on experience with CI/CD tools and container orchestration (Kubernetes, Helm, Istio)
  • Proven ability to design and manage high-availability and disaster recovery strategies
  • Strong observability experience with APM tools, log aggregation, and distributed tracing
  • Knowledge of incident response automation and auto-remediation frameworks
________________________________________
Preferred Qualifications:
  • Certified: Azure DevOps Expert, GCP SRE, AWS DevOps Engineer, or Kubernetes Administrator (CKA)
  • Experience with GitOps tools like Flux or ArgoCD
  • Familiarity with service meshes and chaos engineering (e.g., Chaos Monkey, Litmus)
  • Understanding of FinOps, cloud governance, and security automation
________________________________________
Soft Skills:
  • Strategic mindset with attention to detail
  • Excellent problem-solving and analytical skills
  • Strong communication and documentation skills
  • Passion for automation, scalability, and improving developer productivity
________________________________________
