Posted: 15 hours ago | Platform: LinkedIn


Work Mode: On-site

Job Type: Full Time

Job Description

Mandatory Technical Skills

  • AWS (Amazon Web Services) - Deep hands-on experience with core AWS services (EC2, S3, VPC, IAM, Lambda, etc.)
  • Python - Strong scripting and automation skills
  • Kubernetes - Experience managing large-scale, production-grade clusters
  • Docker - Containerization of microservices and workload orchestration
  • Terraform - Infrastructure as Code (IaC) for AWS provisioning and environment consistency
  • Prometheus & Grafana - Monitoring and dashboarding for real-time observability and alerting

Resilience & Reliability Engineering Requirements

Fault Tolerance

  • Design systems to continue functioning even when individual components fail
  • Implement redundant servers, fallback mechanisms, and health checks
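As a rough illustration of combining health checks with fallback across redundant servers, here is a minimal Python sketch. The replica names and the `is_healthy` and `send` callables are hypothetical stand-ins for a real health endpoint and client. In production this logic usually lives in a load balancer or service mesh rather than in application code.

```python
from typing import Callable, Sequence


def call_with_fallback(
    replicas: Sequence[str],
    is_healthy: Callable[[str], bool],
    send: Callable[[str], str],
) -> str:
    """Send a request to the first healthy replica, falling back to the next one on failure."""
    errors = []
    for replica in replicas:
        # Skip replicas that fail their health check.
        if not is_healthy(replica):
            errors.append(f"{replica}: unhealthy")
            continue
        try:
            return send(replica)
        except ConnectionError as exc:
            # A replica can pass its health check and still fail mid-request; try the next one.
            errors.append(f"{replica}: {exc}")
    raise RuntimeError("all replicas failed: " + "; ".join(errors))


if __name__ == "__main__":
    down = {"app-a"}  # hypothetical: simulate one failed server
    result = call_with_fallback(
        ["app-a", "app-b", "app-c"],
        is_healthy=lambda r: r not in down,
        send=lambda r: f"served by {r}",
    )
    print(result)  # served by app-b
```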

Graceful Degradation

  • Architect applications to degrade performance or disable non-essential features during service disruptions rather than failing completely
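Here is a minimal sketch of graceful degradation that assumes a hypothetical recommendation service. When the dependency times out or refuses connections, the function does not raise. It returns a generic fallback and marks the response as degraded. The names `get_recommendations` and `DEFAULT_RECOMMENDATIONS` are illustrative only.

```python
from typing import Callable, List

# Hypothetical generic fallback served when personalization is unavailable.
DEFAULT_RECOMMENDATIONS = ["bestsellers"]


def get_recommendations(user_id: str, fetch: Callable[[str], List[str]]) -> dict:
    """Return personalized items when possible, otherwise a reduced but usable response."""
    try:
        return {"items": fetch(user_id), "degraded": False}
    except (ConnectionError, TimeoutError):
        # Dependency is down: degrade to the generic list instead of failing the whole request.
        return {"items": list(DEFAULT_RECOMMENDATIONS), "degraded": True}


if __name__ == "__main__":
    def unavailable(user_id: str) -> List[str]:
        raise TimeoutError("recommendation service timed out")

    print(get_recommendations("user-1", unavailable))
    # {'items': ['bestsellers'], 'degraded': True}
```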

Auto-Recovery & Self-Healing Systems

  • Enable auto-scaling, self-restarting pods, and health-aware load balancing
  • Configure automated failover and rehydration of services
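Self-restarting pods depend on the orchestrator restarting a crashed container after an increasing back-off delay. The sketch below imitates that idea in plain Python for a single task. `run_with_restarts` and its parameters are hypothetical. The doubling delay is an assumption for illustration and does not reproduce Kubernetes' exact schedule. The `sleep` parameter is injectable so the logic can be tested without real waiting.

```python
import time
from typing import Callable


def run_with_restarts(
    task: Callable[[], None],
    max_restarts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run task, restarting it after each crash with exponential back-off.

    Returns the number of restarts that were needed before the task succeeded.
    """
    restarts = 0
    while True:
        try:
            task()
            return restarts
        except Exception:
            if restarts >= max_restarts:
                raise  # give up and surface the failure for alerting/escalation
            sleep(base_delay * (2 ** restarts))  # 1s, 2s, 4s, ... with the defaults
            restarts += 1
```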

Redundancy

  • Deploy multi-region and multi-AZ architectures
  • Implement backup, replication, and automated failover strategies

Monitoring & Alerting

  • Implement end-to-end observability pipelines with Prometheus, Grafana, and Alertmanager
  • Build proactive alerting and SLO/SLI dashboards for key infrastructure components
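SLO dashboards usually plot an SLI, such as the fraction of successful requests, against a target, along with the share of the error budget that remains. The sketch below shows that arithmetic for a simple request-based availability SLI. The function names are hypothetical. In practice the counts would come from Prometheus queries rather than being passed in directly.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests served successfully in the window."""
    if total_requests == 0:
        return 1.0  # no traffic counts as fully available
    return good_requests / total_requests


def error_budget_remaining(good_requests: int, total_requests: int, slo_target: float) -> float:
    """Fraction of the error budget still unspent (a negative value means the SLO is breached)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests == 0:
        return 1.0  # nothing served, nothing spent
    allowed_failures = (1.0 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    return 1.0 - actual_failures / allowed_failures


if __name__ == "__main__":
    sli = availability_sli(999_500, 1_000_000)
    left = error_budget_remaining(999_500, 1_000_000, slo_target=0.999)
    print(f"SLI={sli:.4%}, budget left={left:.1%}")  # SLI=99.9500%, budget left=50.0%
```

With a 99.9% target over one million requests, the budget allows 1,000 failures. An actual count of 500 failures therefore leaves half the budget unspent.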

Chaos Engineering

  • Simulate real-world failures (network, instance crash, latency injection) using tools like Gremlin, Litmus, or custom scripts
  • Validate system resiliency strategies via periodic chaos testing
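Tools like Gremlin and Litmus inject faults at the infrastructure level. A custom script can be as simple as wrapping a call so that it picks up random latency and random failures. In the minimal sketch below, `inject_faults`, the failure rate, and the latency ceiling are all hypothetical parameters. A seeded `random.Random` keeps each experiment reproducible.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def inject_faults(
    func: Callable[[], T],
    rng: random.Random,
    failure_rate: float = 0.1,
    max_latency_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, but first add random latency and sometimes raise, to mimic an unreliable dependency."""
    # Latency injection: delay every call by a random amount up to max_latency_s.
    sleep(rng.uniform(0.0, max_latency_s))
    # Failure injection: simulate a dropped connection with probability failure_rate.
    if rng.random() < failure_rate:
        raise ConnectionError("injected fault")
    return func()


if __name__ == "__main__":
    rng = random.Random(42)
    successes = 0
    for _ in range(1000):
        try:
            inject_faults(lambda: "ok", rng, failure_rate=0.2, sleep=lambda s: None)
            successes += 1
        except ConnectionError:
            pass
    print(f"success rate under 20% injected failures: {successes / 1000:.1%}")
```

In a real experiment, the wrapped call would go through the system's own retry and fallback paths. The measured success rate then shows whether those resilience strategies hold up under the injected faults.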

Disaster Recovery

  • Plan, implement, and test DR strategies that meet defined RTO (Recovery Time Objective) and RPO (Recovery Point Objective) targets
  • Run regular DR drills and create recovery documentation and automated playbooks
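One way to make DR drills measurable is to record a few timestamps for each drill and compare them against the RTO and RPO targets. In the minimal sketch below, the field names in `DrillResult` and the example dates are hypothetical. Recovery time runs from the start of the outage to restoration. The data-loss window runs from the last recoverable write to the start of the outage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    outage_start: datetime            # when the simulated failure began
    service_restored: datetime        # when the service passed health checks again
    last_recoverable_write: datetime  # newest data point present after the restore


def evaluate_drill(result: DrillResult, rto: timedelta, rpo: timedelta) -> dict:
    """Compare measured downtime and data loss against the RTO/RPO targets."""
    recovery_time = result.service_restored - result.outage_start
    data_loss_window = result.outage_start - result.last_recoverable_write
    return {
        "recovery_time": recovery_time,
        "data_loss_window": data_loss_window,
        "rto_met": recovery_time <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


if __name__ == "__main__":
    drill = DrillResult(
        outage_start=datetime(2024, 1, 1, 2, 0),
        service_restored=datetime(2024, 1, 1, 2, 40),
        last_recoverable_write=datetime(2024, 1, 1, 1, 55),
    )
    report = evaluate_drill(drill, rto=timedelta(hours=1), rpo=timedelta(minutes=15))
    print(report["rto_met"], report["rpo_met"])  # True True
```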

Soft Skills & Collaboration

  • Strong problem-solving and debugging skills
  • Ability to work independently and in cross-functional teams
  • Excellent communication skills - both written and verbal
  • Experience working in Agile/Scrum environments

Preferred Qualifications

  • AWS Certification (e.g., Solutions Architect, DevOps Engineer)
  • Experience with service mesh (Istio, Linkerd)
  • Familiarity with GitOps tools (ArgoCD, Flux)
  • Exposure to CI/CD tools like Jenkins, GitLab CI, or CircleCI
(ref:hirist.tech)

HIC Global Solutions

Information Technology

Los Angeles
