Role Description
Job Title: Site Reliability Engineer / DevOps Engineer
Overview
We are seeking a Site Reliability Engineer (SRE) or DevOps Engineer with 5+ years of experience to join our engineering team. The ideal candidate will have strong expertise in automation, cloud infrastructure, container orchestration, and modern DevOps practices. You'll work closely with development teams to build and run reliable, scalable, and secure systems in a fast-paced environment.
Key Responsibilities
- Design, build, and maintain reliable, scalable, and secure cloud-based infrastructure (AWS, Azure, or GCP).
- Develop and enhance observability through monitoring, logging, alerting, and tracing tools (e.g., Prometheus, Grafana, ELK, Datadog).
- Automate infrastructure and operational tasks using Infrastructure-as-Code tools (Terraform, CloudFormation, Pulumi).
- Create and maintain CI/CD pipelines (e.g., GitHub Actions, GitLab CI, Jenkins, ArgoCD) to support fast and safe delivery.
- Lead incident response efforts, conduct root cause analysis, and drive postmortem reviews to ensure high availability.
- Collaborate with engineering teams to define and meet SLAs/SLOs, improving overall service reliability.
- Optimize system performance, reliability, and cost-efficiency through proactive monitoring and resource tuning.
- Implement and maintain security best practices (e.g., secrets management, IAM, firewall rules, compliance).
- Maintain disaster recovery strategies, backup procedures, and high-availability configurations.
Required Qualifications
- 5+ years of experience in an SRE, DevOps, or similar engineering role.
- Proficiency in scripting and automation using Bash, Python, Go, or similar languages.
- Hands-on experience with containerization and orchestration tools (Docker, Kubernetes, Helm).
- Strong understanding of Linux systems administration and networking fundamentals.
- Experience working with cloud platforms such as AWS, Azure, or GCP.
- Expertise in Infrastructure-as-Code tools (Terraform, CloudFormation).
- Familiarity with GitOps principles and modern deployment methodologies.
- Experience with observability tools (Prometheus, Grafana, Datadog, ELK, etc.).
- Strong problem-solving, troubleshooting, and incident response capabilities.
Preferred Qualifications
- Experience in high-traffic, microservices-based architectures.
- Exposure to service meshes (Istio, Linkerd).
- Relevant certifications, e.g., AWS Certified DevOps Engineer or Certified Kubernetes Administrator (CKA).
- Experience with security automation, compliance frameworks (SOC 2, ISO 27001), and related tooling.
Soft Skills
- Strong communication and collaboration abilities across teams.
- Ability to thrive in a fast-paced, agile, and dynamic environment.
- Analytical mindset with a proactive approach to problem-solving.
- Passion for automation, performance optimization, and modern system design.
Skills
Kubernetes, Cloud Platforms, Python Scripting, SRE