As a Senior DevOps Engineer, you'll lead the design, automation, and optimization of our cloud and Kubernetes-based systems. You'll ensure our microservices architecture is secure, observable, and highly available, empowering our engineering teams to deliver quickly and confidently.
Core responsibilities include the following:
Infrastructure and Automation:
- Architect, build, and manage multi-cloud infrastructure (GCP, AWS, Azure) using Terraform and Helm.
- Own end-to-end Kubernetes infrastructure (GKE, EKS, AKS) for our microservices ecosystem.
- Develop and maintain Helm charts, YAML configurations, and reusable infrastructure templates.
- Build CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, Argo CD) that enable automated, zero-downtime deployments.
- Implement infrastructure-as-code and automate environment provisioning, scaling, and monitoring.
Monitoring, Observability, and Reliability:
- Implement and maintain Prometheus, Grafana, Loki, and Alertmanager for monitoring and alerting.
- Define and track SLOs/SLIs, and build real-time dashboards for system performance and health.
- Automate incident detection, alerting, and post-mortem analysis.
- Continuously improve performance, uptime, and scalability through automation and tuning.
Security and Compliance:
- Implement IAM best practices, network segmentation, and secrets management (Vault, Secret Manager).
- Conduct vulnerability scanning, container image hardening, and compliance checks.
- Ensure infrastructure adheres to security-by-design principles, with data encryption and policy enforcement.
- Integrate security scanning into CI/CD pipelines.
Cloud and SaaS Reliability:
- Manage load balancers, DNS, traffic routing, and autoscaling for high-availability SaaS deployments.
- Design disaster recovery, backup, and failover strategies.
- Optimize resource utilization and implement FinOps practices for cloud cost efficiency.
- Work closely with development teams to ensure smooth releases and robust observability.
Requirements:
- 5+ years of experience in DevOps / SRE roles for SaaS or large-scale production systems.
- Strong expertise in GCP, with working knowledge of AWS and Azure.
- Deep hands-on experience with Kubernetes, Helm, and microservices deployments.
- Strong command of Terraform, Docker, Linux administration, and networking fundamentals.
- Proven experience building CI/CD pipelines and automation workflows.
- Proficiency with Prometheus, Grafana, Loki, and Alertmanager.
- Strong understanding of cloud security, IAM policies, firewall rules, and encryption practices.
- Excellent troubleshooting, debugging, and root cause analysis skills.