About Plum
Plum is an employee insurance and health benefits platform focused on making health insurance simple, accessible and inclusive for modern organizations. Healthcare in India is undergoing a major shift, with healthcare costs rising at 3x the rate of general inflation. A majority of Indians cannot afford health insurance on their own, so as many as 600 million Indians will likely have to depend on employer-sponsored insurance.
Plum is on a mission to provide the highest quality insurance and healthcare to 10 million lives by FY2030, through companies that care. Plum is backed by Tiger Global and Peak XV Partners.
Position Overview
We are seeking an experienced Senior DevOps Engineer with 5+ years of experience to lead our cloud infrastructure and DevOps initiatives. This role is critical in scaling our platform to serve millions of users while maintaining the highest standards of security, reliability, and performance in the healthcare domain.
Key Responsibilities
Infrastructure & Container Management
- Design, implement, and upgrade enterprise-grade container infrastructure including Kubernetes clusters, node pools, and service mesh architectures
- Lead the migration and optimization of legacy systems to cloud-native containerized solutions
- Create, maintain, and optimize deployment manifest files for microservices using Helm charts and advanced templating
- Architect and implement multi-environment Kubernetes strategies (dev, staging, production) with proper resource allocation and security boundaries
- Troubleshoot complex container infrastructure issues and implement preventive measures
CI/CD & Automation Leadership
- Design and maintain sophisticated CI/CD pipelines using ArgoCD for GitOps-based deployments
- Lead the implementation of advanced pipeline strategies including blue-green deployments, canary releases, and feature flagging
- Develop comprehensive automation frameworks using multiple scripting languages (Groovy, Go, Python, Shell, PowerShell)
- Implement and enforce software development best practices including quality gates, automated testing, vulnerability scanning, and penetration testing integration
- Mentor junior team members on DevOps best practices and tooling
DevSecOps & Security
- Champion DevSecOps practices by integrating security tools and processes throughout the software development lifecycle
- Implement comprehensive security scanning, compliance monitoring, and vulnerability management workflows
- Design and maintain secure artifact repositories using Nexus and JFrog Artifactory with proper access controls and vulnerability scanning
- Ensure HIPAA and healthcare data compliance requirements are met across all infrastructure components
- Lead security incident response and post-mortem activities
Observability & Performance Engineering
- Design and implement comprehensive observability solutions using distributed tracing (Jaeger), service mesh monitoring (Kiali), and cloud-native monitoring tools
- Architect centralized logging solutions using Elastic Stack (Elasticsearch, Logstash, Kibana) and Fluentd for high-volume healthcare data
- Implement advanced monitoring and alerting strategies using Prometheus, Grafana, Datadog, and New Relic with custom dashboards and SLA tracking
- Optimize application and infrastructure performance based on observability insights
- Design disaster recovery and business continuity strategies
Infrastructure as Code & Cloud Architecture
- Lead infrastructure design and implementation using Terraform and CloudFormation for multi-cloud environments
- Architect scalable, fault-tolerant cloud solutions capable of handling healthcare workloads with strict uptime requirements
- Design and implement auto-scaling strategies for handling variable healthcare enrollment periods and claim processing loads
- Manage complex networking configurations including VPCs, load balancers, CDNs, and firewall rules for high-volume traffic
Required Qualifications
Experience & Education
- 4+ years of hands-on experience in Cloud & DevOps engineering with a proven track record of scaling production systems
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field
- Experience in healthcare, fintech, or other regulated industries is highly preferred
Technical Expertise
- Cloud Platforms: Expert-level proficiency in at least one major cloud platform:
  - GCP: Compute Engine, IAM, VPC, Cloud Storage, Cloud Functions, Cloud SQL, GKE, Pub/Sub, Operations Suite, Cloud Security Command Center
  - AWS: EC2, IAM, VPC, S3, Lambda, RDS, EKS, SNS, CloudWatch, CloudTrail, AWS Config
- Container Orchestration: Advanced Kubernetes experience including custom operators, RBAC, network policies, and multi-cluster management
- Infrastructure as Code: Expert proficiency in Terraform with experience in complex, multi-environment deployments
- CI/CD Tools: Advanced experience with Jenkins, GitLab CI, ArgoCD, and GitOps methodologies
- Monitoring & Observability: Hands-on experience with the full observability stack and custom metric development
- Programming: Strong scripting and automation skills in multiple languages
Preferred Qualifications
- Must Have: Certification in cloud platforms (GCP Professional Cloud Architect, AWS Solutions Architect)
- Experience with service mesh technologies (Istio, Linkerd)
- Knowledge of healthcare compliance standards (HIPAA, SOC 2)
- Experience with chaos engineering and reliability practices
- Experience with FinOps and cloud cost optimization
- Background in microservices architecture and API gateway management
Leadership & Soft Skills
- Proven ability to lead technical initiatives and mentor junior engineers
- Experience with incident management and on-call responsibilities
- Strong communication skills for collaborating with cross-functional teams
- Experience with agile methodologies and project management
- Ability to work in a fast-paced startup environment while maintaining attention to detail