Key Responsibilities
Infrastructure Setup and Management
- Design and deploy high-availability cloud and on-prem infrastructure environments.
- Implement and manage Kubernetes clusters (EKS / AKS / self-hosted).
- Configure VPCs, subnets, firewalls and load balancers for secure network architecture.
- Set up and optimize databases (PostgreSQL, MySQL, MongoDB) and caching systems (Redis).
- Manage storage systems (S3, Azure Blob, NFS) for scalability and durability.
Automation and Deployment
- Develop Infrastructure as Code (IaC) using Terraform, CloudFormation, or ARM templates.
- Build and maintain CI/CD pipelines with Jenkins, GitHub Actions, or Azure DevOps.
- Automate microservices deployment using Helm, ArgoCD, or Ansible.
- Enable environment versioning and rollback strategies for all deployments.
Monitoring and Reliability
- Deploy and maintain observability stacks such as Prometheus, Grafana, ELK, or Loki.
- Implement centralized logging, health checks and alerting mechanisms.
- Conduct load testing, capacity planning and performance optimization.
- Ensure infrastructure uptime targets and reliability SLAs are met.
Security and Compliance
- Configure and manage Identity and Access Management (IAM / RBAC).
- Implement encryption, TLS/SSL and network security best practices.
- Deploy Web Application Firewalls (WAF) and intrusion detection systems.
- Maintain compliance with organizational or regulatory standards (ISO, SOC 2, GDPR).
Disaster Recovery and Business Continuity
- Design and manage DR and backup strategies across regions or environments.
- Automate snapshots, replication and failover recovery mechanisms.
- Conduct periodic DR drills and recovery testing.
Integration and Networking
- Configure secure connectivity between cloud and on-prem environments (VPN, Direct Connect, ExpressRoute).
- Manage hybrid integrations and service interconnectivity.
- Support middleware, API and data integration platforms as required.
Documentation and Process Management
- Maintain architecture diagrams, network topology and SOPs for deployments.
- Document DR, backup and scaling procedures.
- Provide internal training and knowledge transfer to DevOps and support teams.
Technical Skills
Category: Required Skills
- Cloud Platforms: AWS (EKS, RDS, EC2, IAM, VPC), Azure (AKS, AD, Storage), On-Prem (VMware, OpenStack, Rancher)
- Infrastructure as Code: Terraform, CloudFormation, ARM Templates, Ansible
- Containerization & Orchestration: Docker, Kubernetes, Helm, ArgoCD
- CI/CD Tools: Jenkins, GitHub Actions, Azure DevOps
- Monitoring & Logging: Prometheus, Grafana, Loki, ELK, CloudWatch
- Networking & Security: VPC, VPN, WAF, IAM, TLS, Security Groups
- Databases & Storage: PostgreSQL, MySQL, Redis, Elasticsearch, S3, Azure Blob
- Automation & Scripting: Bash, Python, PowerShell
- Version Control: Git, Bitbucket
- Integration Tools (Preferred): Kafka, RabbitMQ, Apache Camel, WSO2
Qualifications
- Bachelor's or Master's degree in Computer Science, IT, or a related field.
- 4 to 6 years of experience in infrastructure engineering, cloud operations, or DevOps.
- Proven experience in managing large-scale cloud or hybrid environments.
- Strong background in container orchestration and automation.
- Familiarity with security and compliance frameworks (SOC 2, ISO 27001).
Preferred Experience
- Experience in multi-cloud architecture (AWS + Azure).
- Exposure to hybrid cloud integrations or migration projects.
- Knowledge of DR, HA and cost-optimization strategies.
- Certifications such as:
  - AWS Certified DevOps Engineer / Solutions Architect.
  - Microsoft Certified: Azure Administrator / Architect.
  - Certified Kubernetes Administrator (CKA) or CKAD.
Soft Skills
- Excellent analytical and troubleshooting skills.
- Strong documentation and communication ability.
- Capable of working in cross-functional agile teams.
- Ownership mindset with a focus on reliability and performance.