Key Responsibilities:
- Develop, deploy, and maintain scalable and highly available systems on Kubernetes.
- Design and implement automation processes for system deployments and scaling.
- Monitor system performance, troubleshoot issues, and ensure continuous system improvements.
- Collaborate with development teams to enhance the infrastructure required for their needs, including CI/CD pipelines.
- Respond to and resolve operational incidents, providing comprehensive incident reports and leading post-mortems.
- Manage code deployments, fixes, updates, and related processes on multiple environments.
High-Value Professional Experience and Skills:
Design of infrastructure migration projects from on-prem to cloud
Proven expertise in partnering with and leading technology teams to solve complex business needs
Cloud architecture design and implementation to solve key business needs and meet team goals
In-depth knowledge of AWS solutions
Infrastructure-as-Code (IaC) tools (Prefer Terraform; Related: Ansible, Puppet, ARM templates)
Automated CI/CD pipelines (Prefer GitHub Actions; Related: Jenkins, Argo CD)
Containerized workloads (Prefer AKS & Helm; Related: EKS, other K8s distributions, Docker, JFrog)
Serverless solutions (e.g. Logic Apps, Function Apps, Functions, WebJobs, AWS Lambda)
Logging and monitoring tools (e.g., Amazon CloudWatch, AWS CloudTrail or Fluentd, Prometheus, Grafana)
Other Desirable Professional Experience and Skills:
Strong and enthusiastic technologist, able to demonstrate broad technical cloud knowledge
Ability to act as a point of expertise, sharing knowledge and advising on best practices
Strong budgeting/finance skills and experience with cost management
Multi-component system integration and troubleshooting
Performance analysis and tuning
Kubernetes service meshes (Prefer Linkerd; Related: Istio, Traefik mesh)
Coding/scripting (e.g., Linux/Bash/Sh, Windows/PowerShell/Batch, Python, Java)
Load balancing and service proxies (e.g., Nginx, Traefik, HAProxy, F5)
Other products in use: Jira, Confluence, MySQL Workbench, Maven
Basic Professional Experience and Skills:
Knowledge of SDLC and change control, and associated procedures
Common source code control tools and repositories (e.g., Git/GitHub/GitLab, VS Code, SVN)
Ability to describe and discuss technical solutions with various audiences
Required Education and Length of Professional Experience:
4+ years of professional experience in SRE roles on a major cloud platform
BSc/BE/MCA in Engineering/Computer Science/IT, advanced degree preferred
Industry certification for Cloud Developers/Architects (Prefer AWS; Related: GCP)
We are looking for a candidate who is not only technical but also has a keen eye for automation and efficiency. If you are passionate about system reliability and have a proactive approach to challenges, we would like to meet you.
Benefits:
- Group Health Insurance Policy (covering self and family)
- Competitive salary and benefits package
- Opportunities for career advancement and professional growth
- Collaborative and inclusive work environment
- Flexible work arrangements
- Health and wellness programs
- Group Life insurance/accident policy
- Generous long-service awards
- New Baby gift
- Rewards and Certifications