Job Summary
The Site Reliability Engineer (SRE) is a core member of the IT Infrastructure team and plays a critical role in ensuring the reliability, scalability, and performance of our enterprise software systems. The ideal candidate will have deep expertise in AWS cloud services, production deployment methodologies, and infrastructure automation, and will work collaboratively across multiple technical teams to deliver robust, scalable solutions.
Key Responsibilities
- Infrastructure & Cloud Operations
  - Design, implement, and maintain highly available, scalable infrastructure on the AWS cloud platform
  - Manage AWS services including EC2, RDS, S3, VPC, CloudFormation, Lambda, ECS/EKS, and monitoring services
  - Optimize cloud resource utilization and cost management strategies
  - Ensure security best practices and compliance across cloud infrastructure
- Production Deployment & CI/CD
  - Lead production deployment processes for enterprise software applications
  - Design and implement robust CI/CD pipelines using tools such as Jenkins, GitLab CI, AWS CodePipeline, or similar platforms
  - Establish deployment strategies including blue-green deployments, canary releases, and rollback procedures
  - Monitor and troubleshoot production systems to ensure minimal downtime and optimal performance
- Infrastructure as Code & Automation
  - Develop and maintain infrastructure as code using tools like Terraform, CloudFormation, or AWS CDK
  - Create automation scripts and tools to reduce manual operational overhead
  - Implement configuration management using tools such as Ansible, Puppet, or Chef
  - Build self-healing systems and automated monitoring solutions
- Scripting & Programming
  - Write efficient scripts in Python, Bash, Go, or other relevant programming languages
  - Develop tools for system monitoring, alerting, and operational efficiency
  - Contribute to internal tooling and automation frameworks
  - Debug and optimize existing automation and deployment scripts
- Networking & Security
  - Configure and manage cloud networking components including VPCs, subnets, security groups, and load balancers
  - Implement network security best practices and troubleshoot connectivity issues
  - Manage DNS, CDN, and other network services
  - Ensure proper network segmentation and access controls
- Collaboration & Communication
  - Work closely with DevOps, Database Administrators, System Administrators, and Software Development teams
  - Participate in the on-call rotation and incident response procedures
  - Lead post-incident reviews and implement preventive measures
  - Communicate technical concepts clearly to both technical and non-technical stakeholders
Required Skills And Experience
- Minimum 3 years of experience in Site Reliability Engineering, DevOps, or a similar role
- 5+ years preferred with demonstrated progression in responsibility and technical expertise
- Extensive hands-on experience with AWS cloud services and SysOps operations
- Proven track record in production deployment of enterprise software systems
- Strong understanding of CI/CD concepts and implementation experience
- Proficiency in infrastructure as code tools and methodologies
- Advanced scripting abilities in Python, Bash, Go, or similar programming languages
- Solid understanding of cloud networking concepts, security groups, VPCs, and load balancing
- Experience with containerization technologies (Docker, Kubernetes)
- Knowledge of monitoring and observability tools (CloudWatch, Prometheus, Grafana, ELK stack)
- Familiarity with database administration and performance optimization
- Understanding of security best practices and compliance frameworks
- Excellent professional written and spoken English communication skills
- Strong analytical and problem-solving abilities
- Experience working in cross-functional team environments
- Ability to work independently and manage multiple priorities effectively
- Customer-focused mindset with attention to detail
Good To Have
- AWS certifications (Solutions Architect, SysOps Administrator, or DevOps Engineer)
- Experience with microservices architecture and serverless technologies
- Knowledge of disaster recovery and business continuity planning
- Background in performance tuning and capacity planning
- Experience with agile development methodologies
- Previous experience in enterprise environments with high availability requirements
Educational Qualifications
Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent experience).
Location:
Akasa Air Head Office - Mumbai
Akasa Air does not solicit or accept any form of payment from candidates or institutions during its recruitment process. Any such claims are fraudulent and should be disregarded. Individuals engaging with unauthorized entities do so at their own risk. We encourage you to report any such incidents to info@akasaair.com for appropriate action.