Posted: 19 hours ago | Platform: LinkedIn

Work Mode: On-site

Job Type: Full Time

Job Description

About Infinova


Infinova is an emerging player in intelligent business transformation, dedicated to helping organizations scale smarter and achieve sustainable success. We are building a foundation that combines strategic consultancy, financial expertise, and technology-driven solutions to deliver measurable growth and operational efficiency.


Our services include AI-powered business consultancy, talent solutions, and advanced technology development, enabling businesses to convert data into actionable intelligence, optimize performance, and embrace innovation. With a commitment to transparency, quality, and future-ready strategies, Infinova ensures every partnership delivers lasting impact.


About The Role


The Site Reliability Engineer will play a key role in designing, deploying, and supporting highly available AI platform environments across Azure, AWS, and Google Cloud. This role is focused on ensuring secure, scalable, and reliable operations for cloud-native and AI-driven workloads.


The ideal candidate will collaborate directly with cloud engineers, product teams, and customer stakeholders to optimize infrastructure, automate CI/CD, and deploy cloud resources that support AI products and enterprise platforms. The successful candidate will have experience in cloud deployment operations, orchestration, automation, and performance monitoring for production systems.


Key Responsibilities


• Lead deployment design, planning, and configuration for cloud platforms, including Azure, AWS, GCP, and Kubernetes environments.


• Optimize cloud architectures for availability, scalability, cost efficiency, and performance while aligning with cloud security and operational best practices.


• Craft and maintain automation using IaC frameworks such as Terraform, Ansible, and CloudFormation.


• Deploy CI/CD pipelines to automate build, test, and release processes for cloud-based AI platforms and services.


• Ensure platform availability and reliability by configuring alerting, monitoring, and response systems.


• Provide ongoing support and maintenance for cloud infrastructure, identifying and resolving incidents to ensure high system uptime.


• Perform detailed analysis of failures, conduct root cause analysis, and implement corrective and preventive actions.


• Establish end-to-end monitoring frameworks, dashboards, and observability for AI workloads and cloud deployments.


• Conduct regular security reviews, threat mitigation, and compliance validation for multi-cloud environments.


• Work closely with development, QA, and product teams to enhance service delivery and operational workflows.


• Share best practices, documentation, and knowledge to elevate team capability and platform reliability.


Skills and Experience


• Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.


• Expertise in Azure, AWS, and Google Cloud platform services and operational models.


• Strong knowledge of Linux, virtualization, and container runtimes including Docker and Kubernetes.


• Deep understanding of networking, security, access control, and compliance frameworks within multi-cloud environments.


• Proficiency in IaC tools (Terraform, CloudFormation), configuration tools (Puppet, Chef, Helm), and scripting (Python, Bash, PowerShell).


• Experience with CI/CD tools such as GitHub Actions or Jenkins, and monitoring tools such as Prometheus, ELK, or Splunk.


• Strong diagnostic and troubleshooting skills, with the ability to support mission-critical deployments.


• Excellent communication skills and the ability to collaborate across engineering, product, and customer teams.


• Fluent in English and comfortable working in cloud-based distributed environments.
