Senior AI Cloud Operations Engineer

5 - 9 years

0 Lacs

Posted: 1 week ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

As a Senior AI Cloud Operations Engineer, you will play a critical role in designing and managing scalable cloud infrastructure tailored for AI workloads. Your expertise in AI technologies will ensure the scalability, reliability, and performance of our AI systems, enabling the delivery of cutting-edge solutions.

Your responsibilities will include:

- Cloud Architecture Design: Design, architect, and manage scalable cloud infrastructure for AI workloads using platforms like AWS, Azure, or Google Cloud.
- System Monitoring and Optimization: Implement monitoring solutions for high availability and swift performance with tools like Prometheus, Grafana, or CloudWatch.
- Collaboration and Model Deployment: Work with data scientists to operationalize AI models, ensuring seamless integration with existing systems using tools like MLflow or TensorFlow Serving.
- Automation and Orchestration: Develop automated deployment pipelines with tools like Kubernetes and Terraform to streamline operations.
- Security and Compliance: Ensure cloud operations adhere to security best practices and compliance standards such as GDPR or HIPAA.
- Documentation and Reporting: Maintain detailed documentation of cloud configurations and operational metrics for transparency.
- Performance Tuning: Assess and optimize cloud resource utilization strategies to reduce costs effectively.
- Issue Resolution: Identify and resolve technical issues swiftly to minimize downtime and maximize uptime.

Qualifications:

- Educational Background: Bachelor's degree in Computer Science, Engineering, or related field. Master's degree preferred.
- Professional Experience: 5+ years of experience in cloud operations within AI environments.

Technical Expertise:

- Deep knowledge of cloud platforms (AWS, Azure, Google Cloud) and their AI-specific services.
- AI/ML Proficiency: Understanding of AI/ML frameworks like TensorFlow, PyTorch, and experience in ML model lifecycle management.
- Infrastructure as Code: Proficiency in tools like Terraform and AWS CloudFormation for cloud deployment automation.
- Containerization and Microservices: Experience in managing Docker containers and orchestrating services with Kubernetes.
- Soft Skills: Strong analytical, problem-solving, and communication skills for effective collaboration with cross-functional teams.

Preferred Qualifications:

- Advanced certifications in cloud services (e.g., AWS Certified Solutions Architect, Google Cloud Professional Data Engineer).
- Experience in advanced AI techniques such as deep learning or reinforcement learning.
- Knowledge of emerging AI technologies to drive innovation within existing infrastructure.

Recommended Jobs for You