
LLM Ops Engineer - Serverless & CI/CD (AWS)

2 - 7 years

4 - 9 Lacs

Posted: 12 hours ago | Platform: Naukri


Work Mode

Work from Office

Job Type

Full Time

Job Description

This isn't your average DevOps role. This isn't just about pipelines or cloud provisioning. This is about engineering the backbone of Agentic AI systems that drive the next generation of enterprise SaaS, where conversational interfaces, dynamic UIs, and intelligent agents operate seamlessly on AWS Serverless infrastructure, with deep integration into Salesforce and cross-agent protocols.

This is for builders with something to prove. For engineers who've gone beyond cloud fluency to orchestrate complex, multi-agent ecosystems, and who want to shape how enterprise applications are deployed, debugged, scaled, and observed in real time. If you're driven by deep automation, passionate about creating fault-tolerant agentic systems, and thrive where innovation is the expectation, not the exception, you're in the right place. Join us to redefine SaaS infrastructure and champion a new era of AI-powered, product-led enterprise experiences.

The Role

We are seeking a hands-on Agentic AI Ops Engineer who thrives at the intersection of cloud infrastructure, AI agent systems, and DevOps automation. In this role, you will build and maintain the CI/CD infrastructure for Agentic AI solutions using Terraform on AWS, while also developing, deploying, and debugging intelligent agents and their associated tools. This position is critical to ensuring scalable, traceable, and cost-effective delivery of agentic systems in production environments.

The Responsibilities

CI/CD Infrastructure for Agentic AI

  • Design, implement, and maintain CI/CD pipelines for Agentic AI applications using Terraform, AWS CodePipeline, CodeBuild, and related tools.
  • Automate deployment of multi-agent systems and associated tooling, ensuring version control, rollback strategies, and consistent environment parity across dev/test/prod.

Agent Development & Debugging

  • Collaborate with ML/NLP engineers to develop and deploy modular, tool-integrated AI agents in production.
  • Lead the effort to create debuggable agent architectures, with structured logging, standardized agent behaviors, and feedback integration loops.
  • Build agent lifecycle management tools that support quick iteration, rollback, and debugging of faulty behaviors.

Monitoring, Tracing & Reliability

  • Implement end-to-end observability for agents and tools, including runtime performance metrics, tool invocation traces, and latency/accuracy tracking.
  • Design dashboards and alerting mechanisms to capture agent failures, degraded performance, and tool bottlenecks in real time.
  • Build lightweight tracing systems that help visualize agent workflows and simplify root cause analysis.

Cost Optimization & Usage Analysis

  • Monitor and manage cost metrics associated with agentic operations, including API call usage, toolchain overhead, and model inference costs.
  • Set up proactive alerts for usage anomalies, implement cost dashboards, and propose strategies for reducing operational expenses without compromising performance.

Collaboration & Continuous Improvement

  • Work closely with product, backend, and AI teams to evolve the agentic infrastructure design and tool orchestration workflows.
  • Drive the adoption of best practices for Agentic AI DevOps, including retraining automation, secure deployments, and compliance in cloud-hosted environments.
  • Participate in design reviews, postmortems, and architectural roadmap planning to continuously improve reliability and scalability.

  • 2+ years of experience in DevOps, MLOps, or Cloud Infrastructure with exposure to AI/ML systems.
  • Deep expertise in AWS serverless architecture, including hands-on experienc

Expedite Commerce

E-commerce Solutions

Commerce City

50-100 Employees

9 Jobs

Key People

  • John Doe, CEO
  • Jane Smith, CTO
