Agentic AI Ops Engineer - Serverless & CI/CD (AWS)

2 years

0 Lacs

Posted: 2 months ago | Platform: LinkedIn

Work Mode

Remote

Job Type

Full Time

Job Description

This isn't your typical DevOps role. This is your chance to engineer the backbone of a next-gen AI-powered SaaS platform, where modular agents drive dynamic UI experiences, all running on a serverless AWS infrastructure with a Salesforce and SaaS-native backend. We're not building features; we're building an intelligent agentic ecosystem. If you've led complex multi-cloud builds, automated CI/CD pipelines with Terraform, and debugged AI systems in production, this is your arena.

About Us

We're a forward-thinking organization on a mission to reshape how businesses leverage cloud technologies and AI. Our approach is centered around delivering high-impact solutions that unify platforms across AWS, enterprise SaaS, and Salesforce. We don't just deliver software; we craft robust product ecosystems that redefine user interactions, streamline processes, and accelerate growth for our clients.

The Role

We are seeking a hands-on Agentic AI Ops Engineer who thrives at the intersection of cloud infrastructure, AI agent systems, and DevOps automation. In this role, you will build and maintain the CI/CD infrastructure for Agentic AI solutions using Terraform on AWS, while also developing, deploying, and debugging intelligent agents and their associated tools. This position is critical to ensuring scalable, traceable, and cost-effective delivery of agentic systems in production environments.

The Responsibilities

CI/CD Infrastructure for Agentic AI
- Design, implement, and maintain CI/CD pipelines for Agentic AI applications using Terraform, AWS CodePipeline, CodeBuild, and related tools.
- Automate deployment of multi-agent systems and associated tooling, ensuring version control, rollback strategies, and consistent environment parity across dev/test/prod.

Agent Development & Debugging
- Collaborate with ML/NLP engineers to develop and deploy modular, tool-integrated AI agents in production.
- Lead the effort to create debuggable agent architectures, with structured logging, standardized agent behaviors, and feedback integration loops.
- Build agent lifecycle management tools that support quick iteration, rollback, and debugging of faulty behaviors.

Monitoring, Tracing & Reliability
- Implement end-to-end observability for agents and tools, including runtime performance metrics, tool invocation traces, and latency/accuracy tracking.
- Design dashboards and alerting mechanisms to capture agent failures, degraded performance, and tool bottlenecks in real time.
- Build lightweight tracing systems that help visualize agent workflows and simplify root cause analysis.

Cost Optimization & Usage Analysis
- Monitor and manage cost metrics associated with agentic operations, including API call usage, toolchain overhead, and model inference costs.
- Set up proactive alerts for usage anomalies, implement cost dashboards, and propose strategies for reducing operational expenses without compromising performance.

Collaboration & Continuous Improvement
- Work closely with product, backend, and AI teams to evolve the agentic infrastructure design and tool orchestration workflows.
- Drive the adoption of best practices for Agentic AI DevOps, including retraining automation, secure deployments, and compliance in cloud-hosted environments.
- Participate in design reviews, postmortems, and architectural roadmap planning to continuously improve reliability and scalability.

Requirements

- 2+ years of experience in DevOps, MLOps, or Cloud Infrastructure with exposure to AI/ML systems.
- Deep expertise in AWS serverless architecture, including hands-on experience with:
  - AWS Lambda: function design, performance tuning, cold-start optimization.
  - Amazon API Gateway: managing REST/HTTP APIs and integrating with Lambda securely.
  - Step Functions: orchestrating agentic workflows and managing execution states.
  - S3, DynamoDB, EventBridge, SQS: event-driven and storage patterns for scalable AI systems.
- Strong proficiency in Terraform to build and manage serverless AWS environments using reusable, modular templates.
- Experience deploying and managing CI/CD pipelines for serverless and agent-based applications using AWS CodePipeline, CodeBuild, CodeDeploy, or GitHub Actions.
- Hands-on experience with agent and tool development in Python, including debugging and performance tuning in production.
- Solid understanding of IAM roles and policies, VPC configuration, and least-privilege access control for securing AI systems.
- Deep understanding of monitoring, alerting, and distributed tracing systems (e.g., CloudWatch, Grafana, OpenTelemetry).
- Ability to manage environment parity across dev, staging, and production using automated infrastructure pipelines.
- Excellent debugging, documentation, and cross-team communication skills.

Benefits

- Health insurance, PTO, and leave time
- Ongoing paid professional training and certifications
- Fully remote work opportunity
- Strong onboarding and training program
- Work timings: 1 pm to 10 pm IST

Next Steps

We're looking for someone who already embodies the spirit of a boundary-breaking AI Technologist, someone who's ready to own ambitious projects and push the boundaries of what LLMs can do.

- Apply Now: Send us your resume and answer a few key questions about your experience and vision.
- Show Us Your Ingenuity: Be prepared to talk shop on your boldest AI solutions and how you overcame the toughest technical hurdles.
- Collaborate & Ideate: If selected, you'll workshop a real-world scenario with our team, so we can see firsthand how your mind works.

This is your chance to leave a mark on the future of AI, one LLM agent at a time. We're excited to hear from you!

Our Belief

We believe extraordinary things happen when technology and human creativity unite. By empowering teams with generative AI, we free them to focus on meaningful relationships, innovative solutions, and real impact. It's more than just code; it's about sparking a revolution in how people interact with information, solve problems, and propel businesses forward. If this resonates with you, if you're driven, daring, and ready to build the next wave of AI innovation, then let's do this. Apply now and help us shape the future.

About Expedite Commerce

At Expedite Commerce, we believe that people achieve their best when technology enables them to build relationships and explore new ideas. So we build systems that free you up to focus on your customers and drive innovations. We have a great commerce platform that changes the way you do business! See more about us at expeditecommerce.com. You can also read about us on https://www.g2.com/products/expedite-commerce/reviews, and on Salesforce AppExchange/ExpediteCommerce.

EEO Statement

All qualified applicants to Expedite Commerce are considered for employment without regard to race, color, religion, age, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other protected characteristic.

Expedite Commerce

E-commerce Solutions

Commerce City
