AI - LLMOps Engineer

Concentrix

5 - 10 years

30 - 37 Lacs

Chennai, Pune, Delhi, Mumbai, Bengaluru, Hyderabad, Kolkata

Posted: 2 months ago

Skills Required

Computer science, Automation, Operational excellence, Market research, Data quality, Troubleshooting, Resource management, Continuous improvement, Operations, Analytics

Work Mode

Work from Office

Job Type

Full Time

Job Description

We are seeking a skilled LLMOps Engineer with expertise in operationalizing Generative AI solutions to join our AI Engineering Center of Excellence. This role will focus on establishing robust infrastructure, deployment pipelines, and monitoring systems to ensure the reliable, secure, and scalable delivery of LLM-based applications in production environments. The LLMOps Engineer will work closely with AI Tech Leads and Senior Engineers to bridge the gap between development and production deployment of GenAI solutions.

Primary Responsibilities

Design and implement infrastructure and deployment pipelines for large language model (LLM) applications in production environments
Establish monitoring, observability, and logging systems for GenAI applications to ensure performance, reliability, and data quality
Develop automated testing frameworks specific to LLM applications, including evaluation of model outputs and prompt effectiveness
Implement version control systems for models, prompts, and configurations to ensure reproducibility and traceability
Create and maintain CI/CD pipelines for seamless deployment of GenAI solutions
Optimize infrastructure and implementations for cost efficiency, considering compute resources and API usage
Implement security controls and compliance measures specific to GenAI applications
Collaborate with development teams to establish best practices for transitioning GenAI solutions from prototype to production
Automate feedback loops for continuous improvement of deployed models
Document operational procedures, architecture decisions, and maintenance protocols

Required Qualifications

5+ years of experience in DevOps, platform engineering, or related roles, with at least 2 years focused on ML/AI systems
Hands-on experience with cloud infrastructure and services for AI workloads (AWS, Azure, GCP)
Strong programming skills in languages commonly used for infrastructure and automation (Bash, YAML)
Experience with containerization and orchestration technologies (Docker, Kubernetes) for AI workloads
Knowledge of LLM deployment patterns and associated infrastructure requirements
Familiarity with monitoring tools and techniques for AI systems (e.g., model performance, drift detection, cost tracking)
Understanding of CI/CD principles and experience implementing automated pipelines
Experience with infrastructure-as-code tools (Terraform, CloudFormation, etc.)
Basic understanding of LLM architectures and their operational requirements
Bachelor's degree in Computer Science, Engineering, or a related technical field

Preferred Skills

Experience deploying and managing production LLM applications at scale
Knowledge of vector database operations and optimization for RAG implementations
Familiarity with API gateway management and rate limiting strategies
Experience with distributed tracing and debugging complex AI systems
Understanding of data privacy, security, and compliance considerations for GenAI applications
Knowledge of cost optimization techniques for LLM inference and embedding generation
Experience with feature flagging and A/B testing frameworks for AI applications
Familiarity with LLM evaluation metrics and automated testing approaches
Experience with GPU resource management and optimization

Success Factors

Strong technical curiosity and willingness to explore new GenAI capabilities
Balance between operational excellence and enabling rapid innovation
Strong problem-solving skills for troubleshooting complex production issues
Effective communication with technical and non-technical stakeholders
Proactive approach to identifying and mitigating operational risks
Ability to translate business requirements into operational specifications
Commitment to continuous improvement of operational processes
Adaptability to rapidly evolving GenAI technologies and deployment patterns