
23 Triton Jobs

Set up a job alert
JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

9.0 - 14.0 years

0 Lacs

mumbai, maharashtra, india

On-site

Location: Mumbai, India
Experience Level: 9+ Years
Minimum Qualification: Master's Degree in Computer Science, Engineering, or related field.

About the Role: We're looking for a strategic Senior MLOps Engineer to lead the end-to-end design, implementation, and scaling of our AI infrastructure. You'll partner with researchers, product teams, and DevOps to turn prototypes into production services that meet strict SLAs for latency, reliability, and cost efficiency.

Responsibilities:
- Core MLOps Pipelines: Design and implement scalable ML pipelines (training, evaluation, deployment) for LLMs, CV, and multimodal models.
- Model Serving & CI/CD: Lead efforts in model serving, versioning, automated CI/CD, and real-time monitoring of AI workflows.
- Inference-as-a-Service: Build and optimize GPU-backed serving infrastructure targeting p99 latency < 100 ms, 99.9% uptime, and > 80% GPU utilization.
- Governance & Drift Detection: Drive initiatives on model governance, automated drift detection (≤ 10% false positives), and data-management best practices.
- Vector Search & Agent Orchestration: Integrate vector databases (Qdrant, Pinecone) for low-latency semantic retrieval, and build agentic workflows using LangChain or similar frameworks.
- Enterprise Multi-Tenancy: Architect RBAC-driven, isolated ML services to securely serve 100-500+ organizations.
- Observability & Logging: Design Prometheus/Grafana dashboards, ELK/Fluentd logging pipelines, and alerting for all ML workloads.
- CI/CD for Inference APIs: Maintain CI/CD pipelines for Python (FastAPI) and TypeScript (NestJS) inference services.
- Metrics & Cost Optimization: Define and track SLAs/SLOs, optimize cloud spend by ≥ 20% year-over-year, and ensure GPU clusters operate at > 80% utilization.
- Cross-Functional Leadership: Partner with AI researchers, product managers, and legal to align MLOps standards with compliance and roadmap goals.
- Mentorship & Community: Mentor junior engineers, run quarterly brown-bags, own onboarding docs (upskill 5+ engineers/quarter), and publish ≥ 1 open-source contribution or talk annually.

Requirements:
- 9-14 years in software engineering, including ≥ 4 years in MLOps or ML infrastructure
- Strong expertise in cloud platforms (AWS/GCP/Azure), Kubernetes, Docker, Terraform, Helm, Kubeflow, and MLflow
- Experience with inference frameworks (Triton, TensorFlow Serving, BentoML, TorchServe)
- Familiarity with distributed training, workload schedulers, and GPU-cluster orchestration
- Proficiency in Python, TypeScript, and infrastructure-as-code (Terraform, Helm, etc.)
- Proven track record building reliable, scalable ML systems in production.

Plus These Critical Skills:
- Vector DB integration (Qdrant, Pinecone)
- Agent orchestration (LangChain, LlamaIndex)
- Multi-tenant security and RBAC
- Observability stacks (Prometheus/Grafana, ELK)
- CI/CD for FastAPI/NestJS services

Preferred:
- Master's/PhD in CS/AI and certifications such as AWS ML Specialty, Google Cloud Professional ML Engineer, or CNCF CKA/CKAD.
- Prior experience at AI-focused startups or enterprises scaling ML for 100-500 orgs.
- Understanding of low-latency streaming inference or agent-based LLM systems.
- Excellent written and verbal communication, and a proven ability to drive consensus across functions.
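The serving targets in this listing (p99 latency < 100 ms) are the kind of SLO normally verified against recorded request latencies. A minimal sketch of that check, with invented sample data and illustrative function names (not from any particular monitoring stack):

```python
# Hypothetical SLO check: compute a nearest-rank p99 over recorded
# latencies (in ms) and compare it to the listing's 100 ms budget.
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ranked = sorted(samples)
    k = max(0, math.ceil(pct / 100 * len(ranked)) - 1)
    return ranked[k]

def meets_slo(latencies_ms, p99_budget_ms=100.0):
    return percentile(latencies_ms, 99) < p99_budget_ms

# Made-up traffic: mostly fast, a small slow tail.
latencies = [12.0] * 980 + [85.0] * 15 + [140.0] * 5
print(percentile(latencies, 99), meets_slo(latencies))  # 85.0 True
```

The slow tail here stays below the 99th percentile, so the service still meets its budget; five more slow requests would flip the result.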

Posted 1 week ago

Apply

10.0 - 12.0 years

0 Lacs

bengaluru, karnataka, india

On-site

Oracle Cloud Infrastructure blends the speed of a startup with the scale of an enterprise leader. Our Generative AI Service team builds advanced AI solutions that run on powerful cloud infrastructure, tackling real-world, global challenges.

In this role, you will:
- Design, build, and deploy cutting-edge machine learning and generative AI systems, with a focus on Large Language Models (LLMs), AI agents, Retrieval-Augmented Generation (RAG), and large-scale search.
- Collaborate with scientists, engineers, and product teams to turn complex problems into scalable, cloud-ready AI solutions for enterprises.
- Develop models and services for decision support, anomaly detection, forecasting, recommendations, NLP/NLU, speech recognition, time series, and computer vision.
- Run experiments, explore new algorithms, and push the boundaries of AI to optimize performance, customer experience, and business outcomes.
- Ensure ethical and responsible AI practices in all solutions.

We're looking for a Principal Applied Data Scientist with deep expertise in applied ML/AI, hands-on experience building production-grade solutions, and the creativity to innovate at the intersection of AI and enterprise cloud. As a Principal Applied Data Scientist in Oracle's OCI Gen AI & AI Solutions Engineering team, you'll shape the future of enterprise AI for our most strategic customers. You'll lead the design and delivery of next-generation generative AI and machine learning solutions, leveraging cutting-edge technology and Oracle's enterprise-scale cloud to solve complex, high-impact problems in industries like finance, telecom, healthcare, and software development.

- Lead and mentor teams from concept to delivery, driving product quality and innovation.
- Design and implement high-quality code for experiments and production ML models.
- Collaborate with product and engineering teams to deliver customer-focused, market-differentiating AI solutions.
- Conduct independent R&D to advance state-of-the-art ML with a focus on fairness and explainability.

You will:
- Partner closely with customers to understand their vision, define requirements, and deliver AI solutions that remove blockers and unlock value.
- Dive deep into model architectures to maximize performance, scalability, and reliability.
- Build state-of-the-art solutions using emerging Gen AI and ML technologies.
- Configure and optimize large-scale OpenSearch clusters, including ingestion pipelines for high-volume data.
- Diagnose and resolve challenges in AI model training, deployment, and serving.
- Create reusable solution patterns and reference architectures that accelerate adoption across multiple customers.
- Act as a product evangelist, showcasing our innovations at customer meetings, industry events, and conferences.
- Collaborate with world-class scientists, engineers, and product managers in a fast-moving, high-impact environment.

Qualifications:
- Bachelor's or Master's degree in Computer Science or related technical field, with 10+ years of experience in AI, ML, or data-driven solution development.
- Proven track record designing, building, and deploying scalable AI/ML solutions in production environments.
- Deep expertise in Large Language Models (LLMs), Generative AI, agentic solutions, and advanced ML techniques (fine-tuning, prompt engineering, model optimization).
- Strong experience with OpenSearch, vector databases, data ingestion pipelines, and large-scale search optimization.
- Skilled in diagnosing, troubleshooting, and resolving issues in AI model training and serving.
- Hands-on experience with NLP, NLU, RAG architectures, agents, and modern AI frameworks (e.g., LangChain, LlamaIndex).
- Proficient in Python and shell scripting, with familiarity in deep learning frameworks (PyTorch, TensorFlow, JAX, or Transformers).
- Experience with popular model training and serving frameworks such as KServe, Kubeflow, and Triton.
- Excellent communication skills for translating complex technical concepts into clear proposals, designs, and presentations.
- Collaborative mindset with experience working closely with product managers, engineers, and customers.
- Ability to mentor and guide junior data scientists or ML engineers.
- Experience acting as a technical evangelist, presenting at conferences, customer briefings, or industry events.

Career Level: IC4
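The vector-search retrieval at the center of this role comes down to ranking documents by embedding similarity. A toy sketch, assuming embeddings are precomputed (the 2-D vectors and document IDs here are invented for illustration; real systems would use OpenSearch or a vector database):

```python
# Illustrative RAG retrieval step: rank documents by cosine similarity
# of toy embedding vectors and return the top-k matches.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, docs, k=2):
    """docs: list of (doc_id, embedding). Returns the k best ids by score."""
    scored = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

docs = [("faq", [1.0, 0.0]), ("manual", [0.9, 0.1]), ("blog", [0.0, 1.0])]
print(top_k([1.0, 0.05], docs))  # ['faq', 'manual']
```

The retrieved IDs would then be used to fetch document text that grounds the LLM's answer.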

Posted 1 week ago

Apply


3.0 - 7.0 years

14 - 18 Lacs

bengaluru

Work from Office

Your Role
- Design and implement generative AI models using Azure Machine Learning, Cognitive Services, and OpenAI APIs.
- Fine-tune and deploy models like GPT, Stable Diffusion, or custom transformers on Azure infrastructure.
- Build and optimize inference pipelines for real-time and batch generation.
- Integrate generative AI into enterprise applications and workflows.
- Collaborate with data scientists, product teams, and designers to deliver AI-powered features.
- Monitor model performance, address bias/hallucination issues, and ensure reliability.
- Stay current with generative AI research and contribute to experimentation and benchmarking.
- Ensure compliance with data privacy and ethical AI standards.

Your Profile
- Strong experience with Azure AI services (Azure OpenAI, Azure ML, Cognitive Services).
- Proficiency in Python and ML frameworks like PyTorch, TensorFlow, Hugging Face.
- Hands-on experience with LLMs, GANs, diffusion models, and transformer architectures.
- Familiarity with prompt engineering, fine-tuning, and RAG (retrieval-augmented generation).
- Experience with model deployment tools (ONNX, TorchServe, Triton).
- Knowledge of vector databases and embedding techniques.

What you'll love about working here
- You can shape your career with us. We offer a range of career paths and internal opportunities within the Capgemini group. You will also get personalized career guidance from our leaders.
- You will get comprehensive wellness benefits including health checks, telemedicine, insurance with top-ups, elder care, partner coverage, and new parent support via flexible work.
- You will have the opportunity to learn on one of the industry's largest digital learning platforms, with access to 250,000+ courses and numerous certifications.
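Batch generation, mentioned above alongside real-time inference, usually means grouping requests so the model runs on batches rather than single items. A minimal sketch of that batching step, with a stub standing in for the model (all names are illustrative, not from Azure's APIs):

```python
# Illustrative micro-batching for an inference pipeline: slice incoming
# requests into fixed-size batches and make one model call per batch.
def batched(items, batch_size):
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def run_batched_inference(requests, model, batch_size=8):
    results = []
    for batch in batched(requests, batch_size):
        results.extend(model(batch))  # one model call per batch
    return results

# Stub "model" for the sketch: returns the length of each input text.
fake_model = lambda batch: [len(text) for text in batch]
print(run_batched_inference(["hi", "hello", "hey"], fake_model, batch_size=2))
```

Swapping the stub for a real model client leaves the pipeline shape unchanged; batch size then becomes the main knob trading latency for throughput.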

Posted 1 week ago

Apply

3.0 - 5.0 years

9 - 13 Lacs

jaipur

Work from Office

Job Summary
We're seeking a hands-on GenAI & Computer Vision Engineer with 3-5 years of experience delivering production-grade AI solutions. You must be fluent in the core libraries, tools, and cloud services listed below, and able to own end-to-end model development, from research and fine-tuning through deployment, monitoring, and iteration. In this role, you'll tackle domain-specific challenges like LLM hallucinations, vector search scalability, real-time inference constraints, and concept drift in vision models.

Key Responsibilities
Generative AI & LLM Engineering
- Fine-tune and evaluate LLMs (Hugging Face Transformers, Ollama, LLaMA) for specialized tasks
- Deploy high-throughput inference pipelines using vLLM or Triton Inference Server
- Design agent-based workflows with LangChain or LangGraph, integrating vector databases (Pinecone, Weaviate) for retrieval-augmented generation
- Build scalable inference APIs with FastAPI or Flask, managing batching, concurrency, and rate-limiting

Computer Vision Development
- Develop and optimize CV models (YOLOv8, Mask R-CNN, ResNet, EfficientNet, ByteTrack) for detection, segmentation, classification, and tracking
- Implement real-time pipelines using NVIDIA DeepStream or OpenCV (cv2); optimize with TensorRT or ONNX Runtime for edge and cloud deployments
- Handle data challenges (augmentation, domain adaptation, semi-supervised learning) and mitigate model drift in production

MLOps & Deployment
- Containerize models and services with Docker; orchestrate with Kubernetes (KServe) or AWS SageMaker Pipelines
- Implement CI/CD for model/version management (MLflow, DVC), automated testing, and performance monitoring (Prometheus + Grafana)
- Manage scalability and cost by leveraging cloud autoscaling on AWS (EC2/EKS), GCP (Vertex AI), or Azure ML (AKS)

Cross-Functional Collaboration
- Define SLAs for latency, accuracy, and throughput alongside product and DevOps teams
- Evangelize best practices in prompt engineering, model governance, data privacy, and interpretability
- Mentor junior engineers on reproducible research, code reviews, and end-to-end AI delivery

Required Qualifications
You must be proficient in at least one tool from each category below:
- LLM Frameworks & Tooling: Hugging Face Transformers, Ollama, vLLM, or LLaMA
- Agent & Retrieval Tools: LangChain or LangGraph; RAG with Pinecone, Weaviate, or Milvus
- Inference Serving: Triton Inference Server; FastAPI or Flask
- Computer Vision Frameworks & Libraries: PyTorch or TensorFlow; OpenCV (cv2) or NVIDIA DeepStream
- Model Optimization: TensorRT; ONNX Runtime; Torch-TensorRT
- MLOps & Versioning: Docker and Kubernetes (KServe, SageMaker); MLflow or DVC
- Monitoring & Observability: Prometheus; Grafana
- Cloud Platforms: AWS (SageMaker, EC2/EKS), GCP (Vertex AI, AI Platform), or Azure ML (AKS, ML Studio)
- Programming Languages: Python (required); C++ or Go (preferred)

Additionally:
- Bachelor's or Master's in Computer Science, Electrical Engineering, AI/ML, or a related field
- 3-5 years of professional experience shipping both generative and vision-based AI models in production
- Strong problem-solving mindset; ability to debug issues like LLM drift, vector index staleness, and model degradation
- Excellent verbal and written communication skills

Typical Domain Challenges You'll Solve
- LLM Hallucination & Safety: Implement grounding, filtering, and classifier layers to reduce false or unsafe outputs
- Vector DB Scaling: Maintain low-latency, high-throughput similarity search as embeddings grow to millions
- Inference Latency: Balance batch sizing and concurrency to meet real-time SLAs on cloud and edge hardware
- Concept & Data Drift: Automate drift detection and retraining triggers in vision and language pipelines
- Multi-Modal Coordination: Seamlessly orchestrate data flow between vision models and LLM agents in complex workflows
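The listing names automated drift detection but no specific method; one common choice is the population stability index (PSI) over binned feature or score distributions. A hedged sketch under that assumption (histograms and threshold are invented for illustration):

```python
# Illustrative drift detection via the population stability index (PSI):
# compare a baseline histogram of bin proportions against a live one and
# alert when the score exceeds a threshold (0.2 is a common rule of thumb).
import math

def psi(expected, actual, eps=1e-6):
    """PSI between two histograms given as lists of bin proportions."""
    score = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # guard against empty bins
        score += (a - e) * math.log(a / e)
    return score

def drift_alert(expected, actual, threshold=0.2):
    return psi(expected, actual) > threshold

baseline = [0.25, 0.25, 0.25, 0.25]
shifted  = [0.10, 0.20, 0.30, 0.40]
print(round(psi(baseline, shifted), 3), drift_alert(baseline, shifted))
```

In a production pipeline, a firing alert of this kind would typically queue a retraining job rather than page a human directly.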

Posted 1 week ago

Apply

3.0 - 7.0 years

0 Lacs

bengaluru

Work from Office

Job Functions: You will be a member of our AI Platform Team, supporting the next-generation AI architecture for various research and engineering teams within the organization.
- You'll partner with vendors and the infrastructure engineering team for security and service availability
- You'll fix production issues with engineering teams, researchers, and data scientists, including performance and functional issues
- Diagnose and solve customer technical problems
- Participate in training customers and prepare reports on customer issues
- Be responsible for customer service improvements and recommend product improvements
- Write support documentation
- You'll design and implement zero-downtime monitoring to deliver a highly available service (99.999%)
- As a support engineer, find opportunities to automate as part of the problem management process, creating automation to avoid issues. Define engineering excellence for operational maturity
- You'll work together with AI platform developers to provide the CI/CD model to deploy and configure the production system automatically
- Develop and follow operational standard processes for tools and automation development, including style guides, versioning practices, source control, branching and merging patterns, and advising other engineers on development standards
- Deliver solutions that accelerate the activities phenomenal engineers would perform, through automation, deep domain expertise, and knowledge sharing

Required Skills:
- Demonstrated ability in designing, building, refactoring, and releasing software written in Python.
- Hands-on experience with ML frameworks such as PyTorch, TensorFlow, Triton.
- Ability to handle framework-related issues, version upgrades, and compatibility with data processing / model training environments.
- Experience with AI/ML model training and inferencing platforms is a big plus.
- Experience with LLM fine-tuning systems is a big plus.
- Debugging and triaging skills.
- Cloud technologies like Kubernetes, Docker, and Linux fundamentals.
- Familiarity with DevOps practices and continuous testing.
- DevOps pipelines and automation: app deployment/configuration and performance monitoring.
- Test automation, Jenkins CI/CD.
- Excellent communication, presentation, and leadership skills to work and collaborate with partners, customers, and engineering teams.
- Well organized and able to manage multiple projects in a fast-paced and demanding environment.
- Good oral/reading/writing English ability.
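The 99.999% ("five nines") availability target quoted above implies a concrete downtime budget, which is worth seeing as arithmetic:

```python
# Quick arithmetic behind availability targets: how much downtime a
# given availability percentage allows over a period.
def downtime_budget(availability, period_minutes):
    """Allowed downtime in minutes for an availability over a period."""
    return (1.0 - availability) * period_minutes

MINUTES_PER_YEAR = 365 * 24 * 60  # 525600
print(round(downtime_budget(0.99999, MINUTES_PER_YEAR), 2))  # ~5.26 min/year
```

Five nines leaves roughly five minutes of downtime per year, which is why the listing emphasizes zero-downtime deployment and automated problem management.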

Posted 2 weeks ago

Apply

2.0 - 6.0 years

7 - 11 Lacs

bengaluru

Work from Office

1. Lead development and deployment of AI compilers at the system level, leveraging deep expertise in AI/ML and data science to ensure scalability, reliability, and efficiency.
2. Direct the implementation and optimization of AI-device-specific compiler technology, personally driving solutions for complex problems.
3. Collaborate closely with cross-functional teams in a hands-on way to ensure seamless integration and efficiency.
4. Proactively stay abreast of the latest advancements in AI/ML technologies and actively contribute to the development and improvement of AI frameworks and libraries, leading by example in fostering innovation.
5. Effectively communicate technical concepts to non-technical stakeholders, showcasing excellent communication and interpersonal skills while leading discussions and decision-making processes.
6. Uphold industry best practices and standards in AI engineering, maintaining unwavering standards of code quality, performance, and security throughout the development lifecycle.

Required education: Bachelor's Degree
Preferred education: Master's Degree

Required technical and professional expertise
1. AI Compiler Development Leadership:
- Deep experience demonstrating coding skills, teaming capabilities, and end-to-end understanding of an enterprise AI product.
- Deep background in machine learning and deep learning.
- Hands-on expertise with MLIR and other AI compilers like XLA, TVM, etc.
- Deep understanding of AI accelerators like GPU, TPU, Gaudi, Habana, etc.
- Expertise with product design, design principles, and integration with various other enterprise products.
2. Traditional AI Methodologies Mastery:
- Demonstrated proficiency in traditional AI methodologies, including mastery of machine learning and deep learning frameworks.
- Familiarity with model serving platforms such as Triton Inference Server, TGIS, and vLLM, with a track record of leading teams in effectively deploying models in production environments.
- Proficient in developing optimal data pipeline architectures for AI applications, taking ownership of designing scalable and efficient solutions.
3. Development Ownership:
- Proficient in backend C/C++, with hands-on experience integrating AI technology into full-stack projects.
- Demonstrated understanding of the integration of AI tech into complex full-stack applications.
- Strong skills in programming with Python.
- Strong system programming skills.
4. Problem-Solving and Optimization Skills:
- Demonstrated strength in problem-solving and analytical skills, with a track record of optimizing AI algorithms for performance and scalability.
- Leadership in driving continuous improvement initiatives, enhancing the efficiency and effectiveness of AI solutions.

Preferred technical and professional experience
1. Knowledge in AI/ML and Data Science:
- Over 13 years of demonstrated leadership in AI/ML and data science, driving the development and deployment of AI models in production environments with a focus on scalability, reliability, and efficiency.
- Ownership mentality, ensuring tasks are driven to completion with precision and attention to detail.
2. Compiler Design Skills:
- Proficiency in LLVM.
- Basic compiler design concepts.
3. Commitment to Continuous Learning and Contribution:
- Demonstrated dedication to continuous learning and staying updated with the latest advancements in AI/ML technologies.
- Proven ability to contribute actively to the development and improvement of AI frameworks and libraries.
4. Effective Communication and Collaboration:
- Strong communication skills, with the ability to effectively convey technical concepts to non-technical stakeholders.
- Excellence in interpersonal skills, fostering collaboration and teamwork across diverse teams to drive projects to successful completion.
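Compilers like the MLIR/XLA/TVM stacks named above rest on classic passes such as constant folding. A deliberately tiny, purely illustrative sketch of that idea over an expression tree (real IRs are vastly richer):

```python
# Toy constant-folding pass: collapse constant subtrees of an expression
# tree while leaving variable-dependent parts intact.
def fold(node):
    """node: number (constant), str (variable), or tuple (op, left, right)."""
    if not isinstance(node, tuple):
        return node  # constants and variables fold to themselves
    op, lhs, rhs = node
    lhs, rhs = fold(lhs), fold(rhs)  # fold children first
    if isinstance(lhs, (int, float)) and isinstance(rhs, (int, float)):
        if op == "+":
            return lhs + rhs
        if op == "*":
            return lhs * rhs
    return (op, lhs, rhs)  # leave non-constant subtrees in place

# (x * (2 + 3)) folds the constant subtree but keeps the variable:
print(fold(("*", "x", ("+", 2, 3))))  # ('*', 'x', 5)
```

Production compilers perform this on a typed IR with many more operations, but the recursive shape, fold children then simplify, is the same.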

Posted 2 weeks ago

Apply

15.0 - 20.0 years

10 - 20 Lacs

bengaluru

Work from Office

Project Role: Integration Architect
Project Role Description: Architect an end-to-end integration solution. Drive client discussions to define the integration requirements and translate the business requirements to the technology solution. Activities include mapping business processes to support applications, defining the data entities, selecting integration technology components and patterns, and designing the integration architecture.
Must-have skills: AI Agents & Workflow Integration
Minimum 18 year(s) of experience is required
Educational Qualification: 15 years full-time education

Summary: We are seeking a C-suite-facing Industrial AI & Agentic Systems Lead to architect, govern, and scale AI solutions, including multi-agent, LLM-driven, tool-using autonomous systems, across manufacturing, supply chain, and plant operations. You will define the strategy-to-scale journey from high-value use case selection (OEE, yield, PdM, energy, scheduling, autonomous quality) to edge-to-cloud architectures, MLOps/LLMOps, Responsible & Safe AI / Agentic AI, and IT/OT convergence, delivering hard business outcomes.

Roles & Responsibilities:
1. Strategy & C-Suite Advisory:
- Define an Industrial AI + Agentic AI strategy and roadmap tied to OEE, yield, cost, throughput, energy, and sustainability KPIs, with ROI/payback models.
- Shape operating models (central CoE vs. federated), governance, funding, and product-platform scaling approaches.
- Educate CxO stakeholders on where Agentic AI adds leverage (closed-loop optimization, autonomous workflows, human-in-the-loop decisioning).
2. Architecture & Platforms:
- Design edge-plant-cloud reference architectures for ML + Agentic AI: data ingestion (OPC UA, MQTT, Kafka), vector DB/RAG layers, model registries, policy engines, observability, and safe tool execution.
- Define LLMOps patterns for prompt/version management, agent planning/execution traces, tool catalogs, guardrails, and evaluation harnesses.
3. Agentic AI (Dedicated):
- Architect multi-agent systems (planner/solver/critic patterns) for SOP generation and validation, root-cause analysis and corrective action recommendation, autonomous scheduling and rescheduling, MRO/work order intelligence, and control room copilots orchestrating OT/IT tools.
- Design tooling and action interfaces (function-calling tools registry) to safely let agents interact with MES/ERP/CMMS/SCADA/DCS, simulations (DES, digital twins), and optimization solvers (cuOpt, Gurobi, CP-SAT).
- Establish policy, safety, and constraints frameworks (role-based agent scopes, allow/deny tool lists, human-in-the-loop gates, audit trails).
- Implement RAG + knowledge graph + vector DB stacks for engineering/service manuals, logs, SOPs, and quality records to power grounded agent reasoning.
- Set up evaluation and red-teaming for agent behaviors: hallucination tests, unsafe action prevention, KPI-driven performance scoring.
4. Use Cases & Solutions (Manufacturing Focus):
- Computer Vision Autonomous Quality (TAO, Triton, TensorRT) with agentic triage and escalation to quality engineers.
- Predictive/Prescriptive Maintenance with agents orchestrating data retrieval, work order creation, and spare part planning.
- Process & Yield Optimization where agents run DOE, query historians, simulate scenarios (digital twins), and recommend set-point changes.
- Scheduling & Throughput Optimization with planner-optimizer agents calling OR/RL solvers.
- GenAI/LLM for Manufacturing: copilots and autonomous agents for SOPs, RCA documentation, and PLC/SCADA code refactoring (with strict guardrails).
5. MLOps, LLMOps, Edge AI & Runtime Ops:
- Stand up MLOps + LLMOps: CI/CD for models and prompts, drift detection, lineage, experiment and agent run tracking, safe rollback.
- Architect Edge AI on NVIDIA Jetson/IGX, x86 GPU, Intel iGPU/OpenVINO, ensuring deterministic latency and TSN/real-time where needed.
- Implement observability for agents (traces, actions, rewards/scores, SLA adherence).
6. Responsible & Safe AI, Compliance & Security:
- Codify Responsible AI and Agentic Safety policies: transparency, explainability (XAI), auditability, IP protection, privacy, toxicity and jailbreak prevention.
- Align with regulations (e.g., GxP, FDA 21 CFR Part 11, ISO 27001, IEC 62443, ISO 26262, AS9100) for industrial domains.
7. Delivery, GTM & Thought Leadership:
- Serve as chief architect and design authority on large AI + Agentic programs; mentor architects, data scientists/engineers, and MLOps/LLMOps teams.
- Lead pre-sales, solution shaping, executive storytelling, and ecosystem partnership building (NVIDIA, hyperscalers, MES/SCADA, optimization, cybersecurity).

Professional & Technical Skills:
Must-have skills:
- Proven AI-at-scale delivery record in manufacturing with quantified value and hands-on leadership of LLM/Agentic AI initiatives.
- Deep understanding of shop-floor tech (MES/MOM, SCADA/DCS, historians such as PI/AVEVA, PLC/RTUs) and protocols (OPC UA, MQTT, Modbus, Kafka).
- Expertise in ML & CV stacks (PyTorch/TensorFlow, Triton, TensorRT, TAO Toolkit) and LLM/Agentic stacks (function calling, RAG, vector DBs, prompt/agent orchestration).
- MLOps & LLMOps (MLflow, Kubeflow, SageMaker/Vertex, Databricks, Feast, LangSmith/evaluation frameworks, guardrails).
- Edge AI deployment on NVIDIA/Intel/x86 GPUs, with K8s/K3s, Docker, Triton Inference Server.
- Strong security and governance for IT/OT and AI/LLM (IEC 62443, Zero Trust, data residency, key/token vaults, prompt security).
- Executive communication: convert complex AI + Agentic architectures into board-level impact narratives.
Good-to-have skills:
- Agentic frameworks: LangGraph, AutoGen, CrewAI, Semantic Kernel, Guardrails, LMQL.
- Optimization & RL: cuOpt, Gurobi, OR-Tools, RLlib, Stable Baselines.
- Digital Twins & Simulation: NVIDIA Omniverse/Isaac/Modulus, AnyLogic, AspenTech, Siemens.
- Knowledge graphs & semantics: Neo4j, RDF/OWL, SPARQL, ontologies for manufacturing.
- Standards & frameworks: ISA-95, RAMI 4.0, MIMOSA, ISO 8000, DAMA-DMBOK.
- Experience in regulated sectors (Pharma/MedTech, Aero/Defense, Automotive).
- AI/ML/LLM: PyTorch, TensorFlow, ONNX, Triton, TensorRT, TAO Toolkit, RAPIDS, LangChain/LangGraph, AutoGen, Semantic Kernel, Guardrails, OpenVINO.
- MLOps/LLMOps/DataOps: MLflow, Kubeflow, SageMaker, Vertex AI, Databricks, Feast, Airflow/Prefect, Great Expectations, LangSmith, PromptLayer.
- Edge/OT: NVIDIA Jetson/IGX, K3s/K8s, Docker, OPC UA, MQTT, Ignition, PI/AVEVA, ThingWorx.
- Data/Streaming/RAG: Kafka, Flink/Spark, Delta/Iceberg/Hudi, Snowflake/BigQuery/Synapse; vector DBs (Milvus, FAISS, Qdrant, Weaviate); KG (Neo4j).
- Cloud: AWS/Azure/GCP (at least one at expert level), Kubernetes; security certifications (CISSP/IEC 62443) a plus.
- Lean/Six Sigma/TPM nice to have for credibility with operations.
Leadership & Behavioral Competencies:
- C-suite advisor and storyteller with an outcome-first mindset.
- Architectural authority balancing speed, safety, and scale.
- People builder across DS/ML, DE, MLOps/LLMOps, and OT.
- Change leader who can operationalize AI and agents on real shop floors.
Additional Info:
- A minimum of 20 years of progressive information technology experience is required.
- A Bachelor's/Master's in Engineering/CS/Data Science (PhD preferred for R&D-heavy roles) is required.
- This position is based at the Bengaluru location.
Qualification: 15 years full-time education
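The allow/deny tool lists and human-in-the-loop gates described for agentic safety reduce to a policy check in front of every tool call an agent proposes. A minimal sketch of that pattern (class and tool names are invented for illustration):

```python
# Illustrative agent tool-call guard: every proposed tool call is checked
# against a role-scoped policy before execution. Sensitive tools route
# through a human-approval gate; unknown tools are denied by default.
class ToolPolicy:
    def __init__(self, allowed, require_human=()):
        self.allowed = set(allowed)
        self.require_human = set(require_human)

    def check(self, tool_name):
        """Return 'allow', 'human_gate', or 'deny' for a proposed call."""
        if tool_name in self.require_human:
            return "human_gate"
        if tool_name in self.allowed:
            return "allow"
        return "deny"  # default-deny anything not explicitly listed

policy = ToolPolicy(
    allowed={"read_historian", "query_mes"},
    require_human={"create_work_order"},
)
print(policy.check("query_mes"), policy.check("create_work_order"),
      policy.check("write_plc_setpoint"))
```

The default-deny stance matters most: an agent that hallucinates a tool name gets a refusal rather than a side effect on the shop floor.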

Posted 2 weeks ago

Apply

2.0 - 5.0 years

15 - 27 Lacs

coimbatore

Work from Office

2-5 years of experience with backend systems at scale, on AI/LLM/deep-learning systems. Proficiency in Node.js and Python. Hands-on experience with Triton Inference Server, TensorRT, ONNX, etc. Understanding of REST APIs, distributed systems, and cloud infrastructure (preferably AWS).

Posted 3 weeks ago

Apply

4.0 - 8.0 years

0 Lacs

navi mumbai, maharashtra

On-site

You will be joining a fast-growing enterprise AI & data science consultancy that caters to global clients in finance, healthcare, and enterprise software sectors. Your primary responsibility as a Senior LLM Engineer will involve designing, fine-tuning, and operationalizing large language models for real-world applications using PyTorch and Hugging Face tooling. You will also be tasked with architecting and implementing RAG pipelines, building scalable inference services and APIs, collaborating with data engineers and ML scientists, and establishing engineering best practices.

To excel in this role, you must have at least 4 years of experience in data science/ML engineering with a proven track record of delivering LLM-based solutions to production. Proficiency in Python programming, PyTorch, and Hugging Face Transformers is essential. Experience with RAG implementation, production deployment technologies such as Docker and Kubernetes, and cloud infrastructure (AWS/GCP/Azure) is crucial. Preferred qualifications include familiarity with orchestration frameworks like LangChain/LangGraph, ML observability, model governance, and mitigation techniques for bias.

Besides technical skills, you will benefit from a hybrid working model in India, opportunities to work on cutting-edge GenAI projects, and a collaborative consultancy culture that emphasizes mentorship and career growth. If you are passionate about LLM engineering and seek end-to-end ownership of projects, Zorba Consulting India offers an equal opportunity environment where you can contribute to diverse and inclusive teams. To apply for this role, submit your resume along with a brief overview of a recent LLM project you led, showcasing your expertise in models, infrastructure, and outcomes.
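The RAG pipeline work this role describes starts with chunking documents before embedding them. Below is a minimal sketch of one common baseline, fixed-size chunks with overlap; the function name and parameters are illustrative, not from any specific framework:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap, so content cut
    at a boundary still appears whole in at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

doc = "".join(str(i % 10) for i in range(500))
chunks = chunk_text(doc, size=200, overlap=50)
# Each chunk would then be embedded and indexed in a vector DB
```

In production the chunker is usually token-aware and sentence-aligned; tuning `size` and `overlap` against retrieval quality is part of the relevance tuning such roles involve.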

Posted 3 weeks ago

Apply

0.0 years

0 Lacs

bengaluru, karnataka, india

On-site

Description: By applying to this position, your application will be considered for all locations we hire for in the United States. Annapurna Labs designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.

Role: AWS Neuron is the complete software stack for AWS Trainium (Trn1/Trn2) and Inferentia (Inf1/Inf2), our cloud-scale machine learning accelerators. This role is for a Machine Learning Engineer on one of our AWS Neuron teams:

The ML Distributed Training team works side by side with chip architects, compiler engineers, and runtime engineers to create, build, and tune distributed training solutions with Trainium instances. Experience with training these large models using Python is a must. FSDP (Fully Sharded Data Parallel), DeepSpeed, NeMo, and other distributed training libraries are central to this, and extending all of this for the Neuron-based system is key.

The ML Frameworks team partners with compiler, runtime, and research experts to make AWS Trainium and Inferentia feel native inside the tools builders already love: PyTorch, JAX, and the rapidly evolving vLLM ecosystem. By weaving the Neuron SDK deep into these frameworks, optimizing operators, and crafting targeted extensions, we unlock every teraflop of Annapurna's AI chips for both training and lightning-fast inference. Beyond kernels, we shape next-generation serving by upstreaming new features and driving scalable deployments with vLLM, Triton, and TensorRT, turning breakthrough ideas into production-ready AI for millions of customers.
The ML Inference team collaborates closely with hardware designers, software optimization experts, and systems engineers to develop and optimize high-performance inference solutions for Inferentia chips. Proficiency in deploying and optimizing ML models for inference using frameworks like TensorFlow, PyTorch, and ONNX is essential. The team focuses on techniques such as quantization, pruning, and model compression to enhance inference speed and efficiency. Adapting and extending popular inference libraries and tools for Neuron-based systems is a key aspect of their work.

Key job responsibilities: You'll join one of our core ML teams - Frameworks, Distributed Training, or Inference - to enhance machine learning capabilities on AWS's specialized AI hardware. Your responsibilities will include improving PyTorch and JAX for distributed training on Trainium chips, optimizing ML models for efficient inference on Inferentia processors, and collaborating with compiler and runtime teams to maximize hardware performance. You'll also develop and integrate new features in ML frameworks to support AWS AI services. We seek candidates with strong programming skills, eagerness to learn complex systems, and basic ML knowledge. This role offers growth opportunities in ML infrastructure, bridging the gap between frameworks, distributed systems, and hardware acceleration.

About The Team: Annapurna Labs was a startup company acquired by AWS in 2015, and is now fully integrated. If AWS is an infrastructure company, then think of Annapurna Labs as the infrastructure provider of AWS. Our org covers multiple disciplines including silicon engineering, hardware design and verification, software, and operations.
Our products include AWS Nitro, ENA, EFA, Graviton and F1 EC2 instances, AWS Neuron with the Inferentia and Trainium ML accelerators, and, in storage, scalable NVMe.

Basic Qualifications:
- To qualify, applicants should have earned (or will earn) a Bachelor's or Master's degree between December 2022 and September 2025.
- Working knowledge of C++ and Python.
- Experience with ML frameworks, particularly PyTorch, JAX, and/or vLLM.
- Understanding of parallel computing concepts and CUDA programming.

Preferred Qualifications:
- Experience in using analytical tools such as Tableau, QlikView, QuickSight.
- Experience in building and driving adoption of new tools.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.

Company - Annapurna Labs (U.S.) Inc. Job ID: A3029797
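The distributed-training libraries named in this posting (FSDP, DeepSpeed) all build on the same data-parallel primitive: each worker computes gradients on its own data shard, then an all-reduce averages them so every replica takes an identical optimizer step. A toy single-process sketch of that averaging step, just the arithmetic, not a Neuron or PyTorch API:

```python
def allreduce_mean(worker_grads: list[list[float]]) -> list[float]:
    """Average per-worker gradient vectors elementwise, as an all-reduce
    with a 'mean' reduction would across data-parallel ranks."""
    n = len(worker_grads)
    dim = len(worker_grads[0])
    return [sum(g[i] for g in worker_grads) / n for i in range(dim)]

# Two workers, each holding gradients computed on its own shard
avg = allreduce_mean([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
```

Real implementations do this collectively over the interconnect (ring or tree all-reduce) rather than gathering everything on one node.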

Posted 3 weeks ago

Apply

4.0 - 6.0 years

0 Lacs

bengaluru, karnataka, india

On-site

Primary Role Title: Senior LLM Engineer

About The Opportunity: We are a fast-growing enterprise AI & data science consultancy serving global clients across finance, healthcare, and enterprise software. The team builds production-grade LLM-driven products (RAG systems, intelligent assistants, and custom inference pipelines) that deliver measurable business outcomes.

Location: India (Hybrid)

Role & Responsibilities:
- Design, fine-tune and productionize large language models (instruction tuning, LoRA/PEFT) using PyTorch and Hugging Face tooling for real-world applications.
- Architect and implement RAG pipelines: embeddings generation, chunking strategies, vector search integration (FAISS/Pinecone/Milvus) and relevance tuning for high-quality retrieval.
- Build scalable inference services and APIs (FastAPI/Falcon), containerize (Docker) and deploy to cloud/Kubernetes with low-latency and cost-optimized inference (quantization, ONNX/Triton).
- Collaborate with data engineers and ML scientists to productionize data pipelines; automate retraining, monitoring, evaluation and drift detection.
- Drive prompt engineering, evaluation frameworks and safety/guardrail implementation to ensure reliable, explainable LLM behavior in production.
- Establish engineering best practices (Git workflows, CI/CD, unit tests, observability) and mentor junior engineers to raise team delivery standards.

Skills & Qualifications

Must-Have:
- 4+ years in data science/ML engineering with demonstrable experience building and shipping LLM-based solutions to production.
- Strong Python engineering background and hands-on experience with PyTorch and Hugging Face Transformers (fine-tuning, tokenizers, model optimization).
- Practical experience implementing RAG: embeddings, vector DBs (FAISS/Pinecone/Weaviate/Milvus), chunking and retrieval tuning.
- Production deployment experience: Docker, Kubernetes, cloud infra (AWS/GCP/Azure) and inference optimization (quantization, batching, ONNX/Triton).
Preferred:
- Experience with LangChain/LangGraph or similar orchestration frameworks, and building agentic workflows.
- Familiarity with ML observability, model governance, safety/bias mitigation techniques and cost/performance trade-offs for production LLMs.

Benefits & Culture Highlights:
- Hybrid working model in India with flexible hours, focused on outcomes and work-life balance.
- Opportunity to work on cutting-edge GenAI engagements for enterprise customers and accelerate your LLM engineering career.
- Collaborative consultancy culture with mentorship, learning stipend and clear growth paths into technical leadership.

This role is with Zorba Consulting India. If you are an experienced LLM practitioner who enjoys end-to-end ownership, from research experiments to robust production systems, apply with your resume and a short note on a recent LLM project you led (models, infra, and outcomes). Zorba Consulting India is an equal opportunity employer committed to diversity and inclusion.

Skills: ML, LLM, data science
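The "batching" lever in the inference-optimization list above works by trading a little queueing delay for much higher accelerator throughput: queued requests are drained into one batched forward pass, and results are fanned back out. A toy sketch of a dynamic batcher; names and structure are hypothetical, and serving systems like vLLM or Triton implement this far more carefully:

```python
import queue
import threading

def batch_worker(requests, batch_size, timeout, infer_fn, results):
    """Drain up to batch_size queued (id, input) requests, waiting at most
    `timeout` seconds for the first; run one batched inference; repeat.
    Exits when the queue stays empty for a full timeout."""
    while True:
        try:
            first = requests.get(timeout=timeout)
        except queue.Empty:
            return  # idle: shut down this toy worker
        batch = [first]
        while len(batch) < batch_size:
            try:
                batch.append(requests.get_nowait())
            except queue.Empty:
                break
        outs = infer_fn([x for _, x in batch])  # one call for the whole batch
        for (req_id, _), out in zip(batch, outs):
            results[req_id] = out

reqs = queue.Queue()
results = {}
for i in range(5):
    reqs.put((i, float(i)))  # five pending requests
worker = threading.Thread(
    target=batch_worker,
    args=(reqs, 4, 0.1, lambda xs: [2 * x for x in xs], results))
worker.start()
worker.join()  # processes one batch of 4, then a batch of 1
```

A production batcher also caps per-request queueing delay so the latency SLA holds even under light load.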

Posted 3 weeks ago

Apply

3.0 - 7.0 years

0 Lacs

karnataka

On-site

As a GPU Kernel Developer specializing in AI Models at AMD, you will play a crucial role in developing high-performance kernels for cutting-edge GPU hardware. Working alongside a team of industry specialists, you will have the opportunity to leverage your expertise in GPU computing and hardware architecture to optimize AI and HPC applications. Your contributions will be instrumental in advancing key AI operators on AMD GPUs, ensuring optimal performance and efficiency.

Your responsibilities will include developing high-performance GPU kernels for AI operators, optimizing GPU code through structured methodologies, and supporting critical workloads in NLP, LLM, Recommendation, Vision, and Audio domains. Collaboration with system architects, GPU specialists, and performance validation teams will be essential to analyze and enhance training and inference processes for AI. Additionally, you will engage with open-source framework maintainers to integrate code changes and drive AI operator performance improvements.

To excel in this role, you should possess a strong background in GPU kernel development, proficiency in HIP, CUDA, OpenCL, and Triton programming, and a deep understanding of GPU hardware. Your expertise in profiling, debugging tools, and software engineering best practices will be key in optimizing GPU kernels and enhancing performance. A Master's or PhD in Computer Science, Computer Engineering, or a related field is preferred to support your technical skills and knowledge.

Join AMD in revolutionizing next-generation computing experiences through innovative GPU technologies and contribute to solving the world's most critical challenges. Be a part of our mission to build exceptional products that shape the future of data centers, artificial intelligence, PCs, gaming, and embedded systems. Together, we advance the boundaries of innovation and drive impactful change in the industry and beyond.

Posted 1 month ago

Apply

3.0 - 7.0 years

0 Lacs

karnataka

On-site

As a GPU Kernel Developer specializing in AI models at AMD, your primary responsibility is to develop high-performance GPU kernels for cutting-edge and upcoming GPU hardware. You will collaborate with a team of industry experts, leveraging the latest hardware and software technologies to drive innovation in the field. To excel in this role, you must possess significant experience in GPU kernel development and optimization for AI/HPC applications. Your expertise should include a deep understanding of GPU computing, hardware architecture, and proficiency in HIP, CUDA, OpenCL, and Triton development. Effective communication skills are essential as you will be required to work within a team environment and convey complex technical concepts to both technical and non-technical audiences. Key responsibilities include developing high-performance GPU kernels for essential AI operators on AMD GPUs, optimizing GPU code through structured methodologies, and supporting critical workloads in NLP/LLM, Recommendation, Vision, and Audio domains. You will collaborate closely with system-level performance architects, GPU hardware specialists, and various validation and marketing teams to analyze and enhance training and inference processes for AI applications. Furthermore, you will engage with open-source framework maintainers to align with their requirements and integrate code changes upstream, debug, maintain, and optimize GPU kernels, and drive AI operator performance improvements. Your expertise in software engineering best practices will be crucial in ensuring the quality and efficiency of the developed solutions. Preferred qualifications for this role include knowledge of GPU computing technologies such as HIP, CUDA, OpenCL, and Triton, experience in optimizing GPU kernels, proficiency in profiling and debugging tools, a strong foundation in GPU hardware, and excellent programming skills in C/C++/Python. 
Additionally, a Master's or PhD in Computer Science, Computer Engineering, or a related field is preferred. Join us at AMD, where we are dedicated to pushing the boundaries of innovation and solving the world's most significant challenges through transformative technology.
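A first step in the kernel optimization this role describes is classifying a kernel as memory- or compute-bound by comparing its arithmetic intensity (FLOPs per byte of DRAM traffic) to the machine's balance point, the roofline model. A small sketch; the peak numbers below are illustrative, not any specific GPU's:

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs per byte of memory traffic: the x-axis of a roofline plot."""
    return flops / bytes_moved

def roofline_bound(flops: float, bytes_moved: float,
                   peak_flops: float, peak_bw: float) -> str:
    """Classify a kernel under a simple roofline model by comparing its
    intensity to the machine balance point (peak FLOPs / peak bandwidth)."""
    balance = peak_flops / peak_bw  # FLOPs/byte where the roofline bends
    ai = arithmetic_intensity(flops, bytes_moved)
    return "compute-bound" if ai >= balance else "memory-bound"

# Toy example: an fp32 vector add of n elements does n FLOPs but moves
# 12n bytes (read two operands, write one result)
n = 1_000_000
kind = roofline_bound(flops=n, bytes_moved=12 * n,
                      peak_flops=100e12, peak_bw=2e12)
```

Memory-bound kernels like this gain little from more math throughput; fusion and data-layout changes are the usual levers.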

Posted 1 month ago

Apply

3.0 - 5.0 years

9 - 13 Lacs

Jaipur

Work from Office

Job Summary: We're seeking a hands-on GenAI & Computer Vision Engineer with 3-5 years of experience delivering production-grade AI solutions. You must be fluent in the core libraries, tools, and cloud services listed below, and able to own end-to-end model development, from research and fine-tuning through deployment, monitoring, and iteration. In this role, you'll tackle domain-specific challenges like LLM hallucinations, vector search scalability, real-time inference constraints, and concept drift in vision models.

Key Responsibilities

Generative AI & LLM Engineering:
- Fine-tune and evaluate LLMs (Hugging Face Transformers, Ollama, LLaMA) for specialized tasks
- Deploy high-throughput inference pipelines using vLLM or Triton Inference Server
- Design agent-based workflows with LangChain or LangGraph, integrating vector databases (Pinecone, Weaviate) for retrieval-augmented generation
- Build scalable inference APIs with FastAPI or Flask, managing batching, concurrency, and rate-limiting

Computer Vision Development:
- Develop and optimize CV models (YOLOv8, Mask R-CNN, ResNet, EfficientNet, ByteTrack) for detection, segmentation, classification, and tracking
- Implement real-time pipelines using NVIDIA DeepStream or OpenCV (cv2); optimize with TensorRT or ONNX Runtime for edge and cloud deployments
- Handle data challenges (augmentation, domain adaptation, semi-supervised learning) and mitigate model drift in production

MLOps & Deployment:
- Containerize models and services with Docker; orchestrate with Kubernetes (KServe) or AWS SageMaker Pipelines
- Implement CI/CD for model/version management (MLflow, DVC), automated testing, and performance monitoring (Prometheus + Grafana)
- Manage scalability and cost by leveraging cloud autoscaling on AWS (EC2/EKS), GCP (Vertex AI), or Azure ML (AKS)

Cross-Functional Collaboration:
- Define SLAs for latency, accuracy, and throughput alongside product and DevOps teams
- Evangelize best practices in prompt engineering, model governance, data privacy, and interpretability
- Mentor junior engineers on reproducible research, code reviews, and end-to-end AI delivery

Required Qualifications: You must be proficient in at least one tool from each category below:
- LLM Frameworks & Tooling: Hugging Face Transformers, Ollama, vLLM, or LLaMA
- Agent & Retrieval Tools: LangChain or LangGraph; RAG with Pinecone, Weaviate, or Milvus
- Inference Serving: Triton Inference Server; FastAPI or Flask
- Computer Vision Frameworks & Libraries: PyTorch or TensorFlow; OpenCV (cv2) or NVIDIA DeepStream
- Model Optimization: TensorRT; ONNX Runtime; Torch-TensorRT
- MLOps & Versioning: Docker and Kubernetes (KServe, SageMaker); MLflow or DVC
- Monitoring & Observability: Prometheus; Grafana
- Cloud Platforms: AWS (SageMaker, EC2/EKS), GCP (Vertex AI, AI Platform), or Azure ML (AKS, ML Studio)
- Programming Languages: Python (required); C++ or Go (preferred)

Additionally:
- Bachelor's or Master's in Computer Science, Electrical Engineering, AI/ML, or a related field
- 3-5 years of professional experience shipping both generative and vision-based AI models in production
- Strong problem-solving mindset; ability to debug issues like LLM drift, vector index staleness, and model degradation
- Excellent verbal and written communication skills

Typical Domain Challenges You'll Solve:
- LLM Hallucination & Safety: Implement grounding, filtering, and classifier layers to reduce false or unsafe outputs
- Vector DB Scaling: Maintain low-latency, high-throughput similarity search as embeddings grow to millions
- Inference Latency: Balance batch sizing and concurrency to meet real-time SLAs on cloud and edge hardware
- Concept & Data Drift: Automate drift detection and retraining triggers in vision and language pipelines
- Multi-Modal Coordination: Seamlessly orchestrate data flow between vision models and LLM agents in complex workflows
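The rate-limiting called out under the inference-API responsibilities is typically a token bucket: each client gets a steady refill rate plus a burst capacity. A minimal sketch; the class name and parameters are illustrative:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter, the usual scheme behind
    per-client limits on inference APIs. Sketch only, not thread-safe."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=10.0, capacity=5)
burst = [bucket.allow() for _ in range(7)]  # burst of 5 allowed, then throttled
```

In a real API gateway the bucket state lives in shared storage (e.g. Redis) so limits hold across replicas.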

Posted 1 month ago

Apply

2.0 - 6.0 years

7 - 11 Lacs

Bengaluru

Work from Office

1. Lead development and deployment of AI compilers at the system level, leveraging deep expertise in AI/ML and data science to ensure scalability, reliability, and efficiency.
2. Direct the implementation and optimization of AI-device-specific compiler technology, personally driving solutions for complex problems.
3. Collaborate closely with cross-functional teams with a hands-on approach to ensure seamless integration and efficiency.
4. Proactively stay abreast of the latest advancements in AI/ML technologies and actively contribute to the development and improvement of AI frameworks and libraries, leading by example in fostering innovation.
5. Effectively communicate technical concepts to non-technical stakeholders, showcasing excellent communication and interpersonal skills while leading discussions and decision-making processes.
6. Uphold industry best practices and standards in AI engineering, maintaining unwavering standards of code quality, performance, and security throughout the development lifecycle.

Required education: Bachelor's Degree
Preferred education: Master's Degree

Required technical and professional expertise:

1. AI compiler development leadership:
- Deep experience demonstrating coding skills, teaming capabilities, and end-to-end understanding of an enterprise AI product.
- Deep background in machine learning and deep learning.
- Hands-on expertise with MLIR and other AI compilers like XLA, TVM, etc.
- Deep understanding of AI accelerators like GPU, TPU, Gaudi, Habana, etc.
- Expertise with product design, design principles, and integration with various other enterprise products.

2. Traditional AI methodologies mastery:
- Demonstrated proficiency in traditional AI methodologies, including mastery of machine learning and deep learning frameworks.
- Familiarity with model serving platforms such as Triton Inference Server, TGIS, and vLLM, with a track record of leading teams in effectively deploying models in production environments.
- Proficient in developing optimal data pipeline architectures for AI applications, taking ownership of designing scalable and efficient solutions.

3. Development ownership:
- Proficient in backend C/C++, with hands-on experience integrating AI technology into full-stack projects.
- Demonstrated understanding of the integration of AI tech into complex full-stack applications.
- Strong programming skills in Python.
- Strong system programming skills.

4. Problem-solving and optimization skills:
- Demonstrated strength in problem-solving and analytical skills, with a track record of optimizing AI algorithms for performance and scalability.
- Leadership in driving continuous improvement initiatives, enhancing the efficiency and effectiveness of AI solutions.

Preferred technical and professional experience:

1. Knowledge in AI/ML and data science:
- Over 13 years of demonstrated leadership in AI/ML and data science, driving the development and deployment of AI models in production environments with a focus on scalability, reliability, and efficiency.
- Ownership mentality, ensuring tasks are driven to completion with precision and attention to detail.

2. Compiler design skills:
- Proficiency in LLVM.
- Basic compiler design concepts.

3. Commitment to continuous learning and contribution:
- Demonstrated dedication to continuous learning and staying updated with the latest advancements in AI/ML technologies.
- Proven ability to contribute actively to the development and improvement of AI frameworks and libraries.

4. Effective communication and collaboration:
- Strong communication skills, with the ability to effectively convey technical concepts to non-technical stakeholders.
- Excellence in interpersonal skills, fostering collaboration and teamwork across diverse teams to drive projects to successful completion.
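Constant folding is one of the simplest graph rewrites an AI compiler (MLIR-, XLA-, or TVM-style) performs on its IR. A toy sketch over a hypothetical tuple-based expression IR, not any real compiler's representation:

```python
def fold(node):
    """Recursively replace ('add'|'mul', const, const) subtrees with
    their computed constant; leave variable leaves untouched."""
    if not isinstance(node, tuple):
        return node  # leaf: a numeric constant or a variable name
    op, lhs, rhs = node
    lhs, rhs = fold(lhs), fold(rhs)
    if isinstance(lhs, (int, float)) and isinstance(rhs, (int, float)):
        return lhs + rhs if op == "add" else lhs * rhs
    return (op, lhs, rhs)

# (x * (2 + 3)) folds to (x * 5) without knowing x
folded = fold(("mul", "x", ("add", 2, 3)))
```

Production compilers run this as one pass among many (dead-code elimination, fusion, layout transforms) over a typed, SSA-style IR rather than plain tuples.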

Posted 1 month ago

Apply

12.0 - 16.0 years

0 Lacs

karnataka

On-site

As a Firefly models & Services architect within Adobe's Firefly Gen AI Models and services group, you will play a crucial role in supporting the creation, enhancement, and deployment of model pipelines for Adobe's top products across various domains. Your expertise in machine learning will be essential in shaping the future of digital experiences by architecting the pipelines of the future foundation models at Adobe. You will collaborate closely with a talented team of ML and service engineers to drive innovation and bring impactful features to Adobe's products, reaching millions of users worldwide. Your responsibilities will include designing and optimizing large-scale foundation model pipelines in Generative AI, developing GenAI backend services for Firefly, and collaborating with Applied researchers and engineers to bring ideas to production. As a technical leader, you will provide mentorship to junior team members, explore new ML technologies to enhance engineering effectiveness, and contribute to the continuous improvement of Adobe's GenAI engineering processes. Your strong communication skills, technical leadership abilities, and hands-on experience with Generative AI technologies will be instrumental in your success in this role. To thrive in this position, you should possess a Master's or Ph.D. in Computer Science, AI/ML, or related fields, along with at least 12 years of experience and 3+ years in a Lead/Architect role. Your expertise in the latest Generative AI technologies, such as GAN, diffusion, and transformer models, as well as your experience with large-scale GenAI model pipelines and ML feature shipping, will set you up for success. Additionally, your collaboration skills and experience in tech leading time-sensitive and business-critical GenAI projects will be valuable assets in this role. 
Preferred qualifications include experience in training or optimizing models using CUDA, Triton, TRT, and converting models from frameworks like PyTorch and TensorFlow to ensure compatibility and optimized performance across different platforms. A good publication record in Computer Science, AI/ML, or related fields will further strengthen your candidacy for this position.

At Adobe, you will have the opportunity to immerse yourself in a supportive work environment that fosters creativity, curiosity, and continuous learning. If you are seeking to make a significant impact and grow your career in a dynamic and innovative setting, Adobe is the place for you. Join us in shaping the future of digital experiences and explore the meaningful benefits we offer to our employees. Please note that Adobe is committed to accessibility, and if you require accommodation during the application process, you can reach out to accommodations@adobe.com or call (408) 536-3015.

Posted 2 months ago

Apply

0.0 - 2.0 years

4 - 5 Lacs

Vellore

Work from Office

Job Title: Accelerated Computing Engineer (Entry Level)
Experience Level: 0-2 Years
Location: Vellore
Employment Type: Full-time

About the Role: We seek a driven Accelerated Computing Engineer to join our innovative team in Vellore. This entry-level role offers a unique opportunity to work with advanced AI/ML models, accelerated computing technologies, and cloud infrastructure while collaborating on cutting-edge research and deployment projects. You will work with a variety of state-of-the-art models such as BGE-Large, Mixtral, Gemma, LLaMA, and Stable Diffusion, as well as other fine-tuned architectures, to solve real-world computing challenges through advanced AI/ML infrastructure solutions.

Key Responsibilities:
- Customer Interaction & Analysis: Work closely with customers to analyze technical and business needs, translating them into robust, AI-driven solutions.
- Model Deployment & Optimization: Develop and deploy advanced AI/ML models such as LLaMA, Mixtral, Gemma, and other GenAI models while optimizing their performance for varied computing environments.
- Performance Testing & System Benchmarking: Execute advanced test scenarios and performance benchmarks across AI/ML models and distributed systems to ensure optimal performance.
- Infrastructure & Model Research: Research, configure, and maintain infrastructure solutions (using tools like TensorRT and PyTorch) supporting our models and accelerated computing workloads.
- AI/ML Model Integration: Support and deploy models such as Stable Diffusion, BGE, Mistral, and custom fine-tuned models into end-to-end pipelines for AI/ML-driven solutions.
- Automation & Process Improvements: Drive automation strategies to streamline workflows, improve testing accuracy, and optimize system performance.
- Technical Liaison: Serve as the technical bridge by collaborating with product development teams, tracking customer feedback, and ensuring timely resolutions.
- Model Configuration & Troubleshooting: Create custom scripts, troubleshoot advanced configurations, and support tuning efforts for AI/ML model customization.

Skills & Qualifications

Required Skills:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical discipline.
- Strong foundational knowledge of AI/ML model deployment and cloud infrastructure.
- Proficiency with AI/ML frameworks & libraries, including PyTorch, TensorRT, and Triton.
- Hands-on experience with deployment models such as LLaMA, Mixtral, Gemma, and Stable Diffusion.
- Familiarity with distributed computing environments and orchestration tools like Kubernetes.
- Proficiency in workflow automation, performance tuning, and large-scale system debugging.
- Understanding of cloud computing technologies and infrastructure architecture, including storage, networking, and computing paradigms.

Preferred Skills:
- Experience working with object storage technologies like AWS S3, Azure Blob Storage, and MinIO.
- Familiarity with advanced AI/ML model frameworks such as Gemma-2b, Mixtral-8x7b, Mistral-7b-instruct, and other fine-tuned AI models.
- Expertise in GPU configuration and tuning for AI/ML workloads, including drivers and machine learning optimization strategies.
- Familiarity with serverless computing and Function as a Service (FaaS) concepts.
- Experience with infrastructure as code (IaC) and performance benchmarking methodologies.
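The benchmarking duties this role describes usually report tail latency as percentiles (p50, p99) rather than averages, since a mean hides slow outliers. A small nearest-rank percentile sketch; illustrative, not any specific benchmarking tool's method:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are <= it."""
    ranked = sorted(samples)
    k = math.ceil(p / 100 * len(ranked)) - 1
    return ranked[max(0, k)]

latencies_ms = [float(i) for i in range(1, 101)]  # 1..100 ms
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

In practice latencies are collected from a load generator under a fixed request rate; computing percentiles per window also exposes drift over time.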

Posted 2 months ago

Apply

5.0 - 10.0 years

11 - 16 Lacs

Gurugram

Work from Office

Looking for a challenging role? If you really want to make a difference, make it with us. Can we energize society and fight climate change at the same time? At Siemens Energy, we can. Our technology is key, but our people make the difference. Brilliant minds innovate. They connect, create, and keep us on track towards changing the world's energy systems. Their spirit fuels our mission.

We are seeking a highly skilled and driven Senior AI Engineer to join our team as a founding member, developing the critical data and AI infrastructure for training foundation models for power grid applications. You will be instrumental in building and optimizing the end-to-end systems, data pipelines, and training processes that will power our AI research. Working closely with research scientists, you will translate cutting-edge research into robust, scalable, and efficient implementations, enabling the rapid development and deployment of transformational AI solutions. This role requires deep hands-on expertise in distributed training, data engineering, and MLOps, and a proven track record of building scalable AI infrastructure.

Your new role: challenging and future-oriented
- Design, build, and rigorously optimize everything necessary for large-scale training, fine-tuning and/or inference with different model architectures. This includes the complete stack from data loading to distributed training to inference, to maximize the MFU (Model FLOP Utilization) on the compute cluster.
- Collaborate closely and proactively with research scientists, translating research models and algorithms into high-performance, production-ready code and infrastructure. Ability to implement, integrate & test the latest advancements from research publications or open-source code.
- Relentlessly profile and resolve training performance bottlenecks, optimizing every layer of the training stack from data loading to model inference for speed and efficiency.
- Contribute to technology evaluations and selection of hardware, software, and cloud services that will define our AI infrastructure platform.
- Use MLOps frameworks (MLflow, W&B, etc.) to implement best practices across the model lifecycle (development, training, validation, and monitoring), ensuring reproducibility, reliability, and continuous improvement.
- Create thorough documentation for infrastructure, data pipelines, and training procedures, ensuring maintainability and knowledge transfer within the growing AI lab.
- Stay at the forefront of advancements in large-scale training strategies and data engineering, proactively driving improvements and innovation in our workflows and infrastructure.
- Be a high-agency individual demonstrating initiative, problem-solving, and a commitment to delivering robust and scalable solutions for rapid prototyping and turnaround.

We don't need superheroes, just super minds:
- Bachelor's or master's degree in computer science, engineering, or a related technical field.
- 5+ years of hands-on experience in a role specifically building and optimizing infrastructure for large-scale machine learning systems.
- Deep practical expertise with AI frameworks (PyTorch, JAX, PyTorch Lightning, etc.).
- Hands-on experience with large-scale multi-node GPU training and other optimization strategies for developing large foundation models, across various model architectures.
- Ability to scale solutions involving large datasets and complex models on distributed compute infrastructure.
- Excellent problem-solving, debugging, and performance optimization skills, with a data-driven approach to identifying and resolving technical challenges.
- Strong communication and teamwork skills, with a collaborative approach to working with research scientists and other engineers.
- Experience with MLOps best practices for model tracking, evaluation and deployment.
Desired skills:
A public GitHub profile demonstrating a track record of open-source contributions to relevant projects in data engineering or deep learning infrastructure is a big plus.
Experience with performance monitoring and profiling tools for distributed training and data pipelines.
Experience writing CUDA/Triton/CUTLASS kernels.
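The MFU target in this listing can be made concrete with a back-of-the-envelope calculation. Below is a minimal sketch in plain Python, using the common approximation of ~6 FLOPs per parameter per token for transformer training; all numbers are illustrative, not figures from the posting.

```python
def model_flop_utilization(params: float, tokens_per_sec: float,
                           num_gpus: int, peak_flops_per_gpu: float) -> float:
    """Estimate MFU: achieved training FLOP/s over hardware peak FLOP/s.

    Uses the common ~6 FLOPs-per-parameter-per-token approximation for a
    transformer forward+backward pass.
    """
    achieved = 6.0 * params * tokens_per_sec
    peak = num_gpus * peak_flops_per_gpu
    return achieved / peak

# Illustrative numbers: a 7B-parameter model on 8 GPUs rated ~312 TFLOP/s (BF16)
mfu = model_flop_utilization(
    params=7e9, tokens_per_sec=24_000, num_gpus=8, peak_flops_per_gpu=312e12
)
print(f"MFU: {mfu:.1%}")
```

Raising observed tokens/sec (better data loading, kernel fusion, overlap of communication and compute) is what moves this number toward the cluster's peak.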

Posted 2 months ago

Apply

4.0 - 5.0 years

8 - 12 Lacs

Vadodara

Hybrid

Job Type: Full Time
Job Description: We are seeking an experienced AI Engineer with 4-5 years of hands-on experience in designing and implementing AI solutions. The ideal candidate should have a strong foundation in developing AI/ML-based solutions, including expertise in Computer Vision (OpenCV), as well as proficiency in developing, fine-tuning, and deploying Large Language Models (LLMs). As an AI Engineer, the candidate will work on cutting-edge AI applications, using LLMs such as GPT, LLaMA, or custom fine-tuned models to build intelligent, scalable, and impactful solutions, and will collaborate closely with the Product, Data Science, and Engineering teams to define, develop, and optimize AI/ML models for real-world business applications.
Key Responsibilities:
Research, design, and develop AI/ML solutions for real-world business applications; RAG experience is a must.
Collaborate with Product & Data Science teams to define core AI/ML platform features.
Analyze business requirements and identify pre-trained models that align with use cases.
Work with multi-agent AI frameworks such as LangChain, LangGraph, and LlamaIndex.
Train and fine-tune LLMs (GPT, LLaMA, Gemini, etc.) for domain-specific tasks.
Implement Retrieval-Augmented Generation (RAG) workflows and optimize LLM inference.
Develop NLP-based GenAI applications, including chatbots, document automation, and AI agents.
Preprocess, clean, and analyze large datasets to train and improve AI models.
Optimize LLM inference speed, memory efficiency, and resource utilization.
Deploy AI models in cloud environments (AWS, Azure, GCP) or on-premises infrastructure.
Develop APIs, pipelines, and frameworks for integrating AI solutions into products.
Conduct performance evaluations and fine-tune models for accuracy, latency, and scalability.
Stay updated with advancements in AI, ML, and GenAI technologies.
Required Skills & Experience:
AI & Machine Learning: Strong experience in developing and deploying AI/ML models.
Generative AI & LLMs: Expertise in LLM pretraining, fine-tuning, and optimization.
NLP & Computer Vision: Hands-on experience with NLP, Transformers, OpenCV, YOLO, and R-CNN.
AI Agents & Multi-Agent Frameworks: Experience with LangChain, LangGraph, and LlamaIndex.
Deep Learning & Frameworks: Proficiency in TensorFlow, PyTorch, and Keras.
Cloud & Infrastructure: Strong knowledge of AWS, Azure, or GCP for AI deployment.
Model Optimization: Experience in LLM inference optimization for speed and memory efficiency.
Programming & Development: Proficiency in Python and experience in API development.
Statistical & ML Techniques: Knowledge of regression, classification, clustering, SVMs, decision trees, and neural networks.
Debugging & Performance Tuning: Strong skills in unit testing, debugging, and model evaluation.
Vector Databases: Hands-on experience with FAISS, ChromaDB, Weaviate, and Pinecone.
Good to Have:
Experience with multi-modal AI (text, image, video, and speech processing).
Familiarity with containerization (Docker, Kubernetes) and model serving (FastAPI, Flask, Triton).
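Since this listing treats RAG as a must-have, here is a minimal sketch of the retrieval step: pure-Python cosine similarity over toy hand-written 3-d vectors. A production pipeline would use a real embedding model and one of the vector databases named above (FAISS, ChromaDB, etc.); the documents and vectors below are purely illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, doc_vecs, docs, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    scored = sorted(zip(docs, doc_vecs),
                    key=lambda pair: cosine(query_vec, pair[1]),
                    reverse=True)
    return [doc for doc, _ in scored[:k]]

# Toy 3-d "embeddings"; a real system would embed text with a model
docs = ["grid maintenance manual", "holiday policy", "transformer fault log"]
vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.1], [0.8, 0.0, 0.3]]
top = retrieve([1.0, 0.0, 0.2], vecs, docs, k=2)
print(top)
```

The retrieved chunks would then be stuffed into the LLM prompt, which is the "augmented generation" half of RAG.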

Posted 3 months ago

Apply

2 - 6 years

11 - 16 Lacs

Gurugram

Work from Office

Looking for a challenging role? If you really want to make a difference, make it with us. Can we energize society and fight climate change at the same time? At Siemens Energy, we can. Our technology is key, but our people make the difference. Brilliant minds innovate. They connect, create, and keep us on track towards changing the world's energy systems. Their spirit fuels our mission. Our culture is defined by caring, agile, respectful, and accountable individuals. We value excellence of any kind. Sounds like you? We are seeking a highly skilled and driven Senior AI Engineer to join our team as a founding member, developing the critical data and AI infrastructure for training foundation models for power grid applications. You will be instrumental in building and optimizing the end-to-end systems, data pipelines, and training processes that will power our AI research. Working closely with research scientists, you will translate cutting-edge research into robust, scalable, and efficient implementations, enabling the rapid development and deployment of transformational AI solutions. This role requires deep hands-on expertise in distributed training, data engineering, and MLOps, and a proven track record of building scalable AI infrastructure.
Your new role - challenging and future-oriented:
Design, build, and rigorously optimize everything necessary for large-scale training, fine-tuning, and/or inference with different model architectures. This includes the complete stack, from data loading to distributed training to inference, to maximize MFU (Model FLOP Utilization) on the compute cluster.
Collaborate closely and proactively with research scientists, translating research models and algorithms into high-performance, production-ready code and infrastructure. Implement, integrate, and test the latest advancements from research publications and open-source code.
Relentlessly profile and resolve training performance bottlenecks, optimizing every layer of the training stack, from data loading to model inference, for speed and efficiency.
Contribute to technology evaluations and the selection of hardware, software, and cloud services that will define our AI infrastructure platform.
Use MLOps frameworks (MLflow, W&B, etc.) to implement best practices across the model lifecycle (development, training, validation, and monitoring), ensuring reproducibility, reliability, and continuous improvement.
Create thorough documentation for infrastructure, data pipelines, and training procedures, ensuring maintainability and knowledge transfer within the growing AI lab.
Stay at the forefront of advancements in large-scale training strategies and data engineering, proactively driving improvements and innovation in our workflows and infrastructure.
Be a high-agency individual demonstrating initiative, problem-solving, and a commitment to delivering robust and scalable solutions for rapid prototyping and turnaround.
We don't need superheroes, just super minds:
Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.
5+ years of hands-on experience in a role specifically building and optimizing infrastructure for large-scale machine learning systems.
Deep practical expertise with AI frameworks (PyTorch, JAX, PyTorch Lightning, etc.).
Hands-on experience with large-scale multi-node GPU training and other optimization strategies for developing large foundation models across various model architectures.
Ability to scale solutions involving large datasets and complex models on distributed compute infrastructure.
Excellent problem-solving, debugging, and performance optimization skills, with a data-driven approach to identifying and resolving technical challenges.
Strong communication and teamwork skills, with a collaborative approach to working with research scientists and other engineers.
Experience with MLOps best practices for model tracking, evaluation, and deployment.
Desired skills:
A public GitHub profile demonstrating a track record of open-source contributions to relevant projects in data engineering or deep learning infrastructure is a big plus.
Experience with performance monitoring and profiling tools for distributed training and data pipelines.
Experience writing CUDA/Triton/CUTLASS kernels.
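The bottleneck-profiling responsibility in this listing can be illustrated with a simple timing harness that splits each training step into its data-loading and compute phases. This is a generic sketch with simulated loader and step functions, not tooling from the posting; real work would use framework profilers on the actual loop.

```python
import time

def profile_step_breakdown(load_fn, step_fn, n_steps=20):
    """Measure what fraction of wall time goes to data loading vs. compute.

    A high load fraction means the loop is input-bound: effort should go
    into the data pipeline (prefetching, more workers) before model code.
    """
    load_t = compute_t = 0.0
    for _ in range(n_steps):
        t0 = time.perf_counter()
        batch = load_fn()          # fetch one batch
        t1 = time.perf_counter()
        step_fn(batch)             # run one training step on it
        t2 = time.perf_counter()
        load_t += t1 - t0
        compute_t += t2 - t1
    total = load_t + compute_t
    return {"load_frac": load_t / total, "compute_frac": compute_t / total}

# Stand-ins for a real dataloader and training step: a deliberately slow
# loader next to trivially cheap compute, so the loop reads as input-bound
stats = profile_step_breakdown(
    load_fn=lambda: time.sleep(0.002) or [0.0] * 1024,
    step_fn=lambda batch: sum(batch),
)
print(stats)
```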

Posted 4 months ago

Apply

1.0 - 6.0 years

10 - 14 Lacs

bengaluru

Work from Office

General Summary: Cloud Data Center ML Test Engineer
Job Description: As a Cloud ML Data Center Test Engineer you will be working on the Qualcomm Cloud AI100 platform. This will include defining tests for software/firmware features, enabling automated execution and reporting, analysis of bugs, and system-level testing. You will work closely with the development and architecture teams to understand accelerator features and define the test plans and solutions needed to deliver production-grade software/firmware to the end customer.
Required Skills and Aptitudes:
Strong proficiency with scripting languages and OOP concepts: Python, shell scripting.
Good knowledge of ML/DL/LLM architectures.
Good knowledge of LLMs and AI inferencing solutions (vLLM/Triton/Dynamo/etc.).
Strong debugging and analysis skills for root-causing complex issues.
Good problem-solving skills and a willingness to learn and work in a high-calibre mixed software/firmware development team.
Minimum Experience: Master's in Computer Science, Electronics & Communication, or AI/ML (or equivalent), or a Bachelor's in the same fields with a minimum of 1+ years of experience.
Minimum Qualifications: Bachelor's degree in Engineering, Information Systems, Computer Science, or a related field.
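A common pattern in the kind of ML test work this listing describes is validating inference outputs from the device under test against a golden reference within numeric tolerance. Below is a minimal sketch; the tolerance values and logits are illustrative, not Qualcomm acceptance criteria.

```python
def outputs_match(reference, candidate, rtol=1e-3, atol=1e-5):
    """Element-wise tolerance check: a typical acceptance criterion when
    comparing accelerator inference against a CPU/framework reference.

    Each element must satisfy |ref - cand| <= atol + rtol * |ref|.
    """
    if len(reference) != len(candidate):
        return False
    return all(abs(r - c) <= atol + rtol * abs(r)
               for r, c in zip(reference, candidate))

# Golden logits from the reference runtime vs. values from the device under test
golden = [0.1072, -2.3310, 4.9985]
device = [0.1071, -2.3312, 4.9989]
print(outputs_match(golden, device))
```

In an automated suite, checks like this would run per layer and per model, with failures triaged back to the software/firmware stack.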

Posted Date not available

Apply

1.0 - 4.0 years

4 - 8 Lacs

bengaluru

Work from Office

The Research Engineer position at IBM India Research Lab is a challenging, dynamic, and highly innovative role. We are actively looking for top talent in the areas of:
Software stack optimization for IBM's Spyre accelerator, including compiler enhancements, specialized kernels, performance libraries, and tooling.
Low-level optimization within the PyTorch stack or below, aimed at maximizing GPU resource utilization.
Required education: Bachelor's Degree
Preferred education: Master's Degree
Required technical and professional expertise (you should have one or more of the following):
A Master's/PhD degree in Computer Science, AI, or a related field from a top-tier institution.
0-8 years of experience in the Systems for AI domain, with expertise in one or more of: model architectures, distributed training, inference optimization, GPU or other accelerator architectures, multi-accelerator networking (e.g., NCCL), compilers, CUDA programming, or Triton kernel development.
Experience with PyTorch FSDP and HuggingFace libraries.
Proficiency in Python or C++.
A growth mindset and a pragmatic, problem-solving attitude.
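A Triton or CUDA kernel itself needs a GPU to run, but the row-wise computation that a fused softmax kernel performs can be sketched as a CPU reference in plain Python, which is also the usual way kernel outputs are validated. This is a generic numerically stable softmax, not IBM's Spyre code.

```python
import math

def softmax_row(row):
    """Reference for a fused row-wise softmax kernel: subtract the row max
    before exponentiating so that large logits do not overflow exp()."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

# A naive exp() on these logits would overflow a float64; the max-shift
# trick keeps every exponent at or below zero
probs = softmax_row([1000.0, 1001.0, 1002.0])
print([round(p, 4) for p in probs])
```

A kernel does the same max, exp, sum, and divide passes fused into one launch over each row, keeping intermediates in fast on-chip memory.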

Posted Date not available

Apply