
1 LLM Internals Job


5.0 - 9.0 years

0 Lacs

Hyderabad, Telangana

On-site

You will work on Apple-scale opportunities and challenges as an engineer at heart, solving technical problems with a passion for coding and for digging deeper into any technology. Your role involves leading the deployment, scaling, monitoring, and optimization of large language models (LLMs) across diverse environments, ensuring that machine learning systems are production-ready, high-performing, and resilient. As a highly skilled LLM Ops / ML Ops Engineer, you must have expertise in Python or Go programming, a comprehensive understanding of LLM internals, and hands-on experience with various inference engines and deployment strategies. You should be able to balance multiple competing priorities, deliver solutions on time, work with complex architectures, and collaborate with multiple teams.

Key Responsibilities:

- Design and build scalable infrastructure for fine-tuning and deploying large language models.
- Develop and optimize inference pipelines using popular frameworks and engines such as TensorRT, vLLM, and Triton Inference Server.
- Implement observability solutions for model performance, latency, throughput, GPU/TPU utilization, and memory efficiency.
- Own the end-to-end lifecycle of LLMs in production, from experimentation through continuous integration and continuous deployment (CI/CD).
- Collaborate with research scientists, ML engineers, and backend teams to operationalize groundbreaking LLM architectures.
- Automate and harden model deployment workflows using Python, Kubernetes, containers, and orchestration tools such as Argo Workflows and GitOps.
- Design reproducible model packaging, versioning, and rollback strategies for large-scale serving.
- Stay current with advances in LLM inference acceleration, quantization, distillation, and model compilation techniques.
Minimum Qualifications:

- 5+ years of experience in LLM/ML Ops, DevOps, or infrastructure engineering with a focus on machine learning systems.
- Advanced proficiency in Python or Go, with the ability to write clean, performant, and maintainable production code.
- Deep understanding of transformer architectures, LLM tokenization, attention mechanisms, memory management, and batching strategies.
- Proven experience deploying and optimizing LLMs using multiple inference engines.
- Strong background in containerization and orchestration (Kubernetes, Helm).
- Familiarity with monitoring tools (e.g., Prometheus, Grafana), logging frameworks, and performance profiling.

Preferred Qualifications:

- Experience integrating LLMs into microservices or edge inference platforms.
- Experience with Ray distributed inference.
- Hands-on experience with quantization libraries.
- Contributions to open-source ML infrastructure or LLM optimization tools.
- Familiarity with cloud platforms (AWS, GCP) and infrastructure-as-code (Terraform).
- Exposure to secure and compliant model deployment workflows.

If you meet these qualifications and are excited about the opportunity, we encourage you to submit your CV for consideration.
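The "batching strategies" mentioned above can be sketched in plain Python. This is a hypothetical, framework-free illustration of dynamic request batching for an inference server, not part of this role's actual stack; names such as `DynamicBatcher`, `max_batch_size`, and `max_wait_s` are illustrative:

```python
import time
from collections import deque


class DynamicBatcher:
    """Group incoming inference requests into batches, bounded by a
    maximum batch size and a maximum wait time (values illustrative)."""

    def __init__(self, max_batch_size=8, max_wait_s=0.05):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = deque()

    def submit(self, request):
        # Record arrival time so the drain loop can flush a partial
        # batch once the oldest request has waited past max_wait_s.
        self.queue.append((time.monotonic(), request))

    def next_batch(self):
        """Return the next batch, or an empty list if nothing is ready."""
        if not self.queue:
            return []
        oldest_age = time.monotonic() - self.queue[0][0]
        if len(self.queue) < self.max_batch_size and oldest_age < self.max_wait_s:
            return []  # keep waiting for a fuller batch
        batch = []
        while self.queue and len(batch) < self.max_batch_size:
            batch.append(self.queue.popleft()[1])
        return batch


# Example: ten requests drain as one full batch, then a partial one.
batcher = DynamicBatcher(max_batch_size=8, max_wait_s=0.0)  # flush immediately
for i in range(10):
    batcher.submit(f"prompt-{i}")
first = batcher.next_batch()   # 8 requests
second = batcher.next_batch()  # remaining 2 requests
```

Production engines such as vLLM and Triton implement far more sophisticated variants (continuous batching, padding-aware scheduling), but the size-or-deadline trade-off shown here is the core idea.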

Posted 22 hours ago


