About Salvo Software
Salvo Software is a global technology company specializing in custom software development and advanced engineering solutions. With distributed teams across the US, LATAM, and India, we partner with clients to build high-performance, scalable systems that solve complex technical challenges. Our culture values innovation, ownership, and engineering excellence. We're growing our AI capabilities and are looking for a backend-focused AI Developer to join our team.
Role Description
We are seeking a highly skilled AI Developer with a strong background in backend and machine learning engineering to design, train, optimize, and deploy LLMs in on-prem and offline environments. This role is deeply technical and hands-on. It requires expertise across Python ML stacks, model optimization, local inference frameworks, and DevOps workflows tailored for offline systems. You will work closely with our engineering and product teams to build end-to-end LLM pipelines, covering data preprocessing, supervised fine-tuning, model quantization, evaluation, and deployment on local or air-gapped infrastructure. If you enjoy working with cutting-edge open-source LLMs, optimizing models for constrained environments, and building reliable backend pipelines, this role is for you.
Responsibilities
Core LLM Development
- Train LLMs and adapt them through supervised fine-tuning (SFT)
- Work with open-source models such as LLaMA, Mistral, Qwen, and similar architectures
- Build LoRA/QLoRA pipelines for efficient fine-tuning
- Implement and optimize data preprocessing workflows, including tokenization and long-context handling
- Use and extend Hugging Face Transformers & Datasets for training and inference
- Parse and process structured and semi-structured data, including XML/XSD files
- Implement document parsing solutions for Office formats (python-docx, OpenXML)
Offline / On-Prem Model Expertise
- Deploy, run, and maintain models fully offline and in air-gapped environments
- Perform model optimization and quantization (GGUF, GPTQ, AWQ, bitsandbytes)
- Build and maintain inference systems using frameworks like vLLM, TGI, and Ollama
- Optimize GPU usage (CUDA, cuDNN, VRAM-aware batching)
- Maintain local CI/CD pipelines for ML models without cloud dependencies
- Manage local model registries, versioning, and artifacts
Backend & DevOps
- Build backend services in Python for ML training and inference workflows
- Work with relational databases (Postgres/MySQL)
- Use Docker and Git for reliable development and deployment pipelines
- Use Azure DevOps for CI/CD (including self-hosted agents when applicable)
Requirements
Technical Skills
- Strong experience in Python for backend and ML development
- Expertise with ML frameworks such as PyTorch or TensorFlow, along with scikit-learn and pandas
- Solid knowledge of Postgres or MySQL for data storage
- Experience with Docker, Git, and DevOps best practices
- Hands-on expertise with LLM training, fine-tuning, and optimization
- Experience with Hugging Face Transformers & Datasets
- Familiarity with XML/XSD and Office document parsing tools
- Experience deploying models with vLLM, TGI, or Ollama
- Understanding of quantization techniques (GGUF/GPTQ/AWQ)
- Experience with GPU optimization and the CUDA stack
- Ability to build solutions for offline, on-prem, and air-gapped environments
Nice to Have
- Experience managing ML model registries in offline environments
- Familiarity with AWS for hybrid deployments
- Experience with secure environments, restricted networks, or enterprise compliance
Soft Skills
- Strong ownership and problem-solving ability
- Ability to work in distributed teams across time zones
- Clear communication when discussing complex technical topics