Posted: 1 week ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

Solution Architect AI

Department:

Job Purpose

To architect, design, and implement GPU-enabled High Performance Computing (HPC) and AI/ML platform solutions that are scalable, secure, and optimized to support enterprise AI, ML/DL workloads, simulation, and large-scale data analytics. This role will define infrastructure strategies, workload placement, performance tuning, and managed service roadmaps for GPU and HPC platforms within the Data Centre business.

Key Responsibilities

Platform Architecture & Design

  • Architect GPU and HPC infrastructure platforms for AI/ML training, inference, and HPC workloads.
  • Design GPU-as-a-Service (GPUaaS) models including on-demand, reserved, and burst GPU clusters.
  • Integrate AI/ML frameworks (TensorFlow, PyTorch, Kubeflow, JupyterHub, etc.) into enterprise-ready stacks.

Infrastructure & Workload Optimization

  • Tune performance and optimize resource scheduling and workload orchestration across HPC clusters and GPU nodes.
  • Enhance distributed training, model parallelism, and storage bandwidth utilization (NVMe, Lustre, GPFS, Ceph).

AI/ML Platform Enablement

  • Provide cloud-native environments with containerized ML workflows (Kubernetes, Docker, Singularity).
  • Build and manage model hosting and inference platforms (REST APIs, containerized inference servers).

Security & Compliance

  • Implement data security, encryption, access control, and compliance frameworks for sensitive AI/HPC workloads.
  • Architect air-gapped solutions for government/defense workloads when required.

Technology Integration & Innovation

  • Evaluate and integrate next-generation GPUs (NVIDIA H200/A100/L40S, AMD MI300, etc.), HPC accelerators, and AI chipsets.
  • Enable hybrid and hyperconverged AI infrastructure combining GPU, CPU, and storage resources.

Customer & Business Enablement

  • Collaborate with data scientists, researchers, and enterprise customers to align platform capabilities with business outcomes.
  • Define GPU/HPC platform services catalog and managed service offerings.

Automation & DevOps

  • Implement MLOps pipelines, infrastructure as code (Terraform, Ansible), and workload scheduling (SLURM, Kubernetes).

Qualifications & Experience

Educational Qualifications

  • BE/B-Tech or equivalent in Computer Science, Electronics & Communication, or related fields.

Experience

  • 8 to 12 years of overall IT experience, including 5+ years in HPC/AI/ML/GPU platform architecture.

Technical Expertise

  • Strong knowledge of GPU architecture (NVIDIA, AMD) and HPC systems.
  • Proficiency with AI/ML frameworks such as TensorFlow, PyTorch, Keras, MXNet, Hugging Face.
  • Experience with distributed training and orchestration frameworks like Kubeflow, MLflow, Ray, Horovod.
  • Knowledge of parallel computing, MPI, CUDA, ROCm, and GPU drivers.
  • Familiarity with storage technologies such as NVMe, Lustre, GPFS, Ceph, and Object Storage for HPC/AI workloads.
  • Hands-on experience with GPU cloud platforms (AWS SageMaker, Azure ML, GCP Vertex AI) and on-prem HPC cluster management.
  • Automation and MLOps expertise: CI/CD pipelines for ML, infrastructure as code, and workflow automation.
  • Understanding of security and governance including data privacy laws (e.g., DPDP Act), ISO, PCI-DSS, HIPAA compliance, and secure GPU cluster design.

Certifications (Preferred)

  • NVIDIA Certified AI Specialist
  • Azure AI Engineer
  • AWS ML Specialty
  • HPC-related certifications

Soft Skills

  • Strong stakeholder communication and collaboration skills.
  • Ability to work effectively with data scientists, researchers, and enterprise IT teams.
  • Strategic mindset, with the ability to align technical solutions to business objectives.
