Solutions Architect

Experience: 12 years


Posted: 7 hours ago | Platform: LinkedIn


Work Mode: On-site

Job Type: Full Time

Job Description

Job Purpose

Role Summary:

Responsible for architecting, designing, and implementing GPU-enabled, High Performance Computing (HPC), and AI/ML platform solutions. This role involves building scalable, secure, and optimized platforms to support enterprise AI, ML/DL workloads, simulation, and large-scale data analytics. The architect will define the infrastructure strategy, workload placement, performance optimization, and managed services roadmap for GPU and HPC platforms within the Data Centre (DC) business.


Role Description

Key Responsibilities:

Platform Architecture & Design

  • Architect GPU and HPC infrastructure platforms for AI/ML training, inference, and HPC workloads.
  • Design GPUaaS (GPU-as-a-Service) models, including on-demand, reserved, and burst GPU clusters.
  • Integrate AI/ML frameworks (TensorFlow, PyTorch, Kubeflow, JupyterHub, etc.) into enterprise-ready stacks.

Infrastructure & Workload Optimization

  • Drive performance tuning, resource scheduling, and workload orchestration across HPC clusters and GPU nodes.
  • Optimize for distributed training, model parallelism, and storage bandwidth (NVMe, Lustre, GPFS, Ceph).

AI/ML Platform Enablement

  • Provide cloud-native environments with containerized ML workflows (Kubernetes, Docker, Singularity).
  • Build and manage model hosting and inference platforms (REST APIs, containerized inference servers).

Security & Compliance

  • Implement data security, encryption, access control, and compliance frameworks for sensitive AI/HPC workloads.
  • Architect air-gapped solutions for government/defense workloads when required.

Technology Integration & Innovation

  • Evaluate and integrate next-gen GPUs (NVIDIA H200/A100/L40S, AMD MI300, etc.), HPC accelerators, and AI chipsets.
  • Enable hybrid/hyperconverged AI infrastructure (GPU + CPU + storage).

Customer & Business Enablement

  • Collaborate with data scientists, researchers, and enterprise customers to align platform capabilities with business outcomes.
  • Define the GPU/HPC platform services catalog and managed service offerings.

Automation & DevOps

  • Implement MLOps pipelines, infrastructure as code (Terraform, Ansible), and workload scheduling (SLURM, Kubernetes).

Experience & Educational Requirements

Qualifications and Experience

EDUCATIONAL QUALIFICATIONS:

BE/B.Tech or equivalent in Computer Science or Electronics & Communication.

RELEVANT EXPERIENCE:

  • Experience: 8–12 years of overall IT experience, with 5+ years in HPC/AI/ML/GPU platform architecture.
  • Technical Expertise:
    • Strong background in GPU architecture (NVIDIA, AMD) and HPC systems.
    • Proficiency in AI/ML frameworks (TensorFlow, PyTorch, Keras, MXNet, Hugging Face).
    • Experience with distributed training and orchestration frameworks (Kubeflow, MLflow, Ray, Horovod).
    • Knowledge of parallel computing, MPI, CUDA, ROCm, and GPU drivers.
    • Familiarity with storage technologies for HPC/AI (NVMe, Lustre, GPFS, Ceph, object storage).
  • Cloud & Hybrid AI Platforms: Hands-on experience with GPU cloud offerings (AWS SageMaker, Azure ML, GCP Vertex AI) and on-prem HPC cluster management.
  • Automation & MLOps: Experience with CI/CD for ML (MLOps), workflow automation, and infrastructure as code.
  • Security & Governance: Knowledge of data privacy, India's Digital Personal Data Protection (DPDP) Act, compliance frameworks (ISO, PCI-DSS, HIPAA), and secure GPU cluster design.
  • Certifications (Preferred): NVIDIA Certified AI Specialist, Azure AI Engineer, AWS ML Specialty, or HPC-related certifications.
  • Soft Skills: Strong stakeholder communication, the ability to collaborate with data scientists, researchers, and enterprise IT teams, and the capability to align technical solutions with business objectives.
