HPC - Team Lead

6 - 11 years

8 - 13 Lacs

Posted: 1 month ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Hi,

We have an immediate requirement for an HPC Team Lead in Hyderabad at our organization, SHI Locuz Enterprise Solutions Pvt Ltd.

Please find the job description below:


Experience - 6+ years
Work location - Hyderabad

ROLE SUMMARY

The Technology Lead HPC ensures that critical IT services and high-performance computing (HPC) infrastructure are available, efficient, and secure. The person in this role manages daily operations of mission-critical systems in multiple clients' data centres, working closely with both facilities engineering teams (power, cooling, physical infrastructure) and IT infrastructure / operations teams to support service clients around the clock. This role combines technical leadership, operations oversight, incident / problem management, and strategic planning.

PRIMARY ROLES & RESPONSIBILITIES

  • Experience architecting and maintaining HPC/AI systems, including:
      • Linux system administration
      • Cluster management
      • System and software configuration management
      • High-speed networking
      • Resource managers and schedulers
      • High-speed parallel storage
      • Monitoring and alerting
  • Strong understanding of HPC/AI architectures and concepts.
  • Experience supporting and managing a group of HPC/AI Clusters.
  • Excellent knowledge of prototyping and deploying HPC/AI clusters.
  • Extensive experience in troubleshooting Linux OS, filesystems and cluster hardware.
  • Good command of Linux scripting languages such as Bash, Perl, and Python.
  • Experience implementing, maintaining, and verifying defined security policies.
  • Willingness to maintain a flexible work schedule.
  • A positive attitude and willingness to help lab users succeed.
  • Excellent guidance and teamwork skills.

TECHNICAL SKILLS

  • Red Hat, Ubuntu, and SUSE operating systems
  • Cluster tools (Bright, xCAT, Warewulf, OpenHPC, Rocks, etc.)
  • InfiniBand
  • Lustre, BeeGFS and GPFS architecture and maintenance
  • Configuration management software (Ansible, Puppet)
  • Slurm/PBS/LSF/Grid Engine schedulers
  • Spack package manager
  • Experience deploying AI servers and software stacks.
  • Experience with container technologies and orchestration tools: Docker, Singularity, Apptainer, Kubernetes.
  • Hands-on with AI/ML tools: TensorFlow, PyTorch, Keras, ONNX, JAX.
  • Experience in benchmarking and performance optimization of large-scale HPC/AI systems
  • Experience with Linux and/or Windows operating systems, including file management, scripting, editing, and security.
  • Log consolidation and monitoring (Ganglia, Grafana, etc.)
  • Lifecycle and patch management experience.

SOFT SKILLS

  • Good logical reasoning and analytical skills
  • Good communication skills

OTHER SKILLS

  • Collaborative, co-operative, and committed mindset.
  • Teamwork
  • Excellent analytical and problem-solving skills.
  • Ability to work independently and within cross-functional teams.
  • Detail-oriented with good documentation practices.
  • Excellent interpersonal, communication, customer-interaction, documentation, and decision-making skills.

Locuz

Information Technology & Services

Bengaluru