Posted: 2 days ago | Platform: LinkedIn

Work Mode: On-site

Job Type: Full Time

Job Description

Role -

Years of Experience -

Location -


  • Strong experience in providing support for Linux HPC clusters.
  • Strong working knowledge of the following:
    o IBM Platform LSF 9 and 10 administration.
    o Red Hat Enterprise Linux administration.
    o Lustre parallel file system.
    o Mellanox InfiniBand connectivity.
    o Cluster manager administration (HPCM or xCAT).
    o SSSD and NIS authentication mechanisms.
    o Bash and Python scripting.
    o Ansible playbooks.
  • Experience with Abaqus and CFD applications (e.g., Fluent and StarCCM+).
  • Strong knowledge of application installations and version management on shared file systems.
  • IT infrastructure technical operations management under the ITIL framework.
  • Security compliance and remediation management.
  • Certifications in DevOps, ITIL, Agile, or SAFe are desirable.
  • Installation, configuration, troubleshooting and administration of Linux HPC clusters (compute, storage, and network) and applications in support of CAE environments.
  • Monitor and analyze LSF job queues and resource utilization to optimize workload management.
  • Troubleshoot and resolve any issues with LSF and its components, including master servers, compute nodes, and resource managers.
  • Collaborate with users to understand their HPC requirements and design LSF job workflows to meet their needs.
  • Develop and maintain LSF documentation, including standard operating procedures, installation guides, and troubleshooting procedures.
  • Develop and maintain LSF scripts for automation and task scheduling.
  • Diagnose and troubleshoot complex RHEL OS, application and HPC cluster technical problems.
  • Interact with hardware and software vendors for external support.
  • Develop and maintain technical solution documents (TSD) and standard operating procedures (SOP).
  • Keep all HPC infrastructure systems, servers, and devices up to date and in working condition to support business continuity.
  • Design and implement HPC network topology, including Mellanox connectivity.
  • Create and maintain HPC capacity plans and periodic cluster utilization reports.
  • Troubleshoot Abaqus, StarCCM+ and Fluent applications, and resolve any issues in a timely manner.
  • Develop and maintain scripts for automation and task scheduling using Python and Bash scripting.

Tata Consultancy Services

Information Technology and Consulting

Thane