5 HPC Systems Jobs

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

3.0 - 8.0 years

0 Lacs

chennai, tamil nadu

On-site

As a DevOps Engineer specializing in High Performance Computing (HPC) applications, you will optimize the performance and scalability of HPC applications running in containerized environments. You will be expected to stay updated with the latest advancements in HPC and cloud technologies, collaborating with other DevOps engineers and developers to ensure seamless integration of HPC solutions. You will also configure Linux operating systems to meet HPC needs and implement and maintain Kubernetes clusters for HPC workloads. In addition, you will explore, qualify, and tune open source cloud-based technology stacks for HPC demands, contributing to the design of robust, high-performing cloud-based software architectures involving CPU/GPU workloads, scalable and robust storage, and high-bandwidth interconnects.

To excel in this role, you should possess strong knowledge of HPC systems and cloud computing technologies, including gRPC, Kafka, Kubernetes, ZeroMQ, Redis, and Ceph. Proficiency in Linux performance tuning, experience with Kubernetes and container orchestration, and a deep understanding of Linux distributions such as SuSE, RedHat, Rocky, and Ubuntu are essential. Familiarity with remote boot technologies, TCP/IP fundamentals, networking, and storage, with automation and scripting tools such as Ansible, Python, and Bash, and with CI/CD tools like Jenkins and GitLab will be valuable. Experience with HPC workload managers and schedulers (e.g., Slurm, PBS) and configuration management utilities like Salt, Chef, or Puppet will be beneficial.

Preferred qualifications include expertise in CPU and GPU performance tuning, along with a Bachelor's or Master's degree in Computer Engineering or a related field and 3 to 8 years of validated experience.

Team orientation, interpersonal skills, organization, time management, and adaptability to change will contribute to your success. Strong problem-solving skills, attention to detail, communication, and teamwork are essential in this dynamic and challenging environment.

Posted 1 month ago


6.0 - 11.0 years

8 - 13 Lacs

Chennai, Bengaluru

Work from Office

Your Opportunity:
As an HPC Architect, you will get the opportunity to architect high-performance computing solutions from scratch and to design and optimize all aspects (compute, memory, networking, storage) for a better cost of ownership.

Roles and Responsibilities:
As an architect, you will be responsible for designing HPC infrastructure solutions, including compute, networking, storage, and workload management components.
You will work closely with cross-functional teams, including hardware, software, product management, and business stakeholders, to understand compute workloads and translate them into platform architectures and designs that meet business needs.
You will create and maintain detailed system architecture diagrams and specifications.
You will evaluate and select appropriate hardware and software components for HPC environments.
You will install, configure, and maintain HPC systems, including hardware, software, and networking components.
You will develop and implement automation scripts for system management and deployment.
You will be a subject matter expert who unblocks dependent teams in the HPC domain.
You will be expected to develop system benchmarks, profile systems to understand bottlenecks, and optimize workflows and processes to improve cost of ownership.
You will identify and mitigate technical risks and issues throughout the HPC development life cycle.
You will ensure that the compute cluster is resilient, reliable, and maintainable.
You will be expected to stay abreast of the latest HPC technologies, including hardware, software, and networking solutions.
Your primary focus will be to understand the compute workload and design an HPC cluster with the right combination of nodes, CPU/GPU, memory, interconnects, and storage to achieve optimum performance at minimum cost of ownership.

Our Ideal Candidate:
Someone who has the drive and passion to learn quickly and the ability to multitask and switch contexts based on business needs.

Qualifications:
In-depth experience with Linux system administration and hardware/software configuration.
Strong knowledge of HPC technologies, including cluster computing, high-speed interconnects (InfiniBand, RoCE), and parallel filesystems (Lustre, GPFS, BeeGFS, etc.).
Experience in creating and maintaining operating system images with different installation and boot schemes.
Extremely good with automation tools like Ansible, Chef, and SaltStack, and with scripting languages (Python and Bash).
Experience in creating and maintaining storage solutions with different RAID configurations.
Ability to design storage solutions for different IOPS and access patterns (random vs. sequential read/write) and to tune storage and filesystems for better performance.
Good knowledge of networking concepts, including IP addressing, routing, protocols, switch configuration for RDMA, VLAN configuration, network bonding, etc.
Good knowledge of virtualization and of hardware and software hypervisors.
Good knowledge of containerization technologies like Docker and Singularity.
Experience in software-defined networking and storage.
Experience in setting up remote management protocols like IPMI, Redfish, etc.
Experience in setting up and using monitoring systems like Prometheus and Grafana.
Experience in system profiling and custom tuning for target workloads, for higher performance and lower cost of ownership.
Very good written and verbal communication skills.
Very good at writing technical documentation meant to serve as manuals for non-experts in the field.

Additional Qualifications:
Experience in HPC cluster management and workload orchestration software (e.g., SLURM, Torque, LSF).
Experience in setting up deep-learning training/inference solutions.
Experience in private cloud infrastructure like Kubernetes, OpenStack, CloudStack, etc.
Experience in distributed high performance computing and parallel programming frameworks.
Good knowledge of low-latency and high-throughput data transfer technologies (RDMA on RoCE, InfiniBand).

Education:
Bachelor's degree or higher in Computer Science or related disciplines.

Posted 1 month ago


3.0 - 8.0 years

3 - 12 Lacs

Hyderabad / Secunderabad, Telangana, India

On-site

The role is responsible for the design, integration, and management of high performance computing (HPC) systems encompassing both hardware and software within the organization's network infrastructure. This individual manages system administration and supports business platforms while incorporating new technologies in a sophisticated, evolving technology landscape. The role ensures seamless system integration to meet organizational requirements.

Roles & Responsibilities:
Implement and manage cloud-based infrastructure supporting HPC environments for data science (e.g., AI/ML workflows, image analysis).
Collaborate with data scientists and ML engineers to deploy scalable machine learning models into production.
Ensure security, scalability, and reliability of HPC systems in the cloud.
Optimize cloud resources for cost-effectiveness and efficiency.
Stay updated on the latest cloud services and industry best practices.
Provide technical leadership and guidance on cloud and HPC systems management.
Develop and maintain CI/CD pipelines for multi-cloud resource deployment.
Monitor and troubleshoot cluster operations, applications, and cloud environments.
Document system design and operational procedures.

What We Expect of You:
We value diverse talents united by the goal of serving patients. We seek a professional with the following qualifications.

Basic Qualifications:
Master's degree with 4-6 years of hands-on HPC administration experience in Computer Science, IT, or a related field, OR Bachelor's degree with 6-8 years of hands-on HPC administration experience, OR Diploma with 10-12 years of hands-on HPC administration experience.
Demonstrated expertise in cloud computing (preferably AWS) and cloud architecture.
Experience with containerization (Singularity, Docker) and cloud HPC solutions.
Proficiency with infrastructure-as-code (IaC) tools like Terraform, CloudFormation, Packer, Ansible, Git.
Expert scripting skills (Python or Bash) and Linux/Unix system administration (Red Hat or Ubuntu preferred).
Experience with job scheduling/resource management tools (SLURM, PBS, LSF).
Knowledge of storage architectures and distributed file systems (Lustre, GPFS, Ceph).
Understanding of networking architecture and security best practices.

Preferred Qualifications:
Experience supporting healthcare life sciences research.
Experience with Kubernetes (EKS) and service mesh architectures.
Knowledge of AWS Lambda and event-driven architectures.
Exposure to multi-cloud environments (Azure, GCP).
Familiarity with ML frameworks (TensorFlow, PyTorch) and data pipelines.
Cloud certifications (AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect).
Experience in Agile development environments.
Experience with distributed computing and big data technologies (Hadoop, Spark).

Professional Certifications:
Red Hat Certified Engineer (RHCE) or Linux Professional Institute Certification (LPIC) (preferred).
AWS Certified Solutions Architect Associate or Professional (preferred).

Soft Skills:
Strong analytical and problem-solving skills.
Effective communication and collaboration with global, virtual, and cross-functional teams.
Ability to work in fast-paced, cloud-first environments.

Posted 2 months ago


3 - 8 years

10 - 19 Lacs

Chennai

Work from Office

Role & responsibilities:
Design, implementation, and support of high-performance compute clusters.
Solid knowledge of HPC systems, including CPU/GPU architecture, scalable and robust storage, and high-bandwidth interconnects, as well as knowledge of cloud-based computing architectures.
Apply attention to detail to generate hardware BOMs for the HPC clusters, provide vendor management, and oversee hardware release activities.
Use strong Linux OS skills to configure appropriate operating systems for the HPC system.
Understand and assemble the project specifications and performance requirements at the subsystem and system levels.
Adhere to and drive project timelines to ensure program achievements are completed on time.
Support design and release of new products to manufacturing and ultimately the customer, providing quality golden images, procedures, scripts, and documentation to the manufacturing team and customer support team.

Validated in-depth and flavor-agnostic knowledge of Linux systems (SuSE, RedHat, Rocky, Ubuntu).
Experience of crafting and maintaining robust storage.
Strong HPC hardware knowledge, especially in the server, GPU, networking, storage, BIOS, and BMC arenas.
Experience in systemd, net boot/PXE, and Linux HA.
Strong understanding of TCP/IP fundamentals and knowledge of protocols: DNS, DHCP, HTTP, LDAP, SMTP.
Ability to code and develop Shell and Python scripts.
Experience with one or more of the listed configuration management utilities (Salt, Chef, Puppet, etc.).

Preferred candidate profile:
Possess a strong DevOps focus: knowledge of setting up a continuous development pipeline (Jenkins), repository software (Git-based), and Singularity and Docker containers.
Kubernetes, Prometheus, and Grafana experience.
Knowledge of Apache/Nginx, setting up proxy/reverse proxy, application server routing, and load balancing (HAProxy).
BS or MS degree in Computer Engineering, Electrical Engineering, or a related field, plus 3 to 5 years of validated experience.

Team Orientation & Interpersonal: Highly motivated teammate with the ability to develop and maintain collaborative relationships with all levels within and external to the organization.
Organization & Time Management: Able to plan, schedule, organize, and follow up on tasks related to the job to achieve goals within or ahead of established time frames.
Multi-task: Ability to expeditiously organize, coordinate, manage, prioritize, and perform multiple tasks simultaneously, and to swiftly assess a situation, determine a logical course of action, and apply the appropriate response.
Adaptability to Change: Able to be flexible and supportive, and able to assimilate change positively and proactively in a rapid growth environment.
Outstanding teammate with excellent written and verbal communication skills.

Education:
Doctorate (Academic) Degree and 0 years related work experience; Master's Level Degree and related work experience of 3 years; Bachelor's Level Degree and related work experience of 5 years.

Posted 3 months ago


8 - 13 years

25 - 40 Lacs

Chennai

Hybrid

Role & responsibilities:
Design, implementation, and support of high-performance compute clusters.
Solid knowledge of HPC systems, including CPU/GPU architecture, scalable and robust storage, and high-bandwidth interconnects, as well as knowledge of cloud-based computing architectures.
Apply attention to detail to generate hardware BOMs for the HPC clusters, provide vendor management, and oversee hardware release activities.
Use strong Linux OS skills to configure appropriate operating systems for the HPC system.
Understand and assemble the project specifications and performance requirements at the subsystem and system levels.
Adhere to and drive project timelines to ensure program achievements are completed on time.
Support design and release of new products to manufacturing and ultimately the customer, providing quality golden images, procedures, scripts, and documentation to the manufacturing team and customer support team.

Required Qualifications:
Validated in-depth and flavor-agnostic knowledge of Linux systems (SuSE, RedHat, Rocky, Ubuntu).
Experience of crafting and maintaining robust storage.
Strong HPC hardware knowledge, especially in the server, GPU, networking, storage, BIOS, and BMC arenas.
Experience in systemd, net boot/PXE, and Linux HA.
Strong understanding of TCP/IP fundamentals and knowledge of protocols: DNS, DHCP, HTTP, LDAP, SMTP.
Ability to code and develop Shell and Python scripts.
Experience with one or more of the listed configuration management utilities (Salt, Chef, Puppet, etc.).

Preferred Qualifications:
Possess a strong DevOps focus: knowledge of setting up a continuous development pipeline (Jenkins), repository software (Git-based), and Singularity and Docker containers.
Kubernetes, Prometheus, and Grafana experience.
Knowledge of Apache/Nginx, setting up proxy/reverse proxy, application server routing, and load balancing (HAProxy).
BS or MS degree in Computer Engineering, Electrical Engineering, or a related field, plus 3 to 5 years of validated experience.

Skills and Abilities:
Team Orientation & Interpersonal: Highly motivated teammate with the ability to develop and maintain collaborative relationships with all levels within and external to the organization.
Organization & Time Management: Able to plan, schedule, organize, and follow up on tasks related to the job to achieve goals within or ahead of established time frames.
Multi-task: Ability to expeditiously organize, coordinate, manage, prioritize, and perform multiple tasks simultaneously, and to swiftly assess a situation, determine a logical course of action, and apply the appropriate response.
Adaptability to Change: Able to be flexible and supportive, and able to assimilate change positively and proactively in a rapid growth environment.
Outstanding teammate with excellent written and verbal communication skills.

Qualifications:
Doctorate (Academic) Degree and 0 years related work experience; Master's Level Degree and related work experience of 3 years; Bachelor's Level Degree and related work experience of 5 years.

Perks and benefits:
Excellent benefits.

Posted 3 months ago
