6.0 - 11.0 years
8 - 13 Lacs
Chennai, Bengaluru
Work from Office
Your Opportunity
As an HPC Architect, you will have the opportunity to architect high-performance computing solutions from scratch and to design and optimize every aspect (compute, memory, networking, storage) for a lower cost of ownership.
Roles and Responsibilities
As an architect, you will be responsible for designing HPC infrastructure solutions, including compute, networking, storage, and workload management components. You will work closely with cross-functional teams, including hardware, software, product management, and business stakeholders, to understand compute workloads and translate them into platform architectures and designs that meet business needs.
You will create and maintain detailed system architecture diagrams and specifications.
You will evaluate and select appropriate hardware and software components for HPC environments.
You will install, configure, and maintain HPC systems, including hardware, software, and networking components.
You will develop and implement automation scripts for system management and deployment.
You will be a subject matter expert who unblocks dependent teams in the HPC domain.
You will be expected to develop system benchmarks, profile systems to understand bottlenecks, and optimize workflows and processes to reduce cost of ownership.
You will identify and mitigate technical risks and issues throughout the HPC development life cycle.
You will ensure that the compute cluster is resilient, reliable, and maintainable.
You will be expected to stay abreast of the latest HPC technologies, including hardware, software, and networking solutions.
Your primary focus will be to understand the compute workload and design an HPC cluster with the right combination of nodes, CPUs/GPUs, memory, interconnects, and storage to deliver optimum performance at minimum cost of ownership.
Our Ideal Candidate
Someone who has the drive and passion to learn quickly and the ability to multitask and switch contexts based on business needs.
Qualifications
In-depth experience with Linux system administration and hardware/software configuration.
Strong knowledge of HPC technologies, including cluster computing, high-speed interconnects (InfiniBand, RoCE), and parallel filesystems (Lustre, GPFS, BeeGFS, etc.).
Experience creating and maintaining operating system images with different installation and boot schemes.
Excellent command of automation tools such as Ansible, Chef, and SaltStack, and of scripting languages (Python and Bash).
Experience creating and maintaining storage solutions with different RAID configurations.
Ability to design storage solutions for different IOPS requirements and access patterns (random vs. sequential read/write), and to tune storage and filesystems for better performance.
Good knowledge of networking concepts, including IP addressing, routing, protocols, switch configuration for RDMA, VLAN configuration, network bonding, etc.
Good knowledge of virtualization and of hardware and software hypervisors.
Good knowledge of containerization technologies such as Docker and Singularity.
Experience in software-defined networking and storage.
Experience setting up remote management protocols such as IPMI, Redfish, etc.
Experience setting up and using monitoring systems such as Prometheus and Grafana.
Experience in system profiling and custom tuning for target workloads to achieve higher performance and lower cost of ownership.
Very good written and verbal communication skills.
Very good at writing technical documentation that serves as manuals for non-experts in the field.
Additional Qualifications:
Experience with HPC cluster management and workload orchestration software (e.g., SLURM, Torque, LSF).
Experience setting up deep-learning training/inference solutions.
Experience with private cloud infrastructure such as Kubernetes, OpenStack, CloudStack, etc.
Experience in distributed high-performance computing and parallel programming frameworks.
Good knowledge of low-latency, high-throughput data transfer technologies (RDMA via RoCE or InfiniBand).
Education: Bachelor's degree or higher in Computer Science or a related discipline.
Posted 1 month ago
8.0 - 12.0 years
8 - 12 Lacs
Bengaluru / Bangalore, Karnataka, India
On-site
8+ years of experience managing Linux environments.
4+ years of experience with HPC/Linux clusters.
Install, administer, and maintain hardware, system software, networking, accounts, and security measures on VMware configurations.
Diagnose and correct system issues, whether they concern correct operation or performance.
Restore system integrity as quickly as possible following an outage to minimize downtime.
Triage and resolve user-submitted tickets, especially those related to the infrastructure.
Track resource usage using monitoring and queuing software.
Actively participate in knowledge management by creating new technical documents.
Patch system firmware and software as needed.
Willingness to assist peers is an added advantage.
What you need to bring:
Technical Skills:
Demonstrated expertise in Linux system administration, including OS, networking, storage, and security.
Expertise with high-speed networking such as InfiniBand and 10/40 Gigabit Ethernet.
Expertise with high-speed file transfer tools such as FileCatalyst.
Familiarity with large storage systems.
Some experience with a scripting language.
Proven expertise with hypervisors.
Knowledge of Horizon is preferred.
Experience with Linux clusters.
Knowledge of troubleshooting ESXi and vCenter performance issues.
Knowledge of virtual machine snapshots and VMware VDP.
Understanding of VMware Site Recovery Manager for disaster recovery.
Business Skills:
Strong written and verbal communication skills.
Ability to interact and collaborate across different technology teams within HPE.
Commitment to achieving HPE's vision for our customers.
Affinity for, and a thorough understanding of, the support processes defined within HPE.
Ability to work in a 24x7 environment in rotating shifts.
Consistently exhibits a Customer First and Customer Last attitude.
Ability to drive cases to closure and provide a case summary.
High level of technical and communication skills.
Takes responsibility for end-to-end problem ownership and its solutions.
Mandatory Key Skills: Ethernet, FileCatalyst, Hypervisor, VMware ESXi, VMware VDP, VMware Site Recovery Manager, networking, Linux system administration*, InfiniBand*, High-Performance Computing
Posted 1 month ago
6.0 - 10.0 years
3 - 11 Lacs
Bengaluru / Bangalore, Karnataka, India
On-site
Designing software to support large-scale geometric data analysis and high-performance computing for OPC solutions.
Optimizing infrastructure for distributed computing, ensuring seamless GPU integration.
Collaborating with development teams to ensure efficient data handling and computational resource allocation.
Debugging and troubleshooting infrastructure issues related to production line integration.
Maintaining and troubleshooting the tool to meet performance and scalability requirements.
Regularly contributing to the cutting edge of semiconductor development by enhancing software performance and scalability.
The Impact You Will Have:
Advancing the development of high-performance silicon chips and software content.
Enabling leading IC manufacturing through efficient software solutions.
Contributing to the optimization of infrastructure for distributed computing.
Ensuring seamless integration and operation of infrastructure components.
Improving software performance and scalability for large-scale data analysis.
Enhancing the overall efficiency and effectiveness of semiconductor development processes.
What You'll Need:
M.S. or Ph.D. in Computer Science, Engineering, or the Physical Sciences.
6+ years of experience in software development, with a focus on computational geometry and distributed processing.
Expertise in C++, Python, and distributed computing environments.
Experience debugging and troubleshooting production-related issues.
Strong communication and collaboration skills to work as part of a global team.
Who You Are:
A proactive problem solver with a passion for innovation.
Detail-oriented, with a focus on optimizing performance and scalability.
An effective communicator with the ability to collaborate across teams.
A self-motivated individual who can work independently with limited supervision.
A sophisticated professional with advanced knowledge and wide-ranging experience.
Posted 2 months ago
6 - 8 years
18 - 20 Lacs
Bengaluru
Work from Office
Key Skills: High-Performance Computing (HPC), NetApp storage systems, Azure NetApp Files and Cloud Volumes ONTAP, Linux systems administration, automation, performance monitoring and optimization.
Posted 2 months ago
5 - 10 years
20 - 22 Lacs
Bengaluru
Work from Office
Key Skills: High-Performance Computing (HPC), parallel filesystems, low-latency networks, cloud platforms, Infrastructure as Code (IaC), Linux fundamentals, networking and virtualization, security and compliance.
Programming Skills: Proficiency in programming languages such as Python, C++, or Java, which are often used in HPC applications.
Posted 2 months ago