4.0 - 6.0 years
4 - 6 Lacs
Hyderabad / Secunderabad, Telangana, India
On-site
The role is responsible for the design, integration, and management of high-performance computing (HPC) systems, bringing both hardware and software components into the organization's network infrastructure. This individual will be responsible for all activities related to supporting the business and its platforms, including system administration, and for incorporating new technologies in a sophisticated and constantly evolving technology landscape. The role involves ensuring that all parts of a system work together seamlessly to meet the organization's requirements.
Roles & Responsibilities:
• Implement and manage cloud-based infrastructure for HPC environments that support data science (e.g. AI/ML workflows, image analysis)
• Collaborate with data scientists and ML engineers to deploy scalable machine learning models into production
• Ensure the security, scalability, and reliability of HPC systems in the cloud
• Optimize cloud resources for cost-effective and efficient use
• Keep abreast of the latest in cloud services and industry-standard processes
• Provide technical leadership and guidance in cloud and HPC systems management
• Develop and maintain CI/CD pipelines for deploying resources to multi-cloud environments
• Monitor and fix cluster operations/applications and cloud environments
• Document system design and operational procedures
Basic Qualifications:
• Master's degree with 4 to 6 years of experience in Computer Science, IT or a related field with hands-on HPC administration, OR
• Bachelor's degree with 6 to 8 years of experience in Computer Science, IT or a related field with hands-on HPC administration, OR
• Diploma with 10 to 12 years of experience in Computer Science, IT or a related field with hands-on HPC administration
• Demonstrable experience in cloud computing (preferably AWS) and cloud architecture
• Experience with containerization technologies (Singularity, Docker) and cloud-based HPC solutions
• Experience with infrastructure-as-code (IaC) tools such as Terraform, CloudFormation, Packer, Ansible and Git
• Expert with scripting (Python or Bash) and Linux/Unix system administration (preferably Red Hat or Ubuntu)
• Proficiency with job scheduling and resource management tools (SLURM, PBS, LSF, etc.)
• Knowledge of storage architectures and distributed file systems (Lustre, GPFS, Ceph)
• Understanding of networking architecture and security best practices
Preferred Qualifications:
• Experience supporting research in healthcare life sciences
• Experience with Kubernetes (EKS) and service mesh architectures
• Knowledge of AWS Lambda and event-driven architectures
• Exposure to multi-cloud environments (Azure, GCP)
• Familiarity with machine learning frameworks (TensorFlow, PyTorch) and data pipelines
• Certifications in cloud architecture (AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect, etc.)
• Experience in an Agile development environment
• Prior work with distributed computing and big data technologies (Hadoop, Spark)
Professional Certifications:
• Red Hat Certified Engineer (RHCE) or Linux Professional Institute Certification (LPIC)
• AWS Certified Solutions Architect (Associate or Professional)
Soft Skills:
• Strong analytical and problem-solving skills
• Ability to work effectively with global, virtual teams
• Effective communication and collaboration with cross-functional teams
• Ability to work in a fast-paced, cloud-first environment
Posted 5 days ago
4.0 - 9.0 years
4 - 9 Lacs
Hyderabad / Secunderabad, Telangana, India
On-site
Roles & Responsibilities:
• Collaborate with geographically dispersed teams, including those in the US, EU and other international locations
• Partner with Amgen India DTI site leadership to ensure alignment, and follow global standards and practices
• Foster a culture of collaboration, innovation, and continuous improvement
• Function as a Scientific Business Analyst, providing domain expertise for Research Data and Analytics within a Scaled Agile Framework (SAFe) product team
• Serve as Agile team scrum master or project manager as needed
• Serve as a liaison between global DTI functional areas and global research scientists, prioritizing their needs and expectations
• Create functional analytics dashboards and fit-for-purpose applications for quantitative research, scientific analysis and business intelligence (Databricks, Spotfire, Tableau, Dash, Streamlit, RShiny)
• Manage a suite of custom internal platforms, commercial off-the-shelf (COTS) software, and systems integrations
• Translate complex scientific and technological needs into clear, actionable requirements for development teams
• Develop and maintain release deliverables that clearly outline the planned features and enhancements, timelines, and milestones
• Identify and manage risks associated with the systems, including technological risks, scientific validation, and user acceptance
• Develop documentation, communication plans and training plans for end users
• Ensure scientific data operations are scoped into building Research-wide Artificial Intelligence/Machine Learning capabilities
• Ensure operational excellence, cybersecurity and compliance
What we expect of you: We are all different, yet we all use our unique contributions to serve patients.
Basic Qualifications:
• Doctorate degree, OR
• Master's degree and 4 to 6 years of Life Science/Biotechnology/Pharmacology/Information Systems experience, OR
• Bachelor's degree and 6 to 8 years of Life Science/Biotechnology/Pharmacology/Information Systems experience, OR
• Diploma and 10 to 12 years of Life Science/Biotechnology/Pharmacology/Information Systems experience
Preferred Qualifications:
• BS, MS or PhD in Bioinformatics, Computational Biology, Computational Chemistry, Life Sciences, Computer Science or Engineering
• 3+ years of experience in implementing and supporting biopharma scientific research data analytics
Functional Skills:
Must-Have Skills:
• Excellent problem-solving skills and a passion for tackling complex challenges in drug discovery with technology and data
• Excellent communication skills and experience creating impactful slide decks with data
• Collaborative spirit and effective communication skills to work seamlessly in a cross-functional team
• Familiarity with data analytics and scientific computing platforms such as Databricks, Dash, Streamlit, RShiny, Spotfire and Tableau, and related programming languages like SQL, Python and R
Good-to-Have Skills:
• Demonstrated expertise in a scientific domain area and related technology needs
• Understanding of semantics and FAIR (Findability, Accessibility, Interoperability and Reuse) data concepts
• Understanding of scientific data strategy, data governance, data infrastructure
• Experience with cloud (e.g. AWS) and on-premise compute infrastructure
• Familiarity with advanced analytics, AI/ML and scientific computing infrastructure, such as High Performance Compute (HPC) environments and clusters (e.g. SLURM, Kubernetes)
• Experience with scientific and technical team collaborations, ensuring seamless coordination across teams and driving the successful delivery of technical projects
• Ability to deliver features meeting research user demands using Agile methodology
• An ongoing commitment to learning and staying at the forefront of AI/ML advancements
We understand that to successfully sustain and grow as a global enterprise and deliver for patients we must ensure a diverse and inclusive work environment.
Professional Certifications:
• SAFe for Teams certification (preferred)
• SAFe Scrum Master or similar (preferred)
Soft Skills:
• Strong transformation and change management experience
• Exceptional collaboration and communication skills
• High degree of initiative and self-motivation
• Ability to manage multiple priorities successfully
• Team-oriented with a focus on achieving team goals
• Strong presentation and public speaking skills
Posted 2 weeks ago
4.0 - 6.0 years
10 - 12 Lacs
Hyderabad
Work from Office
Seeking a Senior HPC Administrator to manage and optimize high-performance computing systems. Required Candidate Profile: Notice period: immediate or 30 days max. Responsibilities include cluster management, performance tuning, and user support. Requires 5+ years' experience with HPC, Linux, and job schedulers.
Posted 3 weeks ago
2.0 - 4.0 years
3 - 6 Lacs
Mumbai, Hyderabad, Bengaluru
Work from Office
Hiring an HPC Administrator to manage and support high-performance computing systems. Responsibilities include cluster setup, maintenance, monitoring, and user support. Requires 3+ years' experience with HPC environments, Linux, and schedulers. Required Candidate Profile: Notice period: immediate or 30 days max.
Posted 3 weeks ago
5 - 10 years
15 - 30 Lacs
Bengaluru
Work from Office
Design and manage HPC infrastructure for geophysics, simulation, and ML/AI using Azure and Linux. Optimize compute environments and support job schedulers, file systems, and parallel processing workflows. Required Candidate Profile: Experienced HPC engineer with 5–10 years in Linux, Azure, job schedulers, and supporting scientific workloads in a large-scale enterprise environment.
Posted 1 month ago
5 - 10 years
7 - 12 Lacs
Gurgaon
Work from Office
The High-Performance Computing Infrastructure Engineer is primarily responsible for the overall health and maintenance of storage technologies in our Managed Services customers' environments. Our HPC Infrastructure Engineers are valued members of the Managed Services Infrastructure Practice, responsible for Tier 3 incident management, service request management and change management infrastructure support for all Managed Services customers.
Roles & Responsibilities:
• Provide enterprise-level operational support to Managed Services customers for incident, problem, and change management activities
• Plan and perform maintenance activities
• Assess customer environments for performance and design issues and propose resolutions
• Work across technical teams to troubleshoot complex infrastructure issues
• Create and maintain detailed documentation
• Serve as a subject matter expert and escalation point for storage technologies
• Work with vendors to resolve storage issues
• Communicate with customers and the internal team with transparency
• Participate in on-call rotation
• Complete training and certification as assigned to further skills and knowledge
Skills Required:
• Bachelor's degree or equivalent in Information Systems or a related field. Unique education, specialized experience, skills, knowledge, training, or certification may be substituted for education
• 5+ years of expert-level experience managing infrastructure in high-performance computing environments, including configuration, troubleshooting, and best practices
• 1+ years of experience with Nvidia DGX preferred
• Experience with high-performance computing (HPC) schedulers (e.g., SLURM, PBS, Torque) required
• Experience configuring, maintaining and troubleshooting Kubernetes
• Experience with storage technology (e.g., Ceph, Vast Data Platform) and distributed file systems (e.g., Lustre, GPFS, NFS, GlusterFS)
• Experience with machine learning or data science workflows in HPC/AI environments
• Advanced experience with Linux operating systems
• Experience configuring, maintaining and troubleshooting Nvidia/Mellanox (Cumulus OS) switches a plus
• Experience with both Ethernet and InfiniBand networking a plus
• 1+ years working with monitoring platforms (e.g., Prometheus, Grafana); Elastic Observability experience is a bonus
• 1+ years working with an enterprise ITSM system; ServiceNow is a bonus
• Previous experience with automation tools such as Ansible, Puppet, or Chef a plus
• Managed Services or consulting experience is required
• Strong background in customer service
• High-level problem-solving and communication skills
• Strong oral and written communication skills
• Related network certifications are a bonus
Posted 2 months ago
5 - 10 years
9 - 19 Lacs
Bengaluru
Work from Office
We are hiring for different flavours of Linux at HP, Bangalore.
Job Description:
• 8+ years of experience in managing Linux setups
• 4+ years of experience in HPC/Linux clusters
• Install, administer, and maintain hardware, system software, networking, accounts, and security measures on VMware configurations
• Diagnose and correct system issues, whether these are issues with correct operation or with performance
• Reinstate integrity of the system as quickly as possible following an outage in order to minimize downtime
• Triage and solve user-submitted tickets, especially when they relate to the infrastructure
• Track resource usage using monitoring and queuing software
• Actively participate in Knowledge Management by creating new technical documents
• Patch system firmware and software as needed
• Peer assistance is an added trait
What you need to bring:
Technical Skills:
• Demonstrated expertise with Linux system administration, including OS, networking, storage, and security
• Understanding of different Linux flavours such as SUSE, Satellite, Red Hat, CentOS, Ubuntu, etc.
• Expertise with high-speed networking such as InfiniBand and 10/40 Gigabit Ethernet
• Expertise with high-speed file transfer tools such as FileCatalyst
• Familiarity with large storage systems
• Some experience in a scripting language
• Proven expertise in hypervisor/HPC clusters
• Scheduler experience, such as PBS or Slurm
• Knowledge of Horizon is preferred
• Good understanding of scripting languages such as Shell, Bash, Python, JavaScript, Perl, Ruby, or Java
• Experience with Linux clusters
• Experience with kernel-level administration; KVM experience preferred
• Strong OS skills: Linux, SLES, RHEL and Ubuntu/Debian
• Troubleshooting knowledge of ESXi and vCenter performance issues
• Knowledge of virtual machine snapshots and VMware VDP, with virtualization concepts
• Understanding of VMware Site Recovery Manager for disaster recovery
Business Skills:
• Demonstrate strong written and verbal communication skills
• Interact and collaborate across different technology teams within HPE
• Must work towards achieving HPE's vision for our customers
• Affinity for and a thorough understanding of support processes defined within HPE
• Ability to work in a 24x7 environment in rotating shifts
• Exhibit Customer First and Customer Last Attitude consistently
• Ability to drive cases to closure and provide a Case Summary
• Demonstrate a high level of technical and communication skills
• Take responsibility for end-to-end problem ownership and its solutions
Posted 2 months ago
7 - 12 years
8 - 18 Lacs
Bengaluru
Work from Office
Job description: The HPC Administrator is responsible for designing, implementing, and operating enterprise infrastructure and systems automation for HPC (High Performance Computing) clusters. Must be capable of working with minimal supervision and applying the necessary technical expertise for effective and efficient use while communicating with peers, customers and leadership.
Key Responsibilities:
• Setup, configuration, general maintenance and troubleshooting of the HPC cluster for the CAE Dept.
• Manage a large and diverse HPC environment, including design, build, and capacity planning
• Knowledge of high-performance computing (HPC), such as managing CAE software and troubleshooting failed HPC jobs; knowledge of PBS/SLURM/LSF/SGE or any other scheduler will be an added advantage
• Integrate new CAE applications into the existing HPC cluster
• Application knowledge of CAE applications like STARCCM, Abaqus, Numeca, LS-DYNA, Preonlab, Converge, Console
• Should have working experience with Altair applications like ANSA, Hypermesh, Hyperworks, Medina
• Knowledge of Altair PBS and license server management
• Evaluate and recommend CAE system software and hardware for enterprise systems
• Work with core production support personnel in IT and Engineering to automate deployment and operation of the infrastructure
• LDAP configuration and integration
• Manage and maintain monitoring to ensure uptime and SLA levels
• Manage, deploy and configure infrastructure with Puppet/Ansible or other automation tools
• Knowledge of operating systems like CentOS, Ubuntu, Red Hat
• Support, interface and cooperate with cross-functional teams; ability to work closely with end users to understand their needs and provide guidance
• Knowledge of scripting languages including Shell and Python
• Document processes and procedures followed in day-to-day operations as well as in new implementations
• Direct and coordinate assigned work with the respective teams; possess excellent communication and problem-solving skills
Required Skills:
• Minimum 6 years of HPC experience (required)
• Bachelor's degree in Computer Science, Information Systems, or equivalent education
• Hands-on experience in HPC infrastructure
• Working knowledge of HPC schedulers like PBS, SLURM
• Providing application support for CAE applications like STARCCM, Abaqus, Numeca, LS-DYNA
• Troubleshooting knowledge of HPC jobs
• Work closely with the CAE Dept., gather all the requirements and provide the best solutions to the end user
• Must be able to work with and provide support for cross-functional groups and technical areas (compute, storage, network, applications)
• Must have a firm understanding of Linux internals and have automated system building, patching, and configuration management
• Knowledge of systems management automation using industry-standard and open-source tools such as Python, Bash, Puppet, Ansible
• Good understanding of the various server technologies available to deploy servers in the data center, and of vendor management
• Excellent communication, team coordination and interpersonal skills
Posted 2 months ago
5 - 10 years
15 - 25 Lacs
Bengaluru
Work from Office
What role you will play in our team
The HPC Systems Engineer role has the overall responsibility to work within a team to provide a performant, reliable, and secure high-performance computing (HPC) environment. The HPC Systems Engineer will be involved in various aspects of designing and engineering our HPC system, and will be responsible for managing day-to-day operations and maintenance activities including, but not limited to, the following: general troubleshooting of any issues that may arise, monitoring overall system health, performing system maintenance tasks, and evaluating new hardware/system software.
Job location is based out of Bengaluru, Karnataka.
Kindly click on the link below to apply for the job opening: https://jobs.exxonmobil.com/job-invite/79703/
What you will do
• Establish strategies for overall support of the system
• Evaluate new hardware and software and understand the potential benefits/impacts they can have in the environment
• Perform hardware maintenance
• Perform software installations and upgrades, inclusive of the operating system
• Monitor overall system performance and health
• Provide support for the management of data in the environment
• Work with users to resolve problems and ensure they are able to effectively utilize the system
• Interact with both business customers and technical teams that are globally distributed and within varied time zones
• Engage with vendors for problem resolution of existing infrastructure and discussion of roadmaps and new technologies for evaluation
• Foster a supportive work environment and maintain open, productive interactions among the team and across organizations
• Build and maintain cross-organizational contacts to facilitate execution of work
About You
Skills and Qualifications
• B.E./B.Tech in Computer Science or a related degree area (e.g. Computer Engineering, Information Systems), or equivalent skills/work experience, with CGPA 6 and above
• Excellent technical, analytical, and communication skills
• A minimum of 5 years of hands-on Linux experience (e.g. RHEL, CentOS) and production infrastructure support (e.g. networking, storage, monitoring, compute)
• Experience in high-performance computing, system administration and technical support (e.g. installation, configuration, maintenance, upgrade, retirement, problem resolution)
• Experience in HPC technologies such as parallel/distributed file systems (e.g. Lustre, GPFS), high-speed interconnect fabrics (e.g. InfiniBand, Omni-Path), and HPC batch scheduling software suites (e.g. PBSPro, SLURM)
• Proficiency in technical writing and documentation of solutions
• Solid understanding of data center operations fundamentals in networking, cooling, and power
• Works well in a team environment
• Self-motivated
Preferred Qualifications/Experience
• Strong IT skills in infrastructure and applications
• Experience with supporting large-scale production environments
• Experience in implementing changes and security controls in a global framework
• Understanding of data center operations fundamentals in networking, cooling, and power
• Knowledge and experience with installing/compiling vendor and open-source software
• Knowledge and experience with application/infrastructure deployment and support in one or more of the major cloud environments
• Comfortable relocating to Bengaluru and working hours of 1:30 PM to 10:30 PM IST
Posted 3 months ago
2 - 7 years
4 - 9 Lacs
Hyderabad
Work from Office
Day-to-day monitoring of the cluster setup. Applying patches when required and checking their compatibility with the applications. Configuration of the SLURM job scheduler. Checking cluster health status. Maintaining the xCAT database.
Posted 3 months ago