Posted: 7 hours ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

In this role, you will ensure the reliability, performance, and scalability of our critical HPC systems and infrastructure. You will work closely with engineering, infrastructure, and operations teams to design, implement, and manage systems that support compute-intensive workloads, enabling cutting-edge research, simulations, and data processing.

In your new role you will:

System Reliability and Performance

  • Ensure reliability, availability, and performance of High Performance Computing systems.
  • Identify and mitigate bottlenecks in HPC clusters, interconnects, and storage systems.
  • Proactively develop monitoring and alerting systems to anticipate and reduce downtime.

Automation and Infrastructure as Code (IaC)

  • Automate system deployment, configuration, and maintenance processes for HPC clusters.
  • Implement Infrastructure as Code (IaC) using tools such as Terraform, Ansible, or similar.
  • Develop self-healing and automated recovery mechanisms to minimize manual intervention.

Incident Management and Troubleshooting

  • Respond to HPC system incidents, conduct root cause analysis, and implement preventive measures.
  • Create and maintain comprehensive runbooks and playbooks for handling predictable issues.

System and Software Optimization

  • Collaborate with engineering teams to optimize workloads, schedulers (LSF), and resource allocations for maximum efficiency.
  • Test, benchmark, and optimize hardware and software configurations in collaboration with vendors.

Collaboration and Communication

  • Act as a bridge between software development and operations teams to ensure smooth deployment of HPC workloads.
  • Provide training, documentation, and guidance to users and stakeholders.

Research and Continuous Improvement

  • Stay updated on the latest HPC technologies and trends, including GPUs, accelerators, and interconnects like InfiniBand.
  • Propose and implement innovative solutions to improve the efficiency and scalability of HPC environments.

Your Profile

You are best equipped for this task if you have:
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field. Equivalent experience will also be considered.
  • Proven experience in managing and optimizing HPC clusters and associated resources (compute nodes, storage, and interconnects).
  • Expertise with workload managers and job schedulers like LSF.
  • Strong programming or scripting skills in languages such as Python, Bash, or Go.
  • Proficiency in Linux system administration (RHEL, CentOS, Ubuntu) and networking.
  • Familiarity with containerization technologies such as Docker and Kubernetes for HPC workloads.
  • Experience with monitoring tools (e.g., Prometheus, Grafana, Nagios) and log management tools (e.g., ELK stack).
  • Strong problem-solving skills with attention to detail.
  • Excellent communication and teamwork skills, with the ability to collaborate across multi-disciplinary teams.
  • Ability to prioritize tasks effectively in a dynamic and fast-paced environment.
  • Understanding of DevOps principles and practices, including CI/CD pipelines.
  • Knowledge of security principles and best practices in HPC environments.
  • Certifications such as RHCE.

Contact:

Pooja.AnandaChowta@infineon.com

#WeAreIn for driving decarbonization and digitalization.

As a global leader in semiconductor solutions in power systems and IoT, Infineon enables game-changing solutions for green and efficient energy, clean and safe mobility, as well as smart and secure IoT. Together, we drive innovation and customer success, while caring for our people and empowering them to reach ambitious goals. Be a part of making life easier, safer and greener.

Are you in?

We are on a journey to create the best Infineon for everyone.

This means we embrace diversity and inclusion and welcome everyone for who they are. At Infineon, we offer a working environment characterized by trust, openness, respect and tolerance and are committed to giving all applicants and employees equal opportunities. We base our recruiting decisions on the applicant's experience and skills.

Please let your recruiter know if they need to pay special attention to something in order to enable your participation in the interview process.

Bengaluru East, Karnataka, India