We seek out curious minds. We value attention to detail, and we care deeply about outcomes. We are looking for passionate people who are eager to learn, willing to share, and ready to establish innovative ways of working and influence culture change.
-
3 to 5 years of IT experience in heterogeneous Linux/Windows/Unix server environments, with development experience.
-
Strong Linux/RedHat system administration skills (installation, configuration, tuning, troubleshooting).
-
Hands-on experience with installation, configuration, tuning, and troubleshooting of scheduler platforms in heterogeneous Linux and Windows environments.
-
Experience with automation and configuration management using Ansible, Git, Bash, and Python
-
Good understanding of compute resource management (CPU, GPU, memory, storage, licenses).
-
Good understanding of ITIL processes (Incident, Change, Problem, Service Management); certification preferred.
-
Analytical mindset with strong problem-solving skills, able to diagnose, reproduce, and resolve complex issues.
-
Experience in stakeholder management and user support, with the ability to adapt to evolving business processes.
-
Familiarity with VMware tools, HPC engineering applications, and server architecture.
-
Awareness of Agile methodologies and DevOps practices
-
Experience working within a product delivery lifecycle in software development.
-
Excellent communication skills to work in a globally distributed team
Responsibilities:
Operate, maintain, and optimize Scientific Computing environments (Linux/RedHat, HPC clusters, scheduling systems).
Develop and maintain automation scripts and playbooks (Ansible, Python, Bash, Git) to streamline administration, monitoring, and reporting.
Troubleshoot and resolve access and infrastructure issues.
Apply ITIL processes in daily operations (Incident/Change/Problem Management).
Proactively challenge existing workflows and propose innovative improvements to enhance efficiency and user experience.
Support upgrades, patches, and security hardening with minimal service disruption.
Collaborate with business users and relevant stakeholders to define project requirements, scope, and deliverables.
Provide day-to-day support for production processing by resolving incidents and requests raised by the user community and by handling ongoing event and alert management.
Analyze incident handling and propose or implement improvements to incident resolution quality.
Categorise and classify incidents to identify critical issues and the most frequent incidents.
Perform root cause analysis for critical incidents and trend analysis to support proactive measures.
Write and update knowledge management articles with incident analysis.
Participate in Agile development activities and deliver features that fix or improve the service.
Apply DevOps tools, culture, and mindset in all daily activities.
Contribute to continuous improvement initiatives on demand
Success Metrics:
Success will be measured in a variety of areas, including but not limited to:
-
Consistently ensure on-time delivery and first-time-right quality of projects
-
Bring innovative, cost-effective solutions
-
Achieve customer satisfaction
-
Ability to handle a subject end to end, from demand management through development and support
-
Ability to challenge client needs and provide appropriate solutions
-
Compliance with security and ITIL best practices.