Posted: 1 day ago
On-site
Full Time
Job Description
We are seeking a skilled and experienced HPC Platform & Security Engineer to join the Onix team dedicated to ensuring the stability, security, and consistent operation of High-Performance Computing (HPC) clusters on Google Cloud.
This role is critical for delivering a reliable and secure platform for our clients' most demanding computational workloads.
● Cluster Platform Stability & Maintenance:
○ Own the stability, security, and consistent release of HPC Virtual Machine (VM) images and essential software tools.
○ Develop and maintain automation scripts (e.g., Python, Bash, Terraform) for cluster lifecycle management, health checks, and system provisioning.
○ Perform regular maintenance and tuning to ensure optimal performance of the HPC environment.
● Security and Patch Management:
○ Triage, deploy, and manage security patches and updates across all HPC cluster components and VM images.
○ Monitor and manage system security configurations, ensuring compliance with Google Cloud and client security policies.
○ Triage and resolve complex integration issues related to security and monitoring tools.
● Release Management:
○ Implement and manage a robust release pipeline to deliver consistent VM image updates and bi-weekly software patches with minimal disruption.
○ Collaborate with security teams to validate and sign off on all image and tooling releases.
● Monitoring and Troubleshooting:
○ Utilize GCP monitoring tools (e.g., Cloud Monitoring, Cloud Logging) to ensure platform health, proactively identify anomalies, and troubleshoot complex system-level and network-related issues.
○ Maintain detailed documentation for all operational and security processes.
Required Qualifications
● Education: Bachelor’s degree in Computer Science, Engineering, or a related technical field.
● Experience: 4+ years of hands-on experience in cloud infrastructure, DevOps, or system administration, with a focus on Linux and HPC.
○ Deep expertise in Linux administration and scripting (Bash, Python).
○ Strong practical experience with a major cloud platform, preferably Google Cloud Platform (GCP).
○ Experience with Infrastructure as Code (IaC) tools like Terraform or Cloud Deployment Manager.
○ Proven knowledge of security best practices, patch management, and vulnerability remediation in a cloud/Linux environment.
○ Familiarity with job schedulers like Slurm or other HPC workload managers is highly desirable.
Vastika Inc