1 Mdadm Jobs

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

0 years

0 Lacs

India

On-site

Role: Senior System Engineer II (AI Infrastructure)
Stack: Linux, LXC, Python, libvirt, KVM, QEMU, CEPH, VyOS, GPU network fabric
Tools: Netplan, Ansible, Prometheus, Grafana, Bash Shell scripting

What You’ll Be Doing:
Provision, deploy, and maintain GPU and compute infrastructure in high-performance environments.
Work with your fellow sharks to design, develop, and optimize the next generation of GPU infrastructure.
Manage and configure Linux networking using Netplan.
Develop and maintain infrastructure automation scripts using Python, Bash, or other scripting languages.
Collaborate with cross-functional teams to meet AI/ML infrastructure needs.
Work with customers and stakeholders to define and refine the infrastructure requirements needed to support their AI/ML workloads.
Work with infrastructure technical leaders to define infrastructure requirements to store, move, and manipulate large datasets.
Guide performance teams on industry-standard testing methodologies and help optimize for GPU fabric throughput.
Identify security improvements and drive review discussions with internal teams.
Work directly with individual engineering teams to deliver new infrastructure functions and technologies in support of AI/ML products.

What We’ll Expect From You:
Experience delivering bare metal GPU infrastructure.
Provision, deploy, and maintain GPU and compute infrastructure in high-performance environments.
Manage and configure Linux networking using Netplan.
Understanding of AI/ML workloads and overall industry trends.
Strong collaborator and consensus builder.
Author and review design documentation.
Experience troubleshooting, analyzing, and debugging relevant virtualization stacks (kernel, KVM, QEMU).
Experience as a software engineer / developer in a large-scale, distributed environment.
Experience writing secure, testable, and robust low-level code.
Deep understanding of operating systems, virtualization, and Linux internals.
Familiarity with related virtualization fundamentals, including networking datapath, containers, and data persistence layers.
A critical thinker dedicated to solving problems and delivering solutions.

Required Skills & Qualifications:
Strong systems administration experience in Linux (Ubuntu or Debian-based systems preferred).
Scripting expertise (Python, Bash, etc.) for automation and tooling.
Experience in infrastructure provisioning and deployment, both bare-metal and containerized.
Proficiency with Netplan and Linux network stack configuration (routes, interfaces, DNS).
Familiarity with GPU technologies and cloud platforms (AWS, Azure, GCP) is a plus.

Day-to-day tasks as seen on the job:
Provision, deploy, and maintain GPU and compute infrastructure in high-performance environments.
Manage and configure Linux networking using Netplan.
MAAS
Runbooks (documents):
  Author and edit runbooks (procedures) for provisioning (deployment), decommissioning (reclaim/removal), repave (LXC container cleanup), etc.
  Author and edit runbooks (playbooks) for new issues encountered and the troubleshooting performed to fix them.
Provisioning procedure:
  New customer deployments
  Existing customer expansions

Minimum Skillset:
MAAS – structure, cloud-init (initial configuration scripts)
LXC – basic start, stop, destroy, and list of container(s); maneuvering LXC access from the bare metal host; troubleshooting services created within LXC at deploy time
Local storage – partitions (on physical disk devices); software RAID with mdadm (md0/1/2, on partitions); filesystem (ext4, on software RAID block devices)
NCCL – single-node and multi-node distributed tests
Database – PostgreSQL, psql: basic SELECT, UPDATE queries
NVIDIA – specific NVIDIA driver and CUDA version install on specific Ubuntu and HWE kernel
SCM – Git, GitHub: branch, commit, PR, markdown
File – yaml (JSON), sh (Bash Shell), py (Python) formatting knowledge
DCIM – NetBox
ITSM – Jira

Posted 4 days ago
