
2 Mdadm Jobs

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

10.0 years

0 Lacs

Pune, Maharashtra, India

Remote

Experience: 10+ years

Minimum Required:
● Good verbal and written communication skills in English with a friendly and helpful attitude.
● Familiarity with ticket-based case management.
● Ability to notify and escalate on back channels (L3 support and developers) to local and remote management while staying technically engaged with the customer.
● Ability to rapidly research both internal and external knowledge bases while staying engaged with the customer.
● Configure and troubleshoot RAID configurations using tools like mdadm and smartctl.
● Knowledge of PCI and PCIe, and ability to troubleshoot PCI issues using tools like lspci and lshw.
● Accessing the host through the Service Processor using tools like RKVM and IPMI tool (ipmitool).
● Knowledge of booting a system in order to run a rescue process.
● Experienced in Linux system administration.
● System performance monitoring using tools such as top, mem, strace, iostat, vmstat, htop, and iotop.
● Experienced in network administration.
● Understanding of the difference between containers and VMs.

Data Center Interaction:
● Configure and troubleshoot NFS storage, LDAP, and DNS configurations.
● Knowledge of standard networking protocols like Spanning Tree (STP and its variants), LAG, and VLAN (tagged vs. untagged).
● Manage a ‘managed switch’: simple troubleshooting of port-down issues, firewall and NAT knowledge, accessing the ‘mgmt interface’ and ‘serial console’.
● Experienced in shell scripting.
● Experienced in Python scripting.
● Experienced in troubleshooting client-side API issues.

Nice to Have:
● Strong understanding of ITIL Service Management.
● Utilize and analyze the output of tools such as eBPF and Linux perf.
● Use GDB to analyze application and operating system core files.
● Knowledge of Docker configuration, Kubernetes, and Podman.
● Demonstrated experience with load balancers.
● VM configuration and control systems (VMware or similar).
● Juniper (JunOS) specific working knowledge.
● VPN endpoint-to-endpoint knowledge (setup, troubleshooting, handling pre-shared keys, which are sensitive).
● Python
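For the RAID troubleshooting item above, the mdadm side of a check often starts with the kernel's /proc/mdstat summary. As an illustration only, here is a minimal Python sketch, assuming the common /proc/mdstat text layout, that flags degraded arrays and members marked (F). The sample text and the names MdArray and parse_mdstat are hypothetical and not part of the listing; it reads /proc/mdstat when present and otherwise falls back to the sample.

```python
import os
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Illustrative (hypothetical) /proc/mdstat content: md1 has a failed member, sdb2.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      1046528 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sdb2[1](F) sda2[0]
      976630464 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

# "md1 : active raid1 sdb2[1](F) sda2[0]" -> array name, state, remaining tokens
HEADER_RE = re.compile(r"^(md\d+)\s*:\s*(\S+)(.*)$")
# "[2/1] [U_]" -> devices the array should have, devices currently in use
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[[U_]+\]")


@dataclass
class MdArray:
    name: str
    state: str
    level: Optional[str] = None
    expected: Optional[int] = None
    in_use: Optional[int] = None
    failed_members: List[str] = field(default_factory=list)

    @property
    def degraded(self) -> bool:
        # Degraded means fewer devices in use than the array was built with.
        return (self.expected is not None and self.in_use is not None
                and self.in_use < self.expected)


def parse_mdstat(text: str) -> Dict[str, MdArray]:
    arrays: Dict[str, MdArray] = {}
    current: Optional[MdArray] = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            name, state, rest = header.groups()
            tokens = rest.split()
            level = next((t for t in tokens if t.startswith("raid") or t == "linear"), None)
            # Member tokens look like "sdb2[1]" or "sdb2[1](F)"; (F) marks a failed device.
            failed = [t.split("[")[0] for t in tokens if t.endswith("(F)")]
            current = MdArray(name, state, level, failed_members=failed)
            arrays[name] = current
            continue
        # The status line following a header belongs to that array.
        status = STATUS_RE.search(line)
        if status and current is not None:
            current.expected = int(status.group(1))
            current.in_use = int(status.group(2))
    return arrays


if __name__ == "__main__":
    path = "/proc/mdstat"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE_MDSTAT
    for arr in parse_mdstat(text).values():
        if arr.degraded or arr.failed_members:
            print(f"{arr.name}: {arr.level} degraded={arr.degraded} failed={arr.failed_members}")
```

Arrays without a "[n/m]" status line (raid0, for example) are left with expected set to None and are never reported as degraded; disk-level health would still need a separate check such as smartctl.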

Posted 3 days ago


0 years

0 Lacs

India

On-site

Role: Senior System Engineer II (AI Infrastructure)
Stack: Linux, LXC, Python, libvirt, KVM, QEMU, Ceph, VyOS, GPU network fabric
Tools: Netplan, Ansible, Prometheus, Grafana, Bash shell scripting

What You’ll Be Doing:
● Provision, deploy, and maintain GPU and compute infrastructure in high-performance environments.
● Work with your fellow sharks to design, develop, and optimize the next generation of GPU infrastructure.
● Manage and configure Linux networking using Netplan.
● Develop and maintain infrastructure automation scripts using Python, Bash, or other scripting languages.
● Collaborate with cross-functional teams to meet AI/ML infrastructure needs.
● Work with customers and stakeholders to define and refine the infrastructure requirements needed to support their AI/ML workloads.
● Work with infrastructure technical leaders to define infrastructure requirements to store, move, and manipulate large datasets.
● Guide performance teams on industry-standard testing methodologies and help optimize GPU fabric throughput.
● Identify security improvements and drive review discussions with internal teams.
● Work directly with individual engineering teams to deliver new infrastructure functions and technologies in support of AI/ML products.

What We’ll Expect From You:
● Experience delivering bare-metal GPU infrastructure.
● Provision, deploy, and maintain GPU and compute infrastructure in high-performance environments.
● Manage and configure Linux networking using Netplan.
● Understanding of AI/ML workloads and overall industry trends.
● Strong collaborator and consensus builder.
● Author and review design documentation.
● Experience troubleshooting, analyzing, and debugging relevant virtualization stacks (kernel, KVM, QEMU).
● Experience as a software engineer/developer in a large-scale, distributed environment.
● Experience writing secure, testable, and robust low-level code.
● Deep understanding of operating systems, virtualization, and Linux internals.
● Familiarity with related virtualization fundamentals, including networking datapath, containers, and data persistence layers.
● A critical thinker dedicated to solving problems and delivering solutions.

Required Skills & Qualifications:
● Strong Linux systems administration experience (Ubuntu or Debian-based systems preferred).
● Scripting expertise (Python, Bash, etc.) for automation and tooling.
● Experience in infrastructure provisioning and deployment, both bare-metal and containerized.
● Proficiency with Netplan and Linux network stack configuration (routes, interfaces, DNS).
● Familiarity with GPU technologies and cloud platforms (AWS, Azure, GCP) is a plus.

Day-to-day tasks as seen on the job:
● Provision, deploy, and maintain GPU and compute infrastructure in high-performance environments.
● Manage and configure Linux networking using Netplan.

MAAS Runbooks (documents):
● Author and edit runbooks (procedures) for provisioning (deployment), decommissioning (reclaim/removal), repave (LXC container cleanup), etc.
● Author and edit runbooks (playbooks) for newly encountered issues and the troubleshooting performed to fix them.
Provisioning procedure:
● New customer deployments
● Existing customer expansions

Minimum Skillset:
● MAAS – structure, cloud-init (initial configuration scripts)
● LXC – basic start, stop, destroy, and list operations on containers; accessing LXC from the bare-metal host; troubleshooting services created within LXC at deploy time
● Local storage – partitions (on physical disk devices); software RAID with mdadm (md0/1/2, on partitions); filesystem (ext4, on the software RAID block devices)
● NCCL – single-node and multi-node distributed tests
● Database – PostgreSQL, psql: basic SELECT and UPDATE queries
● Installing specific NVIDIA driver and CUDA versions on specific Ubuntu and HWE kernels
● SCM – Git, GitHub: branch, commit, PR, Markdown
● File formats – YAML and JSON, sh (Bash shell), py (Python) formatting knowledge
● DCIM – NetBox
● ITSM – Jira
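The local-storage stack in the minimum skillset (partitions, mdadm arrays md0/1/2 on partitions, ext4 on the RAID block devices) can be sketched as a dry-run command plan. This is a minimal Python sketch under stated assumptions: the disk and partition names, RAID level, mount points, and the names LAYOUT, plan_md_array, and plan_layout are all hypothetical; the partitions are assumed to exist already; and the /etc/mdadm/mdadm.conf path and the update-initramfs step assume an Ubuntu/Debian host. It only prints the commands and runs nothing.

```python
import shlex
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: each md array mirrors (RAID1) one partition from each of two disks.
# array name -> (RAID level, member partitions, mount point)
LAYOUT: Dict[str, Tuple[int, Sequence[str], str]] = {
    "md0": (1, ["/dev/sda1", "/dev/sdb1"], "/mnt/md0"),
    "md1": (1, ["/dev/sda2", "/dev/sdb2"], "/mnt/md1"),
    "md2": (1, ["/dev/sda3", "/dev/sdb3"], "/mnt/md2"),
}


def plan_md_array(name: str, level: int, partitions: Sequence[str],
                  mount_point: str) -> List[List[str]]:
    """Return argv lists (not executed) that build one array, format it, and mount it."""
    if len(partitions) < 2:
        raise ValueError(f"{name}: expected at least 2 partitions, got {len(partitions)}")
    device = f"/dev/{name}"
    return [
        # Build the software RAID array from the existing partitions.
        ["mdadm", "--create", device, f"--level={level}",
         f"--raid-devices={len(partitions)}", *partitions],
        # The ext4 filesystem goes on the RAID block device, not on the partitions.
        ["mkfs.ext4", device],
        ["mkdir", "-p", mount_point],
        ["mount", device, mount_point],
    ]


def plan_layout(layout: Dict[str, Tuple[int, Sequence[str], str]]) -> List[List[str]]:
    commands: List[List[str]] = []
    for name, (level, partitions, mount_point) in layout.items():
        commands.extend(plan_md_array(name, level, partitions, mount_point))
    # Record the array definitions and refresh the initramfs so the arrays
    # reassemble under the same names after a reboot (Ubuntu/Debian paths assumed).
    commands.append(["sh", "-c", "mdadm --detail --scan >> /etc/mdadm/mdadm.conf"])
    commands.append(["update-initramfs", "-u"])
    return commands


if __name__ == "__main__":
    for argv in plan_layout(LAYOUT):
        print(shlex.join(argv))
```

Printing the plan rather than executing it keeps each step reviewable, in the spirit of the runbook-driven provisioning described above; after review, the argv lists could be handed to subprocess.run one at a time.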

Posted 1 month ago
