Role: Senior System Engineer II (AI Infrastructure)
Stack: Linux, LXC, Python, libvirt, KVM, QEMU, CEPH, VyOS, GPU network fabric
Tools: Netplan, Ansible, Prometheus, Grafana, Bash Shell scripting
What You’ll Be Doing:
• Provision, deploy, and maintain GPU and compute infrastructure in high-performance environments.
• Work with your fellow sharks to design, develop, and optimize the next generation of GPU infrastructure.
• Manage and configure Linux networking using Netplan.
• Develop and maintain infrastructure automation scripts using Python, Bash, or other scripting languages.
• Collaborate with cross-functional teams to meet AI/ML infrastructure needs.
• Work with customers and stakeholders to define and refine the infrastructure requirements needed to support their AI/ML workloads.
• Work with infrastructure technical leaders to define infrastructure requirements to store, move, and manipulate large datasets.
• Guide performance teams on industry-standard testing methodologies and help optimize for GPU fabric throughput.
• Identify security improvements and drive review discussions with internal teams.
• Work directly with individual engineering teams to deliver new infrastructure functions and technologies in support of AI/ML products.
What We’ll Expect From You:
• Experience delivering bare metal GPU infrastructure.
• Provision, deploy, and maintain GPU and compute infrastructure in high-performance environments.
• Manage and configure Linux networking using Netplan.
• Understanding of AI/ML workloads and overall industry trends.
• Strong collaborator and consensus builder.
• Author and review design documentation.
• Experience troubleshooting, analyzing, and debugging relevant virtualization stacks (kernel, KVM, QEMU).
• Experience as a software engineer / developer in a large-scale, distributed environment.
• Experience writing secure, testable, and robust low-level code.
• Deep understanding of operating systems, virtualization, and Linux internals.
• Familiarity with related virtualization fundamentals, including networking datapath, containers, and data persistence layers.
• A critical thinker dedicated to solving problems and delivering solutions.
Required Skills & Qualifications
• Strong systems administration experience in Linux (Ubuntu or Debian-based systems preferred).
• Scripting expertise (Python, Bash, etc.) for automation and tooling.
• Experience in infrastructure provisioning and deployment, both bare-metal and containerized.
• Proficiency with Netplan and Linux network stack configuration (routes, interfaces, DNS).
• Familiarity with GPU technologies and cloud platforms (AWS, Azure, GCP) is a plus.
Day-to-day tasks as seen on the job:
• Provision, deploy, and maintain GPU and compute infrastructure in high-performance environments.
• Manage and configure Linux networking using Netplan.
• MAAS Runbooks (documents)
• Author and edit runbooks (procedures) for provisioning (deployment), decommissioning (reclaim/removal), repave (LXC container cleanup), etc.
• Author and edit runbooks (playbooks) for new issues encountered and troubleshooting performed to fix them.
Provisioning procedure
• New customer deployments
• Existing customer expansions
Minimum Skillset:
• MAAS – structure, cloud-init (initial configuration scripts)
• LXC – basic start, stop, destroy, list of container(s); maneuvering LXC access from bare metal host; troubleshooting services created within LXC at deploy time
• Local storage – Partitions (on physical disk devices); Software RAID mdadm (md0/1/2, on partitions); Filesystem (ext4, on S/w RAID block devices)
• NCCL – single-node and multi-node distributed tests
• Database – PostgreSQL, psql: basic SELECT, UPDATE queries
• Specific NVIDIA driver and CUDA version install on specific Ubuntu and HWE kernel
• SCM – Git, GitHub: branch, commit, PR, markdown
• File – yaml (JSON), sh (Bash Shell), py (Python) formatting knowledge
• DCIM – NetBox
• ITSM – Jira
Location: Bangalore / Remote Job Type: Full-time Experience Level: 5+ years Job Summary: We are looking for a System Engineer with practical experience in Cisco UCS implementation, upgrades, and troubleshooting, along with VMware vSphere administration and basic Cisco networking skills. A foundational understanding of Nutanix infrastructure is also preferred. The role focuses on maintaining server and virtualization infrastructure, supporting deployments, and assisting in network-aware system troubleshooting. Key Responsibilities: Deploy, upgrade, and maintain Cisco UCS infrastructure (B-series and C-series) with Intersight. Install and configure physical and virtual servers in enterprise environments. Administer VMware vSphere environments (vCenter, ESXi, VM provisioning, performance tuning). Perform basic configuration and troubleshooting on Cisco network devices (switches, VLANs, port channels). Support Nutanix environments with basic operations (VM creation, snapshots, health checks). Assist with cabling, connectivity, IP addressing, and access layer troubleshooting. Work with cross-functional teams to resolve infrastructure and network issues. Maintain system documentation and operational runbooks. Required Qualifications: Bachelor’s degree in IT, Computer Science, or related field (or equivalent experience). 3-5+ years of experience in Cisco UCS server implementation and support. Solid understanding of VMware vSphere and virtual infrastructure administration. Basic knowledge of Cisco networking concepts (Layer 2 switching, VLANs, trunking). Hands-on experience with server hardware, BIOS/firmware upgrades, and storage connectivity. Familiarity with Nutanix environment administration (AHV / Prism). Strong problem-solving and communication skills. Preferred Qualifications: Cisco or VMware certifications (e.g., CCNA, VCP). Experience in data center operations or converged infrastructure. Scripting knowledge (PowerShell, Python) for automation tasks. 
Exposure to monitoring tools (vROps, UCS Manager, Nutanix Prism).
Job Description: As a Senior System Engineer / Solutions Architect, you will serve as the focused services solution and technical lead engineer supporting the ePlus regional sales team. Focused on key clients, you will drive revenue creation and capture within the account portfolio. Collaborate with our master architects, engineers, and consultants to understand our clients’ needs and craft sustainable solutions and IT strategic roadmaps to achieve client objectives. Location: Mumbai / Bangalore. Experience: 6 to 8 years of relevant hands-on experience in IT infrastructure, with a focus on deployment and Day 2 operations. Required Skills: Experience with Nutanix and Cisco UCS: • Hands-on expertise with Nutanix (Prism, AHV) and Cisco UCS infrastructure (UCS Manager, Intersight, UCS Central). • Proficient in deploying and managing UCS C-Series, B-Series, and HCI Nutanix nodes, including HyperFlex systems. • Experienced in installing ESXi, Windows, and Linux OS on UCS servers. • Skilled in UCS firmware upgrades, driver installations, and hardware troubleshooting (blades, chassis, FIs, IOMs). • Strong background in VMware installation, patching, and upgrades as per client requirements. • Familiar with managing server I/O components, fabric interconnects/extenders, and GPU-equipped UCS systems. • Basic knowledge of networking (VLANs, routing) and storage (RAID, SAN). • Familiar with networking protocols and cloud/hybrid cloud concepts. • Exposure to scripting (PowerShell, Python) for automation. • Knowledge of Docker and Kubernetes is an added advantage. • Strong verbal and written communication skills. What we expect from you: • Deploy and support UCS B/C series servers; configure Fabric Interconnects, service profiles, and boot volumes. • Manage Nutanix HCI, including LCM checks, upgrades, and ESXi host setup. • Apply security patches, hotfixes, and perform version upgrades. • Gather customer requirements; create high- and low-level design documents. 
Qualifications: Bachelor’s degree in Computer Science, Information Technology, or a related field. • Extensive experience in deploying and managing Cisco UCS, Nutanix HCI, and VMware environments, with strong expertise in clustering and virtualization technologies. • Certifications in VMware, Cisco, or Nutanix are a strong advantage. Skills: VMware vSphere, IOM, Cisco UCS, RAID, vSAN, UCS HCI Nutanix, Windows OS, UCS Manager, Cisco Intersight, Cisco SAN, HyperFlex, UCS Servers, UCS BSeries, Cisco UCS Layer 2, Cisco UCS Blades, VMware Installation, UCS Infrastructure, UCS C-Series, Cisco HyperFlex, VMware ESX, Fabric Interconnects, Linux OS, Chassis, ESXi, ESXi Hypervisor, Unified Computing System.
Job Title: Senior Fullstack Developer Years of Experience: 5 Location: Remote Employment Type: Full-time About the Role We are looking for a highly skilled and motivated Fullstack Developer to join our dynamic team. The ideal candidate will have hands-on expertise across modern frontend, backend, database, and deployment stacks, with a strong understanding of software development best practices. You will be responsible for designing, developing, testing, and deploying scalable applications while contributing to architectural discussions and ensuring secure, high-quality code. Key Responsibilities Design, develop, and maintain scalable web applications across the full stack. Implement responsive, accessible, and high-performing UIs using ReactJS, TypeScript, HTML, and SCSS. Build robust backend services using NodeJS, Express, and FastAPI. Work with relational and non-relational databases (SQLite, MongoDB, PostgreSQL). Integrate and optimize search functionality with Elasticsearch & EQL. Document APIs with Swagger/OpenAPI. Write and maintain automated tests across frontend and backend systems (Jest, React Testing Library, Cypress, Mocha, Chai, Supertest, Pytest). Collaborate with DevOps to implement CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins). Containerize and deploy applications using Docker (build, compose, multi-stage builds). Participate in code reviews, design discussions, and knowledge sharing. Apply security best practices to protect applications against vulnerabilities. Required Skills & Experience Strong experience with ReactJS, TypeScript, HTML, SCSS for frontend development. Proficiency with NodeJS, Express, and FastAPI for backend services. Hands-on expertise with SQLite, MongoDB, and PostgreSQL. Practical knowledge of Elasticsearch and EQL. Experience in writing automated tests for frontend and backend. Experience with CI/CD pipelines and containerized deployments using Docker. Strong debugging, problem-solving, and analytical skills. 
Familiarity with version control (Git) and agile development practices. Nice to Have (Optional Skills) Knowledge of architectural trade-offs: microservices vs. monolith. Familiarity with REST, GraphQL, and gRPC. Understanding of clean architecture, SOLID principles, and DDD. Exposure to event-driven patterns (Kafka, RabbitMQ). Awareness of security practices: OWASP top 10, rate limiting, sanitization, secure headers. What We Offer Competitive compensation and benefits. Opportunity to work with modern tech stacks on exciting projects. Collaborative and growth-oriented environment. Continuous learning and professional development opportunities. How to Apply: Interested candidates can share their resume with rashmi@dctech.cloud with the subject line “Application for Fullstack Developer Role”.
Position: Senior Linux Administrator – AI/ML Infrastructure Location: Remote Experience: 5+ Years Type: Full-time Role Overview We are seeking a highly skilled Senior Linux Administrator to join our team, focusing on the implementation and management of on-premises Linux servers optimized for AI/ML workloads. The ideal candidate will have deep expertise in Linux system administration, Kubernetes cluster management, and a strong understanding of data center infrastructure components including servers, networking, storage, and virtualization technologies. This role requires hands-on experience in automating infrastructure, optimizing performance, and ensuring reliability for high-performance computing (HPC) and AI/ML pipelines. Key Responsibilities Deploy, configure, and manage on-premises Linux servers supporting AI/ML workloads. Set up, manage, and troubleshoot Kubernetes clusters for containerized workloads. Optimize system and network performance for compute-intensive applications. Automate provisioning and configuration using Ansible, Terraform, and scripting (Bash/Python). Administer and monitor data center components such as servers, storage arrays, switches, and power systems. Ensure system security, patch management, and compliance across environments. Collaborate with DevOps, Data Science, and AI engineering teams to enable seamless integration with ML pipelines. Plan and implement scalability strategies, maintaining uptime and redundancy. Maintain comprehensive documentation of configurations, policies, and network diagrams. Required Skills & Qualifications 7+ years of experience in Linux system administration (RHEL, Ubuntu, CentOS). Proven hands-on experience with Kubernetes cluster management (setup, scaling, troubleshooting). CKA (Certified Kubernetes Administrator) certification is mandatory. Strong knowledge of data center components – servers, racks, networking switches, storage systems, and virtualization layers. 
Experience with Ansible, Terraform, CI/CD pipelines, and infrastructure automation. Proficiency in scripting languages (Bash, Python). Understanding of performance tuning, system optimization, and fault diagnosis. Excellent problem-solving, communication, and collaboration skills. Preferred / Good to Have Exposure to NVIDIA GPU management, CUDA environments, and AI/ML compute nodes. Familiarity with HPC environments and distributed computing frameworks. Experience managing monitoring systems (Prometheus, Grafana) and backup solutions. Knowledge of DevOps practices, containerization, and hybrid cloud environments.
Location: Remote Experience: 7+ Years Type: Full-time Role Overview We are seeking a highly skilled Senior Linux Administrator with strong Data Center Networking expertise to join our AI/ML infrastructure team. The role focuses on designing, deploying, and operating on-premises Linux and Kubernetes environments optimized for AI/ML and high-performance computing (HPC) workloads, with a significant emphasis on data center networking architectures. The ideal candidate will bring hands-on experience across Linux systems, Kubernetes, and modern data center networks, including high-speed Ethernet or InfiniBand fabrics used for AI/ML and GPU clusters. This role requires a deep understanding of how network design, latency, throughput, and reliability directly impact AI/ML performance. Key Responsibilities Deploy, configure, and manage on-premises Linux servers supporting AI/ML and GPU-accelerated workloads. Design, implement, and operate data center networking for AI/ML infrastructure, including: high-speed Ethernet (25G/40G/100G/400G) or InfiniBand fabrics; spine-leaf architectures and low-latency network designs. Configure and troubleshoot Kubernetes networking, including CNI plugins (Calico, Cilium, Flannel), service networking, ingress, and network policies. Optimize network performance, latency, and throughput for distributed training, storage access, and HPC workloads. Work closely with network teams to integrate switching, routing, VLAN/VXLAN, BGP, and load balancing into Kubernetes and AI platforms. (Desirable) Automate infrastructure and network provisioning using Ansible, Terraform, and scripting (Bash/Python). Administer and monitor data center components such as compute servers, network switches, storage systems, and virtualization platforms. Troubleshoot end-to-end issues spanning Linux OS, Kubernetes, and network layers. Ensure security, segmentation, and compliance across compute and network environments. 
Plan and implement scalable, highly available architectures for AI/ML platforms. Maintain accurate documentation including network diagrams, IP plans, topology maps, and runbooks. Required Skills & Qualifications 7+ years of experience in Linux system administration (RHEL, Ubuntu, CentOS). Strong hands-on experience with data center networking, including: L2/L3 networking fundamentals (VLANs, routing, BGP, VXLAN); spine-leaf architectures and modern DC network designs; high-bandwidth, low-latency networks for AI/HPC workloads. Proven experience managing Kubernetes clusters, with solid understanding of Kubernetes networking concepts. Experience integrating compute, storage, and networking for large-scale on-prem or hybrid data centers. Working knowledge of network performance tuning, packet flow, and troubleshooting tools (tcpdump, iperf, ethtool, etc.). Experience with automation tools such as Ansible, Terraform, and CI/CD pipelines. Proficiency in Bash and Python scripting. Strong understanding of system and network performance optimization. Excellent problem-solving and cross-team collaboration skills. Preferred / Good to Have Experience with NVIDIA GPU networking, GPUDirect, RDMA, or InfiniBand environments. Familiarity with HPC and distributed AI training frameworks. Exposure to data center switches from vendors such as Cisco, Arista, Juniper, NVIDIA (Spectrum), etc. Experience with monitoring and observability tools (Prometheus, Grafana). Knowledge of hybrid cloud networking and on-prem to cloud connectivity. CKA (Certified Kubernetes Administrator) or networking certifications (CCNA/CCNP or equivalent) are a plus.