6 RDMA Networking Jobs

JobPe aggregates listings for easy access; you apply directly on the original job portal.

2.0 - 4.0 years

0 Lacs

Bengaluru, Karnataka, India

On-site

Job Description: The AI2NE Org strives to be global leaders in the RDMA cluster networking domain and enable seamless, accelerated High-Performance Compute (HPC), Artificial Intelligence and Machine Learning advancements. We envision a future where artificial intelligence and machine learning revolutionize industries, reshape societies, and unlock limitless possibilities. Our vision is to be a pioneering force, driving the development and design of state-of-the-art RDMA clusters tailored specifically for AI, ML, and HPC workloads. We strive to be the go-to experts in RDMA cluster architecture, leveraging our deep understanding of the unique demands of AI/ML and HPC applications. By staying at the forefront of technological advancements, we aim to redefine the boundaries of what is possible, pushing the envelope of computational capabilities and unlocking unprecedented performance.

This role supports the design, deployment, and operations of a large-scale global Oracle Cloud Infrastructure (OCI). It is primarily focused on the development and support of network fabric and systems, combining a deep understanding of networking at the protocol level with programming skills. As OCI is a cloud-based network with a global footprint, this support will include hundreds of thousands of network devices supporting millions of servers, connected over a mix of dedicated backbone infrastructure, Clos networks, and the Internet.

Responsibilities: Primarily use existing procedures and tools to develop and safely execute network changes; new procedures may need to be developed from time to time. Develop solutions to enable front line support teams to act on network failure conditions. Participate in operational rotations as either primary or secondary. Provide break-fix support for events. Serve as the escalation point for event remediation. Participate in post-event root cause analysis. Frequently develop scripts to automate routine tasks for the team and business units. Coordinate with networking automation services for the development and integration of support tooling. Coordinate with network monitoring to gather telemetry and create alert rules from it. Build dashboards to represent data at various network layers and device roles that help identify network issues and anomalies. Collaborate with the network vendor technical account team and the internal Quality Assurance team to drive bug resolution and assist in the qualification of new firmware and/or operating systems.

Qualifications: Bachelor's degree in CS or related engineering field with 2+ years of Network Engineering experience, or master's with 1+ years of Network Engineering experience. Experience working in a large ISP or cloud provider environment. Experience in RDMA Networking is a plus. Experience working in a network operations role. Knowledge of networking protocols such as TCP/IP, IPv4, IPv6, DNS, DHCP, and SSL. Experience with scripting or automation and data center design; Python preferred, but must demonstrate expertise in a scripting or compiled language. Experience with network monitoring and telemetry solutions. Experience with network modeling and programming (YANG, OpenConfig, NETCONF). Ability to use professional concepts and company objectives to resolve issues in creative and effective ways. Excellent organizational, verbal, and written communication skills. Participate in an on-call rotation.

Career Level - IC2

About Us: As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's challenges. We've partnered with industry leaders in almost every sector and continue to thrive after 40+ years of change by operating with integrity. We know that true innovation starts when everyone is empowered to contribute. That's why we're committed to growing an inclusive workforce that promotes opportunities for all. Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs. We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing [HIDDEN TEXT] or by calling +1 888 404 2494 in the United States. Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veteran status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

Posted 4 days ago

5.0 - 9.0 years

0 Lacs

Pune, Maharashtra

On-site

As a senior engineer at NVIDIA, you will be at the forefront of groundbreaking developments in High-Performance Computing, Artificial Intelligence, and Visualization. Your role will involve understanding, analyzing, profiling, and optimizing deep learning workloads on cutting-edge hardware and software platforms. You will collaborate with cross-functional teams to enhance cloud application performance on diverse GPU architectures and identify bottlenecks for optimization. Your responsibilities will include building tools to automate workload analysis, optimization, and other critical workflows. You will drive platform optimization from hardware to application levels and design performance benchmarks to evaluate application efficiency. Your expertise in deep learning model architectures, PyTorch, and large-scale distributed training will be essential in proposing optimizations to enhance GPU utilization.

To excel in this role, you should hold a Master's in CS, EE, or CSEE, or possess equivalent experience, with at least 5 years in application performance engineering. Experience with large-scale multi-node GPU infrastructure and application profiling tools, along with a deep understanding of computer architecture, is required. Proficiency in Python and C/C++ for analyzing and optimizing application code is also crucial. You can stand out from the crowd through strong fundamentals in algorithms, GPU programming experience, and hands-on experience in performance optimization on distributed systems. An understanding of NVIDIA's server and software ecosystem, coupled with expertise in storage systems, Linux file systems, and RDMA networking, will set you apart.

Join NVIDIA, a leading technology company driving the AI revolution, and play a direct role in shaping the hardware and software roadmap while impacting deep learning users globally. If you are a creative and autonomous individual who is unafraid to push the boundaries of performance analysis and optimization, we invite you to be part of our innovative team. JR1986479

Posted 2 weeks ago

5.0 - 9.0 years

0 Lacs

Pune, Maharashtra

On-site

As a senior engineer at NVIDIA, you will play a crucial role in the optimization of deep learning workloads on cutting-edge hardware and software platforms. Your primary responsibility will be to understand, analyze, and profile these workloads to achieve peak performance. By building automated tools for workload analysis and optimization, you will contribute to enhancing the efficiency of GPU utilization and cloud application performance across diverse GPU architectures. Collaboration with cross-functional teams will be essential as you identify bottlenecks and inefficiencies in application code, proposing optimizations to drive end-to-end platform optimization. Your role will involve designing and implementing performance benchmarks and testing methodologies to evaluate application performance accurately.

To qualify for this role, you should hold a Master's degree in CS, EE, or CSEE, or possess equivalent experience. With at least 5 years of experience in application performance engineering, you are expected to have a background in deep learning model architectures, proficiency in tools such as NVIDIA Nsight and Intel VTune, and a deep understanding of computer architecture and GPU fundamentals. Proficiency in Python and C/C++ will be essential for analyzing and optimizing application code effectively. To stand out from the crowd, strong fundamentals in algorithms and GPU programming experience (CUDA or OpenCL) will be highly beneficial. Hands-on experience in performance optimization and benchmarking on large-scale distributed systems, familiarity with NVIDIA's server and software ecosystem, and expertise in storage systems, Linux file systems, and RDMA networking will further distinguish you as a top candidate.

Joining NVIDIA means being part of a dynamic team that leads the AI revolution, offering you the opportunity to directly impact the hardware and software roadmap in a fast-growing technology company. If you are unafraid to tackle challenges across the hardware/software stack and are passionate about achieving peak performance in deep learning workloads, we want to hear from you.

Posted 3 weeks ago

5.0 - 7.0 years

0 Lacs

Bengaluru / Bangalore, Karnataka, India

On-site

The AI2NE Org strives to be global leaders in the RDMA cluster networking domain and enable seamless, accelerated High-Performance Compute (HPC), Artificial Intelligence and Machine Learning advancements. We envision a future where artificial intelligence and machine learning revolutionize industries, reshape societies, and unlock limitless possibilities. Our vision is to be a pioneering force, driving the development and design of state-of-the-art RDMA clusters tailored specifically for AI, ML, and HPC workloads. We strive to be the go-to experts in RDMA cluster architecture, leveraging our deep understanding of the unique demands of AI/ML and HPC applications. By staying at the forefront of technological advancements, we aim to redefine the boundaries of what is possible, pushing the envelope of computational capabilities and unlocking unprecedented performance.

This role supports the design, deployment, and operations of a large-scale global Oracle Cloud Infrastructure (OCI). It is primarily focused on the development and support of network fabric and systems, combining a deep understanding of networking at the protocol level with programming skills. As OCI is a cloud-based network with a global footprint, this support will include hundreds of thousands of network devices supporting millions of servers, connected over a mix of dedicated backbone infrastructure, Clos networks, and the Internet.

Collaborate with program/project managers to develop milestones and deliverables. Primarily use existing procedures and tools to develop and safely execute network changes; new procedures may need to be developed from time to time. Develop solutions to enable front line support teams to act on network failure conditions. Mentor junior engineers. Participate in the network solution and architecture design process and contribute to roadmap development. Participate in operational rotations as either primary or secondary. Provide break-fix support for events. Serve as the escalation point for event remediation. Lead post-event root cause analysis. Frequently develop scripts to automate routine tasks for the team and business units. Coordinate with networking automation services for the development and integration of support tooling. Coordinate with network monitoring to gather telemetry and create alert rules from it. Build dashboards to represent data at various network layers and device roles that help identify network issues and anomalies. Serve as SME on software development projects for network automation and network monitoring. Collaborate with the network vendor technical account team and the internal Quality Assurance team to drive bug resolution and assist in the qualification of new firmware and/or operating systems.

Qualifications: Bachelor's degree in CS or related engineering field with 5+ years of Network Engineering experience, or master's with 3+ years of Network Engineering experience. Experience working in a large ISP or cloud provider environment. Experience in RDMA Networking is a plus. Experience working in a network operations role. Strong knowledge of protocols such as MPLS, BGP/OSPF/IS-IS, TCP, IPv4, IPv6, DNS, and DHCP; knowledge of VXLAN and EVPN is an added advantage. Extensive experience with scripting or automation and data center design; Python preferred, but must demonstrate expertise in a scripting or compiled language. Experience with networking protocols such as TCP/IP, VPN, DNS, DHCP, and SSL. Experience with network monitoring and telemetry solutions. Experience with network modeling and programming (YANG, OpenConfig, NETCONF). Ability to use professional concepts and company objectives to resolve complex issues in creative and effective ways. Capable of working under limited supervision. Excellent organizational, verbal, and written communication skills. Excellent judgment in influencing product roadmap direction, features, and priorities. Participate in an on-call rotation.

Career Level - IC3

Posted 1 month ago

2.0 - 4.0 years

0 Lacs

Bengaluru / Bangalore, Karnataka, India

On-site

The AI2NE Org strives to be global leaders in the RDMA cluster networking domain and enable seamless, accelerated High-Performance Compute (HPC), Artificial Intelligence and Machine Learning advancements. We envision a future where artificial intelligence and machine learning revolutionize industries, reshape societies, and unlock limitless possibilities. Our vision is to be a pioneering force, driving the development and design of state-of-the-art RDMA clusters tailored specifically for AI, ML, and HPC workloads. We strive to be the go-to experts in RDMA cluster architecture, leveraging our deep understanding of the unique demands of AI/ML and HPC applications. By staying at the forefront of technological advancements, we aim to redefine the boundaries of what is possible, pushing the envelope of computational capabilities and unlocking unprecedented performance.

This role supports the design, deployment, and operations of a large-scale global Oracle Cloud Infrastructure (OCI). It is primarily focused on the development and support of network fabric and systems, combining a deep understanding of networking at the protocol level with programming skills. As OCI is a cloud-based network with a global footprint, this support will include hundreds of thousands of network devices supporting millions of servers, connected over a mix of dedicated backbone infrastructure, Clos networks, and the Internet.

Primarily use existing procedures and tools to develop and safely execute network changes; new procedures may need to be developed from time to time. Develop solutions to enable front line support teams to act on network failure conditions. Participate in operational rotations as either primary or secondary. Provide break-fix support for events. Serve as the escalation point for event remediation. Participate in post-event root cause analysis. Frequently develop scripts to automate routine tasks for the team and business units. Coordinate with networking automation services for the development and integration of support tooling. Coordinate with network monitoring to gather telemetry and create alert rules from it. Build dashboards to represent data at various network layers and device roles that help identify network issues and anomalies. Collaborate with the network vendor technical account team and the internal Quality Assurance team to drive bug resolution and assist in the qualification of new firmware and/or operating systems.

Qualifications: Bachelor's degree in CS or related engineering field with 2+ years of Network Engineering experience, or master's with 1+ years of Network Engineering experience. Experience working in a large ISP or cloud provider environment. Experience in RDMA Networking is a plus. Experience working in a network operations role. Knowledge of networking protocols such as TCP/IP, IPv4, IPv6, DNS, DHCP, and SSL. Experience with scripting or automation and data center design; Python preferred, but must demonstrate expertise in a scripting or compiled language. Experience with network monitoring and telemetry solutions. Experience with network modeling and programming (YANG, OpenConfig, NETCONF). Ability to use professional concepts and company objectives to resolve issues in creative and effective ways. Excellent organizational, verbal, and written communication skills. Participate in an on-call rotation.

Career Level - IC2

Posted 1 month ago

5.0 - 7.0 years

0 Lacs

Bengaluru / Bangalore, Karnataka, India

On-site

The AI2NE Org strives to be global leaders in the RDMA cluster networking domain and enable seamless, accelerated High-Performance Compute (HPC), Artificial Intelligence and Machine Learning advancements. We envision a future where artificial intelligence and machine learning revolutionize industries, reshape societies, and unlock limitless possibilities. Our vision is to be a pioneering force, driving the development and design of state-of-the-art RDMA clusters tailored specifically for AI, ML, and HPC workloads. We strive to be the go-to experts in RDMA cluster architecture, leveraging our deep understanding of the unique demands of AI/ML and HPC applications. By staying at the forefront of technological advancements, we aim to redefine the boundaries of what is possible, pushing the envelope of computational capabilities and unlocking unprecedented performance.

This role supports the design, deployment, and operations of a large-scale global Oracle Cloud Infrastructure (OCI). It is primarily focused on the development and support of network fabric and systems, combining a deep understanding of networking at the protocol level with programming skills. As OCI is a cloud-based network with a global footprint, this support will include hundreds of thousands of network devices supporting millions of servers, connected over a mix of dedicated backbone infrastructure, Clos networks, and the Internet.

Collaborate with program/project managers to develop milestones and deliverables. Primarily use existing procedures and tools to develop and safely execute network changes; new procedures may need to be developed from time to time. Develop solutions to enable front line support teams to act on network failure conditions. Mentor junior engineers. Participate in the network solution and architecture design process and contribute to roadmap development. Participate in operational rotations as either primary or secondary. Provide break-fix support for events. Serve as the escalation point for event remediation. Lead post-event root cause analysis. Frequently develop scripts to automate routine tasks for the team and business units. Coordinate with networking automation services for the development and integration of support tooling. Coordinate with network monitoring to gather telemetry and create alert rules from it. Build dashboards to represent data at various network layers and device roles that help identify network issues and anomalies. Serve as SME on software development projects for network automation and network monitoring. Collaborate with the network vendor technical account team and the internal Quality Assurance team to drive bug resolution and assist in the qualification of new firmware and/or operating systems.

Qualifications: Bachelor's degree in CS or related engineering field with 5+ years of Network Engineering experience, or master's with 3+ years of Network Engineering experience. Experience working in a large ISP or cloud provider environment. Experience in RDMA Networking is a plus. Experience working in a network operations role. Strong knowledge of protocols such as MPLS, BGP/OSPF/IS-IS, TCP, IPv4, IPv6, DNS, and DHCP; knowledge of VXLAN and EVPN is an added advantage. Extensive experience with scripting or automation and data center design; Python preferred, but must demonstrate expertise in a scripting or compiled language. Experience with networking protocols such as TCP/IP, VPN, DNS, DHCP, and SSL. Experience with network monitoring and telemetry solutions. Experience with network modeling and programming (YANG, OpenConfig, NETCONF). Ability to use professional concepts and company objectives to resolve complex issues in creative and effective ways. Capable of working under limited supervision. Excellent organizational, verbal, and written communication skills. Excellent judgment in influencing product roadmap direction, features, and priorities. Participate in an on-call rotation.

Career Level - IC3

Posted 1 month ago
