Job Description
As a software engineer on the NVIDIA NetQ team, you will work on a cloud-based network management and telemetry system designed on modern principles to operate at internet scale. NVIDIA NetQ is a highly scalable network operations toolset that provides real-time visibility, troubleshooting, and validation for Cumulus fabrics. Using telemetry, NetQ delivers actionable insights into the health of your data center network and integrates the fabric into your DevOps ecosystem.
Your primary responsibilities will include building and maintaining essential infrastructure components such as NoSQL databases (Cassandra, MongoDB), time-series databases (TSDB), and Kafka. You will maintain the CI/CD pipelines that automate build, test, and deployment, and you will automate manual workflows with tools such as Jenkins, Ansible, and Terraform. You will also run security scans and remediate vulnerabilities in infrastructure components, and you will help triage and resolve production issues to improve system reliability and customer service.
To be successful in this role, you should have a Bachelor's degree and at least 5 years of experience with complex microservices-based architectures. Proficiency in Kubernetes and Docker/containerd is essential, as is familiarity with modern deployment architectures for non-disruptive cloud operations, including blue-green and canary rollouts. You should be an automation expert with hands-on experience in frameworks such as Ansible and Terraform. Strong knowledge of NoSQL databases (preferably Cassandra), Kafka/Kafka Streams, and Nginx is required, along with expertise in a cloud platform such as AWS, Azure, or GCP and a solid programming background in a language such as Scala or Python. You should also understand best practices for managing a highly available and secure production infrastructure.
To stand out, consider showcasing experience with APM tools such as Dynatrace, Datadog, AppDynamics, or New Relic. Linux/Unix administration skills, familiarity with Prometheus/Grafana, and experience implementing highly scalable log aggregation systems with the ELK stack or similar technologies are also valuable, as is experience implementing robust metrics collection and alerting infrastructure.
Joining NVIDIA means becoming part of a team at the forefront of technological innovation. We are always looking for creative, passionate, and self-motivated individuals to contribute to our work in Artificial Intelligence, High-Performance Computing, and Visualization. If you are eager to be part of this journey, we encourage you to apply and join us in shaping the future of technology.
Job Reference Number: JR1998880