8.0 - 13.0 years
12 - 16 Lacs
hyderabad, bengaluru
Work from Office
Job: Senior Infrastructure Test & Validation Engineer (Zero-Touch GPU Cloud GitOps Validation & Certification)

We are seeking a Senior Infrastructure Test & Validation Engineer with 10+ years of experience to lead the Zero-Touch Validation, Upgrade, and Certification automation of our on-prem GPU cloud platform. This role focuses on ensuring the stability, performance, and conformance of the entire stack, from hardware to Kubernetes, using automated, GitOps-based validation pipelines. The ideal candidate has a strong infrastructure background with deep hands-on skills in Sonobuoy, LitmusChaos, k6, and pytest, and is passionate about automated test orchestration, platform resilience, and continuous conformance.

Key Responsibilities

- Design and implement automated, GitOps-compliant pipelines for validation and certification of the GPU cloud stack across hardware, OS, Kubernetes, and platform layers.
- Integrate Sonobuoy for Kubernetes conformance and certification testing.
- Design and orchestrate chaos engineering workflows using LitmusChaos to validate system resilience across failure scenarios.
- Implement performance testing suites using k6 and system-level benchmarks, integrated into CI/CD pipelines.
- Develop and maintain end-to-end test frameworks using pytest and/or Go, focusing on cluster lifecycle events, upgrade paths, and GPU workloads.
- Ensure test coverage and validation across multiple dimensions: conformance, performance, fault injection, and post-upgrade validation.
- Build and maintain dashboards and reporting for automated test results, including traceability, drift detection, and compliance tracking.
- Collaborate with infrastructure, SRE, and platform teams to embed testing and validation early in the deployment lifecycle.
- Own quality assurance gates for all automation-driven deployments.

Required Skills & Experience

- 10+ years of hands-on experience in infrastructure engineering, systems validation, or SRE roles.
- Primary key skills: pytest, Go, k6 scripting, automation framework integration (Sonobuoy, LitmusChaos), and CI integration.
- Strong experience with:
  - Sonobuoy for Kubernetes conformance and diagnostics
  - LitmusChaos for fault injection and resilience validation
  - k6 for performance/load testing in distributed environments
  - pytest or Go-based test frameworks for automation and validation scripting
- Deep understanding of Kubernetes architecture, upgrade patterns, and operational risks.
- Experience validating infrastructure components (GPU drivers, kernel modules, CNI, CRI, etc.) across lifecycle events.
- Proficient in GitOps workflows and integrating tests into declarative, Git-backed pipelines (e.g., with Argo CD, Flux).
- Hands-on experience with CI/CD systems (e.g., GitHub Actions, GitLab CI, Jenkins) to automate test orchestration.
- Solid scripting and automation experience (Python, Bash, or Go).
- Familiarity with GPU-based infrastructure and its performance characteristics is a strong plus.
- Strong debugging, root cause analysis, and incident investigation skills.
Posted 2 hours ago