Posted: 3 days ago
On-site
Full Time
Design, develop, and execute end-to-end (E2E) scenario validations that simulate real-world usage of complex AI data platform workflows (data ingestion, transformation, ML pipeline orchestration, etc.).
Collaborate closely with product, engineering, and field teams to identify gaps in coverage and propose test automation strategies.
Develop and maintain automated test frameworks supporting E2E, integration, performance, and regression testing for distributed data/AI services.
Monitor system health across the stack (infrastructure, data pipelines, AI/ML workloads) and proactively detect failures or SLA breaches.
Champion SRE best practices including observability, incident management, blameless postmortems, and runbook automation.
Analyze logs, traces, and metrics to identify reliability, latency, and scalability issues, and drive root cause analysis and corrective actions.
Partner with engineering to drive high-availability, fault tolerance, and continuous delivery (CI/CD) improvements.
Participate in on-call rotation to support critical services, ensuring rapid resolution and minimizing customer impact.
Bachelor's or master's degree in Computer Science, Engineering, or a related field (or demonstrated equivalent experience).
3+ years of experience in software QA/validation, SRE, or DevOps roles, ideally in data platforms, cloud, or AI/ML environments.
Proficient with DevOps automation and tools for continuous integration, deployment, and monitoring (e.g., Terraform, Jenkins, GitLab CI/CD, Prometheus).
Working knowledge of distributed systems, data engineering pipelines, and cloud-native architectures (OCI, AWS, Azure, GCP, etc.).
Strong proficiency in Java, Python, and related technologies.
Hands-on experience with test automation frameworks (e.g., Selenium, pytest, JUnit) and scripting (Python, Bash, etc.).
Familiarity with SRE practices: service-level objectives and agreements (SLOs/SLAs), incident response, and observability (Prometheus, Grafana, ELK, etc.).
Strong troubleshooting and analytical skills with a passion for reliability engineering and process automation.
Excellent communication and cross-team collaboration abilities.
Career Level - IC2
Oracle