As a Senior Engineer (L3) specializing in Defect Management & DevOps, you will play a critical role in driving operational excellence, ensuring defect-free delivery pipelines, and strengthening reliability across cloud-native platforms. You will collaborate closely with engineering, QA, SRE, and product teams to manage end-to-end defect processes, streamline automation, and enhance service observability. The role demands deep analytical capability, strong DevOps experience, and the ability to influence cross-functional improvements through data-driven insights and advanced troubleshooting.
You will act as a subject matter expert (SME) in DevOps and AWS/GCP, overseeing end-to-end release processes, governance, and delivery pipelines. This role requires leadership, deep technical knowledge, and excellent communication skills.
- Serve as the Subject Matter Expert (SME) for cloud platforms, primarily AWS (GCP exposure is a plus), providing guidance on cloud best practices, architectural decisions, and solution design.
- Support customers with core Managed Services technologies, including Cloud, Automation, Terraform, CI/CD, and containerization.
- Design, implement, and optimize cloud-native and DevOps solutions aligned with customer and organizational objectives.
- Lead technical discussions, demos, and customer engagements while effectively communicating complex technical concepts to both technical and non-technical stakeholders.
- Assist with team-building activities such as interviewing, onboarding, and aligning technical resources.
- Provide technical leadership, coaching, and mentorship to junior team members.
- Maintain strong project and situational awareness to ensure deliverables meet timelines and organizational expectations.
- Develop high-quality documentation including architectures, workflows, runbooks, and other written deliverables.
- Act as a technical expert in internal knowledge-sharing initiatives and external client interactions.
- Influence cloud governance, operational policies, best practices, and process improvements across teams and customer environments.
- Ensure precision, accuracy, and strong attention to detail across all tasks and deliverables.
- Act as the SME for Defect Management processes, governance, tooling, and reporting.
- Own and manage the full defect lifecycle, including logging, triage, prioritization, RCA, corrective actions, and closure.
- Partner with Development, QA, SRE, and Product teams to ensure timely resolution of high-impact issues.
- Establish and maintain defect dashboards, KPIs, and trend analytics to drive quality and process improvements.
- Develop standardized runbooks, escalation workflows, and operational procedures for defect handling.
- Lead cross-team Root Cause Analysis (RCA) investigations and drive Corrective and Preventive Actions (CAPA) implementations.
- Improve operational readiness through enhanced monitoring, alerting, and structured incident-to-defect workflows.
- Provide guidance on CI/CD optimization, automation strategies, infrastructure stability, and reliability engineering.
- Mentor junior engineers in DevOps principles, tooling, defect analysis techniques, and troubleshooting best practices.
- Defect Management Expertise:
  - Full ownership of the defect lifecycle, ensuring SLA adherence.
  - Deep understanding of SDLC, change management, and ITIL best practices.
  - Ability to analyze defect patterns, severity trends, root causes, and long-term systemic issues.
  - Conduct structured RCA using 5 Whys, Fishbone, and Fault Tree Analysis.
  - Define and enforce severity, categorization, and prioritization standards.
  - Create dashboards and quality metrics to drive continuous improvement.
- Tools & Skills:
  - Strong expertise in JIRA workflows, automation rules, dashboards, and reporting.
  - Ability to visualize defect trends and quality metrics effectively.
- Observability, Monitoring & SIEM Tools:
  - Hands-on experience with Dynatrace, Datadog, Prometheus, Grafana, CloudWatch, or similar tooling.
  - Skilled in APM analysis, log correlation, anomaly detection, service mapping, and performance troubleshooting.
  - Build and maintain dashboards and alert frameworks.
  - Integrate monitoring insights with DevOps and operational workflows.
  - Exposure to SIEM event analysis for operational and security correlation.
- Build, enhance, and support CI/CD pipelines across multiple environments using AWS CodePipeline, CodeBuild, CodeDeploy, and Git-based workflows.
- Collaborate on automation initiatives using Terraform, CloudFormation, AWS CDK, or equivalent IaC tools to standardize and streamline deployments.
- Deploy and manage AWS cloud-native services including EKS, ECS, Lambda, API Gateway, S3, IAM, and supporting architectures.
- Work with containers and orchestration platforms such as Kubernetes, EKS, ECS, and AKS (where required).
- Implement deployment best practices such as blue/green, rolling updates, and automated rollback strategies to ensure safe, repeatable releases.
- Troubleshoot complex deployment issues, environment drift, infrastructure failures, performance bottlenecks, and service-level degradations.
- Implement and maintain observability using CloudWatch, Prometheus, Grafana, Datadog, Dynatrace, or equivalent monitoring stacks.
- Ensure AWS workloads adhere to resiliency, compliance, security, and operational excellence guidelines.
- Strong hands-on, production-grade DevOps experience in AWS (primary cloud).
- Deep expertise in Kubernetes, containerized workloads, microservices, autoscaling, and cloud networking.
- Advanced troubleshooting across AWS services, distributed systems, CI/CD pipelines, and API-driven workflows.
- Knowledge of AWS cost optimization, tagging, FinOps alignment, and resource lifecycle governance.
- Exposure to building or maintaining CI/CD pipelines within GCP ecosystems (Cloud Build, GKE, Artifact Registry, etc.).
- Ability to work with GCP cloud-native services where required, ensuring consistency across hybrid/multi-cloud deployments.
- Familiarity with GCP IAM, VPC architecture, and core compute/storage/networking components is a plus.
- Strong communication, leadership, and mentoring capabilities.
- 6–10+ years of experience in DevOps, SRE, QA Engineering, or Cloud Operations.
- Expert-level AWS knowledge (GCP exposure would be a plus).
- Strong command of IaC tools such as Terraform, CloudFormation, CDK.
- Experience with CI/CD systems: Jenkins, GitLab CI, AWS CodePipeline.
- Proficiency with Docker, Kubernetes, and container orchestration.
- Experience with monitoring technologies: Datadog, Grafana, Prometheus.
- Experience with JIRA workflows and project tracking.
- Ability to excel in dynamic, fast-paced environments.
- Demonstrate deep expertise across DevOps, cloud platforms, automation, and engineering practices.
- Balance hands-on delivery with leadership responsibilities and strategic initiatives.
- Continuously assess, refine, and enhance processes, documentation, and operational workflows.
- Adapt effectively to evolving customer requirements, project priorities, and technology landscapes.
- Engage confidently with senior stakeholders, providing clear communication and technical guidance.
- Lead scoping, planning, and methodology definition for major technical initiatives and transformations.
- Contribute to the development of new engineering standards, frameworks, and best practices across teams.
- Take senior-level ownership of critical defects, escalations, and operational issues, driving them to resolution.
- Influence and drive cross-team improvements in tooling, quality, automation, and operational efficiency.
- Ensure prevention mechanisms, automation guardrails, and reliability practices are embedded early in delivery cycles.
- Lead initiatives focused on defect prevention, observability enhancements, and overall DevOps maturity uplift.
- Participate in on-call rotations and provide Tier-3 technical expertise for complex issues.
- Continuously propose, design, and implement enhancements across tooling, automation, and operational frameworks.