Position Summary... This role manages the Command & Control Center, which is focused on protecting our Customer, Associate & Merchant journeys across both online and offline models in the US & International markets. The ideal candidate will focus on delivering fit-for-purpose Observability and optimized recovery paths that protect our omnichannel Customer Experience. Additional focus will remain on a Continuous Improvement/Innovation mindset, with a passion for driving efficiencies and containing costs.
The responsibility covers the Operational & Engineering aspects of all business segments and the diverse channels for Online & Apps across US, International markets & Enterprise Services. The Availability commitment for our websites across markets is 99.95% (for e-commerce), and this role will contribute significantly to meeting and beating that target.
What you'll do... About Team:
Reliability Engineering & Operations (REO) is part of the larger Global Technology Platforms (GTP) horizontal, which is responsible for Design, Build, Deploy & Run for the Infrastructure, Data and Cloud Products and Services that enable the largest 24/7 e-commerce and physical stores landscape in the world.
Our engineers take great pride in their ownership of Major Incidents and in our ability to detect issues proactively, with a high degree of Observability and AI-driven mitigation plans that limit impact to Customer- and Associate-facing services worldwide.
What You'll Do:
- Maintain a strong Engineering focus, with the acumen to drive meaningful outcomes that create a reliable Customer, Merchant or Associate experience.
- Ensure the Customer, Merchant and Associate journeys across our online stores are deployed with appropriate quality checks and are highly available, reliable, scalable and flexible.
- Engage in high-priority business/corporate-impacting incidents and effectively run production operations, including Major Incident Management and the complementing Change & Problem Management
- Collaborate effectively with various functions such as Product, Design, Business, Operations, and other Engineering teams to gain commitments on improvement/remediation initiatives
- Be a transformation agent, constantly striving for excellence in a rapidly evolving environment
- Be a multiplier: Build a high-performing team with a strong engineering-driven culture that promotes diversity, innovation, and creative problem solving
- Possess a global mindset and develop a positive, healthy environment while working internationally across cultures
- Actively lead and engage in technical discussions, project execution and incident management to ensure that we meet or beat the stated Availability goals
- Develop a deep understanding of the various Products & Services that come together to deliver Walmart e-commerce and stores products.
- Provide leadership and mentoring to a highly skilled team on technical and critical functions within Walmart
- Represent yourself and develop strong partnerships with other functional and business leaders globally to understand, define and refine priorities
- Set the strategy and roadmap for a team focused on automation and data science to mature our ability to predict service failures/disruptions
- Actively participate in defining and refining the scope of Business Continuity & DR testing
- Actively engage in Holiday (Peak Trading) business volume discussions across the omnichannel stack and plan the scaling methodologies to meet the demand
What You'll Bring:
- A recognized Bachelor's/Master's degree in engineering with 15+ years of experience in Technology Engineering & Operations, including the full Application stack with Infrastructure on Cloud (Private & Public), Networking (WAN & EDGE), Compute, and Traffic with Load balancers/Proxies, DNS.
- Experience running a 24x7 Command & Control Operations team focused on the key metrics of MTTD/MTTR
- Deep understanding of Public Cloud adoption (Azure/GCP), foundational infrastructure services in the cloud, and the metrics to govern the cloud environment effectively
- 10+ years of experience managing, leading and developing technical teams of at least 80-100 members.
- Experience with Customer- or Site-facing transactional web services and/or internet-based services environments would be an added advantage
- Ability to risk-assess, review and guide Technical Program execution or delivery of large initiatives and their Operational Assurance
- Ability to drive value with AI workflows and adopt new models to continuously generate value for the business and the function
- Solid experience building scalable e-commerce applications or similar environments