Systems and Infrastructure Engineer II

3 - 5 years

0 Lacs

Posted: 1 week ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

Position Summary...

Demonstrates up-to-date expertise and applies this to the development, execution, and improvement of action plans by providing expert advice and guidance to others in the application of information and best practices; supporting and aligning efforts to meet customer and business needs; and building commitment for perspectives and rationales. Provides and supports the implementation of business solutions by building relationships and partnerships with key stakeholders; identifying business needs; determining and carrying out necessary processes and practices; monitoring progress and results; recognizing and capitalizing on improvement opportunities; and adapting to competing demands, organizational changes, and new responsibilities. Models compliance with company policies and procedures and supports company mission, values, and standards of ethics and integrity by incorporating these into the development and implementation of business plans; using the Open Door Policy; and demonstrating and assisting others with how to apply these in executing business processes and practices.

What you'll do...

About The Team

The CCC team, also known as the Command and Control Center, forms the nerve center of Walmart. The team works across multiple technology groups to ensure the stability and reliability of the company's critical e-commerce infrastructure. The team collaborates internally with TDO (Technical Duty Officers) and SRE (Site Reliability Engineering) for faster detection and mitigation of incidents. The team aims to proactively maintain mission-critical infrastructure, cloud platforms, micro-services, tools and processes to ensure the highest levels of availability and reliability across the Global Technology platforms. CCC plays an important role in leading all major incident responses and orchestrating faster mitigation.

What You'll Do

As a Store Reliability Operations Engineer within the Global Technology Platforms (GTP) CCC team, you will work with other CCC, TDO, SRE, DevOps and Engineering practitioners to pro-actively maintain mission-critical infrastructure, cloud platforms, micro-services, tools, and processes that will ensure the highest levels of availability and reliability across our Global Technology platforms. You're right for the job if you are comfortable leading our major incident response team as part of a technical team of engineers laser-focused on restoring service across complex distributed systems. You'll excel if you have enthusiasm for digging deep and a flair for sharp technical communication, prioritization and organization. You will work directly with our SRE, Engineering and DevOps teams to support our next-generation, always-up, cloud-based e-commerce platforms.

The CCC Site Reliability Operations Engineer is responsible for pro-actively monitoring, detecting and resolving site issues before they affect customers and availability. Technically, you will understand the full end-to-end stack and use this knowledge to detect errors and failures and take corrective action to mitigate them. During a major incident, you will draw on your technical skills and knowledge to triage and troubleshoot, differentiating between symptom and cause, to help resolve impacting issues. Your ability to continuously challenge yourself and develop a strong network within your peer group will see you excel in this role. Our goal is to protect the customer experience and deliver outstanding levels of availability.

What You'll do:

  • xMatters workflow integration with scalability, resiliency and performance.
  • Assist Walmart Store/Distribution Center associates with their day-to-day issues related to store functions.
  • Provide call-based support for internal escalations.
  • Follow the SOP to troubleshoot and resolve issues reported by the Store Operations team.
  • Must be able to multitask when needed.
  • Flexible with shift and support hours.
  • Expert level understanding of incident management processes and procedures.
  • Calm under pressure when participating in major incident response.
  • Deep technical understanding of core infrastructure, cloud services, platforms and micro-services.
  • Ability to understand and capture key data from logs at an expert level.
  • Ability to understand traffic flows and key dependencies between services.
  • Ability to triage effectively and to distinguish symptom from cause.
  • Detect and quantify impact.
  • Expert level troubleshooting skills using a diverse set of tools and methods
  • Analyze trends to pro-actively prevent incidents.
  • Focus on immediate restoration vs root cause.
  • Research and recommend alternative actions for incident resolution. Develop procedures and documentation to support this.
  • Create and maintain procedural documentation.
  • Identify and drive continuous improvement efforts to reduce waste (eliminate, automate or streamline).
  • Absorb knowledge and understand complex distributed systems - ability to share and impart this knowledge into your peer group and beyond.
  • Build tools to improve visibility, pro-actively detect issues and restore system availability.
  • Develop automation and self-healing with DevOps, Engineering and SRE partners.
  • Strong focus on collecting and inferring metrics.
  • Clear communication skills.
  • Ability to contribute to multiple incidents at any given time.
  • Analyze systems and make recommendations to prevent possible problems. Takes lead on issue resolution activities using knowledge of complex and company-wide systems.
  • Scripting and software development to automate and help enhance existing solutions.
  • Experience owning, developing and evangelizing a product.
  • Ability to gather requirements and build solutions into a product.
  • Evangelize operational excellence
Additional responsibilities may include:
  • Actively provide data for and participate in root cause analysis.
  • Define the CCC onboarding process and ensure it is adhered to when accepting new systems into service.
  • Share knowledge globally between CCC teams.
  • Analyze systems and make recommendations to prevent possible incidents.
  • Strive for continuous improvement and make recommendations based on CCC process.
  • Act as a technical focal point for the CCC team.
  • Transition observability projects in the command centre for better visibility.
  • Other duties and responsibilities as assigned.

What you'll bring:

  • Experience building and scaling distributed, highly available systems
  • Experience developing applications for a cloud environment such as Google Cloud Platform or Microsoft Azure
  • Experience with frameworks/tools such as Git, xMatters workflow integration, ServiceNow integration, etc.
  • Comfortable building metrics, monitoring, and alerting for micro-services
  • 3+ years in an infrastructure, systems, engineering or development environment delivering operational excellence to highly complex distributed systems.
  • Bachelor's Degree in Computer Science or a related field, or relevant work experience.
  • Strong and demonstrable incident management skills with relevant experience in an enterprise organization.
  • 3+ years of relevant experience in Major Incident Management, with ITIL 4 certification.
  • Experience and exposure working in a 24/7 operations support environment.
  • Methodical and systematic problem-solving approach, combined with a solid awareness of ownership, initiative and drive.
  • Experience investigating, analyzing and troubleshooting large scale enterprise systems.
  • Networking knowledge and understanding of network concepts, such as different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.
  • Programming experience in one or more of the following languages: Go, Java, Python, Ruby, Shell.
  • Experience administering Unix/Linux in a production environment.
  • Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols along the way.
  • Experience working with and developing enterprise monitoring/tooling solutions like Grafana, Kibana, Splunk, Graphite, Nagios, New Relic and Dynatrace.
  • Working knowledge of one or more cloud technologies such as Azure, GCP and OpenStack.
  • Working knowledge of CI/CD pipelines.

About Walmart Global Tech

Imagine working in an environment where one line of code can make life easier for hundreds of millions of people. That's what we do at Walmart Global Tech. We're a team of software engineers, data scientists, cybersecurity experts and service professionals within the world's leading retailer who make an epic impact and are at the forefront of the next retail disruption. People are why we innovate, and people power our innovations. We are people-led and tech-empowered.

We train our team in the skillsets of the future and bring in experts like you to help us grow. We have roles for those chasing their first opportunity as well as those looking for the opportunity that will define their career. Here, you can kickstart a great career in tech, gain new skills and experience for virtually every industry, or leverage your expertise to innovate at scale, impact millions and reimagine the future of retail.

Walmart's culture sets us apart, and we know being together helps us innovate, learn and grow great careers. This role is based in our [Bangalore/Chennai] office for daily work, with the flexibility for associates to manage their personal lives.

Benefits

Beyond our great compensation package, you can receive incentive awards for your performance. Other great perks include a host of best-in-class benefits: maternity and parental leave, PTO, health benefits, and much more.

Belonging

We aim to create a culture where every associate feels valued for who they are, rooted in respect for the individual. Our goal is to foster a sense of belonging, to create opportunities for all our associates, customers and suppliers, and to be a Walmart for everyone.

At Walmart, our vision is everyone included. By fostering a workplace culture where everyone is, and feels, included, everyone wins. Our associates and customers reflect the makeup of all 19 countries where we operate. By making Walmart a welcoming place where all people feel like they belong, we're able to engage associates, strengthen our business, improve our ability to serve customers, and support the communities where we operate.

Equal Opportunity Employer

Walmart, Inc., is an Equal Opportunities Employer By Choice. We believe we are best equipped to help our associates, customers and the communities we serve live better when we really know them. That means understanding, respecting and valuing unique styles, experiences, identities, ideas and opinions while being inclusive of all people.

Minimum Qualifications...

Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.
Option 1: Bachelor's degree in computer science, computer engineering, information systems, information technology, or related area.
Option 2: 3 years' experience in technology infrastructure engineering across areas such as compute, storage, network, mobility or virtualization related technologies.

Preferred Qualifications...

Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.

Primary Location...

G, 1, 3, 4, 5 Floor, Building 11, SEZ, Cessna Business Park, Kadubeesanahalli Village, Varthur Hobli, India R-2322653