Systems and Infrastructure Engineer II

3 - 8 years

30 - 35 Lacs

Posted: 1 day ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

  • xMatters workflow integration with scalability, resiliency and performance.
  • Assist Walmart Store/Distribution Center associates with day-to-day issues related to store functions.
  • Provide call-based support for internal escalations.
  • Follow the SOP for troubleshooting and resolve issues reported by the Store Operations team.
  • Must be able to multitask when needed.
  • Flexible with shift and support hours.
  • Expert level understanding of incident management processes and procedures.
  • Calm under pressure when participating in major incident response.
  • Deep technical understanding of core infrastructure, cloud services, platforms and micro-services.
  • Ability to understand and capture key data from logs at an expert level.
  • Ability to understand traffic flows and key dependencies between services.
  • Ability to triage effectively: detect issues and distinguish symptom from cause.
  • Detect and quantify impact.
  • Expert level troubleshooting skills using a diverse set of tools and methods
  • Analyze trends to pro-actively prevent incidents.
  • Focus on immediate restoration vs root cause.
  • Research and recommend alternative actions for incident resolution. Develop procedures and documentation to support this.
  • Create and maintain procedural documentation.
  • Identify and drive continuous improvement efforts to reduce waste (eliminate, automate or streamline).
  • Absorb knowledge and understand complex distributed systems - ability to share and impart this knowledge into your peer group and beyond.
  • Build tools to improve visibility, pro-actively detect issues and restore system availability.
  • Develop automation and self-healing with DevOps, Engineering and SRE partners.
  • Strong focus on collecting and inferring metrics.
  • Clear communication skills.
  • Ability to contribute to multiple incidents at any given time.
  • Analyze systems and make recommendations to prevent possible problems. Take the lead on issue resolution activities using knowledge of complex, company-wide systems.
  • Scripting and software development to automate and help enhance existing solutions.
  • Experience owning, developing and evangelizing a product.
  • Ability to gather requirements and build solutions into a product.
  • Evangelize operational excellence
Additional responsibilities may include:
  • Actively provide data for and participate in root cause analysis.
  • Define CCC onboarding processes and ensure they are adhered to when accepting new systems into service.
  • Share knowledge globally between CCC teams.
  • Analyze systems and make recommendations to prevent possible incidents.
  • Strive for continuous improvement and make recommendations based on CCC process.
  • Act as a technical focal point for the CCC team.
  • Transition observability projects in the command centre for better visibility.
  • Other duties and responsibilities as assigned.
What you'll bring:
  • Experience building and scaling distributed, highly available systems
  • Experience developing applications for a cloud environment such as Google Cloud Platform or Microsoft Azure
  • Experience with frameworks/tools such as Git, xMatters workflow integration, ServiceNow integration, etc.
  • Comfortable building metrics, monitoring, and alerting for micro-services
  • 3+ years in an infrastructure, systems, engineering or development environment delivering operational excellence to highly complex distributed systems.
  • Bachelor's degree in Computer Science or a related field, or relevant work experience.
  • Strong and demonstrable incident management skills with relevant experience in an enterprise organization.
  • 3+ years of relevant experience in Major Incident Management, with ITIL 4 certification.
  • Experience and exposure working in a 24/7 operations support environment.
  • Methodical and systematic problem-solving approach, combined with a solid awareness of ownership, initiative and drive.
  • Experience investigating, analyzing and troubleshooting large scale enterprise systems.
  • Networking knowledge and understanding of network concepts, such as different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.
  • Programming experience in one or more of the following languages: Go, Java, Python, Ruby, Shell.
  • Experience administering Unix/Linux in a production environment.
  • Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols along the way.
  • Experience working with and developing enterprise monitoring/tooling solutions like Grafana, Kibana, Splunk, Graphite, Nagios, New Relic and Dynatrace.
  • Working knowledge of one or more cloud technologies such as Azure, GCP and OpenStack.
  • Working knowledge of CI/CD pipelines.

Walmart

Retail

Bentonville, Arkansas