- Engender reliability and availability starting with metrics and measurements.
- Enable scaling by providing tools, developing training and/or augmenting processes.
- Build tools and automation to prevent the recurrence of problems in mission-critical products/services.
- Augment existing instrumentation to build a cohesive picture of the characteristics of our systems with special attention to points of failure.
- Participate in capacity planning, demand forecasting, software performance analysis and system tuning.
- Develop a deep understanding of the numerous services and applications that come together to deliver Walmart e-commerce/Retail and Enterprise products.
- Perform root-cause analysis of complex problems involving multiple parties, networks, hardware, and software that relate to scaling and performance. Secure the system from issues, be they real, perceived, or notional.
Additional responsibilities may include:
- Create Systems Engineering and Architectural documentation to be used by others to build and maintain systems
- Scripting and Development responsibilities: Develop software in several modern languages
- Develops large/complex database-backed systems and understands DB schema and query performance
- Utilises professional best practices in day-to-day work, such as revision control and unit testing
- Applies statistical data analysis techniques
- Networking responsibilities: Understands and performs packet captures using tcpdump, snoop, and other network sniffers
- Understands and applies knowledge of most protocols (TCP/IP, HTTP, UDP, etc.)
- Application technologies: Provides recommendations and advice to the team and/or department in the areas of web services, OS, and storage, including being an active liaison to Development, QA, and the Business
- Analyses systems and makes recommendations to prevent potential problems
- Takes the lead on issue resolution activities using knowledge of complex and company-wide systems
- Leads end-to-end audits of monitors and alarms based on subsystem knowledge
- Utilises time management and project management skills to lead the resolution of issues in a timely and organised manner, effectively communicating necessary information
- May consult directly with developers or third-party vendors; provides subject matter expertise
- Consistently exercises independent judgment and discretion in matters of significance
Your Qualifications
1. Bachelor's degree or Master's degree in Computer Science or a related field, with 6+ years of experience.
2. Proficiency in at least one programming language such as Java or Go.
3. Experience in designing, investigating, analysing, and troubleshooting large-scale enterprise systems.
4. Methodical and systematic problem-solving approach, combined with a strong sense of ownership, initiative, and drive.
5. Fluency in running services at scale; in-depth understanding of Unix system internals and networking.
6. Networking knowledge and an in-depth understanding of network concepts, such as protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.
7. Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols along the way. Experience administering Linux systems in a production environment.
8. Experience with distributed version control like Git or similar
9. Experience with IaaS and PaaS providers such as AWS, Azure, OpenStack, and GCP.
10. Experience with containerisation and container platforms (e.g., Docker, Kubernetes, Docker EE, OpenShift, Mesosphere). Experience with enterprise monitoring solutions such as AppDynamics, New Relic, Prometheus, Graphite, Grafana, Nagios, Sensu, and Splunk.
11. Familiarity with continuous integration/deployment processes and tools such as Jenkins, Maven, and Nexus.