We are looking for an experienced Linux & Cloud Administrator to support our multinational team by providing L3/L4 expert support for all Linux OS and infrastructure customer issues, mainly in the Azure hyperscaler area. You will lead efforts in troubleshooting, incident handling, and root cause analysis, enabling faster issue resolution and continuous improvement of our operational processes.
The ideal candidate possesses deep technical expertise in Linux systems and Azure, as well as exceptional troubleshooting and problem-solving abilities. This role demands comfort in a fast-paced, dynamic environment and the ability to operate effectively in a global 24x7 setting.
Responsibilities:
- The Senior Linux & Cloud Administrator is responsible for 24x7 availability of SAP systems running in SAP ECS.
- Respond to, troubleshoot, and resolve alerts and incidents in the Linux OS or infrastructure layer (Azure hyperscaler).
- Manage Azure services and resources, including virtual machines, storage, and networking.
- Monitor and manage Azure infrastructure to ensure optimal performance, scalability, and security.
- Manage Azure virtual networks, subnets, routing, and network security groups.
- Implement monitoring solutions and configure alerts to proactively monitor the Azure environment.
- Develop and maintain automation scripts (e.g., PowerShell) to streamline routine tasks and optimize processes.
- Collaborate with internal teams to understand and address their Azure requirements and challenges.
- Maintain detailed documentation of Azure configurations, procedures, and best practices.
- Follow change management processes during service request execution.
- Seek opportunities to streamline standard operating procedures through automation.
- Must feel comfortable working in a fast-paced, dynamic, and flexible environment.
- Support the 24/7 operations model, including on-call, on-duty, and weekend tasks based on the shift schedule.
- Apply ITIL incident management processes to ensure timely resolution.
What you bring
Experience (Role Requirements):
- Experience: 8+ years of professional experience in Linux system administration with a demonstrated ability to perform troubleshooting and incident handling.
Technical Skills:
- Expert-level Linux knowledge: Deep understanding of Linux internals, kernel architecture, process and memory management, filesystems, and system calls.
- Cloud Infrastructure: Experience with Azure Cloud.
- Troubleshooting and Diagnostics: Mastery of tools such as top, htop, vmstat, iostat, sar, ps, netstat, ss, journalctl, rsyslog, dmesg, strace, lsof, tcpdump, Wireshark, perf, and systemd-analyze.
- Networking: Advanced understanding of TCP/IP, network interfaces, routing, DNS, DHCP, firewalls, and diagnostic tools.
- Backup and Restore:
- Infrastructure Services: Knowledge of infrastructure services such as DNS and LDAP.
- Security: Strong knowledge of security principles, OS hardening, compliance, and tools for vulnerability scanning and intrusion detection.
- Scripting and Automation: Proficiency in Shell scripting, Python, and/or other scripting languages. Experience with Infrastructure-as-Code and configuration management tools such as Ansible, Puppet, Chef, or Terraform.
- Virtualization and Containers: Familiarity with Docker, Kubernetes, and other virtualization and containerization technologies.
Soft Skills:
- Analytical and Problem-Solving: Exceptional ability to analyze complex issues, identify root causes, and implement effective solutions.
- Communication: Excellent written and verbal communication skills, with the ability to explain technical findings clearly to both technical and non-technical audiences.
- Documentation: Ability to create clear, concise, and comprehensive technical documentation.
- Incident Management: Experience with ITIL or similar incident management frameworks.
- Continuous Learning: A strong commitment to ongoing learning and professional development.
Language Skills:
- Fluency in English is essential.
Tools and Technologies:
- Monitoring Tools: Prometheus, Grafana
- Log Management: Splunk
- Diagnostic Tools: As listed under Technical Skills.