As a Sr. Server Administrator, you will be responsible for the day-to-day administration, maintenance, and troubleshooting of our Linux-based server infrastructure, supporting our US-based clients. You will ensure the stability, security, and performance of our systems, automate routine tasks, and contribute to the improvement of our infrastructure. You will work closely with other members of the team to support our development and production environments.
CORPORATE VALUES
- Respectful communication and cooperation: We prioritize respectful communication, fostering an environment where everyone is treated with dignity and respect.
- Teamwork and employee participation: Collaboration and teamwork thrive through diverse perspectives, both within our teams and in our interactions with our customers.
- Work/life balance that supports our employees' varying needs: We value the well-being of our employees, recognizing that a healthy work-life balance is pivotal to our collective success.
- Embracing communities: We embrace and support the communities that nurture us. Our employees' dedication to fostering positive change is a source of immense pride for us.
The essential functions of this position include:
- Administer and troubleshoot Linux servers, including storage and network services.
- Automate server provisioning and other infrastructure operations using Ansible.
- Perform basic network and storage troubleshooting.
- Manage and monitor Nvidia GPUs on servers (basic understanding required).
- Maintain and update server documentation.
- Collaborate with other teams to resolve technical issues.
- Create Ansible playbooks for bare metal systems management.
- Contribute to the continuous improvement of our infrastructure and processes.
QUALIFICATIONS
- Strong Linux administration skills, including experience with storage and network services.
- Proficiency in using Ansible for automation.
- GitHub expertise.
- Basic understanding of Nvidia GPU management on servers.
- Experience with container technologies.
- Basic network and storage troubleshooting skills.
- Excellent problem-solving and analytical skills.
- Ability to work independently and as part of a team, especially in a remote setting.
- Strong communication and documentation skills.
PREFERRED SKILLS
- Experience with Dell and Supermicro servers.
- Experience with MAAS (Metal as a Service) for provisioning and managing GPU nodes.
- Experience creating Ansible playbooks for bare metal systems management.
- Scripting skills (e.g., Bash, Python).
- Experience with monitoring tools (e.g., Nagios, Zabbix).
- Familiarity with virtualization technologies (e.g., KVM, VMware).