Sr. Site Reliability Engineer (System Admin)

Posted: 3 weeks ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Job Summary: Experienced Systems Administrator with a strong foundation in Linux, infrastructure management, and incident response, skilled in monitoring, troubleshooting, and maintaining reliable systems across virtualized and cloud-based environments.

Responsibilities:
- Collaborate with the operations team to manage escalations and oversee incident management.
- Implement strategies and solutions to enhance daily operations, including system stability, security, and scalability.
- Drive real-time monitoring of system performance and capacity, addressing alerts and optimizing systems.
- Lead troubleshooting efforts, coordinating responses to network and system issues.
- Conduct and oversee server, application, and network equipment setup and maintenance.
- Ensure effective outage notification and escalation for prompt resolution.
- Mentor and train team members on technical skills and troubleshooting methods.
- Maintain up-to-date documentation of processes and procedures in the wiki.

Key Skills:

Experience: Minimum 4 years in Linux system administration.

Technical Skills:
- Datacenter technologies and cloud (AWS/GCP).
- Application deployment with Git, StackStorm, etc.
- Strong troubleshooting skills across networks and systems; familiarity with network protocols (TCP/IP, UDP, ICMP) and tools like tcpdump.
- Advanced diagnostic skills in network performance and system capacity monitoring.
- Proficient in Linux command-line and system administration.

Soft Skills:
- Analytical skills with an ability to interpret and act on data.
- Ability to prioritize and escalate issues effectively.
- Adaptability to shift work and capacity for multitasking in high-pressure scenarios.
- Excellent leadership, communication, and interpersonal skills.

Qualifications: Bachelor’s degree in Computer Science, Engineering (BE/B.Tech), MCA, or M.Sc (IT).

Must-Have:
- Configuration Management: Basic experience with Ansible, SaltStack, StackStorm, or similar.
- CI/CD: Basic experience with Jenkins or similar.
- Monitoring: Experience with Nagios, Sensu, Zabbix, or similar.
- Log Analytics: Basic experience with Splunk, Elasticsearch, Sumo Logic, Prometheus, Grafana, or similar.
- Virtualization: VMware, KVM, or similar.
- Linux & Networking: Strong fundamentals in Linux, troubleshooting, and networking.
- Containerization: Knowledge of Kubernetes, Rancher, or similar.

Good to Have:
- Cloud Providers: AWS or GCP.
- Networking: Advanced knowledge of BGP, F5 Load Balancer, and switching protocols.
- Certifications: RHCSA, CCNA, or equivalent.

My Connections Crest Data
