Posted: 2 days ago
Work from Office
Full Time
Summary:
The Server SRE is responsible for ensuring the reliability, scalability, and performance of server infrastructure. This role combines software engineering, development, and systems engineering to automate operations, manage incidents, and achieve a noise-free environment. The candidate will carry out automation development and work closely with infrastructure teams to implement observability and automation solutions.

Must Have Skills
- Strong experience in Linux/Unix server administration
- Proficiency in automation development using Python, Bash, or Shell scripting
- Hands-on experience with monitoring tools such as Prometheus, Grafana, and SolarWinds
- Ability to analyze incidents and problems to reduce alert noise
- Experience with CI/CD pipelines, GitHub, and DevOps practices
- Hands-on experience creating infrastructure as code (IaC) using Terraform or Ansible
- Familiarity with server performance metrics and observability tools

Good to Have Skills
- Experience with cloud platforms (AWS, Azure, GCP)
- Knowledge of container orchestration (e.g., Kubernetes)
- Familiarity with infrastructure as code tools (e.g., Terraform, Ansible)
- Exposure to incident management frameworks (e.g., ITIL, SRE principles)

Job Requirements
Minimum of 7 years of experience in server administration and reliability engineering. Strong analytical skills and the ability to work in a fast-paced environment. Must be able to implement automation and monitoring solutions and analyze incidents to maintain system stability.

Key Responsibilities
- Monitor and maintain server health across environments
- Automate operational tasks and reduce manual interventions
- Implement observability solutions including metrics, logging, and tracing
- Analyze incidents and perform root cause analysis
- Collaborate with teams to improve system reliability and reduce alert noise
- Design scalable server architectures for high availability
- Conduct capacity planning and performance tuning

Technical Experience
Hands-on experience with server monitoring and automation tools. Strong scripting skills and familiarity with observability platforms. Experience in analyzing incidents and implementing solutions to reduce noise and improve reliability.

Professional Attributes
Excellent problem-solving and analytical skills. Strong communication and collaboration abilities. Proactive mindset with a focus on continuous improvement and operational excellence.

Educational Qualification and Certification
Bachelor's degree in Computer Science, Information Technology, or a related field. Certifications in Linux administration, cloud platforms, or SRE practices are a plus.
Advent Global Solutions