Posted: 8 hours ago
On-site
Full Time
In this role, you will help ensure the reliability, performance, and scalability of our cloud-based platform. Your expertise will be essential in maintaining the health and availability of critical systems and applications, contributing directly to the seamless delivery of high-quality software and services. By applying strong technical knowledge and support best practices, you will proactively troubleshoot issues, optimize system performance, and collaborate with cross-functional teams to minimize downtime and improve infrastructure efficiency. Your efforts will help drive operational excellence and ensure a resilient, scalable platform that meets business demands.
Bachelor's or master's degree in Computer Science, Engineering, Software Engineering, or a related field
3+ years of relevant experience in an SRE, Production Support, or Product Support role, with a track record of implementing SRE practices
• Ensure 24x7 uptime and reliability of production systems
• Investigate, troubleshoot, and resolve production issues in real-time
• Collaborate with development and engineering teams to optimize system performance and reduce operational toil
• Participate in on-call rotation to provide support for critical systems
• Develop and implement automation for deployments, monitoring, and routine tasks
• Continuously enhance infrastructure and workflows to reduce manual intervention
• Maintain and improve CI/CD pipelines and Infrastructure-as-Code practices
• Contribute to system monitoring, logging, and alerting enhancements
• Work closely with stakeholders across time zones and cultures
• Engage with clients via calls to understand reported issues and conduct real-time investigations when necessary.
• Proven track record implementing SRE practices and improving system resilience
• Hands-on experience with cloud platforms such as AWS, Azure, or GCP. Relevant certifications would be a plus.
• Hands-on experience with Linux OS, including system commands and shell scripting
• Proficient in Python, Docker and containerization, with experience in at least one additional scripting language such as Bash or PowerShell.
• Hands-on experience with MongoDB, including designing, configuring, and managing replica sets. Familiarity with replication, failover, high availability, and performance optimization is required.
• Strong problem-solving skills and the ability to analyse complex technical issues
• Excellent communication and collaboration skills across global teams
• Proven experience managing and meeting customer-facing Service Level Agreements (SLAs), ensuring timely resolution of issues and maintaining high levels of customer satisfaction.
• Hands-on experience and proficiency with Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Ansible for automating cloud infrastructure deployment and management.
• Good understanding of the following tools:
  o Orchestration – AutoSys, Airflow, or Cron
  o Monitoring & Logging – PagerDuty, Prometheus & Grafana or Datadog, Splunk
  o Project Management / ITSM – ServiceNow (basic ability to navigate and create change tickets and incidents), Jira (basic ability to create tickets and filter your work)
Gemini Solutions Pvt Ltd