Posted: 2 months ago
Platform:
On-site
Contractual
Job Summary:
• Lead efforts to maintain high availability and reliability of critical services.
• Define and monitor SLIs, SLOs, and SLAs to ensure business requirements are met.
• Establish and improve incident management processes and on-call rotations.
• Lead incident response and root cause analysis for high-priority outages.
• Drive post-incident reviews and ensure actionable insights are implemented.
• Develop and implement automated solutions to reduce manual operational tasks.
• Optimize CI/CD pipelines for seamless deployments.
• Partner with software engineering teams to improve the reliability of applications and infrastructure.
• Work closely with product and engineering teams to design scalable and robust systems.
• Manage, mentor, and grow a team of SREs.
• Promote SRE best practices and foster a culture of reliability and performance across the organization.
• Perform capacity planning and implement autoscaling solutions to handle traffic spikes.
• Optimize infrastructure and cloud costs while maintaining reliability and performance.
Skills & Qualifications:
Required Skills:
• Technical Expertise:
o Experience with cloud platforms (AWS / Azure / GCP) and Kubernetes.
o Proficiency in Java.
o Expertise in distributed systems, databases, and load balancing.
o Understanding of metrics-driven approaches for system monitoring and alerting.
• Automation & CI/CD:
o Hands-on experience with CI/CD pipelines (e.g., Jenkins, Azure Pipelines).
o Skilled in automation frameworks and tools for infrastructure and application deployments.
o Proven track record in handling incidents, post-mortems, and implementing solutions to prevent recurrence.
• Strong people management and leadership skills with the ability to inspire and motivate teams.
• Excellent problem-solving and decision-making skills.
• Clear and concise communication, with the ability to translate technical concepts for non-technical stakeholders.
• Experience with database optimization, Kafka, or other messaging systems.
• Knowledge of autoscaling techniques.
• Previous experience in an SRE, DevOps, or infrastructure engineering leadership role.
• Understanding of compliance and security best practices in distributed systems.
Resource Algorithm