Associate Director, Platform Engineering

S&P Global Market Intelligence

10 - 20 years

INR 30 - 45 Lacs

hyderabad

Skills Required

Platform Engineering, PowerShell, Python, GitHub, Data Analysis, Site Reliability Engineering, Docker, DevOps, Linux, Splunk, Bash, Terraform, Agile, AWS, Infrastructure as Code, Kubernetes, Azure

Work Mode

Work from Office

Job Type

Full Time

Job Description

Position summary

We are seeking a seasoned Senior Site Reliability Engineer (SRE) to join our team. You will be responsible for the big-picture architecture, day-to-day operations, and continuous improvement of our production systems, ensuring their availability, performance, and resilience. This role is pivotal in combining cutting-edge observability and automation with proactive engineering practices.

Responsibilities

Design, implement, and maintain comprehensive observability solutions to track the health and performance of our systems.

Analyze observability data and explore AIOps methodologies to identify potential issues, predict failures, and proactively troubleshoot problems before they impact users.

Develop and implement alerts and notifications for critical events to ensure timely intervention.

Collaborate with development teams to design and implement solutions that enhance system resilience and reduce downtime, in part by designing and running chaos engineering experiments (e.g., using AWS FIS).

Analyze performance metrics to identify and resolve latency bottlenecks in our infrastructure.

Implement performance optimization techniques and tools to improve the overall responsiveness of our systems.

Work with development teams to ensure that new features and code changes do not introduce performance regressions.

Develop and maintain metrics dashboards to track key performance indicators (KPIs) for our critical systems.

Identify performance trends and anomalies that may indicate potential issues or areas for improvement.

Recommend and implement performance optimization strategies to enhance the overall efficiency of our systems.

Optimize resource utilization and minimize unnecessary expenditure on IT infrastructure.

Identify and implement cost-effective solutions that improve the efficiency of our IT operations and reduce toil.

Design and implement automated deployment and rollback procedures to mitigate risks associated with software updates.

Monitor the performance of new releases and address any issues that arise promptly.

Analyze root causes of incidents to identify and implement preventive measures to minimize their recurrence.

Document incident responses and communicate lessons learned to enhance our incident handling processes.
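
As an illustration of the kind of automated release check described above (watching a new release for performance regressions and rolling back when one appears), here is a minimal Python sketch. It assumes latency samples in milliseconds have already been collected for the current release (baseline) and the new release (canary); the function names and the 10% p95 regression tolerance are hypothetical choices, not part of any specific tool or of this team's process.

```python
from statistics import quantiles


def p95(samples_ms: list[float]) -> float:
    """Return the 95th-percentile latency of the samples (needs at least 2 values)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return quantiles(samples_ms, n=100, method="inclusive")[94]


def should_roll_back(
    baseline_ms: list[float],
    canary_ms: list[float],
    max_regression: float = 0.10,  # hypothetical tolerance: p95 up to 10% slower
) -> bool:
    """Return True if the canary's p95 latency regresses beyond the tolerance."""
    baseline_p95 = p95(baseline_ms)
    canary_p95 = p95(canary_ms)
    return canary_p95 > baseline_p95 * (1 + max_regression)


if __name__ == "__main__":
    # Hypothetical latency samples (ms) for the two releases.
    baseline = [120.0, 125.0, 130.0, 128.0, 122.0, 135.0, 127.0, 124.0]
    canary = [150.0, 160.0, 155.0, 158.0, 149.0, 170.0, 162.0, 157.0]
    print("roll back" if should_roll_back(baseline, canary) else "promote")
```

Comparing a tail percentile rather than the mean is a deliberate choice in this sketch: regressions that hurt only a minority of requests often barely move the average but show up clearly at p95.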

Requirements

Proficient in application and infrastructure observability; Splunk OpenTelemetry preferred.

Deep understanding of Site Reliability Engineering principles and practical experience applying them.

Ability to build and maintain the systems and culture needed to define, implement, and uphold service level objectives (SLOs).

Experience running production environments on AWS.

Comfortable with Infrastructure as Code; Terraform is preferred.

Familiar with Docker and container orchestration, specifically Kubernetes on Amazon EKS and Amazon ECS.

Familiar with programming languages, with a strong preference for Python (for scripting, automation, and data analysis/AI).

Comfortable with CI/CD pipelines such as GitHub Actions or Azure DevOps.

Understanding of the application lifecycle.

Familiarity with working in an agile environment.

Ability to review architecture designs to ensure observability coverage and adherence to high-availability, resilience, and disaster recovery principles.

Familiarity with Chaos Engineering principles and experience designing or running controlled experiments to test system resilience.

Demonstrable interest or experience in AIOps, including applying AI/ML to operational data, and familiarity with platforms such as Amazon Bedrock.

Excellent troubleshooting and problem-solving skills with a knack for identifying and resolving complex technical issues.

Ability to work independently and as part of a collaborative team, effectively communicating technical concepts to both technical and non-technical stakeholders.

A passion for maintaining high availability, performance, and reliability of critical systems in a fast-paced environment.

Ability to build and maintain relationships with other disciplines and stakeholders.

Strong sense of ownership, urgency, and drive.

Willingness to participate in an on-call rotation if required.
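
To make the SLO requirement above concrete, here is a minimal Python sketch of error-budget and burn-rate arithmetic, the kind of calculation SLO-based alerting is usually built on. With a 99.9% SLO the error budget is 0.1% of requests, so an observed 0.2% error rate burns the budget at 2x. The class and function names are hypothetical, and the 14.4 paging threshold is an assumption borrowed from common SRE practice: spending 2% of a 30-day budget in one hour corresponds to a burn rate of 0.02 x 720 hours = 14.4.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts observed over one alerting window (hypothetical model)."""
    slo_target: float  # e.g. 0.999 for a 99.9% availability SLO
    good_events: int   # successful requests in the window
    total_events: int  # all requests in the window

    def error_budget(self) -> float:
        """Fraction of requests allowed to fail: 1 - SLO target."""
        return 1.0 - self.slo_target

    def error_rate(self) -> float:
        """Observed fraction of failed requests (0.0 if there was no traffic)."""
        if self.total_events == 0:
            return 0.0
        return 1.0 - self.good_events / self.total_events

    def burn_rate(self) -> float:
        """Budget consumption speed; 1.0 spends exactly the budget over the SLO period."""
        return self.error_rate() / self.error_budget()


def should_page(short: SloWindow, long: SloWindow, threshold: float = 14.4) -> bool:
    """Page only when both a short and a long window burn faster than the threshold.

    The long window ignores brief spikes; the short window lets the alert
    clear soon after the underlying problem stops.
    """
    return short.burn_rate() > threshold and long.burn_rate() > threshold


if __name__ == "__main__":
    window = SloWindow(slo_target=0.999, good_events=9_980, total_events=10_000)
    print(f"burn rate: {window.burn_rate():.1f}x")
```

Alerting on burn rate rather than on raw error counts ties paging decisions directly to the SLO: the same error rate is urgent for a 99.99% service and tolerable for a 99% one.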

Qualifications

Bachelor's degree in Computer Science, Information Technology, or a related field.

10+ years of experience as a Site Reliability Engineer or in an equivalent role.

Proven experience in monitoring, analyzing, and optimizing the performance of large-scale distributed systems in a cloud environment.

Proven experience with Windows or Linux production environments, including managing servers, operating systems, and network configurations in the cloud.

Proven scripting and automation skills, preferably in PowerShell, Bash, or Python.

AWS certification preferred.

More Jobs at S&P Global Market Intelligence

Full Stack / ReactJS Software Developer

Gurugram

3 - 7 yrs

INR 5 - 9 Lacs

Solution Architect ( Java, Angular )

Hyderabad

15 - 20 yrs

INR 45 - 50 Lacs

Senior Associate, Independent Quality Team

Hyderabad

10 - 15 yrs

INR 30 - 35 Lacs

Quality Assurance Team Lead

Bengaluru

6 - 7 yrs

INR 8 - 12 Lacs

Vendor Operations Administrator (Technology)

Hyderabad

5 - 7 yrs

INR 7 - 9 Lacs

S&P Global Market Intelligence

Financial Services

New York