Site Reliability Engineer - Database

4 - 8 years

0 Lacs

Posted: 5 days ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

Role Overview:

PhonePe Limited is looking for a Site Reliability Engineer - Database with 4 to 8 years of experience. You will be responsible for the design, provisioning, and lifecycle management of large-scale MySQL/Galera multi-master clusters across multiple geographic locations. The role involves ensuring the resilience, scalability, and performance of a distributed, high-volume database infrastructure while driving strategic improvements to it.

Key Responsibilities:

- Lead the design, provisioning, and lifecycle management of large-scale MySQL/Galera multi-master clusters across multiple geographic locations.
- Develop and implement database reliability strategies, including automated failure recovery and disaster recovery solutions.
- Investigate and resolve database-related issues, including performance problems, connectivity issues, and data corruption.
- Own and continuously improve performance tuning (query optimization, indexing, and resource management), security hardening, and high availability of database systems.
- Standardize and automate database operational tasks such as upgrades, backups, schema changes, and replication management.
- Drive capacity planning, monitoring, and incident response across infrastructure.
- Proactively identify, diagnose, and resolve complex production issues in collaboration with the engineering team.
- Participate in and enhance on-call rotations, implementing tools to reduce alert fatigue and human error.
- Develop and maintain observability tooling for database systems.
- Mentor and guide junior SREs and DBAs, fostering knowledge sharing and skill development within the team.

Qualifications Required:

- Expertise in Linux systems administration, scripting (Bash/Python), file systems, disk management, and debugging system-level performance issues.
- 4+ years of hands-on experience in MySQL database administration in large-scale, high-availability environments.
- Deep understanding of MySQL internals, the InnoDB storage engine, replication mechanisms (async, semi-sync, Galera), and tuning parameters.
- Proven experience managing 100+ production clusters and databases larger than 1 TB in size.
- Hands-on experience with Galera clusters is a strong plus.
- Familiarity with Infrastructure-as-Code tools like Ansible, Terraform, or similar.
- Experience with observability tools such as Prometheus, Grafana, or Percona Monitoring & Management.
- Exposure to NoSQL databases (e.g., Aerospike) is a plus.
- Experience working in on-premise environments is highly desirable.
