Site Reliability Engineer (SRE)

PayPay India

5 - 9 years

0 Lacs

Haryana

Posted: 3 months ago | Platform: Shine

Skills Required

Python, Java, Go, data structures, algorithms, NoSQL, communication skills, automation tools, monitoring tools, web technologies, observability tools, RDS, TiDB, container image management, IaC, SRE/DevOps concepts, disaster recovery strategies

Work Mode

On-site

Job Type

Full Time

Job Description

We are seeking individuals who offer informed and unique perspectives, enjoy collaborating with cross-functional teams, and continuously push boundaries to create reliable, scalable solutions and enhance user experiences.

**Responsibilities**

- Analyze the technologies currently used within the company and devise monitoring and notification tools that improve observability and visibility.
- Ensure system stability by proactively identifying failure scenarios and implementing solutions that reduce MTTR.
- Develop solutions that boost system performance, with a strong emphasis on high availability, scalability, and resilience.
- Integrate telemetry and alerting platforms to monitor and improve system reliability.
- Adhere to industry best practices for system development, configuration management, and deployment.
- Facilitate seamless information flow between teams by documenting acquired knowledge.
- Stay current with modern technologies and trends, and advocate for their incorporation into products when they bring value.
- In incident management, troubleshoot production issues, conduct root cause analysis (RCA), and actively share insights to improve system reliability and internal knowledge.

**Requirements**

- Experience troubleshooting and optimizing high-performance microservices architectures running on Kubernetes and AWS in highly available production environments.
- A minimum of 5 years of experience in software development using languages such as Python, Java, or Go, with a strong foundation in data structures, algorithms, problem-solving, and complexity analysis. The SRE selection process includes a coding challenge.
- A curious and proactive approach to identifying performance bottlenecks, scalability issues, and resilience problem areas, and the ability to resolve them.
- Familiarity with observability tools and data collection.
- Knowledge of databases such as RDS, NoSQL stores, and distributed TiDB is preferred.
- Strong communication skills, a collaborative approach, and a proactive attitude toward delivering results.
- Willingness to embrace challenges and see them through to completion.

**Preferred Qualifications**

- Expertise in container image management and optimization.
- Experience in large distributed system architecture and capacity planning.
- Understanding of Infrastructure as Code (IaC) and automation tools such as Terraform and CloudFormation.
- Background in SRE/DevOps concepts and their implementation.
- Proficiency in managing monitoring tools such as CloudWatch, VictoriaMetrics, and Prometheus, and in reporting with Snowflake and Sigma.
- In-depth knowledge of web technologies such as CloudFront and Nginx.
- Experience designing, implementing, or maintaining disaster recovery strategies and multi-region architecture for high availability, resilience, and business continuity across critical systems.
- Proficiency in Japanese and English is a plus, although language skills are not mandatory because professional translators are available.

**Working Conditions**

**Employment Status:** Full Time

**Office Location:** Gurugram (WeWork)

The development center requires your presence at the Gurugram office to help establish a strong core team.

More Jobs at PayPay India

Android Engineer

Gurugram, Haryana, India

5.0 - 5.0 yrs

Salary: Not disclosed

Senior Product Manager

Gurugram, Haryana, India

6.0 - 6.0 yrs

Salary: Not disclosed

Backend Engineer

Gurugram, Haryana, India

3.0 - 3.0 yrs

Salary: Not disclosed

Quality Assurance Automation Engineer

Gurugram, Haryana, India

3.0 - 3.0 yrs

Salary: Not disclosed

Data Engineer

Gurugram, Haryana, India

5.0 - 5.0 yrs

Salary: Not disclosed
