Site Reliability Engineer (SRE)

5 - 9 years

0 Lacs

Posted: 1 day ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

Role Overview:

You will be part of a dynamic team that values unique perspectives and collaboration to develop reliable and scalable solutions that enhance user experiences. Your role will involve analyzing current technologies, improving system stability and performance, integrating telemetry platforms, advocating for modern technologies, and participating in incident management to ensure system reliability.

Key Responsibilities:

- Analyze current technologies and develop monitoring tools to enhance observability
- Ensure system stability by proactively addressing failure scenarios
- Develop solutions to improve system performance with a focus on availability, scalability, and resilience
- Integrate telemetry and alerting platforms to enhance system reliability
- Implement best practices for system development, configuration management, and deployment
- Document knowledge to facilitate seamless information flow between teams
- Stay updated on modern technologies and trends to recommend valuable inclusions in products
- Participate in incident management, troubleshoot production issues, conduct root cause analysis, and share lessons learned

Qualifications Required:

- Experience troubleshooting and tuning microservices architectures on Kubernetes and AWS
- 5+ years of software development experience in Python, Java, Go, etc., with strong fundamentals in data structures and algorithms
- Curiosity and proactiveness in identifying performance bottlenecks and scalability issues
- Familiarity with observability tools and data gathering
- Knowledge of databases such as RDS, NoSQL, distributed TiDB, etc.
- Excellent communication skills, collaborative attitude, and proactive approach
- Strong coding skills and problem-solving abilities
- Experience with container image management, distributed system architecture, and capacity planning
- Understanding of IaC, automation tools, and cloud technologies
- Background in SRE/DevOps concepts and implementation
- Proficiency in managing monitoring tools and reporting systems
- Knowledge of web technologies and disaster recovery strategies
- Language skills in Japanese and English are a plus

(Note: Additional details of the company were not provided in the job description.)
