Java Specialist

5 - 10 years

0 Lacs

Posted: 2 months ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Contractual

Job Description

Senior SRE (Engineering & Reliability)

Job Summary:

As a Senior SRE, you will play a pivotal role in establishing and implementing SRE practices, leading a team of engineers, and driving automation, monitoring, and incident-response strategies. The position combines software engineering and systems engineering expertise to build and maintain high-performing, reliable systems.

Experience: 5-10 years

Key Responsibilities:

Reliability & Performance:

• Lead efforts to maintain high availability and reliability of critical services.

• Define and monitor SLIs, SLOs, and SLAs to ensure business requirements are met.

Incident Management & Response:

• Establish and improve incident management processes and on-call rotations.

• Lead incident response and root cause analysis for high-priority outages.

• Drive post-incident reviews and ensure the resulting action items are implemented.

Automation & Tooling:

• Develop and implement automated solutions to reduce manual operational tasks.

• Build and maintain monitoring and observability tooling (e.g., Prometheus, Grafana, Elastic APM).

• Optimize CI/CD pipelines for seamless deployments.

Collaboration:

• Partner with software engineering teams to improve the reliability of applications and infrastructure.

• Work closely with product and engineering teams to design scalable, robust systems.

Team Building:

• Manage, mentor, and grow a team of SREs.

• Promote SRE best practices and foster a culture of reliability and performance across the organization.

Capacity Planning & Cost Optimization:

• Perform capacity planning and implement autoscaling solutions to handle traffic spikes.

• Optimize infrastructure and cloud costs while maintaining reliability and performance.

Skills & Qualifications:

Required Skills:

• Technical Expertise:

o Experience with cloud platforms (AWS / Azure / GCP) and Kubernetes.

o Hands-on knowledge of infrastructure-as-code and configuration tools such as Terraform, Helm, or Ansible.

o Proficiency in Java.

o Expertise in distributed systems, databases, and load balancing.

• Monitoring & Observability:

o Proficiency with tools such as Prometheus, Grafana, Elastic APM, or New Relic.

o Understanding of metrics-driven approaches to system monitoring and alerting.

• Automation & CI/CD:

o Hands-on experience with CI/CD pipelines (e.g., Jenkins, Azure Pipelines).

o Skilled in automation frameworks and tools for infrastructure and application deployments.

• Incident Management:

o Proven track record in handling incidents, post-mortems, and implementing solutions to prevent recurrence.

Leadership & Communication Skills:

• Strong people management and leadership skills with the ability to inspire and motivate teams.

• Excellent problem-solving and decision-making skills.

• Clear and concise communication, with the ability to translate technical concepts for non-technical stakeholders.

Preferred Qualifications:

• Experience with database optimization and with Kafka or other messaging systems.

• Knowledge of autoscaling techniques.

• Previous experience in an SRE, DevOps, or infrastructure engineering leadership role.

• Understanding of compliance and security best practices in distributed systems.

Resource Algorithm

Technology / Analytics

San Francisco
