Site Reliability Engineer

PubMatic

3 - 6 years

3 - 6 Lacs

Pune, Maharashtra, India

Posted: 12 hours ago | Platform: Foundit

Skills Required

software programming

Work Mode

On-site

Job Type

Full Time

Job Description

As a Site Reliability Engineer (SRE), you will be responsible for ensuring the seamless operation and optimal performance of large-scale distributed software applications. Your role centers on maintaining a robust, high-performing environment, contributing to the reliability of our services, and developing solutions that support 24/7 availability. You will apply your technical expertise to maintain a seamless experience for our users while upholding high standards of operational excellence.

Responsibilities:

Monitoring and Alerting:

Review existing monitoring tools and set up new systems to track system performance and key metrics.

Incident Management:

Monitor alerts and logs to promptly identify incidents or anomalies.
Prioritize incidents based on severity and potential impact on stability and reliability.
Engage in incident resolution, applying necessary fixes and mitigations to restore normal operations.

On-Call Responsibilities:

Organize on-call schedules to ensure 24/7 coverage for incident response.
Respond to alerts, troubleshoot issues, and coordinate with NOC and Engineering teams for incident resolution.
Conduct post-incident reviews to identify root causes, learn from incidents, and implement preventive measures.

Automation and Tooling:

Review existing automation scripts and tools, and build new ones to streamline tasks, enhance efficiency, and reduce manual errors.
Regularly update and maintain monitoring, deployment, and incident management tools.

Performance Optimization:

Analyze application performance using profiling and monitoring tools to identify bottlenecks and areas for improvement.
Work on optimizations, infrastructure upgrades, and architectural improvements to enhance system performance and efficiency.

Capacity Planning and Scaling:

Monitor resource utilization and trends to predict capacity needs and plan for scaling.
Scale resources (servers and databases) based on usage patterns and anticipated growth.
Automate the sizing process to ensure efficiency.

Disaster Recovery and Redundancy:

Develop and maintain disaster recovery plans to ensure business continuity.
Implement redundancy and failover strategies to minimize downtime and maintain service availability during failures.

Knowledge Sharing and Documentation:

Create and maintain comprehensive documentation for configurations, procedures, incidents, and best practices.
Foster a culture of knowledge sharing within the team through regular sessions and training programs.

Feedback Loop and Continuous Improvement:

Collect feedback from incidents, post-mortems, and NOC/Dev team interactions to identify areas for improvement.
Iterate on processes, tools, and systems based on feedback to drive continuous improvement.

Collaboration and Communication:

Collaborate closely with Engineering and DC/NOC teams to align goals and priorities.
Ensure open communication within the team and with stakeholders, providing regular updates on incidents, progress, and initiatives.

Requirements:

Bachelor's degree in Computer Science or related disciplines.
3+ years of experience in software application/product support.
Proficiency in programming or scripting with Go, Shell, or Python.
Experience in technical engineering (preferred).
A proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
Strong knowledge of networking, databases (MySQL), and Linux system concepts, plus experience debugging and analyzing core dumps.
Hands-on experience with monitoring and observability tools like Grafana, Nagios, Influx, ELK, etc.
Familiarity with container tools like Docker and incident management systems like Zenduty.
Excellent communication and collaboration skills with the ability to work effectively across teams.
Self-motivated, with a positive approach to investigating and resolving incidents.

PubMatic

Ad Tech / Digital Marketing

New York