Site Reliability Engineer (SRE)

8.0 - 10.0 years

18 - 25 Lacs

Bengaluru

Posted: 1 day ago | Platform: Naukri

Skills Required

Java, Docker, Redis, MongoDB, Kubernetes, PostgreSQL

Work Mode

Remote

Job Type

Full Time

Job Description

Support role. Immediate joiners only.

We're looking for a Site Reliability Engineer (SRE) to join our growing team. In this role, you'll be responsible for ensuring the reliability, availability, and performance of our systems and services. You'll bridge the gap between development and operations, with a strong focus on technical support, automation, monitoring, and incident response. In short: you will keep systems healthy, respond fast when they're not, fix problems at the root, prevent future issues, and communicate clearly.

Responsibilities:

Monitoring and Alerting
  • Maintain and improve system monitoring tools (Grafana, New Relic).
  • Set up smart, actionable alerts to detect outages or performance issues early.
  • Monitor live systems for signs of security breaches or vulnerabilities.

Incident Response
  • Be on call to respond to live incidents.
  • Quickly triage and mitigate outages or system degradation.
  • Communicate status updates clearly to internal teams.

Troubleshooting and Root Cause Analysis
  • Debug live systems under pressure.
  • Collect logs, metrics, and traces to understand issues.
  • Lead or contribute to postmortem analysis and documentation after incidents.

Capacity Planning and Performance Management
  • Monitor and predict system capacity and scaling needs.
  • Ensure that resources are properly allocated and scaled up when necessary.

Maintaining Operational Runbooks
  • Keep detailed, up-to-date playbooks and runbooks for common incidents and tasks.

Cloud & Infrastructure
  • Manage cloud infrastructure (AWS).
  • Manage environment configurations for development, staging, and production.

CI/CD Pipelines
  • Design, implement, and maintain robust CI/CD pipelines to automate the build, test, and deployment processes.

Release & Operations
  • Coordinate with the development team on production releases, patches, and live updates.
  • Work closely with development teams to understand application architecture and deployment needs.

Qualifications:

  • Proven experience with Linux and cloud computing technologies, preferably AWS.
  • Proficiency in at least one programming or scripting language (Java, Python, Bash).
  • Understanding of containerization and orchestration (Docker, Kubernetes, Terraform).
  • Familiarity with networking fundamentals (TCP/IP, DNS, load balancing, firewalls).
  • Experience with database administration and queries, both NoSQL and SQL (Redis, PostgreSQL, MongoDB).
  • Experience with observability tools (Grafana, New Relic).
  • Skill in infrastructure as code (Terraform, CloudFormation, Ansible).
  • Experience in a continuous integration / continuous delivery environment.
  • Experience with HTTP-based services and networking concepts (e.g., TCP/IP, DNS).
  • Strong problem-solving, troubleshooting, debugging, and communication skills.
  • Collaboration mindset: work closely with developers, product managers, and support teams.
  • Attention to detail and an ownership mentality.

Apex Systems

Information Technology and Services

Atlanta

1000-5000 Employees

16 Jobs

Key People

  • Mike McCauley, CEO
  • Alice McGowan, CFO
