Posted: 1 day ago
Platform: Remote
Full Time
Support role. Immediate joiners only.

Site Reliability Engineer (SRE)

We're looking for a Site Reliability Engineer (SRE) to join our growing team. In this role, you'll be responsible for ensuring the reliability, availability, and performance of our systems and services. You'll bridge the gap between development and operations, with a strong focus on technical support, automation, monitoring, and incident response. In short: you will keep systems healthy, respond quickly when they're not, fix problems at the root, prevent future issues, and communicate clearly.

Responsibilities:

Monitoring and Alerting: Maintain and improve system monitoring tools (Grafana, New Relic). Set up smart, actionable alerts that detect outages and performance issues early. Monitor live systems for signs of security breaches or vulnerabilities.

Incident Response: Be on call to respond to live incidents. Quickly triage and mitigate outages or system degradation. Communicate status updates clearly to internal teams.

Troubleshooting and Root Cause Analysis: Debug live systems under pressure. Collect logs, metrics, and traces to understand issues. Lead or contribute to postmortem analysis and documentation after incidents.

Capacity Planning and Performance Management: Monitor and forecast system capacity and scaling needs. Ensure that resources are properly allocated and scaled up when necessary.

Operational Runbooks: Keep detailed, up-to-date playbooks and runbooks for common incidents and tasks.

Cloud & Infrastructure: Manage cloud infrastructure (AWS). Manage environment configurations for development, staging, and production.

CI/CD Pipelines: Design, implement, and maintain robust CI/CD pipelines that automate the build, test, and deployment processes.

Release & Operations: Coordinate with the development team on production releases, patches, and live updates. Work closely with development teams to understand application architecture and deployment needs.
Qualifications:
Proven experience with Linux and cloud computing technologies, preferably AWS.
Proficiency in at least one programming or scripting language (Java, Python, Bash).
Understanding of containerization and orchestration (Docker, Kubernetes).
Familiarity with networking fundamentals (TCP/IP, DNS, load balancing, firewalls).
Experience with database administration and queries across SQL and NoSQL databases (PostgreSQL, Redis, MongoDB).
Experience with observability tools (Grafana, New Relic).
Skill in infrastructure as code (Terraform, CloudFormation, Ansible).
Experience working in a continuous integration / continuous delivery (CI/CD) environment.
Experience with HTTP-based services.
Strong problem-solving, troubleshooting, debugging, and communication skills.
Collaboration mindset: you will work closely with developers, product managers, and support teams.
Attention to detail and an ownership mentality.
Apex Systems