Posted: 2 weeks ago
On-site
Full Time
We’re looking for a hands-on, self-directed Senior DevOps Engineer to join our fast-paced startup. You’ll be the first line of defense for production issues, architect robust observability systems, and improve deployment and testing practices. If you thrive in startup environments, enjoy taking ownership, and are comfortable in modern JS/TS stacks, we’d love to meet you.

Top Outcomes – First 3 Months
Implement a reliable observability stack: Leverage Grafana, CloudWatch, and OpenTelemetry within our Node.js and TypeScript codebase.
Be on top of alerts and issues: Monitor, triage, fix or escalate production issues with traceability and follow-up.
Reduce system noise: Begin reducing the frequency and volume of unexpected errors.

Top Outcomes – First 12 Months
Improve test coverage: Ensure better code quality and proactively catch regressions.
Own DevOps workflows: Deploy, debug, and maintain infrastructure health autonomously.
Become a core team member: Handle incidents independently and support the evolution of our infra/dev culture.
Key Performance Indicators (KPIs)

Leading Indicators:
Number of alerts and incidents triaged
Trace IDs investigated and logged
Bugs found early and resolved
Tickets opened/closed efficiently
Reduced volume of unhandled or duplicate errors

Lagging Indicators:
Production uptime and stability
% of fixes resolved without handoff
Number of tests added
Reduction in recurring or duplicate issues

Core Responsibilities

Observability & Alerting:
Maintain and enhance Grafana dashboards
Integrate and manage CloudWatch alarms and OpenTelemetry traces
Ensure traceability across all systems (CRM, APIs, webhooks, workflows)

Issue Response & Triage:
Act as first responder for production issues during working hours
Troubleshoot, escalate with full context, and coordinate incident response

Infrastructure Maintenance:
Improve deployment workflows and monitor resource usage
Maintain the health of critical subsystems (queues, sync jobs, memory/CPU)

Testing & QA:
Add and improve test coverage once baseline reliability is achieved
Build confidence in deployments through automated testing and regression checks

Candidate Profile
Strong experience with Node.js, TypeScript, and React
Deep knowledge of AWS, Grafana, OpenTelemetry, and CloudWatch
Prior startup experience preferred
Clear, proactive communicator with a bias toward ownership
Available 1:30 AM to 10:30 PM IST 5 days/week for on-call responsibilities
Bonus: Experience reviewing pull requests and deploying code regularly

Immediate Tasks
Review and phase-implement an internal RFC for observability
Refine and own Grafana dashboards; implement meaningful alerts
Ensure consistent trace ID usage throughout the codebase
Improve logging and tracing to increase debuggability
Monitor and respond to production errors daily
Investigate, fix, or escalate recurring system issues
WeAssemble
Mumbai Metropolitan Region
Experience: Not specified
Salary: Not disclosed