Site Reliability Engineer

9 years

0 Lacs

Posted: 19 hours ago | Platform: LinkedIn

Work Mode

Remote

Job Type

Full Time

Job Description

About The Job

Devo, the cloud-native logging and security analytics company, empowers security and operations teams to maximize the value of all their data. Only the Devo platform delivers the powerful combination of real-time visibility, high-performance analytics, scalability, multitenancy, and low TCO crucial for monitoring and securing business operations as enterprises accelerate their shift to the cloud.

Headquartered in Cambridge, Mass., Devo is backed by Insight Partners, Georgian, and Bessemer Venture Partners. Learn more at www.devo.com.

The Devo security products team is developing the world’s first and only Autonomous SOC to revolutionize the security industry. Candidates will be working on the most cutting-edge security technology when it comes to autonomous and automated threat detections, behavioral analysis, investigations, and hunts.

We are looking for an accomplished Site Reliability Engineer to champion the observability and monitoring systems of our AI-integrated ASOC platform and its associated products. An ideal candidate should possess a solid understanding of SDLC and agile processes, be proficient in automated testing, and demonstrate extensive prior experience in the software testing field.

What You’ll Be Doing
  • Lead the design and implementation of observability and monitoring systems for Devo’s data analytics, SIEM, and AI platforms, ensuring high reliability, availability, and performance through tools like Prometheus and Grafana.
  • Develop and maintain comprehensive automation for operational tasks (e.g., using Kubernetes, Python, Terraform, or ArgoCD Workflows) to resolve common incidents, streamline deployments, and enhance the scalability of data analytics infrastructure.
  • Collaborate with cross-functional teams to define and prioritize Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs) aligned with Devo’s service expectations.
  • Champion proactive incident management by creating automated alerts for Sev1/Sev2 incidents, minimizing customer impact, and reducing Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR).
  • Conduct thorough postmortem analyses and root cause analyses (RCAs) to identify recurring issues, document findings, and implement solutions to prevent future incidents, fostering continuous improvement.
  • Drive secure and gradual deployment practices for data analytics features using tools like Argo Workflows, ensuring robust testing (beyond UAT) and fail-fast reporting to maintain service reliability.
  • Perform capacity planning and performance tuning to support the scalability of Devo’s data analytics infrastructure, aligning with quarterly business reviews (QBRs) to ensure predictability and accountability.
  • Promote a collaborative culture between engineering, Observability, MonOps, and CloudOps teams, ensuring reliability is a shared responsibility across Devo’s data analytics and security operations.
  • Contribute to transparent communication and reporting practices, keeping users informed about service updates, issues, and resolutions.
  • Proactively research, advocate, and implement cutting-edge technologies to enhance automation, observability, and reliability.
Candidates can expect the possibility of working night shifts, weekends, and public holidays.
What We Need To See
  • 9+ years of solid experience in Site Reliability Engineering or DevOps, with a focus on large-scale, data-intensive systems.
  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • Proven expertise in designing and managing observability and monitoring systems (e.g., Prometheus, Grafana, Kubernetes) for complex, distributed environments.
  • Working knowledge of any cloud service provider (preferably AWS).
  • Extensive experience working with remote and cross-functional teams in an agile, data-driven environment.
  • Solid understanding of the Software Development Life Cycle (SDLC) and agile methodologies, with experience in defining SLOs, SLIs, and SLAs.
  • Experience working with Docker and Kubernetes.
  • Proficient in Python, Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible), and CI/CD pipelines (GitLab CI, ArgoCD Workflows).
  • Strong automation skills, with experience scripting solutions to resolve recurring incidents and operational tasks.
  • A self-starter with a “get things done” mindset, capable of working autonomously with minimal supervision.
  • Expertise in cybersecurity or a data lake platform is a strong plus.
  • Exceptional analytical and problem-solving skills, with experience debugging complex issues (e.g., dependency conflicts, performance bottlenecks) in production environments.
  • Passion for building reliable, scalable systems with a proactive “prevent failure” mentality.
  • Experience with performance monitoring and capacity planning tools (e.g., Locust, Prometheus) to support scalable infrastructure.
  • Familiarity with secure deployment practices and automated testing frameworks for fail-fast validation.
Why work at Devo?
  • Focus on Security and Data Analytics: If you’re passionate about security operations, data analytics, or enterprise-level IT infrastructure, we will offer you a chance to be part of a platform that helps organizations monitor and secure their systems in an increasingly digital world. You will have the opportunity to work with innovative products that solve real-world challenges.
  • Career growth: You’ll join a company where we value our people and provide the tremendous opportunities that come with a hyper-growth organization. To help you grow as a professional, our development programs include:
      • Company-paid job-related technical certifications.
      • Personal development plans based on career paths.
      • Full support for internal job movements as part of career development.
  • Work-Life Balance: We promote a healthy work-life balance with flexible working conditions, including remote work opportunities.
  • Multicultural environment: With offices and clients globally, we offer a chance to work in a multicultural environment, giving our employees international exposure and the opportunity to collaborate across regions.
Comprehensive benefits:
  • Medical health insurance: at Devo, we believe in taking care of not just our employees, but also their families.
  • Life insurance.
  • Meal vouchers.
  • Employee referral program: get a bonus for helping friends get jobs at Devo!
  • Employee Stock Option Plan.

Devo

Information Technology

Cambridge
