Sr. Staff Site Reliability Engineer

10 years

Posted: 4 days ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

At SolarWinds, we’re a people-first company. Our purpose is to enrich the lives of the people we serve—including our employees, customers, shareholders, partners, and communities. Join us in our mission to help customers accelerate business transformation with simple, powerful, and secure solutions.

The ideal candidate thrives in an innovative, fast-paced environment and is collaborative, accountable, ready, and empathetic. We’re looking for individuals who believe they can accomplish more as a team and create lasting growth for themselves and others. We hire based on attitude, competency, and commitment. Solarians are ready to advance our world-class solutions in a fast-paced environment and accept the challenge to lead with purpose. If you’re looking to build your career with an exceptional team, you’ve come to the right place. Join SolarWinds and grow with us!

Your Role

Sr. Staff Site Reliability Engineer

You will lead reliability strategy, architecture, and execution across distributed systems, helping shape how SolarWinds ingests, stores, queries, and scales massive observability datasets. This includes owning ClickHouse production clusters, designing performance-optimized schemas, ensuring high availability, and driving automation around data platform operations.

Responsibilities

  • Own and operate ClickHouse infrastructure, including cluster provisioning, sharding, replication, performance tuning, storage optimization, and backup/restore automation.
  • Collaborate with engineering teams to shape data ingestion, storage, and query performance requirements for high-volume observability workloads.
  • Lead SRE initiatives around infrastructure reliability, SLAs/SLOs, observability, incident management, and post-incident learning.
  • Architect and implement scalable, cloud-native infrastructure using Kubernetes, Terraform, GitOps, and modern SRE practices.
  • Drive automation across provisioning, deployments, monitoring, and operational workflows.
  • Lead and mentor SRE team members, providing direction across distributed systems, reliability engineering, and data-platform operations.
  • Guide incident response for production issues, participate in on-call rotations, facilitate postmortems, and champion a culture of continuous improvement.
  • Establish and enforce best practices across monitoring, telemetry, capacity planning, security, and operational excellence.

Ideal Attributes

  • Deep customer orientation with a strong ownership mindset.
  • Experience influencing architecture and long-term technical direction.
  • Exceptional communication skills, with the ability to explain complex technical topics to cross-functional stakeholders.
  • Bias for action, data-driven decision making, and problem-solving under pressure.
  • Collaborative, empathetic, and committed to the growth of the team.

Qualifications

Required:

  • Expert-level experience operating ClickHouse at scale, including performance tuning, schema design, cluster operations, replication, partitioning, RBAC, and storage optimization.
  • 10+ years designing, building, and maintaining large-scale SaaS infrastructure.
  • 8+ years of hands-on experience with AWS and/or Azure using Terraform.
  • 5+ years deploying, scaling, and operating Kubernetes clusters in production.
  • Strong experience with data platform infrastructure and distributed systems.
  • Proficiency in Python or Go; solid skills in shell scripting and SQL.
  • Strong background in observability (metrics, logs, tracing), system health, and proactive monitoring.
  • Experience with SQL/NoSQL database technologies.
  • Experience with GitOps (Flux/ArgoCD), CI/CD, and automated deployment workflows.
  • Understanding of security operations, encryption, key management, and cloud security principles.
  • Demonstrated experience mentoring engineers and driving team-wide engineering excellence.

Nice to Have:

  • Experience with large-scale observability platforms, monitoring pipelines, or log analytics systems.
  • Experience with ClickHouse Keeper, tiered storage, or multi-cluster architectures.

SolarWinds is an Equal Employment Opportunity Employer. SolarWinds will consider all qualified applicants for employment without regard to race, color, religion, sex, age, national origin, sexual orientation, gender identity, marital status, disability, veteran status or any other characteristic protected by law.

All applications are treated in accordance with the SolarWinds Privacy Notice: https://www.solarwinds.com/applicant-privacy-notice
