SRE Observability Specialist

2 years

0 Lacs

Posted: 4 days ago | Platform: LinkedIn

Work Mode

Remote

Job Type

Full Time

Job Description

About CloudRaft

CloudRaft is a premier consulting company specializing in AI, cloud-native solutions, observability, and platform engineering. We partner with global startups and enterprises to solve their complex problems. Learn more about us at www.cloudraft.io.

The Opportunity

We're seeking a passionate SRE with expertise in observability to join our dynamic team. This is your chance to work as a founding engineer and be part of building a rocket ship! The ideal candidate has 2+ years of experience managing production systems and building at large scale, with hands-on experience in full-stack observability products, particularly open source stacks. CloudRaft is a commercial support partner for various open source observability projects, which gives you an opportunity to contribute to upstream projects such as Prometheus, Thanos, Grafana Mimir, Loki, and OpenTelemetry.

Location:

Remote

What We're Looking For

  • Observability Expertise:
    • Expert in implementing and integrating observability solutions in services using products like Vector, Fluentd, OpenTelemetry, Prometheus, Thanos, Jaeger, Elasticsearch, and Grafana
    • Deep understanding of service reliability, KPIs, and metrics
    • Good understanding of the observability ecosystem for designing large-scale systems using open source products such as the LGTM stack, VictoriaMetrics, Thanos, ClickHouse, or SigNoz
  • Kubernetes and Cloud Experience:
    • Professional experience running Kubernetes in on-premises and cloud environments
    • Hands-on production experience in designing and managing Kubernetes clusters
    • Good understanding of on-premises and cloud platforms
  • Programming experience:
    • You should have programming knowledge in languages like Java, Golang, or TypeScript
    • Able to write libraries and contribute to open source
    • Ready to start immediately and make an impact from day one
    • At least 2 years of experience in SRE, particularly in implementing observability at scale
    • Proficiency in programming
    • Strong troubleshooting skills for resolving system issues in production environments
    • Implementation experience with SRE concepts such as SLIs and SLOs
    • Contributions to open source observability products
    • Ability to represent the organization, collaborate with, and coach customer teams
    • Passion for sharing knowledge through technical writing and speaking at community events and conferences

Qualifications

  • Bachelor's degree in Computer Science, IT, or a related field
  • Expert understanding of Prometheus, OpenTelemetry, Datadog, Grafana, alerting, and incident management systems
  • Programming skills in any modern programming language
  • Experience with Infrastructure as Code
  • Excellent problem-solving and communication skills
  • Product mindset and customer empathy are a big plus

Benefits

  • Competitive salary
  • Premium health insurance and various health & wellness benefits
  • Opportunity to work on cutting-edge technologies
  • Collaborative and supportive work environment
  • Chance to make a real impact on the company's success

If you're ready to take on this exciting challenge and help shape the future of observability and open source, we want to hear from you!
