Site Reliability AI Engineer

7 years


Posted: 2 months ago | Platform: LinkedIn

Work Mode

Remote

Job Type

Full Time

Job Description

Observability/AIOps (5 to 8 years of experience).

As an SRE with an observability focus, you will:

● Explore the complex IT estates of our clients to understand their observability/AIOps opportunities and identify areas to improve

● Collaborate to architect unified observability and AIOps strategies that employ leading AI technologies

● Implement enterprise observability/AIOps technology and processes

● Amplify observability/AIOps outcomes by accelerating adoption across technology and business organizations

Responsibilities include:

● Architecting observability solutions that close the identified gaps and reduce MTTD and MTTR in line with organizational objectives

● Developing API-driven microservices that combine into large and complex platforms

● Planning and executing highly parallel distributed object storage transformations and migrations

● Maintaining automated test suites using CI/CD tools

● Participating in collaborative projects with small software engineering teams

● Developing automation, processes, and tools that make our services simpler and more robust

● Participating in troubleshooting, capacity planning and analysis, and performance analysis

● Advising management on service onboarding strategy and execution

Requirements include:

● Experience architecting complex IT solutions

● Understanding of the observability dimensions (metrics, logs, traces)

● Excellent communication and stakeholder management skills

● Development experience and comfort working in multiple languages (Python, Java, Go, and Ruby a plus)

● Experience working in collaborative coding environments (peer review, continuous integration, etc.)

● 7+ years of application development experience

● Experience working in distributed remote teams across multiple time zones

● Experience in large-scale operations environments

● 7+ years of experience with Linux/Unix development or systems administration

● 3+ years of experience with networking systems and technologies

● Deep understanding of network performance and security

● Ability to identify tasks that require automation and to implement that automation

● Experience with configuration management tools such as Puppet, Chef, or SaltStack

● Hands-on operational experience in a high-volume or critical production service environment: distributed systems, capacity planning, continuous deployment

● Bachelor's degree

You will have opportunities to work with and learn:

● Object Storage - MinIO/S3/etc.

● Data Collection - OpenTelemetry/Grafana Alloy/etc

● Message Bus - Kafka/NSQ/etc

● Scaling Databases - Druid/ClickHouse/Cassandra/etc.

● Relational database technologies at large scale - Timescale/Vitess/Postgres/etc

● Scheduling & Orchestration - Kubernetes/OpenShift/Docker

● Cloud Platforms - AWS/Azure


Immediate joiners or candidates currently serving their notice period preferred.
