
Site Reliability Engineer

Futurestep Recruitment Services

2 - 7 years

6 - 16 Lacs

Bengaluru

Posted: 2 months ago


Skills Required

Loki, site reliability engineering, observability, Grafana, Azure, Mimir, Tempo, GCP, cloud, AWS

Work Mode

Hybrid

Job Type

Full Time

Job Description

Site Reliability Engineer (SRE)



What you will be doing:

This role is an individual contributor position responsible for building and fine-tuning the platform components of the Observability product. The candidate will work closely with the lead engineer and with the performance, data ingestion, platform DevOps and data visualization teams under the Observability product. As a member of the platform team, the candidate must be able to support and maintain the applications onboarded to Grafana Observability, including ingestion and visualization built on PromQL, log queries and similar query languages, as well as the related monitoring technologies.

This position will preferably be based out of India GCC, Bangalore.

Key Responsibilities:

Lead technical support for applications and programs currently in production.
Analyze complex problems and determine solutions that can be implemented permanently in production.
Prepare for production releases by ensuring that appropriate alerts, dashboards, KB articles, Confluence pages and knowledge sharing are in place.
Ensure dashboards are monitored daily to detect anomalies, and that corrections are shared with the appropriate teams and team members.
Check that alerts are being responded to appropriately.
Ensure improvement agendas for services are maintained and acted on with Development Engineering and DevOps Engineering partners.
Experience in observability and monitoring initiatives as a platform engineer.
Troubleshoot platform issues and restore service by resolving customer-facing incidents.
Develop and implement build release pipelines, with accountability for managing deployment schedules, issues, risks, and impediments.
Agile development experience with team member accountability for commitment and delivery each sprint.
Troubleshoot and correct connectivity problems between the supported applications and the clients they serve.
Provide technical guidance in diagnosing issues as they arise in support of critical applications.
Drive collaboration sessions among IT and business groups to facilitate optimal support and operation of the relevant applications.
Apply Site Reliability Engineering techniques such as observability, alerting and performance tuning.
Contribute to the design, implementation, and enhancement of critical applications.
Perform proactive analysis and troubleshooting to predict and prevent production incidents.
Define and contribute to monitoring capabilities for critical applications.
Collaborate with key vendors on functional, performance and capacity improvements.
Design and build tools to automate support and monitoring functions.
Ensure that all implementations of observability meet the requirements prescribed by IT Services through the effective implementation or use of approved processes, methodologies, and deliverables.
Provide expertise and build solutions for observability applications as well as system integration with internal systems and external vendors.
Provide coding and technical direction to less experienced staff, or develop highly complex original code.
Track infrastructure delivery and dependencies to implementation.

We are searching for someone with the following skills:

Experience with gathering and organizing large volumes of data for instrumentation in an enterprise observability solution.
Experience with recommending baseline monitoring thresholds, and performance monitoring KPIs and SLAs.
Experience with installing agents, forwarders, APIs, performance monitoring alerts, dashboards, and data trend analysis.
Good knowledge and understanding of Azure foundation components (e.g. App GW, APIM, Virtual Network, NSG, Load Balancer, Azure VM) is required.
Team-oriented, positively contributing to team morale and willing to help.
Learning-focused, finding ways to improve in their field and using constructive feedback to grow personally and professionally.
Think strategically and proactively anticipate future problems, needs or changes in the work.
Experience with databases such as Azure SQL, PostgreSQL, MySQL, MongoDB, TSDB or similar.
Hands-on Azure/GCP experience, including the details of pulling observability data from managed services.
Golang/Python coding or a solutioning background, with experience in SRE development and OpenTelemetry implementation.
Deploying, managing and optimizing an enterprise-level observability platform built on Grafana OSS products such as Mimir, Loki and Tempo, together with Fluent Bit/Vector.
Design and develop standard Grafana dashboards for critical metrics of various Azure/GCP services using the observability data.
Programming experience in Java is required; Python, Go or Node.js is desired.
Knowledge of monitoring tools such as Log Analytics, AppDynamics, Grafana, Prometheus, Splunk, and SiteScope.
Experience in working with ServiceNow or similar service management tools.
Familiarity with cloud technologies in Azure, AWS, and Google Cloud.
Experience with the PCF, Docker and Kubernetes platforms is required.
Experience with DevOps and CI/CD tools and processes is required.
Experience in high-performance and high-frequency data streaming and health confirmation techniques (using Kafka etc.) and in handling large volumes of batch data is strongly preferred.
In-depth advanced knowledge of current monitoring tools.
In-depth advanced knowledge of at least one major cloud platform and Service Container/Instance concepts.
In-depth advanced knowledge of querying and inspection techniques for service and other types of logs.
In-depth advanced knowledge of the full software development lifecycle and software development methodologies (Agile).
Strong ability to understand client expectations and to resolve issues that may affect service.
Strong ability to mentor, coach and train other application support engineers.
Self-starter, with a demonstrated ability to learn beyond formal training with a strong aptitude for delivering quality products.

We believe the successful candidate has these qualifications and experience:

4-year degree (Computer Science, Information Systems, or related functional field) and/or equivalent combination of education or work experience.
2-9 years of experience in integration engineering related to Observability/Monitoring frameworks with open-source technologies such as Grafana, Mimir, Loki, Tempo, Fluent Bit, Vector etc.
Hands-on experience with Tools and Technology is preferred.
2-9 years of experience as a Site Reliability Engineer is required.
Experience working with open-source platforms (e.g. Grafana) and OpenTelemetry libraries is preferred.


