3.0 - 7.0 years
0 Lacs
Pune, Maharashtra
On-site
Sarvaha is seeking a skilled Observability Engineer with at least 3 years of experience to assist in the design, deployment, and scaling of monitoring and logging infrastructure on Kubernetes. In this role, you will be instrumental in establishing end-to-end visibility in cloud environments by managing petabyte-scale data, helping teams improve reliability, detect anomalies early, and promote operational excellence. You will be responsible for configuring and overseeing observability agents on AWS, Azure & GCP, using Infrastructure as Code (IaC) techniques like Terraform, Helm & GitOps to automate the deployment of the observability stack. Additionally, you should have experience working with various language stacks such as Java, Ruby, Python, and Go, instrumenting services using OpenTelemetry, integrating telemetry pipelines, optimizing telemetry metrics storage with time-series databases like Mimir & NoSQL DBs, creating dashboards, setting up alerts, and tracking SLIs/SLOs. Your role will also involve enabling Root Cause Analysis (RCA) and incident response using observability data, as well as securing the observability pipeline.

The ideal candidate will possess a BE/BTech/MTech (CS/IT or MCA) degree with a focus on Software Engineering; strong skills in interpreting logs, metrics, and traces; proficiency in tools like LGTM (Loki, Grafana, Tempo, Mimir), Jaeger, Datadog, Zipkin, and InfluxDB; familiarity with log frameworks such as log4j, lograge, Zerolog, and loguru; knowledge of OpenTelemetry, IaC, and security best practices; the ability to document observability processes, logging standards & instrumentation guidelines; proactive issue identification and resolution using observability data; and a commitment to maintaining data quality and integrity throughout the observability pipeline.

At Sarvaha, you can expect top-notch remuneration, excellent growth prospects, a supportive work environment with talented individuals, challenging software implementation and deployment tasks, and the flexibility of a hybrid work mode. Sarvaha offered complete work-from-home options even prior to the pandemic.
Posted 1 day ago
7.0 - 11.0 years
0 Lacs
Haryana
On-site
Site Reliability is a unique blend of development and operations expertise aimed at enhancing organizational efficiency. Whether you come from a development background and want to delve deeper into operations, or are a DevOps/Systems Engineer keen on crafting internal tools, your skill set can greatly benefit Cvent SRE. We are on the lookout for individuals who exhibit a fervent passion for continuous learning and technology. A Bachelor's or Master's degree in Computer Science or a related technical field is a prerequisite for this role. As part of our team, you will play a crucial role in ensuring the stability and robustness of our platform. We strive to eliminate barriers by promoting developer accountability and enabling developer autonomy. By devising innovative and durable solutions to operational challenges, we extend our unwavering support to developers. Leveraging our expertise as generalists, we collaborate closely with product development teams, from the initial design phase through to identifying and rectifying production issues. Our holistic approach involves establishing and upholding standards while fostering an agile and knowledge-sharing culture. Embracing SRE principles like blameless postmortems and operational load caps, we are constantly improving our competencies and our quality of work life. Our team is deeply passionate about automation, continuous learning, and engaging in dynamic day-to-day operations.
**Must Have:**
- 7-9 years of relevant experience
- Proficiency in SDLC methodologies, especially Agile software development
- Strong background in software development with a solid knowledge of Java/Python/Ruby and Object-Oriented Programming concepts
- Hands-on experience in managing AWS services and operational expertise in handling applications within AWS
- Proficiency in configuration management tools like Chef, Puppet, Ansible, or equivalent
- Sound Windows and Linux administration skills
- Familiarity with APM, monitoring, and logging tools such as New Relic, DataDog, Splunk
- Expertise in managing 3-tier application stacks and incident response
- Experience with build tools like Jenkins, CircleCI, Harness, etc.
- Exposure to containerization concepts like Docker, ECS, EKS, Kubernetes
- Working knowledge of NoSQL databases such as MongoDB, Couchbase, Postgres, etc.
- Self-motivation and the ability to work independently are essential.

**Good to Have:**
- Understanding of F5 load balancing concepts
- Basic knowledge of observability, SLIs/SLOs
- Familiarity with Message Queues like RabbitMQ
- Knowledge of basic networking concepts
- Experience with package managers such as Nexus, Artifactory, or equivalent
- Strong communication skills
- Previous experience in people management
Posted 1 day ago
6.0 - 10.0 years
0 Lacs
Ahmedabad, Gujarat
On-site
As a Senior DevOps Site Reliability Engineer (SRE) at TechBlocks, you will be responsible for ensuring platform reliability, incident management, and performance optimization. Your role will involve defining Service Level Indicators (SLIs) and Service Level Objectives (SLOs), contributing to robust observability practices, and driving proactive reliability engineering across services. With 6-10 years of experience in SRE or infrastructure engineering in cloud-native environments, you will play a key role in driving operational excellence and platform resilience. Your responsibilities will include leading incident management for critical production issues, creating and maintaining runbooks for high-availability services, and designing observability frameworks using tools like ELK, Prometheus, and Grafana. You will collaborate with DevOps and Infrastructure teams to build highly available and scalable systems, analyze performance metrics, and participate in capacity planning and resilience architecture reviews. Additionally, you will use GCP tools such as GKE, Cloud Monitoring, and Cloud Armor to improve performance and security posture. At TechBlocks, a global digital product engineering company, we value collaboration, creativity, and continuous learning. Join our dynamic team to shape the future of digital transformation and work on innovative projects that drive growth.
Posted 2 days ago
5.0 - 9.0 years
0 Lacs
Hyderabad, Telangana
On-site
The SRE team at Freshworks comprises expert Software and System engineers who are responsible for ensuring the Availability, Scalability, and Performance of the SaaS products. They design tools and frameworks for monitoring, load testing, and occasionally develop complete platform features used by other products. The team conducts architecture reviews and assists individual product teams in identifying performance bottlenecks. The approach taken by the team is bottom-up, focusing on viewing the application from a system perspective. Engineers within the SRE team have the autonomy to select the challenges they wish to tackle and take ownership of tasks until completion. Their responsibilities include designing, coding, and delivering software to enhance the availability, latency, and efficiency of Freshworks Products & Platforms. They also manage the availability, latency, and performance of critical services, implementing automation to prevent recurring issues. Furthermore, the team independently devises and implements architectural strategies and infrastructure solutions. They are tasked with defining strategies, vision, and roadmaps for developing CI/CD, Application hosting, Security, and Compliance standards across Freshworks. The team also conducts blameless postmortems for large-scale incidents, drives automation and orchestration strategies, and formulates cost optimization plans for the Freshworks Cloud environment. 
Qualifications:
- 4-7 years of experience in Site Reliability Engineering, DevOps, or Software Engineering roles
- Proficient in programming/scripting languages like Python or Go
- Hands-on experience with cloud platforms such as AWS, GCP, or Azure
- Deep understanding of SRE principles, including SLIs/SLOs, reliability metrics, and incident response
- Familiarity with monitoring and observability tools like Prometheus, Grafana, Datadog, ELK, OpenTelemetry
- Solid experience with infrastructure automation tools such as Terraform, Ansible, or Pulumi
- Strong knowledge of Linux systems, networking, and containerization (Docker, Kubernetes)
- Experience with version control systems and with CI/CD pipelines using tools like GitHub Actions, Jenkins, GitLab CI/CD
- Strong analytical and problem-solving skills with a proactive and ownership-driven mindset

Freshworks offers a dynamic work environment where you can leverage your expertise in Site Reliability Engineering to make a real impact. Join us in building a fresh vision of how the world works.
Posted 2 days ago