4 - 8 years
40 - 65 Lacs
Bengaluru, Bangalore Rural
Hybrid
Our mission is to create transformative, innovative, and personalized experiences for millions of customers all across the world. We want customers to have an amazing experience wherever and whenever they choose: mobile, web, and through partners and 3rd parties. About the team - Private cloud: The Private Cloud group operates, orchestrates, and optimizes managed cloud infrastructure. The Private Cloud capabilities are provided on platform instances that are privately owned and centrally managed. These platform instances, and the workloads running on them, are hosted both in datacenters (on-premises) and on public cloud infrastructure (AWS). The Private Cloud platform has three primary internal customer-facing verticals: virtualization, containerization, and serverless, corresponding to the three types of workloads it supports. At the highest level, the Private Cloud drives three primary business outcomes: Agility in provisioning and using cloud infrastructure. Efficiency in cost and utilization of cloud infrastructure, as well as toil reduction for developers and engineers. Trust in the safety, reliability, and performance of our cloud infrastructure. Key Job Responsibilities and Duties: The core premise for the SRE lies in treating operational issues as a software problem. We code our way out of problems where operations are concerned addressing availability, scalability, latency, and efficiency challenges within the vast infrastructure here. 
You will impact millions of people all over the globe with your creative solutions. You will work in one of the biggest e-commerce companies in the world. You will solve exciting problems at scale by writing and deploying code across tens of thousands of servers. You will have the opportunity to collaborate with many of the world's leading SREs. You will be free to launch your own ideas and solutions within our sophisticated production environment. Here are some of the tools and technologies we use to achieve this: Python, Go, Puppet, Kubernetes, Elasticsearch, Prometheus, HAProxy, Cassandra, Kafka, etc.
What you'll be doing: Design, develop and implement systems software that improves the stability, scalability, availability and latency of the products; Take ownership of one or more services and have the freedom to do what is best for our business and customers; Solve problems occurring with our highly available production systems and build solutions and automation to prevent them from happening again; Build effective monitoring to track the health of your system, and jump in to handle outages; Build and run capacity tests to handle the growth of your systems; Plan for reliability by designing systems to work across our multinational data centers; Develop tools to assist the product development teams with successfully deploying thousands of change sets every day; Share the on-call rotation and be an escalation contact for incidents (depending on level of role).
What you'll bring: Solid experience in at least one programming language; Experience with building, operating and maintaining scalable distributed systems, and with operations automation; Experience with Infrastructure as Code technologies; Knowledge of cloud computing fundamentals; Solid foundation in Linux administration and troubleshooting; Understanding of service level agreements and objectives; Additional experience in OpenStack, Kubernetes, networking, security or storage is desirable; Monitoring / observability technologies like Prometheus, Graphite, Grafana, Kibana, Elasticsearch are a plus; Good interpersonal skills; Proficient command of the English language, both written and spoken.
Posted 3 months ago
10 - 18 years
40 - 45 Lacs
Chennai, Bengaluru, Hyderabad
Work from Office
Required Qualifications:
Education & Experience: - Bachelor's degree in Computer Science, Information Systems, or a related field (Master’s preferred). - 8+ years of experience in IT operations, DevOps, or Cloud Architecture with a focus on observability and monitoring.
Technical Skills: - Deep expertise in Azure services (Azure Monitor, Log Analytics, Application Insights, Key Vault, etc.) and Kubernetes (AKS). - Proficiency in distributed tracing, logging, and metrics (e.g., OpenTelemetry, Prometheus, Grafana, ELK/Splunk). - Familiarity with middleware products such as IBM DataPower, ACE, MQ or similar integration products. - Familiarity with React JS, React Native, Node JS or other similar technologies. - Skilled in scripting/automation (PowerShell, Python, etc.). - Demonstrable experience implementing application-level instrumentation (transaction IDs, correlation IDs, structured logging) in microservices.
Banking & Compliance Knowledge: - Familiarity with financial regulations (FFIEC, PCI-DSS, SOC 2) and designing solutions that maintain compliance. - Experience with PII handling, encryption, and security best practices.
Preferred Qualifications: - Certifications: Microsoft Azure Solutions Architect Expert, Azure DevOps Engineer Expert, or related. - SRE background with experience establishing and running Site Reliability frameworks. - Familiarity with SaaS-based financial solutions and payment processing systems. - Experience with large-scale real-time data processing (Event Hubs, Kafka, etc.).
1. Observability Tools & Platforms: Experience with popular observability tools. Logging: ELK (Elasticsearch, Logstash, Kibana), Loki, Fluentd, Splunk, Azure Log Analytics. Metrics: Prometheus, Grafana, Datadog, New Relic, CloudWatch. Tracing: OpenTelemetry, Jaeger, Zipkin. APM (Application Performance Monitoring): Dynatrace, AppDynamics. Ability to architect and implement solutions using these tools at scale.
2. Distributed Systems & Microservices: Deep understanding of how to monitor complex, distributed systems (Kubernetes, service meshes like Istio/Linkerd). Experience with monitoring cloud-native architectures (AWS, Azure, GCP).
3. Data Pipeline Design: Building and optimizing telemetry pipelines (collection, storage, visualization). Handling large-scale data ingestion and processing for logs, metrics, and traces.
4. Automation & Infrastructure as Code (IaC): Proficiency in using Terraform, Ansible, or Helm to deploy and manage observability solutions. Experience with integrating observability into CI/CD pipelines.
Posted 3 months ago
6 - 9 years
10 - 20 Lacs
Gurgaon
Hybrid
Purpose of the Role: As a Platform Support Engineer III, you will be responsible for the development of tooling to aid in the maintenance, support, and remediation of issues in our platform to ensure that our systems are healthy. This role will support communication with multiple internal and external stakeholder groups to monitor and manage our systems, including problem solving and troubleshooting when issues are detected. This role will also create and define metrics, manage incidents, and analyze business metrics to drive process and system improvements. This role requires a combination of strong collaboration, analysis, critical thinking, and independent decision-making skills. At Yum, we like to keep building know-how and ensure we are documenting processes; in this role, you will support and create knowledge base articles and review processes. Additionally, you will deliver regular, high-quality stakeholder communication related to system management and any issue resolution. Once issues have been closed, you will lead the internal review to identify root cause, conduct impact analysis and drive recommendations to prevent future issues.
Responsibilities: Develop tooling to aid in the maintenance, support, and remediation of issues in our platform. Update internal dashboard reporting, visualizations, and alerting mechanisms for all production applications and supporting environments. Implement end-to-end monitoring of different platforms to reduce system downtime. Create and manage metrics for all Yum Platforms (eCommerce, POS, Admin Portal & Menu Management) and Brands. Build and expand reporting against software reporting metrics. Collaborate with engineering groups to lead proactive troubleshooting steps and measures to reduce escalations and resolve issues faster. Lead Priority 1 and 2 calls, process improvements and postmortems.
Participate in an on-call schedule to cover high-priority support incidents that occur outside of normal business hours. Develop new runbooks for repeatable/common issues and create documentation for post-issue resolution, including root cause and impact analysis, within published department SLAs. Collaborate with cross-functional teams and build the relationships needed to implement all of the above. Participate in code reviews, ensuring that all platform support processes align with our architectural principles and design patterns, fostering a culture of continuous improvement and technical excellence. Develop and enhance tooling for platform support, aiming to automate repetitive tasks, improve system reliability, and streamline operations, while maintaining a strong focus on scalability and maintainability.
Preferred candidate profile
Minimum Requirements: BE / BTECH in Computer Science/Engineering/CIS or equivalent experience. 6-8 years in a Production Support Engineer role which requires managing customer interactions. Ability to remain calm under pressure. Strong understanding of APIs. Familiarity with ticket tracking tools (ServiceNow, Jira). Experience with Datadog or similar observability tools. Excellent written and verbal communication skills, with the ability to communicate clearly across all levels of the organization in high-pressure situations. Excellent problem-solving and analytical skills. Ability to work a flexible schedule to help cover round-the-clock support.
Preferred Requirements: Understanding of the Web, including familiarity with JavaScript, REST APIs, GraphQL, and Git. Fluency with Unix / Linux.
Posted 3 months ago
14 - 22 years
25 - 30 Lacs
Hyderabad, Gurgaon
Work from Office
About GSPANN: GSPANN is a global IT services and consultancy provider headquartered in Milpitas, California (U.S.A.). With five global delivery centers, GSPANN provides digital solutions that support the customer buying journeys of B2B and B2C brands worldwide. With a strong focus on innovation and client satisfaction, GSPANN delivers cutting-edge solutions that drive business success and operational excellence. GSPANN helps retail, finance, manufacturing, and high-technology brands deliver competitive customer experiences and increased revenues through our solution delivery, technologies, practices, and operations for each client. For more information, visit www.gspann.com
Job Position: Sr. Architect / Associate Director / Director
Experience: 16+ Years
Location: Gurugram / Hyderabad
Availability to join: Immediate to 45 Days
Must-have Skills: Leadership in a Managed Services environment; strong Architect expertise in ITSM/ITIL, Observability, AIOps, Automation.
Responsibilities: Lead the delivery by owning one or more accounts and providing a customer-facing role. Oversee the full delivery for a portfolio of complex projects, ensuring service excellence and client satisfaction. Define and implement a strategy to increase the level of automation in overall service delivery. Drive the portfolio towards high standards of performance and stability, enhancing end customer experience. Implement strategies to improve client and employee satisfaction, increase CSAT scores and reduce project attrition. Define standards and process frameworks. Identify areas of improvement, automation and shift-left opportunities and implement the same. Drive governance calls; implement governance and analytical reports. Analyse performance issues and provide suggestions for tuning. Transform the support team into SREs. Should be able to define and drive accelerators.
Required skills: Should have 15+ years of experience in leading large-scale applications.
Should have 8+ years of experience in a leadership role within a Managed Services environment. Should have a very good understanding of the IT Infrastructure Library (ITIL) framework and the various IT Service Management (ITSM) tools available in the marketplace. Should be able to define and drive observability solutions for different applications. Should possess strong expertise in Managed Services, cloud technologies, and emerging tech trends. Should have exposure to and a good understanding of Open AI. Should be hands-on with project governance. Should be able to understand and communicate business strategy and drive technology solutions. Should have knowledge of APM tools such as Splunk, Dynatrace, Quantum Metric, ThousandEyes, etc. Should be able to define and derive the SLAs and KPIs required for a support project. Should have an end-to-end understanding of a large-scale application (content tools, front-end technologies, ERP systems, backend applications, COTS tools, etc.). Should have a very good understanding of any one programming language (Java, Python, etc.). Should be able to drive strategic proposals. Significant experience in implementing automations. Sound experience in introducing and managing monitoring and alerting systems. Demonstrated experience in improving system performance and stability. Proven ability to lead and inspire teams, excellent communication skills, and a strategic mindset.
Why Choose GSPANN? At GSPANN, we don't just serve our clients, we co-create. The GSPANNians are passionate technologists who thrive on solving the toughest business challenges, delivering trailblazing innovations for marquee clients. This collaborative spirit fuels a culture where every individual is encouraged to sharpen their skills, feed their curiosity, and take ownership to learn, experiment, and succeed. We believe in celebrating each other's successes, big or small, and giving back to the communities we call home.
If you're ready to push boundaries and be part of a close-knit team that's shaping the future of tech, we invite you to carry forward the baton of innovation with us. Let's Co-Create the Future, Together.
Discover Your Inner Technologist: Explore and expand the boundaries of tech innovation without the fear of failure.
Accelerate Your Learning: Shape your career while scripting the future of tech. Seize the ample learning opportunities to grow at a rapid pace.
Feel Included: At GSPANN, everyone is welcome. Age, gender, culture, and nationality do not matter here; what matters is YOU.
Inspire and Be Inspired: When you work with the experts, you raise your game. At GSPANN, you're in the company of marquee clients and extremely talented colleagues.
Enjoy Life: We love to celebrate milestones and victories, big or small. Every so often, we come together as one large GSPANN family.
Give Back: Together, we serve communities. We take steps, small and large, so we can do good for the environment, weaving sustainability and social change into our endeavors.
We invite you to carry forward the baton of innovation in technology with us. Let’s Co-Create
Posted 3 months ago
2 - 7 years
12 - 22 Lacs
Pune, Bengaluru, Hyderabad
Hybrid
Candidate should have CI/CD experience with any of the skill sets mentioned below, along with expertise in one or more observability tools. Scripting experience is an add-on when shortlisting profiles.
Must have: Observability - Expert in observability tools like Dynatrace / Datadog / AppDynamics / Splunk / New Relic. CI/CD - Implement CI/CD solutions with ADO / Jenkins / GitHub / GitLab.
Good to have: Automation - Expert in scripting such as Shell / PowerShell / Bash / Python, etc.
Posted 3 months ago
5 - 10 years
25 - 40 Lacs
Pune, Bengaluru, Gurgaon
Hybrid
Cloud Engineer - SRE Observability
Mandatory Skills: Python (coding), Lambda, OpenTelemetry, AWS services, CloudWatch
Dynatrace (on-prem and SaaS) – Should have hands-on experience in setting up and designing dashboards.
Python – Should be hands-on in Python coding.
Observability – Must have complete context of SLI/SLO/SLA: how to set up, how to measure, how to track and communicate.
Open Source Observability Stack – Good understanding of OpenTelemetry and how to instrument applications to get the desired metrics, traces, logs, etc.
AWS Services – CloudWatch, X-Ray, Lambda, overall data flow.
OpenShift ROSA – Red Hat OpenShift on AWS.
Development Experience – Any language; should be able to read code and develop utilities as required.
Grafana.
Hands-on experience with Python.
Posted 3 months ago
6 - 10 years
18 - 20 Lacs
Chennai, Noida
Hybrid
For the Observability role, the details below can serve as a starting point for finding the right resource. The skill sets we are looking for are:
1. Experience in AWS environments
2. Experience in Kubernetes environments as an administrator
3. Experience with Linux operating systems
4. Experience in Python and shell scripting is a must
5. Experience in Jenkins Pipelines
6. Strong knowledge of DevOps principles
7. Preferably, experience with open-source monitoring tools like Telegraf, Prometheus, Grafana, Loki
8. Experience in developing dashboards in Grafana using various data sources like Loki, Prometheus, AWS CloudWatch
9. Experience in using Git / Bitbucket
10. Knowledge of Agile methodologies
Keywords: DevOps, Docker, AWS, Azure, Kubernetes, Pipelines, Deployment, Python/Java/any lan, Bash, Linux, Jenkins, Jira, Bitbucket
Posted 3 months ago
18 - 22 years
50 - 60 Lacs
Bengaluru
Work from Office
We are looking for a leader for our Site Reliability Engineering (SRE) and Observability team. As a leader of SRE/Observability, you will create compelling offerings in SRE, Observability and Resiliency for customers and contribute to business growth. Deliver solutions to our customers, maintain the highest standards, and develop and implement the Observability and SRE team and offerings for Virtusa. Be a strong thought leader in Site Reliability Engineering, Observability, Operational Excellence, and DevOps principles. Strong technical acumen in Cloud Architecture, Observability, Performance Benchmarking, Capacity Planning and Reliability tools. Experience in Observability platforms, application monitoring tools and performance analysis techniques. Experience managing and growing technical leaders and teams. Be responsible for building and mentoring a new team of SRE and Observability specialists.
KEY QUALIFICATION & EXPERIENCES: 15+ years of IT experience, with a minimum of 5 years of experience in SRE / Observability / Monitoring tools. Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field. Expert-level experience in monitoring and logging technologies, both open source and closed source (e.g. AppDynamics, New Relic, Datadog, Prometheus, Grafana, LogicMonitor, Sumo Logic, ELK). Experience in implementing Metrics, Logs and Tracing for E2E observability. A working knowledge of systems is needed: Terraform, Ansible, Chef, Puppet, Jenkins, designing and implementing CI/CD pipelines, infrastructure provisioning and management. Ability to communicate and coordinate with cross-functional engineering teams across multiple geographic regions. Experience with AIOps and machine learning is highly desirable. Experience with other monitoring tools like Prometheus, Grafana, etc.
Experience with Observability solutions like Dynatrace, Datadog, Instana, etc. is highly desirable. Excellent problem-solving and analytical skills. Strong communication and collaboration skills. Ability to work independently and manage multiple projects simultaneously. Knowledge of IT operations concepts and processes, such as monitoring, incident management, root cause analysis, and remediation.
Posted 3 months ago
10 - 18 years
25 - 35 Lacs
Hyderabad
Work from Office
About Client: Hiring for one of the most prestigious multinational corporations!
Job Description
Job Title: DevOps Observability Lead
Qualification: Any Graduate or Above
Relevant Experience: 10 to 18 Years
Must-Have: 12+ years of experience in IT Infrastructure, with at least 8+ years in Observability, Monitoring, or SRE roles. Strong expertise in Kubernetes and containerized environments. Hands-on experience with monitoring tools (Prometheus, Grafana, Datadog, Dynatrace). Experience with distributed tracing tools (Jaeger, OpenTelemetry). Proficiency in Python or Go for automation and scripting. Experience with logging tools (Splunk, ELK Stack, Fluentd, Loki). Strong understanding of metrics, logging, and tracing concepts. Knowledge of cloud platforms (AWS, Azure, or GCP) and experience integrating observability solutions in cloud-native environments. Familiarity with databases (MySQL, PostgreSQL). Experience with Infrastructure as Code (IaC) tools like Terraform or Helm.
Expectation: To drive self-healing, intelligent monitoring, and proactive incident response while collaborating with SRE, DevOps, and development teams to enhance system reliability and performance.
Location: Hyderabad
CTC Range: 20 LPA to 35 LPA
Notice period: Immediate, 30 days
Shift Timing: N/A
Mode of Interview: Virtual
Mode of Work: Work from Office
Vardhani, IT Staffing Analyst, Black and White Business solutions PVT Ltd, Bangalore, Karnataka, INDIA. 8686127477 | vardhani@blackwhite.in | www.blackwhite.in
Posted 3 months ago
6 - 11 years
12 - 22 Lacs
Bengaluru
Remote
Site Reliability Engineer (Monitoring and Observability)
Job Location: Hyderabad / Bangalore / Chennai / Noida / Gurgaon / Pune / Indore / Mumbai / Kolkata
Technical Expertise: Minimum 6+ years of total experience. Strong Monitoring and Observability experience in any 3 tools such as ELK, Dynatrace, Datadog, New Relic, AppDynamics, Splunk, Grafana.
1. Chef (basic syntax, recipes, cookbooks) and Ansible (basic syntax, tasks, playbooks)
2. Terraform basic syntax and GitLab CI/CD configuration, pipelines, jobs
3. Cloud resource provisioning and configuration through CLI/API
4. Kubernetes basic understanding, CLI, service re-provisioning
5. Provisioning and setup of metrics in Prometheus, Thanos, and Grafana; alerts and silences
6. Provisioning and setup of logs and queries for general questions
7. Linux configuration, package management, startup and troubleshooting
8. Block and object storage configuration
9. Networking: VPCs, proxies and CDNs
Posted 3 months ago