651 SRE Jobs - Page 19

JobPe aggregates listings for easy access, but you apply directly on the original job portal.

10.0 - 20.0 years

35 - 50 Lacs

Bengaluru

Work from Office

Lead the architecture and delivery of scalable Azure-based solutions. Drive cloud strategy, digital enablement of Oil & Gas (O&G) workflows, team leadership, compliance, risk management, and continuous improvement.

Required candidate profile: An experienced architect/lead with expertise in Azure cloud, Oil & Gas workflows, and team management, and proven skills in solution design, compliance, stakeholder engagement, and driving innovation.

Posted 1 month ago

3.0 - 8.0 years

5 - 10 Lacs

Pune

Work from Office

Since its inception in 2003, when visionary college students set out to transform online rent payment, Entrata has evolved into a global leader serving property owners, managers, and residents. Entrata has been honored with awards such as the Utah Business Fast 50, the Silicon Slopes Hall of Fame (Software Company, 2022), and the Women Tech Council Shatter List. Our comprehensive software suite spans rent payments, insurance, leasing, maintenance, marketing, and communication tools, reshaping property management worldwide. Our 2,200+ team members around the globe embody intelligence and adaptability, and everyone, from top executives to part-time employees, engages actively. With offices across Utah, Texas, India, Israel, and the Netherlands, Entrata blends startup innovation with established stability, evident in our transparent communication values and executive town halls. Our product isn't just desirable; it's industry essential. At Entrata, we passionately refine living experiences, uphold collective excellence, embrace...

Job Summary: Entrata Software is seeking a DevOps Engineer to join our R&D team in Pune, India. This role will focus on automating infrastructure, streamlining CI/CD pipelines, and optimizing cloud-based deployments to improve software delivery and system reliability. The ideal candidate will have expertise in Kubernetes, AWS, Terraform, and automation tools to enhance scalability, security, and observability. Success in this role requires strong problem-solving skills, collaboration with development and security teams, and a commitment to continuous improvement. If you thrive in fast-paced, Agile environments and enjoy solving complex infrastructure challenges, we encourage you to apply!

Key Responsibilities:
• Design, implement, and maintain CI/CD pipelines using Jenkins, GitHub Actions, and ArgoCD to enable seamless, automated software deployments.
• Deploy, manage, and optimize Kubernetes clusters in AWS, ensuring reliability, scalability, and security.
• Automate infrastructure provisioning and configuration using Terraform, CloudFormation, Ansible, and scripting languages like Bash, Python, and PHP.
• Monitor and enhance system observability using Prometheus, Grafana, and the ELK Stack to ensure proactive issue detection and resolution.
• Implement DevSecOps best practices by integrating security scanning, compliance automation, and vulnerability management into CI/CD workflows.
• Troubleshoot and resolve cloud infrastructure, networking, and deployment issues in a timely and efficient manner.
• Collaborate with development, security, and IT teams to align DevOps practices with business and engineering objectives.
• Optimize AWS cloud resource utilization and cost while maintaining high availability and performance.
• Establish and maintain disaster recovery and high-availability strategies to ensure system resilience.
• Improve incident response and on-call processes by following SRE principles and automating issue resolution.
• Promote a culture of automation and continuous improvement, identifying and eliminating manual inefficiencies in development and operations.
• Stay up to date with emerging DevOps tools and trends, implementing best practices to enhance processes and technologies.
• Ensure compliance with security and industry standards, enforcing governance policies across cloud infrastructure.
• Support developer productivity by providing self-service infrastructure and deployment automation to accelerate the software development lifecycle.
• Document processes, best practices, and troubleshooting guides to ensure clear knowledge sharing across teams.

Minimum Qualifications:
• 3+ years of experience as a DevOps Engineer or in a similar role.
• Strong proficiency in Kubernetes, Docker, and AWS.
• Hands-on experience with Terraform, CloudFormation, and CI/CD tools (Jenkins, GitHub Actions, GitLab CI/CD, ArgoCD).
• Solid scripting and automation skills with Bash, Python, PHP, or Ansible.
• Expertise in monitoring and logging tools such as New Relic, Prometheus, Grafana, and the ELK Stack.
• Understanding of DevSecOps principles, security best practices, and vulnerability management.
• Strong problem-solving skills and the ability to troubleshoot cloud infrastructure and deployment issues effectively.

Preferred Qualifications:
• Experience with GitOps methodologies using ArgoCD or Flux.
• Familiarity with SRE principles and managing incident response for high-availability applications.
• Knowledge of serverless architectures and AWS cost optimization strategies.
• Hands-on experience with compliance and governance automation for cloud security.
• Previous experience working in Agile, fast-paced environments with a focus on DevOps transformation.
• Strong communication skills and the ability to mentor junior engineers on DevOps best practices.

If you're passionate about automation, cloud infrastructure, and building scalable DevOps solutions...

Posted 1 month ago

3.0 - 8.0 years

8 - 12 Lacs

Bengaluru

Work from Office

In this Site Reliability Engineer role, you will work closely with the entire IBM Cloud organization to maintain and operationally improve the IBM Cloud infrastructure. You will focus on the following key responsibilities:
• Monitor and respond promptly to production issues and alerts 24x7.
• Execute changes in the production environment through automation.
• Implement and automate infrastructure solutions that support IBM Cloud products and services to reduce toil.
• Partner with other SRE teams and program managers to deliver mission-critical services to IBM Cloud.
• Build new tools to improve automated resolution of production issues.
• Support the compliance and security integrity of the environment.
• Continually improve systems and processes regarding automation and monitoring.

Required education: Bachelor's Degree

Required technical and professional expertise:
• Excellent written and verbal communication skills.
• A minimum of 3 years of experience handling large production system environments.
• Must be extremely comfortable using and navigating within a Linux environment.
• Ability to do low-level debugging and problem analysis by examining logs and running Unix commands.
• Must be proficient at writing and debugging scripts.
• 3-5+ years of experience in virtualization technologies and automation/configuration management.
• Automation and configuration management tools/solutions: Ansible, Python, Bash, Terraform, Go, etc. (at least one).
• Virtualization technologies: Citrix Xen Hypervisor (preferred), KVM (also preferred), libvirt, VMware vSphere, etc. (at least one).
• Monitoring technologies: Zabbix, Sysdig, Grafana, Nagios, Splunk, etc. (at least one).
• Working knowledge of container technologies: Kubernetes, Docker, etc.
• Flexibility to work in shifts to handle production systems.

Preferred technical and professional experience: Good experience with public cloud platforms and Kubernetes clusters, strong Linux skills for managing services across a microservices platform, and good SRE knowledge of cloud compute, storage, and network services.

Posted 1 month ago

3.0 - 5.0 years

7 - 12 Lacs

Chennai

Work from Office

Responsibilities: Experience in an SRE, DevOps, or Systems Engineering role. Collaborate with cross-functional teams on CI/CD pipelines using Jenkins and Git. Proficient in monitoring/logging tools (Prometheus, Grafana, etc.).

Benefits: Health insurance, provident fund, flexible working, free meals.

Posted 1 month ago

3.0 - 5.0 years

3 - 5 Lacs

Bengaluru / Bangalore, Karnataka, India

On-site

Position Summary: We are looking for a Site Reliability Engineer to join our Cloud Infrastructure Engineering division in Bangalore. Cloud Infrastructure Engineering ensures the continuous availability of the technologies and systems that are the foundation of athenahealth's services. We are directly responsible for thousands of servers, petabytes of storage, and thousands of web requests per second, all while sustaining growth at a meteoric rate. We enable an operating system for the medical office that abstracts away administrative complexity, leaving doctors free to practice medicine.

The Team: We are a group of Site Reliability Engineers who are passionate about reliability, automation, and scalability. We use an agile-based framework to execute our work, ensuring we are always focused on the most important and impactful needs of the business. We support systems in both private and public cloud and make data-driven decisions about which one best suits the needs of the business. We are relentless in automating away manual, repetitive work so we can focus on projects that help move the business forward.

Job Responsibilities:
• Leverage the principles of resilience engineering to ensure the stability, security, and scalability of our systems and services.
• Ensure product success through the strategic use of technologies such as Linux, Puppet, Consul, Apache, load balancing, and storage.
• Develop software-based, automated solutions to core business problems.
• Design and implement software services that support Infrastructure-as-Code principles.

Typical Qualifications:
• 3 to 5 years of experience building, scaling, and supporting highly available systems and services.
• Strong expertise in hybrid environments (physical/virtual and cloud).
• Expertise in the build, maintenance, and support of Linux systems and infrastructure.
• Strong expertise in configuration management tools such as Puppet.
• Experience with Infrastructure-as-Code, Linux, and API integration.
• Familiarity with Terraform desired.
• Proficiency in at least one scripting or programming language (Ansible, Python, Go, Ruby, etc.).
• Experience implementing solutions using SRE and DevOps principles.
• Familiarity with telemetry and modern monitoring and visualization tools.
• Expertise in promoting and driving system visibility to aid in the rapid detection and resolution of issues.
• Computer Science degree or equivalent experience.

Behaviors & Abilities Required:
• Ability to learn and adapt in a fast-paced environment.
• Ability to work collaboratively on a cross-functional team with a wide range of experience levels.
• Ability to prioritize both individual time and the time of the team.

Posted 1 month ago

5.0 - 10.0 years

20 - 35 Lacs

Hyderabad

Work from Office

Key Responsibilities:
• Design, implement, and maintain AWS infrastructure using infrastructure as code (IaC), ensuring scalability and high availability.
• Manage and optimize Windows Server environments, focusing on security and reliability.
• Collaborate with development and operations teams to automate and streamline processes.
• Monitor system performance and resolve issues to prevent outages.
• Participate in an on-call rotation to address urgent issues and maintain system integrity.
• Develop and maintain documentation for system configuration and procedures.
• Develop and implement automation scripts and tools to streamline deployment activities.

Required Qualifications:
• A minimum of five years of experience in Cloud/SRE/DevOps or a related field.
• Proven experience with AWS services including EC2, VPC, S3, RDS, and others.
• Strong proficiency in managing Windows Server and Linux environments.
• Experience with AWS IAM and security protocols.
• Familiarity with tools like Terraform, PowerShell, and Docker for automation.
• Proficiency in writing comprehensive technical documentation.

Nice to Have:
• Expertise with Microsoft Entra ID (Azure AD) and AWS IAM.
• Knowledge of Windows Server Remote Desktop Services on AWS.
• Experience using SAML for authentication in Windows domains.
• Familiarity with RDS databases (Oracle and MS SQL), especially conversion to AWS RDS.
• Experience in Identity and Access Management (IAM) across organizations and applications.

Posted 1 month ago

12.0 - 14.0 years

35 - 50 Lacs

Bengaluru

Work from Office

Skills: AWS, Linux, Kubernetes, Docker, Terraform, Ansible, SRE, Jenkins, Groovy, Helm, shell, Python

JD:
1. Proficiency in containerization tools (e.g., Docker, Kubernetes).
2. Strong knowledge of AWS or other cloud platforms, including CloudFormation, Terraform, etc.
3. Proficiency in scripting languages such as Bash, Python, and Go.
4. Expertise in automation with tools like Ansible.
5. Familiarity with CI/CD pipeline tools (Jenkins, Bamboo, Maven, SonarQube, Git).
6. Experience in web server management (Apache, Tomcat, Nginx, load balancers).
7. Hands-on experience in production setup and management.
8. Understanding of complex architectures and collaboration across teams.

Additional Skills:
• Strong troubleshooting skills (security, monitoring processes, server load, basic DNS and networking).
• Maintaining and managing Linux servers, including applying patches and upgrades to the OS and applications.
• Expertise in shell scripting and modern automation technologies such as Ansible and Python.
• SRE/troubleshooting within a UNIX/Linux environment: deploying applications, ensuring clusters are up and running, spinning up new clusters, etc.
• Automation scripting for alerting purposes.
• Good understanding of the build/release and deployment process.
• Good knowledge of enabling workflows and pipelines using Jenkins.

Posted 1 month ago

3.0 - 8.0 years

15 - 30 Lacs

Bengaluru

Work from Office

Job Title: Python Migration Engineer
Location: Bangalore, India
Experience: 3-6 Years
Employment Type: Full-time
Skills Required:
- Experience in a migration or SRE role is preferred; developers are also a fit if they are comfortable working on migrations.
- Hands-on experience with Python or shell scripting (at a migration-engineer level rather than a developer level).
- Good working experience with Linux as a platform; very good exposure to Linux is required.
- Notice period: immediate to 30 days.
Mandatory skills: Migration SRE, Python or shell scripting.

Posted 1 month ago

2.0 - 3.0 years

7 - 10 Lacs

Hyderabad

Work from Office

AI Ops/Monitoring Specialist openings at Advantum Health Pvt Ltd, Hyderabad.

Overview: We're seeking an AI Ops/Monitoring Specialist to ensure the stability, transparency, and performance of AI systems in production. You will monitor, log, and troubleshoot AI and RPA models to ensure continuous reliability and compliance.

Key Responsibilities:
• Monitor AI model health (drift, performance, latency, bias).
• Build dashboards and alerts using tools like Prometheus, Grafana, or Datadog.
• Establish SLAs and SLOs for AI/RPA models and pipelines.
• Collaborate with AI teams to integrate observability into model lifecycles.
• Document anomalies and assist in root cause analysis and mitigation.

Qualifications:
• Bachelor's degree in Data Science, IT, or a related field.
• 2+ years in systems monitoring, SRE, or MLOps.
• Experience with model monitoring tools (e.g., MLflow, Arize, WhyLabs).
• Familiarity with AI/ML lifecycles and performance metrics.
• Background in healthcare or compliance-heavy environments is ideal.

Ph: 9177078628
Email: jobs@advantumhealth.com
Address: Advantum Health Private Limited, Cyber Gateway, Block C, 4th floor, Hitech City, Hyderabad.

Follow us on LinkedIn, Facebook, Instagram, YouTube, and Threads:
Advantum Health LinkedIn Page: https://lnkd.in/gVcQAXK3
Advantum Health Facebook Page: https://lnkd.in/g7ARQ378
Advantum Health Instagram Page: https://lnkd.in/gtQnB_Gc
Advantum Health India YouTube link: https://lnkd.in/g_AxPaPp
Advantum Health Threads link: https://lnkd.in/gyq73iQ6

Posted 1 month ago

6.0 - 10.0 years

7 - 17 Lacs

Hyderabad

Work from Office

In this role, you will:
• Manage and develop teams of analysts, associates, and less experienced managers in roles that provide technical services and support for the relevant supported systems.
• Engage and influence stakeholders, internal partners, and peers in order to engineer projects, identify new products and solutions, and research solutions for existing systems.
• Identify and recommend opportunities for administration and maintenance of the remote monitoring and management system, as well as the periodic system review.
• Perform network assessments, security audits, and system enhancement consultations.
• Determine the appropriate strategy and actions of the Systems Operations team to meet moderate- to high-risk deliverables.
• Interpret and develop policies and procedures, and understand compliance and risk management requirements for the supported system area.
• Provide implementation support for key risk initiatives.
• Collaborate with and influence all levels of professionals, analysts, or associates.
• Ensure the Systems Operations team communicates with customers to keep them informed of incident progress, and notify them of impending changes or agreed outages.
• Manage the allocation of people and financial resources for Systems Operations.
• Develop and guide a culture of talent development to meet business objectives and strategy.

Required Qualifications:
• 6+ years of Systems Engineering and Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education.
• 3+ years of management experience.

Desired Qualifications:
• Experience managing Mainframe/Hogan applications is a plus.
• Should have driven automation/SRE initiatives in platform operations.
• Should have strong interpersonal and communication skills.

Posted 1 month ago

8.0 - 13.0 years

30 - 45 Lacs

Bengaluru

Work from Office

Role & responsibilities:
• Design, build, and manage cloud infrastructure using Terraform, GCP, and Kubernetes.
• Automate configuration management and provisioning tasks using Ansible and scripting languages.
• Develop and maintain CI/CD pipelines using tools like GitLab CI, Jenkins, or ArgoCD.
• Manage Kubernetes clusters and containerized applications in production environments.
• Ensure system availability, performance, and scalability in a cloud-native architecture.
• Monitor infrastructure and application health using Prometheus, Grafana, and Stackdriver.
• Troubleshoot infrastructure and deployment issues across staging and production environments.
• Implement and enforce cloud security best practices, including IAM policies and secrets management.
• Optimize cloud usage and architecture for performance and cost-efficiency.
• Maintain comprehensive documentation of infrastructure components and operational procedures.
• Provide technical guidance to peers and collaborate across teams on DevOps best practices.

Preferred candidate profile:
• 7+ years of experience in DevOps, SRE, or Systems Engineering roles.
• Extensive hands-on experience with Google Cloud Platform (GCP) services (e.g., Compute Engine, Kubernetes Engine, IAM, VPC).
• Strong proficiency with Kubernetes, including cluster setup, management, and monitoring.
• Advanced expertise in Terraform for infrastructure automation and lifecycle management.
• Experience with Ansible for configuration management and orchestration.
• Solid background in Linux system administration (Ubuntu, CentOS, or RedHat).
• Skilled in CI/CD tools such as GitLab CI, Jenkins, or similar.
• Strong scripting ability in Bash, Python, or Go.
• Good understanding of cloud networking, security, and high availability principles.
• Experience with observability tools like Prometheus, Grafana, Stackdriver, or equivalent.
• Strong analytical and problem-solving skills in a cloud environment.
• Effective communication skills and ability to work collaboratively across teams.

Posted 1 month ago

4.0 - 8.0 years

4 - 8 Lacs

Hyderabad / Secunderabad, Telangana, India

On-site

Job Title: Azure DevOps (Engineer) - L2 Support
Location: Hyderabad, Telangana, India
Experience Level: 5+ years of industry experience

Overview: We are seeking a skilled and proactive Support Engineer with deep expertise in Azure cloud services, Kubernetes, and DevOps practices. The ideal candidate will have experience working with Azure services, including Kubernetes, API Management, monitoring tools, and various cloud infrastructure services. You will be responsible for providing technical support, managing cloud-based systems, troubleshooting complex issues, and ensuring the smooth operation and optimization of services within the Azure ecosystem.

Key Responsibilities: As an Azure DevOps (Engineer) - L2 Support, you will:
• Technical Support: Provide technical support for Azure-based cloud services, including Azure Kubernetes Service (AKS), Azure API Management, Application Gateway, Web Application Firewall, and Azure Monitor with KQL queries.
• Azure Service Management: Manage and troubleshoot various Azure services such as Event Hub, Azure SQL, Application Insights, Virtual Networks, and WAF.
• Kubernetes & GitOps: Work with Kubernetes environments, troubleshoot deployments, use Helm charts, check resource utilization, and manage GitOps processes.
• Infrastructure Automation: Use Terraform to automate cloud infrastructure provisioning, configuration, and management.
• Database Troubleshooting: Troubleshoot and resolve issues in MongoDB and Microsoft SQL Server databases, ensuring high availability and performance.
• Monitoring & Alerting: Monitor cloud infrastructure health using Grafana and Azure Monitor, providing insights and proactive alerts.
• Root Cause Analysis: Provide root-cause analysis for technical incidents, and propose and implement corrective actions to prevent recurrence.
• Continuous Optimization: Continuously optimize cloud services and infrastructure to improve performance, scalability, and security.

Required Skills & Qualifications
Technical Proficiency:
• Azure Services: Hands-on experience with Azure services such as AKS, API Management, Application Gateway, WAF, Event Hub, Azure SQL, Application Insights, and Virtual Networks.
• Azure Monitoring: Strong knowledge of Azure Monitor and KQL queries.
• Kubernetes: Strong hands-on expertise in Kubernetes, Helm charts, and GitOps principles for managing and troubleshooting deployments.
• Infrastructure as Code (IaC): Hands-on experience with Terraform for infrastructure automation and configuration management.
• Databases: Proven experience with MongoDB and Microsoft SQL Server, including deployment, maintenance, performance tuning, and troubleshooting.
• Monitoring Tools: Familiarity with Grafana for monitoring, alerting, and visualization of cloud-based services.
• Azure DevOps: Experience using Azure DevOps tools, including Repos and Pipelines, for CI/CD automation and source code management.
• Networking: Solid understanding of Virtual Networks, WAF, firewalls, and other related Azure networking tools.
• Certifications: Azure certification (e.g., Azure Solutions Architect, Azure Administrator); any Kubernetes certification (e.g., CKAD or CKA).

Experience & Qualifications:
• 5+ years of industry experience with Azure cloud services, Kubernetes, and DevOps practices.
• Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent work experience.

Essential Professional Skills:
• Excellent troubleshooting, analytical, and problem-solving skills.
• Strong written and verbal communication skills, with the ability to explain complex technical issues to non-technical stakeholders.
• Ability to work in a fast-paced environment and manage multiple priorities effectively.

Preferred Skills:
• Experience with cloud security best practices in Azure.
• Knowledge of infrastructure as code (IaC) concepts and tools.
• Familiarity with containerized applications and Docker.

Posted 1 month ago

7.0 - 12.0 years

7 - 12 Lacs

Hyderabad / Secunderabad, Telangana, India

On-site

Job Title: Site Reliability Engineer (SRE) - Senior Engineer

Experience Level: Minimum 7 years of work experience as an SRE (not traditional production support). Minimum of 6-8 years of work experience in critical production environments.

Key Responsibilities: As a Site Reliability Engineer (SRE), you will:

Platform Support & Reliability:
• Work as part of a 24/7 on-desk team in shifts, managing middleware and associated applications consumed globally, covering incident, change, event, and problem management.
• Be the guardian of high reliability for applications, middleware, storage platforms, schedulers (and their jobs), and the underlying cloud infrastructure.

Cloud Infrastructure Management:
• Assess and enhance cloud infrastructure and data pipeline resilience.
• Provide GCP (Google Cloud) and private-cloud operational support and administration, such as provisioning, capacity management, reliability management, monitoring, and restoration.
• Handle Kubernetes cluster management, monitoring, and remediation.

Observability & Monitoring:
• Set up and configure an observability product (preferably AppDynamics or Splunk) for end-to-end traceability and log analytics.
• Define SLIs and configure SLOs, respond to threshold alerts, and continuously optimize monitoring capability to reduce noise.
• Set up anomaly detection and auto-remediation workflows.

Automation & CI/CD:
• Write code and automation scripts (particularly for the integration tier and middleware) to automate deployments and build self-healing workflows based on telemetry.
• Work with CI/CD toolchains, setting up and running deployment pipelines and propagating changes across environments.
• Automate new change rollouts and validate/automate patch rollout processes.

Troubleshooting & Debugging:
• Debug integrations and consumers at the code level.
• Work with code as well as configuration artifacts to debug and fix issues that may arise.
• Diagnose and debug systems at the application level.

Toil Reduction:
• Eliminate toil by lowering incident volume, eliminating noise from alerts, automating manual processes, and converting workarounds into system features.

Collaboration & Engagement:
• Work with Development, QA, and other squads to design, build, and roll out reliability features into applications.
• Engage in on-call and critical operations support activities while leading blameless post-mortems.
• Liaise directly with customers for stakeholder management.

Mandatory Skills & Experience
Technical Proficiency:
• SRE & DevOps Experience: Minimum 7 years of work experience as an SRE (not traditional production support) covering integration platforms on cloud-based deployments.
• Cloud Platforms: Working experience with GCP (Google Cloud), particularly GKE (Google Kubernetes Engine). Experience with AWS cloud infrastructure operations in production is also needed.
• Coding/Scripting/Automation: Proficiency in coding/automation scripting in any programming language (e.g., Python scripts, Ansible templates), particularly for the integration tier and middleware. Ability to work with code and configuration artifacts to debug and fix issues.
• Containerization & Orchestration: Knowledge of Docker is important, as is Kubernetes cluster management, monitoring, and remediation.
• Middleware: Maintaining middleware such as Kafka (open source) and MQ, as well as application servers (Tomcat).
• Data Storage: Maintaining Hazelcast data storage platform clusters.
• Job Scheduling: Experience with Control-M job schedulers.
• Monitoring Tools: Working with AppDynamics and Splunk for monitoring and setting up observability. Experience implementing system and application monitoring for cloud-based applications/SaaS components, including setting up alerts and building dashboards.
• CI/CD: Experience with CI/CD toolchains, setting up and running deployment pipelines, propagating changes across environments, and troubleshooting failed deployments.
• SQL: Working knowledge of SQL and troubleshooting by writing queries.
• SRE Principles: Knowledge of applying SRE practices to daily operations is key, particularly toil reduction, blameless post-mortems, monitoring distributed systems, and release engineering.

Experience & Qualifications:
• Minimum 7 years of work experience as an SRE in critical production environments.
• Experience working as a DevOps Engineer or SRE on mission-critical applications and infrastructure.
• Ability to work shifts in the office (24/7 on-desk operation is mandatory).

Certifications (Mandatory/Preferred):
• SRE Foundation certification by DevOps Institute, or an equivalent SRE certification from a recognized body, is mandatory (SRE Foundation certification via PeopleCert / DevOps Institute is beneficial).
• CKA certification (Certified Kubernetes Administrator).
• GCP Cloud Digital Leader certification at a minimum is mandatory; Cloud Engineer level is a bonus.
• Hazelcast Platform Operations certification badge (preferred).
• ITIL 4 Foundation certification is preferred.
• AWS Solutions Architect - Associate qualification or an alternative is preferred.

Essential Professional Skills:
• Problem-Solving & Debugging: Strong ability to diagnose and debug systems at the application level. Must be inclined to work on proof-of-concept solutions to optimize reliability (e.g., AI models for event correlation and assisted triaging).
• Observability Definition: Ability to define SLIs, configure SLOs, and continuously refine thresholds.
• Communication & Collaboration: Excellent communication skills. Ability to work collaboratively in a team and with Development, QA, and other squads. Direct liaison with customers for stakeholder management.
• Ownership: Be the guardian of high reliability.
• Learning Agility: Ability to continuously refine alerts and ensure all alerts/incidents within scope are actioned before breaching SLOs.

Posted 1 month ago

6.0 - 8.0 years

6 - 8 Lacs

Gurgaon / Gurugram, Haryana, India

On-site

Responsibilities:
• Define and enforce SLOs, SLIs, and error budgets across microservices.
• Architect an observability stack (metrics, logs, traces) and drive operational insights.
• Automate toil and manual ops with robust tooling and runbooks.
• Own the incident response lifecycle: detection, triage, RCA, and postmortems.
• Collaborate with product teams to build fault-tolerant systems.
• Champion performance tuning, capacity planning, and scalability testing.
• Optimise costs while maintaining the reliability of cloud infrastructure.

Must-have skills:
• 6+ years in SRE/infrastructure/backend roles using cloud-native technologies.
• 2+ years in an SRE-specific capacity.
• Strong experience with monitoring/observability tools (Datadog, Prometheus, Grafana, ELK, etc.).
• Experience with infrastructure-as-code (Terraform/Ansible).
• Proficiency in Kubernetes, service mesh (Istio/Linkerd), and container orchestration.
• Deep understanding of distributed systems, networking, and failure domains.
• Expertise in automation with Python, Bash, or Go.
• Proficient in incident management, SLAs/SLOs, and system tuning.
• Hands-on experience with GCP (preferred)/AWS/Azure and cloud cost optimisation.
• Participation in on-call rotations and running large-scale production systems.

Nice-to-have skills:
• Familiarity with chaos engineering practices and tools (Gremlin, Litmus).
• Background in performance testing and load simulation (Gatling, Locust, k6, JMeter).

Why us: You will be working with a lean team of passionate and talented individuals. We know that working with like-minded people is important. We are on a mission to supercharge brick-and-mortar retail stores in the era of e-commerce. Our customers give us confidence in our journey, and you will have a huge impact with your work. You will be free to experiment and can choose to do things differently. Lastly, we deeply care about a culture of being a solver. Come, be one with us!

Equal opportunity employer: Grey Orange Inc. is an equal employment opportunity employer. The company's policy is not to discriminate against any applicant or employee based on race, color, religion, national origin, gender, age, sexual orientation, gender identity or expression, veteran status, marital status, mental or physical disability, genetic information, or any other basis protected by applicable law. Grey Orange also prohibits harassment of applicants or employees based on any of these protected categories.

Posted 1 month ago

6.0 - 11.0 years

20 - 25 Lacs

Hyderabad, Ahmedabad

Hybrid

Hi Aspirant, greetings from TechBlocks (Global Digital Product Development), Hyderabad!

About us: TechBlocks is a global digital product engineering company with 16+ years of experience helping Fortune 500 enterprises and high-growth brands accelerate innovation, modernize technology, and drive digital transformation. From cloud solutions and data engineering to experience design and platform modernization, we help businesses solve complex challenges and unlock new growth opportunities.

Job Title: Senior DevOps Site Reliability Engineer (SRE)
Location: Hyderabad & Ahmedabad
Employment Type: Full-Time
Work Model: 3 days from office

Job Overview: We are looking for dynamic, motivated individuals to deliver exceptional solutions for the production resiliency of our systems. The role combines software engineering, operations, and DevOps skills to find efficient ways of managing and operating applications, and it carries a high level of responsibility and accountability for delivering technical solutions.

Summary: As a Senior SRE, you will ensure platform reliability, incident management, and performance optimization. You'll define SLIs/SLOs, contribute to robust observability practices, and drive proactive reliability engineering across services.

Experience Required: 6-10 years of SRE or infrastructure engineering experience in cloud-native environments.

Mandatory:
- Cloud: GCP (GKE, Load Balancing, VPN, IAM)
- Observability: Prometheus, Grafana, ELK, Datadog
- Containers & Orchestration: Kubernetes, Docker
- Incident Management: on-call, RCA, SLIs/SLOs
- IaC: Terraform, Helm
- Incident Tools: PagerDuty, OpsGenie

Nice to Have: GCP Monitoring, SkyWalking, service mesh, API gateway, GCP Spanner

Scope:
- Drive operational excellence and platform resilience.
- Reduce MTTR and increase service availability.
- Own incident and RCA processes.

Roles and Responsibilities:
- Define and measure Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and manage error budgets across services.
- Lead incident management for critical production issues; drive Root Cause Analysis (RCA) and postmortems.
- Create and maintain runbooks and standard operating procedures for high-availability services.
- Design and implement observability frameworks using ELK, Prometheus, and Grafana; drive telemetry adoption.
- Coordinate cross-functional war-room sessions during major incidents and maintain response logs.
- Develop and improve automated system recovery, alert suppression, and escalation logic.
- Use GCP tools such as GKE, Cloud Monitoring, and Cloud Armor to improve performance and security posture.
- Collaborate with DevOps and Infrastructure teams to build highly available and scalable systems.
- Analyze performance metrics and conduct regular reliability reviews with engineering leads.
- Participate in capacity planning, failover testing, and resilience architecture reviews.

If you are interested, please share your updated resume with kranthikt@tblocks.com.

Warm Regards,
Kranthi Kumar
kranthikt@tblocks.com
Contact: 8522804902
Senior Talent Acquisition Specialist
Toronto | Ahmedabad | Hyderabad | Pune
www.tblocks.com

Posted 1 month ago

10.0 - 16.0 years

20 - 35 Lacs

Pune

Hybrid

Role & responsibilities:
- Define DevSecOps and cloud-native platform engineering offerings (GitOps, Platform as a Product, SRE-as-a-service).
- Create standardized CI/CD reference pipelines and IaC frameworks.
- Lead maturity assessments and roadmap planning for cloud-native transformations.
- Partner with customers to solve release automation, reliability, and cost challenges.
- Drive creation of reusable playbooks, automation templates, and cloud migration blueprints.
- Coach DevOps engineers and enable competency building through labs and internal guilds.

Posted 1 month ago

6.0 - 11.0 years

25 - 40 Lacs

Bengaluru

Work from Office

Hi, greetings from Thales India Pvt Ltd! We are hiring a Senior Engineer/Technical Lead - DevOps Engineer for our Engineering Competency Center in Bangalore.

Experience: 6 to 12 years.
Notice Period: Immediate to max 30 days.

About Thales: Thales people architect identity management and data protection solutions at the heart of digital security. Businesses and governments rely on us to bring trust to the billions of digital interactions they have with people. Our technologies and services help banks exchange funds, people cross borders, energy become smarter and much more. More than 30,000 organizations already rely on us to verify the identities of people and things, grant access to digital services, analyze vast quantities of information and encrypt data to make the connected world more secure.

Present in India since 1953, Thales is headquartered in Noida, Uttar Pradesh, and has operational offices and sites spread across Bengaluru, Delhi, Gurugram, Hyderabad, Mumbai, Pune among others. Over 1800 employees are working with Thales and its joint ventures in India. Since the beginning, Thales has been playing an essential role in India's growth story by sharing its technologies and expertise in Defense, Transport, Aerospace and Digital Identity and Security markets.

Additional: Imperva, a Thales company, is a cybersecurity leader. Together, we provide innovative platforms designed to reduce the complexity and risks of managing and protecting more applications, data, and identities than any other company can. Our solutions enable over 35,000 organizations to deliver trusted digital services to billions of consumers around the world every day.

Job Summary: We're building a first-of-its-kind AI Firewall to protect applications using Large Language Models (LLMs). As one of the first DevOps Engineers on the team, you'll build and maintain the CI/CD pipelines, observability stack, and deployment infrastructure for a cutting-edge AI Firewall. Your work ensures our services are secure, fast, and always available.

Job Knowledge, Skills and Qualifications:
- BE or M.Sc. in Computer Science, or equivalent
- 6+ years of experience in DevOps, SRE, or Infrastructure Engineering
- Proficiency with Kubernetes, Docker, and cloud platforms (AWS/GCP/Azure)
- Experience in developing performance-oriented applications
- Strong scripting skills (Bash, Python, or Groovy)
- Background in AI/ML and in networking concepts such as TCP/UDP, HTTP, TLS, etc.
- Bonus: experience with security tooling, API gateways, or LLM-related infrastructure

Posted 1 month ago

6.0 - 10.0 years

8 - 12 Lacs

Gurugram

Work from Office

About the Role: OSTTRA India

The Role: Site Reliability Engineer

The Team: SRE is a global team that provides technical support across the suite of OSTTRA products. The SRE team works closely with a highly competent Technical Operation Centre (TOC), Development and Infrastructure teams to deliver proactive tasks to improve the supportability of our platforms. Our work helps to ensure that OSTTRA provides a high-quality service and maintains client satisfaction.

The Impact: Together, we build, support, protect and manage high-performance, resilient platforms that process more than 100 million messages a day. Our services are vital to automated trade processing around the globe, managing peak volumes and working with our customers and regulators to ensure the efficient settlement of trades and effective operation of global capital markets.

What's in it for you: OSTTRA is seeking a Site Reliability Engineer to join the SRE team. The role will specialise in designated platforms, providing 2nd-line technical support to the TOC as well as integration support for our Trade Processing applications. This person will report directly to the regional SRE manager and work closely with an experienced global team to contribute to the quality of our support. You will have 6-10 years of experience in roles such as Site Reliability Engineer or Application Support, including Project Management tasks, to meet the needs of our expanding portfolio of Financial Services clients. This role presents an excellent opportunity to be part of an agile team based out of India, collaborating with colleagues across multiple regions globally, with a strong focus on delivering value through self-service.
Responsibilities: Your duties will include Capacity Management, Operational Support Design, Audit Preparation, Incident Escalation, Problem Management Engagement, DR Design and Execution and ad hoc High Profile Client Engagement for your designated platform(s) in our full suite of OTC Derivative products and FX for post-trade confirmation processing. You will need to demonstrate excellent communication skills and have a natural ability to learn with a keen interest in technology. You must be a team player and enjoy working in a high-performance collaborative environment with multiple teams. The successful candidate will need to be able to apply strong technical skills and good business knowledge, together with investigative techniques and problem-solving skills, to identify gaps and improve the overall estate to bring resilience and stability to the platform(s). You will liaise with other teams across Product and Development, and particularly the Infrastructure teams, as required for 3rd-line escalation. At times, Product, business or client teams will require your technical advice for solution delivery. You will work closely with the Development and Infrastructure teams to understand and ensure the supportability of platforms, and liaise with delivery teams to ensure readiness for new platform releases. Based in our Gurgaon office, you will be responsible for handling, identifying and communicating technical resolutions in English.
What We're Looking For:
- University graduate or equivalent, with a bachelor's background in computer science
- Experience in, or high motivation for, managing the capacity, performance, throughput and EOS/EOL of platforms, from infrastructure to software
- Experience in troubleshooting issues, defining supportability, and working within the software development life cycle (SDLC), streamlining application delivery from Dev/QA to UAT/Production
- Good understanding of Site Reliability Engineering and Application Support processes, including supporting incidents and executing/designing disaster recovery
- Strong ability to understand application architecture, navigate effectively to the problem area, and identify proactive measures around resiliency and recovery design
- Ability to apply analytical methods, such as trending and distribution analysis, to gain insight from application data that helps with troubleshooting and choosing the best approach
- Ability to understand business workflows and tie them to technical implementation
- Experience in reading and tracing Java, C++, Python and/or scripting languages
- Experience with databases, including SQL scripting, preferably but not limited to Oracle

Good to Have:
- Understanding of networking principles, their practical uses, and basic troubleshooting
- Understanding of Cloud (AWS, GCP or Azure), PaaS, and implementation with Kubernetes, OpenShift, Windows and Linux
- Experience in handling client issues and expectation management
- Good understanding of messaging platforms and protocols such as XML, XSLT, IBM MQ, AMQ, etc.
- Knowledge of financial messaging protocols such as FIX, FpML, TOF, etc.
- Experience with security protocols related to connection encryption using SSL and TLS
- Experience working in the Finance industry
- Knowledge of Financial OTC Derivative and FX products
- Awareness of derivatives products and post-trade processing (desirable)

The Location: Gurgaon, India

About Company Statement: OSTTRA is a market leader in derivatives post-trade processing, bringing innovation, expertise, processes and networks together to solve the post-trade challenges of global financial markets. OSTTRA operates cross-asset post-trade processing networks, providing a proven suite of Credit Risk, Trade Workflow and Optimization services. Together these solutions streamline post-trade workflows, enabling firms to connect to counterparties and utilities, manage credit risk, reduce operational risk and optimize processing to drive post-trade efficiencies. OSTTRA was formed in 2021 through the combination of four businesses that have been at the heart of post-trade evolution and innovation for the last 20+ years: MarkitServ, Traiana, TriOptima and Reset. These businesses have an exemplary track record of developing and supporting critical market infrastructure and bring together an established community of market participants comprising all trading relationships and paradigms, connected using powerful integration and transformation capabilities.

About OSTTRA: Candidates should note that OSTTRA is an independent firm, jointly owned by S&P Global and CME Group.
As part of the joint venture, S&P Global provides recruitment services to OSTTRA; however, successful candidates will be interviewed and directly employed by OSTTRA, joining our global team of more than 1,200 post-trade experts. OSTTRA is a joint venture, owned 50/50 by S&P Global and CME Group. With an outstanding track record of developing and supporting critical market infrastructure, our combined network connects thousands of market participants to streamline end-to-end workflows, from trade capture at the point of execution, through portfolio optimization, to clearing and settlement. Joining the OSTTRA team is a unique opportunity to help build a bold new business with an outstanding heritage in financial technology, playing a central role in supporting global financial markets. Learn more at www.osttra.com.

What's In It For You

Benefits: We take care of you, so you can take care of business. We care about our people. That's why we provide everything you and your career need to thrive at S&P Global.
- Health & Wellness: Health care coverage designed for the mind and body.
- Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
- Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
- Family Friendly Perks: It's not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families.
- Beyond the Basics: From retail discounts to referral incentive awards, small perks can make a big difference.
For more information on benefits by country, visit https://spgbenefits.com/benefit-summaries

-----------------------------------------------------------
Equal Opportunity Employer: S&P Global is an equal opportunity employer and all qualified candidates will receive consideration for employment without regard to race/ethnicity, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, marital status, military veteran status, unemployment status, or any other status protected by law. Only electronic job submissions will be considered for employment. If you need an accommodation during the application process due to a disability, please send an email to EEO.Compliance@spglobal.com and your request will be forwarded to the appropriate person.

US Candidates Only: The EEO is the Law Poster (http://www.dol.gov/ofccp/regs/compliance/posters/pdf/eeopost.pdf) describes discrimination protections under federal law. Pay Transparency Nondiscrimination Provision: https://www.dol.gov/sites/dolgov/files/ofccp/pdf/pay-transp_%20English_formattedESQA508c.pdf
-----------------------------------------------------------
20 - Professional (EEO-2 Job Categories-United States of America), BSMGMT203 - Entry Professional (EEO Job Group)

Posted 1 month ago

4.0 - 9.0 years

7 - 17 Lacs

Hyderabad, Pune, Bengaluru

Hybrid

Job description: Hiring for Site Reliability Engineer - AWS/Azure DevOps, with 3+ years of experience.
Mandatory Skills: Java, Kubernetes, AWS/Azure, DevOps/DevSecOps, monitoring tools (AppDynamics/Dynatrace/New Relic), Build and Release, Prometheus, Python, Node.js
Education: BE/B.Tech/MCA/M.Tech/MSc./MSts

Posted 1 month ago

6.0 - 10.0 years

0 Lacs

Hyderabad, Bengaluru

Work from Office

Role: Site Reliability Engineer & Azure
Experience: 6 to 10 years
Skills: SRE with Azure and OCP/OpenShift cloud platform
Location: Bangalore/Hyderabad

Posted 1 month ago

2.0 - 5.0 years

2 - 5 Lacs

Hyderabad / Secunderabad, Telangana, India

Remote

If you are someone who wants to understand what it takes to build a scalable, secure, and reliable service, wants to deepen your technical expertise in all aspects of Site Reliability Engineering (including security, monitoring, automation, development, infrastructure, self-healing, and troubleshooting), and is a go-getter with an ownership mindset, then we might be the right team for you!

Your Role and Responsibilities: As a Site Reliability Engineer, you will be responsible for: Applying a logical, methodical, and analytical approach to isolate and solve technical problems. Communicating and collaborating effectively with other technicians, departments, and customers in technical support situations. Demonstrating and applying extensive knowledge of the company's products. Exercising limited discretion in deviating from standard practice to solve problems within your area of experience. Researching problems and recommending solutions. Assisting in the provision of on-the-job training. Professionally processing and resolving asset request cases to support proper accounting for site inventory. Maintaining site inventory with zero discrepancies through strict adherence to asset management procedures. Managing inbound and outbound fulfillment to enable the achievement of business deliverables. Ensuring that any asset defects, damages, discrepancies, or deviations are escalated to management for timely support and resolution. Working in shift rotations that may include day, evening, overnight, and/or weekends and holidays. Working in various IBM Cloud locations in Chennai.

Required Education: Bachelor's Degree

Required Technical and Professional Expertise: 2+ years of experience, including: Physical server hardware experience (assembling servers, including motherboards, RAM, hard drives, RAID controllers, network cards, etc.). OS experience is a plus, with an emphasis on physical server hardware exposure.
Scheduling and performing hardware maintenance for IBM Cloud customers involving upgrades, downgrades, support requests, etc. This includes physical upgrades/downgrades of server hardware such as RAM, hard drives, processors, and network cards. Troubleshooting and resolving problems with basic physical network cable/device connections at the server/switch/stack level in the Data Center. 100% onsite experience working in a physical Data Center (no remote support). 24/7 operations with 100% onsite support (no remote or on-call support), on a rotating shift schedule with no fixed shifts. Responding to UIP-related events around physical infrastructure. Coordinating with internal departments to resolve outage events related to faulty links, optics, or failed networking devices. Possessing physical infrastructure server knowledge along with a basic understanding of network devices (such as physical network cabling, optics, and interconnectivity) to extend onsite support for remote network teams and network ISP personnel. Handling outage events under the supervision and guidance of Site Management.

Preferred Technical and Professional Experience: May be directed to perform other duties consistent with training and skill levels required for this position. Understanding site capacity utilization and providing assistance to Management on capacity planning.

Posted 1 month ago

2.0 - 5.0 years

2 - 5 Lacs

Chennai, Tamil Nadu, India

Remote

At IBM, we're transforming technology to an as-a-service model and empowering clients to fully leverage the cloud. With industry leadership in AI, analytics, security, commerce, quantum computing, and unmatched hardware, software design, and industrial research capabilities, we are uniquely positioned to capitalize on the enterprise cloud computing opportunity. We're looking for a Site Reliability Engineer to join our IBM Cloud VPC Observability team. This team is dedicated to ensuring IBM Cloud remains at the forefront of reliable enterprise cloud technology. We build platforms to deliver performance, reliability, and predictability for our customers' most demanding workloads at a global scale, with leading efficiency, resiliency, and security.

If you are someone who wants to understand what it takes to build a scalable, secure, and reliable service, wants to deepen your technical expertise in all aspects of Site Reliability Engineering (including security, monitoring, automation, development, infrastructure, self-healing, and troubleshooting), and is a go-getter with an ownership mindset, then we might be the right team for you!

Your Role and Responsibilities: As a Site Reliability Engineer, you will be responsible for: Applying a logical, methodical, and analytical approach to isolate and solve technical problems. Communicating and collaborating effectively with other technicians, departments, and customers in technical support situations. Demonstrating and applying extensive knowledge of the company's products. Exercising limited discretion in deviating from standard practice to solve problems within your area of experience. Researching problems and recommending solutions. Assisting in the provision of on-the-job training. Professionally processing and resolving asset request cases to support proper accounting for site inventory. Maintaining site inventory with zero discrepancies through strict adherence to asset management procedures.
Managing inbound and outbound fulfillment to enable the achievement of business deliverables. Ensuring that any asset defects, damages, discrepancies, or deviations are escalated to management for timely support and resolution. Working in shift rotations that may include day, evening, overnight, and/or weekends and holidays. Working in various IBM Cloud locations in Chennai.

Required Education: Bachelor's Degree

Required Technical and Professional Expertise: 2+ years of experience, including: Physical server hardware experience (assembling servers, including motherboards, RAM, hard drives, RAID controllers, network cards, etc.). OS experience is a plus, with an emphasis on physical server hardware exposure. Scheduling and performing hardware maintenance for IBM Cloud customers involving upgrades, downgrades, support requests, etc. This includes physical upgrades/downgrades of server hardware such as RAM, hard drives, processors, and network cards. Troubleshooting and resolving problems with basic physical network cable/device connections at the server/switch/stack level in the Data Center. 100% onsite experience working in a physical Data Center (no remote support). 24/7 operations with 100% onsite support (no remote or on-call support), on a rotating shift schedule with no fixed shifts. Responding to UIP-related events around physical infrastructure. Coordinating with internal departments to resolve outage events related to faulty links, optics, or failed networking devices. Possessing physical infrastructure server knowledge along with a basic understanding of network devices (such as physical network cabling, optics, and interconnectivity) to extend onsite support for remote network teams and network ISP personnel. Handling outage events under the supervision and guidance of Site Management.
Preferred Technical and Professional Experience: May be directed to perform other duties consistent with training and skill levels required for this position. Understanding site capacity utilization and providing assistance to Management on capacity planning.

Posted 1 month ago

5.0 - 7.0 years

25 - 40 Lacs

Pune

Work from Office

Our world is transforming, and PTC is leading the way. Our software brings the physical and digital worlds together, enabling companies to improve operations, create better products, and empower people in all aspects of their business. Our people make all the difference in our success. Today, we are a global team of nearly 7,000 and our main objective is to create opportunities for our team members to explore, learn, and grow – all while seeing their ideas come to life and celebrating the differences that make us who we are and the work we do possible.

PTC is looking for a hands-on engineer, experienced with site reliability and operations, for a leading CAD SaaS solution.

As part of your job at PTC, you will:
- Collaborate with multiple teams to monitor and observe their cloud-deployed services
- Implement automated pipelines for deployment into cloud environments
- Implement monitoring and observability solutions
- Handle incidents and changes
- Troubleshoot and resolve production issues
- Conduct post-mortems
- Handle security incidents

Job requirements:
- Proven experience working in Cloud DevOps and Site Reliability Engineering
- Ability to develop observability solutions using Datadog, or ELK, Prometheus and Grafana
- Great communication skills, written and verbal
- Strong hands-on skills to support security in cloud environments
- Experience and knowledge in cloud architecture reviews, SaaS processes and handling security incidents
- Advantage: knowledge and experience with Azure

Why PTC? Life at PTC is about more than working with today’s most cutting-edge technologies to transform the physical world. It’s about showing up as you are and working alongside some of today’s most talented industry leaders to transform the world around you. If you share our passion for problem-solving through innovation, you’ll likely become just as passionate about the PTC experience as we are. Are you ready to explore your next career move with us?
Website: https://www.ptc.com
LinkedIn: https://www.linkedin.com/company/ptcinc/
Facebook Page: https://www.facebook.com/ptc.inc/
Twitter Handles: @LifeatPTC, @PTC
Instagram: ptc_inc
Hashtag: #lifeatPTC

We respect the privacy rights of individuals and are committed to handling Personal Information responsibly and in accordance with all applicable privacy and data protection laws. Review our Privacy Policy.

Posted 1 month ago

5.0 - 9.0 years

10 - 20 Lacs

Pune, Chennai, Bengaluru

Hybrid

Site Reliability Engineer - Data Center Storage
--------------------------------------------------------------------
As a Site Reliability Engineer - Data Center Storage, you will be responsible for the following.

Must haves:
- Hands-on working knowledge of the Linux command line
- Understanding of networking, data center infrastructure, and server provisioning and booting
- Managing your Jira ticket queue and troubleshooting based on logs and alerts
- Identifying code-related issues in the validation process and creating tickets for the appropriate team to implement a fix
- Communicating extensively with data center operations teams, working hand-in-hand to resolve both hardware and software issues
- Proactively monitoring data storage utilization, I/O capacity, and alerts
- Understanding storage use cases for common virtualization platforms such as VMware or OpenStack
- Being on call during business hours for storage-related alerts and escalations
- Maintaining and contributing to technical documentation, troubleshooting manuals, and runbooks
- Continuously reviewing, learning, and understanding internal services and tools relevant to our workflows
- Familiarity with SLI/SLO/SLA concepts, error budgets, and other SRE terminology, and knowing how to design them
- Experience with continuous deployment using tools like Jenkins, GitHub Actions, Puppet, Ansible, etc.
- Ability to write automation scripts using Shell or Python

Good to have:
- Basic knowledge of Pure Storage products such as FlashArray and FlashBlade
- Experience with hardware from vendors such as Cisco, Brocade, and Supermicro
- Scripting experience in Python or Ansible
- Familiarity with automated booting in a Linux environment

The ideal Data Center Site Reliability Engineer will also have:
- Education in Computer Science, Information Systems, or Computer Hardware Engineering
- Excellent interpersonal and teamwork skills
- Strong written and verbal communication skills
- A detail-oriented, well-organized, self-starting approach
- Openness to constructive feedback
- Strong problem-solving skills, particularly related to server hardware
- The ability to take ownership of hardware and software issues and see them through to resolution
- Proven experience as an SRE, DevOps Engineer, or in a similar role

Posted 1 month ago

5.0 - 8.0 years

25 - 30 Lacs

Noida, Gurugram, Mumbai (All Areas)

Work from Office

We are looking for a DevOps Engineer for our Noida/Gurgaon/Mumbai locations (hybrid).

Responsibilities:
- Automation of deployment pipelines.
- Support for Engineering and QA environments.
- Setup of monitoring and observability.

Requirements:
- Tools: Jenkins, GitLab, Docker, Terraform, Ansible.
- Linux scripting and system management.
- Experience in DevOps/SRE roles.
- Experience in large-scale migrations preferred.

Posted 1 month ago

Featured Companies