651 SRE Jobs - Page 14

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

14.0 - 18.0 years

55 - 60 Lacs

Pune

Hybrid

The Principal, IT Resiliency Architect will be responsible for evaluating and designing complex IT infrastructure solutions across a vast range of technologies within the IT Resiliency Team. The IT Resiliency Team is dedicated to ensuring the resilience and continuity of our operations in the face of unforeseen disasters. The architect must have strong program management skills to work with cross-functional infrastructure and application owners to assess, design, and implement technical solutions for the Disaster Recovery (DR) program. The architect should be able to troubleshoot, identify root cause, and seek a formal solution with other teams as required. This hands-on role requires experience with data center, networking, security, storage, virtualization, database, and middleware technologies.
Responsibilities: As an expert, the IT Resiliency Architect will be on a team that delivers end-to-end technical resiliency solutions to the organization, utilizing the latest technologies and leveraging automation mechanisms to reduce recovery times. The Resiliency Architect should have a solid understanding of program management and the leadership skills to engage various teams. Responsibilities for this role include:
Infrastructure Architect: Independently implements technical solutions and delivers projects end-to-end, working autonomously while providing updates to stakeholders. Performs analysis of complex functional and business requirements. Prepares and delivers solutions for others. Leads design activities, working with architects from various technical teams. Evaluates, develops, and oversees resiliency strategy across development and engineering teams. Advances the DR automation effort of applications and business capabilities. Collaborates closely with teams participating in strategic planning, facilitating cross-team solution reviews, and communicating strategy direction.
Maintains an effective technical network across technical SMEs and architects for multiple service areas. Disaster Recovery Framework Documentation: Maintain accurate documentation of disaster recovery plans, procedures, and incident response protocols. Incident Response: Lead disaster recovery efforts in the event of a disruption, coordinating the response and recovery activities. Training and Awareness: Develop and provide training to employees on disaster recovery and business continuity procedures to enhance preparedness. Continuous Improvement: Stay updated on industry best practices, emerging technologies, and evolving threats to continually improve disaster recovery capabilities.

Posted 1 month ago


4.0 - 9.0 years

10 - 19 Lacs

Bengaluru

Hybrid

Dear candidate, Greetings of the day from Innova Solutions. We have an opening for a Site Reliability Engineer (SRE) at our Bangalore location (hybrid).
Number of openings: 20
Profile: Site Reliability Engineering (SRE)
Experience: 3-10 years
Location: Bangalore (Hybrid)
Budget: Open (case to case)
Interview mode: Face to face (Saturday)
Interview location: Bellandur
Timing: 9 - 1 PM
JD: Site Reliability Engineer + Artifactory OR GitLab + Python + Terraform
We need local candidates (Bangalore based). If you are interested, please share your updated CV at reena.gupta@innovasolutions.com. Thanks

Posted 1 month ago


5.0 - 8.0 years

2 - 6 Lacs

Pune

Work from Office

Design, implement, and maintain scalable and reliable compute infrastructure, with a focus on Wintel, Linux, VMware, and Red Hat KVM environments. Collaborate with development teams to ensure applications are designed for reliability and performance across different operating systems and virtualization platforms. Automate repetitive tasks to improve efficiency and reduce manual intervention, specifically within Wintel and Linux systems. Monitor system performance, identify bottlenecks, and implement solutions to improve overall system reliability in VMware and Red Hat KVM environments. Develop and maintain tools for deployment, monitoring, and operations tailored to Wintel, Linux, VMware, and Red Hat KVM. Troubleshoot and resolve issues in development, test, and production environments, focusing on compute-related challenges. Participate in on-call rotations and respond to incidents promptly, ensuring high availability of compute resources. Implement best practices for security, compliance, and data protection within Wintel, Linux, VMware, and Red Hat KVM systems. Document processes, procedures, and system configurations specific to the compute infrastructure.
Primary Skills:
Site Reliability Engineer (SRE), Compute Infrastructure
Wintel Administration, Linux Administration, VMware Administration, Red Hat
Proficiency in scripting languages: Python, Java, C/C++, Bash
Infrastructure tools: Terraform, Ansible
Experience with monitoring and logging tools: Prometheus, Grafana, ELK stack
Solid understanding of networking, security, and system administration within Wintel and Linux environments
Experience with CI/CD pipelines and tools: Jenkins, GitLab CI
Knowledge of database management systems: MySQL, PostgreSQL

Posted 1 month ago


5.0 - 8.0 years

3 - 7 Lacs

Bengaluru

Work from Office

Primary Skills:
Site Reliability Engineer (SRE), Compute Infrastructure
Wintel Administration, Linux Administration, VMware Administration, Red Hat
Proficiency in scripting languages: Python, Java, C/C++, Bash
Infrastructure tools: Terraform, Ansible
Experience with monitoring and logging tools: Prometheus, Grafana, ELK stack
Solid understanding of networking, security, and system administration within Wintel and Linux environments
Experience with CI/CD pipelines and tools: Jenkins, GitLab CI
Knowledge of database management systems: MySQL, PostgreSQL
Secondary Skills:
Proven experience as an SRE or in a similar role, with a focus on compute infrastructure, particularly Wintel, Linux, VMware, and Red Hat KVM.
Proficiency in scripting languages (e.g., Python, Java, C/C++, Bash) and infrastructure-as-code tools (e.g., Terraform, Ansible).
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
Solid understanding of networking, security, and system administration within Wintel and Linux environments.
Excellent problem-solving skills and attention to detail.
Strong communication and collaboration skills.
Experience with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI).
Knowledge of database management systems (e.g., MySQL, PostgreSQL).
Familiarity with microservices architecture and related technologies.
Ability to work in a 24x7 on-call, after-hours rotation environment.
Experience with distributed storage technologies such as NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN).
Proactive approach to identifying complex problems, performance bottlenecks, and areas for improvement.
Advocate for DevOps/SRE best practices, leading postmortems/RCAs, incident retrospectives, and operational readiness reviews.
Relevant experience with the following products is a plus: ServiceNow, BigFix, Tenable, CrowdStrike, Splunk, and SQL Server.

Posted 1 month ago


5.0 - 9.0 years

6 - 9 Lacs

Mumbai, New Delhi, Bengaluru

Work from Office

Experience : 5 + years Expected Notice Period : 30 Days Shift : (GMT+05:30) Asia/Kolkata (IST) Opportunity Type : Remote,New Delhi,Bengaluru,Mumbai We are seeking a seasoned DevOps Architect / Senior Engineer with deep expertise in AWS, EKS, Terraform, Infrastructure as Code, and MongoDB Atlas to lead the design, implementation, and management of our cloud-native infrastructure. This is a hands-on leadership role focused on ensuring the scalability, reliability, security, and efficiency of our production-grade systems. Key Responsibilities : Cloud Infrastructure Design & Management (AWS) Architect, build, and manage secure, scalable AWS infrastructure (VPC, EC2, S3, IAM, Security Groups). Implement secure cloud networking and ensure high availability. Monitor, optimize, and troubleshoot AWS environments. Container Orchestration (AWS EKS) Deploy and manage production-ready EKS clusters, including workload deployments, scaling (manual and via Karpenter), monitoring, and security. Maintain CI/CD pipelines for Kubernetes applications. Infrastructure as Code (IaC) Lead development of Terraform-based IaC modules (clean, reusable, and secure). Manage Terraform state and promote best practices (modularization, code reviews). Extend IaC to multi-cloud (Azure, GCP) and leverage CloudFormation or Bicep when needed. Programming, Automation & APIs Develop automation scripts using Python, Bash, or PowerShell. Design, secure, and manage APIs (AWS API Gateway, optionally Azure API Management). Integrate systems/services via APIs and event-driven architecture. Troubleshoot and resolve infrastructure or deployment issues. Database Management Administer MongoDB Atlas: setup, configuration, performance tuning, backup, and security. Implement best practices for high availability and resilience. DevOps Leadership & Strategy Define and promote DevOps best practices across the organization. Automate and streamline development-to-deployment workflows. 
Mentor junior engineers and foster a culture of technical excellence. Stay ahead of emerging DevOps and Cloud trends. Mandatory Skills : Cloud Administration (AWS) VPC design (subnets, route tables, NAT/IGW, peering). IAM (users, roles, policies with least-privilege enforcement). Deep AWS service knowledge and administrative experience. Container Orchestration (AWS EKS) EKS production-grade cluster setup and upgrades. Workload autoscaling using Karpenter. Logging/Monitoring via Prometheus, Grafana, CloudWatch. Secure EKS practices: RBAC, PSP/PSA, admission controllers, secret management. CI/CD & Kubernetes Experience with Jenkins, GitLab CI, ArgoCD, Flux. Microservices deployment and Kubernetes cluster federation knowledge. Infrastructure as Code Expert in Terraform (HCL, modules, backends, security). Familiarity with CloudFormation, Bicep for cross-cloud support. Git-based version control and CI/CD integration. Automated infrastructure provisioning. Programming & API Proficient in Python, Bash, PowerShell. Secure API design, development, and management. Database Management Proven MongoDB Atlas administration: scaling, backups, alerts, and performance monitoring. Good to Have Skills : Infrastructure & OS Server & Virtualization Management (Linux/Windows). OS Security Hardening & Automation. Disaster Recovery planning and implementation. Docker containerization. Networking & Security Advanced networking (DNS, BGP, routing). Software Defined Networking (SDN), hybrid networking. Zero Trust Architecture. Load balancer (ALB/ELB/NLB) security and WAF management. Compliance: ISO 27001, SOC 2, PCI-DSS. Secrets management (Vault, AWS Secrets Manager). Observability & Automation OpenTelemetry, LangTrace for observability. AI-powered automation (e.g., CrewAI). SIEM/Security monitoring. Cloud Governance Cost optimization strategies. AWS Well-Architected Framework familiarity. Incident response, governance, and compliance management. 
Qualifications & Experience Bachelor's degree in Computer Science, Engineering, or equivalent practical experience. 5+ years in DevOps / SRE / Cloud Engineering with AWS focus. 5+ years hands-on experience with EKS and Terraform. Proven experience with cloud-native architecture and automation. AWS Certifications (DevOps Engineer Pro, Solutions Architect Pro) preferred. Agile/Scrum experience a plus.

Posted 1 month ago


0.0 - 3.0 years

1 - 4 Lacs

Hyderabad, Chennai, Bengaluru

Work from Office

Job title: Monitoring and Observability Engineer
Location: Chennai, Hyderabad, Bangalore
Experience: 0-3
Job Summary: The Monitoring and Observability Engineer designs and implements monitoring solutions to ensure system reliability, performance, and availability. This role focuses on building observability into systems through metrics, logs, and traces.
Key Responsibilities: Design and deploy monitoring and observability tools across infrastructure and applications. Develop dashboards and alerts to track system health and performance. Implement distributed tracing and log aggregation solutions. Collaborate with development and operations teams to define observability requirements. Analyze monitoring data to identify trends, anomalies, and potential issues.
Required Skills: Experience with monitoring tools (Prometheus, Grafana, Datadog, New Relic). Knowledge of log management systems (ELK Stack, Splunk, Fluentd). Familiarity with distributed tracing tools (Jaeger, OpenTelemetry). Strong understanding of SRE principles and SLIs/SLOs. Proficiency in scripting and automation for monitoring tasks.
Qualifications: Bachelor's degree in Computer Science, Engineering, or a related field. 3+ years of experience in monitoring, observability, or SRE roles. Certifications in observability platforms or cloud monitoring tools are beneficial.

Posted 1 month ago


9.0 - 14.0 years

50 - 75 Lacs

Bengaluru

Remote

- DevOps, Cloud Infrastructure, or SRE
- AWS services (EC2, S3, RDS, Lambda, VPC, IAM, Route 53, CloudFront, etc.)
- IaC (Terraform / CloudFormation / CDK)
- Python, Bash, or Go
- Kubernetes, Docker, and microservices architecture

Posted 1 month ago


5.0 - 8.0 years

15 - 20 Lacs

Bengaluru

Work from Office

Job Description: DevOps and Site Reliability Engineer
We are seeking a highly skilled and motivated DevOps and Site Reliability Engineer to join our team. As a DevOps/SRE, you will play a crucial role in ensuring the reliability, scalability, and performance of our systems, as well as in troubleshooting, cloud infrastructure management, and developing tools and applications. You will be responsible for incident management, release management, automation, infrastructure monitoring, POCs, writing new tools, and collaborating with cross-functional teams.
Requirements: 6 to 8+ years of Windows and Linux systems administration. 2+ years provisioning, operating, and managing AWS environments. Proficient with AWS services: Compute and Network, Storage and CDN, Database, Analytics, Application Services, Deployment, and Management. Experience with multi-tier architectures: load balancers, caching, web servers, application servers, databases, and networking. Familiarity with interacting with AWS APIs. AWS Disaster Recovery design and deployment across regions is a plus. Experience in automation and testing via scripting and programming (TFS, PowerShell, Jenkins, Python, Ruby, Java). Self-starter excited to relentlessly solve many technical challenges. Clear written and verbal communication. Able to manage your own time and work well both independently and as part of a team.
Required Skills: Bachelor's degree in Computer Science, Engineering, or a related field. Proven experience as a DevOps/Site Reliability Engineer or in a similar role, with a focus on high-availability production environments. Strong understanding of cloud computing platforms, such as Amazon Web Services (AWS). Experience with containerization technologies like Docker and orchestration frameworks like Kubernetes. Proficiency in scripting or programming languages, such as Python, Bash, Golang and Angular. Solid understanding of Linux/Unix systems administration and troubleshooting.
Familiarity with monitoring and observability tools like Prometheus, Grafana, Elasticsearch, or Splunk. Strong analytical and problem-solving skills, with the ability to diagnose and resolve complex technical issues. Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams. Knowledge of DevOps principles and practices, including CI/CD pipelines and version control systems (e.g., Git).
Must Have: Certified on at least one: AWS DevOps Engineering, AWS Certified Developer - Associate.
Preferred Skills: Familiarity with configuration management tools like Stash or SaltStack. Understanding of networking concepts and protocols. Knowledge of security best practices and experience with securing infrastructure and applications. Certification in relevant technologies is a plus.
Roles and Responsibilities:
Software, tools and automation: Identify opportunities for automation and drive the development of tools and frameworks to improve system resiliency, efficiency, and performance. Collaborate with the development and operations teams to implement automation solutions. Implement systems that are highly available, scalable, and self-healing on the AWS platform. Design, manage, and maintain tools to automate operational processes on AWS. Build tools and processes to support the infrastructure. Automate security controls, governance processes, and compliance validation on AWS.
Collaboration: Work closely with developers to implement continuous delivery systems and methodologies on AWS. Informally train and share AWS knowledge within the team. Provide training and documentation to enhance team knowledge and capabilities.
CI/CD: Implement CI/CD pipelines for automated software integration and deployment.
Infrastructure as code: Utilize Infrastructure as Code (IaC) for managing AWS resources programmatically. Terraform and CloudFormation are a plus.
Process and compliance: Ensure security and compliance standards are met within the AWS environment. Collaborate with cross-functional teams to streamline processes and optimize resources. Cost optimization: Optimize AWS resource usage for cost-effectiveness and performance. Incident Management: Respond to incidents and troubleshoot issues in AWS infrastructure and applications. Act as a key resource in incident management, responding promptly and effectively to incidents to minimize impact. Lead incident resolution efforts, working closely with stakeholders and subject matter experts. Release Management: Manage the planning, coordination, and execution of releases across multiple environments. Ensure smooth release processes, including risk assessment, communication, and rollback strategies. Should be ready to work in a 24*7 shift environment. Infrastructure Monitoring: Set up monitoring and logging solutions for application performance and security. Establish and maintain comprehensive monitoring systems to ensure high availability and performance of applications and services. Proactively identify potential issues and bottlenecks, and work towards their resolution. Define and deploy monitoring, metrics, and logging systems on AWS. Collaboration: Work closely with cross-functional teams, including development, operations, and support, to understand requirements, address issues, and drive continuous improvement. Foster a collaborative and proactive culture within the organization. Incident Post-Mortems: Conduct post-incident analysis and root cause investigations. Identify opportunities for process improvements and work with stakeholders to implement preventive measures. Documentation: Maintain accurate documentation of system configurations, processes, and procedures. Contribute to the knowledge base and provide training and support to team members.

Posted 1 month ago


8.0 - 13.0 years

15 - 30 Lacs

Pune, Maharashtra, India

On-site

We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to join our team. This role involves ensuring the reliability, scalability, and efficiency of cloud infrastructure and applications while implementing SRE best practices for deployment, monitoring, and automation. As a senior member, you will lead efforts in system reliability, mentor junior engineers, and drive improvements in infrastructure automation. Key Responsibilities: Design, build, and maintain scalable and reliable cloud infrastructure. Ensure System Reliability: Maintain uptime, scalability, and performance across production environments. Monitor & Alerting Setup: Configure real-time monitoring and observability dashboards. Automate Everything: Reduce toil by scripting repetitive tasks, CI/CD, and self-healing mechanisms. Incident Response & RCA: Own on-call rotations, resolve P1/P2 incidents, and create blameless postmortems. Optimize Costs & Performance: Work on cloud cost optimization (FinOps), database tuning, and caching strategies. Security & Compliance: Implement least privilege access, encryption, and vulnerability assessments. Infrastructure as Code (IaC): Deploy and manage infra with Terraform, Ansible, Helm. Capacity Planning & Scaling: Ensure load balancing, horizontal scaling, and traffic routing. Process Documentation: Maintain detailed SOPs, incident response guides, and architecture diagrams. Lead the implementation of CI/CD pipelines for application deployments. Manage and optimize Kubernetes clusters and containerized workloads. Collaborate with development and operations teams to ensure smooth deployment of applications. Troubleshoot and resolve incidents, ensuring minimal downtime for production services. Mentor and provide guidance to junior engineers, fostering a culture of reliability and automation. Required Skills & Qualifications: 7+ years of experience in Site Reliability Engineering (SRE), DevOps, or cloud infrastructure roles. 
Hands-on experience with cloud platforms (Azure). Strong experience with CI/CD tools (GitHub Actions, Jenkins, or Azure Pipelines). Proficiency in Python, Bash, or PowerShell for automation. Extensive experience with Infrastructure as Code (Terraform). Expertise in monitoring tools such as Datadog. Strong understanding of networking, security, and containerization (Docker, Kubernetes). Proven track record in leading and mentoring teams.

Posted 1 month ago


5.0 - 10.0 years

10 - 20 Lacs

Hyderabad, Gurugram, Coimbatore

Work from Office

Role: DevOps Engineer Experience : 5+ years Location : Gurgaon, Hyderabad & Coimbatore Key Responsibilities: Ensure zero-downtime deployments across production. Implement custom Helm deployment and rollback strategies. Refactor Terraform modules for simplicity and efficiency. Enforce secure CI/CD practices with tools like GitHub Actions. Migrate secret management to GCP Secret Manager and Kubernetes Secrets. Standardize drift detection and config audits. Lead GKE workload IAM scoping using workload identity. Maintain infrastructure documentation, SOPs, and disaster recovery playbooks. Mentor team members and contribute to DevOps metrics and postmortems. Requirements: 3+ years in DevOps, SRE, or Infrastructure Engineering. Strong experience with Terraform and reusable modules. Hands-on with Kubernetes (GKE preferred). Familiarity with GitHub Actions, Helm, and CI/CD workflows. Knowledge of GCP services like CloudSQL, VPC, IAM. Experience with observability tools, especially Datadog. Strong attention to deployment quality and operational details. Desirable Experience: Exposure to GitOps (ArgoCD/FluxCD). Experience with Kubernetes operators. Understanding of SLIs, SLOs, and structured alerting. Tools & Expectations: Terraform / HCP Terraform Infrastructure as code, state management, and drift detection. GitHub / GitLab / GitHub Actions Secure CI/CD pipeline setup and governance. Helm Application deployment and lifecycle management. Kubernetes / GKE Cluster and workload management. GCP Services – VPC, IAM, CloudSQL integration. Secret Management – Kubernetes Secrets, CSI Driver, GCP Secret Manager. Datadog – Observability and alerting. Cloudflare – DNS, WAF, and exposure configuration. Snyk / SonarQube / Wiz – Code and container security in CI/CD. Interested candidates can share their resume at Neesha1@damcogroup.com

Posted 1 month ago


2.0 - 7.0 years

9 - 13 Lacs

Noida

Work from Office

Must Have (technical skills):
Experience with SRE practices (2 yrs) (SLA/SLO/SLI/Error Budget)
BA with SNOW and Domain
Experience with SNOW & Ansible Tower (1 yr)
JIRA & Confluence (5 yrs)
Must Have (soft skills):
Experience with requirement gathering (5+ yrs)
Experience with non-functional requirement gathering for systems (1 yr)
Experience with reporting requirement gathering
Managing stakeholder expectations (5+ yrs)
Nice To Have skills:
Understanding of AWS & Azure native log monitoring tools
Understanding of the challenges of integrating cloud and on-prem log traces
Mandatory Competencies:
BA - BA
BA - Domain knowledge
Beh - Communication and collaboration
DevOps - Ansible
BA - Business Knowledge
BA - Client Interaction
BA - Excel, macros, pivots
BA - Presentations and Reports

Posted 1 month ago


4.0 - 9.0 years

9 - 19 Lacs

Bengaluru

Hybrid

Dear candidate, We are looking for an SRE (Site Reliability Engineer) for our Bangalore location.
Requirement 1: SRE (Artifactory)
* GitLab setup & administration
* Implement best practices to improve pipeline performance
* AWS with Terraform coding
* Linux administration & troubleshooting
* Strong coding skills in any language (preferably Python)
* Familiar with container technologies (Docker / Kubernetes)
* Good knowledge of infrastructure and application monitoring (Prometheus / Grafana / CloudWatch)
Requirement 2: SRE (GitLab)
* JFrog Artifactory setup & administration
* JFrog Xray setup & administration
* AWS with Terraform coding
* Linux administration & troubleshooting
* Strong coding skills in any language (preferably Python)
* Familiar with container technologies (Docker / Kubernetes)
* Good knowledge of infrastructure and application monitoring (Prometheus / Grafana / CloudWatch)
Location: Bangalore (Whitefield)
Work mode: Hybrid
Interview mode: Face to face (Saturday, 5th July 2025)
If interested, please share your CV at ruchika.gahlawat@innovasolutions.com.

Posted 1 month ago


0.0 years

12 - 16 Lacs

Pune

Work from Office

Job Title: Site Reliability Engineer
Corporate Title: Associate
Location: Pune, India
Role Description: We are looking for a candidate to join a multi-functional SRE team. You should have cloud engineering experience and act as the SME on operations automation and monitoring, identifying toil within the team's existing systems and processes, and recommending and implementing automated solutions to reduce toil and improve the efficiency and effectiveness of the team.
What we'll offer you: 100% reimbursement under childcare assistance benefit (gender neutral). Sponsorship for industry-relevant certifications and education. Accident and term life insurance.
Your key responsibilities: Working as part of an Agile team to define the target-state infrastructure architecture of applications from a reliability standpoint. Develop, improve, and maintain internal operations tools, such as deployment, monitoring, statistics, and platform management tools. Automate and optimize the application build and deployment process, and perform deployments to testing and production environments. CI/CD pipeline setup and management. Approach support with a proactive attitude, a desire to find root causes through in-depth analysis, and a drive to reduce inefficiencies and manual effort.
Your skills and experience
Must have: Good knowledge of GCP. Hands-on experience defining and creating CUJs, SLOs, SLIs, and error budgets based on NFRs.
Strong knowledge of IaC (Terraform), GitHub, and Docker images. Strong hands-on scripting skills in Bash, PowerShell, Python, and Ansible. Good knowledge of container platforms like Kubernetes. Design and implementation of automated workflows. Experience reducing toil in an SDLC or IT operations environment. Good understanding of SCM tools: Git, GitHub, SonarQube. Fair understanding of ITSM processes. Proactive and analytical mindset.
Nice to have: Fair understanding of build and release tools like Maven, Ant, Gradle, Puppet, Jenkins, TeamCity, uDeploy. Knowledge of microservices. Any programming language, such as Java or C#. Understanding of CI/CD pipelines. Understanding of the architecture and implementation of three-tier web applications.
How we'll support you . . . .
About us and our teams: Please visit our company website for further information: https://www.db.com/company/company.htm We strive for a culture in which we are empowered to excel together every day. This includes acting responsibly, thinking commercially, taking initiative and working collaboratively. Together we share and celebrate the successes of our people. Together we are Deutsche Bank Group. We welcome applications from all people and promote a positive, fair and inclusive work environment.

Posted 1 month ago


4.0 - 9.0 years

8 - 18 Lacs

Bengaluru

Work from Office

* JFrog Artifactory setup & administration
* JFrog Xray setup & administration
* AWS with Terraform coding
* Linux administration & troubleshooting
* Strong coding skills in any language (preferably Python)
* Familiar with container technologies (Docker / Kubernetes)
* Good knowledge of infrastructure and application monitoring (Prometheus / Grafana / CloudWatch)
GitOps principles, Python development, Shell

Posted 1 month ago


7.0 - 12.0 years

20 - 30 Lacs

Bengaluru

Hybrid

Bachelor of Science degree in Computer Science, Engineering, or a related field. 10+ years of experience in cloud operations, cloud networking, cloud security, SRE, or software development with a strong focus on automation and cloud infrastructure management. Azure certifications (e.g., Microsoft Certified: Azure DevOps Engineer Expert, Azure Solutions Architect). Experience in agile development environments and working with Cloud Governance, Monitoring, and SRE (Site Reliability Engineering) practices. Strong understanding of cloud-native architecture and containerization (AKS, ACR, Helm). Proficient in scripting and automation using PowerShell, Bash, and Azure CLI. Experience with Azure Policy, Blueprints, and management groups for governance. Knowledge of Azure landing zone architecture and enterprise-scale design principles. Familiar with service mesh, API management, and event-driven architecture. Expertise in configuring and optimizing Azure Monitor, Log Analytics, and Application Insights for proactive alerting and observability. Experience integrating logs and metrics with a centralized platform. Strong knowledge of Azure security controls: NSGs, ASGs, DDoS protection, Sentinel, Defender for Cloud. Experience implementing RBAC, Azure AD Conditional Access, and Just-In-Time VM Access. Familiar with data encryption strategies (encryption at rest/in transit, Key Vault, customer-managed keys). Proven experience leading cross-functional teams and mentoring engineers. Strong stakeholder management and ability to translate technical needs to business value. Skilled in fostering a culture of automation, observability, and operational excellence. Critical thinking and problem-solving skills. Experience in mentoring team members. Result-driven and proactive. Good time-management skills. Great interpersonal and communication skills. Azure Database Administrator Associate certification is a plus.

Posted 1 month ago


2.0 - 4.0 years

14 - 18 Lacs

Pune

Hybrid

Job Description

Posted 1 month ago


5.0 - 10.0 years

12 - 22 Lacs

Hyderabad

Work from Office

Good communication skills: the person should be able to talk in large forums. This is very important.
Application production support:
- Hands-on experience in Splunk
- Hands-on experience in building CI/CD pipelines
- Candidate should have experience in technical production support (functional support is not a good fit)
Cloud knowledge. Good to have: scripting skills, container and Kubernetes knowledge.
Technical Skills:
Technology: Java / .NET Framework, C# basics, Splunk, cloud (preferably PCF).
Experience in production deployment using CI/CD pipelines.
Splunk query skills: ability to write effective Splunk queries for data analysis and monitoring.
Linux administration: experience with Linux server troubleshooting of common failures, health checks, and administration tasks.
Real-time troubleshooting of critical application workflows, incorporating feedback into product development.
Should have good knowledge of Splunk, and be able to write Splunk queries and create alerts.
Good knowledge of the ITSM framework.
Should be good at analysis and able to find solutions without much help.
Triage alerts, diagnose and resolve critical issues, and manage implementation of changes.
Perform root cause analysis of critical incidents/alerts.
Initiate and drive the Techlines in case of outages, major incidents, or batch abends, and ensure service restoration in the least time possible.
Act quickly on application alerts and batch job failures.
Identify manual toil and repetitive issues, and work with stakeholders on an improvement plan.
Should have basic experience in .NET Framework and C# to fix bugs.
Able to write basic PowerShell commands.
Knowledge of one or more message brokers such as RabbitMQ or IBM MQ.
Knowledge of JIRA, Confluence, and Remedy ticketing systems.

Posted 1 month ago

Apply

5.0 - 10.0 years

15 - 20 Lacs

Pune

Hybrid

Team: SRE & Operations. Duration: 12 months. Shift: General shift, 9:00 AM - 5:00 PM. Location: Pune. Interviews: 2 rounds. YOE: 5-7 years (4 relevant). Notes: Preferred immediate joiner or 15 days' notice period. Top skills: Splunk (queries, dashboards, and application creation); Grafana dashboards, Prometheus, data visualization; OpenTelemetry, Grafana, Prometheus. Dynatrace and Datadog tools are good to have. Some infra knowledge of servers, storage, and web application infrastructure.

Posted 1 month ago

Apply

5.0 - 8.0 years

10 - 19 Lacs

Hyderabad, Bengaluru

Work from Office

We are looking for a highly experienced Node.js Backend Developer specialized in building Gen AI-based tools for automating SRE Incident Management. The ideal candidate will have a profound understanding of SRE tools and the capabilities required to automate and integrate these tools effectively. Key Responsibilities: Design, develop, and maintain scalable, robust backend solutions using Node.js, NestJS, and Azure cloud services. Collaborate with cross-functional teams to create AI-driven tools that automate SRE Incident Management processes. Integrate and automate SRE tools such as ServiceNow, AppDynamics, ScienceLogic, and App Insights. Develop and extend APIs and event-driven applications to support automation initiatives. Utilize knowledge of MongoDB and/or SQL databases to ensure seamless data integration and storage management. Stay updated with the latest trends in AI and machine learning, leveraging large language models (LLMs) and prompt engineering where applicable. Document system designs, architectures, and procedures to ensure quality delivery. Troubleshoot and resolve backend issues in a timely manner. Requirements: 6-10 years of professional experience in Node.js backend development. Proven expertise in building APIs and event-based applications using NestJS. Experience with Azure cloud services, including deployment and management. Proficiency in integrating and automating with SRE tools such as ServiceNow, AppDynamics, ScienceLogic, and App Insights. Strong understanding of MongoDB and/or SQL databases. Familiarity with Git, CI/CD pipelines, and agile development methodologies. Experience or knowledge in large language models (LLMs) and prompt engineering is a plus. Strong analytical and problem-solving skills. Excellent communication and teamwork abilities.

Posted 1 month ago

Apply

3.0 - 5.0 years

5 - 7 Lacs

Kolkata, Mumbai, New Delhi

Work from Office

About the role: We're making big foundational cloud infrastructure changes to make the experience faster, more reliable, and more scalable for our customers' workloads. This role will be responsible for helping to build, maintain, and operate our new dynamic cloud infrastructure that powers all Firebolt services. About the day to day: Design and implement systematic improvements to Firebolt cloud infrastructure and Engine provisioning services to make them fast, reliable, scalable, and cost-efficient. Collaborate with development teams across the company to improve service reliability, scalability, and developer productivity. Together with an engineering team, you will share an on-call rotation and be an escalation contact for service and cloud infrastructure incidents.

Posted 1 month ago

Apply

10.0 - 12.0 years

6 - 9 Lacs

Chennai

Work from Office

We are looking for a skilled AWS PaaS DevOps Engineer with 10 to 12 years of relevant experience to support, monitor, and optimize the performance of applications related to PCC and MCC websites. The ideal candidate will have a strong background in SRE principles, cloud deployment using AWS, and expertise in DevOps engineering design, implementation, and ongoing support for multiple highly complex IT infrastructure environments. Roles and Responsibilities: Support, monitor, and optimize application performance. Develop architecture and design documents for cloud-based systems. Troubleshoot and administer monitoring tools on UNIX and Windows hosts. Own technical aspects of monitoring tools, including setup, upgrade management, and vulnerability remediation. Implement interfaces to other system management solutions. Manage the lifecycle of applications. Job Requirements: Good interpersonal and communication skills. Experience in developing architecture and design documents. Familiarity with both Windows and Linux OS environments. ITIL/ITSM exposure is beneficial. Experience with SQL relational databases. Understanding of protocols such as HTTP, SNMP, SMTP, and SAML. Bachelor's or Master's degree from a reputed institution.

Posted 1 month ago

Apply

18.0 - 24.0 years

12 - 16 Lacs

Noida

Work from Office

We are looking for a skilled SRE/DevOps professional with 18 to 24 years of experience to join our team in Noida, Pune, and Bangalore. The ideal candidate will have a strong background in technical leadership and driving continuous improvement of reliability, stability, and performance of digital platforms. Roles and Responsibilities: Provide technical and people leadership to SRE, DevOps, Monitoring, and Database Operations teams. Collaborate with leadership on budgeting, planning, hiring, and managing third-party contracts. Oversee project status, assemble project teams, and define assignments with schedules and milestones. Drive continuous improvement of reliability, stability, and performance of digital platforms. Implement automated telemetry, observability, and applied intelligence systems. Lead efforts to develop automated alerting, self-healing mechanisms, and intelligent response systems. Ensure 24/7 uptime of sites and services, with minimal unplanned downtime. Serve as Escalation Manager/Critical Incident Manager during major incidents, leading teams in rapid service restoration. Provide on-call escalation support based on 24/7/365 schedules and communicate timely updates and incident reports to senior leadership. Partner with administrators, platform engineers, and other stakeholders to achieve highly reliable infrastructure, systems, and integrations. Collaborate with product, application development, QA, and technology teams to enhance service reliability and performance. Provide advanced Incident and Problem Management support to effectively diagnose, remediate, and resolve platform issues. Automate critical workflows across the platform to minimize manual errors and reduce human intervention. Implement ITIL processes such as Incident, Problem, and Change Management. Design and implement effective monitoring systems with proper alerting and escalation mechanisms for critical events.
Ensure timely capacity planning and infrastructure upgrades for optimal reliability. Develop and refine processes to minimize Mean Time to Recover (MTTR) and extend Mean Time to Failure (MTTF). Create and maintain detailed documentation, including runbooks, incident response guides, post-mortem reports, RCAs, and mitigation plans. Ensure all changes adhere to established procedures and documentation standards. Understand business workflows and map technology solutions to address problems effectively. Lead conversations and provide technical support to both internal and external customers. Job Requirements: Minimum 18 years of experience in a related field. Strong understanding of technical leadership, automation, and scalability. Experience with monitoring tools and techniques. Excellent communication and collaboration skills. Ability to work in a fast-paced environment and lead cross-functional teams. Strong problem-solving skills and attention to detail. A graduate degree is required.

Posted 1 month ago

Apply

1.0 - 4.0 years

9 - 13 Lacs

Bengaluru

Work from Office

Job Title: SRE. Job Description: We're Concentrix, the intelligent transformation partner. Solution-focused. Tech-powered. Intelligence-fueled. The global technology and services leader that powers the world's best brands, today and into the future. We're solution-focused, tech-powered, intelligence-fueled. With unique data and insights, deep industry expertise, and advanced technology solutions, we're the intelligent transformation partner that powers a world that works, helping companies become refreshingly simple to work, interact, and transact with. We shape new game-changing careers in over 70 countries, attracting the best talent. The Concentrix Catalyst team is the driving force behind Concentrix's transformation, data, and technology services. We integrate world-class digital engineering, creativity, and a deep understanding of human behavior to find and unlock value through tech-powered and intelligence-fueled experiences. We combine human-centered design, powerful data, and strong tech to accelerate transformation at scale. You will be surrounded by the best in the world, providing market-leading technology and insights to modernize and simplify the customer experience. Within our professional services team, you will deliver strategic consulting, design, advisory services, market research, and contact center analytics that deliver insights to improve outcomes and value for our clients, hence achieving our vision. Our game-changers around the world have devoted their careers to ensuring every relationship is exceptional. And we're proud to be recognized with awards such as "World's Best Workplaces," "Best Companies for Career Growth," and "Best Company Culture," year after year. Join us and be part of this journey towards greater opportunities and brighter futures.

6 years of strong SRE experience along with knowledge of the core Azure services, IoT/Event Hub, and Databricks. Must have 3 years of experience with Kubernetes and Docker. Implement and manage monitoring (ELK), alerting, and logging systems to ensure proactive identification and resolution of issues. Engage in and contribute to system monitoring, incident management, performance tuning, and fault finding. Must have scripting experience in Python, PowerShell, or another scripting language. Must communicate effectively, with excellent logic and problem-solving skills and a drive to make a difference. Good to have: experience with AI/ML Ops, release management, and CI/CD using tools such as GitHub, Black Duck Hub, Coverity, and container signing, with a good understanding of software configuration management. Ability to understand and communicate customer issues. Experience in developing and supporting enterprise applications. Good written and verbal communication skills with the ability to document and communicate technical information to IT professionals. Location: IND Bangalore 55, Divyasree Towers, Bannerghatta Main Road. Time Type: Full time. If you are a California resident, by submitting your information, you acknowledge that you have read and have access to the Job Applicant Privacy Notice for California Residents.

Posted 1 month ago

Apply

6.0 - 11.0 years

6 - 11 Lacs

Hyderabad, Telangana, India

On-site

Job Summary. Role: Site Reliability Engineer & Azure. Experience: 6 to 10 years. Skills: SRE with Azure and OCP/OpenShift cloud platform. Location: Bangalore/Hyderabad. Role: Software Development - Other. Industry Type: IT Services & Consulting. Department: Engineering - Software & QA. Employment Type: Full Time, Permanent. Role Category: Software Development.

Posted 1 month ago

Apply

5.0 - 7.0 years

5 - 12 Lacs

Mumbai Suburban, Navi Mumbai, Mumbai (All Areas)

Hybrid

Hi all, we have an urgent opening for the role of SRE DevOps engineer with one of our leading investment banking clients in Mumbai. Experience: 5 to 7 years. The position is open for the Mumbai location only, as the development team is based in Mumbai. The core skills we are looking for: an SRE profile with basic DevOps knowledge, Python, Kubernetes (K8s), and databases. The candidate should also know support and monitoring tools such as Splunk, Geneos, and Dynatrace. An understanding of support processes and good verbal and non-verbal communication are also needed. Good to have: Java, AI and ML, and risk domain knowledge (market, counterparty, and credit). If interested, please share your resume with ashwini.shetty@kiya.ai

Posted 1 month ago

Apply

Featured Companies