651 SRE Jobs - Page 16

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

6.0 - 10.0 years

25 - 30 Lacs

Pune

Work from Office

Azure Cloud Migration Expert: An Azure Cloud Migration Expert is responsible for planning, designing, and executing the migration of applications and infrastructure hosted on-premises or with other public/private cloud providers to the Azure cloud. They ensure seamless transitions, optimization, and integrity, and adhere to the Azure Well-Architected Framework during and after the migration process.

Key Responsibilities:
Assessment and Planning: Evaluate existing systems (on-premises, AWS, GCP, etc.) and associated enabling capabilities (identity, security, HA/DR, monitoring, backup/restore, reporting, integrations, etc.). Design and develop comprehensive migration strategies and plans. Evaluate, recommend, and implement the 7 Rs of cloud migration: rehost, replatform, refactor, repurchase, retire, retain, and relocate.
Migration Execution: Manage and execute the migration process using tools like Azure Migrate, ensuring minimal downtime and data integrity.
Cloud Infrastructure Management: Configure, optimize, and monitor Azure resources, including but not limited to virtual machines, AKS, storage, networking, and other services.
Technical Expertise: Provide technical guidance to project teams, troubleshoot issues, and ensure compliance with cloud security best practices.
Technical Leadership: Develop, train, and build internal teams with Azure skills, and build a practice/Center of Excellence.
Post-Migration Support: Provide documentation, training, and ongoing support to internal teams and clients.
Optimization and Cost Efficiency: Continuously monitor and optimize cloud infrastructure performance and cost efficiency.
Collaboration: Work with cross-functional teams (developers, IT, security, compliance) to ensure seamless integration and alignment.

Required Skills:
Azure expertise: Proficiency in Azure services, architecture, and best practices.
AWS/Public Cloud awareness: Good working understanding of AWS or other public cloud providers.
Cloud Architecture and Design: Good understanding of architecting cloud solutions: cloud-native design and microservices frameworks.
Cloud Native Skills: In-depth knowledge of and experience with technologies like Docker, Kubernetes, and Packer.
Cloud migration tools: Experience with Azure Migrate, Site Recovery, and other relevant tools.
Networking and security: Strong understanding of cloud networking, security protocols, and compliance.
Scripting and automation: Proficiency in scripting languages (PowerShell, Python) for automating tasks and infrastructure management. Experience with Azure Automation and Azure DevOps.
Problem-solving and analytical skills: Ability to diagnose issues, develop solutions, and analyze data.
Communication and collaboration: Excellent communication skills for interacting with stakeholders and cross-functional teams.

Experience: Minimum 2-3 years of experience in cloud migration projects with Azure, or 5-7 years of overall experience. Experience with cloud architecture and services, Azure migration, and automation and DevOps tools. Experience in security and compliance, observability, monitoring, SIEM, SOAR, and SRE.

Posted 1 month ago

10.0 - 15.0 years

40 - 50 Lacs

Pune, Bengaluru, Mumbai (All Areas)

Work from Office

Mandatory Skills:
1. Linux Administration and Linux Networking
2. Bare Metal
3. Ansible
4. Docker
5. Python/Shell Scripting
6. Git
7. Production Monitoring
8. Availability for US fixed shifts (8.30 pm to 5.30 am IST)

Role & Responsibilities: Provide technical leadership and mentorship to the Site Reliability Engineering team. Design, implement, and maintain highly scalable and reliable infrastructure. Collaborate with development, operations, and security teams to define and implement best practices for system reliability, performance, and security. Drive automation and continuous improvement initiatives to streamline operations and enhance the overall reliability of the platform. Lead incident response and troubleshooting efforts for complex technical issues, ensuring timely resolution and minimizing impact on customers. Conduct capacity planning, performance optimization, and scalability assessments to ensure the platform can meet growing demands. Establish and maintain monitoring, alerting, and logging systems to proactively identify and address issues. Define and implement disaster recovery and business continuity strategies to minimize downtime and data loss. Collaborate with development teams to ensure proper release management practices and support seamless deployments. Stay up to date with industry trends, emerging technologies, and best practices in site reliability engineering and cloud infrastructure. Drive the adoption of DevOps and SRE principles throughout the organization, promoting collaboration and cross-functional alignment. Lead post-incident and root cause analysis to identify underlying issues and drive improvements that prevent future incidents. Foster a culture of innovation, continuous learning, and knowledge sharing within the Site Reliability Engineering team.

Preferred Candidate Profile: Bachelor's degree in Computer Science, Information Technology, or a related field. Proven experience as a Lead Site Reliability Engineer or in a similar leadership role, preferably in a cloud-based SaaS environment. Strong knowledge of cloud infrastructure platforms such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP). Experience with containerization technologies like Docker and container orchestration frameworks like Kubernetes. Proficiency in infrastructure as code tools such as Terraform or CloudFormation. Experience with monitoring and logging tools like Prometheus, Grafana, the ELK stack, or Splunk. Solid understanding of networking concepts and experience with load balancers, DNS, and security groups. Strong scripting and automation skills (e.g., Python, shell scripting) to streamline operational tasks. Familiarity with Agile and DevOps practices and experience working in CI/CD environments. Strong problem-solving, critical thinking, and analytical skills with a focus on continuous improvement. Excellent leadership, communication, and collaboration abilities to effectively lead a team and work with cross-functional stakeholders.

Posted 1 month ago

10.0 - 15.0 years

12 - 17 Lacs

Noida

Work from Office

Company Overview: With 80,000 customers across 150 countries, UKG is the largest U.S.-based private software company in the world. And we're only getting started. Ready to bring your bold ideas and collaborative mindset to an organization that still has so much more to build and achieve? Read on. Here, we know that you're more than your work. That's why our benefits help you thrive personally and professionally, from wellness programs and tuition reimbursement to U Choose, a customizable expense reimbursement program that can be used for more than 200 needs that best suit you and your family, from student loan repayment to childcare to pet insurance. Our inclusive culture, active and engaged employee resource groups, and caring leaders value every voice and support you in doing the best work of your career. If you're passionate about our purpose (people), then we can't wait to support whatever gives you purpose. We're united by purpose, inspired by you.

Site Reliability Engineers at UKG are team members who have a breadth of knowledge encompassing all aspects of service delivery. They develop software solutions to enhance, harden, and support our service delivery processes. This can include building and managing CI/CD deployment pipelines, automated testing, capacity planning, performance analysis, monitoring, alerting, chaos engineering, and auto remediation. Site Reliability Engineers must have a passion for learning and evolving with current technology trends. They strive to innovate and are relentless in their pursuit of a flawless customer experience. They have an "automate everything" mindset, helping us bring value to our customers by deploying services with incredible speed, consistency, and availability.

Primary/Essential Duties and Key Responsibilities: Engage in and improve the lifecycle of services from conception to EOL, including system design consulting and capacity planning. Define and implement standards and best practices related to system architecture, service delivery, metrics, and the automation of operational tasks. Support services, product & engineering teams by providing common tooling and frameworks to deliver increased availability and improved incident response. Improve system performance, application delivery, and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis. Collaborate closely with engineering professionals within the organization to deliver reliable services. Identify and eliminate operational toil by treating operational challenges as a software engineering problem. Actively participate in incident response, including on-call responsibilities. Partner with stakeholders to influence and help drive the best possible technical and business outcomes. Guide junior team members and serve as a champion for Site Reliability Engineering.

Engineering degree or a related technical discipline, and 10+ years of experience in SRE. Experience coding in higher-level languages (e.g., Python, JavaScript, C++, or Java). Knowledge of cloud-based applications and containerization technologies. Demonstrated understanding of best practices in metric generation and collection, log aggregation pipelines, time-series databases, and distributed tracing. Ability to analyze the technology and engineering practices currently used within the company and develop steps and processes to improve and expand upon them. Working experience with industry standards like Terraform and Ansible.

Experience, Education, Certification, License and Training: Must have hands-on experience working within Engineering or Cloud. Experience with public cloud platforms (e.g., GCP, AWS, Azure). Experience in configuration and maintenance of applications and systems infrastructure. Experience with distributed system design and architecture. Experience building and managing CI/CD pipelines.

Where we're going: UKG is on the cusp of something truly special. Worldwide, we already hold the #1 market share position for workforce management and the #2 position for human capital management. Tens of millions of frontline workers start and end their days with our software, with billions of shifts managed annually through UKG solutions today. Yet it's our AI-powered product portfolio designed to support customers of all sizes, industries, and geographies that will propel us into an even brighter tomorrow!

Disability Accommodation: UKGCareers@ukg.com

Posted 1 month ago

10.0 - 15.0 years

12 - 17 Lacs

Pune

Work from Office

Company Overview: With 80,000 customers across 150 countries, UKG is the largest U.S.-based private software company in the world. And we're only getting started. Ready to bring your bold ideas and collaborative mindset to an organization that still has so much more to build and achieve? Read on. Here, we know that you're more than your work. That's why our benefits help you thrive personally and professionally, from wellness programs and tuition reimbursement to U Choose, a customizable expense reimbursement program that can be used for more than 200 needs that best suit you and your family, from student loan repayment to childcare to pet insurance. Our inclusive culture, active and engaged employee resource groups, and caring leaders value every voice and support you in doing the best work of your career. If you're passionate about our purpose (people), then we can't wait to support whatever gives you purpose. We're united by purpose, inspired by you.

Site Reliability Engineers at UKG are team members who have a breadth of knowledge encompassing all aspects of service delivery. They develop software solutions to enhance, harden, and support our service delivery processes. This can include building and managing CI/CD deployment pipelines, automated testing, capacity planning, performance analysis, monitoring, alerting, chaos engineering, and auto remediation. Site Reliability Engineers must have a passion for learning and evolving with current technology trends. They strive to innovate and are relentless in their pursuit of a flawless customer experience. They have an "automate everything" mindset, helping us bring value to our customers by deploying services with incredible speed, consistency, and availability.

Primary/Essential Duties and Key Responsibilities: Proficient in Splunk/ELK and Datadog. Experience with observability tools such as Prometheus/InfluxDB and Grafana. Strong knowledge of at least one scripting language, such as Python, Bash, PowerShell, or another relevant language. Design, develop, and maintain observability tools and infrastructure. Collaborate with other teams to ensure observability best practices are followed. Develop and maintain dashboards and alerts for monitoring system health. Troubleshoot and resolve issues related to observability tools and infrastructure. Engage in and improve the lifecycle of services from conception to EOL, including system design consulting and capacity planning. Define and implement standards and best practices related to system architecture, service delivery, metrics, and the automation of operational tasks. Support services, product & engineering teams by providing common tooling and frameworks to deliver increased availability and improved incident response. Improve system performance, application delivery, and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis. Collaborate closely with engineering professionals within the organization to deliver reliable services. Identify and eliminate operational toil by treating operational challenges as a software engineering problem. Actively participate in incident response, including on-call responsibilities. Partner with stakeholders to influence and help drive the best possible technical and business outcomes. Guide junior team members and serve as a champion for Site Reliability Engineering.

Engineering degree or a related technical discipline, and 10+ years of experience in SRE. Experience coding in higher-level languages (e.g., Python, JavaScript, C++, or Java). Knowledge of cloud-based applications and containerization technologies. Demonstrated understanding of best practices in metric generation and collection, log aggregation pipelines, time-series databases, and distributed tracing. Ability to analyze the technology and engineering practices currently used within the company and develop steps and processes to improve and expand upon them. Working experience with industry standards like Terraform and Ansible.

Experience, Education, Certification, License and Training: Must have hands-on experience working within Engineering or Cloud. Experience with public cloud platforms (e.g., GCP, AWS, Azure). Experience in configuration and maintenance of applications and systems infrastructure. Experience with distributed system design and architecture. Experience building and managing CI/CD pipelines.

Where we're going: UKG is on the cusp of something truly special. Worldwide, we already hold the #1 market share position for workforce management and the #2 position for human capital management. Tens of millions of frontline workers start and end their days with our software, with billions of shifts managed annually through UKG solutions today. Yet it's our AI-powered product portfolio designed to support customers of all sizes, industries, and geographies that will propel us into an even brighter tomorrow!

Disability Accommodation: UKGCareers@ukg.com

Posted 1 month ago

3.0 - 5.0 years

7 - 15 Lacs

Chennai, Bengaluru

Work from Office

Roles and Responsibilities: .NET Engineer to join our team in India.

This Role: We are looking for a motivated, creative, problem-solving engineer to join our team. You will also: Work in a small, dynamic team that works full stack to develop software that helps our users solve real-world problems for our clients. Work on multiple technologies and languages to build best-in-breed products. Gather requirements and scope out projects with the rest of the team.

Key Responsibilities: Work within a small development team to produce clean, efficient code based on specifications or user stories. Integrate software components between in-house and third-party applications. Deploy tested code to QA & UAT environments. Support developed applications; troubleshoot, debug, and upgrade existing code when required (SRE duties). Recommend and execute improvements. Create technical documentation for development solutions, future reference, and reporting. Collaborate with architects and other developers to develop the solution design. Estimate user stories and feed estimates back into the product backlog.

Skills Required: Commercial experience in .NET (.NET Framework and .NET Core), C#, and Angular. Strong expertise in modern Web API development and microservices. Expertise in developing software that adheres to object-oriented programming, SOLID, and design principles, and the ability to use proper design patterns for a given use case. Experience working within the full development lifecycle, i.e. development, unit testing, and release management. Resourcefulness, attention to detail, and troubleshooting aptitude. Test-driven development focus, and the ability to write unit tests and integration tests using tools such as xUnit or NUnit. Knowledge of Testcontainers and code coverage tools will be an added advantage. Relational database experience with PostgreSQL, MS SQL Server, etc. NoSQL database experience with MongoDB, DynamoDB, or Cosmos DB is desirable. Good development experience using ORM and data access libraries. Solid Azure experience and good application development experience using services such as Azure Web App, Storage Accounts, Key Vault, Front Door, Azure PostgreSQL Flexible Server, APIM, Application Insights, and App Registrations. Good expertise in SRE (Site Reliability Engineering) responsibilities (system observability metrics analysis, querying system logs, and troubleshooting) within cloud-native systems, with a solid understanding of best practices and practical implementation experience. Strong expertise in developing infrastructure as code using Terraform and CI/CD pipelines using GitHub Actions. A good understanding of DevSecOps tools and practices, such as SonarCloud (SAST), dependency scanning, IaC security, and secret management, is desirable. Good understanding of system design and of monolithic, modular monolith, and microservices software architectures. Solid understanding of authentication & authorization (OpenID Connect and OAuth 2.0) and the ability to implement them using platforms like Microsoft Entra ID, Azure AD B2C, etc. Extensive Agile experience with JIRA. JavaScript and TypeScript experience. Docker/Kubernetes experience is a bonus.

Key Performance Indicators: Delivery of high-quality, fully tested software changes within agreed sprints. Smooth transition of changes from development to production. Compliance with the organization's control policies and procedures.

Posted 1 month ago

5.0 - 8.0 years

12 - 16 Lacs

Mangaluru, Udupi

Hybrid

SRE Lead

Role Description: We are seeking an experienced SRE Strategist to lead the reliability and operational excellence agenda for our Enterprise Data Platforms spanning GCP cloud-native systems. This strategic leadership role will help instill Google's SRE principles across diverse data engineering teams, uplift our platform reliability posture, and spearhead the creation of a Centre of Excellence (CoE) for SRE. The ideal candidate will possess a deep understanding of modern SRE practices, demonstrate a proven ability to scale SRE capabilities in large enterprises, and evangelise a data-driven approach to resilience engineering.

Key Responsibilities: Define and drive SRE strategy for enterprise data platforms on GCP, aligning with business goals and reliability needs. Act as a trusted advisor to platform teams, embedding the SRE mindset, best practices, and golden signals into their SDLC and operational processes. Set up and lead a Site Reliability Engineering CoE, delivering reusable tools, runbooks, blueprints, and platform accelerators to scale SRE adoption across the organisation. Partner with product and platform owners to prioritise and structure SRE backlogs, formulate roadmaps, and help teams move from reactive ops to proactive reliability engineering. Define and track SLIs, SLOs, and error budgets across critical data services, enabling data-driven decision making around availability and performance. Drive incident response maturity, including chaos engineering, incident retrospectives, and blameless postmortems. Foster a reliability culture through coaching, workshops, and cross-functional forums. Build strategic relationships across engineering, data governance, security, and architecture teams to ensure reliability is baked in, not bolted on.

Required Qualifications: Bachelor's or Master's degree in Computer Science, Engineering, or a related discipline. 3+ years in SRE leadership or SRE strategy roles. Strong familiarity with Google SRE principles and practical experience applying them in complex enterprise settings. Proven track record in establishing and scaling SRE teams. Experience with GCP services like Cloud Build, GCS, Cloud SQL, Cloud Functions, and GCP logging & monitoring. Deep experience with observability stacks such as Prometheus, Grafana, Splunk, and GCP native solutions. Skilled in Infrastructure as Code using tools like Terraform, with working knowledge of automation in CI/CD environments.

Key Competencies & Skills: Strong leadership, influence without authority, and mentoring capabilities. Hands-on scripting and automation skills in Python, with secondary languages like Go or Java a plus. Familiarity with incident and problem management frameworks in enterprise environments. Ability to define and execute a platform-wide reliability roadmap in alignment with architectural and business objectives.

Nice to Have: Exposure to secrets management tools (e.g., HashiCorp Vault). Experience with tracing and APM tools like Google Cloud Trace or Honeycomb. Background in data governance, data pipelines, and security standards for data products.

Posted 1 month ago

3.0 - 7.0 years

2 - 5 Lacs

Hyderabad

Work from Office

i. PySpark, SparkSQL, SQL, and Glue
ii. AWS cloud experience
iii. Good understanding of dimensional modelling
iv. Good understanding of DevOps, CloudOps, DataOps, and CI/CD, with an SRE mindset
v. Understanding of Lakehouse and DW architecture
vi. Strong analysis and analytical skills
vii. Understanding of version control systems, specifically Git
viii. Strong in software engineering (APIs, microservices, etc.)

Soft skills:
i. Written and oral communication skills
ii. Ability to translate business needs into systems

Posted 1 month ago

2.0 - 7.0 years

11 - 15 Lacs

Pune

Work from Office

Job Title: L2 Lead Technical Application Support, Associate
Location: Pune, India

Role Description: Our organization within Deutsche Bank is Compliance Production Services. We are responsible for providing technical L2 application support for business applications. The Compliance line of business has a current portfolio of 20 applications. The organization is in the process of transforming itself using Google Cloud and many new technology offerings. As an L2 Lead Technical Application Support, you will provide technical hands-on oversight to several support teams and be actively involved in resolving technical issues across multiple applications. You will also work as application lead and will be responsible for technical and operational processes for all applications you support.

What we'll offer you: 100% reimbursement under childcare assistance benefit (gender neutral). Sponsorship for industry-relevant certifications and education. Accident and term life insurance.

Your key responsibilities: Act as application lead, owning the technical, process, operational, and people responsibilities for all applications supported. Provide technical hands-on oversight to several support teams and be actively involved in technical issues across multiple applications. Build up technical subject matter expertise on the applications being supported, including business flows, application architecture, and hardware configuration. Maintain documentation, knowledge articles, and runbooks. Assist in the approval process for application code release change tickets, as well as tasks assigned to support to perform. Build and maintain effective and productive relationships with stakeholders in business, development, infrastructure, and third-party systems / data providers & vendors. Assist in special projects and view them as opportunities to enhance your skill set and develop your growth. These projects can include coding in shell scripting, Python, and YAML for support functions.

Your skills and experience: Minimum 2 years of experience providing hands-on IT support and interacting with applications and end users. Engineering degree/post-graduation from an accredited college or university with a concentration in Computer Science or an IT-related discipline. Knowledgeable in cloud products like Google Cloud Platform (GCP) and hybrid applications. Strong understanding of ITIL/SRE/DevOps best practices for supporting a production environment. Working knowledge of Elasticsearch, WebLogic, Tomcat, OpenShift, Grafana, Prometheus, and Google Cloud Monitoring. Understanding of Java (J2SE), Spring, Hibernate, and microservices. Professional Red Hat Enterprise Linux (RHEL) skills in searching logs, process commands, starting/stopping processes, and using OS commands to aid in tasks needed to resolve or investigate issues. Shell scripting knowledge is a plus. Understanding of database concepts and exposure to working with Oracle and SQL databases.

Skills That Will Help You Excel: Strong written and oral communication skills, including the ability to communicate technical information to a non-technical audience, and good analytical and problem-solving skills. Able to train, coach, and mentor, and to know where each technique is best applied. Confident working with several programming languages, tools, and technologies, including Infrastructure as Code, with the ability to guide colleagues as to the context where each is useful (preferably Python and Terraform). Experience with GCP or another public cloud provider to build applications. Experience in an investment bank, financial institution, or large corporation using enterprise hardware and software.

How we'll support you . . . .

About us and our teams: Please visit our company website for further information: https://www.db.com/company/company.htm
We strive for a culture in which we are empowered to excel together every day. This includes acting responsibly, thinking commercially, taking initiative, and working collaboratively. Together we share and celebrate the successes of our people. Together we are Deutsche Bank Group. We welcome applications from all people and promote a positive, fair, and inclusive work environment.

Posted 1 month ago

10.0 - 16.0 years

35 - 40 Lacs

Pune

Work from Office

Job Title: Lead Engineer, VP
Location: Pune, India

Role Description: Our Technology, Data and Innovation (TDI) strategy focuses on strengthening engineering expertise, introducing an Agile delivery model, and modernizing the Bank's IT infrastructure. You will be responsible for end-to-end delivery of engineering solutions to accomplish business goals. You will provide engineering thought leadership across teams, mentor and coach junior engineers, and encourage continuous improvement in delivery practices. You will enjoy partnering closely with your clients while working within a broader creative, collaborative, and innovative team, with a strong desire to make an impact. You will be joining the Treasury, Markets & Investments team, working on critical applications that support Treasury.

What we'll offer you: 100% reimbursement under childcare assistance benefit (gender neutral). Sponsorship for industry-relevant certifications and education. Accident and term life insurance.

Your key responsibilities: Define the technical architecture of IT solutions in line with functional and non-functional requirements, following consistent design patterns and best practices. Ensure that the solution design is in sync with TDI target architecture blueprints and principles, as well as with overarching DB architecture and security standards. Create appropriate technical design documentation and ensure it is kept up to date. Provide guidance to squad members to design, build, test, and deliver high-quality software solutions in line with business requirements. Be responsible for all aspects of the solution architecture (i.e. maintainability, scalability, effective integration with other solutions, usage of shared solutions and components where possible, optimization of resource consumption, etc.) with the objective of striking the appropriate balance between business needs and total cost of ownership. Collaborate closely with enterprise architecture to ensure architecture compliance, and make sure that any design options are discussed in a timely manner to allow sufficient time for deliberate decision-making. Present architecture proposals, along with the enterprise architect, to relevant forums at different levels, and drive the process to gain the necessary architecture approvals. Collaborate with relevant technology stakeholders within other squads and across tribes to ensure cross-squad and cross-tribe solution architecture synchronization and alignment. Contribute to the definition and enrichment of appropriate design patterns and standards that can be leveraged across squads/tribes. Serve as counsel to designers and developers, and carry out reviews of software designs and of high-level and detailed-level design documentation provided by other squad members. Lead technical discussions with CISO, Group Architecture, end-to-end, and control functions for technical queries. Contribute to peer-level solution architecture reviews.

Your skills and experience: 15+ years of experience in the IT industry in the finance domain. Deep knowledge of Java, the JVM, and object orientation principles. Deep knowledge of and hands-on experience with Google Cloud, OpenShift, Docker, and Kubernetes, as well as exposure to a range of modern build tools such as Maven, Jenkins, etc. Deep knowledge of SQL and of relational and NoSQL databases. Significant development experience working within an agile environment and using modern engineering practices. Using continuous integration and continuous delivery to ensure that changes can be made quickly and safely. Hands-on practical experience in architecture design and engaging with the organization to build consensus. Strong knowledge of DevOps and SRE best practices and CI/CD tooling such as GitHub and Terraform. Extensive knowledge of financial services environments, preferably in the Treasury domain. Architecture and design approaches that support rapid, incremental, and iterative delivery, such as SOA or microservices. Knowledge of modern JavaScript frameworks (e.g. React, Angular), HTML5, Bootstrap, and Node.js, and of REST principles and associated technologies. Experience performing functional analysis is highly desirable.

How You'll Lead: Leading and collaborating across teams. Skilled in building productive networks to drive collaboration, re-use, and knowledge sharing. Effectively communicate complex messages in a clear and concise manner. Help create a culture of learning and continuous improvement within your team and beyond. Share skills and knowledge in a wide range of topics relating to software delivery. Lead, mentor, and teach other engineers.

How we'll support you . . . .

About us and our teams: Please visit our company website for further information: https://www.db.com/company/company.htm
We strive for a culture in which we are empowered to excel together every day. This includes acting responsibly, thinking commercially, taking initiative, and working collaboratively. Together we share and celebrate the successes of our people. Together we are Deutsche Bank Group. We welcome applications from all people and promote a positive, fair, and inclusive work environment.

Posted 1 month ago

11.0 - 19.0 years

35 - 40 Lacs

Pune

Work from Office

Job Title: ITAO - Directory Services, VP. Location: Pune, India. Role Description: You will be working in the Chief Security Office Identity & Authentication Services, with a focus on Directory Service-related products for Deutsche Bank globally. Directory Services owns the workforce Identity Providers, which consist of Active Directory, Entra Identity and Google Cloud Identity. Although the existing team has coding skills and is deploying a CI/CD pipeline, our journey to SRE is far from complete; you will bring your experience in this field to help shape our team's SRE future. Security and stability are the core focus of the solutions we provide. What we'll offer you: 100% reimbursement under the childcare assistance benefit (gender neutral); sponsorship for industry-relevant certifications and education; accident and term life insurance. Your key responsibilities: Analyse, automate, deploy and maintain Entra Identity-related components within our E5 estate. Investigate Private Previews and interact with the MS product group to shape its final releases so that they cover DB's business requirements. Perform SRE function(s): availability, latency, performance, efficiency, change management, monitoring, emergency response and capacity planning of Entra service(s). CI/CD pipelines, workflows, Actions and enhancements (for example, an application integration pipeline). Identify and manage the risks and issues associated with the IDP(s) and escalate appropriately. Document associated solutions. Help upskill the existing team in coding/SRE. Support internal audits and investigations associated with the IDPs. Design and own solutions from inception to release, including associated documentation. 3rd-line support for operational incidents (rare). Your skills and experience: Expert knowledge of Entra Identity. Expert knowledge of SRE (Site Reliability Engineering). GitHub Workflows, Actions and CI/CD pipelines. Active Directory experience is very advantageous. 
Bachelor's degree in computer science or comparable, and more than 3 years of hands-on experience working with AAD/EID. Experience in cloud security. Advanced/expert in IaC (Infrastructure as Code). Advanced/expert in GitHub, PowerShell/Graph/YAML and Terraform/Stanzer/HCL. MS Power Platform (App-Automate-Logic) experience would be an advantage. You have experience working in dynamic, structured teams. Experience/knowledge in Audit and Regulatory Internal Compliance. Experience/knowledge in Infrastructure and Product Logistics: Disaster Recovery, Technology Road Map Compliance, migration and decommissioning, Infrastructure Configuration and Design. Knowledge of SDLC and Process Management: Incident and Problem Management, unplanned maintenance, Operational Readiness. Knowledge in Vendor Management: scheduling regular catch-up meetings with vendors. Knowledge in Data & Records Management: GDPR. CCSP, CISSP or other related industry qualifications are advantageous. Language skills: Excellent spoken and written communication skills in English; German language skills are welcome. How we'll support you

Posted 1 month ago

5.0 - 10.0 years

8 - 18 Lacs

Hyderabad, Ahmedabad

Hybrid

Job Title: Senior DevOps Site Reliability Engineer (SRE)
Location: Hyderabad & Ahmedabad
Employment Type: Full-Time
Work Model: 3 days from office
Job Overview: We are looking for dynamic, motivated individuals to deliver exceptional solutions for the production resiliency of our systems. The role combines aspects of software engineering and operations with DevOps skills to find efficient ways of managing and operating applications. The role requires a high level of responsibility and accountability to deliver technical solutions.
Summary: As a Senior SRE, you will ensure platform reliability, incident management, and performance optimization. You'll define SLIs/SLOs, contribute to robust observability practices, and drive proactive reliability engineering across services.
Experience Required: 6-10 years of SRE or infrastructure engineering experience in cloud-native environments.
Mandatory:
• Cloud: GCP (GKE, Load Balancing, VPN, IAM)
• Observability: Prometheus, Grafana, ELK, Datadog
• Containers & Orchestration: Kubernetes, Docker
• Incident Management: On-call, RCA, SLIs/SLOs
• IaC: Terraform, Helm
• Incident Tools: PagerDuty, OpsGenie
Nice to Have:
• GCP Monitoring, SkyWalking
• Service Mesh, API Gateway
• GCP Spanner, MongoDB (basic)
Scope:
• Drive operational excellence and platform resilience
• Reduce MTTR and increase service availability
• Own incident and RCA processes
Roles and Responsibilities:
• Define and measure Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and manage error budgets across services.
• Lead incident management for critical production issues; drive root cause analysis (RCA) and postmortems.
• Create and maintain runbooks and standard operating procedures for high-availability services.
• Design and implement observability frameworks using ELK, Prometheus, and Grafana; drive telemetry adoption.
• Coordinate cross-functional war-room sessions during major incidents and maintain response logs.
• Develop and improve automated system recovery, alert suppression, and escalation logic.
• Use GCP tools such as GKE, Cloud Monitoring, and Cloud Armor to improve performance and security posture.
• Collaborate with DevOps and Infrastructure teams to build highly available and scalable systems.
• Analyze performance metrics and conduct regular reliability reviews with engineering leads.
• Participate in capacity planning, failover testing, and resilience architecture reviews.

Posted 1 month ago

2.0 - 6.0 years

8 - 12 Lacs

Pune

Work from Office

Role Purpose: The purpose of this role is to provide significant technical expertise in the architecture planning and design of the concerned tower (platform, database, middleware, backup, etc.) as well as to manage its day-to-day operations. Must have: well versed in Unix shell scripting; good at building CI/CD; familiar with Jenkins, Git & Maven; troubleshooting using logs, Splunk/Dynatrace and alert configuration; good knowledge of ITSM Incident, Change and Problem Management; must be able to extract, modify and update data in Postgres/SQL DBs. Reinvent your world. We are building a modern Wipro. We are an end-to-end digital transformation partner with the boldest ambitions. To realize them, we need people inspired by reinvention. Of yourself, your career, and your skills. We want to see the constant evolution of our business and our industry. It has always been in our DNA - as the world around us changes, so do we. Join a business powered by purpose and a place that empowers you to design your own reinvention. Come to Wipro. Realize your ambitions. Applications from people with disabilities are explicitly welcome.

Posted 1 month ago

3.0 - 5.0 years

6 - 11 Lacs

Chennai

Work from Office

Role Purpose: The purpose of this role is to work with application teams and developers to facilitate better coordination among operations, development and testing functions by automating and streamlining the integration and deployment processes.
Do:
• Align and focus on continuous integration (CI) and continuous deployment (CD) of technology in applications.
• Plan and execute the DevOps pipeline that supports the application life cycle across the DevOps toolchain, from planning, coding and building through testing, staging, release, configuration and monitoring.
• Manage the IT infrastructure as per the requirements of the supported software code.
• On-board an application on the DevOps tool and configure it as per the client's needs.
• Create user access workflows and provide user access as per the defined process.
• Build and engineer the DevOps tool as per the customization suggested by the client.
• Collaborate with development staff to tackle the coding and scripting needed to connect the elements of the code required to run the software release with operating systems and production infrastructure.
• Leverage tools to automate testing and deployment in a DevOps environment.
• Provide customer support/service on the DevOps tools.
• Provide timely support to internal and external customers on multiple platforms.
• Address and resolve tickets raised on these tools within a specified TAT.
• Ensure adequate resolution with customer satisfaction.
• Follow the escalation matrix/process as soon as a resolution gets complicated or isn't resolved.
• Troubleshoot and perform root cause analysis of critical/repeatable issues.
Deliver:
No. | Performance Parameter | Measure
1 | Continuous Integration, Deployment & Monitoring | 100% error-free onboarding & implementation
2 | CSAT | Timely customer resolution as per TAT; zero escalation
Mandatory Skills: Site Reliability Engineering (SRE).
Experience: 3-5 years.

Posted 1 month ago

5.0 - 8.0 years

5 - 9 Lacs

Hyderabad

Work from Office

Role Purpose: The purpose of this role is to work with application teams and developers to facilitate better coordination among operations, development and testing functions by automating and streamlining the integration and deployment processes.
Do:
• Align and focus on continuous integration (CI) and continuous deployment (CD) of technology in applications.
• Plan and execute the DevOps pipeline that supports the application life cycle across the DevOps toolchain, from planning, coding and building through testing, staging, release, configuration and monitoring.
• Manage the IT infrastructure as per the requirements of the supported software code.
• On-board an application on the DevOps tool and configure it as per the client's needs.
• Create user access workflows and provide user access as per the defined process.
• Build and engineer the DevOps tool as per the customization suggested by the client.
• Collaborate with development staff to tackle the coding and scripting needed to connect the elements of the code required to run the software release with operating systems and production infrastructure.
• Leverage tools to automate testing and deployment in a DevOps environment.
• Provide customer support/service on the DevOps tools.
• Provide timely support to internal and external customers on multiple platforms.
• Address and resolve tickets raised on these tools within a specified TAT.
• Ensure adequate resolution with customer satisfaction.
• Follow the escalation matrix/process as soon as a resolution gets complicated or isn't resolved.
• Troubleshoot and perform root cause analysis of critical/repeatable issues.
Deliver:
No. | Performance Parameter | Measure
1 | Continuous Integration, Deployment & Monitoring | 100% error-free onboarding & implementation
2 | CSAT | Timely customer resolution as per TAT; zero escalation
Mandatory Skills: Site Reliability Engineering (SRE).
Experience: 5-8 years.

Posted 1 month ago

5.0 - 10.0 years

5 - 15 Lacs

Hyderabad, Pune, Bengaluru

Work from Office

AMPLE Enterprise Technologies is hiring an SRE Engineer with Ansible experience. Experience: 5+ years. Skills: SRE + Ansible. Location: PAN India. Notice period: immediate to short-notice joiners.

Posted 1 month ago

3.0 - 5.0 years

6 - 11 Lacs

Hyderabad

Work from Office

Role Purpose: The purpose of this role is to work with application teams and developers to facilitate better coordination among operations, development and testing functions by automating and streamlining the integration and deployment processes.
Do:
• Align and focus on continuous integration (CI) and continuous deployment (CD) of technology in applications.
• Plan and execute the DevOps pipeline that supports the application life cycle across the DevOps toolchain, from planning, coding and building through testing, staging, release, configuration and monitoring.
• Manage the IT infrastructure as per the requirements of the supported software code.
• On-board an application on the DevOps tool and configure it as per the client's needs.
• Create user access workflows and provide user access as per the defined process.
• Build and engineer the DevOps tool as per the customization suggested by the client.
• Collaborate with development staff to tackle the coding and scripting needed to connect the elements of the code required to run the software release with operating systems and production infrastructure.
• Leverage tools to automate testing and deployment in a DevOps environment.
• Provide customer support/service on the DevOps tools.
• Provide timely support to internal and external customers on multiple platforms.
• Address and resolve tickets raised on these tools within a specified TAT.
• Ensure adequate resolution with customer satisfaction.
• Follow the escalation matrix/process as soon as a resolution gets complicated or isn't resolved.
• Troubleshoot and perform root cause analysis of critical/repeatable issues.
Deliver:
No. | Performance Parameter | Measure
1 | Continuous Integration, Deployment & Monitoring | 100% error-free onboarding & implementation
2 | CSAT | Timely customer resolution as per TAT; zero escalation
Mandatory Skills: Site Reliability Engineering (SRE).
Experience: 3-5 years.

Posted 1 month ago

8.0 - 12.0 years

18 - 25 Lacs

Hyderabad, Pune, Bengaluru

Hybrid

About Job: We are hiring a talented Core GCP SRE/IAC professional to join our team. If you're excited to be part of a winning team, Zensar is a great place to grow your career. You'll be glad you made the right choice to join us. As a Core GCP SRE/IAC in the Global Architecture team, you're at the core of one of the largest services in our global organization. As part of a small group of technology experts, you will identify, recommend, and implement innovative solutions and best practices to continuously improve, expand and protect our core servers and systems supporting multiple large services and all our associates worldwide.
Essential duties and responsibilities:
Experience: 8-12 years
Notice Period: Immediate joiners only
Mode: WFO
Location: Pune
Interested and relevant candidates can share their resume at cheana.kashid@zensar.com.
Required Skills (must have; candidates should meet all of the standards below to qualify for this role):
• 8-12 years of experience in cloud (private/public deployments) engineering and support roles.
• Bachelor’s degree in Computer Science, Engineering, or a related field.
• Experience with monitoring and observability tools (e.g., Prometheus, Grafana, SolarWinds).
• IaC (Infrastructure as Code), Helm charts.
• Certification: GCP and Kubernetes.
• Strong expertise in scripting and automation using Python, Bash, or similar languages.
• Proficiency with infrastructure as code (Terraform) and container orchestration tools (Kubernetes).
• Experience in building and managing CI/CD pipelines.
• Strong knowledge of cloud platforms (e.g., AWS, Azure, Google Cloud) and Linux/Unix systems.

Posted 1 month ago

3.0 - 5.0 years

13 - 21 Lacs

Pune

Work from Office

Overview: We are seeking a DevOps Engineer II to join the Critical Start Technologies Private Ltd. team, operating under the Critical Start umbrella, for our India operations. The ideal candidate brings 3-5 years of hands-on experience in cloud-native infrastructure, CI/CD automation, and Infrastructure as Code. You bring advanced skills in AWS and Terraform, a strong understanding of scalable systems, and a mindset geared toward security, resilience, and automation-first practices. The ideal candidate has worked in complex environments with microservices, container orchestration, and multi-account AWS structures. You take pride in building robust DevOps pipelines and actively contribute to architectural and operational decisions. Experience leading small initiatives or mentoring junior engineers is a plus. Responsibilities: As a DevOps Engineer II, you will be a technical contributor and enabler for scalable infrastructure delivery and automation practices. Your role involves: Owning and improving the infrastructure codebase: maintaining reusable and modular Terraform configurations, setting standards for code structure, and contributing to design documentation. Building and evolving CI/CD pipelines: designing resilient and secure build/deploy pipelines using GitHub Actions, AWS CodePipeline, or equivalent. Monitoring and observability: developing dashboards and proactive alerting with CloudWatch, Prometheus, or New Relic to ensure high availability and quick recovery. Infrastructure security and compliance: implementing IAM best practices, Secrets Manager, least-privilege policies, and conducting periodic audits. Optimizing cloud spend and performance through rightsizing, auto-scaling, and cost monitoring strategies. Collaborating closely with development, QA, and security teams to support the full software delivery lifecycle from development through production. Participating in incident response and postmortem analysis. 
Qualifications: Required Qualifications: 3-5 years of professional experience in DevOps, SRE, or Cloud Engineering roles. Advanced Terraform experience, including custom module design, remote state management, and backend locking. Deep knowledge of AWS services (VPC, IAM, ECS/Fargate, EC2, RDS, ALB/NLB, S3, CloudWatch, Secrets Manager, etc.). Strong background in Linux systems administration, including networking and performance tuning. Proven expertise in Docker, ECS/EKS, and secure image lifecycle. Strong scripting and automation skills using Bash, Python, or Go. Experience with GitOps, infrastructure promotion strategies, and artifact management. Familiarity with log aggregation and tracing (e.g., Fluentd, OpenTelemetry, Sentry). Exposure to infrastructure testing frameworks (e.g., Terratest, InSpec). Excellent communication and cross-functional collaboration skills. Bachelor’s or Master’s degree in Computer Science or a related field. Preferred Qualifications: Additional scripting experience is a strong plus. Knowledge of security and compliance frameworks like SOC2, CIS Benchmarks, or ISO 27001 is a plus. Experience working in regulated environments or with customer-facing infrastructure. Contributions to open-source infrastructure tools or Terraform modules. Exposure to Kubernetes or hybrid cloud platforms. Experience with IaC scanning tools like Checkov, tfsec, or Bridgecrew.

Posted 1 month ago

10.0 - 15.0 years

14 - 19 Lacs

Hyderabad

Work from Office

Job Area: Engineering Group, Engineering Group > Software Engineering. Job Summary: Qualcomm is seeking a seasoned Staff Engineer, DevOps to join our central software engineering team. In this role, you will lead the design, development, and deployment of scalable cloud-native and hybrid infrastructure solutions, modernize legacy systems, and drive DevOps best practices across products. This is a hands-on architectural role ideal for someone who thrives in a fast-paced, innovation-driven environment and is passionate about building resilient, secure, and efficient platforms. Key Responsibilities: Architect and implement enterprise-grade AWS cloud solutions for Qualcomm’s software platforms. Design and implement CI/CD pipelines using Jenkins, GitHub Actions, and Terraform to enable rapid and reliable software delivery. Develop reusable Terraform modules and automation scripts to support scalable infrastructure provisioning. Drive observability initiatives using Prometheus, Grafana, Fluentd, OpenTelemetry, and Splunk to ensure system reliability and performance. Collaborate with software development teams to embed DevOps practices into the SDLC and ensure seamless deployment and operations. Provide mentorship and technical leadership to junior engineers and cross-functional teams. Manage hybrid environments, including on-prem infrastructure and Kubernetes workloads supporting both Linux and Windows. Lead incident response, root cause analysis, and continuous improvement of SLIs for mission-critical systems. Drive toil reduction and automation using scripting or programming languages such as PowerShell, Bash, Python, or Go. Independently drive and implement DevOps/cloud initiatives in collaboration with key stakeholders. Understand software development designs and compilation/deployment flows for .NET, Angular, and Java-based applications to align infrastructure and CI/CD strategies with application architecture. 
Required Qualifications: 10+ years of experience in IT or software development, with at least 5 years in cloud architecture and DevOps roles. Strong foundational knowledge of infrastructure components such as networking, servers, operating systems, DNS, Active Directory, and LDAP. Deep expertise in AWS services including EKS, RDS, MSK, CloudFront, S3, and OpenSearch. Hands-on experience with Kubernetes, Docker, containerd, and microservices orchestration. Proficiency in Infrastructure as Code using Terraform and configuration management tools like Ansible and Chef. Experience with observability tools and telemetry pipelines (Grafana, Prometheus, Fluentd, OpenTelemetry, Splunk). Experience with agent-based monitoring tools such as SCOM and Datadog. Solid scripting skills in Python, Bash, and PowerShell. Familiarity with enterprise-grade web services (IIS, Apache, Nginx) and load balancing solutions. Excellent communication and leadership skills with experience mentoring and collaborating across teams. Preferred Qualifications: Experience with API gateway solutions for API security and management. Knowledge of RDBMS, preferably MSSQL/PostgreSQL, is good to have. Proficiency in SRE principles including SLIs, SLOs, SLAs, error budgets, chaos engineering, and toil reduction. Experience in core software development (e.g., Java, .NET). Exposure to Azure cloud and hybrid cloud strategies. Bachelor’s degree in Computer Science or a related field. Minimum Qualifications: Bachelor's degree in Engineering, Information Systems, Computer Science, or related field and 4+ years of Software Engineering or related work experience. OR Master's degree in Engineering, Information Systems, Computer Science, or related field and 3+ years of Software Engineering or related work experience. OR PhD in Engineering, Information Systems, Computer Science, or related field and 2+ years of Software Engineering or related work experience. 
2+ years of work experience with Programming Language such as C, C++, Java, Python, etc.

Posted 1 month ago

6.0 - 10.0 years

12 - 18 Lacs

Hyderabad, Ahmedabad

Hybrid

Job Title: Senior DevOps Site Reliability Engineer (SRE)
Location: Hyderabad & Ahmedabad
Employment Type: Full-Time
Work Model: 3 days from office
Job Overview: We are looking for dynamic, motivated individuals to deliver exceptional solutions for the production resiliency of our systems. The role combines aspects of software engineering and operations with DevOps skills to find efficient ways of managing and operating applications. The role requires a high level of responsibility and accountability to deliver technical solutions.
Summary: As a Senior SRE, you will ensure platform reliability, incident management, and performance optimization. You'll define SLIs/SLOs, contribute to robust observability practices, and drive proactive reliability engineering across services.
Experience Required: 6-10 years of SRE or infrastructure engineering experience in cloud-native environments.
Mandatory:
• Cloud: GCP (GKE, Load Balancing, VPN, IAM)
• Observability: Prometheus, Grafana, ELK, Datadog
• Containers & Orchestration: Kubernetes, Docker
• Incident Management: On-call, RCA, SLIs/SLOs
• IaC: Terraform, Helm
• Incident Tools: PagerDuty, OpsGenie
Nice to Have:
• GCP Monitoring, SkyWalking
• Service Mesh, API Gateway
• GCP Spanner
Scope:
• Drive operational excellence and platform resilience
• Reduce MTTR and increase service availability
• Own incident and RCA processes
Roles and Responsibilities:
• Define and measure Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and manage error budgets across services.
• Lead incident management for critical production issues; drive root cause analysis (RCA) and postmortems.
• Create and maintain runbooks and standard operating procedures for high-availability services.
• Design and implement observability frameworks using ELK, Prometheus, and Grafana; drive telemetry adoption.
• Coordinate cross-functional war-room sessions during major incidents and maintain response logs.
• Develop and improve automated system recovery, alert suppression, and escalation logic.
• Use GCP tools such as GKE, Cloud Monitoring, and Cloud Armor to improve performance and security posture.
• Collaborate with DevOps and Infrastructure teams to build highly available and scalable systems.
• Analyze performance metrics and conduct regular reliability reviews with engineering leads.
• Participate in capacity planning, failover testing, and resilience architecture reviews.
Interested candidates can reach out to Anjitha.jr@acesoftlabs.com (IT Recruiter).

Posted 1 month ago

6.0 - 10.0 years

12 - 18 Lacs

Hyderabad, Ahmedabad

Hybrid

Job Title: Senior DevOps Site Reliability Engineer (SRE)
Location: Hyderabad & Ahmedabad
Employment Type: Full-Time
Work Model: 3 days from office
Job Overview: We are looking for dynamic, motivated individuals to deliver exceptional solutions for the production resiliency of our systems. The role combines aspects of software engineering and operations with DevOps skills to find efficient ways of managing and operating applications. The role requires a high level of responsibility and accountability to deliver technical solutions.
Summary: As a Senior SRE, you will ensure platform reliability, incident management, and performance optimization. You'll define SLIs/SLOs, contribute to robust observability practices, and drive proactive reliability engineering across services.
Experience Required: 6-10 years of SRE or infrastructure engineering experience in cloud-native environments.
Mandatory:
• Cloud: GCP (GKE, Load Balancing, VPN, IAM)
• Observability: Prometheus, Grafana, ELK, Datadog
• Containers & Orchestration: Kubernetes, Docker
• Incident Management: On-call, RCA, SLIs/SLOs
• IaC: Terraform, Helm
• Incident Tools: PagerDuty, OpsGenie
Nice to Have:
• GCP Monitoring, SkyWalking
• Service Mesh, API Gateway
• GCP Spanner
Scope:
• Drive operational excellence and platform resilience
• Reduce MTTR and increase service availability
• Own incident and RCA processes
Roles and Responsibilities:
• Define and measure Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and manage error budgets across services.
• Lead incident management for critical production issues; drive root cause analysis (RCA) and postmortems.
• Create and maintain runbooks and standard operating procedures for high-availability services.
• Design and implement observability frameworks using ELK, Prometheus, and Grafana; drive telemetry adoption.
• Coordinate cross-functional war-room sessions during major incidents and maintain response logs.
• Develop and improve automated system recovery, alert suppression, and escalation logic.
• Use GCP tools such as GKE, Cloud Monitoring, and Cloud Armor to improve performance and security posture.
• Collaborate with DevOps and Infrastructure teams to build highly available and scalable systems.
• Analyze performance metrics and conduct regular reliability reviews with engineering leads.
• Participate in capacity planning, failover testing, and resilience architecture reviews.
Interested candidates can reach out to nithinsai.n@acesoftlabs.com, phone: 7702051201 (IT Recruiter).

Posted 1 month ago

6.0 - 10.0 years

12 - 18 Lacs

Hyderabad, Ahmedabad

Hybrid

Job Title: Senior DevOps Site Reliability Engineer (SRE)
Location: Hyderabad & Ahmedabad
Employment Type: Full-Time
Work Model: 3 days from office
Job Overview: We are looking for dynamic, motivated individuals to deliver exceptional solutions for the production resiliency of our systems. The role combines aspects of software engineering and operations with DevOps skills to find efficient ways of managing and operating applications. The role requires a high level of responsibility and accountability to deliver technical solutions.
Summary: As a Senior SRE, you will ensure platform reliability, incident management, and performance optimization. You'll define SLIs/SLOs, contribute to robust observability practices, and drive proactive reliability engineering across services.
Experience Required: 6-10 years of SRE or infrastructure engineering experience in cloud-native environments.
Mandatory:
• Cloud: GCP (GKE, Load Balancing, VPN, IAM)
• Observability: Prometheus, Grafana, ELK, Datadog
• Containers & Orchestration: Kubernetes, Docker
• Incident Management: On-call, RCA, SLIs/SLOs
• IaC: Terraform, Helm
• Incident Tools: PagerDuty, OpsGenie
Nice to Have:
• GCP Monitoring, SkyWalking
• Service Mesh, API Gateway
• GCP Spanner
Scope:
• Drive operational excellence and platform resilience
• Reduce MTTR and increase service availability
• Own incident and RCA processes
Roles and Responsibilities:
• Define and measure Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and manage error budgets across services.
• Lead incident management for critical production issues; drive root cause analysis (RCA) and postmortems.
• Create and maintain runbooks and standard operating procedures for high-availability services.
• Design and implement observability frameworks using ELK, Prometheus, and Grafana; drive telemetry adoption.
• Coordinate cross-functional war-room sessions during major incidents and maintain response logs.
• Develop and improve automated system recovery, alert suppression, and escalation logic.
• Use GCP tools such as GKE, Cloud Monitoring, and Cloud Armor to improve performance and security posture.
• Collaborate with DevOps and Infrastructure teams to build highly available and scalable systems.
• Analyze performance metrics and conduct regular reliability reviews with engineering leads.
• Participate in capacity planning, failover testing, and resilience architecture reviews.
Interested candidates can reach out to akram.m@acesoftlabs.com, phone: 6387195529 (IT Recruiter).

Posted 1 month ago

5.0 - 10.0 years

9 - 13 Lacs

Bengaluru

Work from Office

Project Role: Service Management Lead
Project Role Description: Lead the delivery of programs, projects or managed services. Coordinate projects through contract management and shared service coordination. Develop and maintain relationships with key stakeholders and sponsors to ensure high levels of commitment and enable the strategic agenda.
Must have skills: Site Reliability Engineering
Good to have skills: NA
Minimum 5 year(s) of experience is required.
Educational Qualification: 15 years full time education
SRE Automation Engineer, Level 9
Experience: Minimum 12 years on Infrastructure & Automation as an SRE for a multi-cloud environment.
Expectation: The candidate should have technical exposure to public cloud (AWS & Azure), DevOps, microservices and coding, and be an architect with extensive technical expertise to work on customers' complex business problems. The candidate should have extensive exposure to cloud infrastructure with IaC, Python, PowerShell, Ansible, GitHub, Jenkins, Terraform, JSON, Puppet, etc., and 12+ years of experience in infrastructure, including design, build and deployment of data center services and cloud. Well familiar with coding, design, build and deployment with CI/CD pipelines. Should have delivered at least two end-to-end projects (design, implementation and support) as an SRE. Should be well familiar with identifying opportunities, including technical debt, reducing waste and coding techniques, especially in IaC. Knowledge of Splunk is an added advantage. Working knowledge of any observability tool and other enterprise monitoring tools is an added plus.
Certifications: One in data center technologies and cloud.
Desired exposure: Hands-on exposure to public cloud, especially infrastructure. Should be well versed with monitoring, observability and other enterprise management tools. Extensive exposure to coding using Python, PowerShell, Ansible, Jenkins, Terraform, etc. 
Hands-on experience as an SRE for at least 5 years.
Must Have Skills: Exposure to automation, especially IaC, building pipelines in public cloud, deployments, ARM and other templates. Strong coding knowledge: Python, PowerShell, Ansible, Jenkins, Terraform, Git, JSON, Puppet, etc. Strong SRE knowledge and exposure, especially in identifying toil, tech debt, reducing waste, etc. Must have exposure to Splunk and other enterprise management tools. Exposure to observability tools and frameworks. IaaS/PaaS products: support for containers and the cloud-native stack. Lateral and logical troubleshooting as a cloud admin. Complete understanding of cloud network topology. Docker design/build/deployment: at least 2 years of technical exposure. CI/CD exposure with full end-to-end DevOps life cycle experience.
Responsibilities: Work as an SRE with a strong coding background to automate toil. Troubleshooting, health checks, administration, management, vendor coordination, interaction with external partners, and escalation to stakeholders for support or to application teams for application development related issues (bugs, code maintenance, code evolution). Capacity monitoring; monitoring; application availability management and monitoring, reporting and maintenance activities (if documented). Work on reducing repeated failures; generate reports and dashboards. Performance review: performance management, tuning, fixing issues, scripts, automation. Deploy agents. Monitor Docker environments and maintain Docker images. Support compliance requirements.
Professional Skills:
- Must To Have Skills: Proficiency in Site Reliability Engineering.
- Excellent communication and relationship-building skills.
- Ability to lead and motivate teams to achieve project goals.
Additional Information:
- The candidate should have a minimum of 12 years of experience in Site Reliability Engineering.
- This position is based at our Bengaluru office.
- A 15 years full-time education is required.
Summary: As a Service Management Lead, you will lead the delivery of programs, projects, or managed services, coordinate projects through contract management and shared service coordination, and develop and maintain relationships with key stakeholders and sponsors to ensure high levels of commitment and enable the strategic agenda. Your day will involve overseeing project delivery, managing contracts, and fostering stakeholder relationships.
Roles & Responsibilities:
- Expected to be an SME
- Collaborate with and manage the team to perform
- Responsible for team decisions
- Engage with multiple teams and contribute to key decisions
- Provide solutions to problems for their immediate team and across multiple teams
- Lead project delivery and ensure successful outcomes
- Manage contract negotiations and agreements
- Develop and maintain relationships with key stakeholders
Professional & Technical Skills:
- Must To Have Skills: Proficiency in Site Reliability Engineering
- Strong understanding of IT service management principles
- Experience in project management methodologies
- Knowledge of contract management and negotiation
- Good To Have Skills: Experience with cloud technologies
Additional Information:
- The candidate should have a minimum of 5 years of experience in Site Reliability Engineering
- This position is based at our Bengaluru office
- A 15 years full-time education is required
Qualification: 15 years full time education

Posted 1 month ago

4.0 - 9.0 years

15 - 30 Lacs

Chennai

Hybrid

ACV Auctions is looking for an experienced Site Reliability Engineer III with a systems and software engineering background to focus on site reliability. We believe in taking a software engineer's approach to operations by providing standards and software tools to all engineering projects. As a Site Reliability Engineer, you will split your time between developing software that improves overall reliability and providing operational support for production systems.
What you will do:
- Maintain reliability and performance for your particular infrastructure area while working with software engineers to improve service quality and health.
- Develop, design, and review new software tools in Python and Java to improve infrastructure reliability and provide services with better monitoring, automation, and product delivery.
- Practice efficient incident response through on-call rotations alongside software engineers, and document incidents through postmortems.
- Support service development with capacity plans, launch/deployment plans, scalable system design, and monitoring plans.
What you will need:
- BS degree in Computer Science or a related technical discipline, or equivalent practical experience.
- Experience building/managing infrastructure deployments on Google Cloud Platform.
- 3+ years managing cloud infrastructure.
- Experience programming in at least one of the following: Python or Java.
- Experience in Linux/Unix systems administration, configuration management, monitoring, and troubleshooting.
- Comfort with production systems, including load balancing, distributed systems, microservice architecture, service meshes, and continuous delivery.
- Experience building and delivering software tools for monitoring, management, and automation that support production systems.
- Comfortable working with teams across multiple time zones and working flexible hours as needed.
Preferred Qualifications:
- Experience maintaining and scaling Kubernetes clusters for production workloads is a plus.

Posted 1 month ago

7.0 - 12.0 years

15 - 27 Lacs

Indore, Pune

Work from Office

Greetings of the day! We have a job opening for a DevOps Lead (SRE) for one of our clients. If your profile matches the requirements, please share your updated resume.
Lead DevOps Engineer
Note: Only immediate joiners. We are looking for candidates with hands-on experience rather than those doing purely lead work. Candidates should have 7-9 years of hands-on experience with the technologies mentioned in the JD (specifically Google Cloud and GitHub Actions). Candidates should be flexible with working hours; as this is an American project, the client may ask you to work until 10-11 PM (IST).
Detailed JD: Senior DevOps Engineer
Location: Indore, Pune (work from office)
Job Summary: We are seeking an experienced and enthusiastic Senior DevOps Engineer with 7+ years of dedicated experience to join our growing team. In this pivotal role, you will be instrumental in designing, implementing, and maintaining our continuous integration and continuous delivery (CI/CD) pipelines and infrastructure automation. You will champion DevOps best practices, optimize our cloud-native environments, and ensure the reliability, scalability, and security of our systems. This role demands deep technical expertise, an initiative-taking mindset, and a strong commitment to operational excellence.
Key Responsibilities:
- CI/CD Pipeline Management: Design, build, and maintain robust and automated CI/CD pipelines using GitHub Actions to ensure efficient and reliable software delivery from code commit to production deployment.
- Infrastructure Automation: Develop and manage infrastructure as code (IaC) using Shell scripting and the GCloud CLI to provision, configure, and manage resources within Google Cloud Platform (GCP).
- Deployment Orchestration: Implement and optimize deployment strategies, leveraging GitHub for version control of deployment scripts and configurations, ensuring repeatable and consistent releases.
- Containerization & Orchestration: Work extensively with Docker for containerizing applications, including building, optimizing, and managing Docker images.
- Artifact Management: Administer and optimize artifact repositories, specifically Artifactory in GCP, to manage dependencies and build artifacts efficiently.
- System Reliability & Performance: Monitor, troubleshoot, and optimize the performance, scalability, and reliability of our cloud infrastructure and applications.
- Collaboration & Documentation: Work closely with development, QA, and operations teams. Utilize Jira for task tracking and Confluence for comprehensive documentation of systems, processes, and best practices.
- Security & Compliance: Implement and enforce security best practices within the CI/CD pipelines and cloud infrastructure, ensuring compliance with relevant standards.
- Mentorship & Leadership: Provide technical guidance and mentorship to junior engineers, fostering a culture of learning and continuous improvement within the team.
- Incident Response: Participate in on-call rotations, respond rapidly to production incidents, perform root cause analysis, and implement preventive measures.
Required Skills & Experience (Mandatory, 7+ Years):
- Proven experience (7+ years) in a DevOps, Site Reliability Engineering (SRE), or similar role.
- Expert-level proficiency with Git and GitHub, including advanced branching strategies, pull requests, and code reviews.
- Experience designing and implementing CI/CD pipelines using GitHub Actions.
- Deep expertise in Google Cloud Platform (GCP), including compute, networking, storage, and identity services.
- Advanced proficiency in Shell scripting for automation, system administration, and deployment tasks.
- Strong firsthand experience with Docker for containerization, image optimization, and container lifecycle management.
- Solid understanding of and practical experience with Artifactory (or similar artifact management tools) in a cloud environment.
- Expertise in using the GCloud CLI for automating GCP resource management and deployments.
- Demonstrable experience with Continuous Integration (CI) principles and practices.
- Proficiency with Jira for agile project management and Confluence for knowledge sharing.
- Strong understanding of networking concepts, security best practices, and system monitoring.
- Excellent critical thinking skills and an initiative-taking approach to identifying and resolving issues.
Nice-to-Have Skills:
- Experience with Kubernetes (GKE) for container orchestration.
- Familiarity with other Infrastructure as Code (IaC) tools like Terraform.
- Experience with monitoring and logging tools such as Prometheus, Grafana, or GCP's Cloud Monitoring/Logging.
- Proficiency in other scripting or programming languages (e.g., Python, Go) for automation and tool development.
- Experience with database management in a cloud environment (e.g., Cloud SQL, Firestore).
- Knowledge of DevSecOps principles and tools for integrating security into the CI/CD pipeline.
- GCP Professional Cloud DevOps Engineer or other relevant GCP certifications.
- Experience with large-scale distributed systems and microservices architectures.

Posted 1 month ago

Featured Companies