648 SRE Jobs - Page 11

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

8.0 - 12.0 years

0 Lacs

haryana

On-site

You will be responsible for leading the technology team through crucial infrastructure and operational transformation initiatives. As a strategic thought leader, you will oversee infrastructure and outsourced managed services providers. Your role will involve establishing and executing strategies related to Cloud, DevSecOps, and SRE across the Infrastructure and Applications domain. You will contribute to the design and execution of a multi-cloud infrastructure, including scalable compute, storage, and network systems. This infrastructure will support the organization's transition to cloud and API architecture. An essential aspect of your responsibilities will involve modernizing security measures, with a focus on Cloud, API, authentication, authorization, and DLP technologies. Additionally, you will lead the workplace transformation to adapt to new ways of working and ensure exceptional experiences for end-users regardless of their work location. You will drive the transformation of Application operations towards a DevSecOps model and provide the foundational infrastructure for software-defined cloud-native development. Your role will also entail fostering infrastructure innovation, leveraging the latest trends to create business value in line with the company's requirements. With a background in cloud Infrastructure Engineering/DevOps, you will play a key technical leadership role in defining enterprise standards, technology architecture, and DevOps and Application Security within the CICD Pipeline. You will establish Site Reliability Engineering practices to ensure zero downtime, high scalability, and resilient infrastructure and applications. Implementing practices such as Golden Signals, Blameless Postmortems, RCAs, and Operational Excellence will be crucial to minimizing incidents and driving operational efficiency. 
Furthermore, you will be responsible for operationalizing and scaling critical initiatives with execution excellence, providing leadership in infrastructure management, network communications, application development, and service management. Collaboration with Service Providers and technology vendors will be essential to ensure SLAs are met and deliver best-in-class end-user experiences. Lastly, you will partner with the Digital business transformation leader to architect an agile landscape strategy, aligning technological initiatives with the organization's overall digital transformation goals.

Posted 2 weeks ago

8.0 - 12.0 years

0 Lacs

hyderabad, telangana

On-site

About McDonald's: McDonald's Corporation, one of the world's largest employers with locations in more than 100 countries, is offering corporate opportunities in Hyderabad. The global offices of McDonald's are dynamic innovation and operations hubs, aimed at expanding the global talent base and in-house expertise of the company. The newly established office in Hyderabad will bring together knowledge across business, technology, analytics, and AI, accelerating the ability of McDonald's to deliver impactful solutions for the business and customers worldwide. Position Overview: McDonald's is looking for an exceptional Senior Data Product Engineering SRE to take charge of the development and operational excellence of data products that provide insights and drive crucial business decisions. This role requires a unique combination of a product engineering mindset, data platform expertise, and site reliability engineering practices to create, scale, and maintain customer-facing data products and internal analytics platforms. The Senior Data Product Engineering SRE will be responsible for ensuring the end-to-end reliability of data products, from ingestion to user experience, to ensure they deliver business value at scale. Key Responsibilities: - Define and implement a product reliability strategy for customer-facing analytics, dashboards, and data APIs. - Collaborate with Product Management to translate business requirements into scalable, reliable data product architectures. - Establish product metrics, KPIs, and success criteria for data products serving both external and internal customers. - Lead cross-functional initiatives to enhance data product adoption, engagement, and customer satisfaction. - Develop and maintain data products, including real-time dashboards, analytics APIs, and embedded analytics solutions. - Design user-centric data experiences focusing on performance, reliability, and scalability. 
- Implement A/B testing frameworks and experimentation platforms for data product optimization. - Set and maintain SLAs for data product availability, latency, and accuracy. - Implement comprehensive monitoring for user-facing data products, encompassing frontend and backend metrics. - Create automated testing frameworks for data product functionality, performance, and data quality. - Lead incident response for data product issues that impact customer experience. - Monitor and optimize data product performance from an end-user perspective, including page load times and query response times. - Implement user feedback collection and product analytics to drive continuous improvement. - Collaborate closely with Product, Engineering, Data Science, and Customer Success teams. - Establish engineering practices for data product development, encompassing code reviews and deployment processes. - Influence the product roadmap with technical feasibility and reliability considerations. - Advocate for data product best practices throughout the organization. - Strike a balance between innovation, operational stability, and customer commitments. - Collaborate with Product Management on feature prioritization and requirements. Required Qualifications: - 8+ years of experience in product engineering, data engineering, or SRE roles. - 5+ years of experience in building customer-facing data products, analytics platforms, or business intelligence solutions. - 3+ years in senior or lead positions with direct team management experience. - Proven track record of delivering data products that drive measurable business impact. - Expertise in the product development lifecycle from ideation to launch and optimization. - Advanced experience in building user-facing applications and APIs. - Deep expertise with analytics databases (Redshift, BigQuery, ClickHouse), real-time processing (Kafka, Spark Streaming), and BI tools (Tableau, Looker, Power BI). 
- Proficiency in React, Vue.js, or Angular for constructing data visualization interfaces. - Advanced skills in Python, Java, or Node.js for API development and data services. - Expert-level SQL skills and experience optimizing queries for interactive analytics workloads. - Extensive experience with AWS or GCP data and compute services. - Strong product sense with the ability to balance technical constraints with user needs. - Experience with product analytics tools (Amplitude, Mixpanel, Google Analytics) and metrics-driven development. - Ability to understand business requirements and translate them into technical solutions. - Strong technical writing skills for customer-facing documentation and API specifications. - Experience with agile product development methodologies (Scrum, Kanban, Design Thinking). - Proven track record of building and scaling product engineering teams. Work Location: Hyderabad, India Work Pattern: Full-time role. Work Mode: Hybrid.

Posted 2 weeks ago

15.0 - 19.0 years

0 Lacs

hyderabad, telangana

On-site

About Chubb Chubb is a world leader in insurance, operating in 54 countries and territories. The company provides a wide range of commercial and personal property and casualty insurance, personal accident and supplemental health insurance, reinsurance, and life insurance to a diverse clientele. Known for its extensive product offerings, broad distribution capabilities, exceptional financial strength, and global local operations, Chubb Limited, the parent company, is listed on the New York Stock Exchange (NYSE: CB) and is a component of the S&P 500 index. With approximately 40,000 employees worldwide, Chubb's dedication to excellence and innovation is evident in its commitment to solving real-world challenges in the insurance industry. For more information, visit www.chubb.com. About CECx Chubb Engineering Centers (CECx) is undergoing a digital transformation journey fueled by a dedication to engineering excellence and analytics. Certified as a Great Place to Work for the third consecutive year, CECx embodies a culture of fostering an environment where individuals can thrive, innovate, and grow. With a global team of over 3500 talented professionals, CECx promotes a start-up mindset that encourages collaboration, diverse perspectives, and a solution-driven attitude. The focus lies on building expertise in engineering, analytics, and automation to empower teams to excel in a dynamic digital landscape. Position Details - Job Title: SRE, Automation Leader - Function/Department: Technology - Location: Hyderabad / Bangalore - Employment Type: Full-time Role Overview As an SRE, Automation Leader at Chubb, you will play a pivotal role in ensuring the reliability, performance, and scalability of applications in a production environment. Collaborating closely with development, operations, and product teams, you will design and implement robust application support strategies, troubleshoot complex issues, and enhance system performance. 
Key Responsibilities Pilot SRE adoption in traditional application support teams by studying and assessing existing application architecture, identifying areas for improvement in reliability and performance, and defining critical user journeys to align SRE practices with user experience. Establish SLOs, implement SLIs, develop error budgets, and prioritize automation to enhance operational efficiency. Provide coaching and mentoring to production support teams, facilitate the adoption of SRE practices, and deliver training sessions on SRE concepts and best practices. Ensure system uptime and reliability by delivering regional targets, monitoring risks, providing guidance on system architecture, and defining Critical User Journeys (CUJs) to optimize reliability. Implement observability tools, track CUJ-level metrics, and create actionable dashboards to proactively monitor critical systems. Develop automation for operational tasks, advocate for efficient tools and processes, and eliminate high-toil areas to increase operational efficiency. Skills and Qualifications - 15+ years of hands-on SRE experience - Strong technical background in software development, application production support, SDLC best practices, and agile methodology - Proficiency in SRE concepts, application architecture analysis, monitoring tools, automation skills, incident response, collaboration, coaching, and agile methodologies Why Chubb Join Chubb to be part of a leading global insurance company that values employee experience, underwriting excellence, and a culture of greatness. Enjoy a start-up-like culture focused on innovation, agility, and ownership. Benefit from growth opportunities, continuous learning programs, and a supportive work environment that fosters career advancement and inclusivity. 
Employee Benefits Chubb offers a comprehensive benefits package including savings and investment plans, upskilling programs, health and welfare benefits, flexible work options, paid time off, and robust health coverage. Employees can take advantage of specialized benefits like Corporate NPS, ESPP, LTIP, and access to career advancement programs. Health and well-being initiatives include a hybrid work environment, EAP, yearly health campaigns, and comprehensive insurance benefits. Join Us With Chubb, your contributions will help shape the future of the insurance industry. If you value integrity, innovation, and inclusion, and are ready to make a difference, join Chubb India on its journey towards excellence and growth.

Posted 2 weeks ago

5.0 - 9.0 years

5 - 12 Lacs

Mumbai Suburban, Thane, Mumbai (All Areas)

Hybrid

Hi Candidates, we are hiring for a Site Reliability Engineer.
Role: SRE
Experience: 5+ years
Location: Mumbai

Direct Responsibilities:
- Understand the flow and overall business of investment banking.
- Basic understanding of risk: market, counterparty, and credit risk.
- Data integrity: check the data integrity within the system.
- Research, diagnose, troubleshoot, and identify solutions to resolve functional and data issues and queries.
- Service transition (basic knowledge): accept services, products, and systems, determine whether they are fit for purpose, and assess readiness against agreed service acceptance criteria. At this role level, you will make recommendations on go-live, early-life support, and service acceptance.
- Application and user support: core knowledge is mandatory, covering ITIL V3/V4, alerting, monitoring, capacity management and assessment, databases (SQL, PL/SQL), cloud infrastructure, Python, the deployment cycle, incident management, request management, problem management, and change and release management.
- The successful candidate is expected to perform the above tasks in collaboration and agreement with the team leaders, project leads, and other development staff, and where necessary with the Business Analysts, Application Production, and Infrastructure teams.

Technical Qualifications:
- Minimum 5-8 years of hands-on application support experience.
- Hands-on experience in SQL (Sybase, Oracle) and Python.
- Basic understanding of DevOps and Kubernetes, cloud native, and a reporting tool (Power BI or equivalent).
- Expert understanding of Dynatrace, Geneos, Splunk, or an equivalent monitoring and alerting tool.

Interested candidates, share your resume at singh.nikita@kiya.ai

Posted 2 weeks ago

8.0 - 12.0 years

15 - 20 Lacs

Bengaluru, Mumbai (All Areas)

Work from Office

Experience: 8+ years
Notice period: Immediate joiners
Role: DevOps SRE
Mandatory skills: Kubernetes, Splunk, Dynatrace, Cloud Foundry, CI/CD deployments (Jules, blue-green deployments, etc.) across different environments.
- SRE monitoring tools: Geneos, plus alerting and monitoring with Dynatrace and other in-house tools, including how to leverage them and set up data flows or application flows into those monitoring frameworks.
- Automation use cases: take alerts from the tools and automate the actions to be taken, such as auto-healing and auto-recovery.
- Coding skills: use scripting to take the alerts and determine the action to be taken.
Note: If you are interested, kindly share your resume with reshma@vysystems.com - 7305029527

Posted 2 weeks ago

15.0 - 20.0 years

60 - 90 Lacs

Pune, Bengaluru

Hybrid

Northern Trust is seeking an experienced Principal Site Reliability Engineer with a strong focus on developing observability and automation. This role will play a pivotal part in ensuring the reliability and performance of the company's systems and services.

What you will do:
• System Design and Architecture: Lead the design and architecture of critical, complex systems to provide reliability, scalability, and performance.
• Operational Excellence: Develop and maintain automation scripts and tools to streamline operations and reduce manual tasks. Oversee system performance transparency.
• Incident Response/Root Cause Analysis: Collaborate on root cause analysis and implement measures to prevent recurrence of issues.
• Monitoring and Observability: Design and implement comprehensive monitoring and observability solutions to proactively detect and address issues before they impact our business.
• Develop and maintain dashboards and alerts to provide real-time insights into system health.
• Reliability Improvements: Identify opportunities for improving system reliability through process enhancements and technical solutions.
• Documentation and Communication: Create and maintain detailed documentation of systems, processes, and procedures.
• Communicate effectively with stakeholders across different teams and levels within the organization.
• Project Management/Collaboration: Manage and prioritize multiple projects and initiatives related to reliability and performance improvements.
• Collaborate with product, development, and operations teams to align SRE efforts with overarching business goals.

Qualifications:
• Bachelor's degree or equivalent experience
• 10+ years in systems engineering with a focus on reliability, systems operations, and software engineering
• 5+ years as a team lead or in a hands-on technical manager role, engaging with and delivering projects to completion
• Strong proficiency in programming languages such as Python, Go, Ruby, Java, etc.
• Experience with both on-prem and cloud solutions
• Experience with containerization
• Demonstrated ability to design and implement systems that ensure observability with associated dashboards
• Deep understanding of distributed systems, networking, and modern software architectures

Posted 2 weeks ago

12.0 - 20.0 years

40 - 75 Lacs

Chennai

Remote

At FourKites we have the opportunity to tackle complex challenges with real-world impacts. Whether it's medical supplies from Cardinal Health or groceries for Walmart, the FourKites platform helps customers operate global supply chains that are efficient, agile and sustainable. Join a team of curious problem solvers that celebrates differences, leads with empathy and values inclusivity.

Principal Architects at FourKites drive Technology & Best Practices in Engineering, responsible for Scaling, Performance, Availability (99.99%) & Quality of products. They participate in development with teams, creating exemplary modules & systems that demonstrate best practices. They provide strong technology leadership and mentoring through code reviews, design reviews and architecture discussions. Principal Architects define & implement long-term technology vision across products & teams, experimenting with relevant technologies while measuring impact. As primary owners of product architecture, they ensure proper review and implementation, represent FourKites in external forums, and create architectures for existing and new problem spaces in the Supply chain/logistics domain.

Who you are:
- Hands-on experience managing production-grade Kubernetes clusters
- Experience with at least 2 end-to-end migrations (Cloud, databases, Container Orchestration, API Gateways)
- Experience managing cloud infrastructure for SaaS companies
- Deep understanding and experience with Docker and Kubernetes orchestration
- Experience managing microservices architectures
- Deep understanding of cloud infrastructure security with responsibility for security and compliance audits
- Experience handling low latency and high request volume requirements
- Understanding of open source vs. SaaS observability platforms
- Security Experience (SOC2, PCI)
- DR Drills experience
- Automation capabilities (building workflows to reduce manual tasks using scripts, open source tools, AWS System Manager)
- Strong understanding of observability with ability to instrument end-to-end

Technical Expertise:
- Bachelor's degree in Computer Science or equivalent practical experience
- Strong UNIX/Linux systems administration skills
- Advanced troubleshooting abilities across systems, networks, and application code
- Programming proficiency in Python or other relevant languages
- Experience maintaining high-availability systems with stringent uptime requirements

Tools Experience:
- Logging: Graylog with Elastic Search Backend, Signoz
- Monitoring: Datadog / NewRelic / Chronosphere
- CI/CD: BitBucket / GitLab / Jenkins / ArgoCD
- IaaS: Terraform
- Languages: Python/Flask
- API Gateway: Kong / Istio / Service Mesh
- Configuration Management: AWS Systems Manager
- Serverless: Lambda, Fargate
- Messaging: Confluent Kafka / Cloudera / HD Insights
- Databases: Postgres, Cassandra / Atlas MongoDB, RDS

What you'll be doing:
- Collaborate closely with Development teams to resolve bottlenecks
- Participate in on-call rotation for production support
- Debug product issues during outages and conduct root cause analysis
- Set up CI/CD pipelines for new projects
- Partner with engineering leadership to establish SLIs, SLOs, SLAs, and Error budgets
- Provide production support with quick resolution during outages
- Work with frontend and backend teams to automate repetitive tasks and improve system health
- Ensure automated setup and updates of all development environments
- Maintain infrastructure with focus on security and compliance

Kindly apply through this link - https://boards.greenhouse.io/fourkites/jobs/6715916

Posted 2 weeks ago

7.0 - 10.0 years

30 - 45 Lacs

Bengaluru

Hybrid

About Position: A platform engineer designs, builds, and maintains the internal infrastructure and tools that enable other development teams to build and deploy applications efficiently. Role: Platform Engineer - SRE Location: Bangalore Experience: 7-10 years Job Type: Full Time Employment What You'll Do: Act as the custodian for Build and Deployment assets for own platform and be responsible for them through all environments. Ensure that servers/applications are healthy, e.g., free of vulnerabilities and with the right certificates in place. Engage with project teams to govern implementation and adoption. Play a mentoring role in providing design and technical assistance to other members of the team. Analyse systems and applications and provide recommendations for design, enhancement and development, and play an active part in their execution. Design, implement, and maintain scalable and reliable infrastructure. Monitor system performance and troubleshoot issues to ensure high availability and reliability. Automate repetitive tasks and processes to improve efficiency. Collaborate with development teams to ensure smooth deployment and operation of applications. Implement and maintain monitoring, logging, and alerting systems. Expertise You'll Bring: Strong knowledge of cloud platforms (e.g., AWS, Azure, Google Cloud). Proficiency in scripting languages (e.g., Python, Bash). Experience with containerization and orchestration tools (e.g., Docker, Kubernetes). Familiarity with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI). Strong understanding of networking, security, and system administration. Proven experience as a Site Reliability Engineer or similar role. Excellent problem-solving and troubleshooting skills. Strong communication and collaboration skills.
Benefits: Competitive salary and benefits package. Culture focused on talent development with quarterly promotion cycles and company-sponsored higher education and certifications. Opportunity to work with cutting-edge technologies. Employee engagement initiatives such as project parties, flexible work hours, and Long Service awards. Annual health check-ups. Insurance coverage: group term life, personal accident, and Mediclaim hospitalization for self, spouse, two children, and parents. Inclusive Environment: Persistent Ltd. is dedicated to fostering diversity and inclusion in the workplace. We invite applications from all qualified individuals, including those with disabilities, and regardless of gender or gender preference. We welcome diverse candidates from all backgrounds. We offer hybrid work options and flexible working hours to accommodate various needs and preferences. Our office is equipped with accessible facilities, including adjustable workstations, ergonomic chairs, and assistive technologies to support employees with physical disabilities. If you are a person with disabilities and have specific requirements, please inform us during the application process or at any time during your employment. We are committed to creating an inclusive environment where all employees can thrive. Our company fosters a values-driven and people-centric work environment that enables our employees to: accelerate growth, both professionally and personally; impact the world in powerful, positive ways, using the latest technologies; enjoy collaborative innovation, with diversity and work-life wellbeing at the core; and unlock global opportunities to work and learn with the industry's best. Let's unleash your full potential at Persistent. Persistent is an Equal Opportunity Employer and prohibits discrimination and harassment of any kind.

Posted 2 weeks ago

5.0 - 10.0 years

15 - 20 Lacs

Bengaluru

Work from Office

Responsibilities: Responsible, as part of the team, for all stages of design, development, and operation of the observability stack, which monitors and alerts on the Customer Experience Platform, a complex set of products and platforms hosted on an internal cloud. This includes solution design, analysis, coding, testing, and integration. Collaborates with multiple project teams and with internal and outsourced development partners. Reviews and evaluates designs and code for compliance with systems design and development guidelines and standards, with emphasis on solution reliability; provides tangible feedback to improve product quality and mitigate failure risk. Responsible for troubleshooting infrastructure/application issues with our partner vendors. Drives innovation and integration of new technologies into projects and activities in the software systems design organization. Provides guidance and mentoring to less-experienced staff members. Should be able to do on-call duty (on a rotation basis among team members). Requirements: Bachelor's or master's degree in Computer Science, Information Systems, or equivalent knowledge/experience. Typically 5-10 years' experience. Domain Experience / Knowledge: Proven experience with Python desirable. Proven experience with Linux operating systems and shell scripting. Proven experience developing and maintaining containerized applications; working with Docker/Docker Swarm/Kubernetes desirable. Proven experience with monitoring, alerting, and logging technologies, e.g., Elastic, Prometheus, Grafana. Proven experience debugging complex issues, performing root cause analysis, and supporting large-scale application architectures. Be highly motivated and have the ability to self-learn new technologies and processes quickly. Proven experience writing production-level code in at least one software development language. Proven experience with Go is an added advantage.
Proven experience with GitHub, Jenkins, and/or other CI/CD tools, with a strong focus on automation. Proven experience with multiple software systems, applications, and design tools. Experience in the overall architecture of software systems for products and solutions, including designing and integrating software systems running on multiple platform types into the overall architecture. Proven ability to identify and implement solutions to technical problems, working independently and leading more junior engineers where appropriate. Knowledge of Agile-based development methodologies (SAFe framework advantageous). Excellent written and verbal communication skills; mastery of English and the local language. Ability to effectively communicate product architectures and design proposals and to negotiate options at management levels. Personal skills and attributes: Ability to communicate effectively with management, peers, and team members. Ability to work and deliver in global, cross-functional, and virtual teams. Demonstrate a strong combination of analytical skills, intellectual curiosity, and reporting acumen.

Posted 2 weeks ago

2.0 - 4.0 years

4 - 9 Lacs

Pune

Hybrid

So, what’s the role all about? We're looking for a passionate and hands-on DevOps Engineer with 3–5 years of experience to help us scale, automate, and secure our cloud infrastructure. This role is ideal for someone who thrives in a dynamic environment, enjoys solving infrastructure challenges, and loves working with cutting-edge DevOps tools. As part of our engineering team, you’ll play a key role in managing cloud resources, enhancing CI/CD workflows, and enabling development teams to move faster and safer. How will you make an impact? Cloud Infrastructure: Design, implement, and manage scalable, secure AWS-based infrastructure. Infrastructure as Code: Use Terraform (or CloudFormation) to provision and maintain infrastructure in a repeatable way. CI/CD Pipelines: Develop and enhance continuous integration and deployment pipelines using Jenkins, GitHub Actions, or Spacelift. Automation & Scripting: Build automation scripts in Python, Shell, or Groovy to reduce manual effort and improve system reliability. Monitoring & Logging: Set up monitoring tools and alerting systems to ensure uptime and performance (e.g., CloudWatch, ELK, Prometheus). Collaboration: Work closely with developers, QA, and other DevOps teams to support smooth and secure delivery workflows. Have you got what it takes? 3–5 years of experience in a DevOps or SRE role. Strong hands-on knowledge of AWS (EC2, VPC, S3, RDS, ECS, Route53, etc.). Experience with Terraform (or CloudFormation) for infrastructure management. Solid understanding of Docker and containerized workflows. Expertise in CI/CD tools like Jenkins, GitHub Actions, Spacelift. Proficiency in scripting languages (Python, Shell, Groovy). Experience with Git, version control workflows, and team collaboration tools (JIRA, Confluence). Strong problem-solving ability, attention to detail, and eagerness to learn and grow in a fast-paced team. What’s in it for you? 
Join an ever-growing, market disrupting, global company where the teams – comprised of the best of the best – work in a fast-paced, collaborative, and creative environment! As the market leader, every day at NiCE is a chance to learn and grow, and there are endless internal career opportunities across multiple roles, disciplines, domains, and locations. If you are passionate, innovative, and excited to constantly raise the bar, you may just be our next NiCEr! Enjoy NiCE-FLEX! At NiCE, we work according to the NiCE-FLEX hybrid model, which enables maximum flexibility: 2 days working from the office and 3 days of remote work, each week. Naturally, office days focus on face-to-face meetings, where teamwork and collaborative thinking generate innovation, new ideas, and a vibrant, interactive atmosphere. Reporting into: Tech Manager Role Type: Individual Contributor

Posted 2 weeks ago

3.0 - 5.0 years

3 - 8 Lacs

Pune

Work from Office

Role Overview: As an SRE Engineer, you will work on building reliable, scalable, and automated infrastructure using modern IaC and scripting tools. You'll also contribute to end-to-end cloud migration projects, including on-prem to cloud and cloud-to-cloud scenarios, using industry-standard services and tools. Key Responsibilities: Develop and maintain Infrastructure as Code (IaC) using Terraform and AWS CloudFormation (CFT) to provision and manage cloud environments. Automate infrastructure setup, configuration, and deployment workflows using Python and Ansible. Participate in cloud migration projects, assisting in workload planning, execution, and validation for both on-prem to cloud and cloud-to-cloud scenarios. Work with migration tools like AWS Application Migration Service (MGN), Application Discovery Service, Elastic Disaster Recovery, etc. Implement CI/CD pipelines, monitor system health, and support high-availability and disaster recovery configurations. Follow SRE best practices, including monitoring, alerting, incident management, and root cause analysis. Contribute to reusable automation modules, the internal knowledge base, and technical documentation. Required Skills & Experience: 3–4 years of hands-on experience in cloud infrastructure, automation, or SRE/DevOps roles. Proven experience with Terraform and CloudFormation (CFT), Python scripting, and Ansible for configuration automation. Involvement in migration of workloads from on-premises to cloud or cloud-to-cloud across AWS and/or Azure. Working knowledge of AWS migration tools, such as AWS Application Migration Service (MGN), Application Discovery Service, Elastic Disaster Recovery, AWS Database Migration Service, and AWS Migration Hub. Understanding of networking, IAM, VPCs, security groups, firewalls, and DNS in cloud environments. Experience with monitoring/logging tools (e.g., CloudWatch, Azure Monitor, Prometheus, Grafana) and troubleshooting. Good grasp of SRE concepts like SLAs, SLOs, and incident response, and an automation-first mindset.

Posted 2 weeks ago

Apply

5.0 - 10.0 years

18 - 27 Lacs

Hyderabad

Work from Office

Role & Responsibilities: We are seeking a skilled Production Support Engineer (L2/L3) with strong experience in .NET/Java application support, Site Reliability Engineering (SRE) practices, log analysis (Splunk), and monitoring tools. The role focuses on ensuring high availability, performance, and reliability of production systems. Key Responsibilities: Provide L2/L3 support for production systems developed in .NET and Java. Monitor application health and system metrics using tools like Splunk, AppDynamics, Datadog, or Nagios. Perform incident triage, root cause analysis (RCA), and resolution of application/system issues. Collaborate with development, infrastructure, and SRE teams for timely issue resolution. Participate in on-call rotation and handle critical escalations. Maintain and improve runbooks, alerts, and dashboards to enhance observability. Automate routine support tasks using scripting (PowerShell, Python, or Shell). Drive SRE practices like incident postmortems, error budgets, and service level objectives (SLOs). Required Skills: Strong knowledge of .NET and/or Java-based application architecture and troubleshooting. Experience with SRE principles and operational support models. Proficiency in log analysis tools such as Splunk. Familiarity with monitoring tools like AppDynamics, New Relic, Dynatrace, or Prometheus. Experience working with incident management tools (e.g., ServiceNow, Jira). Strong analytical and communication skills. Interested candidates can share their updated resume at sarvani.j@ifinglobalgroup.com

Posted 2 weeks ago

Apply

15.0 - 19.0 years

0 Lacs

karnataka

On-site

You will be working as a Subject Matter Expert/Architect with over 15 years of experience. Your responsibilities will include understanding client requirements, preparing technical artifacts, collaborating with team members on integration, engaging with vendors and customers, and interacting with various cross-functional groups to enhance functionality and support existing customers. You will be advising on decisions related to design factors, preparing technical roadmaps, and participating in governance discussions. Your role will also involve managing engineering tasks, working on Cloud, Virtualization, Distributed Systems, and Enterprise Application Development. Experience in Site Reliability Engineering (SRE), DevOps, or Operations role would be beneficial. A systematic problem-solving approach along with a strong sense of ownership and drive is essential for this role. You should have experience in operating, troubleshooting, and scaling online services. Domain-level understanding of Public/Private Cloud Infrastructure, Networking, SAN, NAS storage technologies, and configuration operation of VMware Products like vSphere, vSAN, NSX is required. Your skills and experience should include deployment, configuration, and operation of VMware Cloud Foundation (VCF), VxRail, and other Hyper-Converged Appliances, automation tools like Jenkins, GitLab CI, and a strong background in distributed systems and cloud applications. You will be responsible for handling seamless upgrades of infrastructure and services through automation, identifying performance metrics, logs, and alerts, working on change requests and new installations, resolving major incidents, and technical escalations within agreed SLAs and quality. In terms of problem management, you will perform root cause analysis for problems and major incidents, provide workarounds for business continuity, and prepare RCA reports. 
Change management tasks will include preparing implementation plans, rollback plans, test plans, risk and impact analysis for critical changes, and reviewing change plans and documentation. You will also drive end-to-end change management, work on proactive and reactive problem management, provide RCA, ensure timely issue resolution to meet SLAs, and follow up with vendors for hardware and software issues. Your responsibilities will involve installing hardware and software, maintaining service levels, performance tuning, firmware upgrades, capacity planning, performance monitoring, providing improvement recommendations, and overseeing implementation. You will work on various layers including Compute (ESXi), Storage (vSAN, SAN), and Network (NSX-T) to ensure efficient infrastructure operations.

Posted 2 weeks ago

Apply

8.0 - 12.0 years

27 - 42 Lacs

Chennai

Work from Office

Job Summary: The Sr. Business Analyst will play a pivotal role in analyzing and optimizing business processes through the application of technical skills in SRE, Grafana, ELK, Dynatrace AppMon, and Splunk. This hybrid role requires a seasoned professional with 8 to 12 years of experience to drive impactful solutions in a day shift setting without the need for travel. Responsibilities: Analyze business processes and identify areas for improvement using advanced technical skills. Collaborate with cross-functional teams to gather and document business requirements. Develop and implement monitoring solutions using Grafana and ELK to ensure system reliability. Utilize Dynatrace AppMon and Splunk to troubleshoot and resolve performance issues. Provide insights and recommendations based on data analysis to enhance business operations. Lead the design and execution of test plans to validate system changes. Ensure seamless integration of new solutions with existing systems and processes. Oversee the deployment of updates and enhancements in a hybrid work environment. Maintain comprehensive documentation of processes, configurations, and changes. Conduct training sessions to educate stakeholders on new tools and processes. Monitor system performance and proactively address potential issues. Collaborate with IT teams to ensure alignment with business objectives. Drive continuous improvement initiatives to optimize system performance and user experience. Qualifications: Possess a strong background in SRE, Grafana, ELK, Dynatrace AppMon, and Splunk. Demonstrate excellent analytical and problem-solving skills. Exhibit proficiency in documenting business processes and technical specifications. Have experience in leading cross-functional teams and projects. Show capability in developing and executing test plans. Display strong communication skills to interact with stakeholders. Be adept at working in a hybrid work model and managing day shift responsibilities.
Certifications Required: Certified Business Analysis Professional (CBAP), Dynatrace Associate Certification

Posted 2 weeks ago

Apply

6.0 - 10.0 years

8 - 12 Lacs

Gurugram

Work from Office

About the Role: OSTTRA India. The Role: Site Reliability Engineer. The Team: SRE is a global team that provides technical support across the suite of OSTTRA products. The SRE team works closely with a highly competent Technical Operation Centre (TOC), Development and Infrastructure teams to deliver proactive tasks to improve the supportability of our platforms. Our work helps to ensure that OSTTRA provides a high-quality service and maintains client satisfaction. The Impact: Together, we build, support, protect and manage high-performance, resilient platforms that process more than 100 million messages a day. Our services are vital to automated trade processing around the globe, managing peak volumes and working with our customers and regulators to ensure the efficient settlement of trades and effective operation of global capital markets. What's in it for you: OSTTRA is seeking a Site Reliability Engineer to join the SRE Team. The role will specialise in the designated platforms, providing 2nd line technical support to TOC as well as integration support for our Trade Processing applications. This person will report directly to the regional SRE manager and work closely with an experienced global team to contribute to the quality of our support. You will have 6-10 years' experience in roles such as Site Reliability Engineer or Application Support, with Project Management tasks, to meet the needs of our expanding portfolio of Financial Services clients. This role presents an excellent opportunity to be part of an agile team based out of India, collaborating with colleagues across multiple regions globally, with a strong focus on delivering value through self-service.
Responsibilities: Your duties will include Capacity Management, Operational Support Design, Audit Preparation, Incident Escalation, Problem Management Engagement, DR Design and Execution and ad hoc High Profile Client Engagement for your designated platform(s) in our full suite of OTC Derivative products and FX for post-trade confirmation processing. You will need to demonstrate excellent communication skills and have a natural ability to learn with a keen interest in technology. You must be a team player and enjoy working in a high-performance collaborative environment with multiple teams. The successful candidate will need to be able to apply strong technical skills and good business knowledge, together with investigative techniques and problem-solving skills to identify gaps and improve overall estate to bring resilience and stability to the platform(s). Liaising with other teams across Product, Development and particularly the infrastructure teams as required for 3rd line escalation. Technical advisory will be required at times by Product and business or clients for solution delivery. Working closely with Development and Infrastructure team, to understand and ensure supportability of platforms and liaising with delivery teams to ensure readiness for new platform releases. Based in our Gurgaon office, you will be responsible for handling, identifying and communicating technical resolutions in English. 
What We're Looking For: University graduate or equivalent with a bachelor's degree in computer science. Experience in, or high motivation for, managing the capacity, performance throughput and EOS/EOL of the platform from infrastructure to software. Experience in troubleshooting issues, defining supportability, soaking in the software development life cycle (SDLC) process, and streamlining application delivery from Dev/QA to UAT/Production. Good understanding of Site Reliability Engineering as well as Application Support processes, supporting incidents and executing/designing disaster recovery. Strong ability to understand application architecture, effectively navigate to the problem area, and identify proactive measures around resiliency and recovery design. Ability to apply analytical methodology, such as trending, distribution etc., to get insight from application data to help troubleshoot and analyse the best approach. Ability to understand business workflow and tie it to technical implementation. Experience in reading and tracing Java, C++, Python and/or scripting languages. Experience with databases including SQL scripting, preferably but not limited to Oracle. Good to Have: Understanding of networking principles, their practical uses and basic troubleshooting.
Understanding of Cloud (AWS, GCP or Azure), PaaS and implementation with Kubernetes, OpenShift, Windows and Linux. Experience in handling client issues and expectation management. Good understanding of messaging platforms and protocols like XML, XSLT, IBM MQ, AMQ etc. Knowledge of financial messaging protocols like FIX, FpML, TOF etc. Experience with security protocols related to connectivity encryption utilizing SSL and TLS. Experience of working in the Finance Industry. Knowledge of the Financial OTC Derivative and FX products. Awareness of Derivatives products and post-trade processing (desirable). The Location: Gurgaon, India. Statement: OSTTRA is a market leader in derivatives post-trade processing, bringing innovation, expertise, processes and networks together to solve the post-trade challenges of global financial markets. OSTTRA operates cross-asset post-trade processing networks, providing a proven suite of Credit Risk, Trade Workflow and Optimization services. Together these solutions streamline post-trade workflows, enabling firms to connect to counterparties and utilities, manage credit risk, reduce operational risk and optimize processing to drive post-trade efficiencies. OSTTRA was formed in 2021 through the combination of four businesses that have been at the heart of post-trade evolution and innovation for the last 20+ years: MarkitServ, Traiana, TriOptima and Reset. These businesses have an exemplary track record of developing and supporting critical market infrastructure and bring together an established community of market participants comprising all trading relationships and paradigms, connected using powerful integration and transformation capabilities. About OSTTRA: Candidates should note that OSTTRA is an independent firm, jointly owned by S&P Global and CME Group.
As part of the joint venture, S&P Global provides recruitment services to OSTTRA. However, successful candidates will be interviewed and directly employed by OSTTRA, joining our global team of more than 1,200 post-trade experts. OSTTRA is a joint venture, owned 50/50 by S&P Global and CME Group. Joining the OSTTRA team is a unique opportunity to help build a bold new business with an outstanding heritage in financial technology, playing a central role in supporting global financial markets. Learn more at www.osttra.com. What's In It For You: Benefits: We take care of you, so you can take care of business. We care about our people. That's why we provide everything you, and your career, need to thrive at S&P Global. Health & Wellness: Health care coverage designed for the mind and body. Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills. Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs. Family Friendly Perks: It's not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families. Beyond the Basics: From retail discounts to referral incentive awards, small perks can make a big difference.
For more information on benefits by country visit https://spgbenefits.com/benefit-summaries Equal Opportunity Employer: S&P Global is an equal opportunity employer and all qualified candidates will receive consideration for employment without regard to race/ethnicity, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, marital status, military veteran status, unemployment status, or any other status protected by law. Only electronic job submissions will be considered for employment. If you need an accommodation during the application process due to a disability, please send an email to EEO.Compliance@spglobal.com and your request will be forwarded to the appropriate person. US Candidates Only: The EEO is the Law Poster http://www.dol.gov/ofccp/regs/compliance/posters/pdf/eeopost.pdf describes discrimination protections under federal law. Pay Transparency Nondiscrimination Provision - https://www.dol.gov/sites/dolgov/files/ofccp/pdf/pay-transp_%20English_formattedESQA508c.pdf

Posted 2 weeks ago

Apply

6.0 - 11.0 years

18 - 22 Lacs

Hyderabad

Work from Office

Overview: We are seeking a highly skilled and analytically strong Site Reliability Engineer (SRE) with Scrum expertise and 6+ years of experience. The ideal candidate will have a proven track record in managing SRE responsibilities across multiple teams, with deep expertise in Active Directory (AD) groups, Databricks, architecture design, and enterprise tools like Clarity and ServiceNow. Strong Scrum delivery experience and cross-functional collaboration are essential. Key Responsibilities: Lead SRE operations across distributed teams, ensuring system reliability, scalability, and performance. Design and implement robust monitoring, alerting, and observability frameworks. Lead Scrum ceremonies. Manage and optimize Active Directory (AD) group structures and access controls. Collaborate with data engineering teams to support Databricks environments. Contribute to architectural discussions and decisions for high-availability systems. Drive incident response, root cause analysis, and continuous improvement initiatives. Integrate and manage workflows using Clarity PPM and ServiceNow for change, incident, and problem management. Actively participate in Scrum ceremonies (daily stand-ups, sprint planning, reviews, retrospectives). Collaborate with Product Owners and Scrum Masters to ensure timeliness and quality. Qualifications: Education: Bachelor's or Master's degree in Computer Science, Information Systems, Business Analytics, or a related field. Experience: 6+ years of experience in SRE, DevOps, or Infrastructure Engineering roles. Strong analytical thinking and troubleshooting skills. Hands-on experience with: Active Directory (AD): group policy management, access provisioning; Databricks: cluster management, job orchestration, performance tuning; Architecture: designing scalable, fault-tolerant systems; Clarity PPM: project tracking, resource planning; ServiceNow: incident/change/problem management workflows.
Proficiency in monitoring tools (e.g., Prometheus, Grafana, Datadog). Experience with CI/CD pipelines and infrastructure as code (Terraform, Ansible). Familiarity with cloud platforms (Azure, AWS, or GCP). Strong scripting skills (Python, Bash, PowerShell). Solid understanding of Agile/Scrum methodologies and tools like Jira or Azure DevOps. Preferred Qualifications Certified Scrum Master or equivalent Agile certification. Experience working in a global delivery model. Exposure to digital product and reporting services is a plus.

Posted 2 weeks ago

Apply

7.0 - 9.0 years

20 - 22 Lacs

Noida

Hybrid

Do you want a job with a purpose? And do you want to make healthcare safer, better and more reliable? Join our Team! Interested candidates, please apply at the link below - https://dedalus.wd3.myworkdayjobs.com/External/job/IND---New-Delhi---Noida/Senior-Specialist---Devops-Engineer_JR105116 Senior DevOps Engineer: Join us as a DevOps Engineer at Dedalus, one of the world's leading healthcare technology companies. Be a part of our team and contribute to delivering high-quality software solutions that make a profound impact in providing better care for a healthier planet. What you'll achieve: As a DevOps Engineer, you will play a crucial role in designing, implementing, and maintaining the infrastructure and automation tools that support our development and deployment processes. You will collaborate with cross-functional teams to ensure the reliability, security, and scalability of our applications, making a profound impact throughout the healthcare sector. You will: Develop and maintain CI/CD pipelines to ensure fast, reliable, and consistent delivery of software. Automate infrastructure provisioning, configuration management, and application deployment using tools like Terraform, Ansible, Puppet, or Chef. Implement and manage monitoring, logging, and alerting systems using tools such as Prometheus, Grafana, ELK Stack, or Splunk. Work closely with development, QA, and operations teams to streamline workflows and improve communication and efficiency. Lead incident response efforts, conduct root cause analysis, and implement preventative measures to ensure system stability and performance. Provide mentorship and guidance to junior DevOps engineers and other team members. Take the next step towards your dream career: At Dedalus, Life flows through our software. Every day, we help caregivers and health professionals deliver better care to their communities. Take the next step in your career that will make a profound impact.
Here's what you'll need to succeed: Essential Requirements: 7 to 9 years of experience in a DevOps, Site Reliability Engineer (SRE), or related role. Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience). Extensive experience with cloud platforms (AWS, Azure, Google Cloud). Proficiency with Infrastructure as Code (IaC) and configuration management tools (Terraform, Ansible, Puppet, Chef). Strong experience with CI/CD tools (Jenkins, GitLab CI, CircleCI). Proficiency in scripting languages (Python, Bash, PowerShell). Hands-on experience with containerization and orchestration technologies (Docker, Kubernetes). Desirable Requirements: Experience with networking concepts and security best practices. Knowledge of service mesh technologies (Istio, Linkerd, Consul). Familiarity with serverless computing and edge computing architectures. Exposure to compliance frameworks (ISO 27001, SOC 2, HIPAA). We are Dedalus, come join us: Dedalus is committed to providing an engaging and rewarding work experience that reflects the passion our employees bring to our mission of helping clinicians and nurses deliver better care to their communities. Our company fosters a culture of innovation, learning, and collaboration, enabling clinical cooperation and efficiency while making a meaningful difference for millions of people worldwide. Each person is at the heart of our activities, making a real impact every day. With 7,600 employees in more than 40 countries, Dedalus continues to drive healthcare innovation globally. We are the people of Dedalus. Application closing date: 31st of July 2025. Our diversity & inclusion commitment: Dedalus is dedicated to ensuring respect, inclusion, and success for all colleagues and the communities we serve. We are committed to fostering an inclusive and diverse workplace and continuously strive to improve and grow in this journey. Life Flows Through Our Software.

Posted 2 weeks ago

Apply

4.0 - 8.0 years

12 - 22 Lacs

Kolkata, Pune, Chennai

Work from Office

Location: Chennai, Kolkata, Hyderabad, Pune, Bangalore. (SRE), performance monitoring, or infrastructure reliability. Strong knowledge of APM. Dynatrace certification or relevant training. Experience with additional observability tools.

Posted 2 weeks ago

Apply

15.0 - 19.0 years

0 Lacs

hyderabad, telangana

On-site

Chubb is a world leader in insurance, operating in 54 countries and territories and offering a wide range of commercial and personal insurance products. Chubb is known for its extensive product portfolio, strong financial position, and global presence. The parent company, Chubb Limited, is listed on the New York Stock Exchange and is part of the S&P 500 index. With approximately 40,000 employees worldwide, Chubb is committed to providing exceptional service to a diverse group of clients. More information about Chubb can be found at www.chubb.com. Chubb Engineering Centers (CECx) is dedicated to digital transformation and engineering excellence. As a Great Place to Work for the third consecutive year, CECx fosters a culture of innovation, collaboration, and growth. With a global team of over 3500 professionals, CECx encourages a start-up mindset to drive solutions in engineering, analytics, and automation. Position Details: - Job Title: SRE, Automation Leader - Function/Department: Technology - Location: Hyderabad / Bangalore - Employment Type: Full-time Role Overview: We are looking for a skilled SRE, Automation Leader to join our team. In this role, you will be responsible for ensuring the reliability, performance, and scalability of our applications in a production environment. Collaborating with development, operations, and product teams, you will design and implement robust application support strategies, troubleshoot complex issues, and enhance system performance. Key Responsibilities: Pilot SRE adoption in traditional application support teams by studying existing application architecture, defining critical user journeys, establishing SLOs and SLIs, managing error budgets, and prioritizing automation and process improvements. Provide coaching and mentoring to production support teams to adopt SRE practices. 
Ensure system uptime and reliability by delivering regional targets, proactively monitoring risks, guiding system architecture, and tracking Critical User Journeys. Implement observability tools, develop automation of operational tasks, and eliminate manual intervention through efficient tools and processes. Skills and Qualifications: - 15+ years of hands-on experience as an SRE in an application support team - Strong technical background in software development, SDLC best practices, and agile methodology - Proficiency in implementing SRE concepts, application architecture analysis, monitoring tools, automation skills, incident response, collaboration, coaching, Agile methodologies, and continuous improvement mindset Why Chubb: Join Chubb to be part of a leading global insurance company with a focus on employee experience, underwriting excellence, and a culture of greatness. Enjoy a start-up-like culture that emphasizes speed, agility, and ownership. Benefit from growth opportunities, continuous learning, and a supportive work environment. Employee Benefits: Chubb offers a comprehensive benefits package including savings and investment plans, upskilling and career growth opportunities, and health and welfare benefits. Employees enjoy flexible work options, generous paid time off, health coverage, and continuous learning opportunities. Join Us: Join Chubb India's journey to shape the future and make a difference. If you value integrity, innovation, and inclusion, and are ready to contribute to a global insurance leader, we invite you to be part of Chubb.

Posted 2 weeks ago

Apply

15.0 - 19.0 years

0 Lacs

haryana

On-site

As the Vice President of DevOps & SRE, you will hold a senior leadership position with the primary responsibility of driving platform reliability, secure operations, and DevOps excellence throughout the enterprise. Your role will involve integrating site reliability engineering practices with scalable DevOps automation and maintaining a robust cybersecurity posture. Leading high-performing teams, defining technology strategy, managing infrastructure, and safeguarding systems and data to support business growth and digital innovation will be key aspects of your role. You will be expected to lead enterprise-wide DevOps adoption and continuous delivery transformation, implementing and optimizing CI/CD pipelines, infrastructure-as-code (IaC), and cloud-native architectures. Championing automation in deployment, monitoring, and infrastructure provisioning will be essential, along with experience in containerization (Kubernetes, Docker), service mesh, and serverless environments. Facilitating collaboration between development, operations, and QA for rapid and reliable releases will also be a critical part of your responsibilities. Establishing and leading the Site Reliability Engineering (SRE) function to ensure system reliability, scalability, and performance will be another key aspect of your role. You will define and monitor SLAs, SLOs, and SLIs for critical applications and services, drive incident management, root cause analysis, and foster a postmortem culture. Developing and deploying observability strategies using tools like Prometheus, Grafana, Zabbix, or enterprise tools such as New Relic, Dynatrace, or Splunk will also be within your purview. In terms of leadership and strategic alignment, you will build and mentor cross-functional teams across DevOps and SRE, partnering with engineering, product, and business leaders to align technical initiatives with organizational goals. 
Managing departmental budgets, tools, and vendor relationships, as well as reporting on KPIs, operational health, security posture, and risk to the executive leadership team will also be part of your responsibilities. To qualify for this role, you must hold a Bachelor's or Master's degree in Computer Science, Engineering, or a related field, along with 15+ years of experience in IT/engineering, including 5+ years in leadership roles. Proven expertise in implementing DevOps, SRE, and security practices at scale, as well as hands-on experience with AWS, Azure, or GCP, CI/CD tools, and SRE observability platforms, are essential requirements for this position.

Posted 2 weeks ago

Apply

6.0 - 8.0 years

5 - 10 Lacs

Noida, Uttar Pradesh, India

On-site

Key Responsibilities: Monitoring & Alerting: Develop, maintain, and enhance monitoring and alerting systems using Datadog to proactively identify and address potential issues, ensuring optimal system performance. CI/CD Pipelines: Participate in the design and implementation of CI/CD pipelines using Azure DevOps, enabling automated and reliable software delivery. Incident Response: Lead efforts in incident response and troubleshooting to quickly diagnose and resolve production incidents, minimizing downtime and impact on users. Reliability Initiatives: Take ownership of reliability initiatives by identifying areas for improvement, conducting root cause analysis, and implementing solutions to prevent recurrence of incidents. Collaboration: Collaborate with cross-functional teams to ensure security, compliance, and performance standards are met throughout the development lifecycle. On-call Support: Participate in on-call rotations and provide 24/7 support for critical incidents, ensuring rapid response and resolution. SLOs & SLIs: Work with development teams to define and establish Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure and maintain system reliability. Documentation: Contribute to the documentation of processes, procedures, and best practices to enhance knowledge sharing within the team. Qualifications: Education: Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent work experience. Experience: Minimum of 4 years of experience in a Site Reliability Engineer or similar role, managing cloud-based infrastructure on AWS with EKS. AWS Expertise: Strong expertise in AWS services, especially EKS, including cluster provisioning, scaling, and management. Monitoring & Observability: Proficiency in using monitoring and observability tools, with hands-on experience in Datadog or similar tools for tracking system performance and generating meaningful alerts.
CI/CD Experience: Experience in implementing CI/CD pipelines using Azure DevOps or similar tools to automate software deployment and testing. Containerization & Orchestration: Solid understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes) and their role in modern application architectures. Troubleshooting: Excellent troubleshooting skills and the ability to analyze complex issues, determine root causes, and implement effective solutions. Scripting & Automation: Strong scripting and automation skills (e.g., Python, Bash). IaC (Infrastructure as Code): Familiarity with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation. Incident Management: Experience with incident management, post-incident analysis, and implementing improvements based on lessons learned. Security & Compliance: Good understanding of security best practices and compliance standards in cloud environments. Communication: Exceptional communication skills and the ability to collaborate effectively with cross-functional teams. On-call Rotations: Willingness to participate in on-call rotations and provide off-hours support when necessary. Preferred Qualifications: Relevant certifications such as AWS Certified DevOps Engineer, AWS Certified SRE, or Kubernetes certifications. Experience with other cloud platforms (e.g., Azure, Google Cloud Platform). Familiarity with microservices architecture and service mesh technologies. Prior experience with application performance tuning and optimization.

Posted 2 weeks ago

Apply

5.0 - 8.0 years

11 - 21 Lacs

Hyderabad, Pune, Bengaluru

Hybrid

Hi,
Please share your updated resume with the details below:
Total experience in SRE:
DevOps:
Current CTC:
Expected CTC:
Notice period:
Place:
Thanks & Regards,
Prangyaparamit Padhy
Talent Acquisition Team
prangyaparamit.padhy@ltimindtree.com

Posted 2 weeks ago

Apply

2.0 - 6.0 years

5 - 9 Lacs

Hyderabad

Work from Office

Cloudant is looking for a talented Infrastructure Engineer to help manage, evolve and operate our global service infrastructure. The infrastructure team’s role is to keep our bare metal and Kubernetes infrastructure secure, healthy and performant. We play a key role across the product by providing a solid foundation to deliver Cloudant’s serverless database as a service.
As an engineer in the infrastructure team, you’ll be able to develop a deep expertise in the technologies that keep a large-scale cloud database online and available. You’ll help build automation to reduce the manual effort in managing our machines, contribute to the day-to-day maintenance of the systems, and ensure our infrastructure provides the right support for Cloudant’s key customer features and security standards. We prioritise engineer growth and have a lot of in-house experience to learn from.
We code primarily in Python and Ruby. Our infrastructure is a mixture of bare metal machines running Debian and Kubernetes, running on IBM’s Cloud. This is managed using Chef and Terraform, along with a lot of homegrown automation to tie it all together. Over time, you will become a subject matter expert in our infrastructure and help out with debugging and fixing service issues. This role involves on-call responsibilities.
Required education: Bachelor's Degree
Required technical and professional expertise:
Some experience with managing Linux machines using SSH or configuration management / Infrastructure as Code tooling.
Skills writing code in a modern backend language (eg, Python, Go, Ruby).
A focus on creating reliable code using techniques like unit testing and staged rollout.
Comfortable working using pull requests and continuous integration.
Experience with observability tooling (eg, Graphite, Prometheus, Grafana).
Strong written skills in English and an ability to work in a distributed team.
Preferred technical and professional experience:
Experience maintaining systems within a compliance environment (eg, financial services, tools such as Auditree).
Previous experience as an SRE for a large-scale service, especially maintaining database and observability systems.
Significant experience with Linux, including networking and storage debugging.
Comfortable working with open-source tools, contributing fixes where needed.

Posted 2 weeks ago

Apply

8.0 - 13.0 years

30 - 45 Lacs

Pune, Bengaluru, Delhi / NCR

Work from Office

Job Title: Technical Program Manager, Platform Engineering
Grade: 2B
Experience: 8–12 years
Location: Gurgaon / Noida / Pune / Bangalore
Openings: 1
About the Role: We are looking for a seasoned and technically strong Technical Program Manager (TPM) to lead and coordinate high-impact platform engineering initiatives across distributed teams. This is a senior role suited for professionals who possess a solid background in software delivery and cloud infrastructure, paired with excellent program management and stakeholder coordination skills. As a TPM, you will be responsible for end-to-end delivery of complex technical programs that span multiple scrum teams. You will be the key driver of planning, execution, governance, and delivery excellence across platform modernization, CI/CD improvements, DevOps, and microservices enablement.
Key Responsibilities:
Program Management & Execution: Own and drive the planning and delivery of multiple concurrent platform engineering initiatives. Define and manage end-to-end program roadmaps, timelines, dependencies, and milestones. Build and maintain program documentation including delivery plans, risk logs, and decision records. Track delivery metrics (velocity, burn-down, blockers) and provide data-driven updates to leadership.
Agile Delivery Leadership: Lead agile ceremonies such as sprint planning, retrospectives, and daily stand-ups across teams. Coordinate Scrum of Scrums and cross-team program increment planning. Support backlog grooming, capacity planning, and release orchestration. Continuously improve team delivery processes by identifying and addressing inefficiencies.
Cross-functional Stakeholder Coordination: Act as a bridge between engineering, product, architecture, infrastructure, and business stakeholders. Facilitate discussions to resolve technical dependencies, team blockers, and resource conflicts. Build alignment across teams to ensure shared understanding of goals, priorities, and timelines. Collaborate closely with technical leads and architects to ensure solution consistency.
Platform Engineering & Technical Oversight: Contribute to strategic initiatives like cloud adoption, CI/CD automation, DevOps practices, and platform scalability. Lead modernization projects such as legacy system decomposition and infrastructure re-platforming. Partner with engineering leadership to define and implement best practices in reliability, observability, and deployment. Support efforts to reduce technical debt and increase platform stability.
Communication & Risk Management: Provide clear, concise, and timely updates to executive stakeholders. Escalate risks and issues with mitigation plans. Serve as the point of contact for all program-related communication, ensuring transparency and alignment.
Tech Stack & Tools:
Project Management: Jira, Azure DevOps, Confluence, Monday.com
Cloud Platforms: AWS, Azure, GCP
Agile Frameworks: SAFe, Scrum of Scrums, Kanban
Engineering Practices: CI/CD pipelines, DevOps toolchains, Microservices, SRE
Reporting: Agile metrics dashboards, KPI tracking tools, performance monitors
Qualifications:
8–12 years of experience in technical program management or engineering delivery.
Proven experience delivering large-scale, cross-team platform or infrastructure projects.
Deep knowledge of agile methodologies and hands-on experience with scaled agile frameworks.
Familiarity with modern engineering practices including DevOps, microservices, CI/CD, and cloud-native architecture.
Strong stakeholder engagement, conflict resolution, and communication skills.
Bachelor’s degree in Computer Science, Engineering, or a related field (Master’s or certifications like PMP, SAFe, CSM preferred).
Why This Role Matters: As a TPM in our platform engineering group, you will play a mission-critical role in enabling teams to build reliable, scalable, and secure digital infrastructure. You will help bring technical vision to life by ensuring that execution aligns with strategy, timelines are met, and outcomes deliver real value to the business. This is a leadership position that requires both technical fluency and programmatic discipline.

Posted 2 weeks ago

Apply

8.0 - 12.0 years

0 Lacs

karnataka

On-site

As a Site Reliability Engineering (SRE) Technical Leader on the Network Assurance Data Platform (NADP) team at ThousandEyes, you will be responsible for ensuring the reliability, scalability, and security of cloud and big data platforms. Your role will involve representing the NADP SRE team, working in a dynamic environment, and providing technical leadership in defining and executing the team's technical roadmap. Collaborating with cross-functional teams, including software development, product management, customers, and security teams, is essential. Your contributions will directly impact the success of machine learning (ML) and AI initiatives by ensuring a robust and efficient platform infrastructure aligned with operational excellence. In this role, you will design, build, and optimize cloud and data infrastructure to ensure high availability, reliability, and scalability of big-data and ML/AI systems. Collaboration with cross-functional teams will be crucial in creating secure, scalable solutions that support ML/AI workloads and enhance operational efficiency through automation. Troubleshooting complex technical problems, conducting root cause analyses, and contributing to continuous improvement efforts are key responsibilities. You will lead the architectural vision, shape the team's technical strategy and roadmap, and act as a mentor and technical leader to foster a culture of engineering and operational excellence. Engaging with customers and stakeholders to understand use cases and feedback, translating them into actionable insights, and effectively influencing stakeholders at all levels are essential aspects of the role. Utilizing strong programming skills to integrate software and systems engineering, building core data platform capabilities and automation to meet enterprise customer needs, is a crucial requirement. 
Developing strategic roadmaps, processes, plans, and infrastructure to efficiently deploy new software components at an enterprise scale while enforcing engineering best practices is also part of the role. Qualifications for this position include 8-12 years of relevant experience and a bachelor's degree in engineering, computer science, or an equivalent field. Candidates should have the ability to design and implement scalable solutions with a focus on streamlining operations. Strong hands-on experience in Cloud, preferably AWS, is required, along with Infrastructure as Code skills, ideally with Terraform and EKS or Kubernetes. Proficiency in observability tools like Prometheus, Grafana, Thanos, CloudWatch, OpenTelemetry, and the ELK stack is necessary. Writing high-quality code in Python, Go, or equivalent programming languages is essential, as well as a good understanding of Unix/Linux systems, system libraries, file systems, and client-server protocols. Experience in building Cloud, Big data, and/or ML/AI infrastructure, architecting software and infrastructure at scale, and certifications in cloud and security domains are beneficial qualifications for this role. Cisco emphasizes diversity and encourages candidates to apply even if they do not meet every single qualification. Diverse perspectives and skills are valued, and Cisco believes that diverse teams are better equipped to solve problems, innovate, and create a positive impact.

Posted 3 weeks ago

Apply

Featured Companies