15.0 - 20.0 years
17 - 22 Lacs
Bengaluru
Work from Office
Who We Are: The Cisco Distributed System Engineering (DSE) group is at the forefront of developing products that power the largest networks in the world. The networking industry is going through a massive transformation to build the next-generation infrastructure that meets the needs of AI/ML workloads and a continuously growing number of internet users and applications. We are uniquely positioned to capture that market transition. Who You'll Work With: This team builds products by harnessing the potential of open-source technologies while pushing the boundaries of systems and silicon architecture. They are developers and leaders who are passionate about tackling complex technology, building large-scale distributed systems, and working comfortably with open-source communities and technologies. As an Engineering Manager, you will manage a group of engineers and technical leads and collaborate extensively with other engineering managers across geographies. You will work in a fast-paced environment and be responsible for end-to-end product development and production support. What You'll Do: The SONiC team in DSE is looking for talented leaders to build the next-generation NOS, SONiC, for datacenter enterprise customers and service providers. In this role, you will be responsible for building and deploying DevOps solutions, running release engineering activities, and collaborating with engineering teams, technical leads and architects. You will adopt new development tools and infrastructure, manage release engineering activities and CI/CD pipelines, devise efficient build mechanisms and accelerate the software development process. In this role you will: Mentor and develop engineering talent, fostering a culture of continuous learning and professional growth within the broader SONiC Group. Give technical input to the team on DevOps and Release Engineering areas. Guide team members technically, and participate in design discussions and code reviews.
Own the deliverables schedule and track the progress of projects. Collaborate closely with cross-functional teams, including development, product management and security. Be a role model of company culture and values. Who You Are: You are an experienced leader with a successful track record of leading product teams responsible for designing and developing high-quality networking products. Around 15 years of work experience, with 5 years of hands-on management experience. You have managed at least 5-8 full-time employees. Prior work experience with DevOps, Release Management and SRE activities. You have worked in these areas for at least 5 years as a core developer. Excellent understanding of Docker, containerized environments, Jenkins, databases, open-source tools and security vulnerabilities. Besides managing the team, you will use your domain expertise to participate in technical discussions. You are comfortable going deep into networking areas and providing input to the team whenever necessary. Eager to participate in code reviews in Python whenever required. Enjoy leading fast-paced, high-performance agile development teams. You like leading the team from the front, identifying team strengths, and building in-house talent and competencies. Good at hiring and retaining top talent across grades. Involved in performance evaluation, salary planning, budgeting and forecasting. Proponent of innovation and process improvements in the team, with a focus on improving team productivity. Excellent written and verbal communication and negotiation skills. Quick learner, self-motivated.
Posted 2 months ago
10.0 - 15.0 years
10 - 20 Lacs
Hyderabad, Pune, Ahmedabad
Work from Office
Skills: SRE, AWS DevOps, Azure DevOps. Education: B.Tech, B.Sc, BCA. Years of Experience: 3-15 Yrs. Location: Pan India.
Posted 2 months ago
10.0 - 13.0 years
35 - 50 Lacs
Chennai
Work from Office
Job Summary: We are seeking an experienced R2 Architect with 10 to 13 years of experience in SRE, DevOps and SRE concepts. The ideal candidate will work in a hybrid model, primarily during the day shift. This role does not require travel. The candidate will play a crucial role in ensuring the reliability and efficiency of our systems, contributing to the company's overall success and societal impact. Responsibilities: Lead the design and implementation of SRE practices to enhance system reliability and performance. Oversee the development and maintenance of automated solutions for system monitoring and incident response. Provide technical guidance and mentorship to the SRE team to ensure best practices are followed. Collaborate with cross-functional teams to identify and address system bottlenecks and performance issues. Implement and manage CI/CD pipelines to streamline software delivery processes. Develop and maintain comprehensive documentation for SRE processes and procedures. Conduct regular system audits and performance reviews to ensure optimal operation. Implement robust incident management protocols to minimize downtime and service disruptions. Monitor system health and performance metrics to proactively address potential issues. Drive continuous improvement initiatives to enhance system reliability and efficiency. Ensure compliance with industry standards and best practices in SRE and DevOps. Facilitate effective communication and collaboration between development and operations teams. Utilize data-driven insights to inform decision-making and optimize system performance. Qualifications: Possess extensive experience in SRE, DevOps and SRE concepts. Demonstrate proficiency in implementing and managing CI/CD pipelines. Exhibit strong problem-solving skills and the ability to address complex system issues. Have a solid understanding of automated monitoring and incident response solutions.
Show excellent communication and collaboration skills to work effectively with cross-functional teams. Maintain a proactive approach to system health and performance monitoring. Display a commitment to continuous improvement and staying updated with industry trends. Hold relevant certifications in SRE or DevOps practices. Bring a proven track record of enhancing system reliability and efficiency. Demonstrate the ability to mentor and guide team members in best practices. Exhibit strong organizational skills and attention to detail. Have experience in developing and maintaining comprehensive documentation. Show a commitment to ensuring compliance with industry standards and best practices.
Posted 2 months ago
13.0 - 18.0 years
35 - 55 Lacs
Bengaluru
Hybrid
SRE Manager. About Ushur | Ushur XOS | Ushur GenAI. Location: Bangalore. Work Mode: Hybrid. Experience: 12 to 18 Years. The Role: Our fast-growing team is seeking a Manager of SRE to join us as we pioneer Customer Experience Automation™ as an industry category. As the Manager of SRE you will be responsible for two important charters: (1) operate and manage Ushur's production cloud; (2) build a white-glove customer support and incident management function. The ideal candidate for this role will be passionate about building a healthy, high-performing team, and bring strong technical leadership, a customer-centric focus, and results-oriented action. You will begin as a player/coach while building and continuously improving execution, processes, tools/technology and analytics. Responsibilities: Build and manage a world-class SRE team. Design a 24x7 follow-the-sun organization, including seamless handover across regions. Mentor and grow a team focused on delivering white-glove support and incident management service. Drive a data-driven SRE strategy by defining and prioritizing SRE Objectives and Key Results (OKRs) aligned with the company mission. This includes setting measurable targets for key service level agreements. Manage the Enterprise Support function to deliver exceptional white-glove experiences at scale, in close partnership with our Customer Success, Solution Consulting and Engineering teams. Ensure that the Ushur platform runs reliably in production. Partner with the DevOps, Security and Engineering teams to automate deployment, monitoring and observability of the production cloud. Bring deep technical expertise in Ushur Customer Experience Automation. Provide customers with ongoing technical support and incident management for complex issues and support escalations.
Optimize and automate support processes, including improving the reliability of on-call processes, managing incidents, updating runbooks and documentation, reviewing RCAs and recommending solutions to reduce the recurrence and severity of incidents. Work cross-functionally to drive positive customer outcomes. Engage with Product, Sales, Customer Success, Solution Consulting, Security, and Engineering as necessary to make customers successful on our platform. Qualifications: 5+ years of experience in an SRE/CloudOps Manager/Lead role in Enterprise SaaS. Track record of developing and mentoring great talent, and of building and motivating high-achieving teams. Ability to lead diverse teams across multiple time zones. Business acumen: ability to quickly grasp and adapt to a variety of customer verticals, geographies, and business structures. Excellent verbal, written, and presentation skills, with the ability to absorb complex technical concepts and communicate them to a non-technical audience. Highly organized, collaborative and detail-oriented. Deep experience with AWS cloud services, REST APIs and Linux. Experience with DevOps processes and with build, deployment, and orchestration technologies. Passion for technology and for being a part of a fast-growing SaaS startup where we move quickly and wear many hats. Flexible approach, able to operate effectively with uncertainty and change. Driven, self-motivated and enthusiastic, with a can-do attitude. Benefits: Great Company Culture. We pride ourselves on having a values-based culture that is welcoming, intentional, and respectful. Bring your whole self to work. We are focused on building a diverse culture, with innovative ideas, where you and your ideas are valued. We are a start-up and know that every person has a significant impact! Rest and Relaxation. 20 days of flexible leave per year, a monthly Wellness Day (aka a day off to care for yourself) and more! Health Benefits.
Preventive health checkups, Medical Insurance covering the dependents, wellness sessions, and health talks at the office Keep learning. One of our core values is Growth Mindset - we believe in lifelong learning. Certification courses are reimbursed. Ushur Community offers wide resources for our employees to learn and grow. Flexible Work. In-office or hybrid working model, depending on position and location. We seek to create an environment for all our employees where they can thrive in both their profession and personal life. Why join us? We are passionate about Ushur, our product, and helping our employees grow and develop in their career in a caring, collaborative environment. We offer a very competitive compensation plan & stock options for the ideal candidates.
Posted 2 months ago
3.0 - 6.0 years
8 - 12 Lacs
Bengaluru
Work from Office
In this Site Reliability Engineer role, you will work closely with the entire IBM Cloud organization to maintain and operationally improve the IBM Cloud infrastructure. You will focus on the following key responsibilities: Respond promptly to production issues and alerts 24x7. Execute changes in the production environment through automation. Implement and automate infrastructure solutions that support IBM Cloud products and services to reduce toil. Partner with other SRE teams and program managers to deliver mission-critical services to IBM Cloud. Build new tools to improve automated resolution of production issues. Support the compliance and security integrity of the environment. Continually improve systems and processes regarding automation and monitoring. Required education: Bachelor's Degree. Preferred education: Master's Degree. Required technical and professional expertise: Excellent written and verbal communication skills. 5+ years of experience handling large production system environments. Must be extremely comfortable using and navigating within a Linux environment. Ability to do low-level debugging and problem analysis by examining logs and running Unix commands. Must be efficient in writing and debugging scripts. 3-5+ years of experience in virtualization technologies and automation / configuration management. Automation and configuration management tools/solutions: Ansible, Python, bash, Terraform, GoLang, etc. (at least one). Virtualization technologies: Citrix Xen Hypervisor (preferred), KVM (also preferred), libvirt, VMware vSphere, etc. (at least one). Monitoring technologies: Zabbix, Sysdig, Grafana, Nagios, Splunk, etc. (at least one). Working knowledge of container technologies: Kubernetes, Docker, etc.
Flexibility to work in shifts to handle production systems. Preferred technical and professional experience: Good experience with public cloud platforms and Kubernetes clusters, strong Linux skills for managing services across a microservices platform, and good SRE knowledge of Cloud Compute, Storage and Network services.
Posted 2 months ago
4.0 - 9.0 years
17 - 22 Lacs
Bengaluru
Work from Office
Job Summary: We are seeking a highly skilled, experienced Site Reliability Engineer (SRE) to join our team in Bangalore. The ideal candidate will excel in implementing SRE principles to foster a culture of reliability, automation, and monitoring across our software engineering projects. This role is pivotal in ensuring the effective design, development, testing, and support of applications and systems, particularly within cloud environments. Software Requirements: Required Proficiency: Programming Languages: TypeScript, Node.js. Cloud Environments: AWS (ECS Fargate, Vault, Lambda services, Artifactory). CI/CD Tools: GitHub Actions, JFrog Artifactory, Sysdig, Octopus, Terraform. Observability Tools: ObStack, Prometheus, Grafana, PagerDuty, Observe. Infrastructure as Code (IaC) Tools: CloudFormation, Terraform. Preferred Proficiency: Familiarity with additional programming languages or frameworks. Experience with cloud platforms other than AWS. Overall Responsibilities: Partner with senior stakeholders to lead a culture focused on data-driven reliability, monitoring, and automation in alignment with SRE principles. Design, develop, test, and support applications and systems, with emphasis on managing and scaling distributed systems across cloud environments. Create and develop tools essential for the operational management and security of software applications and systems. Identify technology limitations and deficiencies in existing systems and implement scalable improvements. Drive automation efforts and enhance application monitoring capabilities. Review code developed by other engineers to ensure adherence to best practices. Thrive in incident response environments, conducting post-mortem analyses and designing secure solutions. Measure and optimize system performance, addressing customer needs and innovating for continuous improvement.
Technical Skills (By Category): Programming Languages: Required: TypeScript, Node.js Cloud Technologies: Required: AWS (ECS Fargate, Lambda, Vault, Artifactory) Development Tools and Methodologies: Required: GitHub Actions, JFrog Artifactory, Sysdig, Octopus, Terraform Observability Tools: Required: ObStack, Prometheus, Grafana, PagerDuty, Observe Infrastructure as Code (IaC): Required: CloudFormation, Terraform Experience Requirements: 7 to 10 years of experience in software engineering and SRE practices. Experience in applying SRE practices in large organizations. Familiarity with modern software development practices and DevSecOps environments. Day-to-Day Activities: Collaborate with stakeholders to understand business needs and implement SRE practices. Lead cross-functional teams in enhancing system reliability and performance. Develop and maintain operational management tools for applications. Conduct regular code reviews and ensure adherence to best practices. Participate in incident response and post-mortem analysis to improve system resilience. Qualifications: Required: Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field. Commitment to continuous professional development through industry certifications and training. Professional Competencies: Strong critical thinking and problem-solving skills. Excellent leadership and teamwork abilities. Effective communication and stakeholder management skills. Adaptability and a learning-oriented mindset. Innovative thinking to drive continuous improvement. Strong time and priority management skills.
Posted 2 months ago
1.0 - 5.0 years
8 - 15 Lacs
Bengaluru
Work from Office
Junior DevOps Engineer / DevOps Engineer. Location: Bengaluru South, Karnataka, India. Experience: 1.5-3 Years. Compensation: 8-15 LPA. Employment Type: Full-Time | Work From Office Only. Are you an aspiring DevOps professional ready to work on a transformative platform? Join a purpose-led team building India’s most disruptive ecosystem at the intersection of technology, property, and sustainability. This role is ideal for engineers who are eager to learn, automate, and contribute to building reliable, scalable, and secure infrastructure. Key Responsibilities: Assist in designing, implementing, and managing CI/CD pipelines using tools like Jenkins or GitLab CI to automate build, test, and deployment processes. Support the deployment and management of cloud infrastructure, primarily on AWS, with exposure to Azure or GCP. Contribute to infrastructure as code practices using Terraform, CloudFormation, or Ansible. Participate in maintaining and operating containerized applications using Docker and Kubernetes. Implement and manage monitoring and logging solutions using Grafana, Loki, Prometheus, or the ELK stack. Collaborate with engineering and QA teams to streamline release pipelines, ensuring high availability and performance. Develop basic automation scripts in Python or Bash to optimize and streamline operational tasks. Gain exposure to serverless and event-driven architectures under guidance from senior engineers. Troubleshoot infrastructure issues and contribute to system security and performance optimization. Requirements: 1.5 to 3 years of experience in DevOps, SRE, or related infrastructure roles. Solid understanding of cloud environments (AWS preferred; Azure/GCP a plus). Basic to intermediate scripting knowledge in Python or Bash. Familiarity with CI/CD concepts and tools such as Jenkins, GitLab CI, etc. Working knowledge of Docker and introductory experience with Kubernetes.
Exposure to monitoring and logging stacks (Grafana, Loki, Prometheus, ELK). Understanding of infrastructure as code using tools like Terraform or Ansible. Familiarity with networking, DNS, firewalls, and system security practices. Strong problem-solving skills and a learning mindset. Preferred Qualifications Certifications in AWS, Azure, or GCP. Exposure to serverless architectures and event-driven systems. Experience with additional monitoring tools or scripting languages. Familiarity with geospatial systems, virtual mapping, or sustainability-oriented platforms. Passion for eco-conscious technology and impact-driven development. Why You Should Join Contribute to a next-gen PropTech platform promoting sustainable and inclusive land ownership. Work closely with senior engineers committed to mentorship and ecosystem building. Join a team where your ideas are valued, your skills are sharpened, and your work has real-world impact. Be part of a vibrant, office-first culture that encourages innovation, collaboration, and growth.
Posted 2 months ago
7.0 - 12.0 years
20 - 25 Lacs
Hyderabad, Pune, Bengaluru
Work from Office
Hi, greetings from GSN! It is a pleasure connecting with you. We have been in Corporate Search Services, identifying and bringing in stellar, talented professionals for our reputed IT / non-IT clients in India. We have been successfully meeting the varied needs of our clients for the last 20 years. At present, GSN is hiring for SRE Production Support for one of our leading MNC clients. Please find the details below. Experience: 6+ Yrs. Budget: 15 LPA - 25 LPA. Work Location: BLR/HYD/PUNE. Mode: WFO (5 days in office). Work Timing: 24/7 (cab facility and shift allowance will be provided). Whom are we looking for? We are looking for an experienced SRE (Site Reliability Engineer). Should have worked in Application Support (Java/.Net). Experience in L2 or L3 application support (alert configuration + dashboard creation). Experience with Release Management and Production Deployment. Experience in Splunk. Experience with Grafana. If interested, kindly APPLY for an IMMEDIATE response. Thanks & Rgds, KAVIYA | GSN | Kaviya@gsnhr.net | Google Reviews: https://g.co/kgs/UAsF9W
Posted 2 months ago
7.0 - 10.0 years
15 - 20 Lacs
Hyderabad, Pune, Bengaluru
Work from Office
Urgent Hiring | SRE L2 Support | 5 Days WFO | Immediate Joiners Only. Job Details: Experience: 7-12 Years. Location: Work From Office, all 5 days. Work Days: 5 Days/Week. Joining: Immediate Only. Key Responsibilities: SRE + Application Support (Java/.Net) + Release Management + Production Deployment + L2 Support (Alert Configuration + Dashboard Creation) + Splunk + Grafana knowledge. JD: Experience in supporting large-scale distributed systems involving multiple data centers/PODs, load balancers, databases, middleware, and multiple backend services, including microservices. Hands-on experience in diagnosing and resolving production issues, including performance degradation, intermittent issues, log analysis, network failures, database failures, and code errors. Experience in implementing standard maintenance activities like DR failover and testing, security patching, and service account and certificate renewals. Experience in production deployment using CI/CD pipelines. Splunk query skills: ability to write effective Splunk queries for data analysis and monitoring. Thanks & Rgds, GSN HR || Email: Shobana@gsnhr.net || Web: www.gsnhr.net || Google Reviews: https://g.co/kgs/UAsF9W
Posted 2 months ago
5.0 - 10.0 years
12 - 14 Lacs
Hyderabad
Work from Office
We are seeking a dedicated and skilled Operations Engineer to join our team. This role is pivotal in ensuring the reliability, performance, and availability of our systems while facilitating smooth integration and delivery processes. The ideal candidate will have a strong background in site reliability engineering (SRE) and DevOps practices. You will collaborate with product owners, developers, architects, vendors, and other professionals to monitor, operate, support, audit and improve our digital solutions, their related processes, and controls. You will demonstrate and maintain high standards while fostering a proactive, efficient, and service-oriented work environment. Communication and professionalism are paramount, as you will be representing our team and engaging with technical and business leadership as well as external providers of digital services. You will also explain solutions and complex issues clearly, while demonstrating the ability to lead and impart knowledge effectively to other team members. Operational Quality & Compliance: Ensure high standards of operational quality across all systems. Review and update procedures to ensure compliance with audit controls, and support internal and external audits of the development and operation of the platform. Metrics and Monitoring: Develop and maintain comprehensive monitoring solutions to track system performance, health, and reliability, including alerts and dashboards. Incident Response: Provide first-level support for production incidents, ensuring quick resolution and minimal downtime. Identify problems, escalate them and support their resolution. Reliability Improvements: Implement strategies to enhance system reliability and performance. Identify, analyze, and resolve patterns in operational issues, implementing solutions to prevent recurrence. TECHNICAL SKILLS: Proficiency in monitoring tools (e.g., AppInsights, Grafana). Experience with cloud platforms (e.g.
Azure, GCP). Strong scripting and automation skills (e.g., PowerShell, Python). Familiarity with incident management processes. Understanding of containerization technologies (e.g., Kubernetes). Troubleshooting of complex distributed environments. Collaboration Skills: Work closely with product and project teams to integrate reliability best practices. Collaborate to streamline development and operational processes, enhancing overall efficiency. EDUCATION/CERTIFICATIONS: Preferred: Bachelor's degree in Computer Science, Software Engineering or Information Systems, equivalent work history/experience, or working towards achieving a degree. Strong focus on systems engineering, reliability, and performance. Experience in development operations, automation, and troubleshooting. EXPERIENCE: Strong knowledge of IT infrastructure services required. 5+ years: IaC technologies leveraging Terraform (e.g., ADO, Pipelines, Git, YAML). 5+ years: Orchestration and containerization using Kubernetes. 5+ years: API integration of infrastructure systems such as Azure, ServiceNow, Active Directory. 4+ years: Azure Public Cloud solutions. Experience with high-availability, globally delivered solutions and strong troubleshooting skills. Familiarity with incident management processes. Microsoft Cloud Infrastructure Certification, SRE Certification. Proficient in scripting and automation, with a solid understanding of infrastructure as code practices. LEADERSHIP/SOFT SKILLS: Strong Verbal and Written Communication: "Candidates must demonstrate exceptional verbal and written communication skills to effectively convey information and collaborate with team members." Effective Communicator: "The ideal candidate will be an effective communicator who can articulate ideas clearly and concisely to diverse audiences." Adaptable Communication Style: "We value candidates who can adjust their communication style based on the audience and context, ensuring clarity and understanding."
Posted 2 months ago
4.0 - 9.0 years
12 - 22 Lacs
Bengaluru
Hybrid
An opportunity for a Service Reliability Operator (SRO) who will ensure the availability and resiliency of our Cloud services 24x7x365. The ideal candidate will have a pulse on Oracle's SaaS services and be accountable for the troubleshooting and resolution of service issues. Additionally, you will have the opportunity to create future automation and tooling that will allow us to continuously improve our service. Your role in driving improvements in availability, effort and velocity will delight our customers while reducing the cost of operations. You will leverage excellence in communication, technical/business analysis, problem solving and attention to detail to methodically resolve issues. Responsibilities: In this role you will need to cover: Technical resolution of service issues. Automation of day-to-day operations work. Troubleshooting: have a deep understanding of our services and dependencies in order to respond quickly and efficiently to major incidents and minimize service disruptions when they occur. Identify the processes that become bottlenecks in operations management and resolve them through process improvement and automation. Stay informed of new technologies and innovate. Ownership: understand internal team processes and ensure compliance with them. Administer production servers/services and test system health. Offer mitigation paths to accelerate system recovery. Work with system monitoring and alerting tools to identify the source of trouble. Execute defined SOPs to avoid or reduce event impact duration. Undertake Incident Command training and gain experience working on an on-call rotation. Contribute to technical resolution of service issues. Contribute to automation of day-to-day operations work. Our Ideal Candidate: Bachelor's degree in CS, EE, or equivalent. 5+ years’ work experience in supporting production services. Excellent working experience in Unix/Linux/Windows OS. Exposure to DevSecOps tools and OCI.
Excellent working/troubleshooting experience with application middleware and Tomcat/WebLogic servers. Demonstrable experience in one or more scripting/programming languages: Python, Java, Perl, shell. Strong communication and analytical skills. Understanding of virtualization solutions and Cloud services. Able to work as part of a shift in a 24x7x365 operations team. Understanding of monitoring/dashboards (e.g., Enterprise Manager, Grafana, Kibana or equivalent, Splunk, etc.). Excellent problem-solving skills. Technical background with an ability to troubleshoot issues impacting large-scale service architectures and application stacks. Handles hard problems with a positive "can do" attitude. Team player, able to work with others of all skill levels. Understanding of AI from an operations perspective.
Posted 2 months ago
12.0 - 15.0 years
30 - 45 Lacs
Hyderabad, Bengaluru
Work from Office
Position Overview: We are looking for an SRE Architect who will work with technology experts to design optimal solutions to our customers' requirements. This is achieved through interactive requirements gathering, determination of best-fit solutions based on problem-solving approaches, integrated solution design based on multiple technology types, and a strong ability to present and articulate solutions to senior members of the customer teams. Roles and Responsibilities: Own the infrastructure and APM, and work with developers and systems engineers to build, release, monitor and run services with reliability exceeding the agreed SLAs. Write software to automate API-driven tasks at scale and contribute to the product codebase in Java, JS, React, Node, Go and Python. Write automation to reduce toil and eliminate manual tasks that are repeatable. Work with Ansible, Puppet, Chef, Terraform or another config management / orchestration suite, know where it is broken, work towards fixing it and explore new alternatives. Define and accelerate implementation of support processes, tools and best practices. Maintain services once they are live by measuring and monitoring availability, latency and overall system reliability. Handle cross-team performance issues, from identifying the cause and determining the areas of improvement to driving those actions to closure. Baseline the performance and maturity of systems, tools maturity and coverage, metrics, technology and engineering practices. Define, measure and improve reliability metrics (SLO/SLI), observability (monitoring, logging and tracing solutions) and ops processes (incident and problem management), and streamline and automate release management. Build dashboards to provide visibility into the performance of the applications. Purposefully create chaos in the production environment in a controlled manner to validate the reliability of systems.
Mentor and coach other SREs in the organization. Provide written and verbal updates to executives and the stakeholders of the application in the organization. Understand the current process and system setup, and propose the improvements needed in processes and technology so that the application exceeds the desired Service Level Objective. Strong believer in automation as a way to bring in sustained continuous improvement by automating toil and runbooks and improving the ability of applications to auto-heal, leading to improved reliability. Must Have Skills: The successful candidate will have the following attributes/qualifications: 15+ years of experience in development and operations of applications/services in production with uptime over 99.9%. 8+ years of experience as an SRE handling web-scale applications. Strong hands-on coding experience in one or more programming languages such as Python, Golang, Java, Bash, etc. Good understanding of observability (monitoring, logging, tracing, metrics) and chaos engineering concepts. Proficiency in using observability tools (for example, New Relic, Datadog, etc.) for monitoring, logging and tracing. Expert-level hands-on knowledge of the public cloud platforms AWS and/or Google Cloud Platform. A professional-level certificate on one of the public clouds is highly desirable. Must have hands-on experience in using configuration management systems such as Ansible or SaltStack and infrastructure automation tools like Terraform or CloudFormation. Should have used alerting systems such as PagerDuty. Should have implemented solutions around Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for services. Measurement should have been both within a system and across systems in distributed systems. Should have supported Production Incidents (PIs) on critical applications of a company. Troubleshoot, debug, and diagnose operational issues and drive them to closure.
Understanding of software delivery life cycles, particularly Agile/Lean & DevOps. Proven experience in handling large scale and growing infrastructure across Data Centers and heterogeneous Cloud platforms. Experience as a service owner in managing large geographically diverse stakeholders. Ability to work with a creative, fast growing engineering team and motivate them to deliver their best work. History of driving innovation. Good to Have Skills: Familiarity with handling containerization (Kubernetes, Docker, Rancher, etc.) and Kafka, Yarn, ElasticSearch, etc. Source code management and Implementation of Security best practices. Tech Stack: Python, Falcon, Elastic Search, MongoDB, AWS (SQS, S3), Map Reduce. Networking knowledge. Contribution to open source community. Qualification: Master’s or Bachelor’s degree in Computer Science Engineering, or a related technical degree. Website: https://www.nomiso.io/ © NomiSo India Private Limited, 2023, All rights reserved
Posted 2 months ago
5.0 - 8.0 years
10 - 20 Lacs
Bengaluru
Hybrid
Job Title: SRE Engineer - Bangalore About Us Capco, a Wipro company, is a global technology and management consulting firm. It was awarded Consultancy of the Year in the British Bank Award and has been ranked among the Top 100 Best Companies for Women in India 2022 by Avtar & Seramount. With our presence in 32 cities across the globe, we support 100+ clients across banking, financial and Energy sectors. We are recognized for our deep transformation execution and delivery. WHY JOIN CAPCO? You will work on engaging projects with the largest international and local banks, insurance companies, payment service providers and other key players in the industry. These are projects that will transform the financial services industry. MAKE AN IMPACT Innovative thinking, delivery excellence and thought leadership to help our clients transform their business. Together with our clients and industry partners, we deliver disruptive work that is changing energy and financial services. #BEYOURSELFATWORK Capco has a tolerant, open culture that values diversity, inclusivity, and creativity. CAREER ADVANCEMENT With no forced hierarchy at Capco, everyone has the opportunity to grow as we grow, taking their career into their own hands. DIVERSITY & INCLUSION We believe that diversity of people and perspective gives us a competitive advantage. Key skills: SRE, Grafana, Python/Scripting, Unix, SQL Location: Bangalore (Hybrid - 3 days WFO) Shift Timings: 12:30pm-9:30pm Looking only for immediate joiners Technical Requirement: Job Summary: Capco is looking for a Bengaluru based Developer with prior experience in developing operational tooling. Candidates should have a technical background in Grafana, Python, and DBMS, and be prepared to support development at an Enterprise Command Center with 60+ India based SRE and Production Support Engineers.
• Design, build and maintain core infrastructure that enables our client to maintain 8,000+ users • Work on performance enhancements/developments as prioritized by Business and collaborate with Production Support for implementation • Support debugging of production issues across services • Lead initiatives to improve system stability and efficiency • Identify automation opportunities throughout daily processes to streamline tasks and increase operational efficiency • Collaborate across teams such as operations, IT, and business stakeholders to improve operational tools Desired Experience / Skills: • Bachelor’s degree, preferably in Computer Science or Engineering or other relevant technical fields • Expert on Unix/Linux using Python • Expert in a data management system (DBMS) • In-depth knowledge and experience with Grafana • Experience in developing Grafana Dashboard • Build monitoring solutions that alert on symptoms rather than on outages • Experience working within SRE, DevOps and Production Support • Ability to operate with a managed services mentality and utilize Agile methodologies to accomplish all tasks • Financial Services Industry experience preferable • Experience with knowledge management systems, preferably Confluence and SharePoint • Knowledge of Cloud – AWS, Azure and Snowflake would be considered a big plus If you are keen to join us, you will be part of an organization that values your contributions, recognizes your potential, and provides ample opportunities for growth. For more information, visit www.capco.com. Follow us on Twitter, Facebook, LinkedIn, and YouTube.
Posted 2 months ago
12.0 - 16.0 years
40 - 45 Lacs
Hyderabad
Work from Office
Overview In this role, we are seeking a Senior Manager Offshore Program & Delivery Management to oversee program execution, governance, and service delivery across DataOps, BIOps, AIOps, MLOps, Data IntegrationOps, SRE, and Value Delivery programs. This role requires strong expertise in offshore execution, cost optimization, automation strategies, and cross-functional collaboration to drive operational excellence. Manage and execute DataOps programs, ensuring alignment with business objectives, data governance standards, and enterprise data strategy. Oversee real-time monitoring, automated alerting, and self-healing mechanisms to improve system reliability and performance. Develop and enforce governance models and operational frameworks to streamline service delivery and execution roadmaps. Drive standardization and automation of pipeline workflows, report generation, and dashboard refreshes to enhance efficiency. Collaborate with global teams to support Data & Analytics transformation efforts and ensure sustainable, scalable, and cost-effective operations. Support proactive issue identification and self-healing automation, enhancing the sustainment capabilities of the PepsiCo Data Estate. Responsibilities Manage and oversee offshore teams delivering DataOps, BIOps, Data IntegrationOps, FinOps, AIOps, MLOps, and SRE initiatives to drive operational excellence. Implement governance frameworks, define KPIs, and establish operational SLAs to ensure efficiency and quality in offshore execution. Drive process standardization, cost optimization, and automation adoption to enhance service scalability and effectiveness. Collaborate with onshore teams, business leaders, and stakeholders to ensure seamless execution and alignment of offshore deliverables with business goals. Optimize resource utilization by leveraging automation and AI-driven insights to improve productivity and streamline operations. 
Ensure continuous improvement, risk mitigation, and compliance adherence across offshore programs to maintain operational integrity. Act as a key liaison between IT, business leaders, data stewards, and compliance teams to ensure alignment with regulatory and security requirements. Monitor and enhance end-to-end Data Operations and sustainment processes, including testing, monitoring, and support for global data products. Manage day-to-day DataOps activities, ensuring adherence to SLAs, incident resolution, and engaging with SMEs to meet business demands. Contribute to work intake and Agile management processes, supporting data platform teams in executing strategic initiatives effectively. Foster strong relationships with senior stakeholders and executives, ensuring transparency, proactive risk assessment, and continuous communication. Collaborate across teams to address cloud infrastructure and data service challenges, ensuring high system availability and performance. Develop and automate operational policies and crisis management functions to minimize downtime and enhance incident response. Champion a customer-obsessed culture, advocating for high-quality service delivery and continuous process enhancements. Build and develop a high-performing team, fostering a diverse and agile work environment that aligns with business objectives. Adapt quickly to changing priorities, ensuring teams remain productive and focused on key deliverables. Leverage cloud and high-performance computing expertise to establish trust, drive innovation, and enhance the overall customer experience. Qualifications 12+ years of technology experience in a large-scale global organization, preferably in the CPG industry. 8+ years of experience in Data & Analytics, with a strong understanding of data engineering, data management, and operations. 7+ years of cross-functional IT experience, collaborating across multiple teams and stakeholders. 
5+ years of leadership/management experience, overseeing teams and driving operational excellence. Familiarity with Site Reliability Engineering (SRE) principles, including automated issue resolution and scalability improvements. Excellent communication skills, with the ability to empathize with stakeholders and explain technical issues to varied audiences. Strong customer focus, advocating for end-user needs and delivering high-quality experiences. Proactive problem-solving mindset, taking ownership of issues and driving resolution. Ability to learn and adapt in a fast-paced environment, staying up to date with emerging technologies and methodologies. Experience in technical support and operations for mission-critical solutions in a Microsoft Azure environment. Proven ability to drive operational excellence, ensuring stability and performance in complex enterprise environments. Experience managing large-scale operational services in dynamic and evolving technology landscapes. Strategic thinking capabilities, focusing on cost efficiency, operational effectiveness, and delivery speed. Ability to develop and execute strategic plans, aligning technology roadmaps with business objectives. Strong relationship-building skills, fostering trust and collaboration across IT and business functions. Proven ability to align business and IT priorities, identifying mutually beneficial solutions. Experience leading cross-functional and virtual teams, effectively communicating vision and objectives. Demonstrated success in delivering high-impact results in complex and transformational projects. Experience with multi-country/global implementations, particularly involving data and analytics. Understanding of master data management, data governance, and analytics frameworks. Knowledge of data acquisition, data cataloging, and data management tools.
Strong influencing and negotiation skills, with the ability to engage and persuade stakeholders at all levels.
Posted 2 months ago
7.0 - 12.0 years
12 - 22 Lacs
Pune
Work from Office
Experience: 7+ Years Job Location: Pune Notice Period: 30 Days Job Description: • AWS Ecosystem: EKS, EC2, DynamoDB, Lambda, etc. • Dynatrace (or similar): the capacity planning team should include some members with Dynatrace experience, while the rest can have experience with similar tools. Roles & Responsibilities: • Develop and implement capacity planning strategies for AWS environments, focusing on EKS (Elastic Kubernetes Service). • Conduct continuous load testing to assess system performance under varying conditions and identify potential bottlenecks. • Analyze system metrics and usage patterns to forecast future capacity needs and recommend scaling solutions. • Collaborate with development and operations teams to ensure application performance aligns with business objectives. • Design and execute automated testing frameworks to simulate real-world usage scenarios. • Monitor cloud resource utilization and optimize costs while maintaining performance standards. • Prepare detailed reports on capacity trends, load testing results, and recommendations for improvements. • Stay updated on industry best practices and emerging technologies related to cloud infrastructure and capacity management. • 7+ years of proven experience with AWS services, particularly EKS, EC2, S3, and RDS. • Strong understanding of container orchestration and microservices architecture. • Experience with continuous load testing tools (e.g., JMeter, Gatling) and performance monitoring solutions. • Proficiency in scripting languages (e.g., Python, Bash) for automation tasks. • Excellent analytical skills with the ability to interpret complex data sets. • Strong communication skills to effectively collaborate with cross-functional teams.
Posted 2 months ago
10.0 - 14.0 years
8 - 15 Lacs
Hyderabad, Chennai, Bengaluru
Hybrid
Role & responsibilities Role: Triage Manager Experience: 10+ Years Work location: PAN India Job description: Strong and assertive communication skills, able to take command of bridges and discussions. Understanding of web applications, the typical Java stack, APIs and common errors, and familiarity with technical terminology. Know-how of application hosting (cloud / container / on-prem) and the components involved (GTM, LTM, Firewall, network, Infra). Ability to leverage tools and dashboards like Splunk, AppD and Grafana to ask the right questions during triage. Work experience in a production support environment. Has handled projects of this nature and worked on P1/P2 incident bridges.
Posted 2 months ago
8.0 - 10.0 years
9 - 13 Lacs
Hyderabad, Chennai, Bengaluru
Hybrid
Role & responsibilities Production support expertise with SRE Observability experience: Proactive issue identification using observability tools. Skills in using different monitoring and observability tools to track system performance. Production support activities including proactive identification of issues leveraging observability tools, correlating inputs from various dashboards and tools to drive resolution. Experience in swiftly identifying probable failure points through the analysis of multiple inputs from the logs, observability dashboards, recent application changes, infra and network changes, etc. Basic level of troubleshooting on every layer of the tech stack (Application, Database, Infra (Container platforms) and Network).
Posted 2 months ago
5.0 - 10.0 years
25 - 40 Lacs
Bengaluru
Work from Office
We are looking for a highly skilled Site Reliability Engineer (SRE) with strong Python development expertise to optimize, automate, and scale our infrastructure. This role requires deep experience in Python coding, DevOps practices, cloud infrastructure, and observability tools. You will work closely with engineering teams to build highly available, scalable, and reliable systems. Key Responsibilities: Develop and maintain automation tools using Python for deployment, monitoring, and scaling infrastructure. Build and manage CI/CD pipelines for faster and more efficient releases. Optimize system performance, reliability, and availability through proactive monitoring and observability tools. Troubleshoot and resolve production incidents with a focus on root cause analysis and automation to prevent recurrence. Implement Infrastructure-as-Code (IaC) using Terraform, Ansible, or similar tools. Manage cloud infrastructure (AWS, GCP, or Azure) with a strong focus on automation and security. Enhance monitoring and alerting using Prometheus, Grafana, Datadog, or similar tools. Collaborate with developers to implement best practices for performance, security, and scalability.
Posted 2 months ago
5.0 - 8.0 years
7 - 11 Lacs
Tamil Nadu
Work from Office
Duration: 12 Months Work Type: Onsite Position Description: Part of 24x7 support for the client's Pro Incident management team. This Technical Support Engineer is responsible for providing technical assistance to customers and internal teams, diagnosing and resolving issues, and ensuring the smooth operation of software, hardware, and systems. They will act as the first point of contact for technical inquiries and are expected to have strong problem-solving, communication, and customer service skills. Key Responsibilities: Troubleshooting and Diagnosis: Identifying the root cause of technical issues, whether hardware, software, or network related. Customer Support: Providing technical assistance to customers via phone, email, chat, or in-person. Root Cause Analysis: Conducting in-depth investigations to prevent future issues. Documentation: Maintaining accurate records of issues, solutions, and resolutions in a knowledge base or ticketing system. Collaboration: Working with various teams, including development, engineering, and product management, to resolve issues and implement improvements. System Monitoring: Continuously monitoring system performance and proactively addressing potential problems. Escalation: Knowing when and how to escalate complex issues to higher-level support teams. Meeting SLAs: Adhering to service level agreements (SLAs) for response and resolution times. Skills Required: Ample knowledge and experience in Incident and Problem Management. Good understanding of ITIL concepts and modern support models. Good understanding of and hands-on experience with ITSM tools and CI/CD pipelines. Hands-on experience with automation activities and related tools (Autosys, Control-M, ActiveBatch etc.). Hands-on experience in scripting languages such as Python, PowerShell, Java, JavaScript, JSON etc.
Ability to understand the customer requirements thoroughly and finish the work independently. Strong understanding of modern DevOps tools and concepts. Strong understanding of cloud environments and services. Skills Preferred: Experience with observability tools such as Datadog, Dynatrace, Splunk, Quantum Metric, Teams etc. will be a plus. Experience or certification in GCP, SRE, ITIL etc. is a plus. Good demonstration and presentation skills. Excellent communication skills, quick learning ability, self-confidence. Experience Required: More than 5 years of experience in IT Infrastructure support. Education Required: Bachelor's degree in IT or related field (BCA, BTech)
Posted 2 months ago
3.0 - 7.0 years
13 - 20 Lacs
Bengaluru
Work from Office
Job Description We're looking for a Platform Engineer - Data Infrastructure to join our Data Division, where you’ll work alongside a small but experienced team to scale and operate our internal data platform. This platform powers everything from application development to analytics and integrations, and your job will be to help make it more reliable, scalable, and easy for other engineers to use. You’ll spend your time building automation, supporting production systems, and improving how our engineering teams interact with data infrastructure. If you enjoy working with cloud-native services, improving operations, and building tools that simplify life for other developers, this role is a great fit. What You’ll Do: Help build and operate our internal data platform, supporting database technologies including SQL and NoSQL, data processing technologies, and data storage systems used across the company. Develop automation, tooling, and reusable components that enable other engineers to self-serve database resources and manage data infrastructure more easily. Contribute to the reliability and scalability of production systems by building resilient deployment patterns and participating in incident response. Write infrastructure-as-code (Terraform) to provision and manage cloud resources in a consistent, automated way. Partner with senior engineers to improve observability and monitoring across our data systems, and to define SLAs/SLOs. Participate in technical design discussions and code reviews, gaining exposure to platform architecture and infrastructure decisions. Stay current on cloud infrastructure best practices and emerging tools, and bring ideas to the team for how we can continuously improve. What We’re Looking For: 3+ years of experience in platform, SRE, DevOps, or infrastructure engineering roles. Experience managing or supporting cloud-based databases such as PostgreSQL, OpenSearch, DynamoDB, or MongoDB.
Familiarity with AWS core services (e.g., EC2, EKS, RDS, S3, IAM). Solid understanding of infrastructure as code using tools like Terraform or CloudFormation. Comfortable with tool building, shell scripting, and basic software engineering practices (e.g., Python, Ruby). Some experience with or interest in monitoring and alerting systems (e.g., Datadog, Sumologic, Honeycomb). Strong problem-solving skills, eagerness to learn, and a collaborative attitude. A growing interest in platform thinking—designing systems for other engineers, not just yourself. Experience with Kubernetes (EKS) and container orchestration is a strong plus. Why You’ll Love It Here: You’ll work on a modern platform stack and have room to grow your technical and product skills. You’ll be mentored by experienced engineers while owning meaningful parts of our infrastructure. You’ll contribute to an internal platform that has a direct impact on developer productivity across the company. You’ll be part of a company building tools that help real people in construction solve hard problems.
Posted 2 months ago
5.0 - 10.0 years
5 - 9 Lacs
Bengaluru
Work from Office
Job Title: Site Reliability Engineer Experience: 5-10 Years Location: Bangalore Desired Skills and Experience: Site Reliability Engineering (SRE), Kubernetes, DevOps/SysOps, Unix and Python scripting. Strong hands-on Kubernetes experience and an SRE/SysOps background. Familiar with the Kubernetes Operator pattern. Proficient with any modern observability tool stack. Scripting in Unix shell or Python.
Posted 2 months ago
5.0 - 10.0 years
7 - 11 Lacs
Mumbai
Work from Office
Job Title: Application Support SRE Experience: 5-10 Years Location: Mumbai - Hybrid Responsibilities: 5 to 8 years in a similar role as a hands-on application / middleware specialist. Prior experience of working in a global financial organization is an advantage. Client is looking to onboard an application support and SRE specialist for their Application and Data Engineering (ADE) group. ADE provides application engineering, tooling, automation and elevated production support services that conform to company security blueprints and focus on performance, reliability and scalability. The work involves understanding the technical requirements from application owners and the business, participating in technical evaluation of vendors and vendor technologies, conducting proofs of concept, and packaging and deploying middleware products. Skills: Linux, Python/Shell, Database (Sybase, DB2), Web Servers
Posted 2 months ago
2.0 - 5.0 years
3 - 5 Lacs
Chennai
Work from Office
Responsibilities: Write, configure, and deploy code that improves service reliability for existing or new systems; set standard for others with respect to code quality. Provide helpful and actionable feedback and review for code or production changes Drive repair/optimization of complex systems with consideration towards a wide range of contributing factors. Lead debugging, troubleshooting, and analysis of service architecture and design. Participate in on-call rotation and provide 24x7 support Write documentation: design, system analysis, runbooks, playbooks. Provide design feedback and uplevel design skills of others. Implement and manage SRE monitoring application backends using Java, Postgres, React, NoSQL and OpenTelemetry. Develop tooling using Terraform and other IaC tools to ensure visibility and proactive issue detection across our platforms. Work within GCP infrastructure, optimizing performance, and cost, and scaling resources to meet demand. Collaborate with development teams to enhance system reliability and performance, applying a platform engineering mindset to system administration tasks. Develop and maintain automated solutions for operational aspects such as on-call monitoring, performance tuning, and disaster recovery. Troubleshoot and resolve issues in our dev, test, and production environments. Participate in postmortem analysis and create preventative measures for future incidents. Skills Required: Strong experience with Java and React applications and desired familiarity with Terraform Provider development. Proficient with monitoring and observability tools, particularly OpenTelemetry or other tools. Proficient with cloud services, with a strong preference for Kubernetes and Google Cloud Platform (GCP) experience. Solid programming skills in Golang and scripting languages, with a good understanding of software development best practices. Experience with relational and document databases. 
Ability to debug, optimize code, and automate routine tasks. Strong problem-solving skills and the ability to work under pressure in a fast-paced environment. Excellent verbal and written communication skills. Experience Required: 2+ years of experience as an SRE, DevOps Engineer, Software Engineer or similar role. Experience Preferred: willing to work in 24/7 shift Education Required: Bachelor's degree in Computer Science, Engineering, Mathematics or equivalent experience
Posted 2 months ago
10.0 - 20.0 years
20 - 35 Lacs
Ahmedabad, Gujarat, India
On-site
Core Responsibilities - 1) Technical Presales 2) Requirement Understanding and provide solutions 3) Architect level exposure 4) Creating custom solutions and accelerator 5) Strong in customer communication and client relationship 6) Ready for Onsite - Customer visit Technical Responsibilities - 1. Architecture Design & Implementation - Design and implement scalable, highly available, and secure infrastructure. Define best practices and standards for CI/CD pipelines, Infrastructure as Code (IaC), and container orchestration. Architect cloud-native and hybrid solutions leveraging platforms like AWS, Azure, and GCP. Design microservices architecture, API gateways, and ensure fault tolerance and high availability. 2. DevOps Strategy & Automation - Develop a comprehensive DevOps strategy aligning with business and technical requirements. Automate provisioning, configuration, and deployment processes using tools like Terraform, Ansible, and Kubernetes. Establish CI/CD pipelines for automated build, test, and deployment workflows. Ensure version control and branching strategies with GitHub, GitLab, or Bitbucket. 3. SRE and Reliability Engineering - Define and implement Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets. Monitor and improve system performance, reliability, and scalability. Automate incident response, root cause analysis, and post-mortem processes. Implement observability with tools like Prometheus, Grafana, Datadog, and New Relic. 4. Security and Compliance - Define and enforce security best practices across infrastructure and applications. Implement secrets management, access controls, and secure networking. Ensure compliance with industry standards (e.g., SOC2, ISO 27001). Perform regular security audits, vulnerability assessments, and incident response. 5. Infrastructure as Code (IaC) & Configuration Management - Design and implement Infrastructure as Code (IaC) templates using Terraform, CloudFormation, or Pulumi. 
Manage configuration drift using Ansible, Chef, or Puppet. Automate infrastructure provisioning, scaling, and disaster recovery. 6. Monitoring, Logging, and Incident Management - Design and implement monitoring, alerting, and logging solutions. Implement centralized log management and distributed tracing. Define incident management and escalation processes. Conduct periodic chaos engineering and disaster recovery drills. 7. Collaboration and Stakeholder Management - Work closely with developers, QA, security, and product teams to ensure smooth releases. Establish DevOps best practices across development, QA, and operations. Collaborate with security and compliance teams to address audit and regulatory requirements. 8. Capacity Planning and Cost Optimization - Perform capacity planning and ensure optimal resource utilization. Optimize cloud and infrastructure costs through rightsizing and reserved instances. Analyze system performance and provide recommendations for cost efficiency. Additional Responsibilities: Tool and Technology Evaluation - Evaluate and introduce tools that align with the organization's DevOps and SRE needs. Continuously assess emerging technologies and trends to improve system reliability and efficiency. Incident Response & Disaster Recovery - Define and enforce Disaster Recovery (DR) and Business Continuity (BC) strategies. Conduct regular simulations and tests to ensure DR readiness. Mentoring and Leadership - Mentor DevOps, SRE, and development teams on best practices. Advocate for a culture of automation, monitoring, and continuous improvement. Conduct knowledge-sharing sessions and workshops.
Key Skills and Tools: Cloud Platforms: AWS, Azure, GCP CI/CD Tools: Harness, Jenkins, GitHub Actions, GitLab CI, Azure DevOps IaC Tools: Terraform, CloudFormation, Pulumi Containers & Orchestration: Docker, Kubernetes, Helm Monitoring & Logging: Prometheus, Grafana, ELK, Datadog Scripting/Automation: Bash, Python, PowerShell Version Control: Git, Bitbucket Security Tools: HashiCorp Vault, AWS IAM, Snyk, Aqua Security Success Metrics: Improved deployment frequency and reduced lead time for changes. Reduction in Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR). Higher uptime and availability through proactive monitoring. Compliance with security standards and reduced security vulnerabilities. Nice to have - Certifications in AWS, Azure, CKA, Terraform.
Posted 2 months ago
8.0 - 13.0 years
35 - 40 Lacs
Pune
Work from Office
About The Role: Job Title: DevOps Engineer, VP Location: Pune, India Role Description Corporate Banking (CB) is a technology centric business, with an increasing move to real-time processing, an increasing appetite from customers for integrated systems and access to supporting data. At CB Platform Automation Tooling team, we develop and manage CI/CD, monitoring, and various automation solutions as a service, running thousands of builds daily for more than 90 development teams across the Corporate Bank division of Deutsche Bank. Our environment currently relies on a Linux-based stack, open-source tools such as Jenkins, Helm, Ansible, Docker/Podman, as well as other popular tools like OpenShift and Terraform. We're scaling globally to fit our customer needs: our engineering team is expanding and will now be distributed over three Deutsche Bank Technology Centers in the US, Germany, and India. As a DevOps/Platform Engineer, you will be responsible for designing, implementing, and supporting reusable engineering solutions, as well as building and promoting a strong engineering culture. Deutsche Bank's Corporate Bank division is a leading provider of cash management, trade finance and securities finance. We complete green-field projects that deliver the best Corporate Bank - Securities Services products in the world. Our team is diverse, international, and driven by shared focus on clean code and valued delivery. At every level, agile minds are rewarded with competitive pay, support, and opportunities to excel. You will work as part of a cross-functional agile delivery team. You will bring an innovative approach to software development, focusing on using the latest technologies and practices, as part of a relentless focus on business value. You will be someone who sees engineering as team activity, with a predisposition to open code, open discussion and creating a supportive, collaborative environment.
You will be ready to contribute to all stages of software delivery, from initial analysis right through to production support. What we'll offer you As part of our flexible scheme, here are just some of the benefits that you'll enjoy: Best in class leave policy. Gender neutral parental leaves. 100% reimbursement under childcare assistance benefit (gender neutral). Sponsorship for Industry relevant certifications and education. Employee Assistance Program for you and your family members. Comprehensive Hospitalization Insurance for you and your dependents. Accident and Term life Insurance. Complementary Health screening for 35 yrs. and above. Your key responsibilities What You'll Do: Develop, maintain, and continuously improve the shared CI/CD, automation, and monitoring components, keeping focus on quality and user experience. Perform engineering assessments of the platform users' pipelines and approaches. Contribute to introducing modern industry practices into the teamwork and promoting them among the development teams. Assist the development teams with their ongoing activities, issues, and adopting our solutions. Take long-term responsibility for your tools and projects, and contribute to their sustainable development, testing, and maintenance. Your skills and experience Skills You'll Need: Deep understanding of common development tasks and problems. Background in Development, Quality Assurance, or SRE is a plus. Solid technical background in software development processes and hands-on experience with the tools that we use: Application development: Spring Boot, Kotlin/Java. VCS: Git, Bitbucket, GitHub. CI/CD: Jenkins, TeamCity, GitHub Actions. Build tools: Jib, Maven, Gradle, NPM. DevSecOps: SonarQube, JFrog Xray, Veracode. Deployments, configuration, and infrastructure management: Docker, Helm, Ansible, Terraform, Liquibase. Monitoring & SRE: Prometheus, Grafana, New Relic, Splunk. Scripting: Groovy, Python. Hands-on experience with container-based environments (Minikube, Kubernetes, OpenShift).
Knowledge of GCP is a plus. Strong communication and collaboration skills, readiness to take ownership of your tasks. Proactive mindset, attention to details, and constant wish to improve. Expectations It is the Bank's expectation that employees hired into this role will work in the Cary office in accordance with the Bank's hybrid working model. Deutsche Bank provides reasonable accommodations to candidates and employees with a substantiated need based on disability and/or religion. How we'll support you Training and development to help you excel in your career. Coaching and support from experts in your team. A culture of continuous learning to aid progression. A range of flexible benefits that you can tailor to suit your needs. About us and our teams Please visit our company website for further information: https://www.db.com/company/company.htm We strive for a culture in which we are empowered to excel together every day. This includes acting responsibly, thinking commercially, taking initiative and working collaboratively. Together we share and celebrate the successes of our people. Together we are Deutsche Bank Group. We welcome applications from all people and promote a positive, fair and inclusive work environment.
Posted 2 months ago