
23 PagerDuty Jobs

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

4.0 - 8.0 years

6 - 10 Lacs

Coimbatore

Work from Office

Source: Naukri

The Opportunity
Avantor is looking for a dynamic, forward-thinking, and experienced Engineer L0 - Scheduling and Alerting, who will be responsible for delivering results against some of the most complex business and technology initiatives. This role will be a full-time position based out of IND-Coimbatore. If you are passionate about solving complex challenges and driving innovation, let's talk!
Reporting to the Sr. Manager of IT Services, the IT Engineer Associate is responsible for supporting multiple applications across the organization, including Windows, AD, Citrix, Office 365, and additional applications such as PagerDuty and Redwood CPP, on which you will be trained. As a member of our well-respected IT team, you will enjoy a wide variety of self-directed work within a supportive team environment. A positive attitude with the desire to focus on and please the customer, including the ability to quickly understand the customer's point of view, will be key success factors in your role.
MAJOR JOB DUTIES AND RESPONSIBILITIES (List in order of importance)
Monitor event alerts, acknowledge and, when appropriate, escalate to the next level support team(s).
Perform in-depth monitoring for P1 and P2 critical applications and basic monitoring for P3 and P4 applications.
Notify the Outage Management Team as the first point of contact for critical P1 and P2 alerts to ensure timely escalation and resolution.
Schedule jobs in the SAP tool for different systems, ensure successful runs, and restart when required.
Clean up NAS backup server files.
Prepare a weekly error report and ensure tickets are created for all failed jobs.
Prepare weekly and monthly task performance/aging reports, drive aging calls with the wider team, and ensure tickets are closed on time or a justification is recorded if required.
Support IT changes: prioritize change requests, assess impact, and accept changes which meet requirements.
Maintain the internal knowledge repository.
Manage the ticketed query system and ensure queries and resolutions are tracked and kept up to date.
QUALIFICATIONS (Education/Training, Experience and Certifications)
Bachelor's degree or equivalent experience within an enterprise-level corporate IT environment is required.
Experience in IT monitoring is highly desirable.
Direct experience with Jenkins, NPrinting, CloudWatch, QlikView, SolarWinds, Redwood, OpManager and/or PagerDuty is highly desirable.
Certifications in AWS or ITIL are a plus.
KNOWLEDGE, SKILLS AND ABILITIES (Those necessary to perform the job competently)
Knowledge of ITIL-based Incident, Problem and Change Management processes.
Strong problem-solving and analytical skills.
Ability to self-start and to effectively participate in a team environment.
Ability to be an on-call escalation point for production support and scheduled off-hours/weekend work if/when required.
Ability to focus on the customer and to adhere to processes defined for customer issue handling.
Ability to examine, summarize, and effectively present data when required.
Commitment to high professional and ethical standards in a diverse workplace.
Disclaimer: The above statements are intended to describe the general nature and level of work being performed by employees assigned to this classification. They are not intended to be construed as an exhaustive list of all responsibilities, duties and skills required of employees assigned to this position. Avantor is proud to be an equal opportunity employer.
Why Avantor
Dare to go further in your career. Join our global team of 14,000+ associates whose passion for discovery and determination to overcome challenges relentlessly advances life-changing science. The work we do changes people's lives for the better. It brings new patient treatments and therapies to market, giving a cancer survivor the chance to walk his daughter down the aisle. It enables medical devices that help a little boy hear his mom's voice for the first time.
Outcomes such as these create unlimited opportunities for you to contribute your talents, learn new skills and grow your career at Avantor. We are committed to helping you on this journey through our diverse, equitable and inclusive culture which includes learning experiences to support your career growth and success. At Avantor, dare to go further and see how the impact of your contributions set science in motion to create a better world. Apply today! EEO Statement: We are an Equal Employment/Affirmative Action employer and VEVRAA Federal Contractor. We do not discriminate in hiring on the basis of sex, gender identity, sexual orientation, race, color, religious creed, national origin, physical or mental disability, protected Veteran status, or any other characteristic protected by federal, state/province, or local law. If you need a reasonable accommodation for any part of the employment process, please contact us by email at recruiting@avantorsciences.com and let us know the nature of your request and your contact information. Requests for accommodation will be considered on a case-by-case basis. Please note that only inquiries concerning a request for reasonable accommodation will be responded to from this email address. 3rd party non-solicitation policy:

Posted 3 days ago


1.0 - 3.0 years

10 - 15 Lacs

Pune, Bengaluru

Work from Office

Source: Naukri

Must have a minimum of 1 year of experience in SRE (CloudOps), Google Cloud Platform (GCP), and monitoring, APM, and alerting tools such as Prometheus, Grafana, ELK, New Relic, Pingdom, or PagerDuty. Hands-on experience with Kubernetes for orchestration and container management.
Required Candidate Profile
Mandatory experience working in B2C product companies. Must have experience with CI/CD tools (e.g., Jenkins, GitLab CI/CD, CircleCI, Travis CI).

Posted 1 week ago


10.0 - 19.0 years

13 - 22 Lacs

Hyderabad, India

Hybrid

Source: Naukri

Department: Information Technology Employment Type: Full Time Location: India Description V3locity, Vitech’s cloud-native administration, engagement, and analytics platform, is a transformative suite of complementary applications that offers full life cycle business functionality and robust enterprise capabilities. It marries core administration with superior digital experience and augmented analytics. Its modular design enables flexible, agile deployment strategies. V3locity employs an advanced, cloud-native architecture that leverages the unique capabilities of AWS to deliver a solution with unparalleled security, scalability, and resiliency. Senior Manager– IT Service Management (ITSM) Location: Hyderabad - Hybrid We are seeking a dynamic and experienced IT Service Management (ITSM) leader to lead and enhance our global IT and Cloud operations. The ideal candidate will oversee core ITSM functions, including Service Desk, Incident Management, Problem Management, Change Management, and Service Request Fulfillment in a 24/7, fast-paced software product environment. This leader will play a strategic role in driving continuous improvement, implementing best practices in ITSM, and maturing overall service delivery practices. What you will do: ITSM: Define and drive the ITSM strategy aligned with organizational goals and customer satisfaction. Lead and develop the ITSM function, including Service Desk, Incident, Problem, and Change Management teams based out of our Hyderabad Office. Drive adoption and maturity of ITIL practices across the IT organization. Service Desk Operations: Oversee global service desk operations, ensuring high-quality and timely technical support. Establish and monitor SLAs, KPIs, and customer satisfaction metrics. Ensure timely delivery of customer monthly SLA reporting, leveraging tools like New Relic. Manage on-call rotation for all Service Teams using tools like PagerDuty. 
Incident & Problem Management: Lead major incident response and communication processes, ensuring minimal impact and quick resolution. Drive root cause analysis, problem identification, and long-term resolution strategies. Maintain high availability and performance of business-critical services. Change & Release Management: Establish and govern change control procedures ensuring safe, secure, and timely releases. Collaborate with DevOps and engineering teams to align change processes with agile product development, deployment, and releases. ITSM Tools & Reporting: Own and optimize the ITSM platform (e.g., ServiceNow, Jira Service Management). Own and deliver our monthly client SLA reporting cadence to customers. Deliver regular operational reports, dashboards, and executive summaries leveraging Jira Service Management. Identify and implement continuous improvement opportunities based on data insights. Governance & Compliance: Ensure compliance with internal policies, external regulations (e.g., ISO, SOC2), and audit requirements. Maintain clear documentation and process alignment with industry standards (ITIL v4, COBIT). Team Development & Leadership: Lead, mentor, and develop a high-performing team of ITSM professionals. Foster a culture of accountability, collaboration, and service excellence. Manage vendor relationships and third-party service providers as needed. What We're Looking For: 12-15+ years of ITSM experience, with 5+ years in a Service Management role. Proven experience managing global service desk operations and ITIL processes in a product or SaaS environment. ITIL v4 certification; certifications in Agile/Scrum, COBIT, or PMP are a plus. High-level technical knowledge of, or certification in, AWS Cloud or other clouds. Hands-on experience with ITSM tools like ServiceNow, Jira Service Management, or similar. Working experience with tools in the Monitoring and Service Management space like New Relic, PagerDuty, Honeycomb, Splunk, etc.
Proven experience managing the incident lifecycle, problem, and change processes. Excellent communication, stakeholder management, and crisis management skills. Experience working with global teams across time zones. Prior experience in a software product or SaaS company is highly desirable. Strong business acumen and ability to align IT services with organizational goals. Able to work in shifts and lead the team technically to manage the tasks/issues that arise in the shift. Join Us at Vitech! At Vitech, you’ll be part of a forward-thinking team that values collaboration, innovation, and continuous improvement. We provide a supportive and inclusive environment where you can grow as a leader while helping shape the future of our organization.

Posted 1 week ago


6.0 - 9.0 years

18 - 20 Lacs

Pune

Work from Office

Source: Naukri

Notice Period: Immediate joiners only
Duration: 6 months (possible extension)
Shift Timing: 11:30 AM - 9:30 PM IST
About the Role
We are looking for a highly skilled and experienced DevOps / Site Reliability Engineer to join on a contract basis. The ideal candidate will be hands-on with Kubernetes (preferably GKE), Infrastructure as Code (Terraform/Helm), and cloud-based deployment pipelines. This role demands deep system understanding, proactive monitoring, and infrastructure optimization skills.
Key Responsibilities:
Design and implement resilient deployment strategies (Blue-Green, Canary, GitOps).
Configure and maintain observability tools (logs, metrics, traces, alerts).
Optimize backend service performance through code and infra reviews (Node.js, Django, Go, Java).
Tune and troubleshoot GKE workloads, HPA configs, ingress setups, and node pools.
Build and manage Terraform modules for infrastructure (VPC, CloudSQL, Pub/Sub, Secrets).
Lead or participate in incident response and root cause analysis using logs, traces, and dashboards.
Reduce configuration drift and standardize secrets, tagging, and infra consistency across environments.
Collaborate with engineering teams to enhance CI/CD pipelines and rollout practices.
Required Skills & Experience:
5-10 years in DevOps, SRE, Platform, or Backend Infrastructure roles.
Strong coding/scripting skills and ability to review production-grade backend code.
Hands-on experience with Kubernetes in production, preferably on GKE.
Proficient in Terraform, Helm, GitHub Actions, and GitOps tools (ArgoCD or Flux).
Deep knowledge of cloud architecture (IAM, VPCs, Workload Identity, CloudSQL, Secret Management).
Systems thinking: understands failure domains, cascading issues, timeout limits, and recovery strategies.
Strong communication and documentation skills; capable of driving improvements through PRs and design reviews.
Tech Stack & Tools
Cloud & Orchestration: GKE, Kubernetes
IaC & CI/CD: Terraform, Helm, GitHub Actions, ArgoCD/Flux
Monitoring & Alerting: Datadog, PagerDuty
Databases & Networking: CloudSQL, Cloudflare
Security & Access Control: Secret Management, IAM
Driving Results:
A good single contributor and a good team player.
Flexible attitude towards work, as per the needs.
Proactively identify & communicate issues and risks.
Other Personal Characteristics:
Dynamic, engaging, self-reliant developer.
Ability to deal with ambiguity.
Manage a collaborative and analytical approach.
Self-confident and humble.
Open to continuous learning.
Intelligent, rigorous thinker who can operate successfully amongst bright people.

Posted 1 week ago


5.0 - 10.0 years

19 - 22 Lacs

Pune

Work from Office

Source: Naukri

Job Description
We are looking for an ambitious and highly skilled Go Developer who is passionate about building high-performance, scalable backend systems. This role is perfect for someone who thrives on solving complex engineering challenges, enjoys working with modern development practices, and takes ownership of delivering impactful solutions. You will be part of a dynamic team where innovation, collaboration, and continuous improvement are not just encouraged; they are expected. It is a strong fit if you are eager to make a meaningful contribution to real-world systems in a fast-paced environment.
Skill / Qualifications
Bachelor's degree in Computer Science, Engineering, or a related technical field
5+ years of hands-on backend development experience
Strong programming expertise in Golang
Hands-on experience with MongoDB, OracleDB, and Snowflake
Proficiency in using Logstash, Elasticsearch, and Splunk (Queries, Alerts, Dashboards)
Experience in writing and maintaining scripts for automation and monitoring
Familiarity with containerization and orchestration using Docker and Kubernetes
Proficient in using Kafka for messaging and stream processing
Comfortable working with GitLab for version control and CI/CD pipelines
Experience handling incident alerts and escalations via PagerDuty
Job Responsibilities
Participate in daily stand-ups, code reviews, and sprint planning
Review code and tickets to ensure high-quality development practices
Design technical specifications for databases and APIs
Plan and execute production deployments reliably and efficiently
Provide Level 2 on-call support via PagerDuty for escalated incidents
Collaborate with cross-functional teams including QA, DevOps, and product stakeholders
Ensure effective incident response and root cause analysis for production issues
Benefits
Competitive Market Rate (Depending on Experience)

Posted 1 week ago


5.0 - 8.0 years

7 - 12 Lacs

Hyderabad

Work from Office

Source: Naukri

Job Description The role of the Lead Site Reliability Engineer is to be hands-on and provide mentorship to other team members on core SRE principles and tools. The lead SRE will participate in end to end operational aspects of Production environment. The individual concerned will be able to work on cloud systems, networks, databases and help drive incident lifecycle management. As a member of the SRE team, you will also be working closely with the Architects, DevOps, Product and development teams to ensure we get the most out of the software on AWS platform. This role requires a highly skilled technology professional with excellent communication skills, strategic mindset, strong analytical and troubleshooting skills on AWS Cloud Platform. Other responsibilities include working with internal business partners to gather requirements, prototyping, architecting, implementing/updating solutions, building and executing test plans, performing quality reviews, managing operations, and triaging and fixing operational issues. Site Reliability Engineers must be able to adjust to constant business change; common types of changes include new requirements, evolving goals and strategies, and emerging technologies. About the Role: Be hands-on and provide mentorship to a growing SRE team on core SRE principles and tools. 
Foster a sense of automation in issue resolution; everything possible should be automated, and only when automation can't resolve an issue should people get involved in the resolution.
Lead efforts for updating production with new versions/infrastructures as they are available.
Lead capacity planning efforts in collaboration with Architects and DevOps engineers to determine changes to infrastructure that are needed to support new load and performance characteristics.
Lead engagement with software developers, DevOps and other infrastructure engineers to integrate software development and delivery from inception to full operation, ensuring robust released software and systems.
Ensure the highest level of uptime to meet the customer SLA by implementing system-wide corrections to prevent recurrence of issues.
Mentor other SRE team members to further develop their soft and hard skills.
Triage, troubleshoot and resolve issues using golden signals.
Go past golden signals with additional principles such as chaos engineering to detect failure points, lead Game Days to test the team's resiliency in incident response and remediation, and use synthetic monitoring.
Lead SRE team members to create and maintain Recovery Procedures and RCAs in collaboration with other engineering teams.
Ensure incidents assigned to the team are managed within agreed SLAs.
Ensure alarms are documented in up-to-date Knowledge Base Articles.
Ensure Production infrastructure is up to date with server/security patches and certificates.
Continuously improve system and application monitoring and automation.
Identify and automate manual workarounds and process improvements.
Proactively monitor the availability, latency, scalability and efficiency of all services.
Perform periodic on-call duty as part of the SRE team.
About You:
Skilled with cloud operations/administration in Amazon AWS.
Tax/Accounting domain experience.
Bachelor's or Master's degree in a Computer Science discipline.
5+ years of experience focused on Site Reliability Engineering or a related position on the AWS Cloud Platform.
At least 2 AWS certifications are a must (AWS SysOps Administrator and Architect certifications preferred).
Experience working with SQL, Windows Servers, load balancers, and Linux.
Deep experience with AWS, Docker and Kubernetes, CloudFormation, CloudWatch, CodeDeploy, DynamoDB, Lambda, SQS, Amazon FSx, Elasticsearch and networking concepts is a must.
Program at a high level in at least one language such as Java, C#, JavaScript, Python or Ruby.
Integration experience with PagerDuty, ServiceNow, Datadog, and CloudWatch.
Good understanding of Site Reliability Engineering (SRE) philosophies, technologies, platforms and tools, SLO management, incident resolution, and automation.
Ability to explain technical concepts in clear, non-technical language.
Working knowledge of infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage, and networks).
Knowledge of security and compliance standards such as SOC/PCI is a plus.

Posted 2 weeks ago


3.0 - 6.0 years

11 - 15 Lacs

Bengaluru

Work from Office

Source: Naukri

Associate Lead - Kubernetes Platform
Is your passion for Cloud Native Platform? That is, envisioning and building the core services that underpin all Thomson Reuters’ products? Then we want you on our India-based team! This role is in the Platform Engineering organization where we build the foundational services that power Thomson Reuters’ products. We focus on the subset of capabilities that help Thomson Reuters deliver digital products to our customers. Our mission is to build a durable competitive advantage for TR by providing “building blocks” that get value-to-market faster.
About the Role
This role is within Platform Engineering’s Service Mesh team, a dedicated group which engineers and operates our Service Mesh capability, a microservice platform based on Kubernetes and Istio.
Primarily work with AWS and Azure public cloud, especially Kubernetes (AWS EKS and Azure AKS), Service Mesh technology like Istio, Terraform, Datadog, PagerDuty and Python, Golang, Java and/or .NET Core.
Programming: Golang; other: Java, C#. Primary skills: Golang, Kubernetes.
Work closely with an architect to establish and entrench the architectural design and principles for Service Mesh.
Participate in all aspects of the development lifecycle: Ideation, Design, Build, Test and Operate.
We embrace a DevOps culture (“you build it, you run it”); while we have dedicated 24x7 level-1 support engineers, you may be called on to assist with level-2 support.
About You
6+ years of software development experience
2+ years of experience building cloud native infrastructure, applications and services on AWS, Azure or GCP
Hands-on experience with Kubernetes, ideally AWS EKS and/or Azure AKS
Experience with Istio or other Service Mesh technologies
Experience with container security and supply chain security
Experience with declarative infrastructure-as-code, CI/CD automation and GitOps
Experience with Kubernetes operators written in Golang
A bachelor's degree in Computer Science, Computer Engineering or similar
What’s in it For You
Hybrid Work Model: We’ve adopted a flexible hybrid working environment (2-3 days a week in the office depending on the role) for our office-based roles while delivering a seamless experience that is digitally and physically connected.
Flexibility & Work-Life Balance: Flex My Way is a set of supportive workplace policies designed to help manage personal and professional responsibilities, whether caring for family, giving back to the community, or finding time to refresh and reset. This builds upon our flexible work arrangements, including work from anywhere for up to 8 weeks per year, empowering employees to achieve a better work-life balance.
Career Development and Growth: By fostering a culture of continuous learning and skill development, we prepare our talent to tackle tomorrow’s challenges and deliver real-world solutions. Our Grow My Way programming and skills-first approach ensures you have the tools and knowledge to grow, lead, and thrive in an AI-enabled future.
Industry Competitive Benefits: We offer comprehensive benefit plans that include flexible vacation, two company-wide Mental Health Days off, access to the Headspace app, retirement savings, tuition reimbursement, employee incentive programs, and resources for mental, physical, and financial wellbeing.
Culture: Globally recognized, award-winning reputation for inclusion and belonging, flexibility, work-life balance, and more. We live by our values: Obsess over our Customers, Compete to Win, Challenge (Y)our Thinking, Act Fast / Learn Fast, and Stronger Together.
Social Impact: Make an impact in your community with our Social Impact Institute. We offer employees two paid volunteer days off annually and opportunities to get involved with pro-bono consulting projects and Environmental, Social, and Governance (ESG) initiatives.
Making a Real-World Impact: We are one of the few companies globally that helps its customers pursue justice, truth, and transparency. Together, with the professionals and institutions we serve, we help uphold the rule of law, turn the wheels of commerce, catch bad actors, report the facts, and provide trusted, unbiased information to people all over the world.
About Us
Thomson Reuters informs the way forward by bringing together the trusted content and technology that people and organizations need to make the right decisions. We serve professionals across legal, tax, accounting, compliance, government, and media. Our products combine highly specialized software and insights to empower professionals with the data, intelligence, and solutions needed to make informed decisions, and to help institutions in their pursuit of justice, truth, and transparency. Reuters, part of Thomson Reuters, is a world-leading provider of trusted journalism and news. We are powered by the talents of 26,000 employees across more than 70 countries, where everyone has a chance to contribute and grow professionally in flexible work environments.
At a time when objectivity, accuracy, fairness, and transparency are under attack, we consider it our duty to pursue them. Sound exciting? Join us and help shape the industries that move society forward.
As a global business, we rely on the unique backgrounds, perspectives, and experiences of all employees to deliver on our business goals. To ensure we can do that, we seek talented, qualified employees in all our operations around the world regardless of race, color, sex/gender, including pregnancy, gender identity and expression, national origin, religion, sexual orientation, disability, age, marital status, citizen status, veteran status, or any other protected classification under applicable law. Thomson Reuters is proud to be an Equal Employment Opportunity Employer providing a drug-free workplace. We also make reasonable accommodations for qualified individuals with disabilities and for sincerely held religious beliefs in accordance with applicable law. More information on requesting an accommodation here. Learn more on how to protect yourself from fraudulent job postings here. More information about Thomson Reuters can be found on thomsonreuters.com.

Posted 2 weeks ago


2.0 - 5.0 years

2 - 6 Lacs

Coimbatore

Work from Office

Source: Naukri

The Opportunity:
Avantor is looking for a dynamic, forward-thinking, and experienced Engineer - Command Center, who will be responsible for delivering results against some of the most complex business and technology initiatives. This role will be a full-time position based out of IND-Coimbatore. If you are passionate about solving complex challenges and driving innovation, let's talk! Our organization is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
JOB DESCRIPTION:
As a member of the IT Service Management monitoring team, reporting to the Senior Manager of IT Services, you will be responsible for monitoring servers, networks, databases, storage and backup devices for proactive identification of incidents. In this well-respected IT group, you will enjoy a wide variety of self-directed work within a supportive team environment.
MAJOR JOB DUTIES AND RESPONSIBILITIES (List in order of importance)
Monitor event alerts, acknowledge and, when appropriate, escalate to the next level support team(s).
Perform in-depth monitoring for P1 and P2 critical applications and basic monitoring for P3 and P4 applications.
Notify the Outage Management Team as the first point of contact for critical P1 and P2 alerts to ensure timely escalation and resolution.
Schedule jobs in the SAP tool for different systems, ensure successful runs, and restart when required.
Clean up NAS backup server files.
Prepare a weekly error report and ensure tickets are created for all failed jobs.
Prepare weekly and monthly task performance/aging reports, drive aging calls with the wider team, and ensure tickets are closed on time or a justification is recorded if required.
Support IT changes: prioritize change requests, assess impact, and accept changes which meet requirements.
Maintain the internal knowledge repository.
Manage the ticketed query system and ensure queries and resolutions are tracked and kept up to date.
QUALIFICATIONS (Education/Training, Experience and Certifications)
Bachelor's degree or equivalent experience within an enterprise-level corporate IT environment is required.
Experience in IT monitoring is highly desirable.
Direct experience with Jenkins, NPrinting, CloudWatch, QlikView, SolarWinds, Redwood, OpManager and/or PagerDuty is highly desirable.
Certifications in AWS or ITIL are a plus.
KNOWLEDGE, SKILLS AND ABILITIES (Those necessary to perform the job competently)
Knowledge of ITIL-based Incident, Problem and Change Management processes.
Strong problem-solving and analytical skills.
Ability to self-start and to effectively participate in a team environment.
Ability to be an on-call escalation point for production support and scheduled off-hours/weekend work if/when required.
Ability to focus on the customer and to adhere to processes defined for customer issue handling.
Ability to examine, summarize, and effectively present data when required.
Commitment to high professional and ethical standards in a diverse workplace.

Posted 2 weeks ago


4 - 9 years

10 - 14 Lacs

Hyderabad

Work from Office

Source: Naukri

Senior Manager Information Systems - Observability Operations
What you will do
Let’s do this. Let’s change the world. In this vital role you will be responsible for leading and overseeing the day-to-day operations of the organization's global observability service. You should be able to implement and maintain observability standard methodologies, including tagging, metrics, and logging, to provide comprehensive access to system performance, and use tools like Dynatrace, PagerDuty, and other solutions to monitor the health and performance of infrastructure and applications in real time. The ideal candidate will have a consistent record of leadership in technology-driven on-prem and cloud environments and a passion for fostering innovation and excellence in the biotechnology industry. Work closely with multi-functional teams including product managers, application owners, and infrastructure engineers to define requirements and implement monitoring solutions. This role demands the ability to drive and deliver against key organizational critical initiatives, develop a collaborative environment, and deliver high-quality results in a matrixed organizational structure. Please note, this is an onsite role based in Hyderabad.
Roles & Responsibilities:
Lead and develop a successful team of Monitoring engineers through recruitment, performance management, and career development.
Establish and maintain operational metrics, SLAs, and performance standards.
Experience with observability tools and monitoring large ecosystems.
Monitor and manage global Observability infrastructure.
Promote automation technologies and self-healing capabilities.
Lead incident response and problem management for critical observability issues.
Oversee implementation and maintenance of security policies, patching, and agent upgrade procedures.
Ensure compliance with regulatory and security requirements.
Generate regular reports on license usage, agent upgrades, and incident/problem creation.
Deliver continuous improvement initiatives in observability operations.
Optimize resource allocation and shift coverage for 24/7 operations.
Partner with business collaborators to understand and support organizational needs.
Lead incident response and problem management for critical issues.
Ensure compliance with regulatory requirements.
What we expect of you
We are all different, yet we all use our unique contributions to serve patients.
Basic Qualifications:
Master’s degree and 8 to 10 years of experience in Observability Operations, with at least 3 years in management, OR Bachelor’s degree and 10 to 14 years of experience in Observability Operations, with at least 4 years in management, OR Diploma and 14 to 18 years of experience in Observability Operations, with at least 5 years in management
Deep understanding of monitoring and notification technologies and observability concepts using Dynatrace and PagerDuty
Knowledge of Infrastructure and Application monitoring
Knowledge of Logs/Traces
Solid background in OpenTelemetry and integration
Knowledge of AWS and Azure services
Knowledge of TypeScript, React and Python scripting
Knowledge of container and Kubernetes (K8s) environments
Preferred Qualifications:
Experience in a leadership role within a pharmaceutical or technology organization
Strong analytic/critical-thinking and decision-making abilities
Experience with cloud platforms (AWS, Azure, or Google Cloud)
Knowledge of automation tools like Ansible and Terraform
Understanding of Agile practices
Ability to work effectively in a fast-paced, dynamic environment
Professional Certifications:
Management certifications (Scrum/Agile) (preferred)
Associate or Specialist Certification from Dynatrace
Soft Skills:
Excellent leadership and team management skills.
Strong transformation and organizational change experience.
Exceptional collaboration and communication skills.
High degree of initiative and self-motivation.
Ability to manage multiple priorities successfully.
Team-oriented with a focus on achieving team goals. Strong presentation and public speaking skills. Excellent analytical and troubleshooting skills. Strong verbal and written communication skills. Ability to work optimally with global, virtual teams. Shift Information: This position is an onsite role and may require working during later hours to align with business hours. Candidates must be willing and able to work outside of standard hours as required to meet business needs. What you can expect of us As we work to develop treatments that take care of others, we also work to care for your professional and personal growth and well-being. From our competitive benefits to our collaborative culture, we’ll support your journey every step of the way. In addition to the base salary, Amgen offers competitive and comprehensive Total Rewards Plans that are aligned with local industry standards. Apply now for a career that defies imagination. Objects in your future are closer than they appear. Join us. careers.amgen.com As an organization dedicated to improving the quality of life for people around the world, Amgen fosters an inclusive environment of diverse, ethical, committed and highly accomplished people who respect each other and live the Amgen values to continue advancing science to serve patients. Together, we compete in the fight against serious disease. Amgen is an Equal Opportunity employer and will consider all qualified applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, disability status, or any other basis protected by applicable law. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.

Posted 1 month ago

Apply

2 - 4 years

8 - 12 Lacs

Bengaluru

Work from Office

Naukri logo

Location: India, Bangalore. Time type: Full time. Posted: 2 Days Ago. Job requisition ID: JR0035199. Job Title: Site Reliability Engineer About Trellix: Trellix, the trusted CISO ally, is redefining the future of cybersecurity and soulful work. Our comprehensive, GenAI-powered platform helps organizations confronted by today's most advanced threats gain confidence in the protection and resilience of their operations. Along with an extensive partner ecosystem, we accelerate technology innovation through artificial intelligence, automation, and analytics to empower over 53,000 customers with responsibly architected security solutions. We also recognize the importance of closing the 4-million-person cybersecurity talent gap. We aim to create a home for anyone seeking a meaningful future in cybersecurity and look for candidates across industries to join us in soulful work. More at . Role Overview: The Site Reliability Engineer team is responsible for design, implementation and end to end ownership of the infrastructure platform and services that protect the Trellix Security's Consumer. The services provide continuous protection to our customers with a very strong focus on quality and an extendible services platform to internal partners & product teams. This role is a Site Reliability Engineer for commercial cloud-native solutions, deployed and managed in public cloud environments like AWS, GCP. You will be part of a team that is responsible for Trellix Cloud Services that enable protection at the endpoint products on a continuous basis. Responsibilities of this role include supporting Cloud service measurement, monitoring, and reporting, deployments and security. You will input into improving overall operational quality through common practices and by working with the Engineering, QA, and product DevOps teams. You will also be responsible for supporting efforts that improve Operational Excellence and Availability of Trellix Production environments. 
You will have access to the latest tools and technology, and an incredible career path with the world's cyber security leader. You will have the opportunity to immerse yourself within complex and demanding deployment architectures and see the big picture all while helping to drive continuous improvement in all aspects of a dynamic and high-performing engineering organization. If you are passionate about running and continuously improving as a world class Site Reliability Engineer Team, we are offering you a unique and great opportunity to build your career with us and gain experience working with high-performance Cloud systems. About Role: Being part of a global 24x7x365 team providing the operational coverage including event response and recovery efforts of critical services. Periodic deployment of features, patches and hotfixes to maintain the Security posture of our Cloud Services. Ability to work in shifts on a rotational basis and participate in On-Call duties. Have ownership and responsibility for high availability of Production environments. Input into the monitoring of systems applications and supporting data. Report on system uptime and availability. Collaborate with other team members on best practices. Assist with creating and updating runbooks & SOPs. Build a strong relationship with the Cloud DevOps, Dev & QA teams and become a domain expert for the cloud services in your remit. Provided the required support for growth and development in this role. About you: 2 to 4 years of hands-on working experience in supporting production of large-scale cloud services. Strong production support background and experience of in-depth troubleshooting. Experience working with solutions in both Linux and Windows environments. Experience using modern Monitoring and Alerting tools (Prometheus, Grafana, PagerDuty, etc.). Excellent written and verbal communication skills. 
Experience with Python or other scripting languages. Proven ability to work independently in deploying, testing, and troubleshooting systems. Experience supporting high availability systems and scalable solutions hosted on AWS or GCP. Familiarity with security tools & practices (Wiz, Tenable). Familiarity with Containerization and associated management tools (Docker, Kubernetes). Significant experience of developing and maintaining relationships with a wide range of customers at all levels. Understanding of Incident, Change, Problem and Vulnerability Management processes. Desired: Awareness of ITIL best practices. AWS Certification and/or Kubernetes Certification. Experience with Snowflake. Automation/CI/CD experience: Jenkins, Ansible, GitHub Actions, Argo CD. Company Benefits and Perks: We believe that the best solutions are developed by teams who embrace each other's unique experiences, skills, and abilities. We work hard to create a dynamic workforce where we encourage everyone to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours and family-friendly benefits to all of our employees. Retirement Plans, Medical, Dental and Vision Coverage, Paid Time Off, Paid Parental Leave, Support for Community Involvement. We're serious about our commitment to a workplace where everyone can thrive and contribute to our industry-leading products and customer support, which is why we prohibit discrimination and harassment based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation or any other legally protected status.

Posted 1 month ago

Apply

1 - 6 years

8 - 13 Lacs

Pune

Work from Office

Naukri logo

Cloud Observability Administrator Pune, India Enterprise IT - 22685 about our diversity, equity, and inclusion efforts and the networks ZS supports to assist our ZSers in cultivating community spaces, obtaining the resources they need to thrive, and sharing the messages they are passionate about. Cloud Observability Administrator ZS is looking for a Cloud Observability Administrator to join our team in Pune. As a Cloud Observability Administrator, you will be working on configuration of various Observability tools and create solutions to address business problems across multiple client engagements. You will leverage information from requirements-gathering phase and utilize past experience to design a flexible and scalable solution; Collaborate with other team members (involved in the requirements gathering, testing, roll-out and operations phases) to ensure seamless transitions. What You'll Do: Deploying, managing, and operating scalable, highly available, and fault tolerant Splunk architecture. Onboarding various kinds of log sources like Windows/Linux/Firewalls/Network into Splunk. Developing alerts, dashboards and reports in Splunk. Writing complex SPL queries. Managing and administering a distributed Splunk architecture. Very good knowledge of configuration files used in Splunk for data ingestion and field extraction. Perform regular upgrades of Splunk and relevant Apps/add-ons. Possess a comprehensive understanding of AWS infrastructure, including EC2, EKS, VPC, CloudTrail, Lambda etc. Automation of manual tasks using Shell/PowerShell scripting. Knowledge of Python scripting is a plus. Good knowledge of Linux commands to manage administration of servers. 
What You'll Bring: 1+ years of experience in Splunk Development & Administration. Bachelor's Degree in CS, EE, or related discipline. Strong analytic, problem solving, and programming ability. 1-1.5 years of relevant consulting-industry experience working on medium-large scale technology solution delivery engagements. Strong verbal, written and team presentation communication skills. Strong verbal and written communication skills with ability to articulate results and issues to internal and client teams. Proven ability to work creatively and analytically in a problem-solving environment. Ability to work within a virtual global team environment and contribute to the overall timely delivery of multiple projects. Knowledge of Observability tools such as Cribl, Datadog, PagerDuty is a plus. Knowledge of AWS Prometheus and Grafana is a plus. Knowledge of APM concepts is a plus. Knowledge of Linux/Python scripting is a plus. Splunk Certification is a plus. Perks & Benefits ZS offers a comprehensive total rewards package including health and well-being, financial planning, annual leave, personal growth and professional development. Our robust skills development programs, multiple career progression options and internal mobility paths and collaborative culture empower you to thrive as an individual and global team member. We are committed to giving our employees a flexible and connected way of working. A flexible and connected ZS allows us to combine work from home and on-site presence at clients/ZS offices for the majority of our week. The magic of ZS culture and innovation thrives in both planned and spontaneous face-to-face connections. Travel Travel is a requirement at ZS for client facing ZSers; business needs of your project and client are the priority. While some projects may be local, all client-facing ZSers should be prepared to travel as needed. 
Travel provides opportunities to strengthen client relationships, gain diverse experiences, and enhance professional growth by working in different environments and cultures. Considering applying? At ZS, we're building a diverse and inclusive company where people bring their passions to inspire life-changing impact and deliver better outcomes for all. We are most interested in finding the best candidate for the job and recognize the value that candidates with all backgrounds, including non-traditional ones, bring. If you are interested in joining us, we encourage you to apply even if you don't meet 100% of the requirements listed above. ZS is an equal opportunity employer and is committed to providing equal employment and advancement opportunities without regard to any class protected by applicable law. To Complete Your Application Candidates must possess or be able to obtain work authorization for their intended country of employment. An online application, including a full set of transcripts (official or unofficial), is required to be considered. NO AGENCY CALLS, PLEASE. Find Out More At

Posted 1 month ago

Apply

6 - 10 years

8 - 12 Lacs

Noida

Work from Office

Naukri logo

Job Description We are looking for a highly skilled and experienced Senior DevOps Engineer to join our team. The ideal candidate will have 5-7 years of experience in a DevOps role and a proven track record of implementing and maintaining complex systems with a focus on automation, scalability, and security. The Senior DevOps Engineer will work closely with our development, operations, and security teams to ensure that our software is released quickly and reliably, with a focus on continuous integration and delivery. Requirements: Bachelor's/Master's degree in Computer Science, Information Technology or related field. 5-7 years of experience in a DevOps role. Strong understanding of the SDLC and experience with working on fully Agile teams. Proven experience in coding & scripting DevOps, Ant/Maven, Groovy, Terraform, Shell Scripting, and Helm Chart skills. Working experience with IaC tools like Terraform, CloudFormation, or ARM templates. Strong experience with cloud computing platforms (e.g. Oracle Cloud (OCI), AWS, Azure, Google Cloud). Experience with containerization technologies (e.g. Docker, Kubernetes/EKS/AKS). Experience with continuous integration and delivery tools (e.g. Jenkins, GitLab CI/CD). Kubernetes - Experience with managing Kubernetes clusters and using kubectl for managing helm chart deployments, ingress services, and troubleshooting pods. OS Services - Basic knowledge of managing, configuring, and troubleshooting Linux operating system issues, storage (block and object), networking (VPCs, proxies, and CDNs). Monitoring and instrumentation - Implement metrics in Prometheus, Grafana, Elastic, log management and related systems, and Slack/PagerDuty/Sentry integrations. Strong know-how of modern distributed version control systems (e.g. Git, GitHub, GitLab etc). 
Strong troubleshooting and problem-solving skills, and ability to work well under pressure. Excellent communication and collaboration skills, and ability to lead and mentor junior team members. Career Level - IC3 Responsibilities Design, implement, and maintain automated build, deployment, and testing systems. Experience in taking application code and third-party products and building fully automated pipelines for Java applications to build, test and deploy complex systems for delivery in Cloud. Ability to containerize an application, i.e. creating Docker containers and pushing them to an artifact repository for deployment on containerization solutions with OKE (Oracle Container Engine for Kubernetes) using Helm Charts. Lead efforts to optimize the build and deployment processes for high-volume, high-availability systems. Monitor production systems to ensure high availability and performance, and proactively identify and resolve issues. Support and troubleshoot Cloud deployment and environment issues. Create and maintain CI/CD pipelines using tools such as Jenkins, GitLab CI/CD. Continuously improve the scalability and security of our systems, and lead efforts to implement best practices. Participate in the design and implementation of new features and applications, and provide guidance on best practices for deployment and operations. Work with security team to ensure compliance with industry and company standards, and implement security measures to protect against threats. Keep up-to-date with emerging trends and technologies in DevOps, and make recommendations for improvement. Lead and mentor junior DevOps engineers and collaborate with cross-functional teams to ensure successful delivery of projects. Analyze, design, develop, troubleshoot and debug software programs for commercial or end user applications. Writes code, completes programming and performs testing and debugging of applications. 
As a member of the software engineering division, you will analyze and integrate external customer specifications. Specify, design and implement modest changes to existing software architecture. Build new products and development tools. Build and execute unit tests and unit test plans. Review integration and regression test plans created by QA. Communicate with QA and porting engineering to discuss major changes to functionality. Work is non-routine and very complex, involving the application of advanced technical/business skills in area of specialization. Leading contributor individually and as a team member, providing direction and mentoring to others. BS or MS degree or equivalent experience relevant to functional area. 6+ years of software engineering or related experience.

Posted 1 month ago

Apply

5 - 10 years

30 - 35 Lacs

Hyderabad

Remote

Naukri logo

Role : Devops Engineer Company : Feuji Software Solutions Pvt Ltd. Mode of Hire : Permanent Position Experience : 6- 12 Years Work Location : Hyderabad/ Remote About Feuji Feuji, established in 2014 and headquartered in Dallas, Texas, has rapidly emerged as a leading global technology services provider. With strategic locations including a Near Shore facility in San Jose, Costa Rica, and Offshore Delivery Centers in Hyderabad, and Bangalore, we are well-positioned to cater to a diverse clientele. Our team of 600 talented engineers drives our success, delivering innovative solutions to our clients and contributing to our recognition as a 'Best Place to Work For.' We collaborate with a wide range of clients, from startups to industry giants in sectors like Healthcare, Education, IT, and engineering, enabling transformative changes in their operations. Through partnerships with top technology providers such as AWS, Checkpoint, Gurukul, CoreStack, Splunk, and Micro Focus, we empower our clients' growth and innovation. With a clientele including Microsoft, HP, GSK, and DXC Technologies, we specialize in managed cloud services, cybersecurity, Product and Quality Engineering Services, and Data and Insights solutions, tailored to drive tangible business outcomes. Our commitment to creating 'Happy Teams' underscores our values and dedication to positive impact. Feuji welcomes exceptional talent to join our team, offering a platform for growth, development, and a culture of innovation and excellence. 
Key Responsibilities Design and implement continuous integration and continuous deployment frameworks from code to deploy Manage and optimize data pipelines for performance, scalability, and reliability Develop, implement, and maintain scalable data pipelines and processes Create and manage automated provisioning and configuration systems for data infrastructure using infrastructure-as-code principles Design, implement, configure and manage system monitoring solutions that alert teams to problems before customers are impacted Support developers in code deployment and troubleshooting Work closely with customers and other team members to understand complex requirements and translate them into automated solutions Provide support to ensure mission critical applications and components are being monitored and meet security, reporting and retention requirements as well as disaster recovery requirements of clients Support team members Skills Knowledge & Expertise Programming/Development Skills : Strong experience in Python is essential and experience with React/Vue.js would be preferred. Monitoring Tools : Familiarity with tools such as PagerDuty, Azure Monitor, and Datadog is beneficial, though monitoring is not the primary focus of the role. Good understanding of any of these tools will be advantageous. Cloud & DevOps Expertise : Must have a strong background in CI/CD, specifically with GitHub Actions. Deep expertise in Azure is essential. Experience with AWS or GCP is a plus. Should demonstrate the ability to quickly adapt to and learn new technologies. Soft Skills & Mindset : A strong passion for continuous learning and self-improvement. Excellent client-facing skills, with the confidence to handle discussions intelligently and effectively. Must be proactive, take full ownership of tasks, and be capable of delivering results even in challenging situations. 
Required Qualifications : 7+ years of DevOps experience 5+ years of Azure experience 2+ years of Development experience 2+ years of Terraform experience Cloud certifications Excellent communication skills Strong multi-tasker Self-starter Team player Preferred Qualifications : Kubernetes experience Azure, AWS and GCP Professional level certifications Kubernetes certifications (CKA, CKAD, CKS)

Posted 1 month ago

Apply

5 - 7 years

20 - 27 Lacs

Pune

Hybrid

Naukri logo

Role: AppOps engineer Location: Pune, Hinjewadi Hybrid (3 days a week) Exp - 5 - 7 years Responsibilities: • Designing and implementing infrastructure and systems (such as metrics, monitoring, node management, alerting, deployment, logging) • Setup new environments & deploying solutions • Application migration from EC2 to containers • Building proactive Monitoring & alerting service. • Automation using ansible, python, Perl scripting • Performance and stability problems investigation - internal and on client sites • Tuning Actimize Platform(AIS and RCM)/Operating System/Application servers/Databases for optimal performance and stability • Identifying performance bottlenecks and assisting in root cause analysis. • Performance related design reviews • Create and setup deployment scripts for different environments (i.e. Test properties vs Prod properties) • Configure and optimize instances and web servers for optimal performance. (ex: adjusting default connection limits, adjusting request queuing thresholds) • AWS troubleshooting support • Support, Architect and Implement alongside Technical & Operations teams to meet our customers' individual needs for their infrastructure & application deployments. • Work on critical, highly complex customer problems that will span multiple AWS services (dealing daily with high severity incidents). • Help build and improve customer operations through scripts to automate and deploy AWS resources seamlessly with as little manual intervention as possible. • Collaborate and help build utilities and tools for internal use that enable you and your fellow AWS Engineers to operate safely at high speed / wide scale. • Drive customer communication during critical events. • Flexible to work over the weekends and in shift environment ( as per • Good experience in a DevOps environment / Operations team / Infrastructure Operations team. 
• Excellent Troubleshooting skills • Expertise in Performance tuning / investigation / root cause analysis / mitigate bottlenecks • Excellent hands-on experience in managing Application Support (3 tier/2 tier apps) • AWS service knowledge for core services (EC2, S3, IAM, ASG, ELB, CFN, VPC, DX, VPN, ) • Good exposure on managing Containers & Kubernetes, deployment and configuration on containers • Good hands-on experience in deployment, release management, migration activities • Exposure to scripting language (Ansible, Perl, Python, Ruby, Shell script, PowerShell etc.) • Database skills (SQL, Oracle or Postgres / Cassandra) • Good exposure on ELK, Splunk, Kafka • Application Server (skills on any of Middleware technologies e.g. Tomcat, WebLogic, WebSphere) • Good exposure on Application performance monitoring tools like AppDynamics, Dynatrace • Strong problem solving, analytical and communication skills • Good communication both written and verbal • Troubleshooting performance issues & tuning • Working with Architecture team on hardware sizing recommendations • JAVA performance testing, diagnosis, and tuning JAVA applications Additional Skills Desired: • Cloud / Application level Security experience • Has worked in an Agile / Sprint development model. • Experience in working with tools like OpsGenie, AlertOps, PagerDuty/OpenDuty • Troubleshooting Java related issues • Performance testing/investigation experience • Database performance testing, diagnosis, and tuning. Please email your details and resume to chaithra.j@xoriant.com to proceed further.

Posted 1 month ago

Apply

4 - 9 years

3 - 7 Lacs

Mumbai

Work from Office

Naukri logo

About The Role Infrastructure as Code (IaC): Proficiency in Terraform for managing cloud infrastructure. CI/CD Pipelines: Hands-on experience with Jenkins for automated build, test, and deployment. Scripting & Automation: Expertise in PowerShell, Bash, and Python for automating tasks. AWS Cloud Services: Deep understanding of EC2, S3, VPC, RDS, IAM, Lambda, CloudFormation . Configuration Management: Experience with Ansible, Chef, or Puppet for system automation. Containerization & Orchestration: Knowledge of Docker, Kubernetes, and EKS (Elastic Kubernetes Service) . Networking & Security: Understanding of VPC, Subnets, Security Groups, NACLs, Route 53, and AWS WAF . Monitoring & Logging: Experience with AWS CloudWatch, CloudTrail, Prometheus, ELK Stack . Version Control: Proficiency in Git, GitHub, GitLab, or Bitbucket for managing code repositories. AWS Cost Optimization: Ability to analyze and optimize cloud costs using AWS Cost Explorer and Budgets . About The Role - Grade Specific Hybrid Cloud Experience: Exposure to Azure DevOps or Google Cloud along with AWS. Serverless Computing: Familiarity with AWS Lambda and Step Functions . Database Administration: Understanding of Amazon RDS, DynamoDB, and Redshift . Secrets Management: Using AWS Secrets Manager and HashiCorp Vault for credential management. Incident Management: Working with tools like PagerDuty and AWS Systems Manager . Compliance & Governance: Knowledge of AWS Security Hub, GuardDuty, and Well-Architected Framework . Performance Testing & Load Balancing: Experience with JMeter, AWS ALB, and Nginx . Infrastructure Monitoring: Experience with New Relic, Datadog, or Splunk .

Posted 2 months ago

Apply

4 - 9 years

15 - 20 Lacs

Hyderabad

Work from Office

Naukri logo

Expertise in CI/CD tools (Jenkins, GitLab CI, etc.). Experience with containerization (Docker, Kubernetes). Strong scripting skills (Python, Bash, etc.). Knowledge of cloud platforms and IaC (Terraform, Ansible). Healthcare domain experience preferred.

Posted 2 months ago

Apply

6 - 11 years

8 - 18 Lacs

Bengaluru

Work from Office

Naukri logo

Hi, Greetings from Decision Minds! Mandatory Skills: PagerDuty, Splunk, Dynatrace, Windows, Linux. Banking Domain (L1/L2/L3 Support). Exp: 7 to 10 yrs. If interested, please share your profile to barsas@decisionminds.com

Posted 2 months ago

Apply

2 - 7 years

4 - 9 Lacs

Maharashtra

Work from Office

Naukri logo

Description Mumbai/Bangalore Generic JD What will SREs do? Provide hands-on SRE with 24x7 SRE support, including incident management, problem management, root cause analysis, monitoring, alerting, and maintenance of infrastructure, compliance. Track, audit, monitor and implement on technical work streams. Act as portfolio SME (Subject Matter Expert): understand and document common components, core functionalities, infrastructure of supported applications. Be an escalation point in the on-call rotation, and support our maintenance, scheduled work, support and release deployment requirements. Lead in incident management and problem management for applications in scope and RCA action items fulfillment/ownership. Focus on continuous improvement and technical standards. Drive improvements in productivity, monitoring, tooling and best practices. Manage technology currency (server patching, certificate renewal, compliance, etc.) with keen eye on automating opportunities. Drive best-in-class technical solutions by tracking closely industry leading solutions and applying to RBC environment and needs. Leverage the value in unit, department, and enterprise wide teams to develop better solutions and achieve a cross enterprise mindset. Engineering: Develop SRE solutions (monitoring and alerting, machine learning anomaly detection, self-healing and reliability testing). Apply design-thinking and agile mindset in working with SREs, Scrum Masters and Incident Leads. Contribute to and leverage best practices in SRE. Simplifies development by building repeatable solutions to manual tasks. Supports unit's goals to adopt automation solutions for applications in scope. Production Support: Perform production support role, including off-hours support and rotational on-call support to be compensated accordingly with overtime pay, lieu time, and on-call allowance. Assist in incident management and problem management for applications in scope. Evaluate continuously what went well, what went wrong, what can be done to improve and prevent in future. 
Maintain technology currency (perform server patching, certificate renewal, etc.) with keen eye on automating opportunities. Ensure availability and uptime of applications in scope, as per service level objectives. Ensure compliance of all systems and applications in scope, including maintaining segregation of duties. Technical Consultation: Support initiatives outside of application or squad level scope. Consult on products build to other teams in RBPT and enterprise. Innovation and Learning: Stay abreast of technology change and learn constantly, through official training assignments and self-assigned learning. Provide demos to team at large of new technology findings. Advanced knowledge of the following SRE practices and technologies, 3-5 years of experience in related field: Python, YAML, Shell scripting; Azure, Linux; Dynatrace, Prometheus, PagerDuty, Moog, Splunk, Elastic, Azure Monitor; Chaos Engineering; MQ, Kafka; Perform production support role, including off-hours support. In-depth hands-on experience in a variety of SRE tools (Ansible, Azure Automation, Catchpoint). Additional Details Global Grade C Level To Be Defined Named Job Posting? (if Yes - needs to be approved by SCSC) No Remote work possibility Yes Global Role Family To be defined Local Role Name To be defined Local Skills reliability metrics;reliability controls Languages Required ENGLISH Role Rarity To Be Defined

Posted 2 months ago

Apply

7 - 10 years

20 - 35 Lacs

Noida

Hybrid

Naukri logo

Job description:

Team Leadership: Lead production support engineers by providing guidance, mentorship, and technical expertise. Foster a culture of accountability and continuous improvement within the team.
Define Production Support Processes and SLAs: Document and define production support processes that encompass the full lifecycle of a production bug or enhancement request, from the end user through to the development team and a production release. Identify SLAs based on severity and work with DevOps and Engineering to meet those SLAs.
System and Application Deployments: Oversee the planning and execution of application and database deployments following established processes, in adherence to Corporate Change Management standards.
Incident Management: Oversee the identification, troubleshooting, and resolution of production issues in real time, with constant communication to affected parties. Ensure that incidents are logged, tracked, and escalated as necessary, that root cause analysis is conducted, and that SLAs are met.
Monitoring & Alerting: Implement and optimize monitoring tools to proactively detect issues and ensure the health and performance of production environments. Lead efforts to fine-tune alerting systems and reduce noise from false positives.
System Stability & Performance: Work closely with the development, infrastructure, and operations teams to ensure the stability and scalability of production systems. Recommend and implement improvements to increase system reliability.
Root Cause Analysis (RCA): Lead post-incident reviews, drive root cause analysis efforts, and ensure that lessons learned are shared across teams. Develop and track action plans to prevent the recurrence of incidents.
Continuous Improvement: Champion continuous improvement efforts by identifying gaps in the support process and implementing best practices. Optimize incident response times and overall system performance.
Collaboration with Stakeholders: Act as the main point of contact for production support issues, engaging with business stakeholders, product owners, and other cross-functional teams to ensure effective communication and resolution.
Knowledge Management: Maintain and update documentation for support procedures, system configurations, and incident management. Create knowledge base articles and ensure the team is well trained on new systems and procedures.
Performance Reporting: Generate regular reports on system performance, incident trends, and support team effectiveness. Provide insights and recommendations to senior leadership based on data analysis.
On-Call Rotation: Manage and participate in the on-call rotation for critical incidents, ensuring that production environments are supported 24/7/365.

Required skills and qualifications:

Bachelor's degree in Computer Science, Information Technology, or a related field.
7-10 years of experience in production support, system administration, or related technical roles, with a focus on cloud-based systems management (GCP and Azure).
Proven experience in a leadership role within production support or IT operations.
Strong knowledge of incident management, system monitoring, and troubleshooting methodologies.
Deep understanding of production systems, system architectures, and distributed systems.
Hands-on experience with monitoring tools.
Familiarity with scripting languages (e.g., Python, Shell) for automation and troubleshooting.
Strong communication and interpersonal skills to effectively lead teams and engage with stakeholders.
Ability to work under pressure and manage incidents in a fast-paced production environment.
Proficiency in Windows/Linux/Unix environments and system administration.
Familiarity with CI/CD pipelines and tools (e.g., Jenkins, GitLab).
Hands-on experience with .NET Core, .NET Framework, Apache, IIS, PowerShell, and Python for application support.
Ability to query SQL databases for application troubleshooting, reporting, and deployments.
Additional technologies: JIRA, Confluence, PagerDuty, Uptrends, Teams, O365.

Posted 2 months ago

8 - 13 years

11 - 14 Lacs

Bengaluru

Work from Office

Hands-on experience with AWS Cloud services such as Compute, EKS, RDS, Networking, Security, Storage, and Serverless services.
Good knowledge of K8s and CI/CD tools such as Jenkins and ArgoCD.
Good knowledge of a programming language such as Python.
Proactively identify and resolve issues.
Experience with tools such as Splunk, WaveFront, CloudWatch, PagerDuty, and GitHub.
Close security Jira tickets within SLA.
Experience using Jira boards.
Good communication skills.
Exposure to Splunk.

Posted 3 months ago

10 - 15 years

12 - 15 Lacs

Bengaluru

Work from Office

Roles & Responsibilities:

Oversee MES ASRE activity and provide guidance and mentoring.
Ensure rapid, automated, and safe deployment of technical solutions and streamline the processes.
Handle complex environment requests.
Carry out enhancements to maintenance housekeeping scripts as required and monitor DB growth.
Put a process in place to schedule periodic purging of the DB after agreeing with relevant stakeholders.
Participate in release activity and coordinate with QA/Release teams.
Monitor spending and cost attribution.
Contribute to the environment management enhancement roadmap and take end-to-end ownership of tasks.
Participate in AWS stack deployment, AWS AMI patching, and stack configuration to ensure optimal performance and cost-efficiency using CloudFormation, Git, and CI/CD pipelines.
Troubleshoot and resolve Murex environment-specific issues, including infrastructure-related issues, to ensure the system does not hit its thresholds.
Troubleshoot and resolve Murex environment-specific issues during regression, EOD run failures, and UAT.
Address ad hoc requests such as warehouse rebuilds, maintenance, health/sanity checks, XVA engine creation, and environment restores and backups in AWS as per project need.

Experience:

8 to 12 years of experience in Murex platform and environment management.
Experience in AWS Cloud.

Mandatory Skills Description:

AWS Certified DevOps Engineer/Solution Architect with relevant working experience with CI/CD tools like Git, GitHub Actions, flows, Ansible, and AWS including CDK.
Murex environment/support experience with RCA and troubleshooting.
Experienced in Python, Shell Scripting, and Web development.
Linux/Unix server and Oracle RDS knowledge.
Experienced in Release and CI/CD processes.
Working experience with automation/job scheduling tools such as Autosys and GitHub Actions.
Working experience with monitoring tools like Grafana, Splunk, Obstack, and PagerDuty.
Working experience with Cloud technologies on AWS (CloudFormation, networking) is highly desirable.
Good communication and organisation skills working within a DevOps team supporting a wider IT delivery team.

Nice to have skills:

PL/SQL and programming languages (Java).
Technical solution design experience and start-to-end solution ownership.

Qualification: Bachelor's degree/Master's degree in engineering.

Posted 3 months ago

4 - 8 years

20 - 30 Lacs

Hyderabad

Work from Office

Expertise in AWS, Python, Groovy, Jenkins, Terraform, and CI/CD. Must have hands-on experience with Docker, Kubernetes (EKS), cloud automation, Unix/Linux & Windows admin, and monitoring tools. Strong troubleshooting skills required.

Posted 3 months ago

8 - 13 years

20 - 35 Lacs

Hyderabad

Work from Office

8-13 years of experience in AWS, Python, Groovy, and Jenkins. Strong in IaC (Terraform/CloudFormation), CI/CD (Bitbucket/GitHub, Jenkins), AWS services (EC2, RDS, Lambda, S3, EKS), Docker, Linux/Windows admin, monitoring (Prometheus, Grafana), and troubleshooting.

Posted 3 months ago

Featured Companies