88 PagerDuty Jobs - Page 4

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

2.0 - 5.0 years

2 - 6 Lacs

Coimbatore

Work from Office

The Opportunity: Avantor is looking for a dynamic, forward-thinking, and experienced Engineer - Command Center, who will be responsible for delivering results against some of the most complex business and technology initiatives. This is a full-time position based in Coimbatore, India. If you are passionate about solving complex challenges and driving innovation, let's talk! Our organization is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

JOB DESCRIPTION: As a member of the IT Service Management monitoring team, reporting to the Senior Manager of IT Services, you will be responsible for monitoring servers, networks, databases, storage, and backup devices to proactively identify incidents. In this well-respected IT group, you will enjoy a wide variety of self-directed work within a supportive team environment.

MAJOR JOB DUTIES AND RESPONSIBILITIES (in order of importance): Monitor event alerts, acknowledge them and, when appropriate, escalate them to the next-level support team(s). Perform in-depth monitoring for P1 and P2 critical applications and basic monitoring for P3 and P4 applications. Notify the Outage Management Team as the first point of contact for critical P1 and P2 alerts to ensure timely escalation and resolution. Schedule jobs in the SAP tool for different systems, ensure they run successfully, and restart them when required. Clean up NAS backup server files. Prepare a weekly error report and ensure tickets are created for all failed jobs. Prepare weekly and monthly task performance/aging reports, drive aging calls with the wider team, and ensure tickets are closed on time or record a justification if required. Support IT changes by prioritizing change requests, assessing impact, and accepting changes that meet requirements. Maintain the internal knowledge repository. Manage the ticketed query system and ensure queries and resolutions are tracked and kept up to date.
QUALIFICATIONS (Education/Training, Experience and Certifications): Bachelor's degree or equivalent experience within an enterprise-level corporate IT environment is required. Experience in IT monitoring is highly desirable. Direct experience with Jenkins, NPrinting, CloudWatch, QlikView, SolarWinds, Redwood, OpManager, and/or PagerDuty is highly desirable. Certifications in AWS or ITIL are a plus.

KNOWLEDGE, SKILLS AND ABILITIES (those necessary to perform the job competently): Knowledge of ITIL-based Incident, Problem, and Change Management processes. Strong problem-solving and analytical skills. Ability to self-start and to participate effectively in a team environment. Ability to be an on-call escalation point for production support and to do scheduled off-hours/weekend work if/when required. Ability to focus on the customer and to adhere to processes defined for customer issue handling. Ability to examine, summarize, and effectively present data when required. Commitment to high professional and ethical standards in a diverse workplace.

Posted 3 months ago

4 - 9 years

10 - 14 Lacs

Hyderabad

Work from Office

Senior Manager Information Systems – Observability Operations

What you will do: Let’s do this. Let’s change the world. In this vital role you will be responsible for leading and overseeing the day-to-day operations of the organization's global observability service. You will implement and maintain observability standard methodologies, including tagging, metrics, and logging, to provide comprehensive visibility into system performance, and use tools like Dynatrace, PagerDuty, and other solutions to monitor the health and performance of infrastructure and applications in real time. The ideal candidate will have a consistent record of leadership in technology-driven on-prem and cloud environments and a passion for fostering innovation and excellence in the biotechnology industry. You will work closely with cross-functional teams, including product managers, application owners, and infrastructure engineers, to define requirements and implement monitoring solutions. This role demands the ability to drive and deliver against key critical organizational initiatives, foster a collaborative environment, and deliver high-quality results in a matrixed organizational structure. Please note, this is an onsite role based in Hyderabad.

Roles & Responsibilities: Lead and develop a successful team of monitoring engineers through recruitment, performance management, and career development. Establish and maintain operational metrics, SLAs, and performance standards. Experience with observability tools and monitoring large ecosystems. Monitor and manage global observability infrastructure. Promote automation technologies and self-healing capabilities. Lead incident response and problem management for critical observability issues. Oversee implementation and maintenance of security policies and of patching and agent upgrade procedures. Ensure compliance with regulatory and security requirements. Generate regular reports on license usage, agent upgrades, and incident/problem creation.
Deliver continuous improvement initiatives in observability operations. Optimize resource allocation and shift coverage for 24/7 operations. Partner with business collaborators to understand and support organizational needs. Lead incident response and problem management for critical issues. Ensure compliance with regulatory requirements.

What we expect of you: We are all different, yet we all use our unique contributions to serve patients.

Basic Qualifications: Master’s degree and 8 to 10 years of experience in Observability Operations, with at least 3 years in management, OR Bachelor’s degree and 10 to 14 years of experience in Observability Operations, with at least 4 years in management, OR Diploma and 14 to 18 years of experience in Observability Operations, with at least 5 years in management. Deep understanding of monitoring and notification technologies and of observability concepts using Dynatrace and PagerDuty. Knowledge of infrastructure and application monitoring. Knowledge of logs/traces. Solid background in OpenTelemetry and integration. Knowledge of AWS and Azure services. Knowledge of TypeScript, React, and Python scripting. Knowledge of container and Kubernetes (K8s) environments.

Preferred Qualifications: Experience in a leadership role within a pharmaceutical or technology organization. Strong analytic/critical-thinking and decision-making abilities. Experience with cloud platforms (AWS, Azure, or Google Cloud). Knowledge of automation tools like Ansible and Terraform. Understanding of Agile practices. Ability to work effectively in a fast-paced, dynamic environment.

Professional Certifications: Management certifications (Scrum/Agile) (preferred). Associate or Specialist Certification from Dynatrace.

Soft Skills: Excellent leadership and team management skills. Strong transformation and organizational change experience. Exceptional collaboration and communication skills. High degree of initiative and self-motivation. Ability to manage multiple priorities successfully.
Team-oriented with a focus on achieving team goals. Strong presentation and public speaking skills. Excellent analytical and troubleshooting skills. Strong verbal and written communication skills. Ability to work optimally with global, virtual teams.

Shift Information: This position is an onsite role and may require working later hours to align with business hours. Candidates must be willing and able to work outside of standard hours as required to meet business needs.

What you can expect of us: As we work to develop treatments that take care of others, we also work to care for your professional and personal growth and well-being. From our competitive benefits to our collaborative culture, we’ll support your journey every step of the way. In addition to the base salary, Amgen offers competitive and comprehensive Total Rewards Plans that are aligned with local industry standards.

Apply now for a career that defies imagination. Objects in your future are closer than they appear. Join us. careers.amgen.com

As an organization dedicated to improving the quality of life for people around the world, Amgen fosters an inclusive environment of diverse, ethical, committed and highly accomplished people who respect each other and live the Amgen values to continue advancing science to serve patients. Together, we compete in the fight against serious disease. Amgen is an Equal Opportunity employer and will consider all qualified applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, disability status, or any other basis protected by applicable law. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.

Posted 4 months ago

2 - 4 years

8 - 12 Lacs

Bengaluru

Work from Office

Location: India, Bangalore | Time type: Full time | Posted: 2 days ago | Job requisition ID: JR0035199

Job Title: Site Reliability Engineer

About Trellix: Trellix, the trusted CISO ally, is redefining the future of cybersecurity and soulful work. Our comprehensive, GenAI-powered platform helps organizations confronted by today's most advanced threats gain confidence in the protection and resilience of their operations. Along with an extensive partner ecosystem, we accelerate technology innovation through artificial intelligence, automation, and analytics to empower over 53,000 customers with responsibly architected security solutions. We also recognize the importance of closing the 4-million-person cybersecurity talent gap. We aim to create a home for anyone seeking a meaningful future in cybersecurity and look for candidates across industries to join us in soulful work. More at .

Role Overview: The Site Reliability Engineer team is responsible for the design, implementation, and end-to-end ownership of the infrastructure platform and services that protect Trellix Security's consumers. The services provide continuous protection to our customers, with a very strong focus on quality, and an extensible services platform to internal partners and product teams. This role is a Site Reliability Engineer for commercial cloud-native solutions, deployed and managed in public cloud environments like AWS and GCP. You will be part of a team that is responsible for Trellix Cloud Services, which provide continuous protection for endpoint products. Responsibilities of this role include supporting cloud service measurement, monitoring, reporting, deployments, and security. You will contribute to improving overall operational quality through common practices and by working with the Engineering, QA, and product DevOps teams. You will also be responsible for supporting efforts that improve the Operational Excellence and Availability of Trellix Production environments.
You will have access to the latest tools and technology, and an incredible career path with the world's cyber security leader. You will have the opportunity to immerse yourself in complex and demanding deployment architectures and see the big picture, all while helping to drive continuous improvement in all aspects of a dynamic and high-performing engineering organization. If you are passionate about running and continuously improving a world-class Site Reliability Engineering team, we are offering you a unique and great opportunity to build your career with us and gain experience working with high-performance cloud systems.

About Role: Be part of a global 24x7x365 team providing operational coverage, including event response and recovery efforts for critical services. Periodically deploy features, patches, and hotfixes to maintain the security posture of our Cloud Services. Work in shifts on a rotational basis and participate in on-call duties. Take ownership of and responsibility for the high availability of Production environments. Contribute to the monitoring of systems, applications, and supporting data. Report on system uptime and availability. Collaborate with other team members on best practices. Assist with creating and updating runbooks and SOPs. Build a strong relationship with the Cloud DevOps, Dev, and QA teams and become a domain expert for the cloud services in your remit. The required support for growth and development in this role will be provided.

About you: 2 to 4 years of hands-on experience supporting production of large-scale cloud services. Strong production support background and experience with in-depth troubleshooting. Experience working with solutions in both Linux and Windows environments. Experience using modern monitoring and alerting tools (Prometheus, Grafana, PagerDuty, etc.). Excellent written and verbal communication skills.
Experience with Python or other scripting languages. Proven ability to work independently in deploying, testing, and troubleshooting systems. Experience supporting high-availability systems and scalable solutions hosted on AWS or GCP. Familiarity with security tools and practices (Wiz, Tenable). Familiarity with containerization and associated management tools (Docker, Kubernetes). Significant experience in developing and maintaining relationships with a wide range of customers at all levels. Understanding of Incident, Change, Problem, and Vulnerability Management processes.

Desired: Awareness of ITIL best practices. AWS Certification and/or Kubernetes Certification. Experience with Snowflake. Automation/CI/CD experience (Jenkins, Ansible, GitHub Actions, Argo CD).

Company Benefits and Perks: We believe that the best solutions are developed by teams who embrace each other's unique experiences, skills, and abilities. We work hard to create a dynamic workforce where we encourage everyone to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours, and family-friendly benefits to all of our employees: Retirement Plans; Medical, Dental and Vision Coverage; Paid Time Off; Paid Parental Leave; Support for Community Involvement. We're serious about our commitment to a workplace where everyone can thrive and contribute to our industry-leading products and customer support, which is why we prohibit discrimination and harassment based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation, or any other legally protected status.

Posted 4 months ago

1 - 6 years

8 - 13 Lacs

Pune

Work from Office

Cloud Observability Administrator | Pune, India | Enterprise IT - 22685

about our diversity, equity, and inclusion efforts and the networks ZS supports to assist our ZSers in cultivating community spaces, obtaining the resources they need to thrive, and sharing the messages they are passionate about.

Cloud Observability Administrator: ZS is looking for a Cloud Observability Administrator to join our team in Pune. As a Cloud Observability Administrator, you will configure various observability tools and create solutions to address business problems across multiple client engagements. You will leverage information from the requirements-gathering phase and use past experience to design flexible and scalable solutions, and collaborate with other team members (involved in the requirements gathering, testing, roll-out, and operations phases) to ensure seamless transitions.

What You'll Do: Deploy, manage, and operate a scalable, highly available, and fault-tolerant Splunk architecture. Onboard various kinds of log sources (Windows/Linux/firewalls/network) into Splunk. Develop alerts, dashboards, and reports in Splunk. Write complex SPL queries. Manage and administer a distributed Splunk architecture. Very good knowledge of the configuration files used in Splunk for data ingestion and field extraction. Perform regular upgrades of Splunk and relevant apps/add-ons. Possess a comprehensive understanding of AWS infrastructure, including EC2, EKS, VPC, CloudTrail, Lambda, etc. Automate manual tasks using Shell/PowerShell scripting. Knowledge of Python scripting is a plus. Good knowledge of Linux commands for server administration.
What You'll Bring: 1+ years of experience in Splunk development and administration. Bachelor's degree in CS, EE, or a related discipline. Strong analytic, problem-solving, and programming ability. 1-1.5 years of relevant consulting-industry experience working on medium-to-large-scale technology solution delivery engagements. Strong verbal, written, and team presentation communication skills. Strong verbal and written communication skills, with the ability to articulate results and issues to internal and client teams. Proven ability to work creatively and analytically in a problem-solving environment. Ability to work within a virtual global team environment and contribute to the overall timely delivery of multiple projects. Knowledge of observability tools such as Cribl, Datadog, and PagerDuty is a plus. Knowledge of AWS Prometheus and Grafana is a plus. Knowledge of APM concepts is a plus. Knowledge of Linux/Python scripting is a plus. Splunk Certification is a plus.

Perks & Benefits: ZS offers a comprehensive total rewards package including health and well-being, financial planning, annual leave, personal growth, and professional development. Our robust skills development programs, multiple career progression options, internal mobility paths, and collaborative culture empower you to thrive as an individual and global team member. We are committed to giving our employees a flexible and connected way of working. A flexible and connected ZS allows us to combine work from home and on-site presence at clients/ZS offices for the majority of our week. The magic of ZS culture and innovation thrives in both planned and spontaneous face-to-face connections.

Travel: Travel is a requirement at ZS for client-facing ZSers; business needs of your project and client are the priority. While some projects may be local, all client-facing ZSers should be prepared to travel as needed.
Travel provides opportunities to strengthen client relationships, gain diverse experiences, and enhance professional growth by working in different environments and cultures.

Considering applying? At ZS, we're building a diverse and inclusive company where people bring their passions to inspire life-changing impact and deliver better outcomes for all. We are most interested in finding the best candidate for the job and recognize the value that candidates with all backgrounds, including non-traditional ones, bring. If you are interested in joining us, we encourage you to apply even if you don't meet 100% of the requirements listed above. ZS is an equal opportunity employer and is committed to providing equal employment and advancement opportunities without regard to any class protected by applicable law.

To Complete Your Application: Candidates must possess or be able to obtain work authorization for their intended country of employment. An online application, including a full set of transcripts (official or unofficial), is required to be considered. NO AGENCY CALLS, PLEASE. Find Out More At

Posted 4 months ago

6 - 10 years

8 - 12 Lacs

Noida

Work from Office

Job Description: We are looking for a highly skilled and experienced Senior DevOps Engineer to join our team. The ideal candidate will have 5-7 years of experience in a DevOps role and a proven track record of implementing and maintaining complex systems with a focus on automation, scalability, and security. The Senior DevOps Engineer will work closely with our development, operations, and security teams to ensure that our software is released quickly and reliably, with a focus on continuous integration and delivery.

Requirements: Bachelor's/Master's degree in Computer Science, Information Technology, or a related field. 5-7 years of experience in a DevOps role. Strong understanding of the SDLC and experience working on fully Agile teams. Proven coding and scripting skills (DevOps, Ant/Maven, Groovy, Terraform, shell scripting, and Helm charts). Working experience with IaC tools like Terraform, CloudFormation, or ARM templates. Strong experience with cloud computing platforms (e.g. Oracle Cloud (OCI), AWS, Azure, Google Cloud). Experience with containerization technologies (e.g. Docker, Kubernetes/EKS/AKS). Experience with continuous integration and delivery tools (e.g. Jenkins, GitLab CI/CD). Kubernetes: experience managing Kubernetes clusters and using kubectl to manage Helm chart deployments and ingress services and to troubleshoot pods. OS services: basic knowledge of managing, configuring, and troubleshooting Linux operating system issues, storage (block and object), and networking (VPCs, proxies, and CDNs). Monitoring and instrumentation: implement metrics in Prometheus, Grafana, Elastic, log management and related systems, and Slack/PagerDuty/Sentry integrations. Strong know-how of modern distributed version control systems (e.g.
Git, GitHub, GitLab, etc.). Strong troubleshooting and problem-solving skills, and the ability to work well under pressure. Excellent communication and collaboration skills, and the ability to lead and mentor junior team members.

Career Level - IC3

Responsibilities: Design, implement, and maintain automated build, deployment, and testing systems. Experience in taking application code and third-party products and building fully automated pipelines for Java applications to build, test, and deploy complex systems for delivery in the cloud. Ability to containerize an application, i.e. create Docker containers and push them to an artifact repository for deployment on OKE (Oracle Container Engine for Kubernetes) using Helm charts. Lead efforts to optimize the build and deployment processes for high-volume, high-availability systems. Monitor production systems to ensure high availability and performance, and proactively identify and resolve issues. Support and troubleshoot cloud deployment and environment issues. Create and maintain CI/CD pipelines using tools such as Jenkins and GitLab CI/CD. Continuously improve the scalability and security of our systems, and lead efforts to implement best practices. Participate in the design and implementation of new features and applications, and provide guidance on best practices for deployment and operations. Work with the security team to ensure compliance with industry and company standards, and implement security measures to protect against threats. Keep up to date with emerging trends and technologies in DevOps, and make recommendations for improvement. Lead and mentor junior DevOps engineers and collaborate with cross-functional teams to ensure successful delivery of projects. Analyze, design, develop, troubleshoot, and debug software programs for commercial or end-user applications. Write code, complete programming, and perform testing and debugging of applications.
As a member of the software engineering division, you will analyze and integrate external customer specifications. Specify, design and implement modest changes to existing software architecture. Build new products and development tools. Build and execute unit tests and unit test plans. Review integration and regression test plans created by QA. Communicate with QA and porting engineering to discuss major changes to functionality. Work is non-routine and very complex, involving the application of advanced technical/business skills in area of specialization. Leading contributor individually and as a team member, providing direction and mentoring to others. BS or MS degree or equivalent experience relevant to functional area. 6+ years of software engineering or related experience.

Posted 4 months ago

5 - 10 years

30 - 35 Lacs

Hyderabad

Remote

Role: DevOps Engineer
Company: Feuji Software Solutions Pvt Ltd.
Mode of Hire: Permanent Position
Experience: 6-12 Years
Work Location: Hyderabad/Remote

About Feuji: Feuji, established in 2014 and headquartered in Dallas, Texas, has rapidly emerged as a leading global technology services provider. With strategic locations including a Near Shore facility in San Jose, Costa Rica, and Offshore Delivery Centers in Hyderabad and Bangalore, we are well-positioned to cater to a diverse clientele. Our team of 600 talented engineers drives our success, delivering innovative solutions to our clients and contributing to our recognition as a 'Best Place to Work For.' We collaborate with a wide range of clients, from startups to industry giants in sectors like Healthcare, Education, IT, and Engineering, enabling transformative changes in their operations. Through partnerships with top technology providers such as AWS, Checkpoint, Gurukul, CoreStack, Splunk, and Micro Focus, we empower our clients' growth and innovation. With a clientele including Microsoft, HP, GSK, and DXC Technologies, we specialize in managed cloud services, cybersecurity, Product and Quality Engineering Services, and Data and Insights solutions, tailored to drive tangible business outcomes. Our commitment to creating 'Happy Teams' underscores our values and dedication to positive impact. Feuji welcomes exceptional talent to join our team, offering a platform for growth, development, and a culture of innovation and excellence.
Key Responsibilities: Design and implement continuous integration and continuous deployment frameworks from code to deployment. Manage and optimize data pipelines for performance, scalability, and reliability. Develop, implement, and maintain scalable data pipelines and processes. Create and manage automated provisioning and configuration systems for data infrastructure using infrastructure-as-code principles. Design, implement, configure, and manage system monitoring solutions that alert teams to problems before customers are impacted. Support developers in code deployment and troubleshooting. Work closely with customers and other team members to understand complex requirements and translate them into automated solutions. Provide support to ensure mission-critical applications and components are monitored and meet clients' security, reporting, retention, and disaster recovery requirements. Support team members.

Skills, Knowledge & Expertise: Programming/Development Skills: Strong experience in Python is essential, and experience with React/Vue.js would be preferred. Monitoring Tools: Familiarity with tools such as PagerDuty, Azure Monitor, and Datadog is beneficial, though monitoring is not the primary focus of the role. A good understanding of any of these tools will be advantageous. Cloud & DevOps Expertise: Must have a strong background in CI/CD, specifically with GitHub Actions. Deep expertise in Azure is essential. Experience with AWS or GCP is a plus. Should demonstrate the ability to quickly adapt to and learn new technologies. Soft Skills & Mindset: A strong passion for continuous learning and self-improvement. Excellent client-facing skills, with the confidence to handle discussions intelligently and effectively. Must be proactive, take full ownership of tasks, and be capable of delivering results even in challenging situations.
Required Qualifications: 7+ years of DevOps experience. 5+ years of Azure experience. 2+ years of development experience. 2+ years of Terraform experience. Cloud certifications. Excellent communication skills. Strong multi-tasker. Self-starter. Team player.

Preferred Qualifications: Kubernetes experience. Azure, AWS, and GCP professional-level certifications. Kubernetes certifications (CKA, CKAD, CKS).

Posted 4 months ago

5 - 7 years

20 - 27 Lacs

Pune

Hybrid

Role: AppOps Engineer
Location: Pune, Hinjewadi (Hybrid, 3 days a week)
Exp: 5-7 years

Responsibilities:
• Designing and implementing infrastructure and systems (such as metrics, monitoring, node management, alerting, deployment, logging)
• Setting up new environments and deploying solutions
• Application migration from EC2 to containers
• Building a proactive monitoring and alerting service
• Automation using Ansible, Python, and Perl scripting
• Investigating performance and stability problems, internally and on client sites
• Tuning the Actimize Platform (AIS and RCM)/operating system/application servers/databases for optimal performance and stability
• Identifying performance bottlenecks and assisting in root cause analysis
• Performance-related design reviews
• Creating and setting up deployment scripts for different environments (e.g. Test properties vs. Prod properties)
• Configuring and optimizing instances and web servers for optimal performance (e.g. adjusting default connection limits, adjusting request queuing thresholds)
• AWS troubleshooting support
• Support, architect, and implement alongside Technical & Operations teams to meet our customers' individual needs for their infrastructure and application deployments
• Work on critical, highly complex customer problems that span multiple AWS services (dealing daily with high-severity incidents)
• Help build and improve customer operations through scripts that automate and deploy AWS resources seamlessly, with as little manual intervention as possible
• Collaborate on and help build utilities and tools for internal use that enable you and your fellow AWS engineers to operate safely at high speed and wide scale
• Drive customer communication during critical events
• Flexible to work over the weekends and in a shift environment (as per
• Good experience in a DevOps environment / Operations team / Infrastructure Operations team
• Excellent troubleshooting skills
• Expertise in performance tuning/investigation/root cause analysis/mitigating bottlenecks
• Excellent hands-on experience in managing application support (3-tier/2-tier apps)
• AWS service knowledge for core services (EC2, S3, IAM, ASG, ELB, CFN, VPC, DX, VPN)
• Good exposure to managing containers and Kubernetes, including deployment and configuration on containers
• Good hands-on experience in deployment, release management, and migration activities
• Exposure to scripting languages (Ansible, Perl, Python, Ruby, shell script, PowerShell, etc.)
• Database skills (SQL, Oracle or Postgres/Cassandra)
• Good exposure to ELK, Splunk, Kafka
• Application servers (skills in any middleware technology, e.g. Tomcat, WebLogic, WebSphere)
• Good exposure to application performance monitoring tools like AppDynamics and Dynatrace
• Strong problem-solving, analytical, and communication skills
• Good communication, both written and verbal
• Troubleshooting performance issues and tuning
• Working with the Architecture team on hardware sizing recommendations
• Java performance testing, diagnosis, and tuning of Java applications

Additional Skills Desired:
• Cloud/application-level security experience
• Has worked in an Agile/Sprint development model
• Experience working with tools like OpsGenie, AlertOps, PagerDuty/OpenDuty
• Troubleshooting Java-related issues
• Performance testing/investigation experience
• Database performance testing, diagnosis, and tuning

Please email your details and resume to chaithra.j@xoriant.com to proceed further.

Posted 4 months ago

4.0 - 8.0 years

13 - 18 Lacs

Bengaluru

Work from Office

Bengaluru, India | Murex | Others | BCM Industry | 29/04/2025

Project description: We've been engaged by a large Australian financial institution to provide resources to manage production support activities along with their existing team in Sydney and India.

Responsibilities: Carry out enhancements to maintenance/housekeeping scripts as required and monitor DB growth periodically. Handle cloud environment preparation, refresh, rebuild, upkeep, maintenance, and upgrade activities. Ensure cloud cost optimisation. Troubleshoot Murex environment-specific issues, including infrastructure-related issues, and update pipelines for a permanent fix. Handle EOD execution and troubleshoot related issues. Participate in the analysis, solutioning, and deployment of solutions for production issues during EOD. Participate in release activities and coordinate with QA/Release teams. Participate in AWS stack deployment, AWS AMI patching, and stack configuration to ensure optimal performance and cost-efficiency. Address requests such as warehouse rebuilds and maintenance, perform health/sanity checks, create XVA engines, and handle environment restores and backups in AWS as per project needs. Perform weekend maintenance and health checks in the production environment. Support shift work (latest end time 12:30 AM IST) and be available for weekend and on-call support. Work from the client location on a need basis. Be flexible to work in a hybrid model.
Skills

Must have: 4 to 8 years of experience in Murex production support; Murex End of Day support; troubleshooting batch-related issues, including date moves and processing adjustments; Murex environment management and troubleshooting; experience in SQL, Unix shell scripting, monitoring tools, and web development; experience in the release and CI/CD process; Linux/Unix server and Oracle RDS knowledge; working experience with automation/job scheduling tools such as Autosys and GitHub Actions; working experience with monitoring tools like Grafana, Splunk, Obstack, and PagerDuty; good communication and organization skills, working within a DevOps team supporting a wider IT delivery team.

Nice to have: PL/SQL; scripting languages (Python); advanced troubleshooting experience with shell scripting and Python; experience with CI/CD tools like Git, flows, Ansible, and AWS, including CDK; exposure to the AWS cloud environment; willingness to learn and obtain AWS certification.

Other: Languages: English C1 Advanced. Seniority: Regular.

Posted Date not available

5.0 - 10.0 years

0 Lacs

Pune, Chennai, Bengaluru

Work from Office

Observability-related: Knowledge of observability (monitoring and alerting). Experience with the Kubermetheus stack (Kubernetes, Prometheus, Loki, Grafana, Alertmanager); specifically, must be able to plan, build, test, and launch an observability platform end to end. Experience with AWS. Proficiency in Python. Recent portfolio projects will be required for review, and interviews will include coding challenges. PagerDuty experience. Experience with CI/CD pipelines. Infrastructure as Code experience. Strong troubleshooting and problem-solving skills. Mandatory skills: Looking for senior SRE profiles who are strong in coding/automation (Python), Terraform, Kubernetes, Prometheus, Loki, Grafana, Alertmanager, PagerDuty, and AWS.

Posted Date not available

5.0 - 9.0 years

13 - 17 Lacs

Bengaluru

Work from Office

Looking for a place that values your unique talents? Discover Stryker's award-winning culture. We are proud to offer you our total rewards package, which includes bonuses, healthcare, insurance benefits, retirement programs, wellness programs, and service and performance awards, not to mention various social and recreational activities, all of which are location-specific. Job description: What You Will Do: Develop, maintain, and improve Azure-based infrastructure using Terraform. Bring hands-on experience with observability and monitoring best practices and standards. Automate CI/CD pipelines with integrated security checks. Deploy and orchestrate containerized applications using technologies. Collaborate with application teams to embed metrics and alerts in services. Handle on-call rotations for incident response and conduct root cause analyses. What You Will Need: Required Qualifications: Bachelor's or Master's degree in Computer Science, Software Engineering, or a related discipline. 3+ years of professional experience as a DevOps/Site Reliability Engineer. Experience with public cloud providers such as Azure (preferred) or AWS. Proficient in scripting with languages such as Python, Bash, or PowerShell. Demonstrated experience with Infrastructure as Code (IaC) tools such as Terraform, Helm, and Ansible. Preferred Qualifications: Experience with observability tools such as Grafana, Prometheus, and the ELK Stack. Experience using on-call management platforms such as PagerDuty or Zenduty.

Posted Date not available

10.0 - 15.0 years

15 - 20 Lacs

Gurugram, Bengaluru

Hybrid

Shift: APAC shift (5:00 AM to 2:00 PM IST), Wednesday to Sunday. We are looking for a passionate Incident Manager to join our Global Operations team to lead and coordinate the resolution of critical and high-priority incidents during assigned shifts, ensuring minimal business impact and adherence to defined service levels. This role will serve as front-line leadership during major incidents. This includes, but is not limited to, communicating technical concepts, status, and business impact to leadership, both internally and with our customers. This individual will need to be confident running a cross-functional war room, both in writing via Teams and on audio/visual calls. To be a successful incident manager, you should have a basic understanding of infrastructure and an aptitude for learning new technologies and procedures. Ultimately, an outstanding incident manager should excel at multitasking and remain focused and calm during major incidents. Incident Manager Responsibilities: Manage, assign, communicate, and escalate the incident response during a major incident. Oversee the incident management process and help drive the investigation to resolve the incident by performing the role of incident commander. Lead incident bridge calls and coordinate technical teams to drive swift resolution. Ensure timely incident logging, classification, investigation, escalation, and closure. Maintain real-time documentation of incident progression and updates to stakeholders. Communicate with upper management during major production incidents. Assist and collaborate with service management team members by prioritizing workloads and rescheduling non-urgent tasks. Drive post-incident reviews (CIRs) and RCA activities to identify root causes and preventive actions. Shift leadership: Lead the shift team and assign responsibilities to on-duty engineers. Ensure shift handovers are thorough and well-documented.
Provide support and guidance to teams for ticket triage and prioritization. Monitor shift performance, SLAs, and ticket queues. Serve as the decision-maker during critical incidents when senior leadership is not available. Provide regular updates to the SLT and impacted stakeholders during major incidents and address their queries. Send timely notifications and incident summaries as per communication guidelines. Coordinate with cross-functional teams and vendors. Required Skills & Experience: 10+ years of experience in IT Operations/Incident Management, preferably in a global support environment. Strong leadership skills with the ability to manage teams in high-pressure situations. Excellent communication and stakeholder management skills. Hands-on experience with ITSM tools (e.g., ServiceNow, PagerDuty, Jira). Good understanding of the ITIL framework (ITIL Foundation certification preferred). Technical awareness of infrastructure (servers, networks, cloud) and enterprise applications. Experience working in 24x7 environments, with on-call availability. Location: WFH/Hybrid

Posted Date not available

1.0 - 6.0 years

8 - 13 Lacs

Pune

Work from Office

Cloud Observability Administrator | Pune, India | Enterprise IT - 22685

... about our diversity, equity, and inclusion efforts and the networks ZS supports to assist our ZSers in cultivating community spaces, obtaining the resources they need to thrive, and sharing the messages they are passionate about.

ZS is looking for a Cloud Observability Administrator to join our team in Pune. As a Cloud Observability Administrator, you will configure various observability tools and create solutions to address business problems across multiple client engagements. You will leverage information from the requirements-gathering phase and draw on past experience to design a flexible and scalable solution, and collaborate with other team members (involved in the requirements gathering, testing, roll-out, and operations phases) to ensure seamless transitions. What You'll Do: Deploy, manage, and operate a scalable, highly available, and fault-tolerant Splunk architecture. Onboard various kinds of log sources (Windows, Linux, firewalls, network) into Splunk. Develop alerts, dashboards, and reports in Splunk. Write complex SPL queries. Manage and administer a distributed Splunk architecture. Very good knowledge of the Splunk configuration files used for data ingestion and field extraction. Perform regular upgrades of Splunk and relevant apps/add-ons. Possess a comprehensive understanding of AWS infrastructure, including EC2, EKS, VPC, CloudTrail, Lambda, etc. Automate manual tasks using Shell/PowerShell scripting; knowledge of Python scripting is a plus. Good knowledge of Linux commands for server administration.
What You'll Bring: 1+ years of experience in Splunk development and administration. Bachelor's degree in CS, EE, or a related discipline. Strong analytic, problem-solving, and programming ability. 1 to 1.5 years of relevant consulting-industry experience working on medium- to large-scale technology solution delivery engagements. Strong verbal, written, and team presentation communication skills. Strong verbal and written communication skills, with the ability to articulate results and issues to internal and client teams. Proven ability to work creatively and analytically in a problem-solving environment. Ability to work within a virtual global team environment and contribute to the overall timely delivery of multiple projects. Knowledge of observability tools such as Cribl, Datadog, and PagerDuty is a plus. Knowledge of AWS Prometheus and Grafana is a plus. Knowledge of APM concepts is a plus. Knowledge of Linux/Python scripting is a plus. Splunk certification is a plus. Perks & Benefits: ZS offers a comprehensive total rewards package including health and well-being, financial planning, annual leave, personal growth, and professional development. Our robust skills development programs, multiple career progression options, internal mobility paths, and collaborative culture empower you to thrive as an individual and global team member. We are committed to giving our employees a flexible and connected way of working. A flexible and connected ZS allows us to combine work from home and on-site presence at clients/ZS offices for the majority of our week. The magic of ZS culture and innovation thrives in both planned and spontaneous face-to-face connections. Travel: Travel is a requirement at ZS for client-facing ZSers; the business needs of your project and client are the priority. While some projects may be local, all client-facing ZSers should be prepared to travel as needed.
Travel provides opportunities to strengthen client relationships, gain diverse experiences, and enhance professional growth by working in different environments and cultures. Considering applying? At ZS, we're building a diverse and inclusive company where people bring their passions to inspire life-changing impact and deliver better outcomes for all. We are most interested in finding the best candidate for the job and recognize the value that candidates with all backgrounds, including non-traditional ones, bring. If you are interested in joining us, we encourage you to apply even if you don't meet 100% of the requirements listed above. ZS is an equal opportunity employer and is committed to providing equal employment and advancement opportunities without regard to any class protected by applicable law. To Complete Your Application: Candidates must possess or be able to obtain work authorization for their intended country of employment. An online application, including a full set of transcripts (official or unofficial), is required to be considered. NO AGENCY CALLS, PLEASE.

Posted Date not available

3.0 - 6.0 years

4 - 8 Lacs

Hyderabad

Work from Office

Job Purpose: The Systems Operations Analyst is part of a support organization responsible for the daily operations of multiple industry-leading trading exchanges. This is a customer-facing position, providing immediate assistance to ICE/NYSE exchanges, back office, support personnel, and IT staff to achieve the highest customer satisfaction and minimize the impact of IT-related problems. This is a critical support role within the overall architecture of ICE/NYSE exchanges, divisions, and infrastructure. This is a 24x7 environment, and the position requires shift rotation and/or weekend work. Responsibilities: Monitoring and Incident Management: Monitor systems and applications within the production environment. Diagnose and fix incidents raised through monitoring tools, conference bridges, and chats. Work with and escalate to internal and external teams to implement incident fixes, workarounds, and data recovery. Open and update production incident tickets according to company standards. Problem Management: Investigate and update incident tickets with the root cause and incident description, ensuring appropriate corrective-action follow-up tickets are assigned. Manage incident tickets to closure, ensuring incident details are complete and accurate and all corrective actions have been completed. System and Application Production Readiness: Work with internal and external teams to expand and maintain operational runbooks and other documentation. Check application and infrastructure availability and tasks at scheduled times. Configure monitoring tools and alarms. Deployment Management: Production deployments: approve and execute production deployment tasks. Participate in disaster recovery, business continuity, and workplace recovery events. Participate in continuous improvement programs, such as trend analysis of recurring issues. Provide and report on performance metrics of the environment.
Follow the documented handover process to bring the next shift up to speed and highlight priority items or issues. Knowledge and Experience: Experience with PagerDuty. Experience with ServiceNow and Jira. Experience with Jenkins and Git. Experience in scripting. Cloud (AWS) and VMware knowledge is a must. Bachelor's degree (IT-based) or experience in IT systems support and/or operational support of applications and databases within Windows and Linux/Unix OS environments. Strong communication skills. High level of general IT skills with email and MS Office applications. Able to think logically and critically. Analytical problem-solving skills with an ability to identify root cause(s). Able to work as a team player across the organization. Able to build and maintain effective relationships with individuals and the team. Ability to be organized and decisive under pressure. Excellent time management skills. Able to manage priorities and multitask. Self-confident and assertive.

Posted Date not available
