5.0 - 9.0 years
0 Lacs
haryana
On-site
As the Production Support Lead at FNZ in Gurgaon, India, you will play a crucial role in overseeing the management of issues for our clients. Your responsibilities will involve addressing concerns from both external and internal clients to ensure key performance indicators (KPIs) and service level agreements (SLAs) are met. You will be responsible for managing the workflow, ensuring the smooth functioning of the application, and implementing both proactive and reactive measures to drive continuous service improvement. Your expertise in Incident & Problem Management will be essential as you lead the analysis, investigation, diagnosis, and resolution of production issues. You will also be involved in Release & Change Management, supporting testing and release processes for production fixes. Transitioning between project support and production support during Service Transition will be a key responsibility to maintain operational efficiency. Your responsibilities will include analyzing incidents, recommending solutions for service improvement, directing daily operations, allocating resources, and planning to meet service levels. You will proactively address system and service problems, facilitate the development of documented problem solutions, educate and train application users, guide team members, monitor progress, prioritize quality improvement, initiate process improvements aligned with business objectives, drive enhancements aligned with procedural, regulatory, and security requirements, maintain meticulous documentation for application support procedures, contribute to audits and reviews, and undertake projects to ensure smooth production operations. 
We are looking for a candidate with a degree in Commerce/IT or a related field, expert SQL skills, an independent and delivery-focused working style, superior analytical thinking, attention to detail, good communication skills, leadership abilities in incident management, relationship-building skills, experience in ITIL, and technical skills in SQL and application monitoring tools. While familiarity with financial markets and products or experience with Microsoft .NET development products is beneficial, it is not essential. The role may require flexibility in work hours. At FNZ, we offer a mission-led work environment where you can make a meaningful impact on over 20 million people's wealth management experience. You will have opportunities for rapid career growth, work with market-leading technology, and be encouraged to learn, think innovatively, and drive innovation. FNZ is committed to democratizing wealth management and providing accessible investment opportunities globally. We partner with financial institutions and wealth managers to help millions of people invest in their future.
Posted 3 weeks ago
7.0 - 10.0 years
16 - 20 Lacs
bengaluru
Work from Office
Your Career
As an Incident Commander, you will be at the vanguard of our dedication to cybersecurity. Addressing the most pressing incidents for our customers, you hold a central position in solidifying our reputation as the go-to cybersecurity partner. This role demands proactivity, efficiency, an unwavering dedication to constant refinement, and a passion for customer satisfaction. Your primary responsibility will be to proactively stave off incidents and escalations. Should these incidents or escalations arise, you will be expected to respond without delay, ensure swift resolution, surpass customer expectations, and champion an environment of continuous growth and refinement. You will be tasked with constructing systems that have a cross-geographical reach, ensuring smooth operation from the pilot phase right through to full-scale production. At all times, the aim should be to provide an unmatched customer experience, underpinned by consistency, pivotal metrics, and a systematic approach. Leadership and ownership of your domain will be crucial. We expect you to tackle challenges at every turn. Our culture is built on collaboration and seamless execution. Your role will be indispensable in elevating executive communication and fostering teamwork to achieve our collective objectives.
Your Impact
- Coordinate and lead response initiatives for the company's most pivotal incidents and escalations that impact our customers
- As the Incident Commander (IC), demonstrate adept leadership, seamless coordination with the global team, and technical prowess, promoting quick decision-making and enabling communication across diverse teams
- Leverage your robust technical foundation to assess, prioritize, and oversee incidents effectively
- Collaborate intimately with premier technical teams appropriate for each severity level
- Engage and liaise proficiently with both internal and external stakeholders spanning various teams
- Harness data to champion refinements in processes, fostering collaboration and growth across teams
- Manage the comprehensive incident progression, working closely with support, engineering, and field teams from the initial response to retrospective analysis
- Delve into incident solutions, conceptualize and evaluate theories, and pinpoint underlying causes
- Design, build, and operate key parts of the E2E incident management lifecycle

Qualifications
Your Experience
- BE/B.Tech degree in Engineering or a related technical domain, or equivalent military experience, is desirable
- 7 to 10 years in customer-facing functions in support, success, product, or service delivery roles
- A minimum of 5 years' experience in handling significant incidents and escalations in a 24/7/365 environment
- Comprehensive knowledge of LAN/WAN technologies, encompassing general routing/switching/security for both branch and data center architectures
- Expertise in Remote Access VPN solutions, including IPsec, PKI, and SSL
- Familiarity with Palo Alto Networks products is highly advantageous
- Proficiency in Public Cloud, Microservices, Infrastructure as Code, and expansive metrics and monitoring systems
- Demonstrable experience with Windows OS, Linux OS, and macOS-based applications (installation, troubleshooting, debugging)
- Adeptness in providing architectural insights emphasizing fault tolerance and stability
- An inclination towards a "tools-first" approach, prioritizing process efficiency and consistency
- Familiarity with systems like Salesforce, FireHydrant, JIRA, Blameless, PagerDuty, Tableau, and other AI/data-driven operations and workflow automation platforms
- Understanding of SLA and SLO concepts and how to achieve them consistently
- Exemplary leadership and communication capabilities, with the finesse to engage various stakeholders and remain poised under stress
- Solid problem assessment and decision-making skills
- The ideal Incident Commander combines a background in customer management with broad and deep technical skills in security, networking, and customer deployments

Additional Information
The Team
The Incident Commanders are part of the newly formed cross-geographical (global) Incident Management team under the larger Incident and Escalation Management organization, which is part of the Global Customer Support and Service teams covering Palo Alto offerings of cybersecurity platforms and solutions. The team is responsible for managing and coordinating response efforts during incidents and critical escalations; the Incident Commander (IC) role ensures focused leadership, effective coordination, and streamlined decision-making during incident and critical escalation management.
Posted 3 weeks ago
5.0 - 10.0 years
25 - 35 Lacs
hyderabad
Remote
Role: DevOps Engineer
Company: Feuji Software Solutions Pvt Ltd.
Mode of Hire: Permanent Position
Experience: 6-12 Years
Work Location: Hyderabad / Remote

About Feuji
Feuji, established in 2014 and headquartered in Dallas, Texas, has rapidly emerged as a leading global technology services provider. With strategic locations including a Near Shore facility in San Jose, Costa Rica, and Offshore Delivery Centers in Hyderabad and Bangalore, we are well-positioned to cater to a diverse clientele. Our team of 600 talented engineers drives our success, delivering innovative solutions to our clients and contributing to our recognition as a 'Best Place to Work For.' We collaborate with a wide range of clients, from startups to industry giants in sectors like Healthcare, Education, IT, and engineering, enabling transformative changes in their operations. Through partnerships with top technology providers such as AWS, Checkpoint, Gurukul, CoreStack, Splunk, and Micro Focus, we empower our clients' growth and innovation. With a clientele including Microsoft, HP, GSK, and DXC Technologies, we specialize in managed cloud services, cybersecurity, Product and Quality Engineering Services, and Data and Insights solutions, tailored to drive tangible business outcomes. Our commitment to creating 'Happy Teams' underscores our values and dedication to positive impact. Feuji welcomes exceptional talent to join our team, offering a platform for growth, development, and a culture of innovation and excellence.
Key Responsibilities
- Design and implement continuous integration and continuous deployment frameworks, from code to deployment
- Manage and optimize data pipelines for performance, scalability, and reliability
- Develop, implement, and maintain scalable data pipelines and processes
- Create and manage automated provisioning and configuration systems for data infrastructure using infrastructure-as-code principles
- Design, implement, configure and manage system monitoring solutions that alert teams to problems before customers are impacted
- Support developers in code deployment and troubleshooting
- Work closely with customers and other team members to understand complex requirements and translate them into automated solutions
- Provide support to ensure mission-critical applications and components are being monitored and meet the security, reporting, retention, and disaster recovery requirements of clients
- Support team members

Required Qualifications:
- 7+ years of DevOps experience
- 5+ years of Azure experience
- 2+ years of Development experience
- 2+ years of Terraform experience
- Cloud certifications
- Excellent communication skills
- Strong multi-tasker
- Self-starter
- Team player

Preferred Qualifications:
- Kubernetes experience
- Azure, AWS and GCP Professional level certifications
- Kubernetes certifications (CKA, CKAD, CKS)
Posted 3 weeks ago
3.0 - 8.0 years
4 - 8 Lacs
bengaluru
Work from Office
Job Description Document
Job Role: Customer Success Engineer
Function: Level 2 Escalation Support Engineer
Location: Bangalore
Shift: Rotational. Primarily US time zones (EST/PST support coverage)

Job Summary:
We are looking for a highly motivated and technically adept Customer Success Engineer (CSE) to serve as a key escalation point for Zeta Marketing Platform (ZMP). This role will interface directly with enterprise customers and internal teams to resolve complex technical issues, provide proactive guidance, and contribute to the continuous improvement of our customer experience.

Key Responsibilities:
- Handle escalated customer tickets (L2), perform in-depth root cause analysis, and drive timely resolution.
- Communicate with customers primarily via e-mail, and also through Slack, MS Teams and phone as needed.
- Collaborate cross-functionally with Product, Engineering, QA, Design and DevOps teams to investigate and resolve platform-level issues.
- Apply a structured and data-driven approach to debugging issues in areas such as API integration, campaign workflows, user interface, and data syncing.
- Provide technical walkthroughs and consultative guidance to customers on platform capabilities and best practices.
- Document solutions thoroughly in ticketing systems and contribute to the knowledge base for internal and customer use.
- Identify trends and proactively suggest product or documentation improvements based on recurring customer pain points.
- Participate in post-incident reviews, RCA documentation, and follow-ups with impacted customers.
- Provide support during product upgrades or critical incidents, including weekends or holiday coverage on a rotational basis.

Required Skills & Experience:
- 3+ years of experience in a technical support or product support role in a SaaS or MarTech environment.
- Demonstrated ownership of L2+ escalation issues with strong analytical thinking and troubleshooting depth.
- Strong written and verbal communication skills with the ability to simplify complex technical concepts.
- Hands-on experience with web technologies: APIs (REST), HTML, CSS, JavaScript, SQL, JSON, and browser dev tools.
- Comfortable using tools like Postman, Grafana, Jira, Confluence or similar systems.
- Prior experience supporting US-based customers and working US time zone hours (minimum 1 year).
- Customer-first mindset with excellent consultative and advocacy skills.
- Ability to manage multiple priorities and deliver under pressure in a fast-paced support environment.
- Experience in writing or reviewing runbooks, playbooks, and RCA documents.

Preferred Qualifications:
- Exposure to marketing automation platforms, customer data platforms (CDPs), or personalization engines.
- Experience with SQL-based investigation and understanding of event/data pipelines.
- Familiarity with tools like Honeycomb, AWS, Snowflake or similar platforms is a plus.
- Experience in incident management or working with on-call rotations using PagerDuty.
- Experience with GenAI tools like OpenAI, MS Copilot or DeepSeek.

Soft Skills:
- Self-starter who can work independently with minimal supervision.
- Strong collaboration skills and a positive attitude in cross-team environments.
- Detail-oriented with a passion for problem-solving and continuous learning.
Posted 3 weeks ago
5.0 - 10.0 years
14 - 19 Lacs
noida
Work from Office
About the Team:
The Incident Management Team, part of IT Service Management (ITSM), works cross-functionally with Global Services, Engineering, Cloud Hosting and Management on the effective delivery of UKG's Cloud SaaS offerings.

About The Role:
The Lead Cloud Operations Specialist provides day-to-day support for all ongoing incidents and aligns with ITSM's strategic direction. Collaborating directly with the leadership team of ITSM, this position demands a high level of adaptability and quick thinking to achieve success.

Responsibilities Include:
- Defining war room procedures, establishing communication channels, and ensuring all necessary resources (tools, data dashboards) are readily available for incident response
- Leading discussions during war room meetings, keeping the team focused, and ensuring everyone is aligned on priorities
- Capturing key decisions, actions taken, and lessons learned during the incident for future reference
- Taking charge of the war room, leading the response team (engineers, support specialists) to diagnose, troubleshoot, and resolve issues impacting the SaaS product(s)
- Gathering and analyzing real-time data to understand the scope and impact of the incident
- Prioritizing actions, delegating tasks, and making critical decisions to resolve the incident efficiently
- Keeping stakeholders (internal and external) informed about the situation, progress, and estimated resolution time
- Enable the swift resolution of incidents, minimize downtime, and implement preventive measures to mitigate future issues
- Drive and facilitate resolution via Teams as an incident commander with excellent executive presence, communication, and collaboration skills
- Collaborate and align with leaders across Engineering, Sales, Corporate Comms, and Legal to accelerate incident resolution, remove blockers, and provide a high level of service to our customers
- Actively engage with cross-functional teams to ensure Root Cause Analyses (RCAs) and Post Incident Reviews (PIRs) are complete, review remediation plans to identify areas for improvement, and socialize findings/insights
- Thrive under pressure with the ability to stay calm, handle conflict, and partner with other UKG teams to drive resolution
- Be able to coach other individual contributors in their professional development and serve as a role model
- Develop and monitor key metrics to understand incident trends, as well as operational resilience and readiness
- Develop and present business reviews on required cadences to executive leadership

Basic Qualifications
- 5+ years of experience supporting a global 24x7x365 incident management team in an enterprise SaaS environment
- 5+ years of technical experience (Support, Services, IT, Engineering) at a tech company with exposure to working with a complex customer base
- 3+ years of working in a Cloud (AWS or GCP or Azure; GCP preferred) environment
- 3+ years of working in a scrum/agile/SRE environment (hands-on experience will be a PLUS)
- 3+ years of working in an on-call support rotation model and PagerDuty experience
- 3+ years of working experience with Teams (integrations with PagerDuty and ServiceNow), Slack, Confluence and SharePoint
- Subject matter expertise in incident management frameworks; awareness of industry standards and best practices
- Excellent problem-solving and decision-making skills to identify root causes and implement corrective actions
- Clear and concise communication skills at all levels (written and verbal)
- Demonstrated ability to collaborate, build credibility, and establish good working relationships with leaders across UKG to ensure solid partnership and alignment
- Willingness/ability to work in a shift-based rotation model in a larger enterprise incident management team

Preferred Qualifications:
- Hands-on experience working with the following tools: JIRA, ServiceNow, Salesforce, and Aha, and their integrations (e.g. JIRA to PD integration, JIRA to Slack integration)
- Experience working in an Agile technical environment
- Experience working in a Cloud environment
Posted 3 weeks ago
3.0 - 8.0 years
11 - 16 Lacs
noida
Work from Office
About the Team:
The Incident Management Team, part of IT Service Management (ITSM), works cross-functionally with Global Services, Engineering, Cloud Hosting and Management on the effective delivery of UKG's Cloud SaaS offerings.

About The Role:
The IT Service Operations Specialist provides day-to-day support for all ongoing customer-facing and internal cloud infrastructure related incidents. In addition, they will work closely with the leads on operational improvement initiatives.

Responsibilities:
- Acknowledge incoming incidents via PagerDuty and spin up a bridge
- Gather the initial information and document it in ServiceNow
- Adopt/learn the internal automation tools for incident logging and tracking
- Learn various internal product and engineering team structures to effectively lead the bridges/war rooms
- Effectively lead the incident bridges by taking charge of the room, leading the response teams (engineers, support specialists) to diagnose, troubleshoot, and resolve issues impacting applications, so that customer-impacting incidents are mitigated in a timely manner
- Engage with global communications teams for status page and external customer communications throughout the lifecycle of the incident
- Maintain the quality of the data captured in all the tools used in ITSM (PagerDuty, ServiceNow, JIRA, etc.)
- Learn new product features for effective management of incident bridges
- Complete all organizational trainings on time
- Thrive under pressure with the ability to stay calm, handle conflict, and partner with other UKG teams to drive resolution
- Develop and monitor key metrics to understand incident trends, as well as operational resilience and readiness

Basic Qualifications
- 3+ years of experience supporting a global 24x7x365 incident management team in a SaaS environment
- 3+ years of technical experience (Support, Services, IT, Engineering) at a tech company with exposure to working with a complex customer base
- 1+ years of working in a Cloud (AWS or GCP or Azure; GCP preferred) environment
- 2+ years of working in a scrum/agile/SRE environment (hands-on experience will be a PLUS)
- 2+ years of working in an on-call support rotation model and PagerDuty experience
- 2+ years of working experience with Teams (integrations with PagerDuty and ServiceNow), Confluence and SharePoint
- Subject matter expertise in incident management frameworks; awareness of industry standards and best practices

Preferred Qualifications:
- Experience working with the following tools: JIRA, ServiceNow, Salesforce, and Aha
- Experience working in an Agile technical environment
- Experience working in a Cloud environment
- Excellent problem-solving and decision-making skills to identify root causes and implement corrective actions
- Demonstrated ability to collaborate, build credibility, and establish good working relationships with leaders across UKG to ensure solid partnership and alignment
- Willingness/ability to work in a shift-based rotation model in a larger enterprise incident management team
Posted 3 weeks ago
5.0 - 10.0 years
0 Lacs
gurugram, haryana, india
On-site
Role Profile
Job Title: Production Support Senior Engineer
Location: Gurgaon, India
Reports to: Head of Production Support, India

About FNZ
Who we are: FNZ Group is an established and rapidly growing company in the financial technology sector. We partner with the entire industry to make wealth management accessible to more people. Today, we partner with over 650 financial institutions and 12,000 wealth management firms, enabling over 26 million people across all wealth segments to invest in the things they care the most about, on their own terms. We have 25+ offices globally with 7,000 employees (and growing!). To learn more about us and our journey, check out our careers site.

Role Description
Short role description: The application monitoring team covers request servicing, application monitoring and system healthcare for our global clients. We are looking for a Production Support Senior Engineer to enable this team's success.

What does success look like as a Production Support Senior Engineer?
Alert Management & Monitoring:
- Provide first and second-line support for FNZ's application monitoring and alerting systems.
- Ensure that platform alerts are managed effectively, minimizing system downtime and disruption.
- Proactively monitor the platform, identifying and addressing potential issues before they impact service.
Application Support Platform Maintenance & Improvement:
- Implement and manage routine platform checks to ensure system robustness and reliability.
- Contribute to the platform maintenance process, ensuring comprehensive application support practices are in place.
- Contribute to continuous service improvement by actively driving proactive measures and system enhancements.
Leadership & Development:
- Mentor and guide junior team members.
- Monitor team performance and continuously seek improvement opportunities for skills and processes.
- Develop, maintain, and ensure adherence to comprehensive documentation on team activities.
- Collaborate with internal stakeholders to ensure that business objectives and needs are met.
- Contribute to audit and effectiveness requests.

Experience Required
What we are looking for:
- Experience: 5-10 years of relevant experience
- Educational Background: Degree in Computer Science, Information Technology, or a related field; or equivalent experience.
- Technical Skills: Proficient in SQL.
- Communication Skills: Strong verbal and written communication skills; capable of confidently interacting with platform dedicated teams.
- Incident Management: Experience in support and incident management, preferably with ITIL or equivalent frameworks.

Experience Preferred
Beneficial but not essential:
- Familiarity with financial markets and products.
- Experience using application support systems, including PagerDuty, NewRelic, Splunk, and ServiceNow.
- Basic knowledge of Microsoft .NET development, including C#, VB.NET, and SQL Server, is advantageous.

Opportunities
What We Offer:
- We are mission led - work at the heart of a purpose-led organization, where you can be proud of the impact you make, every day, and where you'll transform the way over 20 million people invest, making wealth management more accessible, sustainable and transparent to more people.
- Rapid career growth - you are encouraged to take on responsibility, play a part in the evolution of the company and rapidly drive your career development, working on real projects that directly impact our clients and their customers.
- Market leading technology - build, create and evolve innovative solutions for the world's most trusted brands using the latest technologies to help change the face of investing for the future.
- Learning & development - we place emphasis on a willingness to learn, to think differently, to be creative and to help drive innovation.

Inclusion
At FNZ, we recognise that diversity, equity and inclusion are important factors contributing to our success. We embrace the unique perspective and capabilities of our current and future employees, which will help us continue to drive innovation and achieve our business goals. Recruitment decisions at FNZ are made in a non-discriminatory manner without regard to gender, ethnicity/race, faith, age, nationality, gender identity, sexual orientation, marital status, socio-economic background, disability or military veteran status, so that all applicants and employees are valued and respected. In addition, we want to ensure accessibility needs are well supported; if you require specific support, please advise us.

About FNZ
FNZ is committed to opening up wealth so that everyone, everywhere can invest in their future on their terms. We know the foundation to do that already exists in the wealth management industry, but complexity holds firms back. We created wealth's growth platform to help. We provide a global, end-to-end wealth management platform that integrates modern technology with business and investment operations, all in a regulated financial institution. We partner with over 650 financial institutions and 12,000 wealth managers, with US$1.7 trillion in assets under administration (AUA). Together with our customers, we help over 26 million people from all wealth segments to invest in their future.
Posted 3 weeks ago
2.0 - 5.0 years
0 Lacs
pune, maharashtra, india
On-site
Shift Ahead Technologies, based in Pune, requires efficient and competent Monitoring Engineers with the following skill sets and experience for our US client operations:
- 2-5 years of similar experience, with good, confident communication skills
- Should stay alert to monitor, acknowledge, and proactively alert the customer throughout the shift in a 24x7 environment
- Must be able to communicate in English (US) with the client, ISPs and vendors for troubleshooting
- Should be well versed in the Jira ticketing tool for raising cases to notify the onsite team of alerts
- Should be able to set up alerts in Datadog, PagerDuty and Meraki, and monitor the whole network environment, including all network and peripheral devices such as WAN, APs, routers and switches
- Should have extensive experience in integrating and setting up the Datadog agent on QNAP NAS, print server hosts and VMs (Linux and Windows based)
- Should have extensive knowledge of Datadog and the setup of its collector, forwarder, APM agent and datadog.yaml files
- Should know how to set up on-call/off-call shifts with escalation policy adjustments in PagerDuty
- Should have relevant hands-on experience in setting up PagerDuty with Datadog and the Meraki Dashboard
- State your notice period and salary expectations; early, committed joining is needed by the 1st week of September 2025
Confident and suitable candidates (with the relevant experience ONLY) may apply or email us at [HIDDEN TEXT]
Posted 3 weeks ago
4.0 - 9.0 years
5 - 12 Lacs
pune
Hybrid
Experience with large OTT/music/web platforms or B2B SaaS is mandatory.

Core Responsibilities
- End-to-end ownership of production platform availability across web, iOS/Android, connected TVs and partner platforms.
- Understanding and experience of various network monitoring systems.
- Experience of incident management: lead the incident lifecycle: detection → triage (L1) → containment → escalation → remediation → RCA → prevention.
- Approachable and proactive, with excellent interpersonal skills and experience building inter-team relationships.
- Collaborate with other IT teams to resolve cross-functional issues.
- Ensure accurate and timely documentation of network issues, solutions, and escalations.
- Build and maintain playbooks for tickets and escalations.
- Own escalation across CDN, DRM, Player SDK, payment gateways and other third-party providers.
- Reporting and stakeholder management: provide regular status updates to management and stakeholders, including a daily/weekly incident dashboard.
- Hire, mentor, and scale the NOC team.

Required
- 4+ years in production support / NOC
- Proven experience with large OTT/music/web platforms or B2B SaaS at scale (high concurrency, large catalog)
- Strong observability skillset: Prometheus/Grafana, Datadog/New Relic, ELK/Splunk
- Expertise in incident management and ticketing at enterprise scale
- Excellent client communication and stakeholder reporting experience.

Tech / Tooling (recommended stack; include in hiring brief)
- Ticketing: Jira Service Management
- Monitoring & Metrics: Prometheus + Grafana, Datadog, ELK, AWS CloudWatch
- Alerting / On-call: PagerDuty / Opsgenie / Statuspage
- Logging & Traces: ELK (Elasticsearch / Logstash / Kibana) and/or Splunk; OpenTelemetry tracing
Posted 4 weeks ago
7.0 - 11.0 years
0 Lacs
karnataka
On-site
As the Network Infrastructure Manager, you will be responsible for overseeing the design, implementation, and maintenance of network systems, encompassing LANs, WANs, intranets, extranets, and hybrid cloud environments. Your role will involve managing multiple data centers, with a focus on Cisco ACI, Nexus, and Cisco NDFC to ensure the stability, security, and scalability of the core infrastructure. Additionally, you will be managing AWS cloud infrastructure, collaborating with multiple ISPs, and ensuring seamless integration between on-premises and cloud networks. Your expertise in Cisco Meraki firewalls, switches, and access points will be crucial for providing secure and efficient connectivity for branch and remote locations. You will also manage WiFi environments, leveraging your experience in Cisco ISE and integration with Meraki APs for a seamless user experience. Proficiency in BGP, EIGRP, and OSPF protocols will be essential for configuring and managing complex routing architectures in a large-scale environment. Network security will be a key focus area, where you will oversee IDS/IDP systems, Cisco, and Palo Alto, while leading the migration from Cisco FMC to Palo Alto. Your knowledge of Zscaler and SASE implementation will be vital in ensuring a secure network environment. Continuous monitoring of network performance, conducting performance tuning, and collaborating with internal stakeholders and external partners will also be part of your responsibilities. Furthermore, you will lead and mentor a team of network engineers, providing technical guidance and support, and fostering a collaborative team culture. Maintaining comprehensive documentation of network configurations, processes, and service records will be crucial to ensure compliance with industry standards and regulatory requirements. 
Your familiarity with monitoring tools such as SolarWinds, PagerDuty, IPAM, ThousandEyes, LiveAction, and Datadog will aid in effectively managing network visibility and incident management. In terms of supervisory responsibilities, you will determine appropriate staff resourcing, build an effective leadership team, define annual Key Performance Indicators, and lead change management initiatives to drive improvements and efficiencies. Your ability to interact collaboratively with internal and external stakeholders, prepare and manage budgets, and analyze variances to maximize operational performance will be key to your success. Your knowledge, skills, and abilities in core data center expertise, hybrid cloud experience, Cisco Meraki proficiency, routing protocols, network security, and technical proficiency will be essential for this role. Strong leadership abilities, excellent communication skills, and proven experience in engaging and influencing team members, stakeholders, and executives will also be critical. This is a full-time position with a day shift schedule and requires in-person work at the designated location.
Posted 1 month ago
2.0 - 6.0 years
0 Lacs
hyderabad, telangana
On-site
Genpact is a global professional services and solutions firm with over 125,000 employees in more than 30 countries. We are driven by curiosity, agility, and the desire to create lasting value for our clients, including Fortune Global 500 companies. Our purpose is the relentless pursuit of a world that works better for people, where we serve and transform enterprises with our deep industry knowledge, digital operations services, and expertise in data, technology, and AI.
We are currently seeking applications for the position of NOC & IM Engineer (L2) - Technical Associate. In this role, you will be responsible for monitoring, detecting, correcting, and preventing incidents to maintain SLAs and reduce production downtime effectively. Your responsibilities will include monitoring production sites and services, detecting issues, executing runbooks, managing ServiceNow tickets, resolving major incidents, facilitating incident communication, conducting postmortems, ensuring standardized processes for changes, overseeing incident, problem, and change management, and defining and generating KPI/metrics for transparency.
Qualifications:
- Bachelor's Degree required, preferably in Computer Science, Information Systems, or a related field.
- Excellent communication skills.
Preferred Tool Skills:
- BigPanda
- SLAM / Neustar Vercara
- Tardis
- SNOW (INCIDENT/INCIDENTTASK/REQ/SCTASK/PROBLEM/PTASK/CHANGE)
- PagerDuty
- Splunk
- Zabbix
- JIRA
- Teams
- Reporting
If you are looking for a challenging opportunity to work in a dynamic environment and have the required qualifications and skills, we invite you to apply for this position.
Location: India-Hyderabad
Schedule: Full-time
Education Level: Bachelor's / Graduation / Equivalent
Job Posting: Aug 8, 2025, 1:09:41 AM
Unposting Date: Feb 4, 2026, 5:09:41 AM
Master Skills List: Consulting
Job Category: Full Time
Posted 1 month ago
7.0 - 9.0 years
7 - 9 Lacs
Hyderabad, Telangana, India
Remote
Welcome to Warner Bros. Discovery, the stuff dreams are made of.
Who We Are
When we say "the stuff dreams are made of," we're not just referring to the world of wizards, dragons and superheroes, or even to the wonders of Planet Earth. Behind WBD's vast portfolio of iconic content and beloved brands are the storytellers bringing our characters to life, the creators bringing them to your living rooms, and the dreamers creating what's next. From brilliant creatives to technology trailblazers, across the globe, WBD offers career-defining opportunities, thoughtfully curated benefits, and the tools to explore and grow into your best selves. Here you are supported, here you are celebrated, here you can thrive.
About Warner Bros. Discovery
Warner Bros. Discovery, a premier global media and entertainment company, offers audiences the world's most differentiated and complete portfolio of content, brands and franchises across television, film, streaming and gaming. The new company combines Warner Media's premium entertainment, sports and news assets with Discovery's leading non-fiction and international entertainment and sports businesses. For more information, please visit www.wbd.com.
Meet Our Team:
The Direct-to-Consumer (DTC) global tech organization has many software engineering teams that build applications for the web, mobile, tablets, connected TVs, consoles, and other streaming devices. Our platform covers everything from search, catalogue, video transcoding, and personalization to global subscriptions, and much more. Every customer starts their journey into the world of WBD through DTC's Identity and Growth teams. We ensure customers can seamlessly authenticate and authorize across all WBD brands. We are a fast-growing, global engineering group crucial to WBD's future.
Senior Site Reliability Engineer
Roles and Responsibilities:
- Foster teams with a strong SRE-driven engineering culture to close the gap between operations and software engineering teams.
- Drive the reliability, scalability, observability, and monitoring of cloud-based systems while identifying and implementing improvements for operational efficiency and proactive monitoring.
- Be technically strong, with industry-standard operational capabilities such as alerts, monitoring, and system/platform scalability.
- Automation and Tool Development: Continuously seek opportunities to automate workflows, develop self-sustainable tools, and improve operational efficiency.
- Incident Management: Facilitate partner inquiries and production incidents, ensuring compliance with internal SLAs. Responsibilities include responding to, investigating, and mitigating customer impact.
- Partner with the Global Partner Integrations (GPI) team, consumer and software engineering teams, technical account managers, and the PMO to support product launches and other initiatives.
- Troubleshoot production issues, individually and with your team, by reviewing source code, logs, operational metrics, stack traces, etc. to pinpoint a specific problem and then resolve it. Identify root causes and learnings to improve operational processes.
- Be a results-driven creative thinker who drives innovation and produces delightful experiences for our customers.
- Demonstrate data-driven, open-minded decision making, have an insatiable curiosity, and love to invent and innovate to solve difficult challenges.
- Take ownership of your work and consistently deliver results in a fast-paced environment.
- Actively support hyper-care and watch party events, providing real-time operational metrics and insights.
- Perform health checks on critical applications and services, ensuring uptime and availability.
- Write complex queries and scripts, analyze datasets, and pinpoint issues efficiently.
- Effectively communicate with global partners and stakeholders.
- Exercise good judgment when balancing immediate and long-term business needs.
What To Bring:
- Monitoring & Alerting: Experience implementing alerting, metrics, and logging using tools like Prometheus, CloudWatch, Elastic, and PagerDuty.
- Direct experience with at least one cloud provider (AWS, GCP, Azure, or other).
- Strong expertise in SQL and hands-on experience working with databases.
- Experience building dashboards using tools like Databricks and Grafana.
- Familiarity with the OAuth 2.0 authentication framework.
- Experience with tools such as PagerDuty and ServiceNow is a plus.
- Ability to work flexible shifts to provide global operational coverage and collaborate effectively with remote peers across disparate geographies and time zones.
How We Get Things Done
This last bit is probably the most important! Here at WBD, our guiding principles are the core values by which we operate and are central to how we get things done. You can find them at www.wbd.com/guiding-principles/ along with some insights from the team on what they mean and how they show up in their day to day. We hope they resonate with you and look forward to discussing them during your interview.
Championing Inclusion at WBD
Warner Bros. Discovery embraces the opportunity to build a workforce that reflects the diversity of our society and the world around us. Being an equal opportunity employer means that we take seriously our responsibility to consider qualified candidates on the basis of merit, regardless of sex, gender identity, ethnicity, age, sexual orientation, religion or belief, marital status, pregnancy, parenthood, disability or any other category protected by law. If you're a qualified candidate with a disability and you require adjustments or accommodations during the job application and/or recruitment process, please visit our accessibility page for instructions to submit your request.
Posted 1 month ago
6.0 - 10.0 years
0 Lacs
kolkata, west bengal
On-site
As an AVP, Reliability & Observability Engineer at Synchrony, you will play a crucial role in enhancing the reliability, performance, and availability of our infrastructure services. Your responsibilities will include designing, implementing, and maintaining scalable infrastructure in both cloud and on-premises environments, automating operational tasks, and responding to incidents to minimize downtime. You will focus on building and maintaining monitoring, logging, and alerting systems across the infrastructure and application layers to ensure high standards of availability and resilience. Your key responsibilities will involve designing, building, and maintaining highly available infrastructure using tools like Terraform and Ansible, developing and maintaining monitoring and alerting systems such as NewRelic and Grafana, and optimizing resource utilization to reduce costs while maintaining performance. You will also collaborate closely with infrastructure teams to implement DevOps best practices, participate in troubleshooting critical incidents, and drive root cause analysis. To qualify for this role, you should have a minimum of 6+ years of experience with a Bachelor's degree in computer science, Engineering, or a related field, or equivalent work experience. Additionally, you should have at least 5 years of experience with cloud platforms like AWS, Azure, and GCP, along with proficiency in infrastructure automation tools such as Terraform and Ansible. Strong scripting/programming skills in languages like Python, Ruby, and Bash, as well as knowledge of Linux/Unix and Windows systems, are also required. Desired characteristics for this role include a stakeholder-focused approach, experience working with cross-regional teams, strong problem-solving skills, and the ability to work independently in a remote environment. This position also requires working from 02:00 PM to 11:00 PM IST, with flexibility for meetings with global teams during specific hours. 
If you are an internal applicant, ensure you meet the criteria and mandatory skills required for the role before applying. Inform your manager and HRM about your application, update your professional profile, and ensure there are no corrective action plans in place. Eligibility is limited to L9+ employees who have completed specific tenure in the organization and current role. Join Synchrony's Technology team to contribute to building scalable, secure, and high-performing solutions that drive innovation, efficiency, and reliability across our digital and operational platforms.
Posted 1 month ago
2.0 - 6.0 years
0 Lacs
karnataka
On-site
As a Site Reliability Engineer Developer - Analyst at Goldman Sachs in Bengaluru, your role encompasses the discipline of Site Reliability Engineering (SRE). SRE combines software and systems engineering to construct and manage large-scale, fault-tolerant systems. In this position, you are entrusted with the critical responsibility of ensuring the availability and reliability of the firm's platform services to meet the needs of both internal and external users. Collaboration with business stakeholders is a key aspect of your work to develop and sustain production systems that can adapt swiftly to the dynamic global business landscape of the organization. The SRE team focuses on the development and maintenance of platforms that facilitate adherence to Observability requirements and SLA Management by GS Engineering Teams. Your responsibilities include the design, development, and operation of distributed systems that offer observability for Goldman's mission-critical applications and platform services across on-premises data centers and various public cloud environments. The team's core functions involve the provision of tools for alerting, metrics and monitoring, log collection and analysis, as well as tracing. These tools are utilized by numerous engineers daily, emphasizing the paramount importance of reliability in system features. In your role, you will collaborate with internal stakeholders, vendors, product owners, and fellow SREs to conceptualize and implement a large-scale distributed system capable of managing alert generation, metrics collection, log collection, and trace events efficiently. Operating in a production environment spanning cloud and on-premises data centers, you will be instrumental in defining observability features and spearheading their execution. Basic qualifications for this role include a minimum of 2 years of relevant work experience and proficiency in languages such as Java, Python, Go, and JavaScript, as well as the Spring framework.
Additionally, expertise in using Terraform for Infrastructure deployment and management, along with strong programming skills encompassing code development, debugging, testing, and optimization, are essential. A solid background in algorithms, data structures, and software design, coupled with experience in distributed systems design, maintenance, and troubleshooting, is highly valued. Preferred experience for this role includes familiarity with cloud-native solutions in AWS or GCP, working knowledge of tools like Prometheus, Grafana, and PagerDuty, and experience with databases such as PostgreSQL, MongoDB, and Elasticsearch. Proficiency in open-source messaging systems like RabbitMQ and/or Kafka, as well as hands-on systems experience in UNIX/Linux and networking, especially in scaling for performance and debugging complex distributed systems, is advantageous.
Posted 1 month ago
5.0 - 7.0 years
0 Lacs
Bengaluru, Karnataka, India
On-site
Job Description
- Bachelor's/Master's degree in Computer Science, Information Technology, or a related field
- 5-7 years of experience in a DevOps role
- Strong understanding of the SDLC and experience working on fully Agile teams
- Proven experience in coding and scripting: DevOps, Ant/Maven, Groovy, Terraform, shell scripting, and Helm chart skills
- Working experience with IaC tools like Terraform, CloudFormation, or ARM templates
- Strong experience with cloud computing platforms (e.g. Oracle Cloud (OCI), AWS, Azure, Google Cloud)
- Experience with containerization technologies (e.g. Docker, Kubernetes/EKS/AKS)
- Experience with continuous integration and delivery tools (e.g. Jenkins, GitLab CI/CD)
- Kubernetes: experience managing Kubernetes clusters and using kubectl to manage Helm chart deployments and ingress services and to troubleshoot pods
- OS services: basic knowledge of managing, configuring, and troubleshooting Linux operating system issues, storage (block and object), and networking (VPCs, proxies, and CDNs)
- Monitoring and instrumentation: implement metrics in Prometheus, Grafana, Elastic, log management and related systems, and Slack/PagerDuty/Sentry integrations
- Strong know-how of modern distributed version control systems (e.g. Git, GitHub, GitLab)
- Strong troubleshooting and problem-solving skills, and ability to work well under pressure
- Excellent communication and collaboration skills, and ability to lead and mentor junior team members
Career Level - IC3
Responsibilities
- Design, implement, and maintain automated build, deployment, and testing systems
- Take application code and third-party products and build fully automated pipelines for Java applications to build, test, and deploy complex systems for delivery in the cloud
- Containerize applications, i.e. create Docker containers and push them to an artifact repository for deployment on containerization solutions with OKE (Oracle Container Engine for Kubernetes) using Helm charts
- Lead efforts to optimize the build and deployment processes for high-volume, high-availability systems
- Monitor production systems to ensure high availability and performance, and proactively identify and resolve issues
- Support and troubleshoot cloud deployment and environment issues
- Create and maintain CI/CD pipelines using tools such as Jenkins and GitLab CI/CD
- Continuously improve the scalability and security of our systems, and lead efforts to implement best practices
- Participate in the design and implementation of new features and applications, and provide guidance on best practices for deployment and operations
- Work with the security team to ensure compliance with industry and company standards, and implement security measures to protect against threats
- Keep up to date with emerging trends and technologies in DevOps, and make recommendations for improvement
- Lead and mentor junior DevOps engineers and collaborate with cross-functional teams to ensure successful delivery of projects
Analyze, design, develop, troubleshoot, and debug software programs for commercial or end-user applications. Write code, complete programming, and perform testing and debugging of applications. As a member of the software engineering division, you will analyze and integrate external customer specifications. Specify, design, and implement modest changes to existing software architecture. Build new products and development tools. Build and execute unit tests and unit test plans. Review integration and regression test plans created by QA. Communicate with QA and porting engineering to discuss major changes to functionality. Work is non-routine and very complex, involving the application of advanced technical/business skills in area of specialization. Leading contributor individually and as a team member, providing direction and mentoring to others. BS or MS degree or equivalent experience relevant to functional area. 6+ years of software engineering or related experience.
Qualifications
Career Level - IC3
About Us
As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's challenges. We've partnered with industry leaders in almost every sector and continue to thrive after 40+ years of change by operating with integrity. We know that true innovation starts when everyone is empowered to contribute. That's why we're committed to growing an inclusive workforce that promotes opportunities for all. Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs. We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing [HIDDEN TEXT] or by calling +1 888 404 2494 in the United States. Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veteran status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
Posted 1 month ago
7.0 - 9.0 years
0 Lacs
Hyderabad, Telangana, India
On-site
Job Title: SRE Engineer with GCP cloud
Location: Hyderabad & Ahmedabad
Work Model: Hybrid, 3 days from office
Experience: 7+ years
Job Overview
Dynamic, motivated individuals deliver exceptional solutions for the production resiliency of the systems. The role combines aspects of software engineering, operations, and DevOps skills to come up with efficient ways of managing and operating applications. The role will require a high level of responsibility and accountability to deliver technical solutions.
Summary: As a Senior SRE, you will ensure platform reliability, incident management, and performance optimization. You'll define SLIs/SLOs, contribute to robust observability practices, and drive proactive reliability engineering across services.
Roles and Responsibilities:
- Define and measure Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and manage error budgets across services.
- Lead incident management for critical production issues; drive root cause analysis (RCA) and post-mortems.
- Create and maintain runbooks and standard operating procedures for high-availability services.
- Design and implement observability frameworks using ELK, Prometheus, and Grafana; drive telemetry adoption.
- Coordinate cross-functional war-room sessions during major incidents and maintain response logs.
- Develop and improve automated system recovery, alert suppression, and escalation logic.
- Use GCP tools like GKE, Cloud Monitoring, and Cloud Armor to improve performance and security posture.
- Collaborate with DevOps and Infrastructure teams to build highly available and scalable systems.
- Analyse performance metrics and conduct regular reliability reviews with engineering leads.
- Participate in capacity planning, failover testing, and resilience architecture reviews.
Mandatory:
- Cloud: GCP (GKE, Load Balancing, VPN, IAM)
- Observability: Prometheus, Grafana, ELK, Datadog
- Containers & Orchestration: Kubernetes, Docker
- Incident Management: On-call, RCA, SLIs/SLOs
- IaC: Terraform, Helm
- Incident Tools: PagerDuty, OpsGenie
Nice to Have:
- GCP Monitoring, SkyWalking
- Service Mesh, API Gateway
- GCP Spanner, MongoDB (basic)
Posted 1 month ago
6.0 - 11.0 years
11 - 15 Lacs
Bengaluru
Work from Office
Associate Lead - Kubernetes Platform
Is your passion for Cloud Native Platform? That is, envisioning and building the core services that underpin all Thomson Reuters products? Then we want you on our India-based team! This role is in the Platform Engineering organization where we build the foundational services that power Thomson Reuters products. We focus on the subset of capabilities that help Thomson Reuters deliver digital products to our customers. Our mission is to build a durable competitive advantage for TR by providing building blocks that get value-to-market faster.
About the Role
This role is within Platform Engineering's Service Mesh team, a dedicated group that engineers and operates our Service Mesh capability, a microservice platform based on Kubernetes and Istio.
- Primarily work with AWS and Azure public cloud, especially Kubernetes (AWS EKS and Azure AKS), Service Mesh technology like Istio, Terraform, Datadog, PagerDuty, and Python, Golang, Java, and/or .NET Core
- Programming: Golang; other: Java, C#. Primary skills: Golang, Kubernetes
- Work closely with an architect to establish and entrench the architectural design and principles for Service Mesh
- Participate in all aspects of the development lifecycle: Ideation, Design, Build, Test, and Operate
- We embrace a DevOps culture (you build it, you run it); while we have dedicated 24x7 level-1 support engineers, you may be called on to assist with level-2 support
About You
- 6+ years of software development experience
- 2+ years of experience building cloud native infrastructure, applications and services on AWS, Azure or GCP
- Hands-on experience with Kubernetes, ideally AWS EKS and/or Azure AKS
- Experience with Istio or other Service Mesh technologies
- Experience with container security and supply chain security
- Experience with declarative infrastructure-as-code, CI/CD automation and GitOps
- Experience with Kubernetes operators written in Golang
- A bachelor's degree in Computer Science, Computer Engineering or similar
What's in it For You
Hybrid Work Model: We've adopted a flexible hybrid working environment (2-3 days a week in the office depending on the role) for our office-based roles while delivering a seamless experience that is digitally and physically connected.
Flexibility & Work-Life Balance: Flex My Way is a set of supportive workplace policies designed to help manage personal and professional responsibilities, whether caring for family, giving back to the community, or finding time to refresh and reset. This builds upon our flexible work arrangements, including work from anywhere for up to 8 weeks per year, empowering employees to achieve a better work-life balance.
Career Development and Growth: By fostering a culture of continuous learning and skill development, we prepare our talent to tackle tomorrow's challenges and deliver real-world solutions. Our Grow My Way programming and skills-first approach ensures you have the tools and knowledge to grow, lead, and thrive in an AI-enabled future.
Industry Competitive Benefits: We offer comprehensive benefit plans to include flexible vacation, two company-wide Mental Health Days off, access to the Headspace app, retirement savings, tuition reimbursement, employee incentive programs, and resources for mental, physical, and financial wellbeing.
Culture: Globally recognized, award-winning reputation for inclusion and belonging, flexibility, work-life balance, and more. We live by our values: Obsess over our Customers, Compete to Win, Challenge (Y)our Thinking, Act Fast / Learn Fast, and Stronger Together.
Social Impact: Make an impact in your community with our Social Impact Institute. We offer employees two paid volunteer days off annually and opportunities to get involved with pro-bono consulting projects and Environmental, Social, and Governance (ESG) initiatives.
Making a Real-World Impact: We are one of the few companies globally that helps its customers pursue justice, truth, and transparency. Together, with the professionals and institutions we serve, we help uphold the rule of law, turn the wheels of commerce, catch bad actors, report the facts, and provide trusted, unbiased information to people all over the world.
Thomson Reuters informs the way forward by bringing together the trusted content and technology that people and organizations need to make the right decisions. We serve professionals across legal, tax, accounting, compliance, government, and media. Our products combine highly specialized software and insights to empower professionals with the data, intelligence, and solutions needed to make informed decisions, and to help institutions in their pursuit of justice, truth, and transparency. Reuters, part of Thomson Reuters, is a world leading provider of trusted journalism and news. We are powered by the talents of 26,000 employees across more than 70 countries, where everyone has a chance to contribute and grow professionally in flexible work environments.
At a time when objectivity, accuracy, fairness, and transparency are under attack, we consider it our duty to pursue them. Sound exciting? Join us and help shape the industries that move society forward. As a global business, we rely on the unique backgrounds, perspectives, and experiences of all employees to deliver on our business goals. To ensure we can do that, we seek talented, qualified employees in all our operations around the world regardless of race, color, sex/gender, including pregnancy, gender identity and expression, national origin, religion, sexual orientation, disability, age, marital status, citizen status, veteran status, or any other protected classification under applicable law. Thomson Reuters is proud to be an Equal Employment Opportunity Employer providing a drug-free workplace. We also make reasonable accommodations for qualified individuals with disabilities and for sincerely held religious beliefs in accordance with applicable law. More information on requesting an accommodation here. Learn more on how to protect yourself from fraudulent job postings here. More information about Thomson Reuters can be found on thomsonreuters.com.
Posted 1 month ago
12.0 - 15.0 years
7 - 11 Lacs
Bengaluru
Work from Office
This role combines leadership in managing cloud infrastructure with customer-focused incident response in a SaaS environment. The ideal candidate has a strong background in AWS cloud platforms, containerized workloads, and leading customer support teams. You'll also act as the primary escalation point for infrastructure and application performance issues.
Cloud Operations
- Ensure 99.9%+ uptime for AWS-hosted SaaS platforms.
- Manage and maintain cloud infrastructure, including incident response and disaster recovery planning.
- Collaborate with DevOps, Engineering, IT, and Security teams to deploy, monitor, and optimize services.
- Proactively resolve issues related to infrastructure and application scalability and reliability.
- Establish strong operational practices: incident management, root cause analysis, and preventive action planning.
Technical Support
- Lead a support operations team focused on infrastructure and application-related technical issues.
- Act as the point of escalation for complex, high-priority customer incidents.
- Ensure SLAs and KPIs are met or exceeded.
- Continuously improve support processes: ticket handling, escalation paths, and customer responsiveness.
- Work closely with Customer Success and Professional Services for a unified customer experience.
Leadership and Strategy
- Manage, mentor, and grow a team of support engineers and cloud operations specialists.
- Continuously assess and improve tooling, operational processes, and technologies.
- Provide regular operations updates to senior leadership, highlighting KPIs and key trends.
- Translate business and customer needs into operational improvements.
Qualifications
Required
- Bachelor's degree in Computer Science, IT, or a related field, or equivalent experience.
- 12+ years of relevant experience, including 3+ years in a managerial role.
- Expertise in AWS and SaaS architecture.
- Hands-on experience with monitoring tools (Datadog, Prometheus, Grafana, etc.) and incident management systems (ServiceNow, Zendesk, PagerDuty, Opsgenie).
- Proficient in SQL and experience with databases.
- Strong understanding of DevOps, CI/CD, and infrastructure-as-code (Terraform, Ansible).
- Proven track record of achieving high uptime, SLA adherence, and customer satisfaction.
- Experience managing 24x7 cloud operations in remote or hybrid environments.
- Strong problem-solving skills and ability to thrive in high-pressure situations.
- Excellent communication skills across technical and non-technical stakeholders.
- Willingness to work in APAC and EMEA time zones.
Preferred Certifications
- AWS Professional Certifications
- Linux System Administration Certifications
- ITIL Certifications
- Kubernetes Administrator Certifications
What We Offer
- Comprehensive health and wellness plans
- Paid time off and company holidays
- Shift allowances
- Flexible and remote-friendly work options
Posted 1 month ago
1.0 - 6.0 years
8 - 13 Lacs
Pune
Work from Office
Cloud Observability Administrator
Pune, India
Enterprise IT - 22685
ZS is looking for a Cloud Observability Administrator to join our team in Pune. As a Cloud Observability Administrator, you will configure various Observability tools and create solutions to address business problems across multiple client engagements. You will leverage information from the requirements-gathering phase and draw on past experience to design a flexible and scalable solution, and collaborate with other team members (involved in the requirements gathering, testing, roll-out and operations phases) to ensure seamless transitions.
What You'll Do:
- Deploy, manage, and operate scalable, highly available, and fault-tolerant Splunk architecture.
- Onboard various kinds of log sources like Windows/Linux/Firewalls/Network into Splunk.
- Develop alerts, dashboards and reports in Splunk.
- Write complex SPL queries.
- Manage and administer a distributed Splunk architecture.
- Apply very good knowledge of the configuration files used in Splunk for data ingestion and field extraction.
- Perform regular upgrades of Splunk and relevant apps/add-ons.
- Possess a comprehensive understanding of AWS infrastructure, including EC2, EKS, VPC, CloudTrail, Lambda, etc.
- Automate manual tasks using Shell/PowerShell scripting. Knowledge of Python scripting is a plus.
- Apply good knowledge of Linux commands to administer servers.
What You'll Bring: 1+ years of experience in Splunk Development & Administration. Bachelor's Degree in CS, EE, or related discipline. Strong analytic, problem-solving, and programming ability. 1-1.5 years of relevant consulting-industry experience working on medium-large scale technology solution delivery engagements. Strong verbal, written and team presentation communication skills. Strong verbal and written communication skills with ability to articulate results and issues to internal and client teams. Proven ability to work creatively and analytically in a problem-solving environment. Ability to work within a virtual global team environment and contribute to the overall timely delivery of multiple projects. Knowledge of observability tools such as Cribl, Datadog, PagerDuty is a plus. Knowledge of AWS Prometheus and Grafana is a plus. Knowledge of APM concepts is a plus. Knowledge of Linux/Python scripting is a plus. Splunk Certification is a plus. Perks & Benefits: ZS offers a comprehensive total rewards package including health and well-being, financial planning, annual leave, personal growth and professional development. Our robust skills development programs, multiple career progression options, internal mobility paths and collaborative culture empower you to thrive as an individual and global team member. We are committed to giving our employees a flexible and connected way of working. A flexible and connected ZS allows us to combine work from home and on-site presence at clients/ZS offices for the majority of our week. The magic of ZS culture and innovation thrives in both planned and spontaneous face-to-face connections. Travel: Travel is a requirement at ZS for client-facing ZSers; business needs of your project and client are the priority. While some projects may be local, all client-facing ZSers should be prepared to travel as needed. 
Travel provides opportunities to strengthen client relationships, gain diverse experiences, and enhance professional growth by working in different environments and cultures. Considering applying? At ZS, we're building a diverse and inclusive company where people bring their passions to inspire life-changing impact and deliver better outcomes for all. We are most interested in finding the best candidate for the job and recognize the value that candidates with all backgrounds, including non-traditional ones, bring. If you are interested in joining us, we encourage you to apply even if you don't meet 100% of the requirements listed above. ZS is an equal opportunity employer and is committed to providing equal employment and advancement opportunities without regard to any class protected by applicable law. To Complete Your Application: Candidates must possess or be able to obtain work authorization for their intended country of employment. An online application, including a full set of transcripts (official or unofficial), is required to be considered. NO AGENCY CALLS, PLEASE.
Posted 1 month ago
3.0 - 7.0 years
0 Lacs
noida, uttar pradesh
On-site
As an Application Support Engineer at UKG, you will join our engineering teams as a staff augmentation consultant, providing support for the identity platform infrastructure. Your role will involve deployments to production environments, handling escalations and KTLO tasks, and addressing debugging needs. Collaboration with internal development teams to resolve any issues or implement new integrations related to our Identity platform will be a key aspect of your responsibilities. Additionally, you will be involved in root cause analysis and enhancing observability within the platform. To excel in this role, you should possess the following qualifications:
- Proficiency in Linux (Ubuntu), including a deep understanding of the Linux operating system. You should have experience in troubleshooting complex issues related to infrastructure, such as disk performance, IOPS, network latency, JVM/GC behavior, and application defects.
- Familiarity with scripting languages like Python for automating manual tasks and remediation efforts.
- Experience with Ansible for configuration management.
- Knowledge of identity platforms and technologies, including SAML2, OAuth2, LDAP query language, OpenDJ, OpenAM, Auth0, Okta, and SSO solutions.
- Understanding of Java and best practices for tuning Java applications to perform optimally at scale.
- Proficiency in Nginx, Grafana, PagerDuty, Postman/API, Kibana/Splunk, Dynatrace.
- Preferred experience with GCP, with familiarity with Azure considered beneficial.
If you are a candidate with skills in Java, GCP, and Azure, and are interested in this exciting opportunity, please reach out to us at deepika@codersbrain.com to explore further.
Posted 1 month ago
10.0 - 16.0 years
30 - 45 Lacs
Bengaluru
Remote
- AWS & SaaS architecture
- Monitoring tools (Datadog, New Relic, Prometheus, Grafana)
- Incident management (PagerDuty, ServiceNow, Zendesk, Opsgenie)
- Experience running a 24x7 Cloud Ops team
- DevOps processes, CI/CD pipelines, IaC tools (Terraform, Ansible)
Posted 1 month ago
1.0 - 5.0 years
0 Lacs
pune, maharashtra
On-site
As a Site Reliability Engineer - Incident Management, you will be responsible for monitoring, maintaining, and managing the entire Qualys infrastructure and services installed at different data centers. In the event of any malfunction in products/services, you will be required to monitor, troubleshoot, repair, and restore the service/system promptly to ensure maximum service availability and performance. Your role will also involve providing support services for Engineering and other technical teams, collaborating for quicker issue resolution, performing end-to-end incident management, documentation, and task automation. Your main responsibilities will include monitoring the performance and capacity of computer systems, utilizing various tools to identify and address issues effectively. You will be expected to conduct basic troubleshooting of platform/product issues, utilize tools such as Splunk, Grafana, Kibana for performance checking, and manage PagerDuty. Additionally, you will assist in task automation wherever applicable, ensure timely resolution of incident tickets, and work on triaging and troubleshooting problems affecting products or services. It will be crucial for you to meticulously track and document all issues and resolutions in detail on the ticketing/documentation tools to enhance the knowledge base and maintain a record of system health. In cases where troubleshooting complex issues is not feasible, you should escalate the problem to management, IT resources, or 3rd party vendors for further assistance. Communication within the team and externally to stakeholders, keeping them informed of relevant information, known issues, and steps being taken, will be an integral part of your role. The Site Reliability Engineer - Incident Management team will operate 24*7*365 on a monthly shift rotation basis as per requirements. 
To excel in this role, you should possess one to two years of IT Operations (Infra/System admin/Linux) experience or relevant certification. Familiarity with monitoring and integration tools like Splunk, Prometheus, Grafana, Kibana, PagerDuty, Runscope, and incident management tools such as Jira/ServiceNow is beneficial. A good understanding of ITSM main functions and tools, along with strong interpersonal skills to interact with employees at all levels professionally, will be essential. Certifications in computer functionality, Linux, System Admin, VMware, IT Security, or ITSM/ITIL, and knowledge of DevOps/SRE basics, Python, and Cloud will be advantageous for this role.
Posted 1 month ago
3.0 - 7.0 years
0 Lacs
maharashtra
On-site
This role is eligible for our hybrid work model: Two days in-office. Rotational Shift - Two shifts starting at 6 am and 2 pm (IST) & 2 pm to 10 pm IST. Why this job is a big deal: Are you interested in learning cutting-edge technologies? Do you enjoy solving complex problems? The priceline.com Site Reliability Operations Team offers these and many more opportunities while working in a fast-paced and challenging environment. The team is responsible for ensuring that every area of Priceline.com's site is highly available, reliable, and performing optimally. In this role, you will get to: Manage and track issue ticket creation, updates, and escalations, and participate in incident bridge calls. Adhere to established response SLOs/SLAs and maintain a working knowledge of all monitoring and support tools. Maintain a culture of continuous improvement by providing suggestions for process improvements, providing updates to documentation, providing transfer of knowledge to peers in your area of expertise, and assisting in the training of new hires. Frontline Tier I/II monitoring / escalation / incident response and impact mitigation. Execute Command & Control tasks on our infrastructure. Orchestrate and manage incident lifecycle between external 3rd party vendors, the Site Reliability Engineers (SRE), and internal development teams. Analyze and support the continuous improvement of our monitoring as well as command and control capabilities. Maintain a high level of communication and knowledge sharing: incident lifecycle tracking, runbooks, and operational documentation. Report the health and availability of the site and related services. Who you are: Bachelor's degree in Computer Science or related field or 3-4 years of relevant work experience. Experience with New Relic, PagerDuty, Splunk, Jira, Confluence. Working experience with Incident Management and Change Management. 
Prior experience in Operations or a fast-paced, high-stress environment with the requirement to resolve multiple interruption-driven priorities simultaneously. Solid understanding of Open Source environments and TCP/IP Networking. Self-motivated and can work both independently and within a team in our 24/7 Operations Center; available for off-hours shift coverage and be able to own technical issues in the role of Incident Commander. Illustrated history of living the values necessary to Priceline: Customer, Innovation, Team, Accountability, and Trust. The Right Results, the Right Way is not just a motto at Priceline; it's a way of life. Unquestionable integrity and ethics are essential. Who we are: WE ARE PRICELINE. Our success as one of the biggest players in online travel is all thanks to our incredible, dedicated team of talented employees. Priceliners are focused on being the best travel deal makers in the world, motivated by our passion to help everyone experience the moments that matter most in their lives. Whether it's a dream vacation, your cousin's graduation, or your best friend's wedding - we make travel affordable and accessible to our customers. Our culture is unique and inspiring (that's what our employees tell us). We're a grown-up, startup. We deliver the excitement of a new venture, without the struggles and chaos that can come with a business that hasn't stabilized. We're on the cutting edge of innovative technologies. We keep the customer at the center of all that we do. Our ability to meet their needs relies on the strength of a workforce as diverse as the customers we serve. We bring together employees from all walks of life, and we are proud to provide the kind of inclusive environment that stimulates innovation, creativity, and collaboration. Priceline is part of the Booking Holdings, Inc. (Nasdaq: BKNG) family of companies, a highly profitable global online travel company with a market capitalization of over $80 billion. 
Our sister companies include Booking.com, BookingGo, Agoda, Kayak, and OpenTable. If you want to be part of something truly special, check us out! Flexible work at Priceline: Priceline is following a hybrid working model, which includes two days onsite as determined by you and your manager (ideally selecting among Tuesday, Wednesday, or Thursday). On the remaining days, you can choose to be remote or in the office. Diversity and Inclusion are a Big Deal! To be the best travel dealmakers in the world, it's important we have a workforce that reflects the diverse customers and communities we serve. We are committed to cultivating a culture where all employees have the freedom to bring their individual perspectives, life experiences, and passion to work. Priceline is a proud equal opportunity employer. We embrace and celebrate the unique lenses through which our employees see the world. We'd love you to join us and add to our rich mix! Applying for this position: We're excited that you are interested in a career with us. For all current employees, please use the internal portal to find jobs and apply. External candidates are required to have an account before applying.
Posted 1 month ago
5.0 - 12.0 years
0 Lacs
pune, maharashtra
On-site
As a Senior Service Reliability Engineer at Proofpoint, you will develop a deep understanding of the various services and applications that come together to deliver Proofpoint's next-generation security products. Your primary responsibility will be maintaining and extending the Elasticsearch and Splunk clusters used for critical near-real-time data analysis. This role involves continually evaluating the performance of these clusters, identifying and addressing developing problems, planning changes for high-load events, applying security fixes, testing and performing upgrades, as well as enhancing the monitoring and alert infrastructure. You will also play a key role in maintaining other components of the data pipeline, which may involve serverless or server-based systems for data ingestion into the Elasticsearch pipeline. Optimizing cost vs. performance will be a focus, including testing new hosts or configurations. Automation is a priority, utilizing tools like Puppet and various scripting mechanisms to achieve a build once/run everywhere system. Your work will span various types of infrastructure, including public cloud, Kubernetes clusters, and private data centers, providing exposure to diverse operational environments. Building effective partnerships across different teams within the organization, such as Product, Engineering, and Operations, is crucial. Participation in an on-call rotation and addressing escalated issues promptly are also part of the role. To excel in this position, you are expected to have a Bachelor's degree in computer science, information technology, engineering, or a related discipline. Your expertise should include proficient administration and management of Elasticsearch clusters, with secondary experience in managing Splunk clusters. Proficiency in provisioning and Configuration Management tools like Puppet, Ansible, and Rundeck is essential. 
Experience in building Automations and Infrastructure as Code using tools like Terraform, Packer, or CloudFormation templates is a plus. You should also be familiar with monitoring and logging tools such as Splunk, Prometheus, and PagerDuty, as well as scripting languages like Python, Bash, Go, Ruby, and Perl. Experience with CI/CD tools like Jenkins, Pipelines, and Artifactory will be beneficial. An inquisitive mind, effective troubleshooting skills, and the ability to navigate a complex system to extract meaningful data are essential qualities for success in this role. In addition to a competitive salary and benefits package, Proofpoint offers a culture focused on talent development, regular promotion cycles, company-sponsored education, and certifications. You will have the opportunity to work with cutting-edge technologies, participate in employee engagement initiatives, and benefit from annual health check-ups and insurance coverage. The company is committed to fostering diversity and inclusion in the workplace, offering hybrid work options, flexible hours, and inclusive facilities to support employees with diverse needs. Persistent Ltd. is an Equal Opportunity Employer that values diversity and prohibits discrimination and harassment. Join us to accelerate your growth professionally and personally, make a positive impact using the latest technologies, and collaborate in an innovative and inclusive environment to unlock global opportunities for learning and development. Let's unleash your full potential at Persistent.
Posted 1 month ago
3.0 - 8.0 years
4 - 8 Lacs
Bengaluru
Work from Office
Job Description Document Job Role: Customer Success Engineer. Function: Level 2 Escalation Support Engineer. Location: Bangalore. Shift: Rotational. Primarily US time zones (EST/PST support coverage). Job Summary: We are looking for a highly motivated and technically adept Customer Success Engineer (CSE) to serve as a key escalation point for Zeta Marketing Platform (ZMP). This role will interface directly with enterprise customers and internal teams to resolve complex technical issues, provide proactive guidance, and contribute to the continuous improvement of our customer experience. Key Responsibilities: Handle escalated customer tickets (L2), perform in-depth root cause analysis, and drive timely resolution. Communicate with customers primarily via e-mail, and also through Slack, MS Teams and phone as needed. Collaborate cross-functionally with Product, Engineering, QA, Design and DevOps teams to investigate and resolve platform-level issues. Apply a structured and data-driven approach to debugging issues in areas such as API integration, campaign workflows, user interface, and data syncing. Provide technical walkthroughs and consultative guidance to customers on platform capabilities and best practices. Document solutions thoroughly in ticketing systems and contribute to the knowledge base for internal and customer use. Identify trends and proactively suggest product or documentation improvements based on recurring customer pain points. Participate in post-incident reviews, RCA documentation, and follow-ups with impacted customers. Provide support during product upgrades or critical incidents, including weekends or holiday coverage on a rotational basis. Required Skills & Experience: 3+ years of experience in a technical support or product support role in a SaaS or MarTech environment. Demonstrated ownership of L2+ escalation issues with strong analytical thinking and troubleshooting depth. 
Strong written and verbal communication skills with the ability to simplify complex technical concepts. Hands-on experience with web technologies: APIs (REST), HTML, CSS, JavaScript, SQL, JSON, and browser dev tools. Comfortable using tools like Postman, Grafana, Jira, Confluence or similar systems. Prior experience supporting US-based customers and working US time zone hours (minimum 1 year). Customer-first mindset with excellent consultative and advocacy skills. Ability to manage multiple priorities and deliver under pressure in a fast-paced support environment. Experience in writing or reviewing runbooks, playbooks, and RCA documents. Preferred Qualifications: Exposure to marketing automation platforms, customer data platforms (CDPs), or personalization engines. Experience with SQL-based investigation and understanding of event/data pipelines. Familiarity with tools like Honeycomb, AWS, Snowflake or similar platforms is a plus. Experience in incident management or working with on-call rotations using PagerDuty. Experience with GenAI tools like OpenAI, MS Co-Pilot or DeepSeek. Soft Skills: Self-starter who can work independently with minimal supervision. Strong collaboration skills and a positive attitude in cross-team environments. Detail-oriented with a passion for problem-solving and continuous learning.
Posted 2 months ago