5.0 - 7.0 years
0 Lacs
Bengaluru, Karnataka, India
On-site
Job Description Bachelors/Masters degree in Computer Science, Information Technology or related field 5-7 years of experience in a DevOps role Strong understanding of the SDLC and experience with working on fully Agile teams Proven experience in coding & scripting DevOps, Ant/Maven, Groovy, Terraform, Shell Scripting, and Helm Chart skills. Working experience with IaC tools like Terraform, CloudFormation, or ARM templates Strong experience with cloud computing platforms (e.g. Oracle Cloud (OCI), AWS, Azure, Google Cloud) Experience with containerization technologies (e.g. Docker, Kubernetes/EKS/AKS) Experience with continuous integration and delivery tools (e.g. Jenkins, GitLab CI/CD) Kubernetes - Experience with managing Kubernetes clusters and using kubectl for managing helm chart deployments, ingress services, and troubleshooting pods. OS Services Basic Knowledge to Manage, configuring, and troubleshooting Linux operating system issues (Linux), storage (block and object), networking (VPCs, proxies, and CDNs) Monitoring and instrumentation - Implement metrics in Prometheus, Grafana, Elastic, log management and related systems, and Slack/PagerDuty/Sentry integrations Strong know-how of modern distributed version control systems (e.g. Git, GitHub, GitLab etc) Strong troubleshooting and problem-solving skills, and ability to work well under pressure Excellent communication and collaboration skills, and ability to lead and mentor junior team members Career Level - IC3 Responsibilities Design, implement, and maintain automated build, deployment, and testing systems Experience in Taking Application Code and Third Party Products and Building Fully Automated Pipelines for Java Applications to Build, Test and Deploy Complex Systems for delivery in Cloud. Ability to Containerize an Application i.e. 
creating Docker Containers and Pushing them to an Artifact Repository for deployment on containerization solutions with OKE (Oracle container Engine for Kubernetes) using Helm Charts. Lead efforts to optimize the build and deployment processes for high-volume, high-availability systems Monitor production systems to ensure high availability and performance, and proactively identify and resolve issues Support and Troubleshoot Cloud Deployment and Environment Issues Create and maintain CI/CD pipelines using tools such as Jenkins, GitLab CI/CD Continuously improve the scalability and security of our systems, and lead efforts to implement best practices Participate in the design and implementation of new features and applications, and provide guidance on best practices for deployment and operations Work with security team to ensure compliance with industry and company standards, and implement security measures to protect against threats Keep up-to-date with emerging trends and technologies in DevOps, and make recommendations for improvement Lead and mentor junior DevOps engineers and collaborate with cross-functional teams to ensure successful delivery of projects Analyze, design develop, troubleshoot and debug software programs for commercial or end user applications. Writes code, completes programming and performs testing and debugging of applications. As a member of the software engineering division, you will analyze and integrate external customer specifications. Specify, design and implement modest changes to existing software architecture. Build new products and development tools. Build and execute unit tests and unit test plans. Review integration and regression test plans created by QA. Communicate with QA and porting engineering to discuss major changes to functionality. Work is non-routine and very complex, involving the application of advanced technical/business skills in area of specialization. 
Leading contributor individually and as a team member, providing direction and mentoring to others. BS or MS degree or equivalent experience relevant to functional area. 6+ years of software engineering or related experience. Qualifications Career Level - IC3 About Us As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's challenges. We've partnered with industry leaders in almost every sector and continue to thrive after 40+ years of change by operating with integrity. We know that true innovation starts when everyone is empowered to contribute. That's why we're committed to growing an inclusive workforce that promotes opportunities for all. Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs. We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing [HIDDEN TEXT] or by calling +1 888 404 2494 in the United States. Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veteran status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
Posted 2 days ago
7.0 - 9.0 years
0 Lacs
Hyderabad, Telangana, India
On-site
Job Title: SRE Engineer with GCP cloud Location: Hyderabad & Ahmedabad Work Model: Hybrid, 3 days from office Experience: 7+ years Job Overview Dynamic, motivated individuals deliver exceptional solutions for the production resiliency of the systems. The role combines software engineering, operations, and DevOps skills to find efficient ways of managing and operating applications. The role will require a high level of responsibility and accountability to deliver technical solutions. Summary: As a Senior SRE, you will ensure platform reliability, incident management, and performance optimization. You'll define SLIs/SLOs, contribute to robust observability practices, and drive proactive reliability engineering across services. Roles and Responsibilities: Define and measure Service Level Indicators (SLIs), Service Level Objectives (SLOs), and manage error budgets across services. Lead incident management for critical production issues; drive root cause analysis (RCA) and post-mortems. Create and maintain runbooks and standard operating procedures for high-availability services. Design and implement observability frameworks using ELK, Prometheus, and Grafana; drive telemetry adoption. Coordinate cross-functional war-room sessions during major incidents and maintain response logs. Develop and improve automated system recovery, alert suppression, and escalation logic. Use GCP tools like GKE, Cloud Monitoring, and Cloud Armor to improve performance and security posture. Collaborate with DevOps and Infrastructure teams to build highly available and scalable systems. Analyse performance metrics and conduct regular reliability reviews with engineering leads. Participate in capacity planning, failover testing, and resilience architecture reviews.
Mandatory: Cloud: GCP (GKE, Load Balancing, VPN, IAM); Observability: Prometheus, Grafana, ELK, Datadog; Containers & Orchestration: Kubernetes, Docker; Incident Management: On-call, RCA, SLIs/SLOs; IaC: Terraform, Helm; Incident Tools: PagerDuty, OpsGenie. Nice to Have: GCP Monitoring, SkyWalking, Service Mesh, API Gateway, GCP Spanner, MongoDB (basic)
Posted 3 days ago
6.0 - 11.0 years
11 - 15 Lacs
Bengaluru
Work from Office
Associate Lead - Kubernetes Platform Is your passion for Cloud Native Platform? That is, envisioning and building the core services that underpin all Thomson Reuters products? Then we want you on our India-based team! This role is in the Platform Engineering organization where we build the foundational services that power Thomson Reuters products. We focus on the subset of capabilities that help Thomson Reuters deliver digital products to our customers. Our mission is to build a durable competitive advantage for TR by providing building blocks that get value-to-market faster. About the Role This role is within Platform Engineering's Service Mesh team, a dedicated group which engineers and operates our Service Mesh capability, which is a microservice platform based on Kubernetes and Istio. Primarily work with AWS and Azure public cloud, especially Kubernetes (AWS EKS and Azure AKS), Service Mesh technology like Istio, Terraform, Datadog, PagerDuty and Python, Golang, Java and/or .Net Core. Programming: Golang; Other: Java, C#. Primary Skill: Golang, Kubernetes. Work closely with an architect to establish and entrench the architectural design & principles for Service Mesh. Participate in all aspects of the development lifecycle: Ideation, Design, Build, Test and Operate.
We embrace a DevOps culture (you build it, you run it); while we have dedicated 24x7 level-1 support engineers, you may be called on to assist with level-2 support. About You 6+ years software development experience 2+ years of experience building cloud native infrastructure, applications and services on AWS, Azure or GCP Hands-on experience with Kubernetes, ideally AWS EKS and/or Azure AKS Experience with Istio or other Service Mesh technologies Experience with container security and supply chain security Experience with declarative infrastructure-as-code, CI/CD automation and GitOps Experience with Kubernetes operators written in Golang A bachelor's degree in Computer Science, Computer Engineering or similar What's in it For You? Hybrid Work Model: We've adopted a flexible hybrid working environment (2-3 days a week in the office depending on the role) for our office-based roles while delivering a seamless experience that is digitally and physically connected. Flexibility & Work-Life Balance: Flex My Way is a set of supportive workplace policies designed to help manage personal and professional responsibilities, whether caring for family, giving back to the community, or finding time to refresh and reset. This builds upon our flexible work arrangements, including work from anywhere for up to 8 weeks per year, empowering employees to achieve a better work-life balance. Career Development and Growth: By fostering a culture of continuous learning and skill development, we prepare our talent to tackle tomorrow's challenges and deliver real-world solutions. Our Grow My Way programming and skills-first approach ensures you have the tools and knowledge to grow, lead, and thrive in an AI-enabled future.
Industry Competitive Benefits: We offer comprehensive benefit plans to include flexible vacation, two company-wide Mental Health Days off, access to the Headspace app, retirement savings, tuition reimbursement, employee incentive programs, and resources for mental, physical, and financial wellbeing. Culture: Globally recognized, award-winning reputation for inclusion and belonging, flexibility, work-life balance, and more. We live by our values: Obsess over our Customers, Compete to Win, Challenge (Y)our Thinking, Act Fast / Learn Fast, and Stronger Together. Social Impact: Make an impact in your community with our Social Impact Institute. We offer employees two paid volunteer days off annually and opportunities to get involved with pro-bono consulting projects and Environmental, Social, and Governance (ESG) initiatives. Making a Real-World Impact: We are one of the few companies globally that helps its customers pursue justice, truth, and transparency. Together, with the professionals and institutions we serve, we help uphold the rule of law, turn the wheels of commerce, catch bad actors, report the facts, and provide trusted, unbiased information to people all over the world. Thomson Reuters informs the way forward by bringing together the trusted content and technology that people and organizations need to make the right decisions. We serve professionals across legal, tax, accounting, compliance, government, and media. Our products combine highly specialized software and insights to empower professionals with the data, intelligence, and solutions needed to make informed decisions, and to help institutions in their pursuit of justice, truth, and transparency. Reuters, part of Thomson Reuters, is a world leading provider of trusted journalism and news. We are powered by the talents of 26,000 employees across more than 70 countries, where everyone has a chance to contribute and grow professionally in flexible work environments.
At a time when objectivity, accuracy, fairness, and transparency are under attack, we consider it our duty to pursue them. Sound exciting? Join us and help shape the industries that move society forward. As a global business, we rely on the unique backgrounds, perspectives, and experiences of all employees to deliver on our business goals. To ensure we can do that, we seek talented, qualified employees in all our operations around the world regardless of race, color, sex/gender, including pregnancy, gender identity and expression, national origin, religion, sexual orientation, disability, age, marital status, citizen status, veteran status, or any other protected classification under applicable law. Thomson Reuters is proud to be an Equal Employment Opportunity Employer providing a drug-free workplace. We also make reasonable accommodations for qualified individuals with disabilities and for sincerely held religious beliefs in accordance with applicable law. More information on requesting an accommodation here. Learn more on how to protect yourself from fraudulent job postings here. More information about Thomson Reuters can be found on thomsonreuters.com.
Posted 3 days ago
12.0 - 15.0 years
7 - 11 Lacs
Bengaluru
Work from Office
This role combines leadership in managing cloud infrastructure with customer-focused incident response in a SaaS environment. The ideal candidate has a strong background in AWS cloud platforms, containerized workloads, and leading customer support teams. You'll also act as the primary escalation point for infrastructure and application performance issues. Cloud Operations Ensure 99.9%+ uptime for AWS-hosted SaaS platforms. Manage and maintain cloud infrastructure, including incident response and disaster recovery planning. Collaborate with DevOps, Engineering, IT, and Security teams to deploy, monitor, and optimize services. Proactively resolve issues related to infrastructure and application scalability and reliability. Establish strong operational practices: incident management, root cause analysis, and preventive action planning. Technical Support Lead a support operations team focused on infrastructure and application-related technical issues. Act as the point of escalation for complex, high-priority customer incidents. Ensure SLAs and KPIs are met or exceeded. Continuously improve support processes: ticket handling, escalation paths, and customer responsiveness. Work closely with Customer Success and Professional Services for a unified customer experience. Leadership and Strategy Manage, mentor, and grow a team of support engineers and cloud operations specialists. Continuously assess and improve tooling, operational processes, and technologies. Provide regular operations updates to senior leadership, highlighting KPIs and key trends. Translate business and customer needs into operational improvements. Qualifications Required Bachelor's degree in Computer Science, IT, or related field or equivalent experience. 12+ years of relevant experience, including 3+ years in a managerial role. Expertise in AWS and SaaS architecture. Hands-on experience with monitoring tools (Datadog, Prometheus, Grafana, etc.)
and incident management systems (ServiceNow, Zendesk, PagerDuty, Opsgenie). Proficient in SQL and experience with databases. Strong understanding of DevOps, CI/CD, and infrastructure-as-code (Terraform, Ansible). Proven track record of achieving high uptime, SLA adherence, and customer satisfaction. Experience managing 24x7 cloud operations in remote or hybrid environments. Strong problem-solving skills and ability to thrive in high-pressure situations. Excellent communication skills across technical and non-technical stakeholders. Willingness to work in APAC and EMEA time zones. Preferred Certifications AWS Professional Certifications Linux System Administration Certifications ITIL Certifications Kubernetes Administrator Certifications What We Offer Comprehensive health and wellness plans Paid time off and company holidays Shift allowances Flexible and remote-friendly work options
Posted 3 days ago
1.0 - 6.0 years
8 - 13 Lacs
Pune
Work from Office
Cloud Observability Administrator Pune, India Enterprise IT - 22685 about our diversity, equity, and inclusion efforts and the networks ZS supports to assist our ZSers in cultivating community spaces, obtaining the resources they need to thrive, and sharing the messages they are passionate about. Cloud Observability Administrator ZS is looking for a Cloud Observability Administrator to join our team in Pune. As a Cloud Observability Administrator, you will configure various Observability tools and create solutions to address business problems across multiple client engagements. You will leverage information from the requirements-gathering phase and utilize past experience to design a flexible and scalable solution, and collaborate with other team members (involved in the requirements gathering, testing, roll-out and operations phases) to ensure seamless transitions. What You'll Do: Deploying, managing, and operating scalable, highly available, and fault tolerant Splunk architecture. Onboarding various kinds of log sources like Windows/Linux/Firewalls/Network into Splunk. Developing alerts, dashboards and reports in Splunk. Writing complex SPL queries. Managing and administering a distributed Splunk architecture. Very good knowledge of configuration files used in Splunk for data ingestion and field extraction. Perform regular upgrades of Splunk and relevant Apps/add-ons. Possess a comprehensive understanding of AWS infrastructure, including EC2, EKS, VPC, CloudTrail, Lambda etc. Automation of manual tasks using Shell/PowerShell scripting. Knowledge of Python scripting is a plus. Good knowledge of Linux commands to manage administration of servers.
What You'll Bring: 1+ years of experience in Splunk Development & Administration, Bachelor's Degree in CS, EE, or related discipline Strong analytic, problem solving, and programming ability 1-1.5 years of relevant consulting-industry experience working on medium-large scale technology solution delivery engagements; Strong verbal, written and team presentation communication skills Strong verbal and written communication skills with ability to articulate results and issues to internal and client teams Proven ability to work creatively and analytically in a problem-solving environment Ability to work within a virtual global team environment and contribute to the overall timely delivery of multiple projects Knowledge of Observability tools such as Cribl, Datadog, PagerDuty is a plus. Knowledge of AWS Prometheus and Grafana is a plus. Knowledge of APM concepts is a plus. Knowledge of Linux/Python scripting is a plus. Splunk Certification is a plus. Perks & Benefits ZS offers a comprehensive total rewards package including health and well-being, financial planning, annual leave, personal growth and professional development. Our robust skills development programs, multiple career progression options, internal mobility paths and collaborative culture empower you to thrive as an individual and global team member. We are committed to giving our employees a flexible and connected way of working. A flexible and connected ZS allows us to combine work from home and on-site presence at clients/ZS offices for the majority of our week. The magic of ZS culture and innovation thrives in both planned and spontaneous face-to-face connections. Travel Travel is a requirement at ZS for client facing ZSers; business needs of your project and client are the priority. While some projects may be local, all client-facing ZSers should be prepared to travel as needed.
Travel provides opportunities to strengthen client relationships, gain diverse experiences, and enhance professional growth by working in different environments and cultures. Considering applying? At ZS, we're building a diverse and inclusive company where people bring their passions to inspire life-changing impact and deliver better outcomes for all. We are most interested in finding the best candidate for the job and recognize the value that candidates with all backgrounds, including non-traditional ones, bring. If you are interested in joining us, we encourage you to apply even if you don't meet 100% of the requirements listed above. ZS is an equal opportunity employer and is committed to providing equal employment and advancement opportunities without regard to any class protected by applicable law. To Complete Your Application Candidates must possess or be able to obtain work authorization for their intended country of employment.An on-line application, including a full set of transcripts (official or unofficial), is required to be considered. NO AGENCY CALLS, PLEASE. Find Out More At
Posted 3 days ago
3.0 - 7.0 years
0 Lacs
noida, uttar pradesh
On-site
As an Application Support Engineer at UKG, you will join our engineering teams as a staff augmentation consultant, providing support for the identity platform infrastructure. Your role will involve deployments to production environments, handling escalations and KTLO tasks, and addressing debugging needs. Collaboration with internal development teams to resolve any issues or implement new integrations related to our Identity platform will be a key aspect of your responsibilities. Additionally, you will be involved in root cause analysis and enhancing observability within the platform. To excel in this role, you should possess the following qualifications: - Proficiency in Linux (Ubuntu), including a deep understanding of the Linux operating system. You should have experience in troubleshooting complex issues related to infrastructure, such as disk performance, IOPS, network latency, JVM/GC behavior, and application defects. - Familiarity with scripting languages like Python for automating manual tasks and remediation efforts. - Experience with Ansible for configuration management. - Knowledge of identity platforms and technologies, including SAML2, OAuth2, LDAP query language, OpenDJ, OpenAM, Auth0, Okta, and SSO solutions. - Understanding of Java and best practices for tuning Java applications to perform optimally at scale. - Proficiency in Nginx, Grafana, PagerDuty, Postman/API, Kibana/Splunk, Dynatrace. - Preferred experience with GCP, with familiarity with Azure considered beneficial. If you are a candidate with skills in Java, GCP, and Azure, and are interested in this exciting opportunity, please reach out to us at deepika@codersbrain.com to explore further.,
Posted 4 days ago
10.0 - 16.0 years
30 - 45 Lacs
Bengaluru
Remote
- AWS & SaaS architecture - Monitoring tools (Datadog, New Relic, Prometheus, Grafana) - Incident management (PagerDuty, ServiceNow, Zendesk, Opsgenie) - Experience running a 24x7 Cloud Ops team - DevOps processes, CI/CD pipelines, IaC tools (Terraform, Ansible)
Posted 1 week ago
1.0 - 5.0 years
0 Lacs
pune, maharashtra
On-site
As a Site Reliability Engineer - Incident Management, you will be responsible for monitoring, maintaining, and managing the entire Qualys infrastructure and services installed at different data centers. In the event of any malfunction in products/services, you will be required to monitor, troubleshoot, repair, and restore the service/system promptly to ensure maximum service availability and performance. Your role will also involve providing support services for Engineering and other technical teams, collaborating for quicker issue resolution, performing end-to-end incident management, documentation, and task automation. Your main responsibilities will include monitoring the performance and capacity of computer systems, utilizing various tools to identify and address issues effectively. You will be expected to conduct basic troubleshooting of platform/product issues, utilize tools such as Splunk, Grafana, Kibana for performance checking, and manage PagerDuty. Additionally, you will assist in task automation wherever applicable, ensure timely resolution of incident tickets, and work on triaging and troubleshooting problems affecting products or services. It will be crucial for you to meticulously track and document all issues and resolutions in detail on the ticketing/documentation tools to enhance the knowledge base and maintain a record of system health. In cases where troubleshooting complex issues is not feasible, you should escalate the problem to management, IT resources, or 3rd party vendors for further assistance. Communication within the team and externally to stakeholders, keeping them informed of relevant information, known issues, and steps being taken, will be an integral part of your role. The Site Reliability Engineer - Incident Management team will operate 24*7*365 on a monthly shift rotation basis as per requirements. 
To excel in this role, you should possess one to two years of IT Operations (Infra/System admin/Linux) experience or relevant certification. Familiarity with monitoring and integration tools like Splunk, Prometheus, Grafana, Kibana, PagerDuty, Runscope, and incident management tools such as Jira/ServiceNow is beneficial. A good understanding of ITSM main functions and tools, along with strong interpersonal skills to interact with employees at all levels professionally, will be essential. Certifications in computer functionality, Linux, System Admin, VMware, IT Security, or ITSM/ITIL, and knowledge of DevOps/SRE basics, Python, and Cloud will be advantageous for this role.,
Posted 1 week ago
3.0 - 7.0 years
0 Lacs
maharashtra
On-site
This role is eligible for our hybrid work model: Two days in-office. Rotational Shift - Two shifts starting at 6 am and 2 pm (IST) & 2 pm to 10 pm IST. Why this job is a big deal: Are you interested in learning cutting edge technologies Do you enjoy solving complex problems The priceline.com Site Reliability Operations Team offers these and many more opportunities while working in a fast-paced and challenging environment. The team is responsible for ensuring that every area of Priceline.com's site is highly available, reliable, and performing optimally. In this role, you will get to manage and issue track ticket creation, updates, escalations, and participation on incident bridge calls. Adherence to established response SLOs/SLAs and a working knowledge of all monitoring and support tools. Maintain a culture of continuous improvement by providing suggestions for process improvements, providing updates to documentation, providing transfer of knowledge to peers in your area of expertise, and assisting in the training of new hires. Frontline Tier I/II monitoring / escalation / incident response and impact mitigation. Execute Command & Control tasks on our infrastructure. Orchestrate and manage incident lifecycle between external 3rd party vendors, the Site Reliability Engineers (SRE), and internal development teams. Analyze and support the continuous improvement of our monitoring as well as command and control capabilities. Maintain a high level of communication and knowledge sharing: incident lifecycle tracking, runbooks, and operational documentation. Report the health and availability of the site and related services. Who you are: Bachelor's degree in Computer Science or related field or 3-4 years of relevant work experience. Experience with New Relic, PagerDuty, Splunk, Jira, Confluence. Working experience with Incident Management and Change Management. 
Prior experience in Operations or a fast-paced, high-stress environment with the requirement to resolve multiple interruption-driven priorities simultaneously. Solid understanding of Open Source environments and TCP/IP Networking. Self-motivated and can work both independently and within a team in our 24/7 Operations Center; available for off-hours shift coverage and be able to own technical issues in the role of Incident Commander. Illustrated history of living the values necessary to Priceline: Customer, Innovation, Team, Accountability, and Trust. The Right Results, the Right Way is not just a motto at Priceline; it's a way of life. Unquestionable integrity and ethics are essential. Who we are: WE ARE PRICELINE. Our success as one of the biggest players in online travel is all thanks to our incredible, dedicated team of talented employees. Priceliners are focused on being the best travel deal makers in the world, motivated by our passion to help everyone experience the moments that matter most in their lives. Whether it's a dream vacation, your cousin's graduation, or your best friend's wedding - we make travel affordable and accessible to our customers. Our culture is unique and inspiring (that's what our employees tell us). We're a grown-up, startup. We deliver the excitement of a new venture, without the struggles and chaos that can come with a business that hasn't stabilized. We're on the cutting edge of innovative technologies. We keep the customer at the center of all that we do. Our ability to meet their needs relies on the strength of a workforce as diverse as the customers we serve. We bring together employees from all walks of life, and we are proud to provide the kind of inclusive environment that stimulates innovation, creativity, and collaboration. Priceline is part of the Booking Holdings, Inc. (Nasdaq: BKNG) family of companies, a highly profitable global online travel company with a market capitalization of over $80 billion. 
Our sister companies include Booking.com, BookingGo, Agoda, Kayak, and OpenTable. If you want to be part of something truly special, check us out! Flexible work at Priceline: Priceline is following a hybrid working model, which includes two days onsite as determined by you and your manager (ideally selecting among Tuesday, Wednesday, or Thursday). On the remaining days, you can choose to be remote or in the office. Diversity and Inclusion are a Big Deal! To be the best travel dealmakers in the world, it's important we have a workforce that reflects the diverse customers and communities we serve. We are committed to cultivating a culture where all employees have the freedom to bring their individual perspectives, life experiences, and passion to work. Priceline is a proud equal opportunity employer. We embrace and celebrate the unique lenses through which our employees see the world. We'd love you to join us and add to our rich mix! Applying for this position: We're excited that you are interested in a career with us. For all current employees, please use the internal portal to find jobs and apply. External candidates are required to have an account before applying.,
Posted 1 week ago
5.0 - 12.0 years
0 Lacs
pune, maharashtra
On-site
As a Senior Service Reliability Engineer at Proofpoint, you will develop a deep understanding of the various services and applications that come together to deliver Proofpoint's next-generation security products. Your primary responsibility will be maintaining and extending the Elasticsearch and Splunk clusters used for critical near-real-time data analysis. This role involves continually evaluating the performance of these clusters, identifying and addressing developing problems, planning changes for high-load events, applying security fixes, testing and performing upgrades, as well as enhancing the monitoring and alert infrastructure. You will also play a key role in maintaining other components of the data pipeline, which may involve serverless or server-based systems for data ingestion into the Elasticsearch pipeline. Optimizing cost vs. performance will be a focus, including testing new hosts or configurations. Automation is a priority, utilizing tools like Puppet and various scripting mechanisms to achieve a build once/run everywhere system. Your work will span various types of infrastructure, including public cloud, Kubernetes clusters, and private data centers, providing exposure to diverse operational environments. Building effective partnerships across different teams within the organization, such as Product, Engineering, and Operations, is crucial. Participation in an on-call rotation and addressing escalated issues promptly are also part of the role. To excel in this position, you are expected to have a Bachelor's degree in computer science, information technology, engineering, or a related discipline. Your expertise should include proficient administration and management of Elasticsearch clusters, with secondary experience in managing Splunk clusters. Proficiency in provisioning and Configuration Management tools like Puppet, Ansible, and Rundeck is essential. 
Experience in building Automations and Infrastructure as Code using tools like Terraform, Packer, or CloudFormation templates is a plus. You should also be familiar with monitoring and logging tools such as Splunk, Prometheus, and PagerDuty, as well as scripting languages like Python, Bash, Go, Ruby, and Perl. Experience with CI/CD tools like Jenkins, Pipelines, and Artifactory will be beneficial. An inquisitive mind, effective troubleshooting skills, and the ability to navigate a complex system to extract meaningful data are essential qualities for success in this role. In addition to a competitive salary and benefits package, Proofpoint offers a culture focused on talent development, regular promotion cycles, company-sponsored education, and certifications. You will have the opportunity to work with cutting-edge technologies, participate in employee engagement initiatives, and benefit from annual health check-ups and insurance coverage. The company is committed to fostering diversity and inclusion in the workplace, offering hybrid work options, flexible hours, and inclusive facilities to support employees with diverse needs. Persistent Ltd. is an Equal Opportunity Employer that values diversity and prohibits discrimination and harassment. Join us to accelerate your growth professionally and personally, make a positive impact using the latest technologies, and collaborate in an innovative and inclusive environment to unlock global opportunities for learning and development. Let's unleash your full potential at Persistent.
Posted 1 week ago
3.0 - 8.0 years
4 - 8 Lacs
Bengaluru
Work from Office
Job Description Document Job Role: Customer Success Engineer Function: Level 2 Escalation Support Engineer Location: Bangalore Shift: Rotational. Primarily US time zones (EST/PST support coverage) Job Summary: We are looking for a highly motivated and technically adept Customer Success Engineer (CSE) to serve as a key escalation point for Zeta Marketing Platform (ZMP). This role will interface directly with enterprise customers and internal teams to resolve complex technical issues, provide proactive guidance, and contribute to the continuous improvement of our customer experience. Key Responsibilities: Handle escalated customer tickets (L2), perform in-depth root cause analysis, and drive timely resolution. Communicate with customers primarily via e-mail, and also through Slack, MS Teams and phone as needed. Collaborate cross-functionally with Product, Engineering, QA, Design and DevOps teams to investigate and resolve platform-level issues. Apply a structured and data-driven approach to debugging issues in areas such as API integration, campaign workflows, user interface, and data syncing. Provide technical walkthroughs and consultative guidance to customers on platform capabilities and best practices. Document solutions thoroughly in ticketing systems and contribute to the knowledge base for internal and customer use. Identify trends and proactively suggest product or documentation improvements based on recurring customer pain points. Participate in post-incident reviews, RCA documentation, and follow-ups with impacted customers. Provide support during product upgrades or critical incidents, including weekends or holiday coverage on a rotational basis. Required Skills & Experience: 3+ years of experience in a technical support or product support role in a SaaS or MarTech environment. Demonstrated ownership of L2+ escalation issues with strong analytical thinking and troubleshooting depth.
Strong written and verbal communication skills with the ability to simplify complex technical concepts. Hands-on experience with web technologies: APIs (REST), HTML, CSS, JavaScript, SQL, JSON, and browser dev tools. Comfortable using tools like Postman, Grafana, Jira, Confluence or similar systems. Prior experience supporting US-based customers and working US time zone hours (minimum 1 year). Customer-first mindset with excellent consultative and advocacy skills. Ability to manage multiple priorities and deliver under pressure in a fast-paced support environment. Experience in writing or reviewing runbooks, playbooks, and RCA documents. Preferred Qualifications: Exposure to marketing automation platforms, customer data platforms (CDPs), or personalization engines. Experience with SQL-based investigation and understanding of event/data pipelines. Familiarity with tools like Honeycomb, AWS, Snowflake or similar platforms is a plus. Experience in incident management or working with on-call rotations using PagerDuty. Experience with GenAI tools like OpenAI, MS Copilot or DeepSeek. Soft Skills: Self-starter who can work independently with minimal supervision. Strong collaboration skills and a positive attitude in cross-team environments. Detail-oriented with a passion for problem-solving and continuous learning.
Posted 2 weeks ago
5.0 - 10.0 years
20 - 30 Lacs
Pune, Bengaluru
Hybrid
AWS Site Reliability Engineer (5 to 12 Years) Required Skills & Experience Cloud Services (AWS) Hands-on experience with the following AWS services: EC2 (Elastic Compute Cloud) EKS (Elastic Kubernetes Service) SES (Simple Email Service) SQS (Simple Queue Service) SNS (Simple Notification Service) S3 (Simple Storage Service) DynamoDB RDS / Aurora (Relational Database Service) OpenSearch (formerly Elasticsearch Service) Elasticache Security Groups CloudWatch High-level knowledge of AWS networking concepts, including VPCs and Subnets. Hands-on experience with: Datadog (Monitoring and Observability) GitLab (CI/CD, Version Control) GitHub (Version Control) PagerDuty (On-call Management) BlazeMeter (Performance Testing) K9s (Kubernetes CLI) Technical Skills High Proficiency Terraform for Infrastructure as Code (IaC). Strong scripting abilities in Bash and Python. Familiarity with Go programming language Expertise in using AWS CLI. Proficiency with Kubectl for Kubernetes cluster management. Experience with Helm for Kubernetes package management.
Posted 2 weeks ago
4.0 - 8.0 years
13 - 18 Lacs
Bengaluru
Work from Office
Project description We've been engaged by a large Australian financial institution to provide resources to manage the production support activities along with their existing team in Sydney & India. Responsibilities Carry out enhancements to maintenance/housekeeping scripts as required and monitor the DB growth periodically. Handles cloud Environment preparation, refresh, rebuild, upkeep, maintenance, and upgrade activities. Ensure cloud cost optimisation. Troubleshooting of Murex environment-specific issues including Infrastructure related issues and update pipelines for a permanent fix. Handling EOD execution and troubleshooting of issues related to it. Participate in analysis, solutioning, and deployment of solution for production issues during EoD. Participate in the release activity and coordinate with QA/Release teams. Participate in AWS stack deployment, AWS AMI patching, and stack configuration to ensure optimal performance and cost-efficiency. Address requests like warehouse rebuild, maintenance, Perform Health/sanity checks, create XVA engine, environment restores & backup in AWS as per project need. Perform Weekend maintenance and perform health checks in the production environment during the weekend. Support working in shifts (max end time will be 12.30 AM IST) and available for weekend & on-call support. Have to work out of client location on a need basis. Flexible to work in a Hybrid model. 
Skills Must have 4 to 8 Years of experience in Murex Production Support Murex End of Day support Troubleshooting batch-related issues, including date moves and processing adjustments Murex Env Management & Troubleshooting Experienced in SQL Unix shell scripting, Monitoring tools, Web development Experienced in the Release and CI/CD process Linux/Unix server and Oracle RDS knowledge Working experience with automation/job scheduling tools such as Autosys, GitHub Actions Working experience with monitoring tools like Grafana, Splunk, Obstack, PagerDuty Good communication and organization skills working within a DevOps team supporting a wider IT delivery team Nice to have PL/SQL, Scripting languages (Python) Advanced troubleshooting experience with Shell scripting and Python Experience with CICD tools like Git, flows, Ansible, and AWS including CDK Exposure to AWS Cloud environment Willing to learn and obtain AWS certification
Posted 3 weeks ago
5.0 - 9.0 years
0 Lacs
haryana
On-site
Job Title Production Support Lead Location Gurgaon, India Reports to Head of Prod Support About FNZ Who we are: FNZ Group is an established and rapidly growing company in the financial technology sector. We partner with the entire industry to make wealth management accessible to more people. Today, we partner with over 650 financial institutions and 8,000 wealth management firms, enabling over 20 million people across all wealth segments to invest in the things they care the most about, on their own terms. We have 20+ offices globally with 4500 employees (and growing!). To learn more about us and our journey, check out our careers site. Role Description What would you accomplish as a Production Support Lead? As Production Support Lead, you will be the go-to person for our client. Your responsibilities extend to overseeing the intricate landscape of issue management, addressing concerns from both external and internal clients to meet key performance indicators (KPIs) and service level agreements (SLAs). A core aspect of your role involves managing the workflow, ensuring the seamless functioning of the application as deployed, emphasizing proactive and reactive measures to champion continuous service improvement. Your expertise comes to the forefront in Incident & Problem Management, where you lead the analysis, investigation, diagnosis, and problem-solving efforts to identify, troubleshoot, and resolve production issues. Additionally, your involvement in Release & Change Management is crucial, as you support the testing and release processes for production fixes. Facilitating the transition between project support and production support during Service Transition is a key responsibility, ensuring a smooth flow of operations. The Responsibilities Will Include: Analyse incidents, recommend solutions, and contribute to service improvement. Ensure that all requests, incidents and problems are dealt with according to set standards and procedures.
Direct daily operations, allocate resources, and plan to meet service levels. Proactively address system and service problems, ensuring timely resolution actions. Facilitate development of documented problem solutions and corrective actions. Educate and train internal and external application users. Guide team members, monitor progress, and prioritize quality improvement. Initiate process improvements aligned with business objectives and audits. Drive enhancements aligning with procedural, regulatory, and security requirements. Draft and maintain meticulous documentation for application support procedures. Contribute to audits and reviews, collecting evidence for process evaluation. Undertake diverse projects and tasks to ensure smooth production operations. Experience Required What we are looking for: Degree preferable in either Commerce/IT or a related field; or equivalent. Expert SQL skills. Independent, self-directing and delivery focused working style. Superior analytical thinking and keen attention to detail. Good communication skills, confident in dealing with internal and external clients. Passionate about providing an excellent service experience for our clients. Demonstrable ability to provide leadership and direction in incident management, to effectively prioritize and execute tasks in a high-pressure environment. Builds relationships with senior internal and external stakeholders. Experience in support and incident management, ITIL preferably. For Technical skills, SQL, Application monitoring tools New Relic, Datadog, APM, Splunk, PagerDuty. Experience Preferred Beneficial but not essential. Interest / familiarity with financial markets and products. Some experience with Microsoft .NET development products, including C#, VB.NET and SQL Server, beneficial but not essential. Open to the variance of work hours, including the flexibility to start earlier or later than standard work hours. 
Opportunities What We Offer: We are mission led - work at the heart of a purpose-led organization, where you can be proud of the impact you make, every day. Where you'll transform the way over 20 million people invest, making wealth management more accessible, sustainable and transparent to more people. Rapid career growth - encouraged to take on responsibility, play a part in the evolution of the company and rapidly drive your career development working on real projects that directly impact our clients and their customers. Market leading technology - Build, create and evolve innovative solutions for the world's most trusted brands using the latest technologies to help change the face of investing for the future. Learning & development - Placing emphasis on a willingness to learn, to think differently, to be creative and to help drive innovation. Inclusion - In addition, we want to ensure accessibility needs are well supported; if you require specific support, please advise us. About FNZ FNZ is committed to opening up wealth so that everyone, everywhere can invest in their future on their terms. We know the foundation to do that already exists in the wealth management industry, but complexity holds firms back. We created wealth's growth platform to help. We provide a global, end-to-end wealth management platform that integrates modern technology with business and investment operations. All in a regulated financial institution. We partner with over 650 financial institutions and 12,000 wealth managers, with US$1.5 trillion in assets under administration (AUA). Together with our customers, we help over 20 million people from all wealth segments to invest in their future.
Posted 3 weeks ago
5.0 - 9.0 years
0 Lacs
noida, uttar pradesh
On-site
The client's product enables the utilization of customer data through cutting-edge technologies to: - Enhance understanding of customer behavior to a previously unattainable level. - Determine the exact impact of advertising and promotions. - Create real-time profiles of customer segments. - Uncover the relationship between team member performance and customer loyalty. You should have: - Over 5 years of commercial experience as a DevOps professional. - Practical experience in cloud infrastructure provisioning, deployment, and monitoring on Azure for at least 2 years. - Strong familiarity with best DevOps practices and methodologies. - Good understanding of Computer Science and Computing Theory, including network interactions, protocols, deployment patterns, security patterns, software architecture (e.g., microservices, event-driven design), orchestration, and containerization (Docker, Kubernetes). - Hands-on experience with Infrastructure as Code (IaC), especially with ARM templates/Terraform. - Knowledge of logging and monitoring technologies like Zabbix, NewRelic, PagerDuty, Prometheus, and ELK stack. - Experience with CI/CD processes using AzureDevOps, Docker, Kubernetes (AKS), and product services written in .NET. - Proficiency in different delivery methodologies such as SCRUM, Agile, and Kanban. - Upper-Intermediate English language skills. Desirable qualifications include certifications in Azure and Kubernetes, along with practical experience in data engineering, Big Data stack, high-load systems, and microservices in a production environment. As part of the DevOps team, your responsibilities will include: - Collaborating on the creation of Azure infrastructure and setting up K8s clusters (AKS). - Managing CI/CD pipelines and automation processes. - Overseeing release management and infrastructure maintenance. - Participating in decision-making regarding infrastructure design. - Creating and managing dashboards for environments/builds. 
- Ensuring security controls do not adversely affect production by working with architects and developers. - Communicating effectively with various stakeholders including PM, PO, software developers, architects, and QA. GlobalLogic offers a stimulating work environment with diverse projects in industries like High-Tech, communication, media, healthcare, retail, and telecom. You will have the opportunity to collaborate with a talented team and enjoy work-life balance, professional development programs, competitive benefits, and fun perks. About GlobalLogic: GlobalLogic is a digital engineering leader that helps brands worldwide design and develop innovative products and digital experiences. Headquartered in Silicon Valley, GlobalLogic operates globally, assisting clients across various industries to envision and realize digital transformations.
Posted 3 weeks ago
6.0 - 10.0 years
0 Lacs
karnataka
On-site
The Senior Developer / Technical Lead Java Full stack position based in Bangalore requires an experienced professional with 6-10 years of experience in software development and architecture. In this role, you will be responsible for providing solutions to technical issues that may impact product delivery. Your key responsibilities will include facilitating requirement analyses, conducting peer reviews, defining processes for technical platforms, and enhancing frameworks. The ideal candidate should possess hands-on experience in Java and ReactJS, with a minimum of 6 years of experience in Java backend/frontend technologies and building distributed enterprise software. Strong expertise in Core & Advanced Java, including threading, design patterns, and data structures, is essential. A good understanding of OOAD, design patterns, and software architecture is also required. Proficiency in Spring Boot, Microservices, Hibernate, MVC, RestAPI, collection, and frameworks is necessary for this role. Additionally, hands-on experience in working with/setting up CI and CD environments, writing SQL queries, and familiarity with collaboration tools like GitHub and DevOps/JIRA are important skills. The successful candidate should have good expertise with JavaScript frameworks like ReactJS, Graph API, and PagerDuty. Experience working in an agile development environment and tools is preferred. The ability to quickly learn and adapt to new business and technical concepts, along with excellent communication, organizational, and problem-solving skills, will be beneficial in this role.
Posted 3 weeks ago
3.0 - 5.0 years
4 - 8 Lacs
Hyderabad
Work from Office
3-5 years of experience in IT operations and maintenance. Hands-on experience with Grafana, Zabbix, Azure Monitor, and ELK Log Management. Experience with large-scale monitoring system setup and maintenance. Good exposure to commonly used ITSM tools, including PagerDuty and ServiceNow. Basic understanding of public cloud knowledge, including IaaS, PaaS, and SaaS. Proactive approach to identifying problems, performance bottlenecks, and areas for improvement. Primary Skills Configure and implement end-to-end monitoring solutions for applications and infrastructure. Configure and maintain log analytic tools for applications and infrastructure. Develop mock-up views and build workable dashboards following a defined methodology based on briefings from various stakeholders. Short Description Open to work in 24*7 Shift. Microsoft Azure Monitor PagerDuty ELK Log Management
Posted 1 month ago
5.0 - 10.0 years
3 - 7 Lacs
Mumbai
Work from Office
We are looking for a skilled Java Backend Developer with 5 to 12 years of experience to develop and maintain backend services using Java Spring and JavaScript. The ideal candidate will have hands-on experience as a backend developer, proficiency in Java Spring framework and JavaScript, and experience with at least one cloud provider. Roles and Responsibility Develop and maintain scalable and efficient backend systems using Java Spring and JavaScript. Design, implement, and optimize cloud-based solutions on AWS, GCP, or Azure. Work with SQL and NoSQL databases such as PostgreSQL, MySQL, and MongoDB for data persistence. Architect and develop Kubernetes-based microservices caching solutions and messaging systems like Kafka. Implement monitoring, logging, and alerting using tools like Grafana, CloudWatch, Kibana, and PagerDuty. Participate in on-call rotations, handle incident response, and contribute to operational playbooks. Job Hands-on experience as a backend developer with strong understanding of data structures, algorithms, and software design principles. Proficiency in Java Spring framework and JavaScript, with experience in developing scalable and efficient backend systems. Experience with at least one cloud provider, preferably AWS, GCP, or Azure, and knowledge of cloud-based solutions and containerization. Familiarity with microservice architectures, caching solutions, and event-driven architectures using Kafka. Strong communication skills with an emphasis on technical documentation and the ability to work in a globally distributed environment. Ability to contribute to high availability services and participate in on-call rotations.
Posted 1 month ago
8.0 - 13.0 years
15 - 25 Lacs
Hyderabad
Work from Office
Role Summary Akrivia HCM is seeking an experienced Site Reliability Engineer to safeguard the performance, scalability, and availability of our global HR tech platform. You will define service-level objectives, automate infrastructure, lead incident response, and partner with engineering squads to deliver reliable releases at high velocity. Key Responsibilities Define and track SLIs/SLOs for latency, availability, and error budgets. Build and maintain Terraform/Helm/ArgoCD stacks; convert manual toil into code. Instrument services with Prometheus, Grafana, Datadog, and OpenTelemetry; create actionable alerts & dashboards. Serve in the on-call rotation, lead rapid mitigation, run blameless post-mortems, and close action items. Model load growth, tune autoscaling policies, run load tests, and drive cost-optimisation reviews. Design chaos game-days and fault-injection experiments to validate fail-over and recovery paths. Review designs/PRs for reliability anti-patterns and coach development teams on SRE best practices. Must-Have Qualifications 5+ years operating large-scale, user-facing SaaS systems on AWS, GCP, or Azure (Kubernetes/EKS preferred). Proficiency with Infrastructure-as-Code (Terraform, Helm, Pulumi, or CloudFormation) and GitOps (ArgoCD/Flux). Hands-on experience building observability stacks (Prometheus, Grafana, Datadog, New Relic, etc.). Proven track record reducing MTTR and change-failure rate through automation and robust incident processes. Strong scripting or programming skills in Go, Python, or TypeScript. Deep debugging skills across Linux, networking, containers, databases, and web/API layers. Excellent written and verbal communication skills. Good-to-Have Skills Exposure to AWS Well-Architected reviews, FinOps, or cost-optimisation initiatives. Experience with service mesh (Istio/Linkerd), event-driven systems (Kafka/NATS), or serverless (Lambda). Familiarity with SOC 2 / ISO 27001 controls and secrets management (AWS KMS, Vault). 
Chaos engineering tools (ChaosMesh, Gremlin) and performance testing (k6, Gatling). Certifications such as AWS DevOps Pro, CKA/CKAD, or Google Cloud SRE.
Posted 1 month ago
6.0 - 10.0 years
12 - 16 Lacs
Pune
Work from Office
We are on the lookout for a hands-on DevOps / SRE expert who thrives in a dynamic, cloud-native environment! Join a high-impact project where your infrastructure and reliability skills will shine. Key Responsibilities. Design & implement resilient deployment strategies (Blue-Green, Canary, GitOps). Manage observability tools: logs, metrics, traces, and alerts. Tune backend services & GKE workloads (Node.js, Django, Go, Java). Build & manage Terraform infra (VPC, CloudSQL, Pub/Sub, Secrets). Lead incident responses & perform root cause analyses. Standardize secrets, tagging & infra consistency across environments. Enhance CI/CD pipelines & collaborate on better rollout strategies. Must-Have Skills. 5-10 years in DevOps / SRE / Infra roles. Kubernetes (GKE preferred). IaC with Terraform & Helm. CI/CD: GitHub Actions + GitOps (ArgoCD / Flux). Cloud architecture expertise (IAM, VPC, Secrets). Strong scripting/coding & backend debugging skills (Node.js, Django, etc.). Incident management with tools like Datadog & PagerDuty. Excellent communicator & documenter. Tech Stack. GKE, Kubernetes, Terraform, Helm. GitHub Actions, ArgoCD / Flux. Datadog, PagerDuty. CloudSQL, Cloudflare, IAM, Secrets. You're. A proactive team player & strong individual contributor. Confident yet humble. Curious, driven & always learning. Not afraid to solve deep infrastructure challenges.
Posted 1 month ago
0.0 years
0 Lacs
Hyderabad, Telangana, India
On-site
Genpact (NYSE: G) is a global professional services and solutions firm delivering outcomes that shape the future. Our 125,000+ people across 30+ countries are driven by our innate curiosity, entrepreneurial agility, and desire to create lasting value for clients. Powered by our purpose - the relentless pursuit of a world that works better for people - we serve and transform leading enterprises, including the Fortune Global 500, with our deep business and industry knowledge, digital operations services, and expertise in data, technology, and AI. Inviting applications for the role of NOC & IM Engineer(L2) - Technical Associate In this role, Individual will be responsible to monitor and manage the detection, correction, and prevention of incidents, maintain SLAs, and ensure that critical issues are resolved promptly in order to reduce production downtime. Responsibilities . Responsible for Monitoring of Production site and services across the Enterprise. Detect issues and execute Runbooks or Escalate to the appropriate Service Owners for Investigation. . Responsible for managing Service now tickets. . Facilitate in the resolution of Major incidents across the Enterprise to bring services back to normal state and mitigate impact. . Ensure incident communication to stakeholders is consistent, clear, concise, and made in a timely manner. . Perform Postmortem for all Major Incidents to find the root cause once incidents are resolved to permanently fix the problem and support continuous improvement. . Ensure standardized methods, processes and procedures are used for all changes. . Facilitate efficient and prompt handling of all changes. . Perform Change Governance and run CAB Meetings. . Oversee processes & tools for Incident, Problem and Change Management. . Define, generate and publish KPI/metrics for transparency into incidents and teams affected, problems and root causes, change requests to production environment. . 
Hands on experience on Incident, Problem and Change Management. Qualifications we seek in you! Minimum Qualifications / Skills . Bachelor's Degree required. Preferably in Computer Science, Information Systems, or related field. . Excellent Communication skills Preferred Tool Skills . BigPanda . SLAM / Neustar - Vercara . Tardis . SNOW (INCIDENT/INCIDENTTASK/REQ/SCTASK/PROBLEM/PTASK/CHANGE) . PagerDuty . Splunk . Zabbix . JIRA . Teams . Reporting Genpact is an Equal Opportunity Employer and considers applicants for all positions without regard to race, color, religion or belief, sex, age, national origin, citizenship status, marital status, military/veteran status, genetic information, sexual orientation, gender identity, physical or mental disability or any other characteristic protected by applicable laws. Genpact is committed to creating a dynamic work environment that values respect and integrity, customer focus, and innovation. Get to know us at genpact.com and on LinkedIn, X, YouTube, and Facebook. Furthermore, please do note that Genpact does not charge fees to process job applications and applicants are not required to pay to participate in our hiring process in any other way. Examples of such scams include purchasing a 'starter kit,' paying to apply, or purchasing equipment or training.
Posted 1 month ago
3.0 - 7.0 years
13 - 18 Lacs
Bengaluru
Work from Office
Project description We've been engaged by a large Australian financial institution to provide resources to manage the production support activities along with their existing team in Sydney & India. Responsibilities Carry out enhancements to maintenance/housekeeping scripts as required and monitor the DB growth periodically. Handles cloud Environment preparation, refresh, rebuild, upkeep, maintenance, and upgrade activities. Ensure cloud cost optimisation. Troubleshooting of Murex environment-specific issues including Infrastructure related issues and update pipelines for a permanent fix. Handling EOD execution and troubleshooting of issues related to it. Participate in analysis, solutioning, and deployment of solution for production issues during EoD. Participate in the release activity and coordinate with QA/Release teams. Participate in AWS stack deployment, AWS AMI patching, and stack configuration to ensure optimal performance and cost-efficiency. Address requests like warehouse rebuild, maintenance, Perform Health/sanity checks, create XVA engine, environment restores & backup in AWS as per project need. Perform Weekend maintenance and perform health checks in the production environment during the weekend. Support working in shifts (max end time will be 12.30 AM IST) and available for weekend & on-call support. Have to work out of client location on a need basis. Flexible to work in a Hybrid model. 
Skills Must have 4 to 8 Years of experience in Murex Production Support Murex End of Day support Troubleshooting batch-related issues, including date moves and processing adjustments Murex Env Management & Troubleshooting Experienced in SQL Unix shell scripting, Monitoring tools, Web development Experienced in the Release and CI/CD process Linux/Unix server and Oracle RDS knowledge Working experience with automation/job scheduling tools such as Autosys, GitHub Actions Working experience with monitoring tools like Grafana, Splunk, Obstack, PagerDuty Good communication and organization skills working within a DevOps team supporting a wider IT delivery team Nice to have PL/SQL, Scripting languages (Python) Advanced troubleshooting experience with Shell scripting and Python Experience with CICD tools like Git, flows, Ansible, and AWS including CDK Exposure to AWS Cloud environment Willing to learn and obtain AWS certification Other Languages English: C1 Advanced Seniority Regular
Posted 1 month ago
3.0 - 5.0 years
9 - 11 Lacs
Bengaluru
Hybrid
Dear Professional, We are excited to present a unique opportunity at Cognizant, a leading IT firm renowned for fostering growth and innovation. We are seeking talented professionals with 3 to 5 years of experience in Major Incident Management, Critical Incident Handling, Incident Response, ITIL Incident Management, Root Cause Analysis, Incident Escalation, Service Restoration, War Room Coordination, ServiceNow, BMC Remedy, Jira Service Management, PagerDuty, ISO 20000, COBIT, Major Incident Manager, Incident Response Lead to join our dynamic team. Your expertise in these areas is highly sought after, and we believe your contributions will be instrumental in driving our projects to new heights. We offer a collaborative environment where your skills will be valued and nurtured. To proceed to the next step of the recruitment process, please provide us with the following details with Updated resume to sathish.kumarmr@cognizant.com Please share below details (Mandatory): Full Name (As per Pan card): Contact number: Email Current Location: Interested Locations: Total Years of experience: Relevant years of experience: Current company: Notice period: NP negotiable: if yes how many days they can negotiate?: If you are Serving any Notice period Means please mention Last date of Working: Current CTC- Expected CTC- Availability for interview on Weekdays? Highest Qualification? Additionally, we would like to schedule a virtual interview with you on 26th June 2025. Kindly confirm your availability for the same. We look forward to the possibility of you bringing your valuable experience to Cognizant. Please respond at your earliest convenience. Thanks & Regards, Sathish Kumar M R HR-Cognizant Sathish.KumarMR@cognizant.com
Posted 1 month ago
8.0 - 12.0 years
0 Lacs
Hyderabad, Telangana, India
On-site
About Zeta

Zeta is a Next-Gen Banking Tech company that empowers banks and fintechs to launch banking products for the future. It was co-founded by Ramki Gaddipati in 2015. Our flagship processing platform, Zeta Tachyon, is the industry's first modern, cloud-native, and fully API-enabled stack that brings together issuance, processing, lending, core banking, fraud & risk, and many more capabilities as a single-vendor stack. 20M+ cards have been issued on our platform globally. Zeta is actively working with the largest banks and fintechs in multiple global markets, transforming customer experience for multi-million card portfolios. Zeta has over 1,700 employees, with over 70% of roles in R&D, across locations in the US, EMEA, and Asia. We raised $280 million at a $1.5 billion valuation from Softbank, Mastercard, and other investors in 2021.

The Site Delivery Manager is responsible for end-to-end service delivery and operational excellence for a specific site. This role ensures the stability, performance, and continuous improvement of IT services, while managing key performance indicators (KPIs), incident and change management, cost governance, and customer satisfaction. The individual will serve as the primary liaison between business stakeholders, SRE/infra teams, and other technology units to drive operational maturity and service reliability.
Responsibilities:

Service Delivery & Operations Management
• Own and manage site-level SLAs for incidents, problems, and changes
• Ensure adherence to MTTA (Mean Time to Acknowledge) and MTTR (Mean Time to Resolve) metrics for alerts and incidents
• Oversee the incident lifecycle and ensure timely Root Cause Analysis (RCA)
• Track problem ticket aging and drive problem resolution
• Manage service delivery reviews, post-incident reviews, and escalations

Change Management
• Lead the Change Advisory Board (CAB) process at the site level
• Review and approve changes; ensure minimal service disruption during deployments
• Validate and document post-deployment summaries and outcomes

Monitoring & Governance
• Oversee handover of SaaS product monitoring responsibilities to the Zeta Command Center (ZCC)
• Monitor alerts, dashboards, and performance trends to proactively prevent incidents
• Maintain a high security posture by coordinating with InfoSec and Compliance teams

Customer and Stakeholder Engagement
• Act as the primary point of contact for internal and external stakeholders at the site
• Own customer-facing RCA communication and service quality improvements
• Facilitate cross-functional collaboration across product, SRE, infrastructure, and customer teams

Cost & Resource Management
• Own and manage the site's technology budget; ensure cost adherence
• Conduct monthly/quarterly cost anomaly analysis and optimizations
• Work with the platform and finance teams on infrastructure/resource planning

People & Process
• Drive process improvements and operational maturity
• Foster a culture of accountability, resilience, and continuous improvement

Skills:
• Strong operational and delivery management
• Excellent communication, stakeholder, and conflict-resolution skills
• Data-driven decision-making and analytical thinking
• Budgeting, cost analysis, and resource planning
• Familiarity with cloud platforms (AWS)

Experience & Qualifications:
• Bachelor's degree in Computer Science, Engineering, or a related field (master's preferred)
• 8-12 years of experience in IT Service Management, SRE, or infrastructure operations
• Strong understanding of the ITIL framework, site reliability principles, and cloud operations
• Experience with monitoring tools (e.g., Datadog, Prometheus, Grafana), incident platforms (e.g., OpsGenie/PagerDuty, Jira Service Management/ServiceNow), and change management tools
• Proven leadership skills in managing cross-functional teams and engaging with senior stakeholders
Posted 1 month ago
5.0 - 7.0 years
25 - 40 Lacs
Pune
Work from Office
Our world is transforming, and PTC is leading the way. Our software brings the physical and digital worlds together, enabling companies to improve operations, create better products, and empower people in all aspects of their business. Our people make all the difference in our success. Today, we are a global team of nearly 7,000 and our main objective is to create opportunities for our team members to explore, learn, and grow, all while seeing their ideas come to life and celebrating the differences that make us who we are and the work we do possible.

Job Details
As a senior SRE / Observability Engineer, you will be part of the Atlas Platform Engineering team and will:
• Create and maintain observability standards and best practices.
• Review the current observability platform, identify areas for improvement, and guide the team in enhancing monitoring, logging, tracing, and alerting capabilities.
• Expand the observability stack across multiple clouds, regions, and clusters, managing all observability data.
• Design and implement monitoring solutions for complex distributed systems that provide deep insights into systems and services, aiming at complete visibility of digital operations.
• Support the ongoing evaluation of new capabilities in the observability stack, conducting proofs of concept, pilots, and tests to validate their suitability.
• Assist teams in creating clear, informative, and actionable dashboards to improve system visibility.
• Automate monitoring and alerting processes, including enrichment strategies and ML-driven anomaly detection where applicable.
• Provide technical leadership to the observability team, with clear priorities, ensuring agreed outcomes are achieved in a timely manner.
• Work closely with R&D and product development teams to understand their requirements and challenges and to ensure seamless visibility into system and service performance.
• Work closely with the Traffic Management team to identify and standardise on existing and new observability tools as part of a holistic solution.
• Conduct training sessions and create documentation for internal teams.
• Support the definition of SLIs (service level indicators) and SLOs (service level objectives) for the Atlas services.
• Keep track of the error budget of each service.
• Participate in the emergency response process.
• Conduct RCAs (root cause analyses).
• Help to automate repetitive tasks and reduce toil.

Qualifications:

People and communication qualifications
• Be a strong team player
• Have good collaboration and communication skills
• Ability to translate technical concepts for non-technical audiences
• Problem-solving and analytical thinking

Technical qualifications - general
• Familiarity with cloud platforms (ideally Azure)
• Familiarity with Kubernetes and Istio as the architecture on which the observability and Atlas services run, and how they integrate and scale
• Experience with infrastructure as code and automation
• Knowledge of common programming languages and debugging techniques
• A strong technical background and a hands-on approach
• Linux and scripting languages (Bash, Python, Golang)
• Solid understanding of DevOps principles

Technical qualifications - observability
• Strong understanding of observability principles (metrics, logs, traces)
• Experience with APM tools and distributed tracing
• Proficiency in log aggregation and analysis
• Knowledge of and hands-on experience with monitoring, logging, and tracing tools such as Prometheus, Grafana, Datadog, New Relic, Sumo Logic, ELK Stack, or others
• Knowledge of OpenTelemetry, including the OTel Collector and code instrumentation
• Experience designing and building unified observability platforms that enable teams to use data (metrics, logs, and traces) to determine quickly whether their application or service is operating as desired

Technical qualifications - SRE
• Understanding of the Google SRE principles
• Experience in defining SLIs and SLOs
• Experience in performing RCAs (root cause analyses)
• Experience in system performance
• Experience in incident response
• Knowledge of status tools, such as Atlassian Status Page or similar
• Knowledge of incident management and paging tools, such as PagerDuty or similar
• Knowledge of ITIL (Information Technology Infrastructure Library) processes

Life at PTC is about more than working with today’s most cutting-edge technologies to transform the physical world. It’s about showing up as you are and working alongside some of today’s most talented industry leaders to transform the world around you. If you share our passion for problem-solving through innovation, you’ll likely become just as passionate about the PTC experience as we are. Are you ready to explore your next career move with us?

We respect the privacy rights of individuals and are committed to handling Personal Information responsibly and in accordance with all applicable privacy and data protection laws. Review our Privacy Policy here.
Posted 1 month ago