Mainframe-SRE-z/OS

10 - 17 years

22 - 30 Lacs

Posted: 3 weeks ago | Platform: Naukri

Work Mode

Hybrid

Job Type

Full Time

Job Description

Role & responsibilities

Job Title: Mainframe Site Reliability Engineer (SRE)
Location: Pune/Hyderabad
Employment Type: Full-Time

---

About the Role

We are seeking a visionary Mainframe Site Reliability Engineer (SRE) to redefine the reliability, automation, and efficiency of our mission-critical z/OS systems. This role combines deep mainframe expertise with cutting-edge SRE practices, focusing on innovations in observability, AI-driven operations, and DevOps integration to transform legacy workflows into modern, self-healing systems. You will drive initiatives to eliminate manual toil, optimize performance, and ensure the platform's resilience aligns with business-critical service level objectives (SLOs).

---

Key Responsibilities

1. SRE-Centric Innovation & Automation

- Automation Engineering:
  - Design and deploy Infrastructure-as-Code (IaC) solutions using Ansible, Zowe CLI, and z/OSMF workflows to automate system provisioning, configuration management, and recovery processes.
  - Develop self-healing workflows for critical subsystems (CICS, Db2, IMS) to auto-resolve incidents like JVM failures or transaction bottlenecks.
  - Convert legacy operational scripts (REXX, NCL) into modern, version-controlled pipelines integrated with Git and CI/CD tools like Jenkins.
- AI-Driven Observability:
  - Implement predictive analytics tools (e.g., IBM Watson AIOps, Splunk ITSI) to detect anomalies in system metrics, logs, and message queues.
  - Build dashboards using Grafana or Prometheus to visualize the Four Golden Signals (latency, traffic, errors, saturation) across mainframe workloads.
  - Centralize alert management to reduce noise and prioritize actionable alerts using AI-driven correlation.

2. DevOps Integration & Modernization

- CI/CD for Mainframe:
  - Streamline software delivery pipelines for COBOL/PL/I applications using IBM Dependency-Based Build (DBB) and UrbanCode Deploy (UCD).
  - Integrate mainframe SDLC processes with enterprise Git repositories (GitHub, GitLab) to enable collaborative development and audit trails.
  - Enable automated testing and phased rollouts for z/OS middleware updates.
- Performance & Capacity Engineering:
  - Optimize CPU/MIPS utilization through runtime tuning (e.g., CICS Threadsafe, AT-TLS offloading) to reduce software licensing costs.
  - Forecast capacity demands using historical SMF/RMF data and propose dynamic hardware scaling strategies.
  - Conduct load testing for batch and OLTP workloads to validate system limits and error budgets.

3. Incident Management & Reliability

- Lead blameless postmortems for critical incidents, focusing on root cause analysis (RCA) and preventive actions (e.g., monitoring gaps, automation fixes).
- Reduce MTTR by implementing automated incident response playbooks (e.g., auto-restart failed subsystems, reroute traffic).
- Maintain 24/7 operational readiness through on-call rotations and cross-training in z/OS, CICS, Db2, and storage management.

4. Platform Hardening & Knowledge Sharing

- Enforce security best practices (RACF, TLS) and vulnerability remediation for z/OS and middleware.
- Develop reusable workbooks and runbooks to document system configurations, troubleshooting steps, and automation workflows.
- Mentor teams on SRE principles, fostering a T-shaped skill model (deep mainframe + DevOps/Agile practices).

5. Batch Optimization & Resource Management

- Design dynamic resource allocation strategies (e.g., WLM policies, enclaves) to prioritize critical batch jobs and minimize contention for CPU, memory, and I/O resources.
- Implement parallel processing (e.g., multi-task JCL, SYSAFF routing) to reduce runtime and avoid bottlenecks in long-running batch cycles.
- Streamline job dependencies using graph-based scheduling tools (e.g., IWS, CA7, Control-M) to eliminate idle wait times between interdependent jobs.

6. Proactive Batch Health Monitoring

- Develop automated checks for batch job SLAs, including real-time alerts for delays, resource starvation, or dataset contention.
- Integrate predictive analytics (e.g., historical SMF data analysis) to forecast and mitigate delays caused by seasonal peaks or data volume spikes.

---

Required Skills

- Technical Expertise:
  - xx+ years in z/OS system programming, performance tuning, or infrastructure support.
  - Proficiency in JCL, REXX, Python, and mainframe automation tools (IBM Z System Automation, Broadcom OPS/MVS).
  - Hands-on experience with Zowe, Ansible, Git, and CI/CD pipelines.
  - Mastery of SRE tenets: SLOs/SLIs, error budgets, and Infrastructure-as-Code (IaC).
- Innovation Focus:
  - Proven track record in implementing AI/ML-driven monitoring or auto-remediation for mainframe environments.
  - Experience modernizing legacy workflows (e.g., replacing CA Endevor with Git-based SDLC).
- Soft Skills:
  - Ability to lead cross-functional teams during high-severity incidents.
  - Strong communication to align technical execution with business objectives.
- Education:
  - Bachelor’s degree in Computer Science, Engineering, or related field.

---

Preferred Qualifications

- Experience with AI-driven automation platforms (e.g., AMELIA AIOps) to standardize and migrate legacy workflows, integrate with event management systems (e.g., BigPanda), and orchestrate ITIL processes (Incident, Change) via ServiceNow.
- Certifications: IBM z/OS System Programming, Broadcom Mainframe SRE, or HashiCorp Terraform.
- Familiarity with Zowe Desktop for modern IDE-driven development or Dynatrace APM for CICS/Db2 monitoring.
- Knowledge of mainframe open-source ecosystems (Zowe, Feilong) or hybrid-cloud integrations.

Cognizant

IT Services and IT Consulting

Teaneck New Jersey

10001 Employees

2577 Jobs

Key People

  • Brian Humphries, CEO
  • Gina Schaefer, CFO
