Manager - Tech Ops Engineering

8 - 13 years

30 - 35 Lacs

Posted: 3 weeks ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

You Lead the Way. We've Got Your Back.
Join Team Amex and let's lead the way together.
We're looking for Site Reliability/Application Support/Run Time Engineers (SRE/AS) responsible for web/servicing application performance, availability, and reliability. The candidate is responsible for providing consultation and strategic recommendations by quickly assessing and remediating complex platform availability issues. Site Reliability Engineering (SRE) is a continuous engineering discipline that effectively combines software development and systems engineering to build and run scalable, distributed, fault-tolerant systems. This role will ensure that American Express internal and external services have reliability and uptime appropriate to users' needs. We also ensure continuous improvement while keeping an ever-watchful, automated eye on capacity and performance.
This role will drive the SRE/AS mindset, which strives to use software engineering to build and run better production systems. You will write software to optimize day-to-day work through better automation, monitoring, alerting, testing, and deployment. You'll be expected to work with several Technology partners to identify areas of opportunity within the availability platform and build solutions that automate monitoring for the modernization platform and technology, with constant innovation to drive efficiencies. You will be responsible for implementing tracing, monitoring, and tooling solutions to maximize the performance and availability of our Web/Servicing applications.
The Senior Service Assurance Engineer II role is a hands-on, Senior Architect-level position supporting American Express Run Time Engineering and Application Support, part of the Site Reliability Engineering organization.

What you will be doing:

  • Research the latest technologies and concepts, conceptualize solutions, and develop proofs of concept that improve the resiliency and performance of the production infrastructure
  • Design and implement innovative solutions and frameworks that improve software engineering velocity, infrastructure resiliency and security, and data availability
  • Develop common framework components (to be leveraged by enterprise applications) and define standards for configuration, monitoring, reliability, and performance engineering
  • Work with the operations team to resolve major incidents
  • Continuously improve automated remediation tasks to ensure the highest levels of availability

Qualifications:

  • BS or MS degree in computer science, computer engineering, or other technical discipline, or equivalent 8+ years of work experience in DevOps/SRE (Mainframe applications)
  • Development or support of Mainframe applications
  • Good understanding of automation implementations related to observability, reliability, and self-service
  • Hands-on experience with JCL, COBOL, IMS DB, IMS DC, DB2, and CICS
  • Working knowledge of version control tools such as ISPW and ChangeMan
  • Experience in designing mission critical highly available enterprise applications
  • Hands-on experience with performance testing and JCL, COBOL, and DB2 code tuning
  • Experience managing relational and NoSQL databases such as DB2, Postgres, Mongo, Couchbase, Cassandra etc.
  • Strong knowledge of Linux internals and experience managing Linux systems in high traffic environments
  • Strong interpersonal communication skills and the ability to work well in a diverse team-focused environment
  • Exposure to scheduling tools (Control-M or similar), IBM mainframe utilities, SORT utilities, File-AID, Abend-AID, SPUFI, QMF, and mainframe performance products (BMC's Apptune or similar) is preferred
  • Solid programming and scripting skills, with hands-on experience automating operational tasks using tools such as Python, Easytrieve, and REXX
  • Good understanding of cloud technologies such as Kubernetes, OpenShift, and Docker
  • Knowledge of public cloud technologies such as GCP, AWS, and Azure would be an advantage
  • Monitoring and analyzing PMI data
  • Hands-on experience with enterprise tool sets such as Grafana, Dynatrace, AppDynamics, BMC, and Prometheus
  • Understanding of using Agile Practices in Operations teams
  • Working experience with Network load balancers, Global Traffic Managers (GTMs), Local Traffic Managers (LTMs)
  • Hands-on experience configuring Splunk and Grafana dashboards, ElastAlert alerts, etc.
  • Working experience with network rule creation, load balancer configuration, and network packet analysis
  • On-call / 24x7 support required
  • Analytical skills and exposure to root cause identification using analysis tools such as IBM Support Assistant and Splunk
  • Certificate Management automation - Message signing, SSL, etc.

AMERICAN EXPRESS

Financial Services

New York NY
