Posted: 1 month ago
Work from Office
Full Time
At American Express, we know that with the right backing, people and businesses have the power to progress in incredible ways. Whether we're supporting our customers' financial confidence to move ahead, taking commerce to new heights, or encouraging people to explore the world, our colleagues are constantly redefining what's possible - and we're proud to back each other every step of the way. When you join #TeamAmex, you become part of a diverse community of over 60,000 colleagues, all with a common goal to deliver an exceptional customer experience every day.

We're looking for a Site Reliability/Application Support/Run Time Engineer (SRE/AS) responsible for web/servicing application performance, availability, and reliability. The candidate is responsible for providing consultation and strategic recommendations by quickly assessing and remediating complex platform availability issues.

Site Reliability Engineering (SRE) is a continuous engineering discipline that combines software development and systems engineering to build and run scalable, distributed, fault-tolerant systems. This role will ensure that American Express internal and external services have reliability and uptime appropriate to users' needs. We also ensure continuous improvement while keeping an ever-watchful, automated eye on capacity and performance.

This role will drive the SRE/AS mindset, which strives to use software engineering to build and run better production systems. You will write software to optimize day-to-day work through better automation, monitoring, alerting, testing, and deployment. You'll be expected to work with several Technology partners to identify areas of opportunity within the availability platform and build solutions that automate monitoring for the modernization platform and technology, with constant innovation to drive efficiencies.
You will be responsible for implementing tracing, monitoring, and tooling solutions to maximize the performance and availability of our Web/Servicing applications. The Senior Service Assurance Engineer II role is a hands-on Senior Architect Level position supporting American Express Run Time Engineering and Application Support, part of the Site Reliability Engineering organization.

What you will be doing:
- Research the latest technologies and concepts, conceptualize solutions, and develop proofs of concept that will improve the resiliency and performance of the production infrastructure
- Design and implement innovative solutions and frameworks that will improve software engineering velocity, infrastructure resiliency and security, and data availability
- Develop common framework components (to be leveraged by enterprise applications) and define standards for configuration, monitoring, reliability, and performance engineering
- Work with Technology teams to resolve major incidents
- Continuously improve automated remediation tasks to ensure the highest levels of availability

Qualifications:
- BS or MS degree in computer science, computer engineering, or other technical discipline, or equivalent
- 8+ years of work experience in DevOps/SRE (web applications)
- Development or support of Java/J2EE/React JS applications and Node applications
- Good understanding of automation implementations related to observability, reliability, and self-servicing
- Hands-on experience with frameworks such as Spring Boot, Vert.x, and NodeJS
- Experience in designing mission-critical, highly available enterprise applications
- Hands-on experience with performance testing and Java application tuning
- Experience managing relational and NoSQL databases such as DB2, Postgres, Mongo, Couchbase, Cassandra, etc.
- Strong knowledge of Linux internals and experience managing Linux systems in high-traffic environments
- Strong interpersonal communication skills and the ability to work well in a diverse, team-focused environment
- Experience with Splunk and/or ELK
- Good understanding of cloud technologies such as Kubernetes, OpenShift, Docker, etc.
- Good understanding of GraphQL queries and resolvers
- Knowledge of public cloud technologies (GCP, AWS, Azure, etc.) would be an advantage
- Monitoring and analyzing PMI data
- Hands-on experience with enterprise tool sets such as Grafana, Dynatrace, AppDynamics, BMC, Prometheus, etc.
- Understanding of using Agile practices in Operations teams
- Experience in handling DDoS/bot attacks and different security remediations
- Working experience with network load balancers, Global Traffic Managers (GTMs), and Local Traffic Managers (LTMs)
- Hands-on experience configuring Splunk, Grafana dashboards, ElastAlert, etc.
- Working experience with network rule creation, load balancer configuration, and network packet analysis
- On-call / 24x7 support required
- Analytical knowledge of and exposure to root cause identification using analyzer tools such as IBM Support Assistant, Splunk, etc.
- Certificate management automation (message signing, SSL, etc.)

Benefits include:
- Competitive base salaries
- Bonus incentives
- Support for financial well-being and retirement
- Comprehensive medical, dental, vision, life insurance, and disability benefits (depending on location)
- Flexible working model with hybrid, onsite, or virtual arrangements depending on role and business need
- Generous paid parental leave policies (depending on your location)
- Free access to global on-site wellness centers staffed with nurses and doctors (depending on location)
- Free and confidential counseling support through our Healthy Minds program
- Career development and training opportunities