Site Reliability Engineer UNIX

5 - 9 years

0 Lacs

Posted: 3 days ago | Platform: Shine


Work Mode

On-site

Job Type

Full Time

Job Description

Role Overview:
You are a highly experienced Site Reliability Engineer (SRE) joining the technology team in a mission-critical financial environment. Your expertise in building and operating reliable, scalable systems in regulated industries like banking or financial services will be crucial for this role.

Key Responsibilities:
- Design, implement, and maintain highly available and fault-tolerant systems in a financial environment.
- Define and monitor Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) to ensure system reliability and customer satisfaction.
- Identify, measure, and reduce TOIL by eliminating repetitive manual tasks through automation.
- Lead incident response, post-mortems, and root cause analysis for production issues.
- Collaborate with development teams to embed reliability into the software development lifecycle.
- Integrate with observability platforms (e.g., Prometheus, Grafana, ELK, Datadog) for end-to-end visibility of systems and services.

Qualifications Required:
- Proven expertise in Site Reliability Engineering, with a background in software engineering, infrastructure, or operations.
- Hands-on experience with cloud platforms (e.g., Azure), operating systems (e.g., Linux RHEL7+), and networking fundamentals.
- Strong understanding of networking and storage technologies (e.g., NFS, SAN, NAS) and authentication services (e.g., DNS, LDAP, Kerberos, Centrify).
- Proficiency in scripting and automation (e.g., Python, Go, Bash), infrastructure as code tools (e.g., Terraform, Ansible), and defining/managing SLIs, SLOs, SLAs.
- Ability to integrate with observability platforms for system visibility, metrics- and automation-driven mindset, and focus on measurable reliability.
- Strong collaboration and communication skills, working across engineering and business teams.
Additional Details:
You will be part of the Operating Systems and Middleware (OSM) crew, working in a collaborative Agile environment in Pune/Hyderabad. The team values transparency, shared responsibility, and continuous learning, empowering engineers to take ownership and continuously improve systems.

About the Company:
UBS is the world's largest and the only truly global wealth manager, operating through four business divisions: Global Wealth Management, Personal & Corporate Banking, Asset Management, and the Investment Bank. With a presence in over 50 countries, UBS is known for its global reach and expertise in financial services.

Joining UBS:
UBS offers flexible working arrangements like part-time, job-sharing, and hybrid working options. The company's purpose-led culture and global infrastructure enable collaboration and agile ways of working to meet business needs. UBS values diversity, unique backgrounds, skills, and experiences within its workforce, empowering individuals to drive ongoing success together.
