Platform Reliability Engineer

Experience: 2 - 4 years

Salary: 2 - 5 Lacs

Posted: 3 hours ago | Platform: Foundit

Skills Required

AI/ML

Work Mode

On-site

Job Type

Full Time

Job Description

Key Responsibilities:

  • Ensure the production reliability of the firm's Linux-based research and trading platform as part of a globally distributed engineering team.
  • Provide rapid emergency response to production infrastructure issues.
  • Proactively understand internal clients' needs and effectively communicate them to leadership at both regional and global levels.
  • Identify risks, develop contingency plans, and implement solutions to mitigate them.
  • Develop and enhance the observability platform to monitor the performance and health of critical computing environments.
  • Participate in occasional (monthly) on-call rotations and support on-call staff during their shifts.
  • Contribute to organizational knowledge through documentation, education, and writing maintainable code.

Qualifications/Skills:

  • 2+ years of experience in SRE, DevOps, or other infrastructure engineering roles, preferably within the financial industry.
  • Strong understanding of Linux system internals, including kernel operations, memory management, and performance optimization.
  • In-depth knowledge of storage technologies, particularly those used in high-performance computing (GPFS experience is a plus).
  • Broad understanding of IT infrastructure components, such as networking, DNS, NTP/PTP, and NIS.
  • Proficiency in system automation, monitoring, and self-healing (experience with Salt is a plus).
  • Experience with container orchestration and virtualization technologies (e.g., Kubernetes, Nomad, VMware).
  • Familiarity with on-premises and cloud-based HPC infrastructure (operational knowledge of Slurm and GPUs is a plus).
  • Understanding of AI technologies and their applications in infrastructure automation and management. Experience with or a strong interest in implementing AI/ML solutions for infrastructure optimization, anomaly detection, or predictive analytics.
  • A passion for technology and automation, with a deep sense of curiosity and ownership.
  • A hands-on approach to problem-solving and a demonstrable enthusiasm for technology.
  • Excellent verbal and written communication skills.
