Posted: 1 day ago | Platform: LinkedIn


Work Mode

On-site

Job Type

Full Time

Job Description

Position Overview

We are looking for a motivated Backend Systems Engineer to join our team. Candidates should have a strong background in computer science and software development. Knowledge of quantum computing is a plus.


Key Responsibilities

• Design and develop robust, scalable, and secure backend services using Python (FastAPI/Django/Flask) and async frameworks.

• Implement RESTful APIs following best practices for OpenAPI/Swagger documentation, versioning, validation (Pydantic), and error handling.

• Architect and manage job orchestration systems using Celery with RabbitMQ/Redis, and integrate them with various compute backends.

• Build and maintain authentication and authorization mechanisms using JWT, OAuth 2.0, and Auth0, and enforce secure coding practices (CORS, HTTPS, CSRF protection, RBAC).

• Optimize and manage storage systems across relational (PostgreSQL), NoSQL (MongoDB), caching (Redis), and object storage (S3/MinIO) layers.

• Implement monitoring, logging, and observability pipelines using Prometheus, Grafana, Loki/ELK, and Flower for Celery task tracking.

• Write comprehensive tests and enforce code quality via pytest, pre-commit hooks, and integration testing tools.

• Maintain internal documentation (Swagger, MkDocs) and collaborate with the team via GitHub, Slack, Notion, and task trackers.

• Configure the Slurm Workload Manager: define partitions and QOS policies, and ensure fair-share scheduling with compute resource limits (CPU, memory, GPU).

• Provision users, groups, and storage layouts with per-team isolation, Linux ACLs, shared scratch space, and persistent directories.

• Manage environment module systems (Lmod) and multi-version software stacks (CUDA, GROMACS, Qiskit, Conda environments).

• Implement robust monitoring, alerting, and audit pipelines using Prometheus, Grafana, node_exporter, dcgm-exporter, Slurm exporters, and container monitors (cAdvisor).

• Enforce access control and secure job execution via restricted shells (rbash), SSH key-only login, and centralized logging (journald/auditd).

• Automate nightly backups and disaster recovery using NAS storage with rsync and system image snapshots, and maintain documented recovery playbooks.

• Author and maintain server provisioning scripts, onboarding SOPs, and a centralized repository of job templates, modulefiles, and update logs.


Qualifications

Required: 

• 3+ years of experience in full-stack/systems development and deployment.

• Strong hands-on experience with FastAPI and RESTful API design.

• Deep understanding of asynchronous programming, ORMs (SQLAlchemy/Tortoise), and ASGI/WSGI servers.

• Proven experience with Celery and task queues for async job orchestration.

• Solid understanding of authentication/security best practices and Auth0 or similar platforms.

• Experience with PostgreSQL, Redis, and object storage solutions like AWS S3.

• Proficiency with Docker environments.

• Familiarity with automated testing tools (pytest, Postman) and observability tools (Grafana, Prometheus).

• Strong experience with Linux system administration (Ubuntu/Debian preferred) in high-performance computing (HPC) or research lab settings.

• Hands-on experience setting up and managing Slurm, cgroups, GRES/GPU configurations, and job execution templates.

• Comfortable configuring and debugging GPU drivers, CUDA toolkits, and tools such as nvidia-container-toolkit, nvidia-smi, and MPS (Multi-Process Service).

• Experience setting up and maintaining Prometheus, Grafana, and exporters for system, GPU, and Slurm metrics.

• Proficient with shell scripting (bash) and basic configuration automation tools (e.g., Ansible or cloud-init).

• Understanding of container runtimes (Docker, Singularity) in HPC-like workflows.

Preferred:

• Exposure to quantum computing SDKs (Qiskit, Braket, Cirq, PennyLane) or Slurm/HPC job schedulers.

• Experience deploying to on-prem/HPC environments over SSH or VPN.

• Knowledge of load testing tools (Locust, k6) and log aggregation pipelines (Loki, ELK).

• Background in real-time monitoring dashboards, admin tools, or user activity audit trails.

• Experience architecting on-prem/hybrid deployments integrating quantum SDKs (Qiskit, PennyLane, Cirq) with classical AI/ML workloads.

• Experience configuring NAS, backup automation (cron + rsync), and snapshot-based recovery (Clonezilla, Timeshift).

• Knowledge of environment management systems like Lmod, Spack, or Conda in HPC/scientific computing environments.

• Experience enforcing security hardening policies: SSH key-only authentication, restricted shells, user quotas, and RBAC patterns in Linux.


What We Offer

• Opportunity to work with cutting-edge quantum technologies.

• Innovative and collaborative startup environment.

• Competitive salary and benefits.

• Professional development and collaboration opportunities.
