System Administrator

3 years

0 Lacs

Posted: 4 hours ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Key Responsibilities

 

Infrastructure & Systems Management

  1. Manage Linux-based servers, GPU clusters, and network storage for AI training and inference workloads. 
  2. Configure and maintain message queue systems (RabbitMQ, ActiveMQ, Kafka) for large-scale, asynchronous AI pipeline execution.
  3. Set up and maintain service beacons and health checks to proactively monitor the state of critical services (XNAT pipelines, FastAPI endpoints, AI model inference servers).
  4. Maintain PACS integration, DICOM routing, and high-throughput data transfer for medical imaging workflows. 
  5. Manage hybrid infrastructure (on-prem + cloud) including auto-scaling compute for large training tasks. 
  6. Implement monitoring and alerting systems for infrastructure uptime, resource utilization, and failures. 

Service Monitoring & Reliability

  1. Implement automated service checking for all production and development services using Prometheus, Grafana, or similar tools.
  2. Configure beacon agents to trigger alerts and self-healing scripts for service restarts when anomalies are detected.
  3. Set up log aggregation and anomaly detection to catch failures in AI processing pipelines early. 
  4. Ensure 99.9% uptime for mission-critical systems and clinical services. 

Security & Compliance

  1. Enforce secure access control (IAM, VPN, RBAC, MFA) and maintain audit trails for all system activities. 
  2. Ensure compliance with HIPAA, GDPR, and ISO 27001 for medical data storage and transfer.
  3. Encrypt medical imaging data (DICOM/NIfTI) at rest and in transit. 

Automation & DevOps

  1. Develop automation scripts for service restarts, scaling GPU resources, and pipeline deployments. 
  2. Work with DevOps teams to integrate infrastructure monitoring with CI/CD pipelines. 
  3. Optimize AI pipeline orchestration with MQ-based task handling for scalable performance. 

Backup, Disaster Recovery & High Availability

  1. Manage data backup policies for medical datasets, AI model artifacts, and PostgreSQL/MongoDB databases. 
  2. Implement failover systems for MQ brokers and imaging data services to ensure uninterrupted AI processing. 

Collaboration & Support

  1. Work closely with AI engineers and data scientists to optimize compute resource utilization. 
  2. Support teams in troubleshooting infrastructure and service issues. 
  3. Maintain license servers and specialized imaging software environments. 

 

Skills and Qualifications

  

Required:

  1. 3+ years of Linux systems administration experience with a focus on service monitoring and high-availability environments.
  2. Experience with message queues (RabbitMQ, ActiveMQ, Kafka) for distributed AI workloads.
  3. Familiarity with beacons, service health monitoring, and self-healing automation.
  4. Hands-on experience managing GPU clusters and GPU-enabled servers (NVIDIA CUDA, drivers, dockerized AI workflows).
  5. Hands-on experience with cloud platforms (AWS, GCP, Azure), including AWS EC2, S3, EKS or equivalents.
  6. Networking fundamentals (firewalls, VPNs, load balancers).
  7. Experience managing large datasets (100 GB to TB scale), preferably in healthcare or scientific research.
  8. Knowledge of cybersecurity best practices and compliance frameworks (HIPAA, ISO 27001).

Preferred:

  1. Experience with PACS, XNAT, or medical imaging servers. 
  2. Familiarity with Prometheus, Grafana, ELK stack, SaltStack beacons, or similar monitoring tools.
  3. Knowledge of Kubernetes or Docker Swarm for container orchestration. 
  4. Scripting skills (Bash, Python, PowerShell) for task automation and troubleshooting.
  5. Exposure to administration of the databases (PostgreSQL, MongoDB) used in AI pipelines.

Education:

Experience: 3-5 Years
