Containerized Jobs in HPC using Slurm and Pyxis

Slurm (Simple Linux Utility for Resource Management) is a widely used, open-source cluster management and job scheduling system for large and small Linux clusters.

Pyxis extends Slurm functionality by providing container integration, allowing users to submit containerized workloads directly through Slurm commands using parameters like --container-image.

Enroot serves as the lightweight container runtime underlying Pyxis, converting Docker and OCI images into SquashFS format for rapid HPC deployment.
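The conversion Enroot performs can also be run by hand, which is useful for pre-pulling and caching images on shared storage; a sketch (the image tag and file path are illustrative):

# Pull a Docker Hub image and convert it to a SquashFS file
# (produces ubuntu+22.04.sqsh in the current directory)
$ enroot import docker://ubuntu:22.04

# Later jobs can point Pyxis at the local file instead of the registry
$ srun --container-image=./ubuntu+22.04.sqsh hostname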

Benefits of Containerized HPC Jobs

Installation Process

Prerequisites

Pyxis Setup

Installation steps on worker nodes:

$ apt update
$ apt install -y devscripts debhelper git build-essential fakeroot

# Optional: provides the SPANK plugin header files
$ apt install libslurm-dev

Clone and compile from source:

$ git clone https://github.com/NVIDIA/pyxis
$ cd pyxis
$ make install

Create a symlink so Slurm registers the SPANK plugin (plugstack.conf must include the plugstack.conf.d directory):

$ mkdir -p /etc/slurm/plugstack.conf.d
$ ln -s /usr/local/share/pyxis/pyxis.conf /etc/slurm/plugstack.conf.d/pyxis.conf

Restart slurmd daemon on all worker nodes:

$ systemctl restart slurmd
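After restarting slurmd, confirm that the plugin was registered; when registration succeeds, Pyxis adds its --container-* options to srun, so a quick check is:

$ srun --help | grep container-image

If nothing is printed, the plugin was not loaded; check the slurmd log for plugstack errors.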

Validation and Testing

Cluster Status

Verify node availability:

$ sinfo -N -l -p all
Mon Mar 02 04:28:42 2026
NODELIST   NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
m01            1 all*      idle     4 2:2:1   7937        0      1   (null) none
m02            1 all*      idle     4 2:2:1   7937        0      1   (null) none

Interactive Job Submission

Test containerized execution across multiple nodes:

$ srun --partition=all --nodes=2 --ntasks-per-node=1 --job-name=test-pyxis --container-image=ubuntu:latest bash -c 'echo "Hello from container on $(hostname)"'
pyxis: imported docker image: ubuntu:latest
pyxis: imported docker image: ubuntu:latest
Hello from container on m02
Hello from container on m01
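Beyond --container-image, Pyxis accepts further --container-* flags; for example, bind-mounting a host directory into the container and naming the container so later steps of the same job reuse it (the path and name here are illustrative):

$ srun --partition=all \
    --container-image=ubuntu:latest \
    --container-mounts=/data:/data \
    --container-name=myenv \
    bash -c 'ls /data'

Because --container-name caches the unpacked container, subsequent job steps that pass the same name skip the image import.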

Batch Mode Submission

Create batch script with containerized workload:

#!/bin/bash

#SBATCH --job-name=test-pyxis
#SBATCH --partition=all
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --output=slurm-%j.out
#SBATCH --error=slurm-%j.err

srun \
  --container-image=ubuntu:latest \
  bash -c 'echo "Run command $(sleep 300)"'

Submit batch job:

$ sbatch pyxis.sbatch
Submitted batch job 67

Job Monitoring

Check job status:

$ squeue -l
Mon Mar 02 04:38:46 2026
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
                67       all test-pyx     root  RUNNING       0:47 UNLIMITED      2 m[01-02]

View active container instances (Pyxis names them pyxis_<jobid>.<stepid>):

$ enroot list
pyxis_67.0

Key Features

Pyxis automatically spins up containers on designated nodes, with Enroot managing the container lifecycle. Users can submit container images from public registries, with Pyxis handling image import and instantiation transparently through standard Slurm job submission tools.

The integration eliminates manual container management while preserving Slurm's scheduling and resource-allocation capabilities, letting users bring their own containers and start running jobs without modifying the cluster infrastructure.
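For repeated runs, Pyxis can also write the imported image back out as a SquashFS file via --container-save, so later jobs read from shared storage instead of re-pulling from the registry (the path below is illustrative):

$ srun --container-image=ubuntu:latest --container-save=/shared/ubuntu.sqsh true
$ srun --container-image=/shared/ubuntu.sqsh hostname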
