
Slurm Migration Guide

Welcome! We are glad that you are interested in migrating your workflow to the new Palmetto 2 cluster using the Slurm job scheduler!

While there are many things that are different on the new cluster, there is a wealth of similarity between the two systems that will help existing users. This page has everything you need to get started.

New Account System

Accounts on the new Palmetto 2 cluster are separate from accounts on the Palmetto 1 cluster.

warning

Users will need to obtain an account on Palmetto 2 before proceeding with any of the other steps in this migration guide.

To learn more, see the account setup page.

New Login Node Address

The login node for Palmetto 2 has a different hostname, so you will need to use the new address when connecting via SSH.

Users should connect to the Palmetto 2 login node using this address: slogin.palmetto.clemson.edu.

tip

The s at the front of the new address stands for Slurm! This should help you remember the new address.
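
For example, connecting from a terminal might look like this (your-username below is a placeholder for your own username, not part of the address):

ssh your-username@slogin.palmetto.clemson.edu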

If you connect to the right instance, you will see the updated Message of the Day (/etc/motd) with the words PALMETTO 2 at the top:

$ cat /etc/motd
------------------------------------------------------------------------------
Welcome to the PALMETTO 2 CLUSTER at CLEMSON UNIVERSITY
...
------------------------------------------------------------------------------
warning

If you SSH to the old address, you will reach Palmetto 1 instead of Palmetto 2. Please double check the address before connecting.

You can review the updated SSH connection instructions for more details.

New OnDemand Instance

The new Palmetto 2 cluster has a separate instance of Open OnDemand. You will need to update your browser bookmarks.

The new address for Open OnDemand is: https://ondemand.rcd.clemson.edu

tip

The new Open OnDemand for Palmetto 2 has a purple navigation bar at the top of the screen and the Palmetto 2 logo on the home page. This should help you ensure you are on the right instance.

[Screenshot: the new Open OnDemand instance, showing the purple navigation bar and the Palmetto 2 logo]

PBS to Slurm Command Map

The table below shows common PBS commands and their Slurm counterparts.

| Description | PBS Command | Slurm Command |
| --- | --- | --- |
| Submit a batch job | qsub (without -I) | sbatch – see instructions for using sbatch |
| Submit an interactive job | qsub (with -I) | salloc – see instructions for using salloc |
| Job statistics or information | qstat | multiple options – see job monitoring options in Slurm |
| View job logs while running | qpeek | not needed – see instructions for viewing job output |
| Cancel job | qdel | scancel – see instructions for using scancel |
| See available compute resources | whatsfree or freeres | whatsfree or sinfo – see checking compute availability in Slurm |
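
As a quick illustration of the Slurm side of this table, a minimal job-management session might look like the sketch below (the script name and job ID are placeholders):

sbatch job-script.sh    # submit a batch job (prints the new job ID)
squeue -u $USER         # one of several ways to check job status
scancel <job-id>        # cancel a job by its ID
sinfo                   # see available compute resources by partition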

PBS to Slurm Environment Variables Map

The table below shows common PBS environment variables and their Slurm counterparts.

| PBS Variable | Slurm Variable | Description |
| --- | --- | --- |
| $PBS_JOBID | $SLURM_JOB_ID | Job ID |
| $PBS_JOBNAME | $SLURM_JOB_NAME | Job name |
| $PBS_O_WORKDIR | $SLURM_SUBMIT_DIR | Submitting directory |
| cat $PBS_NODEFILE | $SLURM_JOB_NODELIST or srun hostname | Nodes allocated to the job |
| N/A | $SLURM_NTASKS | Total number of tasks or MPI processes (note: this is not the total number of cores unless --cpus-per-task is 1) |
| N/A | $SLURM_CPUS_PER_TASK | Number of CPU cores for each task or MPI process |
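
To see these variables in action, a minimal batch script along the lines of the sketch below (the job name and resource requests here are arbitrary) simply prints what the scheduler provides:

#!/bin/bash
#SBATCH --job-name env-demo
#SBATCH --ntasks 2
#SBATCH --cpus-per-task 1

# Each of these replaces the PBS variable listed in the table above
echo "Job ID:             $SLURM_JOB_ID"
echo "Job name:           $SLURM_JOB_NAME"
echo "Submit directory:   $SLURM_SUBMIT_DIR"
echo "Allocated nodes:    $SLURM_JOB_NODELIST"
echo "Total tasks:        $SLURM_NTASKS"
echo "CPU cores per task: $SLURM_CPUS_PER_TASK"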

Converting PBS Batch Scripts for Slurm

  • In PBS, users add #PBS directives to their batch script files to tell the scheduler which options to run the job with. In Slurm, users use #SBATCH directives instead.
  • In PBS, users run qsub job-script to submit the job to the scheduler; in Slurm, users run sbatch job-script instead.
  • NOTE: Modules loaded before the job is submitted are carried into the batch job environment. It is therefore highly recommended to put module purge at the beginning of the job script (see the sketch after this list).
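
For instance, the top of a Slurm batch script might start from a clean environment like this (the module name below is only an example, not a required one):

#!/bin/bash
#SBATCH --job-name clean-env-example

module purge            # start from a clean module environment
module load anaconda3   # example module; load whatever your job actually needs

python3 run-my-science-workflow.py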

For example, in PBS, users might use a job script like this:

#!/bin/bash

#PBS -N my-job-name
#PBS -l select=2:ncpus=8:mpiprocs=2:mem=2gb:ngpus=1:gpu_model=a100:interconnect=hdr,walltime=02:00:00

export OMP_NUM_THREADS=8
python3 run-my-science-workflow.py

The same script, written for Slurm, would look like this (we recommend using the long option names, for example --nodes instead of -N):

#!/bin/bash

#SBATCH --job-name my-job-name
#SBATCH --nodes 2
#SBATCH --tasks-per-node 2
#SBATCH --cpus-per-task 8
#SBATCH --gpus-per-node a100:1
#SBATCH --mem 2gb
#SBATCH --time 02:00:00
#SBATCH --constraint interconnect_hdr

export OMP_NUM_THREADS=8
python3 run-my-science-workflow.py
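
Assuming the script above is saved as, for example, my-job.sh (a file name we chose only for illustration), it would be submitted with:

sbatch my-job.sh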

Below are some brief explanations of parameters used here:

  • --nodes selects the number of nodes for the job. It is equivalent to select combined with place=scatter in PBS, meaning distinct physical nodes rather than chunks.
  • --tasks-per-node is the number of tasks on each node, equivalent to mpiprocs in PBS.
  • --cpus-per-task controls the number of CPU cores assigned to each task. The default is 1; for multithreaded codes, it is usually set to the same value as OMP_NUM_THREADS.
  • The total number of cores is not specified explicitly: each node receives --tasks-per-node multiplied by --cpus-per-task cores (see the worked example after this list).
  • --mem requests memory per node.
  • --gpus-per-node specifies the GPU model and the number of GPUs per node, using the format --gpus-per-node <gpu_model>:<gpu_number>.
  • --time is the walltime of the job; the maximum is 72 hours for c2 nodes.
  • The interconnect constraint (--constraint interconnect_hdr) has not been fully implemented in Slurm yet and is not discussed further here for now.
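
As a worked example for the sample script above (these numbers come from that script, not from any cluster policy):

# --nodes 2  x  --tasks-per-node 2  -> 4 tasks in total ($SLURM_NTASKS = 4)
# 4 tasks    x  --cpus-per-task 8   -> 32 CPU cores in total
# Each node provides 2 x 8 = 16 cores, and --mem 2gb applies per node.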

Converting PBS Interactive Job Workflows for Slurm

  • In PBS, users use qsub -I to request an interactive job; in Slurm, users use salloc instead. (Note that the command for an interactive job, salloc, is different from the one for a batch job, sbatch.)
  • NOTE: Modules loaded before the job is submitted are carried into the interactive job environment. It is therefore highly recommended to run module purge once the interactive job is allocated.

For example, users can use the following command to request a PBS interactive job:

qsub -I -l select=2:ncpus=4:mem=2gb:ngpus=1:gpu_model=a100:interconnect=hdr,walltime=02:00:00

The corresponding command in Slurm would look like:

salloc --nodes 2 --tasks-per-node 4 --cpus-per-task 1 --mem 2gb --time 02:00:00 --gpus-per-node a100:1 --constraint interconnect_hdr

The explanations of the parameters can be found in the above section.
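
Once the allocation is granted, a typical interactive session might look like the sketch below (the module name, script name, and resource values here are placeholders):

salloc --nodes 1 --ntasks 1 --cpus-per-task 4 --mem 2gb --time 01:00:00
module purge                         # clear modules inherited from the login session
module load anaconda3                # example module; load whatever your work needs
python3 run-my-science-workflow.py   # run your work interactively
exit                                 # release the allocation when finished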

PBS Select Quantity vs Slurm Task Quantity

Although Slurm's syntax and usage can look similar to PBS, there are some important differences. The most important one is that --nodes is not required; its value will be determined by the tasks requested:

  1. If --tasks-per-node is specified, all the CPU cores will be allocated on the same node, which means the number of nodes is 1 in this case. NOTE: the number of tasks/CPU cores must not exceed the number of CPU cores on a single node.
  2. If you need more tasks/CPU cores than a single node provides, but you do not care which nodes the cores land on, you can specify --ntasks instead. In this case, your job may wait less time in the queue, since it can land on several different nodes. A potential drawback is that cores on different nodes may perform differently, given the heterogeneous nature of the Palmetto cluster (see the sketch after this list).
  3. As mentioned above, --mem requests memory per node. Besides --mem, there are related options such as --mem-per-cpu (memory per CPU core) and --mem-per-gpu (memory per GPU).
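
For example, the two requests sketched below ask for the same total number of tasks; the exact counts are made up for illustration:

# Let Slurm place 40 tasks wherever cores are free (possibly across several nodes)
#SBATCH --ntasks 40

# Or pin the layout explicitly: 2 nodes with 20 tasks on each
#SBATCH --nodes 2
#SBATCH --tasks-per-node 20

# Memory can also be requested per core instead of per node
#SBATCH --mem-per-cpu 2gb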

PBS Queues vs Slurm Partitions

In PBS, jobs were submitted to queues. In Slurm, the analogous concept is partitions.

To learn more, see the partition flag instructions.
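
For instance, you can list the available partitions with sinfo and select one with the --partition flag; the partition name below is a placeholder, not necessarily a real Palmetto 2 partition:

sinfo                                             # list partitions and their node states
sbatch --partition some-partition job-script.sh   # submit to a specific partition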