Job Queuing and Control with PBS
The Palmetto cluster uses the Portable Batch System (PBS) to manage jobs.
Lifecycle of a PBS Job
The life of a job begins when you submit it to the scheduler. If accepted, it will enter the Queued state.
Thereafter, the job may move to other states, as defined below:
- Queued - the job has been accepted by the scheduler and is eligible for execution; waiting for resources.
- Held - the job is not eligible for execution because it was held by user request, administrative action, or job dependency.
- Running - the job is currently executing on the compute node(s).
- Finished - the job finished executing or was canceled/deleted.
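In `qstat` output, these states appear as single-letter codes in the `S` column:

```
Q = Queued    H = Held    R = Running
F = Finished  (finished jobs appear only in qstat -x / -xf output)
```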
Useful PBS Commands
Here are some basic PBS commands for submitting, querying and deleting jobs:
| Command | Action |
|---|---|
| `qsub -I` | Submit an interactive job (reserves 1 core, 1gb RAM, 30 minutes walltime) |
| `qsub xyz.pbs` | Submit the job script `xyz.pbs` |
| `qstat <job id>` | Check the status of the job with the given job ID |
| `qstat -u <username>` | Check the status of all jobs submitted by the given username |
| `qstat -xf <job id>` | Check detailed information for the job with the given job ID |
| `qstat -Qf <queuename>` | Check the status of the queue `<queuename>` |
| `qsub -q <queuename> xyz.pbs` | Submit the job script `xyz.pbs` to the queue `<queuename>` |
| `qdel <job id>` | Delete the job (queued or running) with the given job ID |
| `qpeek <job id>` | "Peek" at the standard output of a running job |
| `qdel -Wforce <job id>` | Force-delete a job that does not respond to plain `qdel` |
For more details and more advanced commands for submitting and controlling jobs, please refer to the PBS Professional User's Guide.
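For reference, here is a minimal sketch of a job script for `qsub xyz.pbs`; the job name, resource amounts, and `my_program` are illustrative placeholders:

```
#!/bin/bash
#PBS -N example-job
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=01:00:00
#PBS -j oe

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR

# Replace with your own program
./my_program
```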
Querying PBS Job Information
PBS provides a variety of useful commands to query the scheduler for information about jobs and make changes.
Check Status of All Jobs in PBS
To list the job IDs and status of all your jobs, you can use `qstat`:
```
$ qstat

pbs02:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
7600567.pbs02   username c1_singl pi-mpi-1     1382   4   8    4gb 00:05 R 00:00
7600569.pbs02   username c1_singl pi-mpi-2    20258   4   8    4gb 00:05 R 00:00
7600570.pbs02   username c1_singl pi-mpi-3     2457   4   8    4gb 00:05 R 00:00
```
Check Status of a Particular Job in PBS
The `qstat` command can be used to query the status of a particular job:
```
$ qstat 7600424.pbs02
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
7600424.pbs02     pi-mpi           username          00:00:00 R c1_single
```
Detailed PBS Job Information
Once a job has finished running, `qstat -xf` can be used to obtain detailed job information.
Job history is only retained for 24 hours after the job ends.
Below is an example of querying detailed information for a finished job:
```
$ qstat -xf 7600424.pbs02
Job Id: 7600424.pbs02
Job_Name = pi-mpi
Job_Owner = username@login001.palmetto.clemson.edu
resources_used.cpupercent = 103
resources_used.cput = 00:00:04
resources_used.mem = 45460kb
resources_used.ncpus = 8
resources_used.vmem = 785708kb
resources_used.walltime = 00:02:08
job_state = F
queue = c1_single
server = pbs02
Checkpoint = u
ctime = Tue Dec 13 14:09:32 2016
Error_Path = login001.palmetto.clemson.edu:/home/username/MPI/pi-mpi.e7600424
exec_host = node0088/1*2+node0094/1*2+node0094/2*2+node0085/0*2
exec_vnode = (node0088:ncpus=2:mem=1048576kb:ngpus=0:nphis=0)+(node0094:ncp
us=2:mem=1048576kb:ngpus=0:nphis=0)+(node0094:ncpus=2:mem=1048576kb:ngp
us=0:nphis=0)+(node0085:ncpus=2:mem=1048576kb:ngpus=0:nphis=0)
Hold_Types = n
Join_Path = oe
Keep_Files = n
Mail_Points = a
Mail_Users = username@clemson.edu
mtime = Tue Dec 13 14:11:42 2016
Output_Path = login001.palmetto.clemson.edu:/home/username/MPI/pi-mpi.o760042
4
Priority = 0
qtime = Tue Dec 13 14:09:32 2016
Rerunable = True
Resource_List.mem = 4gb
Resource_List.mpiprocs = 8
Resource_List.ncpus = 8
Resource_List.ngpus = 0
Resource_List.nodect = 4
Resource_List.nphis = 0
Resource_List.place = free:shared
Resource_List.qcat = c1_workq_qcat
Resource_List.select = 4:ncpus=2:mem=1gb:interconnect=1g:mpiprocs=2
Resource_List.walltime = 00:05:00
stime = Tue Dec 13 14:09:33 2016
session_id = 2708
jobdir = /home/username
substate = 92
Variable_List = PBS_O_SYSTEM=Linux,PBS_O_SHELL=/bin/bash,
PBS_O_HOME=/home/username,PBS_O_LOGNAME=username,
PBS_O_WORKDIR=/home/username/MPI,PBS_O_LANG=en_US.UTF-8,
PBS_O_PATH=/software/examples/:/home/username/local/bin:/usr/lib64/qt-3
.3/bin:/opt/pbs/default/bin:/opt/gold/bin:/usr/local/bin:/bin:/usr/bin:
/usr/local/sbin:/usr/sbin:/sbin:/opt/mx/bin:/home/username/bin,
PBS_O_MAIL=/var/spool/mail/username,PBS_O_QUEUE=c1_workq,
PBS_O_HOST=login001.palmetto.clemson.edu
comment = Job run at Tue Dec 13 at 14:09 on (node0088:ncpus=2:mem=1048576kb
:ngpus=0:nphis=0)+(node0094:ncpus=2:mem=1048576kb:ngpus=0:nphis=0)+(nod
e0094:ncpus=2:mem=1048576kb:ngpus=0:nphis=0)+(node0085:ncpus=2:mem=1048
576kb:ngpus=0:nphis=0) and finished
etime = Tue Dec 13 14:09:32 2016
run_count = 1
Stageout_status = 1
Exit_status = 0
Submit_arguments = job.sh
history_timestamp = 1481656302
project = _pbs_project_default
```
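To pull specific attributes out of this output, you can filter it with standard tools, for example:

```
# Show only the exit status and resource usage of the finished job
$ qstat -xf 7600424.pbs02 | grep -E 'Exit_status|resources_used'
```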
Similarly, to get detailed information about a running job, you can use `qstat -f`.
Cancel a PBS Job
To delete a job (whether queued, running, or in an error state), you can use the `qdel` command:
```
qdel 7600424.pbs02
```
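To clean up many jobs at once, `qdel` can be combined with the standard PBS `qselect` command, which prints the IDs of jobs matching the given criteria; a sketch:

```
# Delete all of your jobs (use with care); qselect -u prints the
# job IDs owned by the given user
qdel $(qselect -u username)
```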
PBS Job Limits on Palmetto
Walltime in PBS
Jobs running in phases 1-6 of the cluster (nodes with interconnect `1g`) can run for a maximum walltime of 336 hours (14 days).
Jobs running in phases 7 and higher of the cluster can run for a maximum walltime of 72 hours (3 days).
Jobs running on node-owner queues can run for a maximum walltime of 336 hours (14 days).
These values may be updated; the current limits are shown when you run the `checkqueuecfg` command.
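For example, to request the 14-day maximum, a job has to target the older hardware; the core and memory amounts here are illustrative:

```
# Request 4 cores and 8gb on a 1g-interconnect (phase 1-6) node for 14 days
qsub -l select=1:ncpus=4:mem=8gb:interconnect=1g -l walltime=336:00:00 xyz.pbs
```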
Number of Jobs in PBS
When you submit a job, it is forwarded to a specific execution queue based on job criteria (how many cores, RAM, etc.). There are four classes of execution queues:
- MX queues (`c1_` queues): jobs submitted to run on the older hardware (phases 1-6) will be forwarded to these queues.
- IB queues (`c2_` queues): jobs submitted to run on the newer hardware (phases 7 and up) will be forwarded to these queues.
- GPU queues (`gpu_` queues): jobs that request GPUs will be forwarded to these queues.
- `bigmem` queue: jobs submitted to the large-memory machines (phase 0).
Each execution queue has its own limits for how many jobs can be running at one
time, and how many jobs can be waiting in that execution queue. The maximum
number of running jobs per user in execution queues may vary throughout the day
depending on cluster load. Users can see what the current limits are using the `checkqueuecfg` command:
```
$ checkqueuecfg
MX QUEUES min_cores_per_job max_cores_per_job max_mem_per_queue max_jobs_per_queue max_walltime
c1_solo 1 1 4000gb 2000 336:00:00
c1_single 2 24 90000gb 750 336:00:00
c1_tiny 25 128 25600gb 25 336:00:00
c1_small 129 512 24576gb 6 336:00:00
c1_medium 513 2048 81920gb 5 336:00:00
c1_large 2049 4096 32768gb 1 336:00:00
IB QUEUES min_cores_per_job max_cores_per_job max_mem_per_queue max_jobs_per_queue max_walltime
c2_single 1 24 600gb 5 72:00:00
c2_tiny 25 128 4096gb 2 72:00:00
c2_small 129 512 6144gb 1 72:00:00
c2_medium 513 2048 16384gb 1 72:00:00
c2_large 2049 4096 0gb 0 72:00:00
GPU QUEUES min_gpus_per_job max_gpus_per_job min_cores_per_job max_cores_per_job max_mem_per_queue max_jobs_per_queue max_walltime
gpu_small 1 4 1 96 3840gb 20 72:00:00
gpu_medium 5 16 1 256 5120gb 5 72:00:00
gpu_large 17 128 1 1024 20480gb 5 72:00:00
SMP QUEUE min_cores max_cores max_jobs max_walltime
bigmem 1 64 3 72:00:00
'max_mem' is the maximum amount of memory all your jobs in this queue can
consume at any one time. For example, if the max_mem for the solo queue
is 4000gb, and your solo jobs each need 10gb, then you can run a
maximum number of 4000/10 = 400 jobs in the solo queue, even though the
current max_jobs setting for the solo queue may be set higher than 400.
```
The `qstat` command tells you which of the execution queues your job is
forwarded to. For example, here is an interactive job requesting 8 CPU cores, a
K40 GPU, and 32gb RAM:
```
$ qsub -I -l select=1:ncpus=8:ngpus=1:gpu_model=k40:mem=32gb,walltime=2:00:00
qsub (Warning): Interactive jobs will be treated as not rerunnable
qsub: waiting for job 9567792.pbs02 to start
```
We see from `qstat` that the job request is forwarded to the `c2_single` queue:
```
[username@login001 ~]$ qstat 9567792.pbs02
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
9567792.pbs02     STDIN            username                 0 Q c2_single
```
From the output of `checkqueuecfg` above, we see that each user can have a
maximum of 5 running jobs in this queue.
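To see the current settings of a particular execution queue directly, you can also use `qstat -Qf` from the commands table above:

```
# Inspect the configuration and limits of the c2_single queue
$ qstat -Qf c2_single
```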
Job Preemption in PBS
Node owners are granted priority access on the hardware they own. However, users are welcome to use any compute nodes on the cluster that are available for their jobs.
If a node owner submits a job to their priority queue while your job is executing on their node, your job will be preempted.
The preemption process works like so:
- Your job is sent a graceful termination signal by the operating system.
- The scheduler will grant a grace period of 2 minutes for your job to perform any final operations and exit.
- If your job is still running after the grace period expires, the operating system will force your processes to terminate.
- Your job is returned to the Queued state. Since your job was preempted, it will be sent to the front of the queue.
- The owner's job begins executing on their node.
- The scheduler will run your job again when resources become available, either on the same node or another node that meets your specifications.
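If your program can save its state quickly, one way to use the grace period is to catch the termination signal in your job script; a minimal sketch, assuming the signal is SIGTERM (the usual PBS default) and that `save_state` stands in for your own checkpoint routine:

```
#!/bin/bash
# save_state is a hypothetical placeholder for your own checkpoint
# routine (e.g. copying partial results to safe storage)
save_state() {
    cp -r scratch_results/ $PBS_O_WORKDIR/   # illustrative only
}

# Save state when the graceful termination signal (SIGTERM) arrives,
# then exit before the 2-minute grace period runs out
trap 'save_state; exit 0' TERM

./my_program &   # run the real work in the background...
wait             # ...so the shell can run the trap immediately
```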
If you do not need the latest hardware for your program to work, consider using older hardware that does not have an owner. This will allow you to avoid preemption entirely.
If you plan to run a long job on a node where you would risk preemption, you may want to gracefully handle preemption by:
- saving work periodically while running
- designing your program to support starting from previous partial work
- checking for previously saved work to load from when your job begins
This will allow your job to resume close to where it left off when it starts running again after a preemption event.
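A minimal sketch of this save-and-resume pattern in a job script; `my_program`, its flags, and `checkpoint.dat` are hypothetical placeholders for your application's own checkpoint/restart mechanism:

```
#!/bin/bash
#PBS -N resumable-job
#PBS -l select=1:ncpus=4:mem=8gb
#PBS -l walltime=72:00:00

cd $PBS_O_WORKDIR

# Resume from saved work if a checkpoint exists; otherwise start
# fresh and save work periodically while running
if [ -f checkpoint.dat ]; then
    ./my_program --resume checkpoint.dat
else
    ./my_program --save checkpoint.dat
fi
```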
Example PBS Scripts
A list of example PBS scripts for submitting jobs to the Palmetto cluster can be found here.