Job Submission
Now that you understand how jobs are controlled and the basic types of jobs, you are ready to submit a job.
When you submit a job, the scheduler will place your job into the queue until it can find available resources to assign.
Examples
The easiest way to learn how to submit a job is to look at example submissions. Below, we have examples for both of the job types.
Start an interactive job
An interactive job can be started using the qsub command. Here is an example of an interactive job:
[username@login001 ~]$ qsub -I -l select=1:ncpus=2:mem=4gb:interconnect=1g,walltime=4:00:00
qsub (Warning): Interactive jobs will be treated as not rerunnable
qsub: waiting for job 8730.pbs02 to start
qsub: job 8730.pbs02 ready
[username@node0021 ~]$ module add anaconda3/2019.10-gcc/8.3.1
[username@node0021 ~]$ python runsim.py
.
.
.
[username@node0021 ~]$ exit
[username@login001 ~]$
Above, we request an interactive job using 1 "chunk" of hardware (select=1), 2 CPU cores per "chunk", and 4 GB of RAM per "chunk", for a wall time of 4 hours. Once these resources are available, we receive a Job ID (8730.pbs02) and a command-line session running on node0021.
Now that you have seen an example, review the qsub command section to learn what other options are available.
Submit a batch job
Interactive jobs require you to stay logged in while your tasks are running. In contrast, you may log out after submitting a batch job and examine the results later. This is useful when you need to run several computational tasks on the cluster, and/or when your tasks are expected to run for a long time.
To submit a batch job, you must prepare a batch script (you can do this using an editor like vim or nano). Below is an example of a batch script (call it example.pbs). The batch job below doesn't do anything useful (it just sleeps, or "does nothing", for 60 seconds):
#PBS -N example
#PBS -l select=1:ncpus=1:mem=2gb:interconnect=1g,walltime=00:10:00
module add gcc/9.3.0-gcc/8.3.1
cd /home/username
echo Hello World from `hostname`
sleep 60
After saving the above file, you can submit the batch job using qsub:
[username@login001 ~]$ qsub example.pbs
8738.pbs02
Make a note of the job number.
Since batch jobs run in the background, you won't see the script running in your terminal. However, you can use the job control commands to get information about your job's status or take action.
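For example, you can check the job's status with qstat (the output below is illustrative; the exact columns may vary with your PBS version):
[username@login001 ~]$ qstat 8738.pbs02
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
8738.pbs02        example          username          00:00:00 R work1
Here, the S column shows the job's state (R for running, Q for queued).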
Once the job is completed, you will see the files example.o8738 (containing output, if any) and example.e8738 (containing errors, if any) from your job.
[username@login001 ~]$ cat example.o8738
Hello World from node0230.palmetto.clemson.edu
The qsub Command
The qsub command is used to submit jobs to the scheduler. The defaults are not very useful, so you will want to pass several options that describe your job type and what resources you want.
Options for qsub
The following switches can be used either with qsub on the command line, or with a #PBS directive in a batch script.
Parameter | Purpose | Example |
---|---|---|
-N | Job name (7 characters) | -N maxrun1 |
-l | Job limits (lowercase L), hardware & other requirements for job. | -l select=1:ncpus=8:mem=1gb |
-q | Queue to direct this job to (work1 is the default; supabad is an example of a specific research group's queue) | -q supabad |
-o | Path to stdout file for this job (environment variables are not accepted here) | -o stdout.txt |
-e | Path to stderr file for this job (environment variables are not accepted here) | -e stderr.txt |
-m | Mail events: have the PBS server send email when the job aborts (a), begins (b), and/or ends (e); use n for no mail. | -m abe |
-M | List of users to whom mail about the job is sent, of the form user[@host],user[@host],... If -M is not used and -m is specified, PBS will send email to userid@clemson.edu. | -M user1@domain1.com,user2@domain2.com |
-j oe | Join the output and error streams and write to a single file | -j oe |
For example, in a batch script:
#PBS -N hydrogen
#PBS -l select=1:ncpus=24:mem=200gb,walltime=4:00:00
#PBS -q bigmem
#PBS -m abe
#PBS -M userid@domain.com
#PBS -j oe
And in an interactive job request on the command line:
qsub -I -N hydrogen -q bigmem -j oe -l select=1:ncpus=24:mem=200gb,walltime=4:00:00
For more detailed information, please take a look at:
man qsub
Palmetto does not support all PBS options; see the list below for unsupported options:
- Using -r n to mark the job as not re-runnable is not supported, because our preemption system requires that all jobs be re-runnable. Jobs marked as not re-runnable will be marked as re-runnable automatically.
Resource Limits Specification
The -l switch, provided to qsub or along with the #PBS directive, can be used to specify the amount and kind of compute hardware (cores, memory, GPUs, interconnect, etc.), its location (i.e., the node(s) and phase from which to request hardware), and the expected duration (walltime) of the job:
Option | Purpose |
---|---|
select | Number of chunks and resources per chunk. Two or more "chunks" can be placed on a single node, but a single "chunk" cannot span more than one node. |
walltime | Expected wall time of job (job is terminated after this time) |
place | Controls the placement of the different chunks |
A chunk in PBS is a set of resources (CPU cores, memory, and GPUs) that must be allocated on a single physical machine (a single node). Each chunk corresponds with one or more MPI "slots". By default, a single MPI slot is created for each chunk; this can be changed with the mpiprocs option.
PBS may place different chunks on different nodes, or on the same node. If needed, you can control the placement with the place option.
Although cgroups prevent processes within a job from using more resources than allocated to the job, there are no such controls between chunks. If multiple chunks of a job land on the same node, nothing prevents a single process within the job from using all the resources of every chunk (within the job) on that node.
Here are some examples of resource limit specifications:
-l select=1:ncpus=8:chip_model=opteron:interconnect=10g
-l select=1:ncpus=16:chip_type=e5-2665:interconnect=56g:mem=62gb,walltime=16:00:00
-l select=1:ncpus=8:chip_type=2356:interconnect=10g:mem=15gb
-l select=1:ncpus=1:node_manufacturer=ibm:mem=15gb,walltime=00:20:00
-l select=1:ncpus=4:mem=15gb:ngpus=2,walltime=00:20:00
-l select=1:ncpus=4:mem=15gb:ngpus=1:gpu_model=k40,walltime=00:20:00
-l select=1:ncpus=2:mem=15gb:host=node1479,walltime=00:20:00
-l select=2:ncpus=2:mem=15gb,walltime=00:20:00,place=scatter # force each chunk to be on a different node
-l select=2:ncpus=2:mem=15gb,walltime=00:20:00,place=pack # force each chunk to be on the same node
and examples of options you can use in the job limit specification:
# CPU Options
chip_manufacturer=amd
chip_manufacturer=intel
chip_model=opteron
chip_model=xeon
chip_type=e5345
chip_type=e5410
chip_type=l5420
chip_type=x7542
chip_type=2356
chip_type=6172
chip_type=e5-2665
# Node Manufacturer Options
node_manufacturer=dell
node_manufacturer=hp
node_manufacturer=ibm
node_manufacturer=sun
# GPU count options
ngpus=1
ngpus=2
# GPU Options
gpu_model=k20
gpu_model=k40
gpu_model=p100
gpu_model=v100
gpu_model=a100
# Phase Options
phase=1a
phase=3
phase=19b
phase=28 # You can specify any phase from /etc/hardware-table
# Interconnect Options
interconnect=1g # 1 Gbps Ethernet
interconnect=10ge # 10 Gbps Ethernet
interconnect=56g # 56 Gbps FDR InfiniBand; same as fdr
interconnect=fdr # 56 Gbps FDR InfiniBand; same as 56g
interconnect=100g # 100 Gbps HDR InfiniBand; same as hdr
interconnect=hdr # 100 Gbps HDR InfiniBand; same as 100g
MPI Processes
If you are using MPI, you can tell the scheduler how many MPI processes per chunk to make available using the mpiprocs resource. For example:
# 1 chunk, 4 cores per chunk, 4 MPI processes per chunk
# 4 cores total, 4 MPI processes total, 1 core per process:
-l select=1:ncpus=4:mpiprocs=4:mem=8gb
# 1 chunk, 4 cores per chunk, 2 MPI processes per chunk
# 4 cores total, 2 MPI processes total, 2 cores per process:
-l select=1:ncpus=4:mpiprocs=2:mem=8gb
# 4 chunks, 4 cores per chunk, 4 MPI processes per chunk
# 16 cores total, 16 MPI processes total, 1 core per process:
-l select=4:ncpus=4:mpiprocs=4:mem=8gb
# 4 chunks, 4 cores per chunk, 1 MPI processes per chunk
# 16 cores total, 4 MPI processes total, 4 cores per process:
-l select=4:ncpus=4:mpiprocs=1:mem=8gb
If your program uses MPI only, you likely want mpiprocs equal to ncpus. If your program uses both MPI and OpenMP, you likely want mpiprocs less than ncpus, since each MPI process will spawn multiple threads (and thus make use of multiple cores). If mpiprocs is greater than ncpus, you will be oversubscribing, which almost always results in lower performance.
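To put this together, here is a minimal sketch of a pure-MPI batch script for the 4-chunk example above. The module name and the program name (my_mpi_program) are assumptions; substitute whatever applies to your software:
#PBS -N mpi-example
#PBS -l select=4:ncpus=4:mpiprocs=4:mem=8gb,walltime=00:30:00
# Load an MPI implementation (module name is an assumption; check
# "module avail" for the versions installed on your cluster).
module add openmpi/3.1.4-gcc/8.3.1
cd /home/username
# A PBS-aware MPI build reads the node/slot list from the scheduler,
# so this launches 16 processes (4 chunks x 4 mpiprocs per chunk).
# With a non-PBS-aware build, pass the count explicitly: mpirun -np 16
mpirun ./my_mpi_program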
Environment Variables
The following table contains potentially useful environment variables set by the PBS scheduler every time you submit a job:
Variable Name | Description |
---|---|
HOSTNAME | The name of the current host device on Palmetto, e.g. node0581.palmetto.clemson.edu |
MODULEPATH | The list of paths containing software available to the module command line tool |
NCPUS | The number of requested CPUs per node |
PATH | The ordered list of paths used by Linux to locate executable files when running a command |
PBS_ENVIRONMENT | Takes the value PBS_INTERACTIVE for interactive jobs or PBS_BATCH for batch jobs |
PBS_JOBDIR | Pathname of job’s staging and execution directory on the primary host |
PBS_JOBID | Job identifier given by PBS when the job is submitted |
PBS_JOBNAME | Job name given by user |
PBS_NODEFILE | The filename containing a list of node hostnames assigned to the job |
TMPDIR | Pathname of job's local scratch directory |
You can access these variables in your programs. For example, you can print the name of your host in bash:
echo $HOSTNAME
# example output: "node0581.palmetto.clemson.edu"
or in Python:
import os
print(os.environ['HOSTNAME'])
# example output: "node0581.palmetto.clemson.edu"
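These variables are also useful inside batch scripts. The lines below are a small sketch using PBS_NODEFILE and TMPDIR from the table above:
# count the unique nodes assigned to this job (the node file lists
# one hostname per MPI slot, so duplicates are collapsed with sort -u)
sort -u "$PBS_NODEFILE" | wc -l
# run I/O-heavy work from the job's local scratch directory
cd "$TMPDIR"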