Monitoring Jobs on Palmetto 2
Monitoring your job is an important part of making sure you request the right resources when you start a job. It can be hard to know how much to request the first time you run a program, but if you monitor the job, you can make more informed decisions in the future. Our acceptable use guidelines require that you do not request more resources than your application can use. Doing so wastes resources and prevents others from using them.
There are a few options to monitor jobs in Slurm.
- jobstats: A tool custom written for Palmetto that provides easy job statistics and overall efficiency of the job.
- jobperf: A tool custom written for Palmetto that provides easy job statistics and live CPU, memory, and GPU monitoring.
- seff: Returns summary job performance metrics.
- sacct: Lists completed jobs and lets you investigate resources used by individual Slurm steps.
- XDMod: Short for XD Metrics on Demand, a tool that allows users of HPC clusters to see their job history and metrics about performance.
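To make "requesting the right resources" concrete, here is a minimal sketch of how CPU efficiency can be estimated from the kind of accounting fields that `sacct` reports (`Elapsed`, `TotalCPU`, `AllocCPUS`). The function names and the sample values are hypothetical and chosen only for illustration. The formula, CPU time used divided by CPU time allocated, is the usual definition of CPU efficiency and is not necessarily the exact calculation any of the tools above perform.

```python
# Fields like these can be listed with, for example:
#   sacct -j <jobid> --format=JobID,Elapsed,TotalCPU,AllocCPUS
# The values used below are made-up samples, not real job data.


def parse_slurm_time(text: str) -> float:
    """Convert a Slurm duration such as '1-02:03:04' or '05:30.250' to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in text.split(":")]
    # Left-pad so that parts is always [hours, minutes, seconds].
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(total_cpu: str, elapsed: str, alloc_cpus: int) -> float:
    """Return the fraction of allocated CPU time that was actually used."""
    allocated = parse_slurm_time(elapsed) * alloc_cpus
    if allocated == 0:
        return 0.0
    return parse_slurm_time(total_cpu) / allocated


# Hypothetical job: 8 cores held for 1 hour, but only 2 core-hours of work done.
efficiency = cpu_efficiency(total_cpu="02:00:00", elapsed="01:00:00", alloc_cpus=8)
print(f"CPU efficiency: {efficiency:.1%}")
```

A low value like the one in this sample suggests that the job could have requested fewer cores. The same comparison of used versus requested applies to memory.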
Below is a brief breakdown of how you can interface with these different monitoring options.
Capability | jobperf | jobstats | seff | sacct | XDMod |
---|---|---|---|---|---|
Can be used via the command line? | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No |
Can be used via Open OnDemand? | ❌ No | ✅ Yes | ❌ No | ❌ No | ❌ No |
Is viewable in a web browser? | ✅ Yes | ✅ Yes | ❌ No | ❌ No | ✅ Yes |
Supports monitoring jobs while they are running? | ✅ Yes | ✅ Yes | ⚠️ Limited | ⚠️ Limited | ❌ No |
Supports monitoring jobs after they have ended? | ⚠️ Limited | ✅ Yes | ⚠️ Limited | ⚠️ Limited | ✅ Yes |
Supports analysis of individual jobs? | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
Supports analysis of multiple jobs over time? | ❌ No | ❌ No | ❌ No | ❌ No | ✅ Yes |
Supports analysis of multiple jobs by user group? | ❌ No | ❌ No | ❌ No | ❌ No | ✅ Yes |