Monitoring Jobs on Palmetto 2

Monitoring your jobs is an important part of making sure you request the right resources when you submit them. It can be hard to know how much to request the first time you run a program, but if you monitor the job, you can make more informed decisions in the future. Our acceptable use guidelines require that you do not request more resources than your application can use; doing so wastes resources and prevents others from using them.

There are a few options to monitor jobs in Slurm.

  • jobstats: A tool custom written for Palmetto that provides easy job statistics and the overall efficiency of the job.
  • jobperf: A tool custom written for Palmetto that provides easy job statistics and live CPU, memory, and GPU monitoring.
  • seff: Returns summary performance metrics for a job.
  • sacct: Lists completed jobs and lets you investigate the resources used by individual Slurm steps.
  • XDMod: Short for XD Metrics on Demand, a tool that lets users of HPC clusters see their job history and performance metrics.

Below is a brief breakdown of how you can interface with these different monitoring options.

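| Tool     | Command Line | Open OnDemand App | Viewable in a web browser | Works when job is running | Works when job has ended | Group Level Stats |
|----------|--------------|-------------------|---------------------------|---------------------------|--------------------------|-------------------|
| jobperf  | yes          | no                | yes                       | yes                       | ⚠️ limited               | no                |
| jobstats | yes          | yes               | yes                       | yes                       | yes                      | no                |
| seff     | yes          | no                | no                        | ⚠️ limited                | ⚠️ limited               | no                |
| sacct    | yes          | no                | no                        | ⚠️ limited                | ⚠️ limited               | no                |
| XDMod    | no           | no                | yes                       | no                        | yes                      | yes               |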
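
To make "requesting the right resources" concrete, the sketch below shows one way to turn raw accounting values into efficiency figures, similar in spirit to the summaries these tools report. It assumes you have already copied values from the standard sacct fields TotalCPU, Elapsed, AllocCPUS, MaxRSS, and ReqMem. The helper names (slurm_time_to_seconds, mem_to_mib, cpu_efficiency, memory_efficiency) are hypothetical and are not part of Slurm or of any Palmetto tool. It also assumes memory values carry a K, M, G, or T suffix; other formats are rejected rather than guessed.

```python
# Hypothetical helpers for interpreting values copied from sacct output.
# Assumption: durations look like [DD-]HH:MM:SS or MM:SS.mmm, and memory
# values end in K, M, G, or T.

_MIB_PER_UNIT = {"K": 1 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 * 1024.0}


def slurm_time_to_seconds(text: str) -> float:
    """Convert a duration such as '1-02:03:04' or '03:04.500' to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in text.split(":")]
    # Pad missing leading fields so we always have [hours, minutes, seconds].
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def mem_to_mib(text: str) -> float:
    """Convert a memory value such as '4G' or '2097152K' to MiB."""
    unit = text[-1].upper()
    if unit not in _MIB_PER_UNIT:
        raise ValueError(f"unsupported memory value: {text!r}")
    return float(text[:-1]) * _MIB_PER_UNIT[unit]


def cpu_efficiency(total_cpu: str, elapsed: str, alloc_cpus: int) -> float:
    """Fraction of the allocated core time that was spent doing CPU work."""
    core_seconds = slurm_time_to_seconds(elapsed) * alloc_cpus
    if core_seconds == 0:
        return 0.0
    return slurm_time_to_seconds(total_cpu) / core_seconds


def memory_efficiency(max_rss: str, req_mem: str) -> float:
    """Fraction of the requested memory that the job actually touched."""
    return mem_to_mib(max_rss) / mem_to_mib(req_mem)


# Illustrative values only: 8 cores for 1 hour, 2 core-hours of CPU time used,
# 2 GiB peak memory out of 4 GiB requested.
print(f"CPU efficiency: {cpu_efficiency('02:00:00', '01:00:00', 8):.0%}")
print(f"Memory efficiency: {memory_efficiency('2097152K', '4G'):.0%}")
```

A job that shows low CPU efficiency, as in the illustrative values above, is a candidate for requesting fewer cores next time; low memory efficiency suggests a smaller memory request.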