
Monitoring Jobs on Palmetto 2

Monitoring your jobs is an important part of making sure you request the right resources. It can be hard to know how much to request the first time you run a program, but monitoring it lets you make better-informed decisions for future jobs. Our acceptable use guidelines require that you do not request more resources than your application can use; doing so wastes resources and prevents others from using them.

There are a few options to monitor jobs in Slurm.

  • jobstats: A tool custom-written for Palmetto that provides easy-to-read job statistics and the overall efficiency of a job.
  • jobperf: A tool custom-written for Palmetto that provides easy-to-read job statistics and live CPU, memory, and GPU monitoring.
  • seff: Returns summary performance metrics for a job.
  • sacct: Lists completed jobs and lets you investigate the resources used by individual Slurm job steps.
  • XDMod: Short for XD Metrics on Demand, a tool that lets users of HPC clusters see their job history and metrics about job performance.
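Both seff and sacct report how much of an allocation a job actually used. The Python sketch below illustrates that idea; it is not a Palmetto-provided tool, and the function names and sample output are hypothetical. It calls sacct with `--format`, `--parsable2` and `--noheader`, then computes CPU efficiency the way seff describes it: total CPU time divided by core-walltime (elapsed time multiplied by allocated cores). It assumes the allocation-level line (a job ID without a `.`) carries the job's total CPU time, and that MaxRSS values use K/M/G/T suffixes; real output may contain more steps or different units, so treat it as a starting point.

```python
import subprocess
import sys

# Binary multipliers for the size suffixes sacct uses in MaxRSS.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Hypothetical sacct output for illustration only (not from a real job).
SAMPLE = """12345|00:10:00|00:20:00|4|
12345.batch|00:10:00|00:20:00|4|2097152K
12345.extern|00:10:00|00:00:00|4|0
"""


def slurm_time_to_seconds(value: str) -> float:
    """Convert a Slurm duration such as '1-02:03:04', '02:03:04' or '05:30.500' to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in value.split(":")]
    # Pad on the left so we always have hours, minutes and seconds.
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def rss_to_bytes(value: str) -> int:
    """Convert a MaxRSS value like '2097152K' to bytes (assumption: bare numbers are bytes)."""
    value = value.strip()
    if not value:
        return 0
    suffix = value[-1].upper()
    if suffix in _UNITS:
        return int(float(value[:-1]) * _UNITS[suffix])
    return int(float(value))


def summarize(sacct_output: str) -> dict:
    """Compute seff-style CPU efficiency and peak memory from parsable sacct output."""
    elapsed_s = cpu_s = 0.0
    alloc_cpus = 0
    peak_rss = 0
    for line in sacct_output.strip().splitlines():
        job_id, elapsed, total_cpu, cpus, max_rss = line.split("|")
        if "." not in job_id:
            # Allocation-level line: wall time, total CPU time and core count.
            elapsed_s = slurm_time_to_seconds(elapsed)
            cpu_s = slurm_time_to_seconds(total_cpu)
            alloc_cpus = int(cpus)
        else:
            # Step lines (e.g. .batch, .extern) carry the memory high-water mark.
            peak_rss = max(peak_rss, rss_to_bytes(max_rss))
    core_walltime = elapsed_s * alloc_cpus
    cpu_eff = 100.0 * cpu_s / core_walltime if core_walltime else 0.0
    return {
        "elapsed_s": elapsed_s,
        "alloc_cpus": alloc_cpus,
        "cpu_efficiency_pct": cpu_eff,
        "peak_rss_gib": peak_rss / 1024**3,
    }


def fetch_sacct(job_id: str) -> str:
    """Run sacct for one job and return its pipe-delimited output."""
    cmd = [
        "sacct", "-j", job_id,
        "--format=JobID,Elapsed,TotalCPU,AllocCPUS,MaxRSS",
        "--parsable2", "--noheader",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage on a cluster login node: python job_efficiency.py <jobid>
    print(summarize(fetch_sacct(sys.argv[1])))
```

On the hypothetical sample, 20 minutes of CPU time over a 10-minute, 4-core allocation (40 core-minutes) works out to 50% CPU efficiency with a 2 GiB memory peak, which would suggest requesting fewer cores next time.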

Below is a brief breakdown of how you can interact with each of these monitoring options.

| Metric | jobperf | jobstats | seff | sacct | XDMod |
|---|---|---|---|---|---|
| Can be used via the command line? | Yes | Yes | Yes | Yes | No |
| Can be used via Open OnDemand? | No | Yes | No | No | No |
| Is viewable in a web browser? | Yes | Yes | No | No | Yes |
| Supports monitoring jobs while they are running? | Yes | Yes | ⚠️ Limited | ⚠️ Limited | No |
| Supports monitoring jobs after they have ended? | ⚠️ Limited | Yes | ⚠️ Limited | ⚠️ Limited | Yes |
| Supports analysis of individual jobs? | Yes | Yes | Yes | Yes | Yes |
| Supports analysis of multiple jobs over time? | No | No | No | No | Yes |
| Supports analysis of multiple jobs by user group? | No | No | No | No | Yes |