Overview
Today's scientific research involves increasingly large volumes of increasingly complex data. At the same time, CPU performance has improved much faster than storage performance, which can create I/O bottlenecks in HPC environments. Read this section to learn how to optimize your storage usage and improve I/O performance on Palmetto.
Palmetto provides three data spaces: home, scratch, and paid storage.
Every user has a home directory and access to the scratch file systems. Paid storage is purchased by our research scientists to house their data and can be accessed only by users who have the owner’s approval. Each data space is accessible from anywhere in the cluster.
Storage Video
The video below will walk you through the different data spaces and how they are used.
Storage Hardware Grid
The table below lists each file system on Palmetto, along with its size, storage medium, network connection to the nodes, and whether it is backed up.
NAME | SIZE | DISK TYPE | FILE SYSTEM | NETWORK CONNECTION | BACKUP |
---|---|---|---|---|---|
/home | 100GB per user | HDD | zfs | ethernet | ✅ yes |
/scratch | 500TB | SSD | Indigo | IB, ethernet | ❌ no |
/scratch1 | 1933TB | HDD | beegfs | IB, ethernet | ❌ no |
/fastscratch | 168TB | NVMe | beegfs | IB, ethernet | ❌ no |
/local_scratch | 99GB - 2.7TB per node | HDD, SSD, NVMe | ext4 | internal | ❌ no |
/zfs/{group} | Varies by owner | HDD | zfs | ethernet | ✅ yes |
What do the abbreviations in the storage hardware grid mean?
The legend below can be used to understand the abbreviations in the storage hardware grid table.
Term | Explanation |
---|---|
HDD | Hard Disk Drive |
SSD | Solid State Drive |
NVMe | Non-Volatile Memory Express |
IB | InfiniBand |
GB | Gigabyte |
TB | Terabyte |
Performance Guidelines
Generally speaking, /local_scratch is the fastest file system to use because no network is involved. However, this space cannot be shared among the nodes participating in a job, and any data you want to keep must be moved to permanent storage before the job completes.
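The stage-in, compute, stage-out pattern described above can be sketched in Python as follows. This is a minimal illustration, not an official Palmetto utility: the function names, the `output.dat` file name, and the placeholder computation are all hypothetical, and the scratch root is passed as a parameter because the exact per-job directory layout is not described here.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_local_scratch(input_file, results_dir, scratch_root, compute):
    """Stage input to node-local scratch, compute there, copy results back."""
    input_file = Path(input_file)
    results_dir = Path(results_dir)
    # Create a private working directory under the node-local scratch root.
    work_dir = Path(tempfile.mkdtemp(prefix="job_", dir=scratch_root))
    try:
        # Stage in: copy the input from shared storage to local disk.
        local_input = work_dir / input_file.name
        shutil.copy2(input_file, local_input)

        # Run the computation entirely against local disk.
        # "output.dat" is a hypothetical output name for this sketch.
        local_output = work_dir / "output.dat"
        compute(local_input, local_output)

        # Stage out: copy results to permanent storage before the job ends.
        results_dir.mkdir(parents=True, exist_ok=True)
        final_output = results_dir / local_output.name
        shutil.copy2(local_output, final_output)
        return final_output
    finally:
        # Clean up the local working directory, whether or not the run succeeded.
        shutil.rmtree(work_dir, ignore_errors=True)


def uppercase_copy(src, dst):
    """Placeholder computation: write an upper-cased copy of the input."""
    Path(dst).write_text(Path(src).read_text().upper())
```

A call might look like `run_with_local_scratch("/home/username/input.txt", "/home/username/results", "/local_scratch", uppercase_copy)`, where the paths under /home are illustrative. Putting the copy-back step inside the function, and the cleanup in a `finally` block, keeps results from being left behind on a node that other jobs cannot reach.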
/scratch is our newest parallel file system and runs atop flash storage. It performs well across most access patterns. /scratch1 is a parallel file system that runs atop spinning disk drives and is best suited for workflows that issue large, sequential read or write requests. /fastscratch is also a parallel file system, but it runs atop NVMe flash drives and is best suited for workflows with small, random read or write access patterns. /fastscratch also handles sequential workloads well.
The Palmetto support team encourages you to test your workflows against all three file systems to see which one works best for you.
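One rough way to compare file systems is to time the same small set of I/O operations in a directory on each one. The sketch below is an assumption-laden illustration, not a Palmetto-provided benchmark: the function name, file sizes, and read counts are arbitrary choices, and the read timings may partly reflect the operating system's page cache rather than the storage itself. Your real workflow remains the best test.

```python
import os
import random
import time
from pathlib import Path


def benchmark_directory(directory, file_size=8 * 1024 * 1024,
                        block_size=4096, random_reads=1000):
    """Time a sequential write, a sequential read, and small random reads."""
    path = Path(directory) / f"io_test_{os.getpid()}.bin"
    chunk = os.urandom(1024 * 1024)  # 1 MiB unit for sequential writes
    timings = {}
    try:
        # Sequential write, flushed to storage so the timing includes the device.
        start = time.perf_counter()
        with open(path, "wb") as f:
            written = 0
            while written < file_size:
                n = min(len(chunk), file_size - written)
                f.write(chunk[:n])
                written += n
            f.flush()
            os.fsync(f.fileno())
        timings["sequential_write_s"] = time.perf_counter() - start

        # Sequential read in 1 MiB requests (may be served from page cache).
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(1024 * 1024):
                pass
        timings["sequential_read_s"] = time.perf_counter() - start

        # Small reads at random offsets; a fixed seed keeps runs comparable.
        rng = random.Random(0)
        start = time.perf_counter()
        with open(path, "rb") as f:
            for _ in range(random_reads):
                f.seek(rng.randrange(0, max(1, file_size - block_size)))
                f.read(block_size)
        timings["random_read_s"] = time.perf_counter() - start
    finally:
        # Remove the test file so the benchmark leaves nothing behind.
        path.unlink(missing_ok=True)
    return timings
```

To compare file systems, call `benchmark_directory` once with a directory you own on each of /scratch, /scratch1, and /fastscratch (the exact per-user paths depend on your account), then compare the returned timings. Running it several times, and with a file size closer to your real data, gives a more reliable picture.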