Storage Technologies
Palmetto uses high-performance storage media and file systems to complement its high-performance compute hardware. When used correctly, these technologies can accelerate your jobs and reduce the CPU cycles spent idle waiting for data.
Storage Media
Palmetto has two types of storage media, hard disk drives and flash drives, each with its own performance characteristics. A hard disk drive accesses a sector of data through physical mechanisms: rotating platters and a moving read/write head. Flash drives, in contrast, have no moving parts and access data through electrical signals alone, so they far outperform hard drives simply because of the nature of the physical media.
Both hard drives and flash drives have their place on Palmetto. Hard drives cost much less per terabyte than flash drives and are very reliable, so we use them for long-term storage of large pools of data. Flash drives are used for computations that need short-lived, high-throughput I/O.
File Systems
The file system that manages the data also contributes to the performance of any storage medium. Palmetto uses two types of file systems, traditional and parallel. Both types take advantage of the underlying storage medium; however, they access data in very different ways to provide the best possible performance.
Traditional file systems, such as ext4, xfs, and zfs, perform best when requests originate from the same machine that holds the storage. Palmetto compute nodes provide a limited amount of this kind of storage, referred to as local scratch, that can only be used by an active Slurm job. Palmetto also uses NFS (Network File System), which allows compute nodes to access data housed in a traditional file system on a storage server over a network.
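Because a traditional file system performs best when requests come from the machine holding the storage, a common pattern is to copy input from NFS-mounted storage into local scratch once, do the I/O-heavy work there, and copy results back. The Python sketch below illustrates that pattern under stated assumptions: `stage_and_process` is a hypothetical helper, the "processing" step is a placeholder (it just uppercases bytes), and `scratch_root` is an assumption you would set to your job's actual local scratch directory; when it is `None`, Python's `tempfile` default location is used instead.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def stage_and_process(nfs_input: Path, nfs_output_dir: Path,
                      scratch_root: Optional[str] = None) -> Path:
    """Hypothetical helper: copy input to local scratch, work there, copy results back.

    scratch_root is an assumption: point it at your job's local scratch directory.
    """
    # Private working directory on the (assumed) local scratch disk.
    scratch = Path(tempfile.mkdtemp(dir=scratch_root))
    try:
        # One sequential copy over the network instead of many small remote reads.
        local_in = scratch / nfs_input.name
        shutil.copy2(nfs_input, local_in)

        # Placeholder processing: many small reads/writes now hit local storage.
        local_out = scratch / (nfs_input.stem + ".out")
        with open(local_in, "rb") as src, open(local_out, "wb") as dst:
            for chunk in iter(lambda: src.read(4096), b""):
                dst.write(chunk.upper())

        # One sequential copy of the result back to network storage.
        nfs_output_dir.mkdir(parents=True, exist_ok=True)
        final = nfs_output_dir / local_out.name
        shutil.copy2(local_out, final)
        return final
    finally:
        # Local scratch is tied to the job, so clean up before it ends.
        shutil.rmtree(scratch, ignore_errors=True)
```

The design choice is to keep network traffic to two bulk transfers and push the fine-grained I/O onto local storage; whether this pays off depends on how often your job touches the data.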
Parallel file systems, such as BeeGFS, Lustre, and OrangeFS, perform best when there are many storage servers and a fast network between the compute nodes and the storage servers. These file systems spread data across a set of storage servers, optimizing performance by taking advantage of the processing power of each server and the fast network, while dividing the time to process a request among the storage servers. That is, if a data request takes X amount of time using a traditional file system over a network, then in the ideal case the same request takes roughly X/4 on a parallel file system with 4 storage servers. These file systems do not use NFS to communicate with the storage servers; instead, each compute node runs a client process.
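The idea of spreading data across storage servers can be sketched as round-robin striping: a file is cut into fixed-size chunks, and chunk *i* goes to server *i* mod *N*. The Python below is a toy model, not how BeeGFS, Lustre, or OrangeFS are actually implemented; the function names `stripe`, `reassemble`, and `ideal_request_time` are hypothetical, and the timing function captures only the best case X/N, ignoring network contention, metadata lookups, and uneven load.

```python
from typing import List


def stripe(data: bytes, num_servers: int, stripe_size: int) -> List[List[bytes]]:
    """Toy model: split data into fixed-size chunks and deal them round-robin."""
    servers: List[List[bytes]] = [[] for _ in range(num_servers)]
    for i, start in enumerate(range(0, len(data), stripe_size)):
        servers[i % num_servers].append(data[start:start + stripe_size])
    return servers


def reassemble(servers: List[List[bytes]]) -> bytes:
    """Rebuild the original bytes: chunk i lives on server i % n at position i // n."""
    n = len(servers)
    total_chunks = sum(len(s) for s in servers)
    return b"".join(servers[i % n][i // n] for i in range(total_chunks))


def ideal_request_time(single_server_time: float, num_servers: int) -> float:
    """Best case only: every server handles an equal share of the work concurrently."""
    return single_server_time / num_servers


if __name__ == "__main__":
    layout = stripe(b"abcdefgh", num_servers=4, stripe_size=2)
    print(layout)                       # [[b'ab'], [b'cd'], [b'ef'], [b'gh']]
    print(reassemble(layout))           # b'abcdefgh'
    print(ideal_request_time(8.0, 4))   # 2.0
```

With four servers each holding a quarter of the chunks, all four can serve their parts at once, which is where the idealized X/4 comes from; real speedups are smaller because of the overheads this model leaves out.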