
Storage Technologies

Palmetto uses high-performance storage media and file systems to complement its high-performance compute hardware. When used correctly, these technologies can accelerate your jobs and reduce the number of CPU cycles spent idle while waiting for data.

Storage Media

Palmetto has two types of storage media, hard disk drives and flash drives, each with its own performance characteristics. A hard disk drive accesses a sector of data by spinning its platters and moving a read/write head into position. Flash drives have no moving parts and access data purely through electrical signals, so, by the nature of the physical media, they far outperform hard drives.
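To make the mechanical cost concrete, the sketch below estimates the average time for one random read. On a hard drive this is the seek time, plus on average half a platter rotation, plus the transfer time; on flash only the electronic read latency and the transfer remain. The function names and every numeric value (seek time, 7200 RPM, latencies) are illustrative assumptions, not measured specifications of Palmetto's hardware.

```python
def hdd_random_access_ms(seek_ms: float, rpm: int, transfer_ms: float) -> float:
    """Average time (ms) for one random read on a spinning disk.

    The head first moves to the correct track (seek), then waits on average
    half a revolution for the sector to pass under it (rotational latency),
    and finally reads the data (transfer).
    """
    rotational_latency_ms = 0.5 * 60_000 / rpm  # half a revolution, in ms
    return seek_ms + rotational_latency_ms + transfer_ms


def ssd_random_access_ms(read_latency_ms: float, transfer_ms: float) -> float:
    """Average time (ms) for one random read on flash: no mechanical delay,
    only the electronic read latency plus the transfer."""
    return read_latency_ms + transfer_ms


if __name__ == "__main__":
    # Assumed example values for a 7200 RPM hard drive and a flash drive.
    hdd = hdd_random_access_ms(seek_ms=8.5, rpm=7200, transfer_ms=0.05)
    ssd = ssd_random_access_ms(read_latency_ms=0.1, transfer_ms=0.05)
    print(f"HDD: {hdd:.2f} ms per random read")
    print(f"SSD: {ssd:.2f} ms per random read")
    print(f"Flash is roughly {hdd / ssd:.0f}x faster for this access pattern")
```

With these assumed values, the half rotation alone costs about 4.17 ms per read at 7200 RPM, a delay that flash simply does not have.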

Both hard drives and flash drives have their place on Palmetto. Hard drives cost much less per terabyte than flash drives and are very reliable, so we use them for long-term storage that houses pools of data. Flash drives are used for short-lived computing workloads that require high I/O throughput.

File Systems

The performance of any storage medium also depends on the file system that manages its data. Palmetto uses two types of file systems: traditional and parallel. Both take advantage of the underlying storage media, but they access data in different ways to provide the best possible performance.

Traditional file systems, such as ext4, XFS, and ZFS, perform best when requests originate on the same machine that holds the storage. Palmetto compute nodes provide a limited amount of this kind of storage, called local scratch, which can be used only by an active Slurm job. Palmetto also uses NFS (Network File System), a protocol that lets compute nodes access data stored in a traditional file system on a storage server over the network.
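A common way to benefit from local scratch is to stage input data onto the node at the start of a job, do the I/O-heavy work on the local disk, and copy the results back to persistent storage before the job ends. The Python sketch below shows that pattern. It assumes the job exposes local scratch through the `TMPDIR` environment variable (check Palmetto's documentation for the actual location) and falls back to the system temporary directory so it runs anywhere; the function names and the placeholder computation are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def scratch_root() -> Path:
    """Return the node-local scratch directory.

    ASSUMPTION: local scratch is exposed through the TMPDIR environment
    variable; otherwise fall back to the system temp directory.
    """
    return Path(os.environ.get("TMPDIR", tempfile.gettempdir()))


def stage_in(src: Path, workdir: Path) -> Path:
    """Copy an input file from network storage (e.g. NFS) to local scratch."""
    dest = workdir / src.name
    shutil.copy2(src, dest)
    return dest


def stage_out(result: Path, dest_dir: Path) -> Path:
    """Copy a result back to persistent storage; local scratch is only
    available while the job is running."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(result, dest_dir / result.name))


def run_job(input_file: Path, output_dir: Path) -> Path:
    # Private working directory on local scratch, deleted when the block exits.
    with tempfile.TemporaryDirectory(dir=scratch_root()) as tmp:
        workdir = Path(tmp)
        local_input = stage_in(input_file, workdir)
        # Hypothetical computation: all of its reads and writes hit the local disk.
        local_output = workdir / "result.txt"
        local_output.write_text(local_input.read_text().upper())
        return stage_out(local_output, output_dir)
```

Using a temporary directory means the scratch copy is removed even if the computation raises an exception; only the staged-out result survives the job.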

Parallel file systems, such as BeeGFS, Lustre, and OrangeFS, perform best when there are many storage servers and a fast network between the compute nodes and the storage servers. These file systems spread data across a set of storage servers, using the processing power of every server and the fast network while dividing the work of each request among the servers. In the ideal case, if a data request takes time X on a traditional file system accessed over a network, the same request takes X/4 on a parallel file system with four storage servers; in practice the speedup is somewhat smaller, because overheads such as network contention and metadata handling do not shrink with the number of servers. Instead of NFS, these file systems run a client on each compute node that communicates directly with the storage servers.
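The sketch below is a simplified model of how a parallel file system stripes a file: the file is cut into fixed-size chunks that are placed round-robin across the storage servers, and in the ideal case every server streams its share at the same time. It assumes a uniform per-server bandwidth and a network that is never the bottleneck; the stripe size, bandwidth, and function names are illustrative and do not describe Palmetto's actual configuration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    index: int   # position of the chunk within the file
    server: int  # storage server that holds this chunk
    size: int    # bytes in this chunk


def stripe(file_size: int, stripe_size: int, num_servers: int) -> list[Chunk]:
    """Split a file into fixed-size chunks placed round-robin across servers."""
    chunks = []
    offset, index = 0, 0
    while offset < file_size:
        size = min(stripe_size, file_size - offset)
        chunks.append(Chunk(index=index, server=index % num_servers, size=size))
        offset += size
        index += 1
    return chunks


def ideal_read_time(chunks: list[Chunk], num_servers: int, server_bw: float) -> float:
    """Seconds to read the whole file with all servers working in parallel.

    Idealized: each server streams its own chunks at server_bw bytes/s, so
    the server holding the most data determines the total time.
    """
    bytes_per_server = [0] * num_servers
    for c in chunks:
        bytes_per_server[c.server] += c.size
    return max(bytes_per_server) / server_bw


if __name__ == "__main__":
    MiB = 1024 ** 2
    size = 1024 * MiB  # assumed 1 GiB file
    bw = 500 * MiB     # assumed 500 MiB/s per server
    one = ideal_read_time(stripe(size, MiB, 1), 1, bw)
    four = ideal_read_time(stripe(size, MiB, 4), 4, bw)
    print(f"1 server: {one:.3f} s, 4 servers: {four:.3f} s")
```

Under these assumptions, reading a 1 GiB file from four servers takes one quarter of the single-server time, matching the X/4 figure. The model also shows a limit: a file smaller than one stripe lands on a single server and gains nothing from the additional servers.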