Storage Technologies
Palmetto 2 pairs its high-performance compute hardware with high-performance storage media and file systems. When used correctly, these technologies can accelerate your jobs and reduce the number of CPU cycles spent idle waiting for data.
Storage Media
Palmetto 2 has two types of storage media, hard disk drives and flash drives, each with its own performance characteristics. Hard disk drives access a sector of data mechanically, using rotating platters and a moving read/write head. Flash drives, in contrast, have no moving parts and access data using only electrical signals, so they far outperform hard drives simply because of the nature of the physical media.
Both hard drives and flash drives have their place on Palmetto 2. Hard drives cost much less per terabyte than flash drives and are very reliable, so we use them for backup and recovery storage. Flash drives are used for computing workloads that need short-lived storage with high I/O throughput.
File Systems
The file system that manages the data also contributes to the performance of any storage medium. Palmetto 2 uses two types of file systems, traditional and parallel. Both types take advantage of the underlying storage medium, but they access data in different ways to provide the best possible performance.
Traditional file systems, such as ext4, XFS, and ZFS, perform best when
requests originate on the same machine that holds the storage. Palmetto 2
compute nodes provide a limited amount of this kind of storage, referred to as
local scratch, which can only be used by an active Slurm job. Palmetto 2 also
uses NFS (Network File System), which allows compute nodes to access data
housed in a traditional file system on a storage server over a network.
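As a rough illustration of how a job might use local scratch, the sketch below copies an input file from network storage into a node-local directory before reading it. The function name `stage_to_local` and the environment variable `LOCAL_SCRATCH` are hypothetical, chosen only for this example; they are not documented Palmetto 2 settings, so substitute whatever path your job actually receives.

```python
import os
import shutil
import tempfile
from pathlib import Path


def stage_to_local(src, scratch_dir=None):
    """Copy src into a node-local scratch directory and return the new path."""
    if scratch_dir is None:
        # LOCAL_SCRATCH is a hypothetical variable name used for illustration;
        # fall back to the system temporary directory when it is not set.
        scratch_dir = os.environ.get("LOCAL_SCRATCH", tempfile.gettempdir())
    dest_dir = Path(scratch_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(src).name
    # copy2 copies the file contents and preserves metadata where possible.
    shutil.copy2(src, dest)
    return dest
```

A job would call `stage_to_local` once at startup and then perform its repeated reads against the returned path, so that those requests originate on the same machine as the storage.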
Parallel file systems perform best when there are many storage servers and a fast network between the compute nodes and the storage servers. These file systems spread data across a set of storage servers, optimizing performance by taking advantage of the processing power of each server and the fast network, and amortizing the time to process a request across the servers. That is, if a data request takes X amount of time using a traditional file system over a network, then in the ideal case the same request takes about X/4 in a parallel file system with 4 storage servers. On Palmetto 2, Indigo uses a special flavor of NFS that allows multiple threads from one compute node to communicate with all of the storage servers at the same time. Indigo differs from a traditional parallel file system in that the compute nodes within one compute job do not have coordinated parallel access to the same data at the same time.
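The amortization described above can be written as a small calculation. This is a minimal sketch of the idealized model only: it assumes the request divides evenly across the servers and ignores network and coordination overhead, so real transfers will generally be slower than this estimate. The function name `ideal_parallel_time` is made up for the example.

```python
def ideal_parallel_time(single_server_time, num_servers):
    """Idealized time for a request spread evenly over num_servers servers."""
    if num_servers < 1:
        raise ValueError("num_servers must be at least 1")
    # Each server handles an equal share of the request, and the shares
    # are processed at the same time, so the total time divides by N.
    return single_server_time / num_servers


# A request that takes 8 seconds on one server, spread over 4 servers.
estimate = ideal_parallel_time(8.0, 4)
```

With one server the estimate equals the original time, which matches the traditional file system case.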