Cluster Architecture
Understanding the different hardware components of Palmetto and how they are connected or related is critical to making the best use of resources.
Overview
You may have heard the term "supercomputer" used to describe Palmetto. While not inaccurate, a better description would be that Palmetto is a high-performance computing (HPC) cluster.
The cluster is made up of over a thousand separate computers, called nodes. These nodes are connected to one another through networks called interconnects.
The diagram below shows a high-level view of the cluster.
Compute Nodes
Most of the nodes in the cluster are compute nodes. These nodes have powerful hardware that can perform fast calculations on large amounts of data.
It is important to understand that Palmetto is a heterogeneous cluster, which means that compute nodes vary in hardware configuration across the cluster. This is different from many other HPC systems.
However, Palmetto is homogeneous within its phases, which means compute nodes in a given phase of the cluster have the same hardware configuration. You can see this on the Hardware Table.
Special Nodes
A cluster needs more than just compute nodes to be functional. Other special nodes in the cluster provide critical services to keep the cluster running or provide user access to resources.
Login Nodes
The login nodes are the interface between Palmetto and outside networks. Compute nodes are isolated from the internet, so you must connect to a login node first to gain access to them.
The login nodes are intended to be used only for scheduling jobs, connecting to compute nodes, and other light maintenance tasks. Performing intensive tasks on a login node may cause it to become unusable for other users.
You can access the login nodes via SSH connections.
Data Transfer Nodes
The data transfer nodes allow users to move large amounts of data in or out of Palmetto.
You can access the data transfer nodes via SSH File Transfer Protocol (SFTP) connections.
Scheduler Node
The scheduler node keeps track of what resources are available on the compute nodes and assigns those resources to people who request them.
You can request time on a compute node by submitting a job to the scheduler. The scheduler will add your job to the queue and determine which nodes are able to run your job and when they will be available.
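As a rough mental model of that matching step, the toy Python sketch below filters a list of nodes down to those that could run a request. It is not how Palmetto's scheduler is implemented; the `Node` and `Job` classes, their fields, and the node names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str         # hypothetical node name
    free_cores: int   # cores not currently assigned to a job
    free_mem_gb: int  # memory (GB) not currently assigned to a job


@dataclass
class Job:
    cores: int   # cores the job asks for
    mem_gb: int  # memory (GB) the job asks for


def eligible_nodes(job, nodes):
    """Return the names of nodes with enough free cores and memory for the job."""
    return [
        n.name
        for n in nodes
        if n.free_cores >= job.cores and n.free_mem_gb >= job.mem_gb
    ]


# Made-up cluster state, for illustration only.
nodes = [
    Node("node-a", free_cores=4, free_mem_gb=16),
    Node("node-b", free_cores=32, free_mem_gb=128),
    Node("node-c", free_cores=16, free_mem_gb=32),
]

job = Job(cores=16, mem_gb=64)
print(eligible_nodes(job, nodes))
```

In this made-up state only `node-b` has both enough free cores and enough free memory. When no node qualifies, the job simply waits in the queue until resources free up.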
Interconnects
Recall that the nodes in the cluster are connected via computer networks. These network connections, called interconnects, are frequently classified by their type.
A single node can have multiple interconnects.
Each type of interconnect has a certain amount of bandwidth, which is measured in gigabits per second (Gbps). Depending on how much data you are moving between nodes, you will want to select an appropriate interconnect.
Ethernet
Ethernet is the slowest interconnect, but it is available on every node.
Ethernet is the only interconnect available on the oldest nodes in the cluster.
Bandwidth varies depending on the model of network interface card installed in the node. However, the bandwidth will be the same for all nodes in a given phase.
One of the following will be available on every node:
- 1g: 1 Gbps Ethernet
- 10ge: 10 Gbps Ethernet
- 25ge: 25 Gbps Ethernet
FDR Infiniband
The nodes on phases 7 through 17 have FDR Infiniband interconnects, which have a bandwidth of 56 Gbps. This is also referred to as 56g.
HDR Infiniband
The newest nodes on phases 18 through 29 have HDR Infiniband interconnects, which have a bandwidth of 100 Gbps. This is also referred to as 100g.
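To get a feel for what these bandwidth figures mean in practice, the sketch below estimates the best-case time to move a given amount of data over each interconnect listed above. It assumes the link runs at its full rated bandwidth with no protocol overhead or contention, so real transfers will be slower. The 100 GB data size is an arbitrary example.

```python
# Rated bandwidths from the text above, in gigabits per second (Gbps).
BANDWIDTH_GBPS = {
    "1g": 1,
    "10ge": 10,
    "25ge": 25,
    "56g": 56,
    "100g": 100,
}


def transfer_seconds(size_gigabytes, bandwidth_gbps):
    """Best-case transfer time: convert gigabytes to gigabits, divide by rate."""
    size_gigabits = size_gigabytes * 8  # 1 byte = 8 bits
    return size_gigabits / bandwidth_gbps


size_gb = 100  # arbitrary example size in gigabytes
for label, gbps in BANDWIDTH_GBPS.items():
    print(f"{label:>5}: {transfer_seconds(size_gb, gbps):7.1f} s")
```

Under these idealized assumptions, 100 GB takes 800 seconds over 1 Gbps Ethernet and 8 seconds over the 100 Gbps link.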
Cluster Type
Compute nodes in a given phase of the cluster have the same hardware configuration, but configurations differ from one phase to another.
If your job needs a certain type of hardware, make sure the resource list you pass to sbatch, salloc, or srun targets compatible nodes.
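One way to target specific hardware is Slurm's `--constraint` option, which restricts a job to nodes that advertise a given feature. The sketch below only assembles and prints such a command line; it does not submit anything. It assumes, without confirmation from this page, that the interconnect labels above (such as `100g`) are exposed as Slurm node features on Palmetto, and `my_job.sh` is a hypothetical batch script name.

```python
import shlex


def build_sbatch_command(script, feature, nodes=1):
    """Assemble an sbatch command line that requests nodes with one feature."""
    args = [
        "sbatch",
        f"--nodes={nodes}",
        f"--constraint={feature}",  # assumed feature name, see the note above
        script,
    ]
    return shlex.join(args)  # quote each argument safely for a shell


# Hypothetical script name and assumed feature label.
print(build_sbatch_command("my_job.sh", "100g", nodes=2))
```

The same `--constraint` option is accepted by `salloc` and `srun`, so the pattern carries over to interactive jobs.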