Cluster Architecture

Understanding the different hardware components of Palmetto and how they are connected or related is critical to making the best use of resources.

Overview

You may have heard the term "supercomputer" used to describe Palmetto. While not inaccurate, a better description would be that Palmetto is a high-performance computing (HPC) cluster.

The cluster is made up of over a thousand separate computers, called nodes. These nodes are connected to one another through multiple networks, called interconnects.

The diagram below shows a high-level view of the cluster.

Compute Nodes

Most of the nodes in the cluster are compute nodes. These nodes have powerful hardware that can perform fast calculations on large amounts of data.

It is important to understand that Palmetto is a heterogeneous cluster, which means that compute nodes vary in hardware configuration across the cluster. This is different from many other HPC systems.

However, Palmetto is homogeneous within its phases, which means compute nodes in a given phase of the cluster have the same hardware configuration. You can see this on the Hardware Table.

Special Nodes

A cluster needs more than just compute nodes to be functional. Other special nodes in the cluster provide critical services to keep the cluster running or provide user access to resources.

Login Nodes

The login nodes are the interface between Palmetto and outside networks. Compute nodes are isolated from the internet, so you must connect to a login node first to gain access to them.

caution

The login nodes are intended only for scheduling jobs, connecting to compute nodes, and other light maintenance tasks. Running intensive tasks on a login node may make it unusable for other users.

You can access the login nodes via SSH connections.

Data Transfer Nodes

The data transfer nodes allow users to move large amounts of data in or out of Palmetto.

You can access the data transfer nodes via SSH File Transfer Protocol (SFTP) connections.

Scheduler Node

The scheduler node keeps track of what resources are available on the compute nodes and assigns those resources to people who request them.

You can request time on a compute node by submitting a job to the scheduler. The scheduler will add your job to the queue and determine which nodes are able to run your job and when they will be available.
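As a rough illustration of the idea, the toy sketch below picks a node for a job by filtering out nodes that cannot satisfy the request and then choosing the one that becomes available soonest. This is not how Palmetto's scheduler is implemented; the `Node` and `Job` classes, the field names, and the node names are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cores: int  # hypothetical: cores the node can offer to a new job
    free_at: int     # hypothetical: minutes until the node is available


@dataclass
class Job:
    name: str
    cores: int       # cores the job requests


def pick_node(job, nodes):
    """Return the eligible node that becomes available soonest, or None."""
    # Keep only nodes that can satisfy the request.
    eligible = [n for n in nodes if n.free_cores >= job.cores]
    if not eligible:
        return None
    # Among those, prefer the node that is available first.
    return min(eligible, key=lambda n: n.free_at)


# Made-up nodes for illustration only.
nodes = [
    Node("node-a", free_cores=8, free_at=30),
    Node("node-b", free_cores=16, free_at=0),
    Node("node-c", free_cores=32, free_at=120),
]

chosen = pick_node(Job("example", cores=12), nodes)
print(chosen.name)  # prints "node-b"
```

A real scheduler also weighs factors this sketch ignores, such as requested time, memory, and hardware type.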

Interconnects

Recall that the nodes in the cluster are connected via computer networks. These network connections, called interconnects, are frequently classified by their type.

info

A single node can have multiple interconnects.

Each type of interconnect has a certain amount of bandwidth, which is measured in gigabits per second (Gbps). Depending on how much data you are moving between nodes, you will want to select an appropriate interconnect.

Ethernet

Ethernet is the slowest interconnect, but it is available on every node.

note

Ethernet is the only interconnect available on the oldest nodes in the cluster.

Bandwidth varies depending on the model of network interface card installed in the node. However, the bandwidth will be the same for all nodes in a given phase.

One of the following will be available on every node:

  • 1g - 1 Gbps Ethernet
  • 10ge - 10 Gbps Ethernet
  • 25ge - 25 Gbps Ethernet

FDR InfiniBand

The nodes in phases 7 through 17 have FDR InfiniBand interconnects, which have a bandwidth of 56 Gbps. This is also referred to as 56g.

HDR InfiniBand

The newest nodes, in phases 18 through 29, have HDR InfiniBand interconnects, which have a bandwidth of 100 Gbps. This is also referred to as 100g.
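To put the bandwidth figures in this section in perspective, the sketch below estimates the best-case time to move a dataset over each interconnect type. The function name and the 100 GB dataset size are illustrative assumptions, and the calculation ignores protocol overhead, storage speed, and network contention, so real transfers will take longer.

```python
# Interconnect labels and bandwidths as listed in this section,
# in gigabits per second (Gbps).
BANDWIDTH_GBPS = {"1g": 1, "10ge": 10, "25ge": 25, "56g": 56, "100g": 100}


def ideal_transfer_seconds(size_gigabytes, label):
    """Best-case seconds to move size_gigabytes over the named interconnect."""
    gigabits = size_gigabytes * 8  # 1 byte = 8 bits
    return gigabits / BANDWIDTH_GBPS[label]


# Illustrative dataset size: 100 gigabytes.
for label in BANDWIDTH_GBPS:
    print(f"{label}: {ideal_transfer_seconds(100, label):.1f} s")
# 1g: 800.0 s
# 10ge: 80.0 s
# 25ge: 32.0 s
# 56g: 14.3 s
# 100g: 8.0 s
```

Note the factor of eight: bandwidth is quoted in gigabits, while file sizes are usually quoted in gigabytes.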

Cluster Type

Palmetto is heterogeneous across the cluster as a whole but homogeneous within its phases: compute nodes in a given phase have the same hardware configuration.

tip

If your job needs a certain type of hardware, make sure the resource list you pass to sbatch, salloc, or srun will target compatible nodes.
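As one possible illustration of the tip above, the sketch below assembles an `sbatch` command line that restricts a job to nodes with a particular feature. `--nodes` and `--constraint` are standard Slurm options, but the helper function, the script name `job.sh`, and the feature name `example_feature` are placeholders, not values taken from Palmetto. Check the Hardware Table and the scheduler documentation for the names that actually apply.

```python
import shlex


def build_sbatch_command(script, nodes, constraint):
    """Assemble an sbatch command line as a list of arguments."""
    return [
        "sbatch",
        f"--nodes={nodes}",            # number of nodes to request
        f"--constraint={constraint}",  # only consider nodes with this feature
        script,
    ]


# "example_feature" and "job.sh" are hypothetical placeholders.
command = build_sbatch_command("job.sh", nodes=2, constraint="example_feature")
print(shlex.join(command))
# prints: sbatch --nodes=2 --constraint=example_feature job.sh
```

Building the command as a list rather than a single string keeps each argument separate, which avoids quoting problems if it is later passed to `subprocess.run`.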