
AlphaFold3

AlphaFold3 is Google DeepMind’s latest deep learning model for predicting the structure and interactions of biomolecular complexes containing proteins, nucleic acids, small molecules, ions, and chemically modified residues such as post-translational modifications. Building on AlphaFold2, AlphaFold3 extends biomolecular structure prediction beyond protein folding alone to the modeling of interactions among these molecule types.

In November 2024, the developers released the AlphaFold3 source code on GitHub and published a corresponding Nature paper describing the methodology. In addition to the software itself, AlphaFold3 requires approximately 630 GB of genetic and structural databases for sequence search and template generation. A centrally maintained copy of these databases is available on the cluster at /datasets/alphafold3/.

Researchers interested in performing protein structure prediction with AlphaFold3 on the cluster are encouraged to follow the guide below and use the centrally maintained databases provided by the HPC system.

Model Parameters

Due to AlphaFold3's licensing restrictions, users must obtain the model parameters directly from Google DeepMind:

  • Visit the following form: AlphaFold3 Model Request Form
  • Submit the request using an institutional email address.
  • Once approved, you will receive instructions for downloading a folder containing the model parameters.
  • After downloading, manually place the model parameters in the appropriate directory in your work environment.
note
  • The AlphaFold3 model parameter file requires approximately 1.1 GB of storage space. Users should also be mindful of their storage quota limitations when storing the model parameters.
  • Palmetto 2 cannot distribute the AlphaFold3 model parameters. Users are responsible for obtaining, storing, and managing the model parameter files required for running AlphaFold3.
  • Users who are unable to obtain the AlphaFold3 model parameters may consider using AlphaFold 2.3.2, which is also available on Palmetto 2. However, AlphaFold 2.3.2 has more limited capabilities and does not support some of the advanced features available in AlphaFold3.
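
Before submitting a job, it can help to confirm that the parameters are where the job scripts expect them. The following is a minimal sketch, assuming the af3_parameters directory name used in the example scripts later in this guide; the function name check_af3_params is purely illustrative and not part of AlphaFold3 or the cluster software.

```shell
#!/bin/bash
# Sketch: confirm that a model-parameter directory exists and is not empty.
# The default path mirrors AF3_MODEL_DIR in the example job scripts (an assumption;
# pass your own path as the first argument if it differs).

check_af3_params() {
    local model_dir="${1:-$PWD/af3_parameters}"

    if [ ! -d "$model_dir" ]; then
        echo "Missing directory: $model_dir" >&2
        return 1
    fi

    # Look for at least one regular file anywhere below the directory.
    if [ -z "$(find "$model_dir" -type f -print -quit)" ]; then
        echo "No files found in: $model_dir" >&2
        return 1
    fi

    # Report the on-disk size so it can be compared against your storage quota.
    du -sh "$model_dir"
}
```

Calling check_af3_params with no argument checks ./af3_parameters; a non-zero return status indicates that the directory is missing or empty.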

Running AlphaFold3

A typical working directory tree may look like this:

├── input
│   └── input.json
├── output
└── af3.slurm

Full directory tree example after prediction for case 2PV7:

├── af3.slurm
├── input
│   └── input.json
├── output
│   └── 2PV7
│       ├── 2PV7_confidences.json
│       ├── 2PV7_data.json
│       ├── 2PV7_model.cif
│       ├── 2PV7_ranking_scores.csv
│       ├── 2PV7_summary_confidences.json
│       ├── seed-1_sample-0
│       │   ├── 2PV7_seed-1_sample-0_confidences.json
│       │   ├── 2PV7_seed-1_sample-0_model.cif
│       │   └── 2PV7_seed-1_sample-0_summary_confidences.json
│       ├── seed-1_sample-1
│       │   ├── 2PV7_seed-1_sample-1_confidences.json
│       │   ├── 2PV7_seed-1_sample-1_model.cif
│       │   └── 2PV7_seed-1_sample-1_summary_confidences.json
│       ├── seed-1_sample-2
│       │   ├── 2PV7_seed-1_sample-2_confidences.json
│       │   ├── 2PV7_seed-1_sample-2_model.cif
│       │   └── 2PV7_seed-1_sample-2_summary_confidences.json
│       ├── seed-1_sample-3
│       │   ├── 2PV7_seed-1_sample-3_confidences.json
│       │   ├── 2PV7_seed-1_sample-3_model.cif
│       │   └── 2PV7_seed-1_sample-3_summary_confidences.json
│       ├── seed-1_sample-4
│       │   ├── 2PV7_seed-1_sample-4_confidences.json
│       │   ├── 2PV7_seed-1_sample-4_model.cif
│       │   └── 2PV7_seed-1_sample-4_summary_confidences.json
│       └── TERMS_OF_USE.md
└── slurm-<jobID>.out
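
To see which sample ranked highest without opening each sample directory, the ranking file can be sorted from the shell. The sketch below assumes that 2PV7_ranking_scores.csv has one header row followed by rows of the form seed,sample,ranking_score; check the header of your own file before relying on this. The function name best_af3_sample is illustrative.

```shell
#!/bin/bash
# Sketch: print the row with the highest ranking score from a ranking_scores.csv file.
# Assumes a header line followed by "seed,sample,ranking_score" rows.

best_af3_sample() {
    local csv="$1"
    # Skip the header, sort on column 3 as a general number in descending order,
    # and keep only the first (highest-scoring) row.
    tail -n +2 "$csv" | sort -t, -k3,3gr | head -n 1
}
```

For the tree above, this could be called as best_af3_sample output/2PV7/2PV7_ranking_scores.csv; the seed and sample values in the printed row identify the matching seed-*_sample-* directory.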

Input File

AlphaFold3 expects a single .json input file describing the molecular system to be predicted. The input format is documented in the official AlphaFold3 input documentation. Place this file in the input directory within your working directory. An example input.json file for a homodimer prediction case named 2PV7 is available at /project/rcde/public_examples/alphafold3/input.json.

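
To illustrate the general shape of a homodimer input, the following sketch writes a minimal input file from the shell. The job name and the sequence are placeholders chosen for illustration and are not the actual 2PV7 entry; the file name example_input.json is likewise hypothetical. For the real 2PV7 case, use the example file from the path given above.

```shell
#!/bin/bash
# Sketch: write a minimal AlphaFold3 input file for a homodimer.
# The name and sequence are placeholders, NOT the real 2PV7 data.

mkdir -p input

cat > input/example_input.json <<'EOF'
{
  "name": "homodimer_example",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
EOF
```

Listing two chain IDs for a single protein entry requests two copies of the same sequence, which is how a homodimer is expressed in this format.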

SLURM Job Script

Pipeline and Inference Workflow

There are two major stages in the AlphaFold3 workflow: data_pipeline and inference. The data_pipeline stage performs genetic and template searches to generate sequence alignments and template features. This stage is CPU-only and can be time-consuming depending on the input size and database searches. The inference stage performs the structure prediction using the AlphaFold3 deep learning model and requires access to a GPU.

note

The example scripts in this guide are written for the sample 2PV7 workflow and assume the directory structure shown above. Your actual working directory and file locations may differ. Before submitting a job, verify that all directory paths, file names, and environment variable settings match your local setup. The example job scripts are also available at /project/rcde/public_examples/alphafold3/.

MSA stage (CPU-only)

#!/bin/bash

#SBATCH --job-name=alphafold3-MSA
#SBATCH --nodes=1
#SBATCH --tasks-per-node=4
#SBATCH --mem=16G
#SBATCH --time=2:0:0


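cd "$SLURM_SUBMIT_DIR"

module load alphafold3

# Adjust these paths to match your own directory layout
export AF3_INPUT_DIR="$PWD/af3/input"
export AF3_OUTPUT_DIR="$PWD/af3/output"
export AF3_MODEL_DIR="$PWD/af3_parameters" # Please make sure the model parameter file exists here

mkdir -p "$AF3_INPUT_DIR"
mkdir -p "$AF3_OUTPUT_DIR"

# Copy the example input file into the input directory
cp /project/rcde/public_examples/alphafold3/input.json "$AF3_INPUT_DIR"

# Run only the data pipeline (genetic and template search); inference is skipped
alphafold3 --json_path=/input/input.json --norun_inference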
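
Because the inference job reads the _data.json file written by this stage, a quick check after the MSA job finishes can catch a failed run early. The sketch below assumes the output layout shown in the directory tree earlier in this guide (output directory, then a folder named after the job, then NAME_data.json); the helper name msa_output_ready is hypothetical.

```shell
#!/bin/bash
# Sketch: check that the data pipeline produced a non-empty <name>_data.json file.
# The layout <output_dir>/<name>/<name>_data.json is taken from the example tree.

msa_output_ready() {
    local output_dir="$1"   # e.g. "$PWD/af3/output"
    local name="$2"         # e.g. 2PV7
    local data_json="$output_dir/$name/${name}_data.json"

    if [ -s "$data_json" ]; then
        echo "Found $data_json"
    else
        echo "MSA output not found or empty: $data_json" >&2
        return 1
    fi
}
```

For the example case this could be called as msa_output_ready "$PWD/af3/output" 2PV7 before submitting the inference job.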

Inference stage (GPU)

The AlphaFold3 documentation provides official benchmark results for NVIDIA A100 and H100 GPUs. For most AlphaFold3 prediction workloads on the cluster, we recommend A100 GPUs: they perform well for inference and leave H100 resources available for large-scale model training and other GPU-intensive workloads. In addition, H100 GPUs may have longer queue wait times due to higher demand.

For additional details regarding supported molecule sizes and GPU memory requirements, please refer to the official AlphaFold3 performance documentation.

#!/bin/bash

#SBATCH --job-name=alphafold3
#SBATCH --nodes=1
#SBATCH --tasks-per-node=4
#SBATCH --gpus-per-node=a100:1
#SBATCH --mem=14G
#SBATCH --time=1:0:0


cd "$SLURM_SUBMIT_DIR"

module load alphafold3

# The input for this step should be the .json generated by the data_pipeline stage, not the original input.json file.
export AF3_INPUT_DIR="$PWD/af3/output/2PV7"
export AF3_OUTPUT_DIR="$PWD/af3/output/2PV7"
export AF3_MODEL_DIR="$PWD/af3_parameters"

# Make sure the MSA job has finished and 2PV7_data.json exists before submitting this job
# Run inference only; the data pipeline is skipped
alphafold3 --json_path=/input/2PV7_data.json --norun_data_pipeline
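
Since the inference job depends on the MSA results, the two jobs can be submitted together with a Slurm job dependency so that inference starts only after the MSA job completes successfully. The following is a minimal sketch using the standard sbatch options --parsable and --dependency; the function name and the script file names in the usage comment are placeholders, not files provided by the cluster.

```shell
#!/bin/bash
# Sketch: submit the MSA job, then an inference job that waits for it to succeed.

submit_af3_chain() {
    local msa_script="$1"
    local inference_script="$2"
    local msa_job_id

    # --parsable makes sbatch print only the job ID (optionally followed by ";cluster").
    msa_job_id=$(sbatch --parsable "$msa_script") || return 1
    msa_job_id="${msa_job_id%%;*}"

    # afterok: the inference job becomes eligible only if the MSA job exits successfully.
    sbatch --dependency="afterok:${msa_job_id}" "$inference_script"
}

# Usage (script names are placeholders):
#   submit_af3_chain af3_msa.slurm af3_inference.slurm
```

If the MSA job fails, the dependent inference job never becomes eligible to run; it can be removed from the queue with scancel.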