AlphaFold3
AlphaFold3 is Google DeepMind's deep learning model for predicting the structures and interactions of biological macromolecules, including proteins, nucleic acids, small molecules, ions, and post-translationally modified residues. AlphaFold2 focused on protein structure. AlphaFold3 builds on it and extends prediction to complexes of all these molecule types, enabling accurate modeling of biomolecular interactions.
In November 2024, the developers released the AlphaFold3 source code on GitHub. The methodology is described in the accompanying Nature paper, published in May 2024. In addition to the software itself, AlphaFold3 requires approximately 630 GB of genetic and structural databases for sequence and template searches. A centrally maintained copy of these databases is available on the cluster at /datasets/alphafold3/.
Researchers who want to run AlphaFold3 structure predictions on the cluster are encouraged to follow the guide below and to use the centrally maintained databases provided by the HPC system.
Model Parameters
Due to AlphaFold3's licensing restrictions, users must obtain the model parameters directly from Google DeepMind. To obtain the model parameters:
- Visit the following form: AlphaFold3 Model Request Form
- Submit the request using an institutional email address.
- Once approved, you will receive instructions for downloading a folder containing the model parameters.
- After downloading, place the model parameter file in a directory you control. The example job scripts below expect it in af3_parameters/ inside the directory the job is submitted from (AF3_MODEL_DIR).
- The AlphaFold3 model parameter file requires approximately 1.1 GB of storage, so make sure it fits within your storage quota.
- Palmetto 2 cannot distribute the AlphaFold3 model parameters. Users are responsible for obtaining, storing, and managing the model parameter files required for running AlphaFold3.
- Users who are unable to obtain the AlphaFold3 model parameters may consider using AlphaFold 2.3.2, which is also available on Palmetto 2. However, AlphaFold 2.3.2 has more limited capabilities and does not support some of the advanced features available in AlphaFold3.
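Palmetto 2 cannot provide the parameters, so it can help to confirm that they are in place before submitting a job. The function below is a minimal sketch. The file names af3.bin and af3.bin.zst are assumptions about what the DeepMind download contains, and check_af3_params is a hypothetical helper name.

```shell
#!/bin/bash
# Pre-flight check for the AlphaFold3 model parameters (a sketch).
# Assumption: the parameter file is named af3.bin.zst, or af3.bin once
# decompressed; adjust the names if your download differs.
check_af3_params() {
  local dir=${1:?usage: check_af3_params MODEL_DIR}
  local f
  for f in "$dir"/af3.bin "$dir"/af3.bin.zst; do
    if [ -f "$f" ]; then
      # Report the file and its size so quota usage is visible
      echo "Found model parameters: $f ($(du -h "$f" | cut -f1))"
      return 0
    fi
  done
  echo "No AlphaFold3 parameter file found in $dir" >&2
  return 1
}

# Example (e.g., near the top of a job script):
#   check_af3_params "$PWD/af3_parameters" || exit 1
```

Failing early in the job script avoids waiting in the queue only to find that the parameters are missing.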
Running AlphaFold3
A typical working directory tree may look like this:
├── input
│ └── input.json
├── output
└── af3.slurm
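The layout above can be created in one step. This is a minimal sketch; new_af3_workdir is a hypothetical helper, and the directory name is only a label for your prediction.

```shell
#!/bin/bash
# Create the input/ and output/ directories of a new AlphaFold3 working directory.
new_af3_workdir() {
  local root=${1:?usage: new_af3_workdir DIR}
  mkdir -p "$root/input" "$root/output"
  echo "$root"
}

# Example:
#   new_af3_workdir "$HOME/af3_jobs/2PV7"
# then place input.json in .../input and af3.slurm in the top-level directory.
```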
Full directory tree example after prediction for case 2PV7:
2PV7
├── af3.slurm
├── input
│ └── input.json
├── output
│ └── 2PV7
│ ├── 2PV7_confidences.json
│ ├── 2PV7_data.json
│ ├── 2PV7_model.cif
│ ├── 2PV7_ranking_scores.csv
│ ├── 2PV7_summary_confidences.json
│ ├── seed-1_sample-0
│ │ ├── 2PV7_seed-1_sample-0_confidences.json
│ │ ├── 2PV7_seed-1_sample-0_model.cif
│ │ └── 2PV7_seed-1_sample-0_summary_confidences.json
│ ├── seed-1_sample-1
│ │ ├── 2PV7_seed-1_sample-1_confidences.json
│ │ ├── 2PV7_seed-1_sample-1_model.cif
│ │ └── 2PV7_seed-1_sample-1_summary_confidences.json
│ ├── seed-1_sample-2
│ │ ├── 2PV7_seed-1_sample-2_confidences.json
│ │ ├── 2PV7_seed-1_sample-2_model.cif
│ │ └── 2PV7_seed-1_sample-2_summary_confidences.json
│ ├── seed-1_sample-3
│ │ ├── 2PV7_seed-1_sample-3_confidences.json
│ │ ├── 2PV7_seed-1_sample-3_model.cif
│ │ └── 2PV7_seed-1_sample-3_summary_confidences.json
│ ├── seed-1_sample-4
│ │ ├── 2PV7_seed-1_sample-4_confidences.json
│ │ ├── 2PV7_seed-1_sample-4_model.cif
│ │ └── 2PV7_seed-1_sample-4_summary_confidences.json
│ └── TERMS_OF_USE.md
└── slurm-<jobID>.out
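To see which seed/sample directory scored best, you can read the ranking file. The sketch below assumes that 2PV7_ranking_scores.csv has a header line followed by rows of the form seed,sample,ranking_score, with higher scores being better. Please confirm this against the AlphaFold3 output documentation. best_af3_sample is a hypothetical helper.

```shell
#!/bin/bash
# Print the best-ranked sample from an AlphaFold3 *_ranking_scores.csv file.
# Assumption: header line, then "seed,sample,ranking_score" rows.
best_af3_sample() {
  local csv=${1:?usage: best_af3_sample RANKING_CSV}
  # Skip the header, sort by the score column (general numeric, descending),
  # keep the top row, and format it like the seed-N_sample-M directory names.
  tail -n +2 "$csv" | LC_ALL=C sort -t, -k3,3gr | head -n 1 |
    awk -F, '{ printf "seed-%s_sample-%s ranking_score=%s\n", $1, $2, $3 }'
}

# Example:
#   best_af3_sample output/2PV7/2PV7_ranking_scores.csv
```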
Input File
AlphaFold3 expects a single .json input file describing the molecular system to be predicted. The input format is documented in the official AlphaFold3 input documentation. Place this file in the input directory of your working directory. An example input.json file for a homodimer prediction case named 2PV7 is available at /project/rcde/public_examples/alphafold3/input.json.
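For a new system, the input file can also be generated from a script. The function below is a minimal sketch that writes a homodimer input in the AlphaFold3 JSON format: a single protein entry whose "id" lists two chains, plus modelSeeds, dialect, and version. write_af3_homodimer_input is a hypothetical helper. The sequence in the usage line is an arbitrary placeholder, not the 2PV7 sequence. Check the official input documentation before adding ligands, nucleic acids, or other entity types.

```shell
#!/bin/bash
# Write a minimal AlphaFold3 input file for a protein homodimer (a sketch).
# Listing two chain IDs in "id" requests two copies of the same chain.
write_af3_homodimer_input() {
  local name=${1:?usage: write_af3_homodimer_input NAME SEQUENCE OUT_JSON}
  local seq=${2:?missing SEQUENCE}
  local out=${3:?missing OUT_JSON}
  mkdir -p "$(dirname "$out")"
  cat > "$out" <<EOF
{
  "name": "$name",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "$seq"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
EOF
}

# Example (placeholder sequence):
#   write_af3_homodimer_input my_dimer MKTAYIAKQRQISFVKSHFSRQ input/input.json
```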
SLURM Job Script
Pipeline and Inference Workflow
There are two major stages in the AlphaFold3 workflow: data_pipeline and inference. The data_pipeline stage (the MSA stage below) performs genetic and template searches to generate multiple sequence alignments (MSAs) and template features. This stage is CPU-only and can be time-consuming, depending on the input size and the database searches. The inference stage predicts the structure with the AlphaFold3 deep learning model and requires a GPU. Running the two stages as separate jobs keeps a GPU from sitting idle during the CPU-bound searches.
The example scripts in this guide are written for the sample 2PV7 workflow and assume the directory structure shown above. Your actual working directory and file locations may differ. Before submitting a job, carefully verify all directory paths, file names, and environment variable settings to make sure they match your setup. The following example job scripts are also available at /project/rcde/public_examples/alphafold3/.
MSA stage (CPU-only)
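#!/bin/bash
#SBATCH --job-name=alphafold3-MSA
#SBATCH --nodes=1
#SBATCH --tasks-per-node=4
#SBATCH --mem=16G
#SBATCH --time=2:0:0
# Run from the directory the job was submitted from (e.g., 2PV7/)
cd "$SLURM_SUBMIT_DIR"
module load alphafold3
# Host directories used by the alphafold3 module, following the directory tree shown above
export AF3_INPUT_DIR=$PWD/input
export AF3_OUTPUT_DIR=$PWD/output
export AF3_MODEL_DIR=$PWD/af3_parameters # Please make sure the model parameter file exists here
mkdir -p "$AF3_INPUT_DIR" "$AF3_OUTPUT_DIR"
# Copy the sample 2PV7 input; skip this line when using your own input.json
cp /project/rcde/public_examples/alphafold3/input.json "$AF3_INPUT_DIR"
# Data pipeline only (genetic and template searches, no GPU needed).
# /input/... is not a cluster path: the alphafold3 module appears to map $AF3_INPUT_DIR to /input.
alphafold3 --json_path=/input/input.json --norun_inference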
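Before submitting the inference stage, you can confirm that the MSA stage wrote its *_data.json file. This is a minimal sketch. find_af3_data_json is a hypothetical helper. A case-insensitive match is used because the output file name may not preserve the capitalization of the "name" field. Treat that as an assumption and check your own output directory.

```shell
#!/bin/bash
# Locate the <name>_data.json written by the data_pipeline (MSA) stage.
# Prints the path on success; returns 1 if the file is not there yet.
find_af3_data_json() {
  local out_dir=${1:?usage: find_af3_data_json OUTPUT_DIR NAME}
  local name=${2:?missing NAME}
  local f
  f=$(find "$out_dir" -maxdepth 2 -type f -iname "${name}_data.json" | head -n 1)
  if [ -z "$f" ]; then
    echo "No ${name}_data.json under $out_dir; has the MSA stage finished?" >&2
    return 1
  fi
  echo "$f"
}

# Example:
#   find_af3_data_json "$PWD/output" 2PV7
```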
Inference stage (GPU)
The AlphaFold3 documentation provides official benchmark results for NVIDIA A100 and H100 GPUs. For most AlphaFold3 prediction workloads on the cluster, we recommend A100 GPUs. They provide excellent performance for inference and help preserve H100 resources for large-scale model training and other GPU-intensive workloads. H100 GPUs may also have longer queue wait times because of higher demand.
For additional details regarding supported molecule sizes and GPU memory requirements, please refer to the official AlphaFold3 performance documentation.
#!/bin/bash
#SBATCH --job-name=alphafold3
#SBATCH --nodes=1
#SBATCH --tasks-per-node=4
#SBATCH --gpus-per-node=a100:1
#SBATCH --mem=14G
#SBATCH --time=1:0:0
cd "$SLURM_SUBMIT_DIR"
module load alphafold3
# The input for this step is the *_data.json generated by the data_pipeline (MSA) stage,
# not the original input.json file.
export AF3_INPUT_DIR=$PWD/output/2PV7
export AF3_OUTPUT_DIR=$PWD/output/2PV7
export AF3_MODEL_DIR=$PWD/af3_parameters # Please make sure the model parameter file exists here
# Make sure the MSA results exist before this job starts,
# e.g. submit with: sbatch --dependency=afterok:<MSA job ID> af3.slurm
# /input/... is the path as seen by AlphaFold3; it appears to map to $AF3_INPUT_DIR.
alphafold3 --json_path=/input/2PV7_data.json --norun_data_pipeline
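The inference job must not start before the MSA job has produced its *_data.json, so the two submissions can be chained with a SLURM job dependency. This is a minimal sketch. It assumes the two scripts above are saved under the hypothetical names msa.slurm and inference.slurm, and submit_af3_pipeline is a hypothetical helper. It uses only the standard sbatch options --parsable and --dependency=afterok.

```shell
#!/bin/bash
# Submit the MSA (CPU) job, then the inference (GPU) job with an afterok
# dependency, so inference starts only if the MSA job exits successfully.
submit_af3_pipeline() {
  local msa_script=${1:?usage: submit_af3_pipeline MSA_SCRIPT INFERENCE_SCRIPT}
  local inf_script=${2:?missing INFERENCE_SCRIPT}
  local msa_id
  msa_id=$(sbatch --parsable "$msa_script") || return 1
  msa_id=${msa_id%%;*}   # --parsable prints "jobid" or "jobid;cluster"
  # Prints the job ID of the inference job
  sbatch --parsable --dependency=afterok:"$msa_id" "$inf_script"
}

# Example:
#   submit_af3_pipeline msa.slurm inference.slurm
```

With afterok, the inference job waits in the queue until the MSA job finishes with exit code 0. If the MSA job fails, the dependent job can never start and should be cancelled with scancel.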