AlphaFold2

AlphaFold is an artificial intelligence system developed by Google DeepMind and Isomorphic Labs that predicts the three-dimensional structures of proteins and other biomolecular complexes from their amino acid sequences. By achieving near-experimental accuracy for many proteins, AlphaFold has transformed structural biology, enabling researchers to obtain structural insights in hours or minutes for problems that previously required months or years of experimental effort.

AlphaFold2 uses a collection of sequence and structure databases to generate multiple sequence alignments (MSAs) and identify structural templates. A centrally maintained public copy of these databases is available on the cluster at /datasets/alphafold/. For details about the available databases and their contents, please refer to our public datasets page.

Warning

AlphaFold2 performs CPU-intensive MSA and template searches before running the GPU-based structure prediction stage. As a result, it is normal to receive job notifications reporting 0% GPU utilization. The AlphaFold2 container currently used on the cluster runs these stages as part of the same workflow, so users must allocate a GPU for the full job even though the GPU is not used during the initial MSA and template search steps.

For this reason, we recommend using AlphaFold3 when appropriate, since its workflow separates the data pipeline and inference stages and can help reduce unnecessary GPU allocation time. AlphaFold2 remains available as an alternative for users who have not yet obtained access to the AlphaFold3 model parameters.

Input file

AlphaFold2 accepts standard FASTA files. For monomer prediction, provide a single protein sequence preceded by a header line beginning with >. For multi-chain predictions, include multiple sequences in the same FASTA file, each with its own header line, and use --model_preset=multimer. A monomer example for 2PV7, my.fasta, is available in /project/rcde/public_examples/alphafold/.
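
The FASTA rules above can be checked before a job is submitted. The following is a minimal Python sketch, not part of AlphaFold2 or of the cluster tooling: the function names parse_fasta and needs_multimer_preset are hypothetical, and the only rules it encodes are the ones stated above (each sequence has its own header line beginning with >, and more than one sequence calls for --model_preset=multimer).

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Hypothetical helper for illustration; it only enforces that every
    sequence is preceded by a '>' header line and is not empty.
    """
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    if not records:
        raise ValueError("no FASTA records found")
    for name, seq in records:
        if not seq:
            raise ValueError(f"record {name!r} has a header but no sequence")
    return records


def needs_multimer_preset(records):
    """Return True when the file holds more than one sequence."""
    return len(records) > 1
```

Assuming the input file is in the current directory, parse_fasta(open("my.fasta").read()) returns its records, and needs_multimer_preset on that result indicates whether the job should be run with --model_preset=multimer.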

Running AlphaFold2

Before submitting the job, make sure the input sequence file is located in the working directory. The dataset directory is available through the $ALPHAFOLD_DATASET environment variable, which is automatically configured when the module is loaded. The example job script af.slurm, configured for the my.fasta example input, is available in /project/rcde/public_examples/alphafold/.
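
The two preconditions described above (the input file is in the working directory, and $ALPHAFOLD_DATASET is set once the module is loaded) can be verified with a short pre-flight check. This is a sketch under stated assumptions: the function name preflight is hypothetical, and it assumes $ALPHAFOLD_DATASET should point to an existing directory, which the text above implies but does not state outright.

```python
import os
from pathlib import Path


def preflight(fasta_name, workdir=".", env=None):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper for illustration. `env` defaults to the real
    environment and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    problems = []

    # The input sequence file must be in the working directory.
    fasta = Path(workdir) / fasta_name
    if not fasta.is_file():
        problems.append(f"input file not found in working directory: {fasta}")

    # ALPHAFOLD_DATASET is configured when the module is loaded.
    dataset = env.get("ALPHAFOLD_DATASET")
    if not dataset:
        problems.append("ALPHAFOLD_DATASET is not set; load the AlphaFold2 module first")
    elif not Path(dataset).is_dir():
        # Assumption: the variable should name an existing directory.
        problems.append(f"ALPHAFOLD_DATASET does not point to a directory: {dataset}")

    return problems
```

Calling preflight("my.fasta") from the job's working directory after loading the module should return an empty list. Any returned messages describe what to fix before submitting af.slurm.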
