AlphaFold2
AlphaFold is an artificial intelligence system developed by Google DeepMind and Isomorphic Labs that predicts the three-dimensional structures of proteins and other biomolecular complexes from their amino acid sequences. By achieving near-experimental accuracy for many proteins, AlphaFold has transformed structural biology, enabling researchers to obtain structural insights in hours or minutes for problems that previously required months or years of experimental effort.
AlphaFold2 uses a collection of sequence and structure databases to generate
MSAs and identify structural templates. A centrally maintained public dataset is
available on the cluster at /datasets/alphafold/. For details about the
available databases and their contents, please refer to our
public datasets page.
AlphaFold2 performs CPU-intensive MSA and template searches before running the GPU-based structure prediction stage. As a result, it is normal to receive job notifications reporting 0% GPU utilization. The AlphaFold2 container currently used on the cluster runs these stages as part of the same workflow, so users must allocate a GPU for the full job even though the GPU is not used during the initial MSA and template search steps.
For this reason, we recommend using AlphaFold3 when appropriate, since its workflow separates the data pipeline and inference stages and can help reduce unnecessary GPU allocation time. AlphaFold2 remains available as an alternative for users who have not yet obtained access to the AlphaFold3 model parameters.
Input file
AlphaFold2 accepts standard FASTA files.
For monomer prediction, provide a single protein sequence preceded by a header
line beginning with >. For multi-chain predictions, include multiple sequences
in the same FASTA file, each with its own header line, and use
--model_preset=multimer. A monomer example, my.fasta, for PDB entry 2PV7 is
shown below. This example file is also available in
/project/rcde/public_examples/alphafold/.
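Separately from the example file, the FASTA rules described above can be checked before a job is submitted. The following is a minimal Python sketch, not part of AlphaFold2 or of the cluster tooling: the function names read_fasta and suggest_model_preset are hypothetical, and the sequences in the demo are short placeholders rather than real proteins. It counts the records in a FASTA file and reports whether the input looks like a monomer or a multimer case. Only the multimer preset is named on this page, so treat the "monomer" return value as a label for the single-sequence case, not as a flag value confirmed for this cluster.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before any '>' header line")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def suggest_model_preset(text):
    """Return 'monomer' for one record and 'multimer' for several (hypothetical helper)."""
    records = read_fasta(text)
    if not records:
        raise ValueError("no FASTA records found")
    for header, sequence in records:
        if not sequence:
            raise ValueError(f"record '{header}' has no sequence")
    return "multimer" if len(records) > 1 else "monomer"


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    single_chain = ">chain_A\nMKTAYIAK\n"
    two_chains = ">chain_A\nMKTAYIAK\n>chain_B\nGSHMLEDP\n"
    print(suggest_model_preset(single_chain))
    print(suggest_model_preset(two_chains))
```

To check a real input, read the file contents first, for example suggest_model_preset(open("my.fasta").read()), and add --model_preset=multimer to the job only when the result is "multimer".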
my.fasta from Globus...
Running AlphaFold2
Before submitting the job, make sure the input sequence file is located in the
working directory. The dataset directory is available through the
$ALPHAFOLD_DATASET environment variable, which is automatically configured
when the module is loaded. The example job script af.slurm shown below is
configured for the my.fasta example input and is also available in
/project/rcde/public_examples/alphafold/.
af.slurm from Globus...
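The two prerequisites described in this section (the input file in the working directory, and $ALPHAFOLD_DATASET set by the module) can be verified before submitting. The sketch below is an assumption-laden illustration rather than a site-provided tool: the preflight function is hypothetical, and it only inspects the environment variable and the file location named on this page. Run it in the same shell session where the AlphaFold module has been loaded.

```python
import os
from pathlib import Path


def preflight(fasta_name, env=None, workdir="."):
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper: checks only what this page describes, namely that
    ALPHAFOLD_DATASET is set to an existing directory and that the FASTA
    input sits in the working directory.
    """
    env = os.environ if env is None else env
    problems = []

    dataset = env.get("ALPHAFOLD_DATASET")
    if not dataset:
        problems.append("ALPHAFOLD_DATASET is not set; load the AlphaFold module first")
    elif not Path(dataset).is_dir():
        problems.append(f"ALPHAFOLD_DATASET points to a missing directory: {dataset}")

    if not (Path(workdir) / fasta_name).is_file():
        problems.append(f"{fasta_name} not found in {Path(workdir).resolve()}")

    return problems


if __name__ == "__main__":
    # Uses the example input name from this page.
    for problem in preflight("my.fasta"):
        print("WARNING:", problem)
```

If the script prints nothing, both conditions hold and the job script can be submitted as usual.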