Public Datasets
RCD hosts a number of widely used, publicly available datasets on Indigo,
available at `/datasets`. This eliminates the need for users to store their
own copies in their Home, Project, or Scratch spaces.
To request changes or additions to this list, please submit a ticket.
Dataset | Path | Description |
---|---|---|
AlphaFold | /datasets/alphafold | Genetic databases for running AlphaFold inference. Pass `--data_dir=/datasets/alphafold` when running your model. |
BLAST | /datasets/blast/db | `nt`, `nr`, `refseq_protein`, `refseq_rna`, and `swissprot` databases for use with NCBI BLAST+. To use them, put `export BLASTDB=/datasets/blast/db` at the beginning of your batch scripts or in `~/.bashrc`. |
iGenomes | /datasets/igenomes | Reference genome sequences downloaded from AWS iGenomes. For use in nf-core Nextflow pipelines, see the usage instructions. |
ImageNet | /datasets/imagenet | |
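
The paths in the table can be wired into a job script along the following lines. This is a minimal sketch rather than an official RCD template: `DATASETS_ROOT`, `dataset_path`, and `ALPHAFOLD_DATA_FLAG` are illustrative names introduced here, `query.fasta` is a hypothetical input file, and scheduler directives are left out because this page does not specify them.

```shell
#!/bin/bash
# Sketch only: DATASETS_ROOT, dataset_path, and ALPHAFOLD_DATA_FLAG are
# illustrative names, not something provided by Indigo.

# Root of the shared datasets, as listed in the table above.
# Overridable so the script can be tried on a machine without /datasets.
DATASETS_ROOT="${DATASETS_ROOT:-/datasets}"

# Print the full path of a dataset given its path relative to the root.
dataset_path() {
    printf '%s/%s\n' "$DATASETS_ROOT" "$1"
}

# Make the shared BLAST databases visible to NCBI BLAST+ tools.
BLASTDB="$(dataset_path blast/db)"
export BLASTDB

# Warn (without aborting) if the directory is not visible on this node.
if [ ! -d "$BLASTDB" ]; then
    echo "warning: $BLASTDB not found on this node" >&2
fi

# AlphaFold flag from the table, built once for reuse in a later command.
ALPHAFOLD_DATA_FLAG="--data_dir=$(dataset_path alphafold)"

# Example BLAST+ call, commented out because query.fasta is hypothetical:
# blastn -db nt -query query.fasta -out results.txt
```

With `BLASTDB` exported, BLAST+ tools can refer to a database by name (for example `-db nt`) rather than by its full path. Building the paths from a single root variable is a convenience for testing the script elsewhere; hard-coding `/datasets/...` as shown in the table works just as well on Indigo.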