Public Datasets

RCD hosts several widely used public datasets on Indigo under /datasets, so users do not need to keep their own copies in Home, Project, or Scratch space.

Please submit a ticket to request changes/additions to this list.

| Dataset | Path | Description |
| --- | --- | --- |
| AlphaFold | /datasets/alphafold | Genetic databases for running AlphaFold inference. Use --data_dir=/datasets/alphafold when running your model. |
| BLAST | /datasets/blast/db | nt, nr, refseq_protein, refseq_rna, and swissprot databases for use with NCBI BLAST+. Put export BLASTDB=/datasets/blast/db at the beginning of your batch scripts or in ~/.bashrc to use these databases. |
| iGenomes | /datasets/igenomes | Reference genome sequences downloaded from AWS iGenomes. For use in nf-core Nextflow pipelines, see the usage instructions. |
| ImageNet | /datasets/imagenet | |
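
As an illustration of the BLAST entry above, the following is a minimal sketch of how the start of a batch script might set BLASTDB and run a search against the shared swissprot database. The file names query.fasta and results.tsv are hypothetical placeholders, and the script assumes blastp from NCBI BLAST+ is already on your PATH; how BLAST+ is made available on Indigo (for example, through a module) is not covered here.

```shell
#!/bin/bash
# Point BLAST+ at the shared databases (path taken from the table above).
export BLASTDB=/datasets/blast/db

# Hypothetical input and output names, for illustration only.
QUERY=query.fasta
OUT=results.tsv

# Run the search only if blastp is on PATH and the shared directory exists;
# otherwise print a message to stderr and continue.
if command -v blastp >/dev/null 2>&1 && [ -d "$BLASTDB" ]; then
    # With BLASTDB set, the database is referenced by name, not by full path.
    blastp -db swissprot -query "$QUERY" -outfmt 6 -out "$OUT"
else
    echo "blastp or $BLASTDB not available; skipping search" >&2
fi
```

Because BLASTDB is exported, the database can be named directly (swissprot) and nothing needs to be copied into Home, Project, or Scratch space.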