Public Datasets
RCD hosts a number of widely used public datasets on Indigo under /datasets,
so users do not need to keep their own copies in their Home, Project, or
Scratch spaces.
Please submit a ticket to request changes/additions to this list.
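
As a rough illustration of how a job might use one of the shared datasets in the table below, here is a minimal batch-script sketch for the BLAST entry. The file names `query.fasta` and `results.tsv` are hypothetical, and the sketch assumes the `blastp` command from NCBI BLAST+ is already on your `PATH` (for example, through an environment you have loaded). Add your own scheduler directives as needed.

```shell
#!/bin/bash
# Point BLAST+ at the shared databases under /datasets (path taken from the table).
export BLASTDB=/datasets/blast/db

# Hypothetical input and output names; replace with your own files.
QUERY=query.fasta
OUT=results.tsv

# Run the search only when BLAST+ and the query file are actually available.
if command -v blastp >/dev/null 2>&1 && [ -f "$QUERY" ]; then
    # -db takes a database name that BLAST+ resolves through $BLASTDB;
    # -outfmt 6 writes tab-separated hits.
    blastp -db swissprot -query "$QUERY" -out "$OUT" -outfmt 6
else
    echo "blastp or $QUERY not found; skipping search" >&2
fi
```

Because the databases are read directly from /datasets, nothing needs to be copied into Home, Project, or Scratch space.
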
| Dataset | Path | Description | 
|---|---|---|
| AlphaFold | /datasets/alphafold | Genetic databases for running AlphaFold inference. Use `--data_dir=/datasets/alphafold` when running your model. |
| BLAST | /datasets/blast/db | `nt`, `nr`, `refseq_protein`, `refseq_rna`, and `swissprot` databases for use with NCBI BLAST+. Put `export BLASTDB=/datasets/blast/db` at the beginning of your batch scripts or in `~/.bashrc` to use these databases. |
| iGenomes | /datasets/igenomes | Reference genome sequences downloaded from AWS iGenomes. For use in nf-core Nextflow pipelines, see the usage instructions. |
| ImageNet | /datasets/imagenet |