Diamond
Diamond is a sequence aligner for aligning protein or translated DNA
sequences against a protein reference database such as NR. To begin, build a
binary protein database with diamond makedb. You can then query FASTA files
against that database with diamond blastp (protein queries) or diamond blastx
(translated DNA queries). By default the search is tuned for hits with >70%
sequence identity, which are written to the output .tsv file; this can be
adjusted with command-line options. By default Diamond also detects and uses
all available virtual cores on the machine.
Diamond can also cluster proteins in a similar way to CD-HIT and UCLUST:
based on the input criteria, it finds a set of centroid (representative)
sequences and assigns each input sequence to the cluster of exactly one
representative.
For additional information, see the Diamond documentation.
ml biocontainers diamond
diamond makedb --in <referencedb> -d <dbname>
diamond blastp -q <query.fa> -d <dbname> -o <output.tsv> --<optional commands>
# clustering
diamond cluster -d <query.fa> -o <output.tsv> --<optional commands>
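As a concrete illustration of the templates above, the following sketch fills in the placeholders and a few commonly used options. The file names (reference.faa, query.faa, refdb, hits.tsv, clusters.tsv) and the threshold values are hypothetical, and the run helper is only a convenience for this example: it prints each command and executes it only when DRY_RUN=0 is set. Option availability, especially for clustering, can depend on the installed Diamond version, so check diamond help before relying on it.

```shell
#!/bin/sh
# Sketch only: file names below are hypothetical placeholders.
REF=reference.faa      # protein FASTA used to build the database
DB=refdb               # database name passed to -d
QUERY=query.faa        # protein FASTA to search or cluster

# Print each command; execute it only when DRY_RUN=0 is set.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Step 1: build the binary protein database.
run diamond makedb --in "$REF" -d "$DB"

# Step 2: search, keeping at most 5 hits per query with >=90% identity
# and e-value <= 1e-5, written in BLAST tabular format (outfmt 6).
run diamond blastp -q "$QUERY" -d "$DB" -o hits.tsv \
    --outfmt 6 --id 90 --evalue 1e-5 --max-target-seqs 5

# Clustering at roughly 90% sequence identity.
run diamond cluster -d "$QUERY" -o clusters.tsv --approx-id 90
```

Running the script as written only prints the three commands; setting DRY_RUN=0 in the environment executes them after the module has been loaded.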
Parallel Capabilities: Multithreading options supported.
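Inside a batch job, using every core Diamond detects may exceed the cores actually allocated to the job, so it can help to set the thread count explicitly with --threads. The sketch below assumes the job runs under Slurm, which sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested; pick_threads is a hypothetical helper, and query.faa, refdb, and hits.tsv are placeholder names. The command is printed rather than executed.

```shell
#!/bin/sh
# Return the thread count to use: the first argument if non-empty, else 1.
pick_threads() {
    echo "${1:-1}"
}

# Assumption: the job runs under Slurm. Outside Slurm the variable is
# unset and the count falls back to 1.
THREADS=$(pick_threads "${SLURM_CPUS_PER_TASK:-}")

# Printed rather than executed; file and database names are placeholders.
echo "diamond blastp -q query.faa -d refdb -o hits.tsv --threads $THREADS"
```

Falling back to a single thread is a conservative choice for this sketch; on a dedicated machine, omitting --threads lets Diamond use all detected cores.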