
RepeatModeler

RepeatModeler is a de novo transposable element family identification and modeling package. It is built around 3 core programs (RECON, RepeatScout, and LtrHarvest/Ltr_retriever) that identify repeat element boundaries and family relationships from sequence data.

As input it takes a database built with BuildDatabase from an input sequence file. It outputs 3 primary files: families.fa, families.stk, and rmod.log. families.fa contains the consensus sequences, families.stk contains the seed alignments, and rmod.log is a summarized log of the run. Note that this program generates A LOT of temporary files that ARE NOT cleaned up between runs, so you will have to clean up your directory manually between runs. Also note that runtimes are long, so plan your resource request accordingly. This module also loads BuildDatabase and RepeatClassifier.
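The manual cleanup mentioned above could be scripted along these lines. This is a minimal sketch, not part of the module: the helper name `clean_rm_tmp` is hypothetical, and it assumes the temporary working directories follow RepeatModeler's usual `RM_<PID>.<DATE>` naming. List the matches with `ls -d RM_*` before deleting anything.

```shell
# Sketch: remove leftover RepeatModeler working directories.
# Assumption: they are named RM_<PID>.<DATE>; verify before deleting.
clean_rm_tmp() {
    dir="${1:-.}"   # directory to clean; defaults to the current directory
    # Match only directories directly inside $dir whose names start with "RM_";
    # other files and directories (including final outputs) are left untouched.
    find "$dir" -mindepth 1 -maxdepth 1 -type d -name 'RM_*' -exec rm -rf {} +
}
```

For example, `clean_rm_tmp .` would clean the current directory after the final output files have been copied somewhere safe.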

For additional information and examples see the documentation

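ml biocontainers repeatmodeler
# Build a database named "fish" from the input FASTA file
BuildDatabase -name fish fish.fa
# Replace N with the number of threads to use (a literal # would start a shell comment)
RepeatModeler -database fish -threads N -LTRStruct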

Parallel Capabilities: Single core by default; multithreading is supported through the -threads option.
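One way to keep the thread count in step with the resource request is to derive it from the scheduler's environment. The sketch below assumes jobs run under Slurm, which sets `SLURM_CPUS_PER_TASK` when `--cpus-per-task` is requested. The helper name `pick_threads` is hypothetical, and other schedulers would need a different variable.

```shell
# Sketch: choose a thread count for RepeatModeler.
# Assumption: under Slurm, SLURM_CPUS_PER_TASK holds the CPUs requested per task.
pick_threads() {
    # Fall back to 1 thread (the single-core default) when the variable is unset or empty.
    echo "${SLURM_CPUS_PER_TASK:-1}"
}
```

Inside a job script this could be used as `RepeatModeler -database fish -threads "$(pick_threads)" -LTRStruct`.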