USAGE
Required:
-t <string> transcripts.fasta
Common options:
--retain_long_orfs <int>
    Retain all ORFs that are at least this many nucleotides long, even if no
    other evidence marks them as coding (default: 900 bp => 300 aa).
--retain_pfam_hits <string>
    Domain table output file from running hmmscan to search Pfam (see
    transdecoder.github.io for info). Any ORF with a Pfam domain hit will be
    retained in the final output.
--retain_blastp_hits <string>
    blastp output in '-outfmt 6' format. Any ORF with a BLAST match will be
    retained in the final output.
--single_best_orf
    Retain only the single best ORF per transcript. (Best is defined as
    having Pfam and/or BLAST support, if those options are given, and being
    the longest ORF.)
--cpu <int>
    Use multiple cores for cd-hit-est (default: 1).
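The retention options above combine as a simple filter. The following is a minimal Python sketch, not TransDecoder's actual code: it assumes a hypothetical `Orf` record, assumes Pfam hits have already been reduced to a set of ORF IDs, and ignores the Markov-model scoring the real tool also applies. It reads ORF IDs from column 1 (the query ID) of blastp '-outfmt 6' output, keeps ORFs that meet the length threshold or have support, and, for --single_best_orf, ranks supported ORFs ahead of unsupported ones and then by length.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Orf:
    orf_id: str         # hypothetical ORF identifier, e.g. "t1.p1"
    transcript_id: str  # transcript the ORF was called on
    length_nt: int      # ORF length in nucleotides


def blastp_hit_ids(lines: Iterable[str]) -> Set[str]:
    """Collect query IDs (column 1) from blastp '-outfmt 6' tabular output."""
    ids = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        ids.add(line.split("\t")[0])
    return ids


def retain_orfs(orfs: List[Orf], pfam_ids: Set[str], blast_ids: Set[str],
                min_len_nt: int = 900, single_best: bool = False) -> List[Orf]:
    """Hypothetical filter mirroring --retain_long_orfs / --retain_*_hits / --single_best_orf."""
    supported = pfam_ids | blast_ids
    # Keep long ORFs (900 nt = 300 aa by default) and any ORF with Pfam/BLAST support.
    kept = [o for o in orfs if o.length_nt >= min_len_nt or o.orf_id in supported]
    if not single_best:
        return kept
    # Per transcript, prefer supported ORFs, then the longest one.
    best: Dict[str, Orf] = {}
    for o in kept:
        cur = best.get(o.transcript_id)
        key = (o.orf_id in supported, o.length_nt)
        if cur is None or key > (cur.orf_id in supported, cur.length_nt):
            best[o.transcript_id] = o
    return list(best.values())


if __name__ == "__main__":
    orfs = [Orf("t1.p1", "t1", 1200), Orf("t1.p2", "t1", 450), Orf("t2.p1", "t2", 300)]
    blast = blastp_hit_ids(["t1.p2\tsp|X\t98.0", "t2.p1\tsp|Y\t75.0"])
    print([o.orf_id for o in retain_orfs(orfs, set(), blast, single_best=True)])
```

Ranking on the tuple `(has_support, length)` is one reasonable reading of "having support and longest"; a supported 450 nt ORF beats an unsupported 1200 nt ORF here, which may differ from the tool's own tie-breaking.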
Advanced options:
--train <string>
    FASTA file with ORFs to train the Markov model for protein
    identification; otherwise the longest non-redundant ORFs are used.
-T <int>
    If --train is not given, the number of top longest ORFs used to train
    the Markov model (hexamer stats) (default: 500). Note: 10x this many ORFs
    are first selected for use with cd-hit to remove redundancies, and then
    the -T longest ORFs are selected from the non-redundant set.
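The -T selection described above is a three-step procedure: take the 10 x T longest ORFs, remove redundancy, then keep the T longest survivors. The Python sketch below illustrates that order of operations only. The function name `select_training_orfs` is hypothetical, and because cd-hit clusters by sequence similarity, the sketch swaps in a much cruder stand-in that drops exact duplicate sequences only.

```python
from typing import Dict, List


def select_training_orfs(orf_seqs: Dict[str, str], top_t: int = 500) -> List[str]:
    """Hypothetical sketch of the -T selection; exact-duplicate removal stands in for cd-hit."""
    # Step 1: rank ORFs longest first (ties broken by ID) and take the 10*T longest.
    ranked = sorted(orf_seqs, key=lambda k: (-len(orf_seqs[k]), k))
    candidates = ranked[: 10 * top_t]
    # Step 2: stand-in for cd-hit redundancy removal (identical sequences only).
    seen = set()
    nonredundant = []
    for orf_id in candidates:
        seq = orf_seqs[orf_id].upper()
        if seq in seen:
            continue
        seen.add(seq)
        nonredundant.append(orf_id)
    # Step 3: the list is still length-sorted, so the first T are the T longest.
    return nonredundant[:top_t]


if __name__ == "__main__":
    seqs = {"a": "ATG" * 10, "b": "ATG" * 10, "c": "ATG" * 5, "d": "ATG" * 8}
    print(select_training_orfs(seqs, top_t=2))
```

Here "b" is dropped as a duplicate of "a", so the two ORFs chosen for training are "a" and "d".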