Trinity(1) RNA-Seq De novo Assembly


Trinity is a method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. It combines three independent software modules, Inchworm, Chrysalis, and Butterfly, which are applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.


--seqType <string> type of reads: fa (FASTA) or fq (FASTQ)
--max_memory <string> suggested maximum memory for Trinity to use in steps where a limit can be enforced (jellyfish, sorting, etc.), given in GB of RAM, e.g. '--max_memory 10G'

If paired reads:

--left <string> left reads, one or more (separated by space)
--right <string> right reads, one or more (separated by space)

Or, if unpaired reads:

--single <string> single reads, one or more (note: if a single file contains pairs, use the flag --run_as_paired)
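As a minimal sketch of the two input modes above, the following builds one paired-end and one single-end command line and prints them without running anything. The file names (reads_1.fq, reads_2.fq, reads.fq), the 10G memory value, and the variable names are placeholders chosen for illustration.

```shell
# Dry run: build the command lines as strings and print them; nothing is executed.
# File names are placeholders, not files shipped with Trinity.
paired_cmd="Trinity --seqType fq --max_memory 10G --left reads_1.fq --right reads_2.fq"
single_cmd="Trinity --seqType fq --max_memory 10G --single reads.fq"

echo "$paired_cmd"
echo "$single_cmd"
```

Use either --left/--right or --single for a given run, matching how the library was sequenced.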


--SS_lib_type <string> strand-specific RNA-Seq read orientation. If paired: RF or FR; if single: F or R. (dUTP method = RF) See web documentation.
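The pairing between read layout and allowed orientation can be checked before launching a run. The helper below is a hypothetical convenience function, not part of Trinity; it only encodes the values listed above (paired: RF or FR; single: F or R), and the file names are placeholders.

```shell
# valid_ss_lib_type LAYOUT VALUE
# Hypothetical helper: returns 0 if VALUE is an orientation the usage text
# lists for LAYOUT (paired: RF or FR; single: F or R), 1 otherwise.
valid_ss_lib_type() {
    case "$1:$2" in
        paired:RF|paired:FR|single:F|single:R) return 0 ;;
        *) return 1 ;;
    esac
}

layout=paired
ss=RF   # dUTP paired-end libraries, per the usage text

if valid_ss_lib_type "$layout" "$ss"; then
    # Dry run: print the command instead of executing it.
    echo "Trinity --seqType fq --max_memory 10G --left reads_1.fq --right reads_2.fq --SS_lib_type $ss"
else
    echo "unsupported --SS_lib_type '$ss' for $layout reads" >&2
fi
```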
--CPU <int> number of CPUs to use, default: 2
--min_contig_length <int> minimum assembled contig length to report (def=200)
--long_reads <string> FASTA file containing error-corrected or circular consensus (CCS) PacBio reads
--genome_guided_bam <string> genome-guided mode; provide the path to a coordinate-sorted BAM file. (see the genome-guided parameter section under --show_full_usage_info)
--jaccard_clip set this option if you have paired reads and expect high gene density with UTR overlap (use FASTQ input file format for reads). (note: jaccard_clip is an expensive operation, so avoid it unless you find excessive fusion transcripts without it.)
--trimmomatic run Trimmomatic to quality-trim reads; see '--quality_trimming_params' under full usage info for tailored settings.
--normalize_reads run in silico normalization of reads. Defaults to a maximum read coverage of 50. See '--normalize_max_read_cov' under full usage info for tailored settings.
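The two preprocessing switches above take no value, so they can simply be appended to a base command. The sketch below assembles such a command line and prints it without executing it; the toggles use_trimmomatic and use_normalization, the file names, and the memory value are illustrative assumptions.

```shell
# Dry run: append the optional preprocessing flags to a base command string.
base="Trinity --seqType fq --max_memory 10G --left reads_1.fq --right reads_2.fq"
use_trimmomatic=yes      # hypothetical toggle
use_normalization=yes    # hypothetical toggle

opts=""
if [ "$use_trimmomatic" = yes ]; then
    opts="$opts --trimmomatic"
fi
if [ "$use_normalization" = yes ]; then
    opts="$opts --normalize_reads"
fi

preprocess_cmd="$base$opts"
echo "$preprocess_cmd"
```

With both toggles left at their defaults (no tailored parameters), Trinity applies its own default trimming settings and the default maximum read coverage of 50.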
--no_distributed_trinity_exec do not run Trinity phase 2 (assembly of partitioned reads), and stop after generating command list.
--output <string> name of directory for output (will be created if it doesn't already exist) (default: your current working directory)
--full_cleanup retain only the Trinity FASTA file, renamed as ${output_dir}.Trinity.fasta
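To illustrate how --output and --full_cleanup interact, the sketch below derives the file name that the description above says is kept after cleanup. The directory name trinity_out and the read file names are placeholders, and the command is only printed, not run.

```shell
# Placeholder output directory name; pick whatever suits your project layout.
output_dir="trinity_out"

# Dry run: print the command instead of executing it.
cleanup_cmd="Trinity --seqType fq --max_memory 10G --left reads_1.fq --right reads_2.fq --output $output_dir --full_cleanup"
echo "$cleanup_cmd"

# With --full_cleanup, the usage text says only this file is retained.
final_fasta="${output_dir}.Trinity.fasta"
echo "expected assembly file: $final_fasta"
```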
--cite show the Trinity literature citation
--version reports Trinity version (Trinity_v2.0.2) and exits.
--show_full_usage_info show the many additional options available for running Trinity (expert usage).


A typical Trinity command might be:
Trinity --seqType fq --max_memory 50G --left reads_1.fq --right reads_2.fq --CPU 6

and for Genome-guided Trinity:

Trinity --genome_guided_bam rnaseq_alignments.csorted.bam --max_memory 50G \
        --genome_guided_max_intron 10000 --CPU 6
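Genome-guided mode expects a coordinate-sorted BAM file, so it can be worth checking the header before starting a long run. The helper below is a hypothetical sketch, not part of Trinity: it reads SAM header text on standard input and succeeds if the @HD line declares SO:coordinate. Piping in a real header assumes samtools is installed, which this document does not state.

```shell
# is_coordinate_sorted: hypothetical helper. Reads SAM header text on stdin
# and succeeds if the @HD line declares SO:coordinate.
is_coordinate_sorted() {
    grep '^@HD' | grep -q 'SO:coordinate'
}

# Intended use (assumes samtools is on PATH; the BAM name is a placeholder):
#   samtools view -H rnaseq_alignments.csorted.bam | is_coordinate_sorted

# Self-contained demonstration on literal header lines:
if printf '@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n' | is_coordinate_sorted; then
    echo "coordinate-sorted"
fi
```

A header without SO:coordinate (for example SO:unsorted, or no @HD line at all) makes the function return a non-zero status, which a wrapper script can use to stop before invoking Trinity.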