gffread(1) component of cufflinks suite


Usage: gffread <input_gff> [-g <genomic_seqs_fasta> | <dir>][-s <seq_info.fsize>]
[-o <outfile.gff>] [-t <tname>] [-r [[<strand>]<chr>:]<start>..<end> [-R]] [-CTVNJMKQAFGUBHZWTOLE] [-w <exons.fa>] [-x <cds.fa>] [-y <tr_cds.fa>] [-i <maxintron>]

Filters and/or converts GFF3/GTF2 records. <input_gff> is a GFF file; use '-' if the GFF records will be given at stdin


-g  full path to a multi-fasta file with the genomic sequences for all input mappings, OR a directory with single-fasta files (one per genomic sequence, with file names matching sequence names)
-s  <seq_info.fsize> is a tab-delimited file providing this info for each of the mapped sequences: <seq-name> <seq-length> <seq-description> (useful for -A option with mRNA/EST/protein mappings)
-i  discard transcripts having an intron larger than <maxintron>
-r  only show transcripts overlapping coordinate range <start>..<end> (on chromosome/contig <chr>, strand <strand> if provided)
-R  for -r option, discard all transcripts that are not fully contained within the given range
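
The difference between -r alone (overlap) and -r with -R (full containment) can be sketched in Python. This is only an illustration, not gffread's own code: parse_range, overlaps and contained are hypothetical helper names, and the sketch assumes that <strand> is a leading '+' or '-' and that coordinates are 1-based and inclusive.

```python
import re

# Hypothetical pattern for [[<strand>]<chr>:]<start>..<end>
RANGE_RE = re.compile(r'^(?:([+-])?([^:]+):)?(\d+)\.\.(\d+)$')

def parse_range(spec):
    """Split a range spec into (strand, chr, start, end); absent parts are None."""
    m = RANGE_RE.match(spec)
    if m is None:
        raise ValueError(f"bad range: {spec!r}")
    strand, chrom, start, end = m.groups()
    # Tolerate reversed coordinates by sorting them (an assumption of this sketch).
    start, end = sorted((int(start), int(end)))
    return strand, chrom, start, end

def overlaps(t_start, t_end, start, end):
    """-r behaviour: keep a transcript that shares at least one base with the range."""
    return t_start <= end and t_end >= start

def contained(t_start, t_end, start, end):
    """-r with -R behaviour: keep a transcript only if it lies entirely within the range."""
    return t_start >= start and t_end <= end
```

Under these assumptions, a transcript spanning 150..250 passes the overlap test for 100..200 but fails the containment test.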
-U  discard single-exon transcripts
-C  coding only: discard mRNAs that have no CDS feature
-F  full GFF attribute preservation (all attributes are shown)
-G  only parse additional exon attributes from the first exon and move them to the mRNA level (useful for GTF input)
-A  use the description field from <seq_info.fsize> and add it as the value for a 'descr' attribute to the GFF record
-O  process also non-transcript GFF records (by default non-transcript records are ignored)
-V  discard any mRNAs with CDS having in-frame stop codons
-H  for -V option, check and adjust the starting CDS phase if the original phase leads to a translation with an in-frame stop codon
-B  for -V option, single-exon transcripts are also checked on the opposite strand
-N  discard multi-exon mRNAs that have any intron with a non-canonical splice site consensus (i.e. not GT-AG, GC-AG or AT-AC)
-J  discard any mRNAs that either lack initial START codon or the terminal STOP codon, or have an in-frame stop codon (only print mRNAs with a full, valid CDS)
--no-pseudo: filter out records matching the 'pseudo' keyword
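
The CDS checks described above (in-frame stop codons, missing START or STOP codon) can be illustrated with a short Python sketch. It is not gffread's implementation: cds_problems is a hypothetical helper, and the sketch assumes the standard genetic code (ATG start; TAA, TAG, TGA stops) and a spliced CDS given 5' to 3' that begins in phase 0.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)

def cds_problems(cds):
    """Return a list of problems found in a spliced CDS sequence."""
    cds = cds.upper()
    # Split into complete codons; trailing bases that do not fill a codon are ignored.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    problems = []
    if not codons or codons[0] != "ATG":
        problems.append("no start codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no terminal stop codon")
    # A stop codon anywhere before the last codon is an in-frame stop.
    if any(c in STOP_CODONS for c in codons[:-1]):
        problems.append("in-frame stop codon")
    return problems
```

In terms of this sketch, a filter for in-frame stop codons alone would reject only sequences reporting "in-frame stop codon", while a filter for a full, valid CDS would reject any sequence with a non-empty problem list.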
-M/--merge : cluster the input transcripts into loci, collapsing matching
transcripts (those with the same exact introns and fully contained)
-d <dupinfo> : for -M option, write collapsing info to file <dupinfo>
--cluster-only: same as --merge but without collapsing matching transcripts
-K  for -M option: also collapse shorter, fully contained transcripts with fewer introns than the container
-Q  for -M option, remove the containment restriction: (multi-exon transcripts will be collapsed if just their introns match, while single-exon transcripts can partially overlap (80%))
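
The matching rule used when collapsing transcripts (same exact introns, fully contained) can be sketched as follows. This is a simplified illustration, not gffread's algorithm: the helper names are hypothetical, transcripts are plain lists of (start, end) exon pairs with 1-based inclusive coordinates, only multi-exon transcripts are handled, and the require_containment switch merely mimics lifting the containment restriction.

```python
def introns(exons):
    """Intron intervals implied by a list of (start, end) exons."""
    exons = sorted(exons)
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])]

def span(exons):
    """Overall (start, end) of a transcript."""
    return min(s for s, _ in exons), max(e for _, e in exons)

def contained_in(a, b):
    """True if transcript a lies entirely within the span of transcript b."""
    (a_start, a_end), (b_start, b_end) = span(a), span(b)
    return b_start <= a_start and a_end <= b_end

def can_collapse(a, b, require_containment=True):
    """Multi-exon transcripts match when their intron chains are identical."""
    if not introns(a) or introns(a) != introns(b):
        return False  # single-exon or different introns: not handled / no match
    if not require_containment:
        return True   # introns match; containment is not required
    return contained_in(a, b) or contained_in(b, a)
```

With these definitions, two transcripts that share the intron 201..299 collapse by default only if one of them lies within the other; with require_containment=False the shared intron chain is enough.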
--force-exons: make sure that the lowest level GFF features are printed as
"exon" features
-E  expose (warn about) duplicate transcript IDs and other potential problems with the given GFF/GTF records
-D  decode url encoded characters within attributes
-Z  merge close exons into a single exon (for intron size<4)
-w  write a fasta file with spliced exons for each GFF transcript
-x  write a fasta file with spliced CDS for each GFF transcript
-W  for -w and -x options, also write for each fasta record the exon coordinates projected onto the spliced sequence
-y  write a protein fasta file with the translation of CDS for each record
-L  Ensembl GTF to GFF3 conversion (implies -F; should be used with -m)
-m  <chr_replace> is a reference (genomic) sequence replacement table with this format: <original_ref_ID> <new_ref_ID> GFF records on reference sequences that are not found among the <original_ref_ID> entries in this file will be filtered out
-o  the "filtered" GFF records will be written to <outfile.gff> (use -o- for printing to stdout)
-t  use <trackname> in the second column of each GFF output line
-T  with the -o option, write GTF format instead of GFF3
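
The effect of the <chr_replace> table described above (rename reference sequences, drop records whose reference is not listed) can be sketched in Python. This is an illustration under stated assumptions, not gffread's code: load_replacements and rename_records are hypothetical names, the two table columns are assumed to be whitespace-separated, and comment or blank GFF lines are simply skipped.

```python
def load_replacements(lines):
    """Build {original_ref_ID: new_ref_ID} from table lines."""
    table = {}
    for line in lines:
        fields = line.split()  # assumes whitespace-separated columns
        if len(fields) >= 2:
            table[fields[0]] = fields[1]
    return table

def rename_records(gff_lines, table):
    """Yield GFF lines with column 1 renamed; drop records on unlisted references."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # this sketch skips comments and blank lines
        cols = line.rstrip("\n").split("\t")
        new_ref = table.get(cols[0])
        if new_ref is None:
            continue  # reference not in the table: record is filtered out
        cols[0] = new_ref
        yield "\t".join(cols)
```

For example, with a table mapping "1" to "chr1", a record on reference "1" is emitted on "chr1", while a record on reference "2" is dropped.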