gt-csa(1) Transform spliced alignments from a GFF3 file into consensus spliced alignments.

SYNOPSIS

gt csa [option ...] [GFF3_file]

DESCRIPTION

-join-length [value]

set join length for the spliced alignment clustering (default: 300)

-v [yes|no]

be verbose (default: no)

-o [filename]

redirect output to specified file (default: undefined)

-gzip [yes|no]

write gzip compressed output file (default: no)

-bzip2 [yes|no]

write bzip2 compressed output file (default: no)

-force [yes|no]

force writing to output file (default: no)

-help

display help and exit

-version

display version information and exit

EXAMPLE

Let's assume we have a GFF3 file csa_example_spliced_alignments.gff3 containing the following four overlapping spliced alignments (represented as genes with exons as children):

##gff-version 3
##sequence-region   seq 1 290
seq .       gene    1       209     .       +       .       ID=gene1
seq .       exon    1       90      .       +       .       Parent=gene1
seq .       exon    110     190     .       +       .       Parent=gene1
seq .       exon    201     209     .       +       .       Parent=gene1
###
seq .       gene    1       290     .       +       .       ID=gene2
seq .       exon    1       90      .       +       .       Parent=gene2
seq .       exon    101     190     .       +       .       Parent=gene2
seq .       exon    201     290     .       +       .       Parent=gene2
###
seq .       gene    10      290     .       +       .       ID=gene3
seq .       exon    10      90      .       +       .       Parent=gene3
seq .       exon    110     190     .       +       .       Parent=gene3
seq .       exon    201     290     .       +       .       Parent=gene3
###
seq .       gene    181     290     .       +       .       ID=gene4
seq .       exon    181     190     .       +       .       Parent=gene4
seq .       exon    201     290     .       +       .       Parent=gene4
###
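
To make the structure of this input concrete, the following minimal Python sketch groups the exon coordinates by their Parent attribute. It is not part of GenomeTools: parse_exons is a hypothetical helper written for this illustration, and it assumes the columns are separated by whitespace as displayed above.

```python
from collections import defaultdict

# The example input from above, embedded so the sketch is self-contained.
GFF3_TEXT = """\
##gff-version 3
##sequence-region   seq 1 290
seq .       gene    1       209     .       +       .       ID=gene1
seq .       exon    1       90      .       +       .       Parent=gene1
seq .       exon    110     190     .       +       .       Parent=gene1
seq .       exon    201     209     .       +       .       Parent=gene1
###
seq .       gene    1       290     .       +       .       ID=gene2
seq .       exon    1       90      .       +       .       Parent=gene2
seq .       exon    101     190     .       +       .       Parent=gene2
seq .       exon    201     290     .       +       .       Parent=gene2
###
seq .       gene    10      290     .       +       .       ID=gene3
seq .       exon    10      90      .       +       .       Parent=gene3
seq .       exon    110     190     .       +       .       Parent=gene3
seq .       exon    201     290     .       +       .       Parent=gene3
###
seq .       gene    181     290     .       +       .       ID=gene4
seq .       exon    181     190     .       +       .       Parent=gene4
seq .       exon    201     290     .       +       .       Parent=gene4
###
"""


def parse_exons(text):
    """Return {parent_id: [(start, end), ...]} for all exon lines (hypothetical helper)."""
    exons = defaultdict(list)
    for line in text.splitlines():
        # Skip blank lines, pragmas (##...) and the ### separators.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 9 or fields[2] != "exon":
            continue
        start, end = int(fields[3]), int(fields[4])
        attrs = dict(pair.split("=", 1) for pair in fields[8].split(";"))
        exons[attrs["Parent"]].append((start, end))
    return {parent: sorted(coords) for parent, coords in exons.items()}


for parent, coords in parse_exons(GFF3_TEXT).items():
    print(parent, coords)
```

Each of the four parents maps to its sorted list of exon ranges, which is the view of the data used in the discussion of the result.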

To compute the consensus spliced alignments we call:

$ gt csa csa_example_spliced_alignments.gff3

This returns:

##gff-version 3
##sequence-region   seq 1 290
seq gt csa  gene    1       290     .       +       .       ID=gene1
seq gt csa  mRNA    1       290     .       +       .       ID=mRNA1;Parent=gene1
seq gt csa  exon    1       90      .       +       .       Parent=mRNA1
seq gt csa  exon    110     190     .       +       .       Parent=mRNA1
seq gt csa  exon    201     290     .       +       .       Parent=mRNA1
seq gt csa  mRNA    1       290     .       +       .       ID=mRNA2;Parent=gene1
seq gt csa  exon    1       90      .       +       .       Parent=mRNA2
seq gt csa  exon    101     190     .       +       .       Parent=mRNA2
seq gt csa  exon    201     290     .       +       .       Parent=mRNA2
###

As one can see, the four spliced alignments have been combined into one consensus spliced alignment (represented as a gene with mRNAs as children, which in turn have exons as children) with two alternative splice forms. The first and the third spliced alignment have been combined into the first alternative splice form (mRNA1), and the second and the fourth spliced alignment into the second alternative splice form (mRNA2).
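
The grouping described above can be checked by hand with a small Python sketch. It is only an illustration and not the algorithm implemented in gt csa: the helpers introns and merge_exons are hypothetical, and the rule used here (alignments with the same introns are merged by taking the union of overlapping exons) is an assumption made for this example.

```python
# Exon ranges of the four input spliced alignments (1-based, inclusive).
ALIGNMENTS = {
    "gene1": [(1, 90), (110, 190), (201, 209)],
    "gene2": [(1, 90), (101, 190), (201, 290)],
    "gene3": [(10, 90), (110, 190), (201, 290)],
    "gene4": [(181, 190), (201, 290)],
}


def introns(exons):
    """Return the gaps between consecutive exons as (start, end) ranges."""
    exons = sorted(exons)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def merge_exons(*alignments):
    """Union of overlapping exons across the given alignments."""
    merged = []
    for start, end in sorted(e for a in alignments for e in a):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous exon: extend it if necessary.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Alignments 1 and 3 have identical introns; alignment 2 differs in the first one.
print(introns(ALIGNMENTS["gene1"]))  # [(91, 109), (191, 200)]
print(introns(ALIGNMENTS["gene3"]))  # [(91, 109), (191, 200)]
print(introns(ALIGNMENTS["gene2"]))  # [(91, 100), (191, 200)]

# Merging as described in the text reproduces the exons of mRNA1 and mRNA2.
print(merge_exons(ALIGNMENTS["gene1"], ALIGNMENTS["gene3"]))
# [(1, 90), (110, 190), (201, 290)]
print(merge_exons(ALIGNMENTS["gene2"], ALIGNMENTS["gene4"]))
# [(1, 90), (101, 190), (201, 290)]
```

The intron comparison matters because a plain union of the first and second alignment would hide the difference in their second exon, which is exactly what distinguishes the two splice forms.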

Note that the second exon of the first alternative splice form (110 to 190) is shorter than the corresponding exon of the second alternative splice form (101 to 190).
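
The length difference follows directly from the coordinates, since GFF3 ranges are 1-based and include both end points. A minimal Python check (exon_length is a hypothetical helper used only for this illustration):

```python
def exon_length(start, end):
    """Length of a feature with 1-based, inclusive GFF3 coordinates."""
    return end - start + 1


mrna1_exon2 = exon_length(110, 190)  # second exon of mRNA1
mrna2_exon2 = exon_length(101, 190)  # second exon of mRNA2
print(mrna1_exon2, mrna2_exon2, mrna2_exon2 - mrna1_exon2)  # 81 90 9
```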

REPORTING BUGS

Report bugs to <[email protected]>.