Bio::Tools::Run::Alignment::TCoffee(3) Object for the calculation of a

SYNOPSIS


# Build a tcoffee alignment factory
@params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
$factory = Bio::Tools::Run::Alignment::TCoffee->new(@params);
# Pass the factory a list of sequences to be aligned.
$inputfilename = 't/cysprot.fa';
# $aln is a SimpleAlign object.
$aln = $factory->align($inputfilename);
# or where @seq_array is an array of Bio::Seq objects
$seq_array_ref = \@seq_array;
$aln = $factory->align($seq_array_ref);
# Or one can pass the factory a pair of (sub)alignments
#to be aligned against each other, e.g.:
# where $aln1 and $aln2 are Bio::SimpleAlign objects.
$aln = $factory->profile_align($aln1,$aln2);
# Or one can pass the factory an alignment and one or more
# unaligned sequences to be added to the alignment. For example:
# $seq is a Bio::Seq object.
$aln = $factory->profile_align($aln1,$seq);
#There are various additional options and input formats available.
#See the DESCRIPTION section that follows for additional details.

DESCRIPTION

Note: this DESCRIPTION only documents the (Bio)perl interface to TCoffee.

Helping the module find your executable

You will need to enable TCoffee to find the t_coffee program. This can be done in (at least) three ways:

 1. Make sure the t_coffee executable is in your path so that
    which t_coffee returns a t_coffee executable on your system.
 2. Define an environmental variable TCOFFEEDIR which is a dir 
    which contains the 't_coffee' app:
    In bash 
    export TCOFFEEDIR=/home/username/progs/T-COFFEE_distribution_Version_1.37/bin
    In csh/tcsh
    setenv TCOFFEEDIR /home/username/progs/T-COFFEE_distribution_Version_1.37/bin
 3. Include a definition of an environmental variable TCOFFEEDIR in
    every script that will use this TCoffee wrapper module.
    BEGIN { $ENV{TCOFFEDIR} = '/home/username/progs/T-COFFEE_distribution_Version_1.37/bin' }
    use Bio::Tools::Run::Alignment::TCoffee;

If you are running an application on a webserver make sure the webserver environment has the proper PATH set or use the options 2 or 3 to set the variables.

PARAMETERS FOR ALIGNMENT COMPUTATION

There are a number of possible parameters one can pass in TCoffee. One should really read the online manual for the best explanation of all the features. See http://igs-server.cnrs-mrs.fr/~cnotred/Documentation/t_coffee/t_coffee_doc.html

These can be specified as parameters when instantiating a new TCoffee object, or through get/set methods of the same name (lowercase).

IN

 Title       : IN
 Description : (optional) input filename, this is specified when
               align so should not use this directly unless one
               understand TCoffee program very well.

TYPE

 Title       : TYPE
 Args        : [string] DNA, PROTEIN
 Description : (optional) set the sequence type, guessed automatically
               so should not use this directly

PARAMETERS

 Title       : PARAMETERS
 Description : (optional) Indicates a file containing extra parameters

EXTEND

 Title       : EXTEND
 Args        : 0, 1, or positive value
 Default     : 1
 Description : Flag indicating that library extension should be
               carried out when performing multiple alignments, if set
               to 0 then extension is not made, if set to 1 extension
               is made on all pairs in the library.  If extension is
               set to another positive value, the extension is only
               carried out on pairs having a weigth value superior to
               the specified limit.

DP_NORMALISE

 Title       : DP_NORMALISE
 Args        : 0 or positive value
 Default     : 1000
 Description : When using a value different from 0, this flag sets the
               score of the highest scoring pair to 1000.

DP_MODE

 Title       : DP_MODE
 Args        : [string] gotoh_pair_wise, myers_miller_pair_wise,
               fasta_pair_wise cfasta_pair_wise
 Default     : cfast_fair_wise
 Description : Indicates the type of dynamic programming used by
               the program
    gotoh_pair_wise : implementation of the gotoh algorithm
    (quadratic in memory and time)
    myers_miller_pair_wise : implementation of the Myers and Miller
    dynamic programming algorithm ( quadratic in time and linear in
    space). This algorithm is recommended for very long sequences. It
    is about 2 time slower than gotoh. It only accepts tg_mode=1.
    fasta_pair_wise: implementation of the fasta algorithm. The
    sequence is hashed, looking for ktuples words. Dynamic programming
    is only carried out on the ndiag best scoring diagonals. This is
    much faster but less accurate than the two previous.
    cfasta_pair_wise : c stands for checked. It is the same
    algorithm. The dynamic programming is made on the ndiag best
    diagonals, and then on the 2*ndiags, and so on until the scores
    converge. Complexity will depend on the level of divergence of the
    sequences, but will usually be L*log(L), with an accuracy
    comparable to the two first mode ( this was checked on BaliBase).

KTUPLE

 Title       : KTUPLE
 Args        : numeric value
 Default     : 1 or 2 (1 for protein, 2 for DNA )
 Description : Indicates the ktuple size for cfasta_pair_wise dp_mode
               and fasta_pair_wise. It is set to 1 for proteins, and 2
               for DNA. The alphabet used for protein is not the 20
               letter code, but a mildly degenerated version, where
               some residues are grouped under one letter, based on
               physicochemical properties:
               rk, de, qh, vilm, fy (the other residues are
               not degenerated).

NDIAGS

 Title       : NDIAGS
 Args        : numeric value
 Default     : 0
 Description : Indicates the number of diagonals used by the
               fasta_pair_wise algorithm. When set to 0,
               n_diag=Log (length of the smallest sequence)

DIAG_MODE

 Title       : DIAG_MODE
 Args        : numeric value
 Default     : 0
 Description : Indicates the manner in which diagonals are scored
              during the fasta hashing.
              0 indicates that the score of a diagonal is equal to the
              sum of the scores of the exact matches it contains.
              1 indicates that this score is set equal to the score of
              the best uninterrupted segment
              1 can be useful when dealing with fragments of sequences.

SIM_MATRIX

 Title       : SIM_MATRIX
 Args        : string
 Default     : vasiliky
 Description : Indicates the manner in which the amino acid is being
               degenerated when hashing. All the substitution matrix
               are acceptable. Categories will be defined as sub-group
               of residues all having a positive substitution score
               (they can overlap).
               If you wish to keep the non degenerated amino acid
               alphabet, use 'idmat'

MATRIX

 Title       : MATRIX
 Args        :
 Default     :
 Description : This flag is provided for compatibility with
               ClustalW. Setting matrix = 'blosum' is equivalent to
               -in=Xblosum62mt , -matrix=pam is equivalent to
               in=Xpam250mt . Apart from this, the rules are similar
               to those applying when declaring a matrix with the
               -in=X fl

GAPOPEN

 Title       : GAPOPEN
 Args        : numeric
 Default     : 0
 Description : Indicates the penalty applied for opening a gap. The
               penalty must be negative. If you provide a positive
               value, it will automatically be turned into a negative
               number. We recommend a value of 10 with pam matrices,
               and a value of 0 when a library is used.

GAPEXT

 Title       : GAPEXT
 Args        : numeric
 Default     : 0
 Description : Indicates the penalty applied for extending a gap.

COSMETIC_PENALTY

 Title       : COSMETIC_PENALTY
 Args        : numeric
 Default     : 100
 Description : Indicates the penalty applied for opening a gap. This
               penalty is set to a very low value. It will only have
               an influence on the portions of the alignment that are
               unalignable. It will not make them more correct, but
               only more pleasing to the eye ( i.e. Avoid stretches of
               lonely residues).
               The cosmetic penalty is automatically turned off if a
               substitution matrix is used rather than a library.

TG_MODE

 Title       : TG_MODE
 Args        : 0,1,2
 Default     : 1
 Description : (Terminal Gaps)
               0: indicates that terminal gaps must be panelized with
                  a gapopen and a gapext penalty.
               1: indicates that terminal gaps must be penalized only
                  with a gapext penalty
               2: indicates that terminal gaps must not be penalized.

WEIGHT

 Title       : WEIGHT
 Args        : sim or sim_<matrix_name or matrix_file> or integer value
 Default     : sim
 Description : Weight defines the way alignments are weighted when
               turned into a library.
               sim indicates that the weight equals the average
                   identity within the match residues.
               sim_matrix_name indicates the average identity with two
                   residues regarded as identical when their
                   substitution value is positive. The valid matrices
                   names are in matrices.h (pam250mt) . Matrices not
                   found in this header are considered to be
                   filenames. See the format section for matrices. For
                   instance, -weight=sim_pam250mt indicates that the
                   grouping used for similarity will be the set of
                   classes with positive substitutions. Other groups
                   include
                       sim_clustalw_col ( categories of clustalw
                       marked with :)
                       sim_clustalw_dot ( categories of clustalw
                       marked with .)
               Value indicates that all the pairs found in the
               alignments must be given the same weight equal to
               value. This is useful when the alignment one wishes to
               turn into a library must be given a pre-specified score
               (for instance if they come from a structure
               super-imposition program). Value is an integer:
                       -weight=1000
  Note       : Weight only affects methods that return an alignment to
               T-Coffee, such as ClustalW. On the contrary, the
               version of Lalign we use here returns a library where
               weights have already been applied and are therefore
               insensitive to the -weight flag.

SEQ_TO_ALIGN

 Title       : SEQ_TO_ALIGN
 Args        : filename
 Default     : no file - align all the sequences
 Description : You may not wish to align all the sequences brought in
               by the -in flag. Supplying the seq_to_align flag allows
               for this, the file is simply a list of names in Fasta
               format.
               However, note that library extension will be carried out
               on all the sequences.

PARAMETERS FOR TREE COMPUTATION AND OUTPUT

NEWTREE

 Title       : NEWTREE
 Args        : treefile
 Default     : no file
 Description : Indicates the name of the new tree to compute. The
               default will be <sequence_name>.dnd, or <run_name.dnd>.
               Format is Phylip/Newick tree format

USETREE

 Title       : USETREE
 Args        : treefile
 Default     : no file specified
 Description : This flag indicates that rather than computing a new
               dendrogram, t_coffee can use a pre-computed one. The
               tree files are in phylips format and compatible with
               ClustalW. In most cases, using a pre-computed tree will
               halve the computation time required by t_coffee. It is
               also possible to use trees output by ClustalW or
               Phylips. Format is Phylips tree format

TREE_MODE

 Title       : TREE_MODE
 Args        : slow, fast, very_fast
 Default     : very_fast
 Description : This flag indicates the method used for computing the
               dendrogram.
               slow : the chosen dp_mode using the extended library,
               fast : The fasta dp_mode using the extended library.
               very_fast: The fasta dp_mode using pam250mt.

QUICKTREE

 Title       : QUICKTREE
 Args        :
 Default     :
 Description : This flag is kept for compatibility with ClustalW.
               It indicates that:  -tree_mode=very_fast

PARAMETERS FOR ALIGNMENT OUTPUT

OUTFILE

 Title       : OUTFILE
 Args        : out_aln file, default, no
 Default     : default ( yourseqfile.aln)
 Description : indicates name of output alignment file

OUTPUT

 Title       : OUTPUT
 Args        : format1, format2
 Default     : clustalw
 Description : Indicated format for outputting outputfile
               Supported formats are:
               clustalw_aln, clustalw: ClustalW format.
               gcg, msf_aln : Msf alignment.
               pir_aln : pir alignment.
               fasta_aln : fasta alignment.
               phylip : Phylip format.
               pir_seq : pir sequences (no gap).
               fasta_seq : fasta sequences (no gap).
    As well as:
                score_html : causes the output to be a reliability
                             plot in HTML
                score_pdf : idem in PDF.
                score_ps : idem in postscript.
    More than one format can be indicated:
                -output=clustalw,gcg, score_html

CASE

 Title       : CASE
 Args        : upper, lower
 Default     : upper
 Description : triggers choice of the case for output

CPU

 Title       : CPU
 Args        : value
 Default     : 0
 Description : Indicates the cpu time (micro seconds) that must be
               added to the t_coffee computation time.

OUT_LIB

 Title       : OUT_LIB
 Args        : name of library, default, no
 Default     : default
 Description : Sets the name of the library output. Default implies
               <run_name>.tc_lib

OUTORDER

 Title       : OUTORDER
 Args        : input or aligned
 Default     : input
 Description : Sets the name of the library output. Default implies
               <run_name>.tc_lib

SEQNOS

 Title       : SEQNOS
 Args        : on or off
 Default     : off
 Description : Causes the output alignment to contain residue numbers
               at the end of each line:

PARAMETERS FOR GENERIC OUTPUT

RUN_NAME

 Title       : RUN_NAME
 Args        : your run name
 Default     :
 Description : This flag causes the prefix <your sequences> to be
               replaced by <your run name> when renaming the default
               files.

ALIGN

 Title       : ALIGN
 Args        :
 Default     :
 Description : Indicates that the program must produce the
               alignment. This flag is here for compatibility with
               ClustalW

QUIET

 Title       : QUIET
 Args        : stderr, stdout, or filename, or nothing
 Default     : stderr
 Description : Redirects the standard output to either a file.
              -quiet on its own redirect the output to /dev/null.

CONVERT

 Title       : CONVERT
 Args        :
 Default     :
 Description : Indicates that the program must not compute the
               alignment but simply convert all the sequences,
               alignments and libraries into the format indicated with
               -output. This flag can also be used if you simply want
               to compute a library ( i.e. You have an alignment and
               you want to turn it into a library).

FEEDBACK

Mailing Lists

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

  [email protected]                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

Support

Please direct usage questions or support issues to the mailing list:

[email protected]

rather than to the module maintainer directly. Many experienced and reponsive experts will be able look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.

Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track the bugs and their resolution. Bug reports can be submitted via the web:

 http://redmine.open-bio.org/projects/bioperl/

AUTHOR - Jason Stajich, Peter Schattner

Email jason-at-bioperl-dot-org, [email protected]

APPENDIX

The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _

program_name

 Title   : program_name
 Usage   : $factory->program_name()
 Function: holds the program name
 Returns:  string
 Args    : None

program_dir

 Title   : program_dir
 Usage   : $factory->program_dir(@params)
 Function: returns the program directory, obtained from ENV variable.
 Returns:  string
 Args    :

error_string

 Title   : error_string
 Usage   : $obj->error_string($newval)
 Function: Where the output from the last analysus run is stored.
 Returns : value of error_string
 Args    : newvalue (optional)

version

 Title   : version
 Usage   : exit if $prog->version() < 1.8
 Function: Determine the version number of the program
 Example :
 Returns : float or undef
 Args    : none

run

 Title   : run
 Usage   : my $output = $application->run(-seq     => $seq,
                                          -profile => $profile,
                                          -type    => 'profile-aln');
 Function: Generic run of an application
 Returns : Bio::SimpleAlign object
 Args    : key-value parameters allowed for TCoffee runs AND
           -type     => profile-aln or alignment for profile alignments or
                        just multiple sequence alignment
           -seq      => either Bio::PrimarySeqI object OR
                        array ref of Bio::PrimarySeqI objects OR
                        filename of sequences to run with
           -profile  => profile to align to, if this is an array ref
                        will specify the first two entries as the two
                        profiles to align to each other

align

 Title   : align
 Usage   :
        $inputfilename = 't/data/cysprot.fa';
        $aln = $factory->align($inputfilename);
or
        $seq_array_ref = \@seq_array; 
        # @seq_array is array of Seq objs
        $aln = $factory->align($seq_array_ref);
 Function: Perform a multiple sequence alignment
 Returns : Reference to a SimpleAlign object containing the
           sequence alignment.
 Args    : Name of a file containing a set of unaligned fasta sequences
           or else an array of references to Bio::Seq objects.
 Throws an exception if argument is not either a string (eg a
 filename) or a reference to an array of Bio::Seq objects.  If
 argument is string, throws exception if file corresponding to string
 name can not be found. If argument is Bio::Seq array, throws
 exception if less than two sequence objects are in array.

profile_align

 Title   : profile_align
 Usage   :
 Function: Perform an alignment of 2 (sub)alignments
 Example :
 Returns : Reference to a SimpleAlign object containing the (super)alignment.
 Args    : Names of 2 files containing the subalignments
           or references to 2 Bio::SimpleAlign objects.
 Note    : Needs to be updated to run with newer TCoffee code, which
           allows more than two profile alignments.

Throws an exception if arguments are not either strings (eg filenames) or references to SimpleAlign objects.

_run

 Title   :  _run
 Usage   :  Internal function, not to be called directly        
 Function:  makes actual system call to tcoffee program
 Example :
 Returns : nothing; tcoffee output is written to a
           temporary file OR specified output file
 Args    : Name of a file containing a set of unaligned fasta sequences
           and hash of parameters to be passed to tcoffee

_setinput

 Title   :  _setinput
 Usage   :  Internal function, not to be called directly        
 Function:  Create input file for tcoffee program
 Example :
 Returns : name of file containing tcoffee data input AND
           type of file (if known, S for sequence, L for sequence library,
           A for sequence alignment)
 Args    : Seq or Align object reference or input file name

_setparams

 Title   :  _setparams
 Usage   :  Internal function, not to be called directly        
 Function:  Create parameter inputs for tcoffee program
 Example :
 Returns : parameter string to be passed to tcoffee
           during align or profile_align
 Args    : name of calling object

aformat

 Title   : aformat
 Usage   : my $alignmentformat = $self->aformat();
 Function: Get/Set alignment format
 Returns : string
 Args    : string

methods

 Title   : methods
 Usage   : my @methods = $self->methods()
 Function: Get/Set Alignment methods - NOT VALIDATED
 Returns : array of strings
 Args    : arrayref of strings

Bio::Tools::Run::BaseWrapper methods

no_param_checks

 Title   : no_param_checks
 Usage   : $obj->no_param_checks($newval)
 Function: Boolean flag as to whether or not we should
           trust the sanity checks for parameter values  
 Returns : value of no_param_checks
 Args    : newvalue (optional)

save_tempfiles

 Title   : save_tempfiles
 Usage   : $obj->save_tempfiles($newval)
 Function: 
 Returns : value of save_tempfiles
 Args    : newvalue (optional)

outfile_name

 Title   : outfile_name
 Usage   : my $outfile = $tcoffee->outfile_name();
 Function: Get/Set the name of the output file for this run
           (if you wanted to do something special)
 Returns : string
 Args    : [optional] string to set value to

tempdir

 Title   : tempdir
 Usage   : my $tmpdir = $self->tempdir();
 Function: Retrieve a temporary directory name (which is created)
 Returns : string which is the name of the temporary directory
 Args    : none

cleanup

 Title   : cleanup
 Usage   : $tcoffee->cleanup();
 Function: Will cleanup the tempdir directory
 Returns : none
 Args    : none

io

 Title   : io
 Usage   : $obj->io($newval)
 Function:  Gets a L<Bio::Root::IO> object
 Returns : L<Bio::Root::IO>
 Args    : none