TFBS::Matrix::ICM(3) class for information content matrices of nucleotide patterns

SYNOPSIS

  • creating a TFBS::Matrix::ICM object manually:


    my $matrixref = [ [ 2.00, 0.30, 0.00, 0.00, 0.24, 0.00 ],
    [ 0.00, 0.00, 0.00, 1.45, 0.42, 0.00 ],
    [ 0.00, 0.89, 2.00, 0.00, 0.00, 0.00 ],
    [ 0.00, 0.00, 0.00, 0.13, 0.06, 2.00 ]
    ];
    my $icm = TFBS::Matrix::ICM->new(-matrix => $matrixref,
    -name => "MyProfile",
    -ID => "M0001"
    );

    # or

    my $matrixstring = <<ENDMATRIX
    2.00 0.30 0.00 0.00 0.24 0.00
    0.00 0.00 0.00 1.45 0.42 0.00
    0.00 0.89 2.00 0.00 0.00 0.00
    0.00 0.00 0.00 0.13 0.06 2.00
    ENDMATRIX
    ;
    my $icm = TFBS::Matrix::ICM->new(-matrixstring => $matrixstring,
    -name => "MyProfile",
    -ID => "M0001"
    );

  • retrieving a TFBS::Matrix::ICM object from a database:

    (See documentation of individual TFBS::DB::* modules to learn how to connect to different types of pattern databases and retrieve TFBS::Matrix::* objects from them.)

        my $db_obj = TFBS::DB::JASPAR2->new
                        (-connect => ["dbi:mysql:JASPAR2:myhost",
                                      "myusername", "mypassword"]);
        my $icm = $db_obj->get_Matrix_by_ID("M0001", "ICM");
        # or
        my $icm = $db_obj->get_Matrix_by_name("MyProfile", "ICM");
    
  • retrieving a list of individual TFBS::Matrix::ICM objects from a TFBS::MatrixSet object:

    (See documentation of TFBS::MatrixSet to learn how to create objects for storage and manipulation of multiple matrices.)

        my @icm_list = $matrixset->all_patterns(-sort_by=>"name");
    

  • drawing a sequence logo:

        $icm->draw_logo(-file=>"logo.png", 
                        -full_scale =>2.25,
                        -xsize=>500,
                        -ysize =>250, 
                        -graph_title=>"C/EBPalpha binding site logo", 
                        -x_title=>"position", 
                        -y_title=>"bits");
    

DESCRIPTION

TFBS::Matrix::ICM is a class whose instances are objects representing information content matrices (ICMs). An ICM is normally calculated from a raw position frequency matrix (see TFBS::Matrix::PFM for the explanation of position frequency matrices). For example, given the following position frequency matrix,

    A:[ 12     3     0     0     4     0  ]
    C:[  0     0     0    11     7     0  ]
    G:[  0     9    12     0     0     0  ]
    T:[  0     0     0     1     1    12  ]

the standard computational procedure is applied to convert it into the following information content matrix:

    A:[2.00  0.30  0.00  0.00  0.24  0.00]
    C:[0.00  0.00  0.00  1.45  0.42  0.00]
    G:[0.00  0.89  2.00  0.00  0.00  0.00]
    T:[0.00  0.00  0.00  0.13  0.06  2.00]

which contains the "weights" (information content, in bits) associated with the occurrence of each nucleotide at the given position in a pattern.
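The conversion above can be sketched in plain Perl. This is only an illustration under stated assumptions, not the code TFBS itself runs: it assumes a uniform background (a maximum of 2 bits per position), no pseudocounts and no small-sample correction, and pfm_to_icm is a hypothetical helper name. Under those assumptions it reproduces the values in the example matrix to two decimal places.

```perl
use strict;
use warnings;

# Position frequency matrix from the example above (rows: A, C, G, T).
my @pfm = (
    [ 12, 3,  0,  0, 4,  0 ],
    [  0, 0,  0, 11, 7,  0 ],
    [  0, 9, 12,  0, 0,  0 ],
    [  0, 0,  0,  1, 1, 12 ],
);

# Hypothetical helper, NOT part of TFBS. Assumes a uniform background,
# no pseudocounts and no small-sample correction.
sub pfm_to_icm {
    my ($pfm) = @_;
    my $ncols = scalar @{ $pfm->[0] };
    my @icm   = map { [ (0) x $ncols ] } 0 .. 3;

    for my $col (0 .. $ncols - 1) {
        my $total = 0;
        $total += $_->[$col] for @$pfm;
        next unless $total > 0;

        # Observed nucleotide frequencies in this column.
        my @freq = map { $_->[$col] / $total } @$pfm;

        # Shannon entropy of the column, in bits.
        my $entropy = 0;
        for my $f (@freq) {
            $entropy -= $f * log($f) / log(2) if $f > 0;
        }

        # Column information content, shared out by frequency.
        my $column_ic = 2 - $entropy;
        $icm[$_][$col] = $freq[$_] * $column_ic for 0 .. 3;
    }
    return \@icm;
}

my $icm   = pfm_to_icm(\@pfm);
my @bases = qw(A C G T);
for my $row (0 .. 3) {
    printf "%s:[%s]\n", $bases[$row],
        join("  ", map { sprintf "%.2f", $_ } @{ $icm->[$row] });
}
```

For instance, the second column has A and G at frequencies 0.25 and 0.75, which gives an entropy of about 0.81 bits and a column information content of about 1.19 bits, split into 0.30 for A and 0.89 for G.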

A TFBS::Matrix::PWM object (which can be obtained from an ICM with to_PWM) is equipped with methods to search nucleotide sequences and pairwise alignments of nucleotide sequences for the pattern it represents, and to return a set of sites found in the nucleotide sequence (a TFBS::SiteSet object for a single-sequence search, and a TFBS::SitePairSet object for an alignment search).

FEEDBACK

Please send bug reports and other comments to the author.

AUTHOR - Boris Lenhard

Boris Lenhard

APPENDIX

The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore.

new

 Title   : new
 Usage   : my $icm = TFBS::Matrix::ICM->new(%args)
 Function: constructor for the TFBS::Matrix::ICM object
 Returns : a new TFBS::Matrix::ICM object
 Args    : # you must specify exactly one of the following three:
 
           -matrix,      # reference to an array of arrays of numbers
              #or
           -matrixstring,# a string containing four lines
                         # of tab- or space-delimited numbers
              #or
           -matrixfile,  # the name of a file containing four lines
                         # of tab- or space-delimited numbers
           #######
 
           -name,        # string, OPTIONAL
           -ID,          # string, OPTIONAL
           -class,       # string, OPTIONAL
           -tags         # an array reference, OPTIONAL
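The relationship between the -matrixstring and -matrix forms can be sketched as follows. The helper parse_matrixstring is hypothetical and is not part of TFBS; it only illustrates how four lines of whitespace-delimited numbers (rows A, C, G, T) correspond to the array-of-arrays structure that -matrix expects. How the constructor actually parses its input is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; NOT part of TFBS.
# Turns four lines of tab- or space-delimited numbers into a
# reference to an array of four array references (A, C, G, T).
sub parse_matrixstring {
    my ($string) = @_;
    my @rows;
    for my $line (split /\n/, $string) {
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $line eq '';
        push @rows, [ map { $_ + 0 } split /\s+/, $line ];
    }
    die "expected 4 rows (A, C, G, T), got " . scalar(@rows) . "\n"
        unless @rows == 4;

    my $width = scalar @{ $rows[0] };
    for my $row (@rows) {
        die "rows differ in length\n" unless @$row == $width;
    }
    return \@rows;
}

my $matrixstring = <<'ENDMATRIX';
2.00 0.30 0.00 0.00 0.24 0.00
0.00 0.00 0.00 1.45 0.42 0.00
0.00 0.89 2.00 0.00 0.00 0.00
0.00 0.00 0.00 0.13 0.06 2.00
ENDMATRIX

my $matrixref = parse_matrixstring($matrixstring);
printf "%d rows x %d columns\n",
    scalar @$matrixref, scalar @{ $matrixref->[0] };
```

The resulting $matrixref has the same shape as the one written out by hand in the SYNOPSIS.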

to_PWM

 Title   : to_PWM
 Usage   : my $pwm = $icm->to_PWM()
 Function: converts an information content matrix (a TFBS::Matrix::ICM object)
           to a position weight matrix. At present it assumes uniform
           background distribution of nucleotide frequencies.
 Returns : a new TFBS::Matrix::PWM object
 Args    : none; in future releases, it should be able to accept
           a user defined background probability of the four
           nucleotides

draw_logo

 Title   : draw_logo
 Usage   : my $gdImageObj = $icm->draw_logo(%args)
 Function: Draws a "sequence logo", a graphical representation
           of a possibly degenerate fixed-width nucleotide
           sequence pattern, from the information content matrix
 Returns : a GD::Image object;
           if you only need the image file you can ignore it
 Args    : -file,       # the name of the output PNG image file
                        # OPTIONAL: default none
           -xsize       # width of the image in pixels
                        # OPTIONAL: default 600
           -ysize       # height of the image in pixels
                        # OPTIONAL: default 5/8 of -xsize
           -startpos    # start position in the logo for x axis
                        # OPTIONAL: default is 1
           -margin      # size of image margins in pixels
                        # OPTIONAL: default 15% of -ysize
           -full_scale  # the maximum value on the y-axis, in bits
                        # OPTIONAL: default 2.25
           -graph_title,# the graph title
                        # OPTIONAL: default none
           -x_title,    # x-axis title; OPTIONAL: default none
           -y_title     # y-axis title; OPTIONAL: default none
           -error_bars  # reference to an array of S.D. values for each column; OPTIONAL
           -ps          # if true, produces a postscript string instead of a GD::Image object
           -pdf         # if true AND the -file argument is used, produces an output pdf file
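As background for the -full_scale argument: in a conventional sequence logo, the height of the letter stack at each position is the total information content of that column, which for nucleotides cannot exceed 2 bits, so the default of 2.25 leaves some headroom above the tallest possible stack. The sketch below uses only plain Perl and the example matrix from the DESCRIPTION to compute those column totals. It does not call TFBS and makes no claim about how draw_logo lays out pixels.

```perl
use strict;
use warnings;

# Example information content matrix (rows: A, C, G, T).
my @icm = (
    [ 2.00, 0.30, 0.00, 0.00, 0.24, 0.00 ],
    [ 0.00, 0.00, 0.00, 1.45, 0.42, 0.00 ],
    [ 0.00, 0.89, 2.00, 0.00, 0.00, 0.00 ],
    [ 0.00, 0.00, 0.00, 0.13, 0.06, 2.00 ],
);

my $full_scale = 2.25;    # documented default of -full_scale, in bits

# Total information content of each column, i.e. the stack height
# in bits under the usual sequence logo convention.
my $ncols = scalar @{ $icm[0] };
my @column_bits;
for my $col (0 .. $ncols - 1) {
    my $sum = 0;
    $sum += $_->[$col] for @icm;
    push @column_bits, $sum;
}

for my $i (0 .. $#column_bits) {
    printf "position %d: %.2f bits (%.0f%% of full scale)\n",
        $i + 1, $column_bits[$i], 100 * $column_bits[$i] / $full_scale;
}
```

Fully conserved positions (1, 3 and 6 in this example) reach 2.00 bits, while the degenerate position 5 totals only 0.72 bits.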

_draw_ps_logo

 Title   : _draw_ps_logo 
 Usage   : my $postscript_string = $icm->_draw_ps_logo(%args)
           Internal method, should be accessed using draw_logo()
 Function: Draws a "sequence logo", a graphical representation
           of a possibly degenerate fixed-width nucleotide
           sequence pattern, from the information content matrix
 Returns : a postscript string;
           if you only need the image file you can ignore it
 Args    : -file,       # the name of the output PNG image file
                        # OPTIONAL: default none
           -xsize       # width of the image in pixels
                        # OPTIONAL: default 600
           -ysize       # height of the image in pixels
                        # OPTIONAL: default 5/8 of -xsize
           -full_scale  # the maximum value on the y-axis, in bits
                        # OPTIONAL: default 2.25
           -graph_title,# the graph title
                        # OPTIONAL: default none
           -x_title,    # x-axis title; OPTIONAL: default none
           -y_title     # y-axis title; OPTIONAL: default none

_draw_svg_logo

name

ID

class

matrix

length

revcom

rawprint

prettyprint

The above methods are common to all matrix objects. Please consult TFBS::Matrix to find out how to use them.