fastblocksearch(1) search loci matching protein block profiles

SYNOPSIS

fastBlockSearch [options] seqs.fa fam.prfl

DESCRIPTION

Searches for hits (matches) of the blocks in the profile fam.prfl within the genomic sequences in the file seqs.fa. Hits are sorted by increasing score, so the last displayed hit is the best one found in the region. The output format is similar to that of the blockscore file (which is optionally generated by msa2prfl.pl): each line shows the coordinate, the block name, the strand, the mean odds-ratio score, the specificity of the score, and the motif. From the output, users can choose regions with matching blocks and run gene prediction on them with AUGUSTUS-PPX, using the same block profile.

OPTIONS

--cutoff=c

Sets the minimum average log score that the motifs found must reach. Use it to adjust the sensitivity of the block search. The default cutoff is 0.7, which is very sensitive but can give many false positive hits for smaller profiles.

EXAMPLE

> fastBlockSearch --cutoff=1.1 chr4.103M.fa PF00225_seed.prfl

Hits found in chr4 103000000 105000000
Score:207.987
Mult. score:4.83391
1081586 unknown_M[5,13] -       2.32574 5.04633 .....YATRLKNI
1103952 unknown_L       -       4.85363 6.75245 NAKTRIICTITP
1103991 unknown_K       -       8.38065 9.92928 YRDSKLTRILQNSLG
1104375 unknown_J       -       3.96065 6.79408 RSLFILGQVIKKL
1106992 unknown_I       -       9.22487 7.64306 LVDLAGSE
1115567 unknown_H[5,16] -       2.31869 5.58986 .....ESRHYGETKMN
1116319 unknown_G       -       7.34282 8.29425 EIYNETITDLL
1117092 unknown_F       -       5.10694 6.10274 VIPRAIHDIF
1117146 unknown_E       -       9.43596 9.18891 QTASGKTYTM
1117176 unknown_D[1,8]  -       5.73796 6.31532 .GTIFAYG
1117399 unknown_B[1,7]  -       3.59083 5.03059 .CLDRVF
1119420 unknown_A[0,8]  -       4.64107 6.44285 RVRPLNSR.
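
The hit lines above can be post-processed with a short script when choosing a region to pass on to AUGUSTUS-PPX. The following Python sketch is not part of fastBlockSearch. It assumes that each hit line has six whitespace-separated fields in the order shown in the example, and that the two numeric columns are the mean odds-ratio score followed by the specificity, as listed in DESCRIPTION. The names BlockHit, parse_hit_line and hit_span are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class BlockHit(NamedTuple):
    """One block match line (field order assumed from the EXAMPLE output)."""
    coordinate: int
    block: str          # block name, possibly with a partial range such as [5,13]
    strand: str         # "+" or "-"
    score: float        # assumed: mean odds-ratio score
    specificity: float  # assumed: specificity of the score
    motif: str


def parse_hit_line(line: str) -> Optional[BlockHit]:
    """Return a BlockHit for a block match line, or None for any other line."""
    fields = line.split()
    # Header lines ("Hits found in ...", "Score:...") do not start with a
    # coordinate or lack a strand symbol in the third field, so they are skipped.
    if len(fields) != 6 or not fields[0].isdigit() or fields[2] not in ("+", "-"):
        return None
    try:
        return BlockHit(int(fields[0]), fields[1], fields[2],
                        float(fields[3]), float(fields[4]), fields[5])
    except ValueError:
        return None


def hit_span(hits: List[BlockHit]) -> Tuple[int, int]:
    """Smallest and largest coordinate among the hits, as printed in the output."""
    coords = [h.coordinate for h in hits]
    return min(coords), max(coords)


if __name__ == "__main__":
    import sys
    # Usage sketch: fastBlockSearch ... | python parse_hits.py
    hits = [h for h in map(parse_hit_line, sys.stdin) if h is not None]
    if hits:
        print("span: %d-%d, %d block hits" % (*hit_span(hits), len(hits)))
```

The span is reported in whatever coordinate system the output uses. The script does not convert the coordinates or add flanking sequence.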

AUTHORS

Oliver Keller