predictprotein(1) analyse protein sequence


predictprotein [--blast-processors] [--num-cpus|c] [--debug|d] [--help] [--make-file|m] [--makedebug] [--man] [--method] [--dryrun|n] [--numresmax] [--output-dir|o] [--print-ext-method-map] [--profnumresmin] [--psicexe] [--prot-name|p] [--sequence|seq|s] [--seqfile] [--spkeyidx] [--target]* [--version|v] [--work-dir|w]

predictprotein [--bigblastdb] [--big80blastdb] [--pfam2db] [--pfam3db] [--prodomblastdb] [--prositedat] [--prositeconvdat] [--swissblastdb]

predictprotein [--setacl|acl] [--<no>cache-merge] [--<no>force-cache-store] [--<no>use-cache]


predictprotein runs a set of protein sequence analysis methods:

Standard methods

These methods are run by the default target 'all':

 Feature                Target            Extension               Man page
 -------                ------            ---------               --------
 atom mobility          profbval          profbval, profb4snap    profbval(1)
 bacterial transmem-    proftmb           proftmb, proftmbdat     proftmb(1)
  brane beta barrels
 coiled-coils           coiledcoils       coils, coils_raw        coils-wrap(1)
 disulfide bridges      disulfinder       disulfinder             disulfinder(1)
 Gene Ontology terms    metastudent       metastudent.BPO.txt,    metastudent(1)
 local alignment        blast             blastPsiOutTmp, chk,    blastpgp(1)
                                          blastpSwissM8           blastall(1)
 local complexity       ncbi-seg          segNorm, segNormGCG     ncbi-seg(1)
 non-regular secondary  norsp             nors, sumNors           norsp(1)
 nuclear localization   predictnls        nls, nlsDat, nlsSum     predictnls(1)
 Pfam scan hmmer v2     hmm2pfam          hmm2pfam                hmm2pfam(1)
 Pfam scan hmmer v3     hmm3pfam          hmm3pfam, hmm3pfamTbl,  hmmscan(1)
 PROSITE scan           prosite           prosite                 prosite_scan(1)
 protein-protein        profisis          isis                    profisis(1)
  interaction sites
 secondary structure,   prof              profRdb                 prof(1)
  accessibility from
  sequence profile
 secondary structure,   prof              prof1Rdb                prof(1)
  accessibility from
  single sequence
 secondary structure,   reprof            reprof                  reprof(1)
  accessibility from
  single sequence
 transmembrane          phd               phdPred, phdRdb         prof(1)
 unstructured loops     norsnet           norsnet                 norsnet(1)

Optional methods

These methods are non-redistributable or depend on non-redistributable software (indicated by '*'). You have to acquire the non-redistributable components yourself before you can use these methods.

These methods are run by the target 'optional'.

 Feature                Target            Extension               Man page
 -------                ------            ---------               --------
 disordered regions     metadisorder      mdisorder               metadisorder(1)
 subcellular            loctree3          {arch,bact,euka}.lc3    loctree3(1)
                        tmhmm*            tmhmm                   n.a.
 protein-RNA,           somena            somena                  somena(1)
  interaction sites
 position-specific      psic*             psic, clustalngz        psic(1),
  independent counts                                              runNewPSIC(1),
  and its base multi-                                             clustalw(1)
  ple alignment
 transmembrane helices  tmhmm*            tmhmm                   n.a.
                        tmseg             tmseg                   tmseg(1)
 functional regions     consurf           _consurf.grades         consurf(1)

Resources for standard targets

 Database                             Cmd line argument
 --------                             -----------------
 big (Uniprot+PDB) blast database     --bigblastdb
 big_80 (big @ 80% sequence identity  --big80blastdb
   redundancy level) blast database
 swiss blast database                 --swissblastdb
 pfam v2 database                     --pfam2db
 pfam v3 database                     --pfam3db
 prosite_convert.dat                  --prositeconvdat

Resources for optional targets

 Database                             Cmd line argument
 --------                             -----------------
 big (Uniprot+PDB) blast database     --bigblastdb
 prosite.dat                          --prositedat
 Swiss-Prot keyword-to-accession      --spkeyidx
  'index' for loctree

Generating Resources

Courtesy of Wiktor Jurkowski:

 * rostlab-data-prosite_convert prosite.dat prosite_convert.dat
 * perl /usr/share/loctree/perl/ < keyindex.txt > keyindex_loctree.txt
 * hmmpress Pfam-A.hmm

Output format

Method outputs are deposited into --output-dir. Each method has one or more file name extensions associated with it, see the table above. Refer to the man page of the individual methods for further details. Extensions ending with `gz' are compressed with gzip(1).
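The sketch below shows one way to inspect the results. It assumes that result files are named <prot-name>.<extension> in --output-dir, as the --target query.profRdb example suggests. The helpers pp_show and pp_list are hypothetical and are not part of predictprotein. pp_show decompresses any file whose extension ends in `gz' before printing it.

```shell
#!/bin/sh
# Hypothetical helper (not part of predictprotein): print one result file,
# decompressing it with gzip first if its extension ends in "gz".
pp_show() {
    case $1 in
        *gz) gzip -dc "$1" ;;
        *)   cat "$1" ;;
    esac
}

# Hypothetical helper: list the result files of one protein, assuming they
# are named <prot-name>.<extension> inside the output directory.
pp_list() {
    outdir=$1
    name=${2:-query}    # predictprotein's documented default --prot-name
    for f in "$outdir/$name".*; do
        if [ -e "$f" ]; then
            printf '%s\n' "${f##*/}"
        fi
    done
}
```

For example, pp_list /tmp/pp query would list what a run with the default --prot-name left in /tmp/pp.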


Rost, B., Yachdav, G., and Liu, J. (2004). The PredictProtein server. Nucleic Acids Res, 32(Web Server issue), W321-6.

If you find predictprotein and the tools within it useful, please cite:

* the references for PredictProtein, see above

* the references for the tools you used, see REFERENCES on the man page of the tool


--blast-processors
Number of processors to use, default = 1
-c, --num-cpus
Make jobs, default = 1
-d, --debug
--help
Print a brief help message and exit.
-m, --make-file
Make file to use, default = /usr/share/predictprotein/
--makedebug
Debug argument for make, see make(1)
--man
This documentation page
--method
Describes method control parameters and requests methods to run when --target is not `all'. Format example:

 --method=norsp,win=50

* begin with the method name, e.g. `norsp'

* list method control parameters, e.g. win=50

Not all methods support passing control parameters in this way due to their primitive command line interfaces.
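A small helper can assemble the comma-separated format. This is only a sketch. The name pp_method_arg is hypothetical. Joining several control parameters with commas is an assumption, extrapolated from the --method=norsp,win=100 example on this page.

```shell
#!/bin/sh
# Hypothetical helper: build a --method argument from a method name followed
# by its control parameters, joined with commas (assumed format, modelled on
# the documented example --method=norsp,win=100).
pp_method_arg() {
    method=$1
    shift
    arg="--method=$method"
    for param in "$@"; do
        arg="$arg,$param"
    done
    printf '%s\n' "$arg"
}
```

The result could then be passed as, e.g., predictprotein --seqfile tquick.fasta "$(pp_method_arg norsp win=100)" --output-dir /tmp/pp.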

-n, --dryrun
Do not execute; just show what would be run.
--numresmax
Maximum sequence length, default: 6000. Sequences longer than this will make predictprotein fail with the respective error code, see ERRORS.
-o, --output-dir
Final location of output files; required unless caching is used.
--print-ext-method-map
Print extension-to-method map. Useful as input file for consistency checkers. Format: <extension><tab><method>.
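The sketch below shows one way to consume this map, assuming it is written to standard output. The hypothetical helper pp_method_for_ext reads <extension><tab><method> lines and prints the method that owns a given extension. The sample lines in the test come from the method tables on this page, not from real output.

```shell
#!/bin/sh
# Hypothetical helper: read "<extension><tab><method>" lines on standard
# input and print the method that produces the extension given as $1.
pp_method_for_ext() {
    awk -F '\t' -v ext="$1" '$1 == ext { print $2 }'
}
```

For example: predictprotein --print-ext-method-map | pp_method_for_ext profRdb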
--profnumresmin
Minimum sequence length required by prof, default: 17. Sequences shorter than this will make predictprotein fail with the respective error code, see ERRORS.
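A pre-flight length check can catch sequences outside the --profnumresmin and --numresmax limits before a run. This is a minimal sketch under several assumptions: it expects a single-sequence FASTA file and uses the documented defaults (17 and 6000). The functions and the PP_MIN/PP_MAX variables are hypothetical and are not part of predictprotein.

```shell
#!/bin/sh
# Hypothetical limits mirroring the documented defaults of
# --profnumresmin (17) and --numresmax (6000); both bounds are inclusive.
PP_MIN=${PP_MIN:-17}
PP_MAX=${PP_MAX:-6000}

# Count residues in a single-sequence FASTA file: drop header lines,
# delete whitespace, and count the remaining characters.
pp_seq_length() {
    n=$(grep -v '^>' "$1" | tr -d ' \t\r\n' | wc -c)
    echo $(( $n ))
}

# Print the length if it is within limits; otherwise report on stderr and
# return 1 (too short) or 2 (too long).
pp_check_length() {
    len=$(pp_seq_length "$1")
    if [ "$len" -lt "$PP_MIN" ]; then
        echo "too short: $len < $PP_MIN" >&2
        return 1
    elif [ "$len" -gt "$PP_MAX" ]; then
        echo "too long: $len > $PP_MAX" >&2
        return 2
    fi
    echo "$len"
}
```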
--psicexe
psic wrapper executable, default: /usr/share/rost-runpsic/
-p, --prot-name
Base name of result files and protein name in, for example, FASTA files. Default = `query'.

Valid names are of the character set "[[:alnum:]._-]".
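A --prot-name value can be checked against this character set before use. The sketch below uses a hypothetical helper. Rejecting the empty string is an additional assumption, not something this page states.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 is non-empty and every character
# belongs to the documented set [[:alnum:]._-].
pp_valid_prot_name() {
    case $1 in
        '' | *[![:alnum:]._-]*) return 1 ;;
        *) return 0 ;;
    esac
}
```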

-s, --seq, --sequence
One-letter amino acid sequence input.
--seqfile
FASTA amino acid sequence file; if `-', standard input is read.
--spkeyidx
Swiss-Prot keyword-to-identifier 'index' file for loctree(1).
--target
Method groups to run. Give this argument once for each target you need. Default: the value of `default_targets' in the configuration file, or `all' if that is not given.

Some targets of interest:

all
Methods that are GPL or redistributable to non-commercial entities
optional
Methods that do not fit into `all'

Look at /usr/share/predictprotein/ for a list of targets (``Use the source Luke'').

-v, --version
Print package version
-w, --work-dir
Working directory, optional

Database options

--bigblastdb
Path to comprehensive blast database
--big80blastdb
Path to comprehensive blast database at 80% sequence identity redundancy level
--pfam2db
Pfam v2 database, e.g. Pfam_ls
--pfam3db
Pfam v3 database, e.g. Pfam-A.hmm
--prodomblastdb
Obsolete. This argument is kept only to maintain compatibility with older versions.
--prositedat
Path to `prosite.dat' file, see <>
--prositeconvdat
Path to `prosite_convert.dat' file, see <>
--swissblastdb
Path to SwissProt blast database

Cache related options

--acl, --setacl
Set access control lists. ACLs are set only when results are stored in the cache; this option is ineffective otherwise. All previous ACLs are replaced, not merged. The read bit controls browsability of results; other bits are not used. E.g. --setacl g:rostlab:7
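The sketch below interprets an ACL entry under the rule that only the read bit matters. It assumes setfacl(1)-style entries of the form type:name:perms with a single octal permission digit, as in g:rostlab:7 from the cache examples. pp_acl_readable is hypothetical, and it deliberately rejects symbolic permissions such as rwx.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the ACL entry in $1 (assumed form
# type:name:D, D an octal digit) has the read bit (4) set, i.e. would make
# cached results browsable for that user or group.
pp_acl_readable() {
    perm=${1##*:}
    case $perm in
        [0-7]) [ $((perm & 4)) -ne 0 ] ;;
        *) return 1 ;;
    esac
}
```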

--cache-merge, --nocache-merge
Merge/do not merge results into the cache. --cache-merge reuses results already in the cache; it turns --use-cache on automatically. --cache-merge is incompatible with --force-cache-store.

--nocache-merge is the default UNLESS

  • --use-cache is on and
  • --noforce-cache-store is in effect and
  • --target is used and
  • the cache is not empty

--cache-merge is silently ignored in case the cache is empty.
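The default rule above can be restated as a small shell function. This is a sketch for reasoning about the documented behaviour, not predictprotein's implementation. Each argument is 1 or 0 for the four conditions, in the order listed. The hypothetical pp_cache_merge_default prints the option that would be in effect when neither option is given explicitly.

```shell
#!/bin/sh
# Sketch of the documented default for --cache-merge.
# Arguments (1 = true, 0 = false), in the order listed on this page:
#   $1 --use-cache is on
#   $2 --force-cache-store is in effect (0 means --noforce-cache-store)
#   $3 --target is used
#   $4 the cache is not empty
pp_cache_merge_default() {
    use_cache=$1 force_store=$2 target_given=$3 cache_nonempty=$4
    if [ "$use_cache" = 1 ] && [ "$force_store" = 0 ] &&
       [ "$target_given" = 1 ] && [ "$cache_nonempty" = 1 ]; then
        printf '%s\n' --cache-merge
    else
        printf '%s\n' --nocache-merge
    fi
}
```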

--force-cache-store, --noforce-cache-store
Enable/disable forcing storage of results into the cache. Implies --use-cache. Default: --noforce-cache-store

With --noforce-cache-store when predictprotein finds cached results it simply fetches them from the cache and does no processing (even if the results are incomplete). With --force-cache-store predictprotein does not fetch anything from the cache but does store the results, completely replacing what was cached.

--force-cache-store is incompatible with --cache-merge.

--use-cache, --nouse-cache
Use/do not use the cache for predictprotein results. Default: --nouse-cache.

Option `use_cache' may be given in configuration files to override the default.


Sequence is too long, see --numresmax
Sequence is too short, shorter than minimum length required by prof. See --profnumresmin.


 predictprotein --seqfile /usr/share/doc/predictprotein/examples/tquick.fasta --output-dir /tmp/pp 
 predictprotein --seqfile /usr/share/doc/predictprotein/examples/tquick.fasta --output-dir /tmp/pp --target query.profRdb --target loctree3
 predictprotein --seqfile /usr/share/doc/predictprotein/examples/tquick.fasta --method=norsp,win=100 --output-dir /tmp/pp

Cache examples

Store results in cache, do not care about storing files in --output-dir:
 predictprotein --seqfile /usr/share/doc/predictprotein/examples/tquick.fasta --method=norsp,win=100 --use-cache --setacl g:rostlab:7
If the results are not in the cache, compute and store them; otherwise fetch them from the cache into --output-dir:
 predictprotein --seqfile /usr/share/doc/predictprotein/examples/tquick.fasta --method=norsp,win=100 --use-cache --setacl g:rostlab:7 --output-dir /tmp/pp


Location of predictproteinrc configuration file to use, overriding other configuration files


/usr/share/predictprotein/predictproteinrc.default
Default configuration file. See this file for a description of the parameters.
/etc/predictproteinrc
System configuration file overriding values in /usr/share/predictprotein/predictproteinrc.default
User configuration file overriding values in /etc/predictproteinrc


Popularity Contest

The pp-popularity-contest package included with this image sets up a cron job that periodically and anonymously submits statistics about the most used Rost Lab packages on this system to the Rost Lab developers.

This information helps us make decisions such as which packages should receive high priority when fixing bugs. It also helps us decide which packages should receive funding for further development and support. This information is also very important when the Rost Lab applies for funding.

Without the funding we receive based on the usage statistics you volunteer, none of the packages on this image could be made available to you at no cost.

If you do not wish to participate in the popularity contest, please remove the pp-popularity-contest package.


Burkhard Rost, Antoine de Daruvar, Jinfeng Liu, Guy Yachdav, Laszlo Kajan