Profppikernel

From Rost Lab Open
Revision as of 18:26, 9 June 2015 by Hampt

Description

Profppikernel uses an accelerated version of the original profile kernel [1] to train SVM-based protein-protein interaction (PPI) prediction models and to predict new PPIs from sequence alone.

Profppikernel has two modes of operation. In 'training' mode, the user provides evolutionary protein profiles and the PPIs between them (the labels) as input, and profppikernel outputs a folder containing all model files required for predictions. In 'prediction' mode, the user provides the path to such a model folder together with the query profiles and PPIs, and profppikernel predicts the probability that each query pair interacts.

Best human predictions

We have predicted all human protein pairs in which both proteins (class C3) or one protein (class C2) are sequence-dissimilar to the proteins in known, reliably annotated human interactions. The 100,000 best predictions in each class can be downloaded here.

Availability

The .tar.gz source package contains all source code together with a sample classification problem. Download it here, then compile and install it with the included make-based installation (see the included README file). The main executable is profppikernel.

Alternatively, you can try the precompiled version (available here). Again, see the README file for installation instructions.

Hints

  • For large prediction tasks, we may be able to perform the predictions for you and/or provide you with considerably faster customized implementations.
  • The method needs protein profiles as generated by PSI-BLAST (blastpgp) as input. The database we used was the entire UniProt, redundancy-reduced to 80% maximum pairwise sequence identity with CD-HIT. The BLAST command was as follows:

blastpgp -a 1 -F F -j 3 -b 3000 -e 1e-3 -h 1e-3 -d <path to non-redundant blast database> -i <path to fasta file containing exactly one target protein> -Q <path to output profile file>
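Because the command above expects a FASTA file with exactly one target protein, a multi-sequence FASTA file has to be split before the profiles are generated. The following is a minimal shell sketch of that preparation, not part of the profppikernel package. It assumes cd-hit, formatdb and blastpgp are installed. The function names, the seq_<n>.fasta and .profile file names, and the CDHIT, FORMATDB and BLASTPGP override variables are hypothetical, and the CD-HIT call only illustrates an 80% identity threshold (-c 0.8); the exact options used for the original database are not stated here.

```shell
#!/bin/sh
# Sketch only: function, file and environment variable names are hypothetical.

# Reduce a protein FASTA file to 80% maximum sequence identity with CD-HIT,
# then format the result as a protein database for blastpgp.
build_database() {
    raw_fasta=$1
    nr_fasta=$2
    "${CDHIT:-cd-hit}" -i "$raw_fasta" -o "$nr_fasta" -c 0.8 &&
        "${FORMATDB:-formatdb}" -i "$nr_fasta" -p T
}

# Split a multi-sequence FASTA file into one file per protein,
# written as <outdir>/seq_1.fasta, <outdir>/seq_2.fasta, ...
split_fasta() {
    input=$1
    outdir=$2
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>/ { if (out) close(out); n++; out = dir "/seq_" n ".fasta" }
        out  { print > out }
    ' "$input"
}

# Run the blastpgp call from the text on one single-protein FASTA file.
# The profile is written next to the input with a .profile suffix.
make_profile() {
    fasta=$1
    db=$2
    "${BLASTPGP:-blastpgp}" -a 1 -F F -j 3 -b 3000 -e 1e-3 -h 1e-3 \
        -d "$db" -i "$fasta" -Q "${fasta%.fasta}.profile"
}

# Example use (paths are placeholders):
#   build_database uniprot.fasta uniprot_nr80.fasta
#   split_fasta proteins.fasta single
#   for f in single/*.fasta; do make_profile "$f" uniprot_nr80.fasta; done
```

Setting BLASTPGP=echo (and likewise CDHIT or FORMATDB) prints the arguments instead of running the tool, which is a cheap way to check the generated calls before starting a long run.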

Bugs and Other Issues

Please report bugs and other issues via Bugzilla.

References

Hamp, T. and Rost, B. (2015): Evolutionary profiles improve protein-protein interaction prediction from sequence. Bioinformatics, in press.

Hamp, T. and Rost, B. (2014): More challenges for machine learning protein-protein interactions. Bioinformatics, 31(10): 1521-1525.

Hamp, T., Goldberg, T. and Rost, B. (2013): Accelerating the Original Profile Kernel. PLoS ONE, 8(6): e68459.

Contact

For questions, please contact hampt@rostlab.org