Profppikernel

From Rost Lab Open

Description

Profppikernel uses an accelerated version of the original profile kernel [1] to train SVM-based protein-protein interaction (PPI) prediction models and to predict new PPIs from sequence alone.

In the first mode of operation, 'training', the user provides evolutionary protein profiles and the PPIs between them (labels) as input, and profppikernel outputs a folder with all model files required for predictions. In the second mode, 'prediction', the user provides the path to such a model folder together with the query profiles and PPIs. Profppikernel then predicts the probability that each query pair interacts.

Best human predictions

We have predicted all human protein pairs in which both proteins (C3) or one protein (C2) are sequence-dissimilar to known, reliably annotated human interactions. The best 100,000 predictions in either class can be downloaded here.

Data sets

Data sets used in the evaluation were part of a bigger analysis, available here.

Availability

For Debian-based systems, we recommend using the .tar.gz source package. It contains all source code together with a sample classification problem. Download it here, then compile and install it with the included make-based installation (see the included README file). The main executable is profppikernel.

For non-Debian-based systems, or if compilation from source fails, please use the precompiled version (available here). Again, see the README file for installation instructions.

Usage

  • The method requires protein profiles as input, in the format generated by older versions of Blast (blastpgp). The database we used for Blast was the entire UniProt, redundancy-reduced to 80% maximum pairwise sequence identity with CD-HIT. The Blast command was as follows:

blastpgp -a 1 -F F -j 3 -b 3000 -e 1e-3 -h 1e-3 -d <path to non-redundant blast database> -i <path to fasta file containing exactly one target protein> -Q <path to output profile file>

  • For large prediction tasks, you may want to use the --with-matrix option. It precomputes the kernel matrix during training so that it can be re-used later for predictions. In this mode, the input proteins are expected to be the same for both training and testing (but not the interactions themselves). This is particularly useful for whole-interactome predictions.
  • All other parameters should be self-explanatory from the command line manual (call profppikernel without parameters to display it). If questions remain, please feel free to contact hampt@rostlab.org.
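
The Blast command above expects a FASTA file with exactly one target protein, so a FASTA file holding several proteins has to be split first. The following Python sketch illustrates one way to do that and to assemble the corresponding blastpgp call per protein. It is not part of profppikernel: the function names, the .fasta and .profile file extensions and the database path are hypothetical choices for illustration, and only the blastpgp flags are taken from the command given above. The sketch prints the commands and does not run Blast itself.

```python
import shlex
import tempfile
from pathlib import Path


def _write_record(record, out_dir):
    """Write one FASTA record (header line plus sequence lines) to its own file."""
    # File stem: first word of the header, with path-unfriendly characters replaced.
    # Records sharing the same first header word would overwrite each other.
    words = record[0][1:].split()
    stem = words[0] if words else "unnamed"
    stem = stem.replace("|", "_").replace("/", "_")
    path = Path(out_dir) / f"{stem}.fasta"
    path.write_text("\n".join(record) + "\n")
    return path


def split_fasta(multi_fasta_text, out_dir):
    """Split multi-FASTA text into one file per protein and return the file paths."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    paths, record = [], []
    for line in multi_fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record:
                paths.append(_write_record(record, out_dir))
            record = [line]
        elif record:
            record.append(line)
    if record:
        paths.append(_write_record(record, out_dir))
    return paths


def blastpgp_command(fasta_path, db_path, profile_path):
    """Build the blastpgp argument list with the flags quoted in the Usage section."""
    return [
        "blastpgp", "-a", "1", "-F", "F", "-j", "3", "-b", "3000",
        "-e", "1e-3", "-h", "1e-3",
        "-d", str(db_path),       # non-redundant Blast database
        "-i", str(fasta_path),    # FASTA file with exactly one target protein
        "-Q", str(profile_path),  # output profile file
    ]


if __name__ == "__main__":
    example = ">protA first example\nMKVLAAGIV\n>protB second example\nGSSHHHHHH\n"
    with tempfile.TemporaryDirectory() as tmp:
        for fasta in split_fasta(example, tmp):
            profile = fasta.with_suffix(".profile")  # hypothetical naming scheme
            # "/path/to/uniprot_nr80" is a placeholder for your own database path.
            print(shlex.join(blastpgp_command(fasta, "/path/to/uniprot_nr80", profile)))
```

Each printed line can be run in a shell, or the argument list can be passed to subprocess.run once blastpgp and the database are available.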

Changelog

1.0.4

  • New feature: precomputed kernel matrix (see Usage)
  • Java now called via bash
  • Java path can now be set via JAVA_HOME environment variable
  • Fastprofkernel binary is compiled statically

1.0.3

  • Fixed a bug that required the first sample in the .classes file to be positive
  • Fixed static paths in precompiled package

Bugs and Other Issues

Please report bugs and other issues via Bugzilla or via email to hampt@rostlab.org.

References

Hamp, T. and Rost, B. (2015): Evolutionary profiles improve protein–protein interaction prediction from sequence. Bioinformatics, in press

Hamp, T. and Rost, B. (2014): More challenges for machine learning protein-protein interactions. Bioinformatics, 31(10): 1521-1525

Hamp, T., Goldberg, T. and Rost, B. (2013): Accelerating the Original Profile Kernel. PLoS One, 8(4), e68459

Contact

For questions, please contact hampt@rostlab.org
