Profppikernel uses an accelerated version of the original profile kernel to train SVM-based protein-protein interaction (PPI) prediction models and to predict new PPIs from sequence alone.
In the first mode of operation, 'training', the user provides evolutionary protein profiles and the PPIs between them (labels) as input, and profppikernel outputs a folder containing all model files required for prediction. In the second mode, 'prediction', the user provides the path to such a model folder together with the query profiles and PPIs; profppikernel then predicts the probability that each query pair interacts.
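For readers unfamiliar with the profile kernel mentioned above, the following Python sketch illustrates the idea in an unaccelerated form. Each length-k window of a protein profile contributes every k-mer whose profile score (the sum of -log probabilities at the window positions) stays below a threshold sigma, and the kernel is the normalized dot product of the resulting k-mer counts. This is not profppikernel's implementation: the profile representation (one residue-to-probability dict per position), the function names and the normalization are assumptions, and k and sigma are left to the user.

```python
import math
from collections import Counter
from typing import Dict, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: a profile is one dict per sequence position mapping residue -> probability
# (e.g. parsed from a BLAST profile file; parsing is not shown here).
Profile = Sequence[Dict[str, float]]


def profile_features(profile: Profile, k: int, sigma: float,
                     alphabet: str = AMINO_ACIDS) -> Counter:
    """Count the k-mers in the profile neighborhood of every length-k window.

    For window i, k-mer beta is counted if sum_j -log p_{i+j}(beta_j) < sigma.
    """
    features: Counter = Counter()

    def extend(start: int, prefix: str, score: float) -> None:
        depth = len(prefix)
        if depth == k:
            features[prefix] += 1
            return
        for aa in alphabet:
            p = profile[start + depth].get(aa, 0.0)
            if p <= 0.0:
                continue  # -log(0) is infinite: no k-mer using this residue here qualifies
            new_score = score - math.log(p)
            # Probabilities are <= 1, so every term is >= 0 and the score only grows;
            # stop extending a prefix as soon as it reaches sigma.
            if new_score < sigma:
                extend(start, prefix + aa, new_score)

    for start in range(len(profile) - k + 1):
        extend(start, "", 0.0)
    return features


def _dot(u: Counter, v: Counter) -> float:
    # Missing keys in a Counter read as 0, so only shared k-mers contribute.
    return float(sum(count * v[kmer] for kmer, count in u.items()))


def profile_kernel(p: Profile, q: Profile, k: int, sigma: float,
                   alphabet: str = AMINO_ACIDS) -> float:
    """Normalized kernel <F_p, F_q> / sqrt(<F_p, F_p> * <F_q, F_q>)."""
    fp = profile_features(p, k, sigma, alphabet)
    fq = profile_features(q, k, sigma, alphabet)
    norm = math.sqrt(_dot(fp, fp) * _dot(fq, fq))
    return _dot(fp, fq) / norm if norm > 0 else 0.0
```

Pruning on the partial score avoids enumerating all 20^k k-mers for every window, which is valid only because each -log p term is non-negative.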
Best human predictions
We have predicted all human protein pairs in which both proteins (C3) or exactly one protein (C2) are sequence-dissimilar to the proteins in known, reliably annotated human interactions. The top 100,000 predictions in each class can be downloaded here.
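The C2/C3 split can be expressed as a small rule, sketched here in Python. It assumes the set of proteins that are sequence-similar to some reliably annotated human interaction has already been computed (the similarity criterion is not specified on this page); `prediction_class` and `similar_to_annotated` are hypothetical names.

```python
from typing import AbstractSet, Optional, Tuple


def prediction_class(pair: Tuple[str, str],
                     similar_to_annotated: AbstractSet[str]) -> Optional[str]:
    """Return 'C3' if both proteins are dissimilar, 'C2' if exactly one is, else None."""
    # Count how many of the two proteins fall outside the "similar" set.
    dissimilar = sum(protein not in similar_to_annotated for protein in pair)
    return {2: "C3", 1: "C2"}.get(dissimilar)
```

Pairs in which both proteins resemble annotated interactors return None, since they belong to neither downloadable class.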
The data sets used in the evaluation were part of a larger analysis, available here.
Installation
For Debian-based systems, we recommend using the .tar.gz source package. It contains the complete source code together with a sample classification problem. Download it here, then compile and install it with the included make-based installation (see the included README file). The main executable is profppikernel.
For non-Debian-based systems, or if compilation from source fails, please use the precompiled version (available here). Again, see the included README file for installation instructions.
Usage
- The method requires as input protein profiles generated by legacy BLAST (older versions, i.e. blastpgp). We searched the entire UniProt, redundancy-reduced with CD-HIT to at most 80% pairwise sequence identity, using the following command:
blastpgp -a 1 -F F -j 3 -b 3000 -e 1e-3 -h 1e-3 -d <path to non-redundant blast database> -i <path to fasta file containing exactly one target protein> -Q <path to output profile file>
- For large prediction tasks, consider the --with-matrix option. It precomputes the kernel matrix during training and reuses it for later predictions. In this mode, training and testing must use the same set of input proteins (the interactions themselves may differ). This is particularly useful for whole-interactome predictions.
- All other parameters should be self-explanatory from the command-line help, which profppikernel displays when called without parameters. For further questions, please feel free to contact firstname.lastname@example.org.
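Because blastpgp expects a FASTA file containing exactly one protein, a multi-FASTA input has to be split before profiles can be generated. The Python sketch below does this and assembles the argument list for the blastpgp command given above without executing it. The function names, the `.fasta`/`.profile` file naming scheme and the database path are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Parse multi-FASTA text into {identifier: sequence} (identifier = first header word)."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def blastpgp_command(fasta: Path, database: Path, profile: Path) -> List[str]:
    """Argument list reproducing the blastpgp call above for a single protein."""
    return ["blastpgp", "-a", "1", "-F", "F", "-j", "3", "-b", "3000",
            "-e", "1e-3", "-h", "1e-3",
            "-d", str(database), "-i", str(fasta), "-Q", str(profile)]


def prepare_jobs(fasta_text: str, out_dir: Path,
                 database: Path) -> List[Tuple[str, List[str]]]:
    """Write one single-sequence FASTA file per protein; return (id, command) pairs."""
    out_dir.mkdir(parents=True, exist_ok=True)
    jobs = []
    for name, seq in read_fasta(fasta_text).items():
        fasta = out_dir / f"{name}.fasta"
        fasta.write_text(f">{name}\n{seq}\n")
        profile = out_dir / f"{name}.profile"  # hypothetical output naming
        jobs.append((name, blastpgp_command(fasta, database, profile)))
    return jobs
```

Each returned command can then be run with `subprocess.run(cmd, check=True)` once the legacy BLAST binaries and the redundancy-reduced database are installed.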
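The same-proteins requirement of --with-matrix follows from how a precomputed kernel is reused. The Python sketch below assumes, without confirmation from this page, that a kernel on protein pairs is built from a per-protein kernel via the symmetric pairwise form K((a,b),(c,d)) = K(a,c)K(b,d) + K(a,d)K(b,c). Under that assumption every pair-kernel entry is a lookup into the protein matrix, so the expensive profile kernel is evaluated once per protein pair, and any interaction among those proteins can be scored without recomputation. All names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def precompute_protein_kernel(
    proteins: Sequence[str], kernel: Callable[[str, str], float]
) -> Tuple[Dict[str, int], List[List[float]]]:
    """Evaluate the (expensive) per-protein kernel once for every protein pair."""
    index = {name: i for i, name in enumerate(proteins)}
    n = len(proteins)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # symmetric kernel: fill both triangles at once
            matrix[i][j] = matrix[j][i] = kernel(proteins[i], proteins[j])
    return index, matrix


def pair_kernel(x: Pair, y: Pair, index: Dict[str, int],
                matrix: List[List[float]]) -> float:
    """K((a,b),(c,d)) = K(a,c)K(b,d) + K(a,d)K(b,c), computed by table lookups.

    Raises KeyError for a protein outside the precomputed set.
    """
    a, b = index[x[0]], index[x[1]]
    c, d = index[y[0]], index[y[1]]
    return matrix[a][c] * matrix[b][d] + matrix[a][d] * matrix[b][c]


def pair_gram(rows: Sequence[Pair], cols: Sequence[Pair], index: Dict[str, int],
              matrix: List[List[float]]) -> List[List[float]]:
    """Kernel values between two lists of interactions (e.g. queries vs. training pairs)."""
    return [[pair_kernel(x, y, index, matrix) for y in cols] for x in rows]
```

In this sketch a query pair containing a protein that was not in the precomputed set raises KeyError, mirroring the requirement that training and testing use the same proteins.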
Changelog
- New feature: precomputed kernel matrix (--with-matrix option; see Usage)
- Java is now called via bash
- The Java path can now be set via the JAVA_HOME environment variable
- The Fastprofkernel binary is now compiled statically
- Fixed a bug that required the first sample in the .classes file to be positive
- Fixed static paths in the precompiled package
Bugs and Other Issues
For questions, please contact email@example.com