Profppikernel
Revision as of 11:18, 26 July 2015
Description
Profppikernel uses an accelerated version of the original profile kernel [1] to train SVM-based protein-protein interaction (PPI) prediction models and to predict new PPIs from sequence alone.
Profppikernel has two modes of operation. In the first mode, 'training', the user provides evolutionary protein profiles and the PPIs between them (labels) as input, and profppikernel outputs a folder with all model files required for predictions. In the second mode, 'prediction', the user provides the path to such a model folder together with the query profiles and the protein pairs to score. Profppikernel then predicts the probability that each pair interacts.
Best human predictions
We have made predictions for all human protein pairs in which both proteins (C3) or one protein (C2) are sequence-dissimilar to known, reliably annotated human interactions. The best 100,000 predictions in either class can be downloaded here.
Availability
The .tar.gz source package contains all source code together with a sample classification problem. Download it here, then compile and install it with the included make-based installation (see the included README file). The main executable is profppikernel.
Alternatively, you can try the precompiled version (available here). Again, see the README file for installation instructions.
Hints
- For large prediction tasks, we may be able to perform the predictions for you and/or provide you with considerably faster customized implementations. Please contact hampt@rostlab.org for more information.
- The method needs protein profiles as generated by BLAST as input. The BLAST database we used was the entire UniProt, redundancy-reduced to 80% maximum pairwise sequence identity with CD-HIT. The BLAST command was as follows:
blastpgp -a 1 -F F -j 3 -b 3000 -e 1e-3 -h 1e-3 -d <path to non-redundant blast database> -i <path to fasta file containing exactly one target protein> -Q <path to output profile file>
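Because the command expects a FASTA file with exactly one target protein, profiles for many proteins have to be generated one file at a time. The following is a minimal Python sketch of how that loop could be prepared; it is not part of the profppikernel package. The flags are copied unchanged from the command above. The helper names, the `*.fasta` glob pattern, and the `.profile` output suffix are assumptions made for illustration only.

```python
from pathlib import Path
from typing import List


def count_fasta_records(fasta_text: str) -> int:
    """Count sequence records by counting header lines that start with '>'."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def blastpgp_command(database: str, fasta_path: str, profile_path: str) -> List[str]:
    """Build the argument list for the blastpgp call quoted above.

    The flags are taken verbatim from the text; only the three paths vary.
    """
    return [
        "blastpgp",
        "-a", "1",
        "-F", "F",
        "-j", "3",
        "-b", "3000",
        "-e", "1e-3",
        "-h", "1e-3",
        "-d", database,
        "-i", fasta_path,
        "-Q", profile_path,
    ]


def profile_commands(fasta_dir: str, database: str, out_dir: str) -> List[List[str]]:
    """Return one blastpgp command per single-protein FASTA file in fasta_dir.

    Assumption: input files end in '.fasta' and outputs get a '.profile'
    suffix; neither naming scheme is prescribed by profppikernel.
    """
    commands = []
    for fasta in sorted(Path(fasta_dir).glob("*.fasta")):
        n = count_fasta_records(fasta.read_text())
        if n != 1:
            raise ValueError(f"{fasta} holds {n} records; expected exactly one")
        profile = Path(out_dir) / (fasta.stem + ".profile")
        commands.append(blastpgp_command(database, str(fasta), str(profile)))
    return commands
```

Each returned argument list could then be passed to `subprocess.run(cmd, check=True)`, provided blastpgp is installed and on the PATH. Building the commands separately from running them makes it easy to inspect them first or to hand them to a cluster scheduler.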
Changelog
1.0.3
- Fixed a bug that required the first sample in the .classes file to be positive
- Fixed static paths in precompiled package
Bugs and Other Issues
Please report bugs and other issues via Bugzilla.
References
Contact
For questions, please contact hampt@rostlab.org.