pEffect - prediction of bacterial type III effector proteins
Latest revision as of 21:14, 6 May 2017
Web server
http://bromberglab.org/services/pEffect/
Introduction
The type III secretion system is one of the causes of a wide range of bacterial infections in humans, animals and plants. This system comprises a hollow needle-like structure localized on the surface of bacterial cells that injects specific bacterial proteins, the so-called effectors, directly into the cytoplasm of a host cell. During infection, effectors convert host resources to the bacterium's advantage and promote pathogenicity.
We (Tatyana Goldberg, Burkhard Rost and Yana Bromberg) at BrombergLab and RostLab developed pEffect, a novel method that predicts bacterial type III effector proteins. pEffect combines sequence-based homology searches with machine learning to predict effector proteins accurately, and it uses information encoded in the entire protein sequence.
Method design
pEffect combines sequence similarity-based inference (PSI-BLAST) with de-novo prediction using machine learning (Support Vector Machines; SVM). For a query protein, it first runs PSI-BLAST to identify a homolog in the set of known and annotated effector proteins. If such a homolog is found, its annotation (i.e. type III effector) is transferred to the query protein. If no homolog is found, pEffect triggers an SVM that predicts effector proteins by searching for stretches of k consecutive residues that are known from annotated proteins.
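The two-stage decision described above can be sketched as follows. This is a minimal illustration, not the pEffect implementation: `find_homolog` and `svm_predict` are hypothetical callables standing in for the PSI-BLAST search and the trained SVM, and `kmer_counts` only illustrates what "k consecutive residues" means. The actual feature encoding and the value of k are not specified here.

```python
from collections import Counter
from typing import Callable, Optional, Tuple

# Hypothetical stand-ins; these names are not taken from the pEffect code.
HomologSearch = Callable[[str], Optional[str]]  # returns a homolog ID or None
SvmPredict = Callable[[str], bool]              # True if predicted effector


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every stretch of k consecutive residues in the sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def predict_effector(sequence: str,
                     find_homolog: HomologSearch,
                     svm_predict: SvmPredict) -> Tuple[bool, str]:
    """Return (is_effector, annotation_type) for one query sequence."""
    homolog = find_homolog(sequence)
    if homolog is not None:
        # A homolog among known effectors exists: transfer its annotation.
        return True, "PSI-BLAST"
    # No homolog found: fall back to the de-novo SVM prediction.
    return svm_predict(sequence), "SVM"
```

The second element of the returned tuple corresponds to the annotation type (PSI-BLAST or SVM) reported for each prediction.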
Input
The input to the server is:
1. one or more FASTA-formatted protein sequences. The sequences must use the one-letter amino acid code (not case-sensitive). The allowed amino acids are: ACDEFGHIKLMNPQRSTVWY and X (unknown). Example.
2. an e-mail address (optional): a notification that the prediction is complete, together with the access link, is sent to the provided address
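Before submitting, sequences can be checked locally against the alphabet listed above. The sketch below is an assumption about how such a check could look, not the server's own validation: `parse_fasta` and `invalid_residues` are hypothetical helper names, and taking the first whitespace-delimited token of the header as the identifier is a convention chosen here.

```python
from typing import Dict, Iterable, Optional, Set

# Allowed one-letter codes as stated in the input description.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA lines into {identifier: upper-case sequence}."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without an identifier")
            current = fields[0]
            if current in records:
                raise ValueError(f"duplicate identifier: {current}")
            records[current] = ""
        elif current is None:
            raise ValueError("sequence data before the first FASTA header")
        else:
            # Input is not case-sensitive, so normalize to upper case.
            records[current] += line.upper()
    return records


def invalid_residues(sequence: str) -> Set[str]:
    """Return the characters that are not in the allowed alphabet."""
    return set(sequence.upper()) - ALLOWED
```

A sequence is acceptable under this check when `invalid_residues` returns an empty set.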

Output
For every query protein, the result contains four basic values:
1. the protein identifier as provided by the user
2. the reliability score of the prediction on a 0-100 scale, with 100 being the most confident prediction
3. the prediction of whether the protein is a type III effector
4. the annotation type of the prediction (PSI-BLAST or SVM)
For PSI-BLAST predictions, clicking the annotation type (i.e. PSI-BLAST) on the web site shows information about the closest homolog and its PSI-BLAST alignment to the query. Example: https://rostlab.org/services/peffect/help
Prediction reliability
Every prediction result is supported by a Reliability Index (RI) measuring the strength of a prediction. The RI is a value between 0 and 100, with 100 denoting the most confident predictions.
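We evaluated the reliability of pEffect predictions on a non-redundant test set of proteins (Fig. 1). At the default threshold of RI>50, over 87% of all predictions of type III effectors are correct and 95% of all effectors in our set are identified (Fig. 1; black arrow). At the higher threshold of RI>80, effector predictions are correct 96% of the time, but only 78% of all effectors in the set are identified (Fig. 1; gray arrow).
Fig 1: Reliable predictions are more accurate. The figure shows the cumulative percent of accuracy/coverage of pEffect predictions at or above a given reliability index (RI). The graphs were obtained using the homology-reduced sets of 115 type III effector and 3,460 non-effector proteins in five-fold cross-validation. At the default reliability score of RI=50 (black vertical line), 95% of type III effectors are identified at 87% accuracy (black arrow). At a higher reliability score of RI=80 (gray vertical line), prediction accuracy increases to 97% at the cost of lower coverage of 78% (gray arrow).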
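The trade-off between accuracy and coverage at a chosen RI threshold can be computed as in the sketch below. It assumes that accuracy means the fraction of effector predictions that are correct and coverage means the fraction of known effectors that are identified, as in the paragraph above. The `Prediction` class, its field names, and the choice of an inclusive threshold (`>=`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass
class Prediction:
    # Hypothetical record mirroring the four output values.
    protein_id: str
    reliability: int      # RI on the 0-100 scale
    is_effector: bool     # predicted class
    annotation_type: str  # "PSI-BLAST" or "SVM"


def accuracy_coverage(predictions: Iterable[Prediction],
                      true_effectors: Set[str],
                      min_ri: int) -> Tuple[float, float]:
    """Accuracy and coverage of effector predictions with RI >= min_ri."""
    called = [p for p in predictions
              if p.is_effector and p.reliability >= min_ri]
    correct = sum(1 for p in called if p.protein_id in true_effectors)
    accuracy = correct / len(called) if called else 0.0
    coverage = correct / len(true_effectors) if true_effectors else 0.0
    return accuracy, coverage
```

Raising `min_ri` keeps only the more confident predictions, which typically raises accuracy and lowers coverage.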
Runtime analysis
pEffect is built to run a homology-based PSI-BLAST; if no hit is identified then a de-novo SVM prediction is used.
While PSI-BLAST searches are fast, SVM's runtime depends on the number of query protein sequences. We measured pEffect's runtime on a Dell M605 machine with a Six-Core AMD Opteron processor (2.4 GHz, 6MB and 75W ACP) running on Linux.
Run | 1 Sequence | 100 Sequences | 500 Sequences | 1000 Sequences | 3000 Sequences | 5000 Sequences | 10000 Sequences |
---|---|---|---|---|---|---|---|
Run 1 | 2.3s | 13.0s | 1m8.7s | 2m26.7s | 7m42.4s | 13m11.8s | 25m15.6s |
Run 2 | 1.5s | 13.1s | 1m13.2s | 2m22.3s | 7m37.0s | 13m1.5s | 25m43.1s |
Run 3 | 1.5s | 13.5s | 1m13.1s | 2m27.3s | 7m34.5s | 12m53.5s | 24m57.6s |
Note: to reduce the server's response time, we store all PSI-BLAST profile files in the PredictProtein cache (current size: results for more than 11 million sequences). These can be retrieved from the cache very quickly. For novel protein sequences for which we do not have PSI-BLAST profiles in the cache, the runtime increases substantially.
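To compare the measurements across batch sizes, the table values can be converted to seconds and averaged over the three runs. The sketch below copies the timings from the table above. The helper names `to_seconds` and `mean_seconds_per_sequence` are introduced here for illustration only.

```python
import re
from typing import Dict

SIZES = [1, 100, 500, 1000, 3000, 5000, 10000]
RUNS = [
    ["2.3s", "13.0s", "1m8.7s", "2m26.7s", "7m42.4s", "13m11.8s", "25m15.6s"],
    ["1.5s", "13.1s", "1m13.2s", "2m22.3s", "7m37.0s", "13m1.5s", "25m43.1s"],
    ["1.5s", "13.5s", "1m13.1s", "2m27.3s", "7m34.5s", "12m53.5s", "24m57.6s"],
]


def to_seconds(text: str) -> float:
    """Convert a duration such as '1m8.7s' or '13.0s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return 60 * minutes + float(match.group(2))


def mean_seconds_per_sequence() -> Dict[int, float]:
    """Average the three runs per batch size and divide by the batch size."""
    result = {}
    for column, size in enumerate(SIZES):
        mean = sum(to_seconds(run[column]) for run in RUNS) / len(RUNS)
        result[size] = mean / size
    return result
```

For batches of 100 sequences or more, these measurements work out to roughly 0.13 to 0.16 seconds per sequence, so the runtime grows approximately linearly with the number of query sequences.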
Availability/ Download
- pEffect's web server is available at http://bromberglab.org/services/pEffect/
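- The standalone version of pEffect can be downloaded as a zip file (https://rostlab.org/services/peffect/downloads) or a tar.gz file (ftp://rostlab.org/peffect/).
- The Debian package can be downloaded from https://rostlab.org/services/peffect/db/peffect_1.0.0_amd64.deb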
Data
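- Data sets used for development and evaluation of pEffect can be accessed at https://rostlab.org/services/peffect/downloads.
- Predictions for whole proteomes of Gram-negative and Gram-positive Bacteria, as well as Archaea, can be downloaded from https://rostlab.org/services/peffect/proteomes or https://rostlab.org/services/peffect/downloads.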
Reference
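Goldberg T, Rost B, Bromberg Y. Computational prediction shines light on type III secretion origins. Sci Rep. 2016 Oct 7;6:34516. https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/27713481/
doi: 10.1038/srep34516; PubMed PMID: 27713481; PubMed Central PMCID: PMC5054392.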
Contact
For questions, please contact localization@rostlab.org