Difference between revisions of "PEffect - prediction of bacterial type III effector proteins"

From Rost Lab Open
(Data)
(Reference)
 
(25 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
== Web server ==
 
== Web server ==
   
https://rostlab.org/services/pEffect/
+
http://bromberglab.org/services/pEffect/
   
 
== Introduction ==
 
== Introduction ==
Line 18: Line 18:
   
 
a. e-mail address: a notification for completed prediction result and the access link are sent to the provided email address (Optional)
 
a. e-mail address: a notification for completed prediction result and the access link are sent to the provided email address (Optional)
  +
  +
[[File:AccCov.pEffect.v4.png|500px|right|alt=Figure 1|frame| Fig 1: Reliable predictions are more accurate. The figure shows the cumulative percent of accuracy/coverage of pEffect predictions at or above a given reliability index (RI). The graphs were obtained using the homology-reduced sets of 115 type III effector and 3,460 non-effector proteins in five-fold cross-validation. At the default reliability score of RI=50 (black vertical line), 95% of type III effectors are identified at 87% accuracy (black arrow). At a higher reliability score of RI=80 (gray vertical line), prediction accuracy increases to 97% at the cost of lower coverage of 78% (gray arrow).]]
   
 
=== Output ===
 
=== Output ===
Line 31: Line 33:
 
4. annotation type of the prediction (PSI-BLAST or SVM)
 
4. annotation type of the prediction (PSI-BLAST or SVM)
   
For PSI-BLAST predictions, the web site provides ‘per click’ on the annotation type (''i.e.'' PSI-BLAST) the information about the closest homolog and its PSI-BLAST alignment to query.
+
For PSI-BLAST predictions, the web site provides ‘per click’ on the annotation type (''i.e.'' PSI-BLAST) the information about the closest homolog and its PSI-BLAST alignment to query. [https://rostlab.org/services/peffect/help Example.]
 
[https://rostlab.org/services/peffect/help Example.]
 
 
[[File:LocTree3_ReliabilityIndices.png |frame|right|alt=Figure 1| Fig 1: More reliable predictions better. The curves show the percentage Accuracy vs. Coverage for LocTree3 predictions above a given RI threshold (from 0=unreliable to 100=most reliable). The curves were obtained on cross-validated test sets of bacterial (gray line) and eukaryotic (black line) proteins. Half of all eukaryotic proteins are predicted at RI>65; for these Q18 is above 95% (black arrow). Half of all bacterial proteins are predicted at RI>80 and Q18 above 95% (black arrow). |100px]]
 
   
 
=== Prediction reliability ===
 
=== Prediction reliability ===
 
Every prediction result is supported by a Reliability Index (RI) measuring the strength of a prediction. The RI is a value between 0 and 100, with 100 denoting the most confident predictions.
 
Every prediction result is supported by a Reliability Index (RI) measuring the strength of a prediction. The RI is a value between 0 and 100, with 100 denoting the most confident predictions.
   
We rigorously evaluated the reliability of pEffect predictions on a non-redundant test set of proteins (Fig. 1). We observed that at the default threshold of RI>50, over 87% of all predictions of type III effectors are correct and 95% of all effectors in our set are identified (Fig. 2; black arrow). At a higher RI>80 effector predictions are correct 96% of the time, but only 78% of all effectors in the set are identified (Fig. 2; gray arrow).
+
We rigorously evaluated the reliability of pEffect predictions on a non-redundant test set of proteins (Fig. 1). We observed that at the default threshold of RI>50, over 87% of all predictions of type III effectors are correct and 95% of all effectors in our set are identified (Fig. 1; black arrow). At a higher RI>80 effector predictions are correct 96% of the time, but only 78% of all effectors in the set are identified (Fig. 1; gray arrow).
   
 
=== Runtime analysis ===
 
=== Runtime analysis ===
Line 55: Line 57:
 
! style="text-align: center;" style="background:#efefef;"| 10000 Sequences
 
! style="text-align: center;" style="background:#efefef;"| 10000 Sequences
 
|-
 
|-
| Run 1 || style="text-align: center;" | na || style="text-align: center;" | na || style="text-align: center;" | na || style="text-align: center;" | na || style="text-align: center;" | na || style="text-align: center;" | na || style="text-align: center;" | na
+
| Run 1 || style="text-align: center;" | 2.3s || style="text-align: center;" | 13.0s || style="text-align: center;" | 1m8.7s || style="text-align: center;" | 2m26.7s || style="text-align: center;" | 7m42.4s || style="text-align: center;" | 13m11.8s || style="text-align: center;" | 25m15.6s
 
|-
 
|-
| Run 2 || style="text-align: center;" | na || style="text-align: center;" | na || style="text-align: center;" | na || style="text-align: center;" | na || style="text-align: center;" | na || style="text-align: center;" | na || style="text-align: center;" | na
+
| Run 2 || style="text-align: center;" | 1.5s || style="text-align: center;" | 13.1s || style="text-align: center;" | 1m13.2s || style="text-align: center;" | 2m22.3s || style="text-align: center;" | 7m37.0s || style="text-align: center;" | 13m1.5s || style="text-align: center;" | 25m43.1s
 
|-
 
|-
| Run 3 || style="text-align: center;" | na || style="text-align: center;" | na || style="text-align: center;" | na || style="text-align: center;" | na || style="text-align: center;" | na || style="text-align: center;" | na || style="text-align: center;" | na
+
| Run 3 || style="text-align: center;" | 1.5s || style="text-align: center;" | 13.5s || style="text-align: center;" | 1m13.1s || style="text-align: center;" | 2m27.3s || style="text-align: center;" | 7m34.5s || style="text-align: center;" | 12m53.5s || style="text-align: center;" | 24m57.6s
 
|}
 
|}
   
Line 65: Line 67:
   
 
== Availability/ Download ==
 
== Availability/ Download ==
* pEffect's web server is available at https://rostlab.org/services/pEffect/
+
* pEffect's web server is available at http://bromberglab.org/services/pEffect/
* Standalone version of pEffect can be downloaded as a Debian package [https://rostlab.org/owiki/index.php/Packages here]
+
* The standalone version of pEffect can be downloaded as a [https://rostlab.org/services/peffect/downloads zip] or [ftp://rostlab.org/peffect/ tar.gz] file.
  +
* The Debian package can be downloaded from [https://rostlab.org/services/peffect/db/peffect_1.0.0_amd64.deb here]
   
 
== Data ==
 
== Data ==
 
* Data sets used for development and evaluation of pEffect can be accessed [https://rostlab.org/services/peffect/downloads here].
 
* Data sets used for development and evaluation of pEffect can be accessed [https://rostlab.org/services/peffect/downloads here].
* Predictions of whole proteomes in Gram-negative and Gram-positive Bacteria, as well as in Archaea can be downloaded from [https://rostlab.org/services/peffect/proteomes here].
+
* Predictions of whole proteomes in Gram-negative and Gram-positive Bacteria, as well as in Archaea can be downloaded from [https://rostlab.org/services/peffect/proteomes here] or [https://rostlab.org/services/peffect/downloads here].
   
 
== Reference ==
 
== Reference ==
  +
[https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/27713481/ Goldberg T, Rost B, Bromberg Y. Computational prediction shines light on type III secretion origins. Sci Rep. 2016 Oct 7;6:34516.]
   
  +
doi: 10.1038/srep34516; PubMed PMID: 27713481; PubMed Central PMCID: PMC5054392.
   
 
== Contact ==
 
== Contact ==

Latest revision as of 21:14, 6 May 2017

Web server

http://bromberglab.org/services/pEffect/

Introduction

The type III secretion system is one of the causes of a wide range of bacterial infections in humans, animals and plants. This system comprises a hollow needle-like structure on the surface of bacterial cells that injects specific bacterial proteins, the so-called effectors, directly into the cytoplasm of a host cell. During infection, effectors convert host resources to their advantage and promote pathogenicity.

We (Tatyana Goldberg, Burkhard Rost and Yana Bromberg) at BrombergLab and RostLab developed a novel method, pEffect, that predicts bacterial type III effector proteins. Our method combines sequence-based homology searches and advanced machine learning to accurately predict effector proteins, using information encoded in the entire protein sequence.

Method design

pEffect combines sequence similarity-based inference (PSI-BLAST) with de-novo prediction using machine learning (Support Vector Machines; SVM). For a query protein, it first runs PSI-BLAST to identify a homolog in the set of known and annotated effector proteins. If such a homolog is found, its annotation (i.e. type III effector) is transferred to the query protein. If no homolog is found, pEffect triggers an SVM that predicts effector proteins by searching for stretches of k consecutive residues that are known from annotated proteins.
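The two-stage decision described above can be sketched as follows. This is a minimal illustration and not pEffect's actual code: the `Prediction` record, the `predict` and `kmers` functions, and the toy lookup and classifier standing in for PSI-BLAST and the SVM are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Prediction:
    is_effector: bool
    annotation_type: str  # "PSI-BLAST" or "SVM"


def kmers(sequence: str, k: int) -> set:
    """Return all stretches of k consecutive residues in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def predict(sequence: str,
            find_homolog: Callable[[str], Optional[str]],
            classify: Callable[[str], bool]) -> Prediction:
    """Homology search first; fall back to the classifier if there is no hit."""
    homolog = find_homolog(sequence)
    if homolog is not None:
        # A homolog among known effectors: transfer its annotation.
        return Prediction(True, "PSI-BLAST")
    # No homolog: use the de-novo prediction instead.
    return Prediction(classify(sequence), "SVM")


# Toy stand-ins (assumptions for illustration only): an exact-match lookup
# instead of PSI-BLAST, and a k-mer overlap test instead of a trained SVM.
KNOWN_EFFECTORS = {"MKTAYIAK": "effector_A"}
EFFECTOR_KMERS = {"TAY", "QQL"}


def toy_classifier(sequence: str) -> bool:
    return bool(kmers(sequence, 3) & EFFECTOR_KMERS)


hit = predict("MKTAYIAK", KNOWN_EFFECTORS.get, toy_classifier)
novel = predict("GGQQLGG", KNOWN_EFFECTORS.get, toy_classifier)
```

Passing the homology search and the classifier in as functions keeps the fallback logic separate from the tools that implement each stage.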

Input

The input to the server is:

1. one or more FASTA-formatted protein sequences. The sequences must be in one-letter amino acid code (not case-sensitive). The allowed amino acids are: ACDEFGHIKLMNPQRSTVWY and X (unknown). Example.

a. e-mail address (optional): a notification that the prediction has completed, together with the access link, is sent to the provided address.
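Before submitting, sequences can be checked against the allowed alphabet. The sketch below is an assumption-laden example, not part of pEffect: `parse_fasta` and `invalid_residues` are hypothetical helper names, and the parser handles only simple FASTA text with unique `>` header lines.

```python
# Allowed one-letter codes as listed above, plus X for unknown residues.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX")


def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {header: sequence}, upper-casing the residues."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[header].append(line)
    return {h: "".join(parts).upper() for h, parts in records.items()}


def invalid_residues(sequence: str) -> list:
    """Return the sorted residues that are not in the allowed alphabet."""
    return sorted(set(sequence.upper()) - ALLOWED)


example = ">p1\nacde\nFGH\n>p2\nMKB\n"
sequences = parse_fasta(example)
problems = {h: invalid_residues(s) for h, s in sequences.items()}
```

Because the input is not case-sensitive, the sketch upper-cases residues before comparing them with the alphabet.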

Fig 1: Reliable predictions are more accurate. The figure shows the cumulative percent of accuracy/coverage of pEffect predictions at or above a given reliability index (RI). The graphs were obtained using the homology-reduced sets of 115 type III effector and 3,460 non-effector proteins in five-fold cross-validation. At the default reliability score of RI=50 (black vertical line), 95% of type III effectors are identified at 87% accuracy (black arrow). At a higher reliability score of RI=80 (gray vertical line), prediction accuracy increases to 97% at the cost of lower coverage of 78% (gray arrow).

Output

For every query protein, the result contains four basic values:

1. the protein identifier as provided by the user

2. the reliability score of a prediction on a 0-100 scale with 100 being the most confident prediction

3. prediction of a protein to be a type III effector

4. annotation type of the prediction (PSI-BLAST or SVM)

For PSI-BLAST predictions, clicking on the annotation type (i.e. PSI-BLAST) on the web site shows information about the closest homolog and its PSI-BLAST alignment to the query. Example.
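The four values listed above map naturally onto a small record type. The following is a sketch under assumptions: the class name `PEffectResult`, its field names, and the example rows are hypothetical and do not reflect the server's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PEffectResult:
    protein_id: str       # 1. identifier as provided by the user
    reliability: int      # 2. reliability score, 0-100
    is_effector: bool     # 3. predicted to be a type III effector?
    annotation_type: str  # 4. "PSI-BLAST" or "SVM"


def confident_effectors(results, min_ri=50):
    """Keep predicted effectors whose reliability score exceeds min_ri."""
    return [r for r in results if r.is_effector and r.reliability > min_ri]


# Made-up example rows, for illustration only.
results = [
    PEffectResult("protA", 92, True, "PSI-BLAST"),
    PEffectResult("protB", 45, True, "SVM"),
    PEffectResult("protC", 88, False, "SVM"),
]
selected = [r.protein_id for r in confident_effectors(results)]
```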

Prediction reliability

Every prediction result is supported by a Reliability Index (RI) measuring the strength of a prediction. The RI is a value between 0 and 100, with 100 denoting the most confident predictions.

We rigorously evaluated the reliability of pEffect predictions on a non-redundant test set of proteins (Fig. 1). At the default threshold of RI>50, over 87% of all predictions of type III effectors are correct and 95% of all effectors in our set are identified (Fig. 1; black arrow). At a higher threshold of RI>80, effector predictions are correct 96% of the time, but only 78% of all effectors in the set are identified (Fig. 1; gray arrow).
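The trade-off between the two quantities can be reproduced on any labelled set. The sketch below assumes that accuracy is the fraction of effector predictions above the threshold that are correct, and that coverage is the fraction of all true effectors that those predictions identify. The function name and the toy data are hypothetical and are not the evaluation set used for pEffect.

```python
def accuracy_and_coverage(predictions, threshold):
    """predictions: iterable of (ri, predicted_effector, is_true_effector)."""
    predictions = list(predictions)
    # Effector predictions that pass the reliability threshold.
    called = [truth for ri, predicted, truth in predictions
              if predicted and ri > threshold]
    true_positives = sum(1 for truth in called if truth)
    total_true = sum(1 for _, _, truth in predictions if truth)
    accuracy = true_positives / len(called) if called else 0.0
    coverage = true_positives / total_true if total_true else 0.0
    return accuracy, coverage


# Toy labelled predictions: (RI, predicted effector?, truly an effector?)
toy = [
    (90, True, True),
    (85, True, True),
    (60, True, False),
    (55, True, True),
    (40, True, True),
    (30, False, False),
]
at_50 = accuracy_and_coverage(toy, 50)
at_80 = accuracy_and_coverage(toy, 80)
```

On this toy set, raising the threshold from 50 to 80 removes the one false prediction but also drops a true effector, which mirrors the accuracy-for-coverage trade described above.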

Runtime analysis

pEffect first runs a homology-based PSI-BLAST search; if no hit is identified, a de-novo SVM prediction is used.

While PSI-BLAST searches are fast, SVM's runtime depends on the number of query protein sequences. We measured pEffect's runtime on a Dell M605 machine with a Six-Core AMD Opteron processor (2.4 GHz, 6MB and 75W ACP) running on Linux.

Run | 1 Sequence | 100 Sequences | 500 Sequences | 1000 Sequences | 3000 Sequences | 5000 Sequences | 10000 Sequences
Run 1 | 2.3s | 13.0s | 1m8.7s | 2m26.7s | 7m42.4s | 13m11.8s | 25m15.6s
Run 2 | 1.5s | 13.1s | 1m13.2s | 2m22.3s | 7m37.0s | 13m1.5s | 25m43.1s
Run 3 | 1.5s | 13.5s | 1m13.1s | 2m27.3s | 7m34.5s | 12m53.5s | 24m57.6s

Note: to speed up the server's response, we store all PSI-BLAST profile files in the PredictProtein cache (current size: results for more than 11 million sequences). These can be retrieved from the cache very quickly. For novel protein sequences without PSI-BLAST profiles in the cache, the runtime increases substantially.
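The timings reported above can be turned into a per-sequence figure. This is a small sketch: `to_seconds` is a hypothetical helper that parses the `XmY.Zs` notation used for the measurements, and the three values are the reported 10000-sequence runs.

```python
import re
from statistics import mean


def to_seconds(text: str) -> float:
    """Convert a timing such as '25m15.6s' or '2.3s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised timing: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# The three runs reported for 10000 sequences.
RUNS_10000 = ["25m15.6s", "25m43.1s", "24m57.6s"]

mean_seconds = mean(to_seconds(t) for t in RUNS_10000)
seconds_per_sequence = mean_seconds / 10000
```

For these runs the mean works out to roughly 0.15 s per sequence, with the cached PSI-BLAST profiles mentioned in the note.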

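Availability/ Download

  • pEffect's web server is available at http://bromberglab.org/services/pEffect/
  • The standalone version of pEffect can be downloaded as a zip (https://rostlab.org/services/peffect/downloads) or tar.gz (ftp://rostlab.org/peffect/) file.
  • The Debian package can be downloaded from https://rostlab.org/services/peffect/db/peffect_1.0.0_amd64.deb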

Data

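  • Data sets used for development and evaluation of pEffect can be accessed at https://rostlab.org/services/peffect/downloads.
  • Predictions of whole proteomes in Gram-negative and Gram-positive Bacteria, as well as in Archaea, can be downloaded from https://rostlab.org/services/peffect/proteomes or https://rostlab.org/services/peffect/downloads.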

Reference

Goldberg T, Rost B, Bromberg Y. Computational prediction shines light on type III secretion origins. Sci Rep. 2016 Oct 7;6:34516.

doi: 10.1038/srep34516; PubMed PMID: 27713481; PubMed Central PMCID: PMC5054392.

Contact

For questions, please contact localization@rostlab.org