Loctree3



See the tool's GitHub repository for the most up-to-date information.


Web server

https://rostlab.org/services/loctree3/

LocTree3 is an enhanced version of LocTree2

Summary

Prediction of protein subcellular localization is an important step towards elucidating protein function. LocTree2 is our previously published de novo prediction method that classifies eukaryotic proteins into 18 localization classes, bacterial proteins into 6 and archaeal proteins into 3. Since its development, LocTree2 has performed on a par with or better than any other state-of-the-art method. LocTree3 is a publicly available web server for LocTree2 and its extended version. This extension allows for homology inference if close homologs are available. Evaluated on a redundancy-reduced set of 1682 eukaryotic proteins, the new method LocTree3 outperformed its predecessor, reaching an overall accuracy of Q18 = 80 ± 4%. On a set of 479 sequence-unique bacterial proteins, the overall prediction accuracy of LocTree3 was Q6 = 89 ± 4%.

The new web server provides:

  • a fast resource for localization prediction in all domains of life
  • free access for all users without login requirement
  • informative visualization of our predictions
  • prediction confidence for each result
  • an additional homology inference step
  • alignments and crosslinks for close homologs
  • localization predictions for over 1000 completely sequenced organisms

Method design

LocTree3 is an extension of LocTree2, a hierarchical system of Support Vector Machines (SVMs) inspired by the sorting machinery of the cell. The SVM predictions are made by searching for stretches of k consecutive residues in proteins with experimental localization annotations. The improved version, LocTree3, adds a module that infers localization from experimentally annotated sequence homologs using PSI-BLAST. In the absence of significant PSI-BLAST hits, LocTree2 is used.
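The two-step design described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the server's implementation: the `Hit` record, the `find_best_hit` and `de_novo_predict` callables, and the E-value cutoff are all hypothetical placeholders, and the actual significance criterion used by LocTree3 is not stated on this page.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Hit:
    """Hypothetical PSI-BLAST hit to an experimentally annotated protein."""
    localization: str
    e_value: float


def predict_localization(
    sequence: str,
    find_best_hit: Callable[[str], Optional[Hit]],
    de_novo_predict: Callable[[str], str],
    max_e_value: float = 1e-3,  # illustrative cutoff, an assumption
) -> Tuple[str, str]:
    """Return (localization, source): homology first, de novo as fallback."""
    hit = find_best_hit(sequence)
    if hit is not None and hit.e_value <= max_e_value:
        # A significant homolog exists: transfer its annotation.
        return hit.localization, "homology"
    # No significant hit: fall back to the de novo (LocTree2-style) predictor.
    return de_novo_predict(sequence), "de novo"
```

In a real setting, `find_best_hit` would wrap a PSI-BLAST search against annotated proteins and `de_novo_predict` would wrap the SVM hierarchy; here they are passed in as functions so the fallback logic can be read and tested on its own.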

Input

The input to the server is:

1. one or more FASTA-formatted protein sequences. The sequences must be in one-letter amino acid code (not case-sensitive). The allowed characters are the amino acids ACDEFGHIKLMNPQRSTVWY and X (unknown). Example.

2. the domain of life: because LocTree3 predicts different localization classes for proteins from different domains of life, the domain must be chosen correctly (default: Eukarya)

3. e-mail address (optional): a notification that the prediction is complete, together with the access link to the results, is sent to this address
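Before submitting, it can be useful to check sequences against the input rules listed above. The following is a minimal sketch, assuming plain FASTA text; the function names `parse_fasta` and `invalid_residues` are made up for this example and are not part of any LocTree3 interface.

```python
# Allowed one-letter codes as listed above: 20 amino acids plus X (unknown).
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX")


def parse_fasta(text):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def invalid_residues(sequence):
    """Return the characters that are not allowed, ignoring case."""
    return set(sequence.upper()) - ALLOWED


if __name__ == "__main__":
    example = ">protein_1\nMKTAYIAKQR\n>protein_2\nMKB*LV\n"
    for identifier, sequence in parse_fasta(example):
        bad = invalid_residues(sequence)
        status = "ok" if not bad else "invalid: " + "".join(sorted(bad))
        print(identifier, status)
```

The check is case-insensitive because the server accepts both upper- and lower-case letters.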

Output

For every query protein, the result contains four basic values:

1. the protein identifier as provided by the user

2. the reliability score of a prediction on a 0-100 scale with 100 being the most confident prediction

3. a single predicted localization class

4. GO term(s) and GO identifier(s) matching the predicted class.

Every result indicates whether it comes from a PSI-BLAST homology search or from a LocTree2 de novo prediction. In the former case, clicking on the prediction result shows the experimental evidence (i.e. the SWISS-PROT annotation) of the best hit and its PSI-BLAST alignment to the query protein. In the latter case, clicking on the result leads to a visual representation of the prediction: the decision tree with the values at each decision point that lead to the final reliability score. In addition, every result is accompanied by a schematic representation of the cell that highlights the predicted localization.

Example.

Figure 1
Fig 1: More reliable predictions are better. The curves show the percentage Accuracy vs. Coverage for LocTree3 predictions above a given RI threshold (from 0=unreliable to 100=most reliable). The curves were obtained on cross-validated test sets of bacterial (gray line) and eukaryotic (black line) proteins. Half of all eukaryotic proteins are predicted at RI>65; for these, Q18 is above 95% (black arrow). Half of all bacterial proteins are predicted at RI>80; for these, Q6 is above 95% (gray arrow).

Prediction reliability

Every prediction result is supported by a Reliability Index (RI) measuring the strength of a prediction. The RI is a value between 0 and 100, with 100 denoting the most confident predictions.

We rigorously evaluated the reliability of LocTree3 predictions on a non-redundant test set of proteins (Fig. 1). The 50% of proteins predicted with the highest reliability had RI>80 for bacteria, at an overall accuracy of Q6=95% (Fig. 1; gray arrow), and RI>65 for eukaryotes, at Q18=95% (Fig. 1; black arrow).

  • Q6 is six-state accuracy for predicting localization to six classes
  • Q18 is eighteen-state accuracy
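To make the accuracy and coverage terms concrete, here is a small sketch of how both could be computed for a given RI threshold. It assumes that Qn is the percentage of proteins whose predicted class equals the observed class, and that coverage is the percentage of proteins with an RI above the threshold. The function name and the toy data are invented for illustration and are not LocTree3 results.

```python
def accuracy_and_coverage(predictions, ri_threshold):
    """Compute (accuracy, coverage) in percent for predictions with RI > threshold.

    predictions: iterable of (predicted_class, observed_class, ri) tuples.
    Accuracy is None when no prediction passes the threshold.
    """
    predictions = list(predictions)
    if not predictions:
        return None, 0.0
    selected = [p for p in predictions if p[2] > ri_threshold]
    coverage = 100.0 * len(selected) / len(predictions)
    if not selected:
        return None, coverage
    correct = sum(1 for predicted, observed, _ in selected if predicted == observed)
    return 100.0 * correct / len(selected), coverage


if __name__ == "__main__":
    # Toy data, made up for this example: (predicted, observed, RI).
    toy = [
        ("nucleus", "nucleus", 90),
        ("cytoplasm", "nucleus", 40),
        ("membrane", "membrane", 70),
        ("nucleus", "cytoplasm", 20),
    ]
    for threshold in (0, 65):
        print(threshold, accuracy_and_coverage(toy, threshold))
```

Raising the threshold keeps fewer proteins (lower coverage) but, if the RI is informative, a larger share of them is correct (higher accuracy), which is the trade-off plotted in Fig. 1.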

Runtime analysis

LocTree3 first runs a homology-based PSI-BLAST search; if no hit is identified, a de novo LocTree2 prediction is used for localization annotation.

While PSI-BLAST searches are fast, LocTree2 runtime depends on the domain of life and on the number of query protein sequences. We measured LocTree2 runtime on a Dell M605 machine with a Six-Core AMD Opteron processor (2.4 GHz, 6MB and 75W ACP) running Linux.

Domain     1 sequence  100 sequences  500 sequences  1000 sequences  3000 sequences  5000 sequences  10000 sequences
Archaea    0.8s        3.0s           10.4s          18.8s           51m2s           1m36s           3m43s
Bacteria   3.6s        1m09s          5m25s          9m12s           27m01s          1h4m            2h10m
Eukaryota  1m37s       8m43s          44m            1h13m           4h17m           7h47m           15h6m
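As a rough reading aid, the runtimes for 10000 sequences in the table above can be converted into an average time per sequence. The helper `to_seconds` is written for this example only; the figures are simple divisions of the measured totals and say nothing about the runtime of an individual protein.

```python
import re


def to_seconds(duration):
    """Convert strings such as '15h6m', '3m43s' or '0.8s' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"(\d+(?:\.\d+)?)([hms])", duration)
    return sum(float(value) * units[unit] for value, unit in parts)


# Runtimes for 10000 sequences, taken from the table above.
runtime_10000 = {"Archaea": "3m43s", "Bacteria": "2h10m", "Eukaryota": "15h6m"}

# Average seconds per sequence for each domain of life.
per_sequence = {
    domain: to_seconds(total) / 10000 for domain, total in runtime_10000.items()
}

if __name__ == "__main__":
    for domain, seconds in per_sequence.items():
        print(f"{domain}: {seconds:.3f} s per sequence")
```

On these numbers, a eukaryotic query takes on average several seconds, a bacterial query under one second, and an archaeal query a few hundredths of a second.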

Note: to speed up the server's response, we store all PSI-BLAST profile files (required for LocTree2) in the PredictProtein cache (current size: results for more than 11 million sequences), from which they can be retrieved very quickly. For novel protein sequences without a PSI-BLAST profile in the cache, the runtime increases substantially.

Availability / Download

Data

Data sets used for development and evaluation of LocTree3 can be accessed here.

Reference

LocTree3 prediction of localization

Goldberg T, Hecht M, Hamp T, Karl T, Yachdav G, Ahmed N, Altermann U, Angerer P, Ansorge S, Balasz K, Bernhofer M, Betz A, Cizmadija L, Do KT, Gerke J, Greil R, Joerdens V, Hastreiter M, Hembach K, Herzog M, Kalemanov M, Kluge M, Meier A, Nasir H, Neumaier U, Prade V, Reeb J, Sorokoumov A, Troshani I, Vorberg S, Waldraff S, Zierer J, Nielsen H, Rost B.

Nucleic Acids Research. 2014 Jul;42(Web Server issue):W350-5 (Full Text, PDF)

Supporting Online Material


LocTree2 predicts localization for all domains of life

Goldberg T, Hamp T, Rost B.

Bioinformatics 2012 28: i458-i465 (Full Text, PDF)

Supporting Online Material

Contact

For questions, please contact localization@rostlab.org
