Difference between revisions of "LocTree2 - Protein sub-cellular localization prediction for all domains of life"

From Rost Lab Open
Revision as of 21:43, 9 June 2012

Introduction

Subcellular localization is one easily definable aspect of protein function. Computational prediction of localization continues to provide invaluable help, especially in whole-genome analyses and annotations. Several methods have been developed to predict localization, yet many challenges remain.

We at RostLab (http://rostlab.org/cms/) developed a novel method, LocTree2, that predicts localization for all proteins in all domains of life. As in our previous method, LocTree (https://www.rostlab.org/owiki/index.php/Loctree), we incorporate a system of hierarchically organized Support Vector Machines to mimic the protein trafficking mechanism in cells. Please note that, other than the hierarchy and the name, LocTree and LocTree2 have nothing in common.

Amongst the novel aspects of LocTree2 are:

  • 18 predicted localization classes for Eukaryota
  • 6 classes for Bacteria and 3 classes for Archaea
  • use of no information other than evolutionary profiles
  • very accurate distinction between membrane and water-soluble globular proteins
  • high robustness against sequencing errors
  • top performance even for protein fragments

Method design

LocTree2 combines three different systems of classification trees to predict 3 localization classes in Archaea, 6 classes in Bacteria and 18 classes in Eukaryota (Figure 1).

Fig 1: Hierarchical architecture of LocTree2. Prediction of protein localization follows a different tree for each of the three domains of life: (a) Archaea, (b) Bacteria and (c) Eukaryota. Abbreviations: CHL, chloroplast; CHLM, chloroplast membrane; CYT, cytosol; ER, endoplasmic reticulum; ERM, endoplasmic reticulum membrane; EXT, extra-cellular; FIM, fimbrium; GOL, Golgi apparatus; GOLM, Golgi apparatus membrane; MIT, mitochondria; MITM, mitochondria membrane; NUC, nucleus; NUCM, nucleus membrane; OM, outer membrane; PERI, periplasmic space; PER, peroxisome; PERM, peroxisome membrane; PM, plasma membrane; PLAS, plastid; VAC, vacuole; VACM, vacuole membrane.


Input

Profile Kernel

Prediction algorithm

Each hierarchy mimics the biological sorting mechanism in its domain (in eukaryotes, membrane and non-membrane proteins are treated separately). The branches represent the paths of protein sorting, the leaves represent the final prediction of one localization class, and the internal nodes are the decision points along a path. These decisions are implemented as binary Support Vector Machines (SVMs).
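The traversal of such a hierarchy can be illustrated with a minimal Python sketch. This is not the LocTree2 implementation: the toy tree, the node names, the feature names ("signal", "hydrophobicity") and the threshold-based decision functions are all assumptions made for illustration, standing in for trained binary SVMs that operate on evolutionary profiles. Only the leaf labels (EXT, PM, CYT) are taken from the abbreviations in Figure 1, and the toy tree does not reproduce any of the trees shown there.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Features = Dict[str, float]


@dataclass
class Leaf:
    """A leaf holds the final localization class, e.g. "CYT"."""
    label: str


@dataclass
class Node:
    """An internal decision point with one binary classifier.

    `decide` returns a signed score, in the way a binary SVM returns a
    decision value: score >= 0 follows the positive branch, otherwise
    the negative branch.
    """
    name: str
    decide: Callable[[Features], float]
    negative: Union["Node", Leaf]
    positive: Union["Node", Leaf]


def predict(tree: Union[Node, Leaf],
            features: Features) -> Tuple[str, List[Tuple[str, float]]]:
    """Walk from the root to a leaf; return the label and the path taken."""
    path: List[Tuple[str, float]] = []
    current = tree
    while isinstance(current, Node):
        score = current.decide(features)
        path.append((current.name, score))
        current = current.positive if score >= 0 else current.negative
    return current.label, path


# Hypothetical three-class toy tree (NOT one of the trees in Figure 1).
# The lambdas are stand-ins for trained SVM decision functions.
toy_tree = Node(
    name="secreted?",
    decide=lambda f: f["signal"] - 0.5,
    negative=Node(
        name="membrane?",
        decide=lambda f: f["hydrophobicity"] - 0.5,
        negative=Leaf("CYT"),
        positive=Leaf("PM"),
    ),
    positive=Leaf("EXT"),
)

label, path = predict(toy_tree, {"signal": 0.1, "hydrophobicity": 0.9})
print(label)  # PM
```

The returned path records the score at every decision point, so each prediction can be traced back through the hierarchy. How LocTree2 derives its reliability index from the SVM outputs is not described on this page, so the sketch does not attempt to model it.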

Classification trees of SVMs

Reliability index

Availability / Download

  • The program can be accessed online via the PredictProtein service (http://www.predictprotein.org/)
  • A standalone version can be downloaded as a zip file here

Data

Data sets used for development and evaluation of LocTree2 can be accessed here.

Contact

For questions, please contact localization@rostlab.org