| Title: | Inferring sub-cellular localization through automated lexical analysis |
| Author: | Rajesh Nair & Burkhard Rost |
| Quote: | Bioinformatics, 2002, 11, 2836-2847 (ISMB'2002 Proceedings). |
Motivation: The SWISS-PROT sequence database contains keywords of functional annotations for many proteins. In contrast, information about sub-cellular localization is available for only a few proteins. Experts can often infer localization from keywords describing protein function. We developed LOCkey, a fully automated method that assigns sub-cellular localization through lexical analysis of SWISS-PROT keywords. With the rapid growth in sequence data, the biochemical characterisation of sequences has been falling behind. Our method may be a useful tool for supplementing the functional information that is already automatically available.
Results: The method reached more than 82% accuracy in a full cross-validation test. Because many entries lack functional annotations, we could infer localization for fewer than half of all proteins in SWISS-PROT. We applied LOCkey to annotate five entirely sequenced proteomes, namely Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), Drosophila melanogaster (fly), Arabidopsis thaliana (plant) and a subset of all human proteins. LOCkey found about 8000 new annotations of sub-cellular localization for these eukaryotes.
Availability: Annotations of localization for eukaryotes are available at: http://cubic.bioc.columbia.edu/services/LOCkey.
Contact: rost@columbia.edu
Key words: genome sequence analysis, predicting sub-cellular localization, protein function, lexical analysis.
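Illustration: The general idea of inferring localization from functional keywords can be sketched in a few lines of Python. This is only a toy majority-vote scheme under stated assumptions, not the LOCkey algorithm, which the abstract does not specify. The training examples, the localization labels, and the function names build_keyword_votes and infer_localization are all hypothetical and invented for illustration.

```python
from collections import Counter

# Hypothetical training data, invented for illustration only:
# each entry pairs a set of keywords with a known localization.
TOY_TRAINING = [
    ({"Transcription regulation", "DNA-binding"}, "nuclear"),
    ({"DNA-binding", "Zinc-finger"}, "nuclear"),
    ({"Glycolysis"}, "cytoplasmic"),
    ({"Respiratory chain", "Transit peptide"}, "mitochondrial"),
]


def build_keyword_votes(training):
    """Count how often each keyword co-occurs with each localization."""
    votes = {}
    for keywords, localization in training:
        for keyword in keywords:
            votes.setdefault(keyword, Counter())[localization] += 1
    return votes


def infer_localization(keywords, votes):
    """Return the localization with the most keyword votes, or None.

    None models the case described in the Results: a protein without
    informative keywords cannot be assigned a localization.
    """
    tally = Counter()
    for keyword in keywords:
        tally.update(votes.get(keyword, {}))
    if not tally:
        return None
    return tally.most_common(1)[0][0]


if __name__ == "__main__":
    votes = build_keyword_votes(TOY_TRAINING)
    print(infer_localization({"DNA-binding", "Zinc-finger"}, votes))
    print(infer_localization({"Hypothetical protein"}, votes))
```

A protein whose keywords never occur in the training data receives no assignment, which mirrors why coverage is limited by the availability of functional annotations.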