Title | Inferring sub-cellular localization through automated lexical analysis. |
Publication Type | Journal Article |
Year of Publication | 2002 |
Authors | Nair, R, Rost, B |
Journal | Bioinformatics |
Volume | 18 Suppl 1 |
Pagination | S78-86 |
Date Published | 2002 |
ISSN | 1367-4803 |
Keywords | Abstracting and Indexing as Topic; Algorithms; Animals; Cellular Structures; Databases, Protein; Humans; Information Storage and Retrieval; Natural Language Processing; Pattern Recognition, Automated; Proteins; Sequence Analysis, Protein; Tissue Distribution; Vocabulary, Controlled |
Abstract | MOTIVATION: The SWISS-PROT sequence database contains keywords of functional annotations for many proteins. In contrast, information about the sub-cellular localization is available for only a few proteins. Experts can often infer localization from keywords describing protein function. We developed LOCkey, a fully automated method for lexical analysis of SWISS-PROT keywords that assigns sub-cellular localization. With the rapid growth in sequence data, the biochemical characterisation of sequences has been falling behind. Our method may be a useful tool for supplementing functional information already automatically available. RESULTS: The method reached a level of more than 82% accuracy in a full cross-validation test. Due to a lack of functional annotations, we could infer localization for fewer than half of all proteins in SWISS-PROT. We applied LOCkey to annotate five entirely sequenced proteomes, namely Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), Drosophila melanogaster (fly), Arabidopsis thaliana (plant) and a subset of all human proteins. LOCkey found about 8000 new annotations of sub-cellular localization for these eukaryotes. |
Alternate Journal | Bioinformatics |
PubMed ID | 12169534 |
Grant List | 1-P50-GM62413-01 / GM / NIGMS NIH HHS / United States; R01-GM63029-01 / GM / NIGMS NIH HHS / United States |