Title | UniqueProt: Creating representative protein sequence sets. |
Publication Type | Journal Article |
Year of Publication | 2003 |
Authors | Mika, S, Rost, B |
Journal | Nucleic Acids Res |
Volume | 31 |
Issue | 13 |
Pagination | 3789-91 |
Date Published | 2003 Jul 1 |
ISSN | 1362-4962 |
Keywords | Algorithms; Internet; Protein Structure, Tertiary; Proteins; Sequence Alignment; Sequence Analysis, Protein; Software; User-Computer Interface |
Abstract | UniqueProt is a practical and easy-to-use web service designed to create representative, unbiased data sets of protein sequences. The largest possible representative sets are found through a simple greedy algorithm that uses the HSSP-value to establish sequence similarity. UniqueProt is not a real clustering program, in the sense that the 'representatives' are not at the centres of well-defined clusters, since the definition of such clusters is problem-specific. Overall, UniqueProt is a reasonably fast solution to bias in data sets. The service is accessible at http://cubic.bioc.columbia.edu/services/uniqueprot; a command-line version for Linux is downloadable from this web site. |
Alternate Journal | Nucleic Acids Res. |
PubMed ID | 12824419 |
PubMed Central ID | PMC169026 |
Grant List | 1-R01-LM07329-01 / LM / NLM NIH HHS / United States; R01-GM63029-01 / GM / NIGMS NIH HHS / United States |
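
The abstract mentions a simple greedy algorithm that builds a representative set from a pairwise similarity criterion. The sketch below illustrates one common greedy variant of that idea and is not taken from UniqueProt itself: a sequence is kept only if it is not similar to any sequence already kept. The function names `greedy_representatives` and `toy_identity`, the example sequences, and the 0.75 threshold are all hypothetical. In particular, `toy_identity` is an ungapped position-by-position comparison used as a stand-in, not the HSSP-value.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def greedy_representatives(
    items: Iterable[T], is_similar: Callable[[T, T], bool]
) -> List[T]:
    """Keep each item only if it is not similar to any item kept so far.

    This is a generic greedy filter, assumed here for illustration; the
    result depends on the order in which the items are visited.
    """
    kept: List[T] = []
    for item in items:
        if not any(is_similar(item, rep) for rep in kept):
            kept.append(item)
    return kept


def toy_identity(a: str, b: str) -> float:
    """Hypothetical similarity score (NOT the HSSP-value).

    Fraction of identical residues over the shorter sequence, compared
    position by position without any alignment.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / n


# Made-up example sequences; the 0.75 cutoff is an arbitrary assumption.
sequences = ["MKTAYIAK", "MKTAYIAR", "GGSLWPQE", "GGSLWPQD", "PLVNNHTC"]
representatives = greedy_representatives(
    sequences, lambda a, b: toy_identity(a, b) >= 0.75
)
print(representatives)
```

In this toy input, the second and fourth sequences each differ from their predecessor at a single position, so only one member of each near-duplicate pair survives. A real application would replace `toy_identity` with an alignment-based measure.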