Columbia University
Department of Biochemistry and Molecular Biophysics/C2B2
1130 St. Nicholas Ave. Rm 805
New York, NY 10032, USA
Email: rost@columbia.edu
WWW: http://www.rostlab.org/
Tel: +1-212-851-4669
Fax: +1-212-305-7932
NOTE: This is not a full list of publications from all members of the CUBIC group.
1. B Rost (1999) Twilight zone of protein sequence alignments. Protein Engineering 12:85-94.
2. A Zemla, C Venclovas, K Fidelis and B Rost (1999) A modified definition of SOV, a segment-based measure for protein secondary structure prediction assessment. Proteins: Structure, Function, and Genetics 34:220-223.
3. F Pazos, B Rost and A Valencia (1999) A platform for integrating threading results with protein family analyses. Bioinformatics 15:1062-1063.
4. O Olmea, B Rost and A Valencia (1999) Effective use of sequence correlation and conservation in fold recognition. Journal of Molecular Biology 293:1221-1239.
5. D Fischer, et al. (1999) CAFASP-1: critical assessment of fully automated structure prediction methods. Proteins: Structure, Function, and Genetics Suppl 3:209-217.
6. M Cokol, R Nair and B Rost (2000) Finding nuclear localisation signals. EMBO Reports 1:411-415.
7. B Rost and C Sander (2000) Third generation prediction of secondary structure. Methods in Molecular Biology 143:71-95.
8. B Rost (2001) Protein secondary structure prediction continues to rise. Journal of Structural Biology 134:204-218.
9. V Eyrich, MA Marti-Renom, D Przybylski, A Fiser, F Pazos, A Valencia, A Sali and B Rost (2001) EVA: continuous automatic evaluation of protein structure prediction servers. Bioinformatics 17:1242-1243.
10. J Liu and B Rost (2001) Comparing function and structure between entire proteomes. Protein Science 10:1970-1979.
11. B Rost and V Eyrich (2001) EVA: large-scale analysis of secondary structure prediction. Proteins: Structure, Function, and Genetics 45 Suppl 5:S192-S199.
12. R Nair and B Rost (2001) Surface profiles predict sub-cellular localisation. Preprint: Columbia University.
13. D Fischer, A Elofsson, L Rychlewski, F Pazos, A Valencia, B Rost, AR Ortiz and RLJ Dunbrack (2001) CAFASP2: the second critical assessment of fully automated structure prediction methods. Proteins: Structure, Function, and Genetics 45 Suppl 5:S171-S183.
14. B Rost, P Baldi, G Barton, J Cuff, V Eyrich, D Jones, K Karplus, R King, M Ouali, G Pollastri and D Przybylski (2001) Simple jury predicts protein secondary structure best. Preprint: Columbia University.
15. B Rost and P Baldi (2001) New improvements in protein secondary structure prediction. Preprint: Columbia University.
16. D Przybylski and B Rost (2002) Alignments grow, secondary structure prediction improves. Proteins: Structure, Function, and Bioinformatics 46:195-205.
17. CAF Andersen, AG Palmer, S Brunak and B Rost (2002) Continuum secondary structure captures protein flexibility. Structure 10:175-184.
18. B Rost (2002) Enzyme function less conserved than anticipated. Journal of Molecular Biology 318:595-608.
19. J Liu and B Rost (2002) Target space for structural genomics revisited. Bioinformatics 18:922-933.
20. R Nair and B Rost (2002) Inferring sub-cellular localisation through automated lexical analysis. Bioinformatics 18:S78-S86.
21. J Liu, H Tan and B Rost (2002) Loopy proteins appear conserved in evolution. Journal of Molecular Biology 322:53-64.
22. CP Chen, A Kernytsky and B Rost (2002) Transmembrane helix predictions revisited. Protein Science 11:2774-2791.
23. R Nair and B Rost (2002) Sequence conserved for sub-cellular localization. Protein Science 11:2836-2847.
24. CP Chen and B Rost (2002) Long membrane helices and short loops predicted less accurately. Protein Science 11:2766-2773.
25. CP Chen and B Rost (2002) State-of-the-art in membrane prediction. Applied Bioinformatics 1:21-35.
26. B Rost (2002) Did evolution leap to create the protein universe? Current Opinion in Structural Biology 12:409-416.
27. G Pollastri, D Przybylski, B Rost and P Baldi (2002) Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins: Structure, Function, and Bioinformatics 47:228-235.
28. MA Marti-Renom, MS Madhusudhan, A Fiser, B Rost and A Sali (2002) Reliability of assessment of protein structure prediction methods. Structure 10:435-440.
29. A Sali, MA Marti-Renom, MS Madhusudhan, A Fiser and B Rost (2002) Reply to Moult et al. Structure 10:292-293.
30. Y Ofran and B Rost (2003) Analysing six types of protein-protein interfaces. Journal of Molecular Biology 325:377-387.
31. R Nair, P Carter and B Rost (2003) NLSdb: database of nuclear localization signals. Nucleic Acids Research 31:397-399.
32. P Carter, J Liu and B Rost (2003) PEP: Predictions for Entire Proteomes. Nucleic Acids Research 31:410-413.
33. B Rost (2003) Neural networks predict protein structure: hype or hit? In: P Frasconi and R Shamir (eds.). Artificial intelligence and heuristic methods in bioinformatics. Amsterdam: IOS Press:34-50.
34. Y Ofran and B Rost (2003) Predict protein-protein interaction sites from local sequence information. FEBS Letters 544:236-239.
35. B Rost and J Liu (2003) The PredictProtein server. Nucleic Acids Research 31:3300-3304.
36. J Liu and B Rost (2003) NORSp: predictions of long regions without regular secondary structure. Nucleic Acids Research 31:3833-3835.
37. R Nair and B Rost (2003) LOC3D: annotate sub-cellular localization for protein structures. Nucleic Acids Research 31:3337-3340.
38. S Mika and B Rost (2003) UniqueProt: creating representative protein sequence sets. Nucleic Acids Research 31:3789-3791.
39. A Kernytsky and B Rost (2003) Static benchmarking of membrane helix predictions. Nucleic Acids Research 31:3642-3644.
40. IYY Koh, VA Eyrich, MA Marti-Renom, D Przybylski, MS Madhusudhan, E Narayanan, O Grana, A Valencia, A Sali and B Rost (2003) EVA: evaluation of protein structure prediction servers. Nucleic Acids Research 31:3311-3315.
41. P Carter, CAF Andersen and B Rost (2003) DSSPcont: continuous secondary structure assignments for proteins. Nucleic Acids Research 31:3293-3295.
42. VA Eyrich and B Rost (2003) META-PP: single interface to crucial prediction servers. Nucleic Acids Research 31:3308-3310.
43. R Nair and B Rost (2003) Better prediction of sub-cellular localization by combining evolutionary and structural information. Proteins: Structure, Function, and Bioinformatics 53:917-930.
44. VA Eyrich, IYY Koh, D Przybylski, O Grana, F Pazos, A Valencia and B Rost (2003) CAFASP3 in the spotlight of EVA. Proteins: Structure, Function, and Bioinformatics 53 Suppl 6:548-560.
45. B Rost (2002) Rising accuracy of protein secondary structure prediction. In: D Chasman (ed.). Protein structure determination, analysis, and modeling for drug discovery. New York: Dekker:207-249.
46. CAF Andersen and B Rost (2003) Automatic secondary structure assignment. Methods of Biochemical Analysis 44:341-363.
47. B Rost (2003) Prediction in 1D: secondary structure, membrane helices, and accessibility. Methods of Biochemical Analysis 44:559-587.
48. J Liu and B Rost (2003) Domains, motifs, and clusters in the protein universe. Current Opinion in Chemical Biology 7:5-11.
49. B Rost, J Liu, D Przybylski, R Nair, H Bigelow, KO Wrzeszczynski and Y Ofran (2003) Prediction of protein structure through evolution. In: J Gasteiger and T Engel (eds.). Handbook of Chemoinformatics - from data to knowledge. Weinheim: Wiley-VCH:1789-1811.
50. KO Wrzeszczynski and B Rost (2003) Cataloguing proteins in cell cycle control. In: H Lieberman (ed.). Cell cycle checkpoint control protocols. Totowa, NJ: Humana Press:219-233.
51. B Rost, J Liu, R Nair, KO Wrzeszczynski and Y Ofran (2003) Automatic prediction of protein function. Cellular and Molecular Life Sciences, submitted Mar 25, 2003.
52. R Zidovetzki, B Rost, DL Armstrong and I Pecht (2003) Role of transmembrane domains in the functions of Fc receptors. Journal of Biophysical Chemistry 15:555-575.
53. JM Aramini, et al. (2003) Solution NMR structure of the 30S ribosomal protein S28E from Pyrococcus horikoshii. Protein Science 12:2823-2830.
54. J Liu and B Rost (2004) CHOP proteins into structural domains. Proteins: Structure, Function, and Bioinformatics 55:678-688.
55. H Bigelow, D Petrey, J Liu, D Przybylski and B Rost (2004) Prediction of transmembrane beta-barrels for entire proteomes. Nucleic Acids Research 32:2566-2577.
56. KO Wrzeszczynski and B Rost (2004) Annotating proteins from Endoplasmic reticulum and Golgi apparatus in eukaryotic proteomes. Cellular and Molecular Life Sciences 61:1341-1353.
57. B Rost, G Yachdav and J Liu (2004) The PredictProtein server. Nucleic Acids Research 32:W321-W326.
58. R Nair and B Rost (2004) LOCnet and LOCtarget: Sub-cellular localization for structural genomics targets. Nucleic Acids Research 32:W517-W521.
59. J Liu and B Rost (2004) CHOP: parsing proteins into structural domains. Nucleic Acids Research 32:W569-W571.
60. S Mika and B Rost (2004) NLProt: extracting protein names and sequences from papers. Nucleic Acids Research 32:W634-W637.
61. J Liu, H Hegyi, TB Acton, GT Montelione and B Rost (2004) Automatic target selection for structural genomics on eukaryotes. Proteins: Structure, Function, and Bioinformatics 56:188-200.
62. J Liu and B Rost (2004) Sequence-based prediction of protein domains. Nucleic Acids Research 32:3522-3530.
63. S Mika and B Rost (2004) Protein names peeled precisely off free text. Bioinformatics 20:i241-i247.
64. D Przybylski and B Rost (2004) Improving fold recognition without folds. Journal of Molecular Biology 341:255-269.
65. R Nair and B Rost (2004) Annotating protein function through lexical analysis. AI Magazine 25:45-56.
66. J Glasgow, I Jurisica and B Rost (2004) AI and Bioinformatics. AI Magazine 25:7-8.
67. Z Wunderlich, TB Acton, J Liu, G Kornhaber, J Everett, P Carter, N Lan, N Echols, M Gerstein, B Rost and GT Montelione (2004) The protein target list of the Northeast Structural Genomics Consortium. Proteins: Structure, Function, and Bioinformatics 56:181-187.
68. R Powers, TB Acton, Y Chiang, PK Rajan, JR Cort, MA Kennedy, J Liu, L Ma, B Rost and GT Montelione (2004) ¹H, ¹³C and ¹⁵N assignments for the Archaeglobus fulgidis protein AF2095. Journal of Biomolecular NMR 30:107-108.
69. S Mika and B Rost (2005) NMPdb: database of nuclear matrix proteins. Nucleic Acids Research 33:D160-D163.
70. R Nair and B Rost (2005) Mimicking cellular sorting improves prediction of subcellular localization. Journal of Molecular Biology 348:85-100.
71. M Punta and B Rost (2005) Protein folding rates estimated from contact predictions. Journal of Molecular Biology 348:507-512.
72. M Punta and B Rost (2005) PROFcon: novel prediction of long-range contacts. Bioinformatics 21:2960-2968.
73. A Schlessinger and B Rost (2005) Protein flexibility and rigidity predicted from sequence. Proteins: Structure, Function, and Bioinformatics, in press.
74. Y Ofran and B Rost (2005) Predictive methods using protein sequence. In: AD Baxevanis and BF Ouellette (eds.). Bioinformatics. New York: Wiley:197-222.
75. B Rost (2005) How to use protein 1D structure predicted by PROFphd. In: JE Walker (ed.). The Proteomics Protocols Handbook. Totowa, NJ: Humana:875-901.
76. Y Ofran, M Punta, R Schneider and B Rost (2005) Beyond annotation transfer by homology: novel protein function prediction methods that can assist drug discovery. Drug Discovery Today 10:1475-1482.
77. J Benach, WC Edstrom, I Lee, K Das, B Cooper, R Xiao, J Liu, B Rost, TB Acton, GT Montelione and JF Hunt (2005) The 2.35 Å structure of the TenA homolog from Pyrococcus furiosus supports an enzymatic function in thiamine metabolism. Acta Crystallographica Section D: Biological Crystallography 61:589-598.
78. O Grana, VA Eyrich, F Pazos, B Rost and A Valencia (2005) EVAcon: a protein contact prediction evaluation service. Nucleic Acids Research 33:W347-W351.
79. HV Jagadish, D States and B Rost (2005) ISMB 2005. Bioinformatics 21 Suppl 1:i1-i2.
80. The FANTOM Consortium, et al. (2005) The Transcriptional Landscape of the Mammalian Genome. Science 309:1559-1563.
81. R Powers, et al. (2005) Solution structure of Archaeglobus fulgidis peptidyl-tRNA hydrolase (Pth2) provides evidence for an extensive conserved family of Pth2 enzymes in archea, bacteria, and eukaryotes. Protein Science 14:2849-2861.
82. DA Snyder, et al. (2005) Comparisons of NMR spectral quality and success in crystallization demonstrate that NMR and X-ray crystallography are complementary methods for small protein structure determination. Journal of the American Chemical Society 127:16505-16511.
83. J Moult, K Fidelis, B Rost, T Hubbard and A Tramontano (2005) Critical assessment of methods of protein structure prediction (CASP)-Round 6. Proteins: Structure, Function, and Bioinformatics 61:3-7.
84. O Grana, D Baker, RM Maccallum, J Meiler, M Punta, B Rost, ML Tress and A Valencia (2005) CASP6 assessment of contact prediction. Proteins: Structure, Function, and Bioinformatics 61:214-224.
85. A Schlessinger, Y Ofran, G Yachdav and B Rost (2006) Epitome: Database of structure-inferred antigenic epitopes. Nucleic Acids Research 34:D777-D780.
86. J Liu, J Gough and B Rost (2006) Distinguishing protein-coding from non-coding RNA through support vector machines. PLoS Genetics 2:e29; DOI: 10.1371/journal.pgen.0020029.
87. A Schlessinger, G Yachdav and B Rost (2006) PROFbval: predict flexible and rigid residues in proteins. Bioinformatics 22:891-893.
88. S Mika and B Rost (2006) Protein-protein interactions more conserved within species than across species. PLoS Computational Biology 2:e79.
89. H Bigelow and B Rost (2006) PROFtmb: a web server for predicting bacterial transmembrane beta barrel proteins. Nucleic Acids Research 34:W186-W188.
90. Y Ofran, G Yachdav, E Mozes, T-t Soong, R Nair and B Rost (2006) Create and assess protein networks through molecular characteristics of individual proteins. Bioinformatics 22:e402-e407.
91. A Passerini, M Punta, A Ceroni, B Rost and P Frasconi (2006) Identifying cysteines and histidines in transition-metal-binding sites using support vector machines and neural networks. Proteins: Structure, Function, and Bioinformatics 65:305-316.
92. HM Berman, et al. (2006) Outcome of a workshop on archiving structural models of biological macromolecules. Structure 14:1211-1217.
93. Y Ofran and B Rost (2007) ISIS: Interaction Sites Identified from Sequence. Bioinformatics 23:e13-e16.
94. D Przybylski and B Rost (2007) Consensus sequences improve PSI-BLAST through mimicking profile-profile alignments. Nucleic Acids Research 35:2238-2246.
95. J Liu, GT Montelione and B Rost (2007) Novel leverage of structural genomics. Nature Biotechnology, in press.
96. Y Ofran and B Rost (2007) Protein-protein interaction hot spots carved into sequences. PLoS Computational Biology, in press.
97. Y Bromberg and B Rost (2007) SNAP: predict effect of non-synonymous polymorphisms on function. Nucleic Acids Research, in press.
98. Y Ofran, V Mysore and B Rost (2007) Prediction of DNA binding residues from sequence. Bioinformatics, in press.
99. A Schlessinger, J Liu and B Rost (2007) Natively unstructured loops differ from other loops. PLoS Computational Biology, in press.
100. M Punta, LR Forrest, H Bigelow, A Kernytsky, J Liu and B Rost (2007) Membrane protein prediction methods. Methods 41:460-474.
101. D Przybylski and B Rost (2007) Predicting simplified features of protein structure. In: T Lengauer (ed.). Bioinformatics – From Genomes to Therapies. Weinheim: Wiley-VCH, in press.
102. R Nair and B Rost (2007) Predicting protein subcellular localization using intelligent systems. In: D Leon and S Markel (eds.). In Silico Technology in Drug Target Identification and Validation. Marcel Dekker.
103. H Bigelow and B Rost (2007) Online tools for predicting integral membrane proteins. In: MJ Peirce and R Wait (eds.). Proteomic analysis of membrane proteins: methods and protocols.
104. R Nair and B Rost (2006) Predicting protein subcellular localization using intelligent systems. In: D Leon and S Markel (eds.). In silico technology in drug target identification and validation. Boca Raton, FL: CRC Press.
105. D Przybylski and B Rost (2006) In: T Lengauer (ed.). New York: Wiley-VCH.
106. Y Ofran and B Rost (2003) Rescue for statistical tests in high-throughput biology. Bioinformatics, submitted.
Twilight zone of protein sequence alignments
Quote: 1999 Protein Engineering 12, 85-94
Sequence alignments unambiguously distinguish between protein pairs of similar and non-similar structure when the pairwise sequence identity is high (>40% for long alignments). The signal gets blurred in the twilight zone of 20-35% sequence identity. Here, I analysed more than a million sequence alignments between protein pairs of known structures to re-define a line distinguishing between true and false positives for low levels of similarity. Four results stood out. (1) The transition from the safe zone of sequence alignment into the twilight zone is described by an explosion of false negatives. More than 95% of all pairs detected in the twilight zone had different structures. More precisely, above a cut-off roughly corresponding to 30% sequence identity, 90% of the pairs were homologous; below 25% less than 10% were. (2) Whether or not sequence homology implied structural identity depended crucially on the alignment length. For example, if ten residues were similar in an alignment of length 16 (> 60%), structural similarity could not be inferred. (3) The 'more similar than identical' rule (discarding all pairs for which percentage similarity was lower than percentage identity) reduced false positives significantly. (4) Similarly successful was sequence space hopping: pairs were predicted to be homologous when the respective sequence families had proteins in common. All findings are applicable to automatic database searches.
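The decision rules summarised in this abstract can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's code. The function names are hypothetical. The identity curve is assumed from the full paper rather than stated in the abstract: 480·L^(−0.32(1+e^(−L/1000))) percent, with 100% up to 11 residues and a 19.5% floor beyond 450. Check those constants before relying on them. Identity and similarity are taken as precomputed percentages.

```python
import math

def identity_threshold(length: int) -> float:
    """Minimum percentage identity for an alignment of `length` residues.

    ASSUMPTION: constants follow the length-dependent curve reported in the
    full paper; the abstract itself does not give them.
    """
    if length <= 11:
        return 100.0  # very short alignments: only identical pairs qualify
    if length > 450:
        return 19.5   # long alignments: the curve levels off
    exponent = -0.32 * (1.0 + math.exp(-length / 1000.0))
    return 480.0 * length ** exponent

def predicted_homologous(pid: float, psim: float, length: int) -> bool:
    """Apply the 'more similar than identical' rule, then the identity curve."""
    if psim < pid:  # discard pairs whose similarity is below their identity
        return False
    return pid >= identity_threshold(length)

def families_overlap(family_a: set, family_b: set) -> bool:
    """Sequence space hopping: call a pair homologous if their families share a protein."""
    return bool(family_a & family_b)

# 62.5% over 16 residues is still far below the curve at L = 16 (about 83%).
print(predicted_homologous(62.5, 62.5, 16))   # False
print(predicted_homologous(35.0, 50.0, 100))  # True
```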
A modified definition of SOV, a segment-based measure for protein secondary structure prediction assessment
Quote: 1999 Proteins: Structure, Function, and Genetics 34, 220-223
We present a measure for the evaluation of secondary structure prediction methods that is based on secondary structure segments rather than individual residues. The algorithm is an extension of the segment overlap measure Sov, originally defined by Rost et al. (J Mol Biol 1994;235:13-26). The new definition of Sov corrects the normalization procedure and improves Sov's ability to discriminate between similar and dissimilar segment distributions. The method has been comprehensively tested during the second Critical Assessment of Techniques for Protein Structure Prediction (CASP2). Here, we describe the underlying concepts, modifications to the original definition, and their significance.
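The segment overlap idea can be illustrated with a short Python sketch. It follows our reading of the modified (1999) formula and is not taken from any released program. For every observed segment s1 and every overlapping predicted segment s2 of the same state, the sum adds (minov + δ)/maxov · len(s1), with δ = min(maxov − minov, minov, ⌊len(s1)/2⌋, ⌊len(s2)/2⌋). The normaliser counts len(s1) once per overlapping pair, plus once for each observed segment that no prediction overlaps. The three-state alphabet H/E/C and the function names are assumptions.

```python
from itertools import groupby

def segments(ss: str, state: str):
    """Return (start, end) inclusive indices of maximal runs of `state` in `ss`."""
    segs, i = [], 0
    for key, run in groupby(ss):
        n = len(list(run))
        if key == state:
            segs.append((i, i + n - 1))
        i += n
    return segs

def sov99(observed: str, predicted: str, states: str = "HEC") -> float:
    """Segment overlap score (0-100) summed over all states in `states`."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    total, norm = 0.0, 0
    for state in states:
        pred_segs = segments(predicted, state)
        for a1, b1 in segments(observed, state):
            len1 = b1 - a1 + 1
            overlapped = False
            for a2, b2 in pred_segs:
                minov = min(b1, b2) - max(a1, a2) + 1  # residues shared by s1 and s2
                if minov <= 0:
                    continue
                overlapped = True
                len2 = b2 - a2 + 1
                maxov = max(b1, b2) - min(a1, a2) + 1  # extent of the union
                delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
                total += (minov + delta) / maxov * len1
                norm += len1  # counted once per overlapping pair
            if not overlapped:
                norm += len1  # observed segment missed entirely
    return 100.0 * total / norm if norm else 0.0

print(sov99("HHHHCCCC", "HHHHCCCC"))           # 100.0
print(round(sov99("CHHHHC", "CHHHCC"), 2))     # 91.67
```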
A platform for integrating threading results with protein family analyses
Quote: 1999 Bioinformatics 15, 1062-1063
We have developed a package for the interactive visualization of results from different threading programs. Additionally, we have integrated relevant information about protein sequence, function, evolution, and structure into the interface.
Effective use of sequence correlation and conservation in fold recognition
Quote: 1999 Journal of Molecular Biology 293, 1221-1239
Protein families are a rich source of information; sequence conservation and sequence correlation are two of the main properties that can be derived from the analysis of multiple sequence alignments. Sequence conservation is related to the direct evolutionary pressure to retain the chemical characteristics of some positions in order to maintain a given function. Sequence correlation is attributed to the small sequence adjustments needed to maintain protein stability against constant mutational drift. Here, we showed that sequence conservation and correlation were each frequently informative enough to detect incorrectly folded proteins. Furthermore, combining conservation, correlation, and polarity, we achieved an almost perfect discrimination between native and incorrectly folded proteins. Thus, we made use of this information for threading by evaluating the models suggested by a threading method according to the degree of proximity of the corresponding correlated, conserved, and apolar residues. The results showed that the fold recognition capacity of a given threading approach could be improved almost fourfold by selecting the alignments that score best under the three different sequence-based approaches.
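To make "conservation" and "correlation" concrete, the sketch below scores one alignment column by Shannon entropy and a pair of columns by mutual information. It then asks how many of the most correlated pairs a threading model places in contact. These are generic stand-ins chosen for illustration, and the measures used in the paper may differ. Polarity is left out, and all names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def column(msa, j):
    """Residues at alignment position j across all sequences."""
    return [seq[j] for seq in msa]

def entropy(col):
    """Shannon entropy in bits; 0.0 means the position is fully conserved."""
    n = len(col)
    return sum((c / n) * math.log2(n / c) for c in Counter(col).values())

def mutual_information(col_a, col_b):
    """Mutual information in bits between two columns (a stand-in for correlation)."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    mi = 0.0
    for (x, y), c in Counter(zip(col_a, col_b)).items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi

def contact_agreement(msa, model_contacts, top_k=5):
    """Fraction of the top_k most correlated position pairs that a model puts in contact."""
    length = len(msa[0])
    ranked = sorted(
        combinations(range(length), 2),
        key=lambda p: mutual_information(column(msa, p[0]), column(msa, p[1])),
        reverse=True,
    )[:top_k]
    if not ranked:
        return 0.0
    return sum(pair in model_contacts for pair in ranked) / len(ranked)

msa = ["AKL", "AKL", "CKV", "CKV"]  # toy alignment: columns 0 and 2 co-vary, column 1 is conserved
print(entropy(column(msa, 1)))                    # 0.0
print(contact_agreement(msa, {(0, 2)}, top_k=1))  # 1.0
```

Ranking several threading models by such an agreement score is one simple way to read "selecting the alignments that score best".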
CAFASP-1: critical assessment of fully automated structure prediction methods
Quote: 1999 Proteins: Structure, Function, and Genetics Suppl 3, 209-217
The results of the first Critical Assessment of Fully Automated Structure Prediction (CAFASP-1) are presented. The objective was to evaluate the success rates of fully automatic web servers for fold recognition which are available to the community. This study was based on the targets used in the third meeting on the Critical Assessment of Techniques for Protein Structure Prediction (CASP-3). However, unlike CASP-3, the study was not a blind trial, as it was held after the structures of the targets were known. The aim was to assess the performance of methods without the user intervention that several groups used in their CASP-3 submissions. Although it is clear that "human plus machine" predictions are superior to automated ones, this CAFASP-1 experiment is extremely valuable for users of our methods; it provides an indication of the performance of the methods alone, and not of the "human plus machine" performance assessed in CASP. This information may aid users in choosing which programs they wish to use and in evaluating the reliability of the programs when applied to their specific prediction targets. In addition, evaluation of fully automated methods is particularly important to assess their applicability at genomic scales. For each target, groups submitted the top-ranking folds generated from their servers. In CAFASP-1 we concentrated on fold-recognition web servers only and evaluated only recognition of the correct fold, and not, as in CASP-3, alignment accuracy. Although some performance differences appeared within each of the four target categories used here, overall, no single server has proved markedly superior to the others. The results showed that current fully automated fold recognition servers can often identify remote similarities when pairwise sequence search methods fail. 
Nevertheless, in only a few cases outside the family-level targets has the score of the top-ranking fold been significant enough to allow for a confident fully automated prediction. Because the goals, rules, and procedures of CAFASP-1 were different from those used at CASP-3, the results reported here are not comparable with those reported in CASP-3. Still, it is clear that current automated fold recognition methods cannot yet compete with "human-expert plus machine" predictions. Finally, CAFASP-1 has been useful in identifying the requirements for a future blind trial of automated server-based protein structure prediction.
Finding nuclear localisation signals
Quote: 2000 EMBO Reports 1, 411-415
A variety of nuclear localisation signals (NLSs) are experimentally known, but only one motif was available for database searches. We initially collected a set of 91 experimentally verified NLSs from the literature. Through iterated 'in silico mutagenesis' we then extended the set to 214 potential NLSs. This final set matched in 43% of all known nuclear proteins and in no known non-nuclear protein. We estimated that >17% of all eukaryotic proteins may be imported into the nucleus. Finally, we found an overlap between NLS and DNA-binding region for 90% of the proteins for which both NLS and DNA-binding regions were known. Thus, evolution seemed to have used part of the existing DNA-binding mechanism when compartmentalising DNA-binding proteins into the nucleus. However, only 56 of our 214 NLS motifs overlapped with DNA-binding regions. These 56 NLSs enabled a de novo prediction of partial DNA-binding regions for about 800 proteins in human, fly, worm and yeast.
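The motif-extension and evaluation steps can be sketched with exact substring matching. The acceptance rule used here is an assumption inferred from the reported outcome, not the published procedure: a single-residue variant is kept if it occurs in at least one nuclear sequence and in no non-nuclear sequence. The sequences are toy strings. Only the seed, the SV40 large T antigen NLS PKKKRKV, is a real signal.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_mutants(motif: str):
    """Yield every variant of `motif` that differs at exactly one position."""
    for i, original in enumerate(motif):
        for aa in AMINO_ACIDS:
            if aa != original:
                yield motif[:i] + aa + motif[i + 1:]

def extend_motifs(seeds, nuclear, non_nuclear):
    """One round of hypothetical 'in silico mutagenesis'.

    A variant is accepted if it occurs in at least one nuclear sequence and
    in no non-nuclear sequence. Passing the result back in as `seeds`
    repeats the round (iteration).
    """
    accepted = set(seeds)
    for motif in seeds:
        for variant in single_mutants(motif):
            in_nuclear = any(variant in s for s in nuclear)
            in_other = any(variant in s for s in non_nuclear)
            if in_nuclear and not in_other:
                accepted.add(variant)
    return accepted

def fraction_matched(sequences, motifs) -> float:
    """Share of sequences that contain at least one motif as an exact substring."""
    if not sequences:
        return 0.0
    return sum(any(m in s for m in motifs) for s in sequences) / len(sequences)

seeds = {"PKKKRKV"}                      # SV40 large T antigen NLS as the only seed
nuclear = ["MAPKKRRKVDE", "MSDNKKLLE"]   # toy sequences, not real proteins
non_nuclear = ["MSTNPLLAGK"]
motifs = extend_motifs(seeds, nuclear, non_nuclear)
print(sorted(motifs))                         # ['PKKKRKV', 'PKKRRKV']
print(fraction_matched(nuclear, motifs))      # 0.5
print(fraction_matched(non_nuclear, motifs))  # 0.0
```

The two `fraction_matched` calls correspond to the two figures the abstract reports for the final set: coverage of nuclear proteins and matches in non-nuclear ones.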
Third generation prediction of secondary structure
Quote: 2000 Methods in Molecular Biology 143, 71-95
In general, we still cannot predict protein structure from sequence. But we can do much better in predicting simplified aspects of structure. In particular, the field of secondary structure prediction has been revived by a breakthrough achieved by combining elaborate algorithms with the evolutionary information available in ever-growing databases. Some of the new, third-generation methods for secondary structure prediction are clearly superior to previous methods: β-strands are predicted more accurately; predicted segments look like those observed; and the overall accuracy is about ten percentage points higher than for methods from previous generations. Performance can be improved even further by using these methods in an 'expert' rather than in an 'automatic' mode.
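Assuming "overall accuracy" refers to the usual three-state per-residue score Q3 (helix H, strand E, other C), it can be computed as below, together with a per-state recall that tracks strand accuracy separately. The alphabet and function names are illustrative assumptions.

```python
def q3(observed: str, predicted: str) -> float:
    """Percentage of residues whose predicted state equals the observed state."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("expected two non-empty strings of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)

def state_recall(observed: str, predicted: str, state: str):
    """Percentage of residues observed in `state` that are also predicted in it (None if absent)."""
    positions = [i for i, s in enumerate(observed) if s == state]
    if not positions:
        return None
    return 100.0 * sum(predicted[i] == state for i in positions) / len(positions)

obs = "HHHEEECCC"
pred = "HHHEECCCC"
print(round(q3(obs, pred), 1))                 # 88.9
print(round(state_recall(obs, pred, "E"), 1))  # 66.7
```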
Protein secondary structure prediction continues to rise
Quote: