CUBIC papers 1999-now

 


Columbia University
Department of Biochemistry and Molecular Biophysics/C2B2
1130 St. Nicholas Ave. Rm 805
New York, NY 10032, USA 

Email:  rost@columbia.edu
WWW:     http://www.rostlab.org/       
Tel:       +1-212-851-4669     

 

 

 

 

 

NOTE: This is not a full list of publications from all members of the CUBIC group.


 

Bibliography

 


1.       B Rost (1999) Twilight zone of protein sequence alignments. Protein Engineering 12:85-94.

2.       A Zemla, C Venclovas, K Fidelis and B Rost (1999) A modified definition of SOV, a segment-based measure for protein secondary structure prediction assessment. Proteins: Structure, Function, and Genetics 34:220-223.

3.       F Pazos, B Rost and A Valencia (1999) A platform for integrating threading results with protein family analyses. Bioinformatics 15:1062-1063.

4.       O Olmea, B Rost and A Valencia (1999) Effective use of sequence correlation and conservation in fold recognition. Journal of Molecular Biology 293:1221-1239.

5.       D Fischer, et al. (1999) CAFASP-1: critical assessment of fully automated structure prediction methods. Proteins: Structure, Function, and Genetics Suppl 3:209-217.

6.       M Cokol, R Nair and B Rost (2000) Finding nuclear localisation signals. EMBO Reports 1:411-415.

7.       B Rost and C Sander (2000) Third generation prediction of secondary structure. Methods in Molecular Biology 143:71-95.

8.       B Rost (2001) Protein secondary structure prediction continues to rise. Journal of Structural Biology 134:204-218.

9.       V Eyrich, MA Martí-Renom, D Przybylski, A Fiser, F Pazos, A Valencia, A Sali and B Rost (2001) EVA: continuous automatic evaluation of protein structure prediction servers. Bioinformatics 17:1242-1243.

10.    J Liu and B Rost (2001) Comparing function and structure between entire proteomes. Protein Science 10:1970-1979.

11.    B Rost and V Eyrich (2001) EVA: large-scale analysis of secondary structure prediction. Proteins: Structure, Function, and Genetics 45 Suppl 5:S192-S199.

12.    R Nair and B Rost (2001) Surface profiles predict sub-cellular localisation. preprint: Columbia University.

13.    D Fischer, A Elofsson, L Rychlewski, F Pazos, A Valencia, B Rost, AR Ortiz and RLJ Dunbrack (2001) CAFASP2: the second critical assessment of fully automated structure prediction methods. Proteins: Structure, Function, and Genetics 45 Suppl 5:S171-S183.

14.    B Rost, P Baldi, G Barton, J Cuff, V Eyrich, D Jones, K Karplus, R King, M Ouali, G Pollastri and D Przybylski (2001) Simple jury predicts protein secondary structure best. Preprint: Columbia University.

15.    B Rost and P Baldi (2001) New improvements in protein secondary structure prediction. Preprint: Columbia University.

16.    D Przybylski and B Rost (2002) Alignments grow, secondary structure prediction improves. Proteins: Structure, Function, and Bioinformatics 46:195-205.

17.    CAF Andersen, AG Palmer, S Brunak and B Rost (2002) Continuum secondary structure captures protein flexibility. Structure 10:175-184.

18.    B Rost (2002) Enzyme function less conserved than anticipated. Journal of Molecular Biology 318:595-608.

19.    J Liu and B Rost (2002) Target space for structural genomics revisited. Bioinformatics 18:922-933.

20.    R Nair and B Rost (2002) Inferring sub-cellular localisation through automated lexical analysis. Bioinformatics 18:S78-S86.

21.    J Liu, H Tan and B Rost (2002) Loopy proteins appear conserved in evolution. Journal of Molecular Biology 322:53-64.

22.    CP Chen, A Kernytsky and B Rost (2002) Transmembrane helix predictions revisited. Protein Science 11:2774-2791.

23.    R Nair and B Rost (2002) Sequence conserved for sub-cellular localization. Protein Science 11:2836-2847.

24.    CP Chen and B Rost (2002) Long membrane helices and short loops predicted less accurately. Protein Science 11:2766-2773.

25.    CP Chen and B Rost (2002) State-of-the-art in membrane prediction. Applied Bioinformatics 1:21-35.

26.    B Rost (2002) Did evolution leap to create the protein universe? Current Opinion in Structural Biology 12:409-416.

27.    G Pollastri, D Przybylski, B Rost and P Baldi (2002) Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins: Structure, Function, and Bioinformatics 47:228-235.

28.    MA Marti-Renom, MS Madhusudhan, A Fiser, B Rost and A Sali (2002) Reliability of assessment of protein structure prediction methods. Structure 10:435-440.

29.    A Sali, MA Marti-Renom, MS Madhusudhan, A Fiser and B Rost (2002) Reply to Moult et al. Structure 10:292-293.

30.    Y Ofran and B Rost (2003) Analysing six types of protein-protein interfaces. Journal of Molecular Biology 325:377-387.

31.    R Nair, P Carter and B Rost (2003) NLSdb: database of nuclear localization signals. Nucleic Acids Research 31:397-399.

32.    P Carter, J Liu and B Rost (2003) PEP: Predictions for Entire Proteomes. Nucleic Acids Research 31:410-413.

33.    B Rost (2003) Neural networks predict protein structure: hype or hit? In: P Frasconi and R Shamir (eds.). Artificial intelligence and heuristic methods in bioinformatics. Amsterdam: IOS Press:34-50.

34.    Y Ofran and B Rost (2003) Predicted protein-protein interaction sites from local sequence information. FEBS Letters 544:236-239.

35.    B Rost and J Liu (2003) The PredictProtein server. Nucleic Acids Research 31:3300-3304.

36.    J Liu and B Rost (2003) NORSp: predictions of long regions without regular secondary structure. Nucleic Acids Research 31:3833-3835.

37.    R Nair and B Rost (2003) LOC3D: annotate sub-cellular localization for protein structures. Nucleic Acids Research 31:3337-3340.

38.    S Mika and B Rost (2003) UniqueProt: creating representative protein sequence sets. Nucleic Acids Research 31:3789-3791.

39.    A Kernytsky and B Rost (2003) Static benchmarking of membrane helix predictions. Nucleic Acids Research 31:3642-3644.

40.    IYY Koh, VA Eyrich, MA Marti-Renom, D Przybylski, MS Madhusudhan, E Narayanan, O Grana, A Valencia, A Sali and B Rost (2003) EVA: evaluation of protein structure prediction servers. Nucleic Acids Research 31:3311-3315.

41.    P Carter, CAF Andersen and B Rost (2003) DSSPcont: continuous secondary structure assignments for proteins. Nucleic Acids Research 31:3293-3295.

42.    VA Eyrich and B Rost (2003) META-PP: single interface to crucial prediction servers. Nucleic Acids Research 31:3308-3310.

43.    R Nair and B Rost (2003) Better prediction of sub-cellular localization by combining evolutionary and structural information. Proteins: Structure, Function, and Bioinformatics 53:917-930.

44.    VA Eyrich, IYY Koh, D Przybylski, O Graña, F Pazos, A Valencia and B Rost (2003) CAFASP3 in the spotlight of EVA. Proteins: Structure, Function, and Bioinformatics 53 Suppl 6:548-560.

45.    B Rost (2002) Rising accuracy of protein secondary structure prediction. In: D Chasman (eds.). Protein structure determination, analysis, and modeling for drug discovery. New York: Dekker:207-249.

46.    CAF Andersen and B Rost (2003) Secondary structure assignment. Methods Biochem Anal. 44:341-363.

47.    B Rost (2003) Prediction in 1D: secondary structure, membrane helices, and accessibility. Methods Biochem Anal. 44:559-587.

48.    J Liu and B Rost (2003) Domains, motifs, and clusters in the protein universe. Current Opinion in Chemical Biology 7:5-11.

49.    B Rost, J Liu, D Przybylski, R Nair, H Bigelow, KO Wrzeszczynski and Y Ofran (2003) Prediction of protein structure through evolution. In: J Gasteiger and T Engel (eds.). Handbook of Chemoinformatics - from data to knowledge. Weinheim: Wiley-VCH:1789-1811.

50.    KO Wrzeszczynski and B Rost (2003) Cataloguing proteins in cell cycle control. In: H Lieberman (eds.). Cell cycle checkpoint control protocols. Totowa, NJ: Humana Press:219-233.

51.    B Rost, J Liu, R Nair, KO Wrzeszczynski and Y Ofran (2003) Automatic prediction of protein function. Cellular and Molecular Life Sciences submitted Mar 25, 2003.

52.    R Zidovetzki, B Rost, DL Armstrong and I Pecht (2003) Role of transmembrane domains in the functions of Fc receptors. Journal of Biophysical Chemistry 15:555-575.

53.    JM Aramini, et al. (2003) Solution NMR structure of the 30S ribosomal protein S28E from Pyrococcus horikoshii. Protein Science 12:2823-2830.

54.    J Liu and B Rost (2004) CHOP proteins into structural domain-like fragments. Proteins: Structure, Function, and Bioinformatics 55:678-688.

55.    H Bigelow, D Petrey, J Liu, D Przybylski and B Rost (2004) Predicting transmembrane beta-barrels in proteomes. Nucleic Acids Research 32:2566-2577.

56.    KO Wrzeszczynski and B Rost (2004) Annotating proteins from Endoplasmic reticulum and Golgi apparatus in eukaryotic proteomes. Cellular and Molecular Life Sciences 61:1341-1353.

57.    B Rost, G Yachdav and J Liu (2004) The PredictProtein server. Nucleic Acids Research 32:W321-W326.

58.    R Nair and B Rost (2004) LOCnet and LOCtarget: Sub-cellular localization for structural genomics targets. Nucleic Acids Research 32:W517-W521.

59.    J Liu and B Rost (2004) CHOP: parsing proteins into structural domains. Nucleic Acids Research 32:W569-W571.

60.    S Mika and B Rost (2004) NLProt: extracting protein names and sequences from papers. Nucleic Acids Research 32:W634-W637.

61.    J Liu, H Hegyi, TB Acton, GT Montelione and B Rost (2004) Automatic target selection for structural genomics on eukaryotes. Proteins: Structure, Function, and Bioinformatics 56:188-200.

62.    J Liu and B Rost (2004) Sequence-based prediction of protein domains. Nucleic Acids Research 32:3522-3530.

63.    S Mika and B Rost (2004) Protein names precisely peeled off free text. Bioinformatics 20:I241-I247.

64.    D Przybylski and B Rost (2004) Improving fold recognition without folds. Journal of Molecular Biology 341:255-269.

65.    R Nair and B Rost (2004) Annotating protein function through lexical analysis. AI Magazine 25:45-56.

66.    J Glasgow, I Jurisica and B Rost (2004) AI and Bioinformatics. AI Magazine 25:7-8.

67.    Z Wunderlich, TB Acton, J Liu, G Kornhaber, J Everett, P Carter, N Lan, N Echols, M Gerstein, B Rost and GT Montelione (2004) The protein target list of the Northeast Structural Genomics Consortium. Proteins: Structure, Function, and Bioinformatics 56:181-187.

68.    R Powers, TB Acton, Y Chiang, PK Rajan, JR Cort, MA Kennedy, J Liu, L Ma, B Rost and GT Montelione (2004) 1H, 13C and 15N assignments for the Archaeglobus fulgidis protein AF2095. Journal of Biomolecular NMR 30:107-108.

69.    S Mika and B Rost (2005) NMPdb: database of nuclear matrix proteins. Nucleic Acids Research 33:D160-163.

70.    R Nair and B Rost (2005) Mimicking cellular sorting improves prediction of subcellular localization. Journal of Molecular Biology 348:85-100.

71.    M Punta and B Rost (2005) Protein folding rates estimated from contact predictions. Journal of Molecular Biology 348:507-512.

72.    M Punta and B Rost (2005) PROFcon: novel prediction of long-range contacts. Bioinformatics 21:2960-2968.

73.    A Schlessinger and B Rost (2005) Protein flexibility and rigidity predicted from sequence. Proteins: Structure, Function, and Bioinformatics in press.

74.    Y Ofran and B Rost (2005) Predictive methods using protein sequence. In: AD Baxevanis and BF Ouellette (eds.). Bioinformatics. New York: Wiley:197-222.

75.    B Rost (2005) How to use protein 1D structure predicted by PROFphd. In: JE Walker (eds.). The Proteomics Protocols Handbook. Totowa NJ: Humana:875-901.

76.    Y Ofran, M Punta, R Schneider and B Rost (2005) Beyond annotation transfer by homology: novel protein-function prediction methods to assist drug discovery. Drug Discovery Today 10:1475-1482.

77.    J Benach, WC Edstrom, I Lee, K Das, B Cooper, R Xiao, J Liu, B Rost, TB Acton, GT Montelione and JF Hunt (2005) The 2.35 A structure of the TenA homolog from Pyrococcus furiosus supports an enzymatic function in thiamine metabolism. Acta Crystallogr D Biol Crystallogr 61:589-598.

78.    O Grana, VA Eyrich, F Pazos, B Rost and A Valencia (2005) EVAcon: a protein contact prediction evaluation service. Nucleic Acids Research 33:W347-351.

79.    HV Jagadish, D States and B Rost (2005) ISMB 2005. Bioinformatics 21 Suppl 1:i1-i2.

80.    The FANTOM Consortium, et al. (2005) The Transcriptional Landscape of the Mammalian Genome. Science 309:1559-1563.

81.    R Powers, et al. (2005) Solution structure of Archaeglobus fulgidis peptidyl-tRNA hydrolase (Pth2) provides evidence for an extensive conserved family of Pth2 enzymes in archea, bacteria, and eukaryotes. Protein Science 14:2849-2861.

82.    DA Snyder, et al. (2005) Comparisons of NMR spectral quality and success in crystallization demonstrate that NMR and X-ray crystallography are complementary methods for small protein structure determination. J Am Chem Soc 127:16505-16511.

83.    J Moult, K Fidelis, B Rost, T Hubbard and A Tramontano (2005) Critical assessment of methods of protein structure prediction (CASP)-Round 6. Proteins 61:3-7.

84.    O Grana, D Baker, RM Maccallum, J Meiler, M Punta, B Rost, ML Tress and A Valencia (2005) CASP6 assessment of contact prediction. Proteins 61:214-224.

85.    A Schlessinger, Y Ofran, G Yachdav and B Rost (2006) Epitome: Database of structure-inferred antigenic epitopes. Nucleic Acids Research 34:D777-780.

86.    J Liu, J Gough and B Rost (2006) Distinguishing protein-coding from non-coding RNA through support vector machines. PLoS Genetics 2:e29; DOI: 10.1371/journal.pgen.0020029.

87.    A Schlessinger, G Yachdav and B Rost (2006) PROFbval: predict flexible and rigid residues in proteins. Bioinformatics 22:891-893.

88.    S Mika and B Rost (2006) Protein–protein interactions more conserved within species than across species. PLoS Computational Biology 2:e79.

89.    H Bigelow and B Rost (2006) PROFtmb: a web server for predicting bacterial transmembrane beta barrel proteins. Nucleic Acids Research 34:W186-188.

90.    Y Ofran, G Yachdav, E Mozes, T-t Soong, R Nair and B Rost (2006) Create and assess protein networks through molecular characteristics of individual proteins. Bioinformatics 22:e402-407.

91.    A Passerini, M Punta, A Ceroni, B Rost and P Frasconi (2006) Identifying cysteines and histidines in transition-metal-binding sites using support vector machines and neural networks. Proteins: Structure, Function, and Bioinformatics 65:305-316.

92.    HM Berman, et al. (2006) Outcome of a workshop on archiving structural models of biological macromolecules. Structure 14:1211-1217.

93.    Y Ofran and B Rost (2007) ISIS: Interaction Sites Identified from Sequence. Bioinformatics 23:e13-16.

94.    D Przybylski and B Rost (2007) Consensus sequences improve PSI-BLAST through mimicking profile-profile alignments. Nucleic Acids Research 35:2238-2246.

95.    J Liu, GT Montelione and B Rost (2007) Novel leverage of structural genomics. Nature Biotechnology 25:849-851.

96.    Y Bromberg and B Rost (2007) SNAP: predict effect of non-synonymous polymorphisms on function. Nucleic Acids Research 35:3823-3835.

97.    Y Ofran, V Mysore and B Rost (2007) Prediction of DNA-binding residues from sequence. Bioinformatics 23:i347-353.

98.    A Schlessinger, J Liu and B Rost (2007) Natively unstructured loops differ from other loops. PLoS Comput Biol 3:e140.

99.    Y Ofran and B Rost (2007) Protein-protein interaction hot spots carved into sequences. PLoS Comput Biol 3:e119.

100.  A Schlessinger, M Punta and B Rost (2007) Natively unstructured regions in proteins identified from contact predictions. Bioinformatics 23:2376-2384.

101.  M Punta, LR Forrest, H Bigelow, A Kernytsky, J Liu and B Rost (2007) Membrane protein prediction methods. Methods 41:460-474.

102.  T Lengauer, B Rost and P Schuster (2007) ISMB/ECCB 2007. Bioinformatics 23:i1-i4.

103.  T Lengauer, B Morrison McKay and B Rost (2007) ISMB/ECCB 2007: The premier conference on computational biology. PLoS Comput Biol 3:e96.

104.  JM Aramini, et al. (2007) Solution NMR structure of Escherichia coli ytfP expands the structural coverage of the UPF0131 protein domain family. Proteins 68:789-795.

105.  J Moult, K Fidelis, A Kryshtafovych, B Rost, T Hubbard and A Tramontano (2007) Critical assessment of methods of protein structure prediction-Round VII. Proteins 69 Suppl 8:3-9.

106.  D Przybylski and B Rost (2008a) Powerful fusion: PSI-BLAST and consensus sequences. Bioinformatics in press.

107.  Y Bromberg and B Rost (2008b) Comprehensive in silico mutagenesis highlights functionally important residues in proteins. Bioinformatics in press.

108.  A Kernytsky and B Rost (2008) Using genetic algorithms to select most predictive protein features. in press.

109.  D Przybylski and B Rost (2006) In: T Lengauer (eds.). New York: Wiley-VCH:.

110.  H Bigelow and B Rost (2007) Online tools for predicting integral membrane proteins. In: MJ Peirce and R Wait (eds.). Proteomic analysis of membrane proteins: methods and protocols.

111.  D Przybylski and B Rost (2007) Predicting simplified features of protein structure. In: T Lengauer (eds.). Bioinformatics – From Genomes to Therapies. Weinheim: Wiley-VCH:in press.

112.  R Nair and B Rost (2007) Predicting protein subcellular localization using intelligent systems. In: D Leon and S Markel (eds.). In Silico Technology in Drug Target Identification and Validation. Marcel Dekker:.

113.  B Rost (2008) Prediction of protein structure in 1D: secondary structure, membrane regions, and solvent accessibility. In: PE Bourne and H Weissig (eds.). Structural Bioinformatics - 2nd Edition. Wiley:.

114.  KK Singarapu, R Xiao, T Acton, B Rost, GT Montelione and T Szyperski (2008) NMR structure of the peptidyl-tRNA hydrolase domain from Pseudomonas syringae expands the structural coverage of the hydrolysis domains of class 1 peptide chain release factors. Proteins.

115.  JM Aramini, et al. (2008) Solution NMR structure of the SOS response protein YnzC from Bacillus subtilis. Proteins: Structure, Function, and Bioinformatics 72:526-530.

116.  O Trott, K Siggers, B Rost and AG Palmer, 3rd (2008) Protein conformational flexibility prediction using machine learning. J Magn Reson 192:37-47.

117.  M Linial, JP Mesirov, B Morrison McKay and B Rost (2008) ISMB 2008 Toronto. PLoS Comput Biol 4:e1000094.

 

 


 

 


Abstracts


1999 peer-reviewed

Twilight zone of protein sequence alignments

Burkhard Rost

Quote: 1999 Protein Engineering 12, 85-94

Sequence alignments unambiguously distinguish between protein pairs of similar and non-similar structure when the pairwise sequence identity is high (>40% for long alignments). The signal gets blurred in the twilight zone of 20-35% sequence identity. Here, I analysed more than a million sequence alignments between protein pairs of known structures to re-define a line distinguishing between true and false positives for low levels of similarity. Four results stood out. (1) The transition from the safe zone of sequence alignment into the twilight zone is described by an explosion of false negatives. More than 95% of all pairs detected in the twilight zone had different structures. More precisely, above a cut-off roughly corresponding to 30% sequence identity, 90% of the pairs were homologous; below 25% less than 10% were. (2) Whether or not sequence homology implied structural identity depended crucially on the alignment length. For example, if ten residues were similar in an alignment of length 16 (> 60%), structural similarity could not be inferred. (3) The 'more similar than identical' rule (discarding all pairs for which percentage similarity was lower than percentage identity) reduced false positives significantly. (4) Similarly successful was sequence space hopping: pairs were predicted to be homologous when the respective sequence families had proteins in common. All findings are applicable to automatic database searches. 
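
Findings (3) and (4) can be illustrated with a small sketch. This is a minimal illustration, not the implementation used in the paper: the `Hit` record, its field names, and the example values are hypothetical. Only the two rules themselves come from the abstract: discard pairs whose percentage similarity is lower than their percentage identity, and predict a pair to be homologous when the two sequence families have proteins in common.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise alignment hit (hypothetical record; field names are assumptions)."""
    query: str
    target: str
    pct_identity: float    # percentage of identical aligned residues
    pct_similarity: float  # percentage of similar aligned residues


def more_similar_than_identical(hits):
    """Keep only hits whose percentage similarity is not below percentage identity."""
    return [h for h in hits if h.pct_similarity >= h.pct_identity]


def families_overlap(family_a, family_b):
    """Sequence-space hopping: link two proteins if their families share a member."""
    return bool(set(family_a) & set(family_b))


# Hypothetical example data.
hits = [
    Hit("P1", "P2", pct_identity=28.0, pct_similarity=41.0),
    Hit("P1", "P3", pct_identity=30.0, pct_similarity=22.0),  # discarded by the rule
]
kept = more_similar_than_identical(hits)
linked = families_overlap({"P2", "P7", "P9"}, {"P4", "P9"})
```

How similarity is scored and how the sequence families are built is left open here; the abstract does not specify either.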

 

1999 collaborations

A modified definition of SOV, a segment-based measure for protein secondary structure prediction assessment

Adam Zemla, Ceslovas Venclovas, Krzysztof Fidelis & Burkhard Rost

Quote: 1999 Proteins: Structure, Function, and Genetics 34, 220-223

We present a measure for the evaluation of secondary structure prediction methods that is based on secondary structure segments rather than individual residues. The algorithm is an extension of the segment overlap measure Sov, originally defined by Rost et al. (J Mol Biol 1994;235:13-26). The new definition of Sov corrects the normalization procedure and improves Sov's ability to discriminate between similar and dissimilar segment distributions. The method has been comprehensively tested during the second Critical Assessment of Techniques for Protein Structure Prediction (CASP2). Here, we describe the underlying concepts, modifications to the original definition, and their significance.

A platform for integrating threading results with protein family analyses

Florencio Pazos, Burkhard Rost & Alfonso Valencia

Quote: 1999 Bioinformatics 15, 1062-1063

We have developed a package for the interactive visualization of results from different threading programs. Additionally, we have integrated relevant information about protein sequence, function, evolution, and structure into the interface.

Effective use of sequence correlation and conservation in fold recognition

Osvaldo Olmea, Burkhard Rost & Alfonso Valencia

Quote: 1999 Journal of Molecular Biology 293, 1221-1239

Protein families are a rich source of information; sequence conservation and sequence correlation are two of the main properties that can be derived from the analysis of multiple sequence alignments. Sequence conservation is related to the direct evolutionary pressure to retain the chemical characteristics of some positions in order to maintain a given function. Sequence correlation is attributed to the small sequence adjustments needed to maintain protein stability against constant mutational drift. Here, we showed that sequence conservation and correlation were each frequently informative enough to detect incorrectly folded proteins. Furthermore, combining conservation, correlation, and polarity, we achieved an almost perfect discrimination between native and incorrectly folded proteins. Thus, we made use of this information for threading by evaluating the models suggested by a threading method according to the degree of proximity of the corresponding correlated, conserved, and apolar residues. The results showed that the fold recognition capacity of a given threading approach could be improved almost fourfold by selecting the alignments that score best under the three different sequence-based approaches.

CAFASP-1: critical assessment of fully automated structure prediction methods

Daniel Fischer, C Barret, K Bryson, Arne Elofsson, Adam Godzik, David Jones, Kevin J. Karplus, L. A. Kelley, R. M. MacCallum, K. Pawłowski, Burkhard Rost, Leszek Rychlewski, Michael Sternberg

Quote: 1999 Proteins: Structure, Function, and Genetics Suppl 3, 209-217

The results of the first Critical Assessment of Fully Automated Structure Prediction (CAFASP-1) are presented. The objective was to evaluate the success rates of fully automatic web servers for fold recognition which are available to the community. This study was based on the targets used in the third meeting on the Critical Assessment of Techniques for Protein Structure Prediction (CASP-3). However, unlike CASP-3, the study was not a blind trial, as it was held after the structures of the targets were known. The aim was to assess the performance of methods without the user intervention that several groups used in their CASP-3 submissions. Although it is clear that "human plus machine" predictions are superior to automated ones, this CAFASP-1 experiment is extremely valuable for users of our methods; it provides an indication of the performance of the methods alone, and not of the "human plus machine" performance assessed in CASP. This information may aid users in choosing which programs they wish to use and in evaluating the reliability of the programs when applied to their specific prediction targets. In addition, evaluation of fully automated methods is particularly important to assess their applicability at genomic scales. For each target, groups submitted the top-ranking folds generated from their servers. In CAFASP-1 we concentrated on fold-recognition web servers only and evaluated only recognition of the correct fold, and not, as in CASP-3, alignment accuracy. Although some performance differences appeared within each of the four target categories used here, overall, no single server has proved markedly superior to the others. The results showed that current fully automated fold recognition servers can often identify remote similarities when pairwise sequence search methods fail. Nevertheless, in only a few cases outside the family-level targets has the score of the top-ranking fold been significant enough to allow for a confident fully automated prediction. Because the goals, rules, and procedures of CAFASP-1 were different from those used at CASP-3, the results reported here are not comparable with those reported in CASP-3. Nevertheless, it is clear that current automated fold recognition methods cannot yet compete with "human-expert plus machine" predictions. Finally, CAFASP-1 has been useful in identifying the requirements for a future blind trial of automated server-based protein structure prediction.


 


 

2000 peer-reviewed

Finding nuclear localization signals

Murat Cokol, Rajesh Nair & Burkhard Rost

Quote: 2000 EMBO Reports 1, 411-415

A variety of nuclear localisation signals (NLSs) are experimentally known; only one motif was available for database searches. We initially collected a set of 91 experimentally verified NLSs from the literature. Through iterated 'in silico mutagenesis' we then extended the set to 214 potential NLSs. This final set matched in 43% of all known nuclear proteins and in no known non-nuclear protein. We estimated >17% of all eukaryotic proteins may be imported into the nucleus. Finally, we found an overlap between NLS and DNA-binding region for 90% of the proteins for which both NLS and DNA-binding regions were known. Thus, evolution seemed to have used part of the existing DNA-binding mechanism when compartmentalising DNA-binding proteins into the nucleus. However, only 56 of our 214 NLS motifs overlapped with DNA-binding regions. These 56 NLSs enabled a de novo prediction of partial DNA-binding regions for about 800 proteins in human, fly, worm and yeast. 

 

2000 non-peer

Third generation prediction of secondary structure

Burkhard Rost & Chris Sander

Quote: 2000 Methods in Molecular Biology 143, 71-95

We still cannot predict protein structure from sequence, in general. But, we can do much better in predicting simplified aspects of structure. Particularly, the field of secondary structure has been revived by a breakthrough that has been achieved by a combination of elaborated algorithms and evolutionary information available in ever growing databases. Some of the new, third generation methods for secondary structure prediction are clearly superior to previous methods: β-strands are predicted more accurately; predicted segments look like those observed; and the overall accuracy is about ten percentage points higher than for methods from previous generations. Performance can be improved even further by using these methods in an 'expert' rather than in an 'automatic' mode.


 

 


 

2001 peer-reviewed

Protein secondary structure prediction continues to rise

Burkhard Rost

Quote: 2001 Journal of Structural Biology 134, 204-218

Methods predicting protein secondary structure have improved substantially in the 90's through using evolutionary information taken from the divergence of proteins in the same structural family. Recently, the evolutionary information resulting from improved searches and larger databases has again boosted prediction accuracy by more than four percentage points to its current height around 76% of all residues predicted correctly in one of the three states helix, strand, other. The last year also brought successful new concepts to the field. These new methods may be particularly interesting in light of the improvements achieved through simply combining existing methods. Divergent evolutionary profiles not only contain enough information to substantially improve prediction accuracy, but even to correctly predict long stretches of identical residues observed in alternative secondary structure states depending on non-local conditions. An example is a method automatically identifying structural switches, and thus finding a remarkable connection between predicted secondary structure and aspects of function. Secondary structure predictions are increasingly becoming the workhorse for numerous methods aiming at predicting protein structure and function. Is the recent increase in accuracy significant enough to make predictions even more useful? Since the recent improvement yields a better prediction of segments, and in particular of beta-strands, I believe the answer is affirmative. What is the limit of prediction accuracy? We shall see.

EVA: continuous automatic evaluation of protein structure prediction servers

Volker A. Eyrich, Marc A. Martí-Renom, Dariusz Przybylski, Mallur S. Madhusudhan, András Fiser, Florencio Pazos, Alfonso Valencia, Andrej Sali & Burkhard Rost

Quote: 2001 Bioinformatics 17, 1242-1243

Summary: Evaluation of protein structure prediction methods is difficult and time-consuming. Here, we describe EVA, a web server for assessing protein structure prediction methods, in an automated, continuous and large-scale fashion. Currently, EVA evaluates the performance of a variety of prediction methods available through the internet. Every week, the sequences of the latest experimentally determined protein structures are sent to prediction servers, results are collected, performance is evaluated, and a summary is published on the web. EVA has so far collected data for more than 3000 protein chains. These results may provide valuable insight to both developers and users of prediction methods.

Comparing function and structure between entire proteomes

Jinfeng Liu & Burkhard Rost

Quote: 2001 Protein Science 10, 1970-1979

More than 30 organisms have been entirely sequenced. Here, we applied a variety of simple bioinformatics tools to analyse 29 proteomes for representatives from all three kingdoms: eukaryotes, prokaryotes and archaebacteria. We confirmed that eukaryotes have relatively more long proteins than prokaryotes and archaea, and that the overall amino acid composition is similar between the three. We predicted that about 15-30% of all proteins contained transmembrane helices. We could not find a correlation between the content of membrane proteins and the complexity of the organism. In particular, we did not find significantly higher percentages of helical membrane proteins in eukaryotes than in prokaryotes or archaea. However, we found more proteins with 7 transmembrane helices in eukaryotes and more with 6 and 12 in prokaryotes. We found twice as many coiled-coil proteins in eukaryotes (10%) as in prokaryotes and archaea (4-5%), and we predicted about 15-25% of all proteins to be secreted by most eukaryotes and prokaryotes. Every tenth protein had no known homologue in current databases, and 30-40% of the proteins fall into structural families with more than 100 members. A classification by cellular function verified that eukaryotes had a higher proportion of proteins for communication with the environment. Finally, we found at least one homologue of experimentally known structure for about 20%-45% of all proteins; the regions with structural homology covered 20%-30% of all residues. These numbers may or may not suggest that there are 1200-2600 folds in the universe of protein structures. All predictions are available at Protein Science 10, 1970-1979.

EVA: large-scale analysis of secondary structure prediction

Burkhard Rost & Volker A Eyrich

Quote: 2001 Proteins: Structure, Function, and Genetics 45 Suppl 5, S192-S199

EVA is a web-based server that evaluates automatic structure prediction servers continuously and objectively. Since June 2000, EVA has collected more than 20,000 secondary structure predictions. The EVA sets sufficed to conclude that the field of secondary structure prediction has advanced again. Accuracy increased substantially in the 1990s through using evolutionary information taken from the divergence of proteins in the same structural family. Recently, the evolutionary information resulting from improved searches and larger databases has again boosted prediction accuracy by more than four percentage points to its current level of around 76% of all residues predicted correctly in one of the three states helix, strand, other. The best current methods solved most of the problems raised at earlier CASP meetings: All good methods now get segments right and perform well on strands. Is the recent increase in accuracy significant enough to make predictions even more useful? We believe the answer is affirmative. What is the limit of prediction accuracy? We shall see.
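
The figure of "residues predicted correctly in one of the three states" is a per-residue three-state accuracy. A minimal sketch of how such a score could be computed is given below; the function name, the one-letter state codes (H = helix, E = strand, L = other) and the example strings are illustrative assumptions, not EVA's actual code or data.

```python
def three_state_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed state.

    Both arguments are strings over a three-letter alphabet
    (assumed here: H = helix, E = strand, L = other), one letter per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty protein")
    # Count positions where prediction and observation agree.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical 10-residue protein, for illustration only.
observed = "HHHHLLEEEL"
predicted = "HHHLLLEEEE"
print(three_state_accuracy(predicted, observed))  # 8 of 10 residues agree
```

Averaging this value over many proteins gives a figure comparable in kind to the percentages quoted above; which proteins are included and how the observed states are assigned are choices this sketch leaves open.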

Surface profiles predict sub-cellular localisation

Rajesh Nair & Burkhard Rost

Quote: 2001 CUBIC Preprint

The gap between the number of known protein sequences and the knowledge about protein function is rapidly increasing. One important physical aspect of function is the sub-cellular localisation of a protein. Here, we trained two-layered feed-forward neural networks to predict the sub-cellular localisation for proteins of known structure. We introduced two novel key aspects: (1) using evolutionary information, and (2) using surface composition. We also trained networks only on the N-terms. Finally, we combined all our networks. We evaluated sustained levels of performance by four-fold cross-validation. The major single source of improvement was the use of evolutionary information. However, the combination of our various networks yielded the final, significant improvement over previous methods. The final system reached an accuracy above 80% (two-state). This level may suffice to make the method valuable for target selection in structural genomics.
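
Four-fold cross-validation, as mentioned in the abstract, can be sketched as follows. This is only an illustration of the general technique: the function name, the round-robin assignment of proteins to folds and the protein identifiers are assumptions, and the abstract does not say how the authors built their folds.

```python
def k_fold_splits(items, k=4):
    """Yield (train, test) pairs so that every item is tested exactly once.

    Items are dealt round-robin into k folds; each fold serves once as the
    test set while the remaining k - 1 folds form the training set.
    """
    if k < 2 or k > len(items):
        raise ValueError("k must be between 2 and the number of items")
    folds = [items[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Hypothetical protein identifiers, for illustration only.
proteins = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]
for train, test in k_fold_splits(proteins, k=4):
    print("train:", train, "test:", test)
```

In a real evaluation of a prediction method, the split would presumably also need to keep sequence-similar proteins out of opposite sides of the same train/test pair; this sketch does not attempt that.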

 

2001 collaborations

CAFASP2: the second critical assessment of fully automated structure prediction methods

Daniel Fischer, Arne Elofsson, Leszek Rychlewski, Florencio Pazos, Alfonso Valencia, Burkhard Rost, Angel B Ortiz & R. L. Dunbrack

Quote: 2001 Proteins: Structure, Function, and Genetics 45 Suppl 5, S171-S183

The results of the second Critical Assessment of Fully Automated Structure Prediction (CAFASP2) are presented. The goals of CAFASP are to (i) assess the performance of fully automatic web servers for structure prediction, by using the same blind prediction targets as those used at CASP4, (ii) inform the community of users about the capabilities of the servers, (iii) allow human groups participating in CASP to use and analyze the results of the servers while preparing their nonautomated predictions for CASP, and (iv) compare the performance of the automated servers to that of the human-expert groups of CASP. More than 30 servers from around the world participated in CAFASP2, covering all categories of structure prediction. The category with the largest participation was fold recognition, where 24 CAFASP servers filed predictions along with 103 other CASP human groups. The CAFASP evaluation indicated that it is difficult to establish an exact ranking of the servers because the number of prediction targets was relatively small and the differences among many servers were also small. However, roughly a group of five "best" fold recognition servers could be identified. The CASP evaluation identified the same group of top servers albeit with a slightly different relative order. Both evaluations ranked a semiautomated method named CAFASP-CONSENSUS, that filed predictions using the CAFASP results of the servers, above any of the individual servers. Although the predictions of the CAFASP servers were available to human CASP predictors before the CASP submission deadline, the CASP assessment identified only 11 human groups that performed better than the best server. Furthermore, about one fourth of the top 30 performing groups corresponded to automated servers. At least half of the top 11 groups corresponded to human groups that also had a server in CAFASP or to human groups that used the CAFASP results to prepare their predictions. In particular, the CAFASP-CONSENSUS group was ranked 7. This shows that the automated predictions of the servers can be very helpful to human predictors. We conclude that as servers continue to improve, they will become increasingly important in any prediction process, especially when dealing with genome-scale prediction tasks. We expect that in the near future, the performance difference between humans and machines will continue to narrow and that fully automated structure prediction will become an effective companion and complement to experimental structural genomics.

 

2001 preprints

Simple jury predicts protein secondary structure best

Burkhard Rost, Pierre Baldi, Geoff Barton, James Cuff, Volker A. Eyrich, David Jones, Kevin Karplus, Ross King, Gianluca Pollastri, Dariusz Przybylski

Quote: 2001 CUBIC Preprint 5

The field of secondary structure prediction methods has advanced again. The best methods now reach levels of 74-76% of the residues correctly predicted in one of the three states helix, strand, or other. In the context of EVA/CASP, we experimented with averaging over the best current methods. The resulting jury decision proved significantly more accurate than the best method. Although the 'jury' seemed the best choice on average, for 60% of all proteins one method was better than the jury. Furthermore, the best individual methods tended to be superior to the jury in estimating the reliability of a prediction. Hence, averaging over predictions may be the method of choice for a quick scan of large data sets, while experts may profit from studying the respective method in detail.
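
One simple way to average over several methods is sketched below. It assumes that each method reports, for every residue, a probability for each of the three states; the abstract does not specify what exactly was averaged, so the input format, the function name and the numbers are hypothetical.

```python
STATES = "HEL"  # assumed codes: H = helix, E = strand, L = other


def jury_predict(method_outputs):
    """Average per-residue state probabilities over methods, pick the top state.

    method_outputs: one entry per method; each entry is a list with one
    (p_helix, p_strand, p_other) tuple per residue.
    """
    if not method_outputs:
        raise ValueError("need at least one method")
    n_residues = len(method_outputs[0])
    if any(len(m) != n_residues for m in method_outputs):
        raise ValueError("all methods must cover the same residues")
    prediction = []
    for i in range(n_residues):
        # Mean probability of each state at residue i across all methods.
        mean = [
            sum(m[i][s] for m in method_outputs) / len(method_outputs)
            for s in range(len(STATES))
        ]
        best = max(range(len(STATES)), key=mean.__getitem__)
        prediction.append(STATES[best])
    return "".join(prediction)


# Three hypothetical methods scoring a 3-residue fragment.
method_a = [(0.8, 0.1, 0.1), (0.4, 0.5, 0.1), (0.2, 0.2, 0.6)]
method_b = [(0.6, 0.2, 0.2), (0.7, 0.2, 0.1), (0.1, 0.3, 0.6)]
method_c = [(0.7, 0.1, 0.2), (0.3, 0.6, 0.1), (0.1, 0.6, 0.3)]
print(jury_predict([method_a, method_b, method_c]))
```

In this toy input, two of the three methods favour strand at the second residue, yet the averaged probabilities favour helix, so averaging and majority voting are not the same rule.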

New improvements in protein secondary structure prediction

Burkhard Rost & Pierre Baldi

Quote: 2001 CUBIC Preprint

We still cannot predict protein 3D structure from sequence, in general. But bioinformatics continues to improve methods available for predicting structural features. Particularly, the field of protein secondary structure prediction has advanced substantially in the 1990s by combining algorithms from artificial intelligence with evolutionary information. Recently, growing databases and better search strategies have again boosted prediction accuracy by more than four percentage points. Today's most accurate methods predict more than 76% of all residues correctly in one of the three states helix, strand, or other. This high level has already been sustained by more than 300 new protein structures added since the methods were developed. However, the field is progressing rapidly: another unpublished algorithmic advance may already outperform the current state-of-the-art methods. The last two years also brought successful new concepts to the field. These new methods may be particularly interesting in light of the improvements achieved through simply combining existing methods. Divergent evolutionary profiles not only contain enough information to substantially improve prediction accuracy, but even to correctly predict long stretches of identical residues observed in alternative secondary structure states depending on non-local conditions. An example is a method automatically identifying structural switches, and thus linking predicted secondary structure to aspects of function. Secondary structure predictions are increasingly becoming the workhorse of numerous methods aiming at predicting protein structure and function. Since the recent improvements yield better predictions of segments, and in particular of beta-strands, we believe that the recent increase in accuracy is significant enough to make predictions even more useful.


 

 


 

2002 peer-reviewed

Alignments grow, secondary structure prediction improves

Dariusz Przybylski & Burkhard Rost

Quote: 2002 Proteins: Structure, Function, and Bioinformatics 46, 195-205

Using information from sequence alignments significantly improves protein secondary structure prediction. Typically, more divergent profiles yield better predictions. Lately, various groups have shown that accuracy can be improved markedly by using PSI-BLAST profiles to develop new prediction methods. Here, we focused on the influences of various alignment strategies on two 8-year-old PHD methods. The following results stood out. (1) PHD using pairwise alignments predicts about 72% of all residues correctly in one of the three states helix, strand, other. Using larger databases and PSI-BLAST raised accuracy to 75%. (2) More than 60% of the improvement originated from the growth of current sequence databases; about 20% resulted from detailed changes in the alignment procedure (substitution matrix, thresholds, gap penalties). Another 20% of the improvement resulted from carefully using iterated PSI-BLAST searches. (3) Interestingly, we failed to improve prediction accuracy further when attempting to refine the alignment by dynamic programming (MaxHom and ClustalW). (4) Improvement through family growth appears to saturate at some point. However, most families have not reached this saturation. Hence, we anticipate that prediction accuracy will continue to rise with database growth.

Continuous assignment of secondary structure correlates with protein flexibility

Claus AF Andersen, Arthur G Palmer, Søren Brunak & Burkhard Rost

Quote: