Prediction of protein secondary structure at better than 70% accuracy

Title: Prediction of protein secondary structure at better than 70% accuracy
Publication Type: Journal Article
Year of Publication: 1993
Authors: Rost, B; Sander, C
Journal: J Mol Biol
Keywords: Mathematical Computing; Membrane Proteins; *Neural Networks (Computer); *Protein Structure, Secondary; Reproducibility of Results; Sequence Alignment/*methods

We have trained a two-layered feed-forward neural network on a non-redundant data base of 130 protein chains to predict the secondary structure of water-soluble proteins. A new key aspect is the use of evolutionary information in the form of multiple sequence alignments that are used as input in place of single sequences. The inclusion of protein family information in this form increases the prediction accuracy by six to eight percentage points. A combination of three levels of networks results in an overall three-state accuracy of 70.8% for globular proteins (sustained performance). If four membrane protein chains are included in the evaluation, the overall accuracy drops to 70.2%. The prediction is well balanced between alpha-helix, beta-strand and loop: 65% of the observed strand residues are predicted correctly. The accuracy in predicting the content of three secondary structure types is comparable to that of circular dichroism spectroscopy. The performance accuracy is verified by a sevenfold cross-validation test, and an additional test on 26 recently solved proteins. Of particular practical importance is the definition of a position-specific reliability index. For half of the residues predicted with a high level of reliability the overall accuracy increases to better than 82%. A further strength of the method is the more realistic prediction of segment length. The protein family prediction method is available for testing by academic researchers via an electronic mail server.
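
The abstract rests on two quantities: a profile derived from a multiple sequence alignment, used as input in place of a single sequence, and the three-state accuracy used for evaluation. The sketch below illustrates both under stated assumptions. The abstract does not specify the input encoding or the network itself, so the per-column frequency profile, the one-letter state labels (H for helix, E for strand, L for loop), and the function names `alignment_profile`, `q3_accuracy` and `per_state_accuracy` are hypothetical choices made for illustration, not the authors' implementation.

```python
from collections import Counter

# The 20 standard amino acids; gaps and other symbols are ignored (assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def alignment_profile(aligned_seqs):
    """Return per-column amino-acid frequencies for equal-length aligned sequences."""
    if not aligned_seqs:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned_seqs if s[col] in AMINO_ACIDS)
        total = sum(counts.values())
        # A column containing only gaps yields an all-zero row.
        profile.append([counts[a] / total if total else 0.0 for a in AMINO_ACIDS])
    return profile


def q3_accuracy(observed, predicted):
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def per_state_accuracy(observed, predicted, state):
    """Percentage of residues observed in `state` that are predicted in `state`."""
    positions = [i for i, o in enumerate(observed) if o == state]
    if not positions:
        raise ValueError("state does not occur in the observed structure")
    correct = sum(predicted[i] == state for i in positions)
    return 100.0 * correct / len(positions)


if __name__ == "__main__":
    # Toy data invented for illustration only.
    profile = alignment_profile(["ACD", "ACE", "A-D"])
    print(profile[2][AMINO_ACIDS.index("D")])
    print(q3_accuracy("HHHEELLL", "HHLEELLH"))
    print(per_state_accuracy("HHHEELLL", "HHLEELLH", "E"))
```

In this reading, the figure of 65% of observed strand residues predicted correctly corresponds to `per_state_accuracy(observed, predicted, "E")`, and the overall three-state accuracy corresponds to `q3_accuracy` computed over all residues in the test set.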