J Reeb, E Kloppmann, M Bernhofer, & B Rost (2015). Proteins: Structure, Function, and Bioinformatics, 83(3), 473–484. doi:10.1002/prot.24749 Pubmed PDF
Poster ISMB 2015. TMSEG is available in PredictProtein or as a download.

Experimental structure determination continues to be challenging for membrane proteins. Computational prediction methods are, therefore, needed and widely used to supplement experimental data. Here, we re-examine the state of the art in transmembrane helix prediction based on a non-redundant dataset with 190 high-resolution structures. Analyzing 12 widely used and well-known methods using a stringent performance measure, we largely confirmed the expected high level of performance. All methods performed worse for proteins that could not have been used for development.
A few results stood out. Firstly, all methods predicted proteins in eukaryotes better than those in bacteria. Secondly, methods worked less well for proteins with many transmembrane helices. Thirdly, most methods correctly discriminated between globular water-soluble and transmembrane proteins. However, several older methods often mistook signal peptides for transmembrane helices. Some newer methods have overcome this shortcoming. In our hands, PolyPhobius and MEMSAT-SVM appeared better than other methods.
The dataset of 190 unique alpha-helical transmembrane proteins can be downloaded as an Excel spreadsheet or as a tab-separated file.
The locations of transmembrane helices (TMHs) in our dataset have been annotated using OPM and PDBTM and can be downloaded as a FASTA file (subset of 44 new proteins).
Qok scores for all 12 prediction methods on various sets of transmembrane proteins (TMPs). Qok denotes the percentage of proteins for which all TMHs were correctly predicted (A: TMH endpoints within five or fewer residues of either the OPM or the PDBTM annotation for the whole protein; see Methods). Above the bars are the numbers of proteins in each dataset. Error bars show the sample standard deviation obtained by bootstrapping with 1000 draws, each of half the set size (cf. Methods). B: Qok for the 190 redundancy-reduced TMPs, followed by the 44 new TMPs (not used for development) and the 146 old TMPs (used for development, either the protein itself or a homologous protein). All methods clearly performed worse for more recently determined protein structures. The old-new difference for TopPred2 suggested that a significant fraction of the differences might not be explained by over-training. C: All methods reached higher Qok values for eukaryotes than for bacteria. Note that we excluded the 9 archaeal sequences and the 2 sequences of viral origin. D: Performance declines from bitopic TMPs to those with 2-5 TMHs and to those with more. For D, the number in brackets after the set size denotes the number of TMHs in the respective subset.
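To make the measure in the caption concrete, the following is a minimal Python sketch of a Qok computation with a bootstrap error estimate. It is not the authors' evaluation code. Several details are assumptions made for illustration: helices are given as `(start, end)` residue positions, a protein counts as correct only if the number of predicted helices equals the number of annotated ones and every helix has both endpoints within the tolerance, and bootstrap draws are taken with replacement. The full criteria are described in the paper's Methods and may differ. All function names are hypothetical.

```python
import random
import statistics


def helices_match(predicted, observed, tolerance=5):
    """True if both lists hold the same number of helices and each helix,
    paired in sequence order, has both endpoints within `tolerance` residues.
    (Assumed simplification of the per-protein criterion.)"""
    if len(predicted) != len(observed):
        return False
    return all(
        abs(p_start - o_start) <= tolerance and abs(p_end - o_end) <= tolerance
        for (p_start, p_end), (o_start, o_end) in zip(sorted(predicted), sorted(observed))
    )


def protein_ok(predicted, annotations, tolerance=5):
    """A protein counts as correct if the prediction agrees with at least one
    annotation source (for example OPM or PDBTM)."""
    return any(helices_match(predicted, obs, tolerance) for obs in annotations)


def q_ok(flags):
    """Percentage of proteins flagged as correctly predicted."""
    return 100.0 * sum(flags) / len(flags)


def bootstrap_sd(flags, draws=1000, seed=0):
    """Sample standard deviation of Qok over `draws` resamples, each of half
    the set size. Sampling with replacement is an assumption."""
    rng = random.Random(seed)
    half = max(1, len(flags) // 2)
    scores = [q_ok(rng.choices(flags, k=half)) for _ in range(draws)]
    return statistics.stdev(scores)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    predicted = [(10, 30), (45, 66)]
    opm = [(12, 31), (44, 64)]
    pdbtm = [(3, 25), (44, 64)]
    flags = [protein_ok(predicted, [opm, pdbtm]), False, True, True]
    print(q_ok(flags), bootstrap_sd(flags))
```

Because the score is all-or-nothing per protein, a single helix that is missed or misplaced makes the whole protein count as wrong, which is why the measure is described as stringent.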
Name | Publication |
---|---|
TopPred2 | Claros, M. G., & Von Heijne, G. (1994). TopPred II: an improved software for membrane protein structure predictions. Computer Applications in the Biosciences (CABIOS) |
PHDhtm¹ | Rost, B., Casadio, R., Fariselli, P., & Sander, C. (1995). Transmembrane helices predicted at 95% accuracy. Protein Science |
HMMTOP 2 | Tusnády, G. E., & Simon, I. (2001). The HMMTOP transmembrane topology prediction server. Bioinformatics |
TMHMM 2 | Krogh, A., Larsson, B., von Heijne, G., & Sonnhammer, E. L. (2001). Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of Molecular Biology |
SOSUI | Hirokawa, T., Boon-Chieng, S., & Mitaku, S. (1998). SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics |
Phobius | Käll, L., Krogh, A., & Sonnhammer, E. L. L. (2004). A combined transmembrane topology and signal peptide prediction method. Journal of Molecular Biology |
PolyPhobius | Käll, L., Krogh, A., & Sonnhammer, E. L. L. (2005). An HMM posterior decoder for sequence feature prediction that includes homology information. Bioinformatics |
MEMSAT3 | Jones, D. T. (2007). Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Bioinformatics |
Philius | Reynolds, S. M., Käll, L., Riffle, M. E., Bilmes, J. A., & Noble, W. S. (2008). Transmembrane topology and signal peptide prediction using dynamic Bayesian networks. PLoS Computational Biology |
SCAMPI | Bernsel, A., Viklund, H., Falk, J., Lindahl, E., Von Heijne, G., & Elofsson, A. (2008). Prediction of membrane-protein topology from first principles. Proceedings of the National Academy of Sciences |
SPOCTOPUS | Viklund, H., Bernsel, A., Skwark, M., & Elofsson, A. (2008). SPOCTOPUS: a combined predictor of signal peptides and membrane protein topology. Bioinformatics |
MEMSAT-SVM | Nugent, T., & Jones, D. T. (2009). Transmembrane protein topology prediction using support vector machines. BMC Bioinformatics |

¹ Superseded by TMSEG (unpublished)