
State-of-the-art in membrane protein prediction

Chien Peter Chen 1 & Burkhard Rost 1, 2, *

1 CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA, <chen@cubic.bioc.columbia.edu|rost@columbia.edu>

2 Columbia University Center for Computational Biology and Bioinformatics (C2B2), Russ Berrie Pavilion, 1150 St. Nicholas Avenue, New York, NY 10032, USA

* Corresponding author: rost@columbia.edu, http://cubic.bioc.columbia.edu/ 
Tel: +1-212-305-3773, fax: +1-212-305-7932

 

Running Title: State-of-the-art in membrane prediction

Document statistics: Abstract = 196, Text = 6910 words, 163 references, 1 figure; 2 tables

Journal: Applied Bioinformatics

Submitted: 11/24/01; re-submitted: 01/17/02

 

This article is published in Applied Bioinformatics 2002: 1(1) 21-35
© copyright The Open Polytechnic of New Zealand (2002). OMJ is the only authorised source. All copying of this article including placing on another website requires the written permission of the copyright owner.

 




 


Abstract

Membrane proteins are crucial for many biological functions and have become attractive targets for pharmacological agents. Their importance is reflected by the observation that about 10-30% of all proteins contain membrane-spanning helices. Despite recent successes, high-resolution structures for membrane proteins remain exceptional. The gap between known sequences and known structures calls for solutions through bioinformatics. While many methods predict membrane helices, very few predict membrane strands. The good news is that most methods for helical membrane proteins are available and are more often right than wrong. The best current prediction methods appear to correctly predict all membrane helices for about 50-70% of all proteins and to falsely predict membrane helices for about 3-10% of all globular proteins. The bad news is that developers have seriously over-estimated the accuracy of their methods. In particular, while simple hydrophobicity scales identify many membrane helices, they frequently and incorrectly predict membrane helices in globular proteins. Additionally, all methods tend to confuse signal peptides with membrane helices. Nonetheless, wet-lab biologists can reach into an impressive toolbox for membrane protein predictions. Computational biologists, however, will have to improve their methods considerably before they reach the levels of accuracy they have claimed.

 

 

Key words: genome sequence analysis, protein structure prediction, multiple alignments, transmembrane helices, transmembrane prediction.

 

 

Abbreviations used

ALOM2: hydrophobicity-based prediction of membrane helices using a discriminant function [1]
DAS: dense alignment surface method predicting membrane helices [2]
GPCR: G-protein coupled receptor, a family of proteins with seven transmembrane helices
HMM: Hidden Markov model (statistical algorithm from machine learning)
HMMTOP: Hidden Markov model predicting transmembrane helices [3]
KD: Kyte and Doolittle [4]
KKD: application of discriminant function to the KD hydropathy [1]
MEMSAT: dynamic-programming based prediction of transmembrane helices [5]
META-PP: internet service allowing access to a variety of bioinformatics tools through one single interface [6]
OM: outer membrane
PHDhtm: profile-based neural network prediction of transmembrane helices [7, 8, 9]
PHDpsihtm: PSI-BLAST profile-based neural network prediction of transmembrane helices [7, 8, 9]
PP (PredictProtein): internet server for protein sequence analysis and protein structure prediction [10, 8, 11]
PRED-TMR: propensity optimised hydropathy prediction of membrane helices [12]
PSI-BLAST: position specific iterated database search [13]
SOSUI: hydrophobicity and amphiphilicity based transmembrane helix prediction [14]
SPLIT: transmembrane helix prediction [15]
SRS: Sequence Retrieval System, i.e. a portal to simultaneously access most existing databases [16, 17]
TM: transmembrane
TMAP: alignment-based prediction of transmembrane helices [18]
TMFinder: multiple hydrophobicity-scale-based prediction of membrane helices [19]
TMH: transmembrane helix
TMHMM: Trans-Membrane prediction using Hidden Markov Models [20]
TMpred: membrane prediction based on statistical preferences [21]
TopPred: hydrophobicity-based membrane helix prediction [22]
URL: Uniform Resource Locator, i.e. address of a web site
WW: transmembrane prediction based on the Wimley-White hydrophobicity scale [23]


 


 

Introduction

Helical membrane proteins constitute an important class of proteins. Membrane proteins are crucial for survival. They constitute key components for cell-cell signalling, mediate the transport of ions and solutes across the membrane, and are crucial for recognition of self   [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34] . Furthermore, the pharmaceutical industry preferably targets membrane bound receptors   [35, 36, 37, 38, 39] . A prominent example for the pharmacological importance of membrane proteins is the large super-family of G protein-coupled receptors (GPCRs), which include receptors for hormone, neurotransmitter, growth factor, light and odour-related ligands  [40, 41]. These receptors are of interest to the pharmaceutical industry as they present novel targets for drugs   [42, 43] . In addition, more than 50% of prescription drugs act on GPCRs   [44, 45, 46] . Besides the GPCRs, other important membrane protein families include ion channels, motor proteins, and bio-energetically-related proteins such as those involved in the electron transport system   [47, 48] .

Helical membrane proteins challenge bioinformatics. Despite the great biological and medical importance of membrane proteins, we still have very little experimental information about their 3D structures. Fewer than 1% of the proteins of known structure are membrane proteins. High-resolution structures are so scarce because membrane proteins are not easy to crystallise and are hardly tractable by NMR spectroscopy. Nonetheless, there are a number of recent and promising attempts to tackle membrane proteins by solid state and even by solution NMR [49, 50, 51, 52, 53, 54, 55, 56, 57, 58]. Fortunately, it is relatively easy to identify the location of membrane helices through low-resolution experiments. An expert-curated list of low-resolution experiments maintained by Steffen Möller and colleagues [59] considers information from C-terminal fusion with indicator proteins [60, 61, 62, 63] and from antibody-binding [62, 64, 65, 66, 67]. Nevertheless, the bad news remains that we have experimental information for fewer than 500 helical membrane proteins. We believe that the human genome alone codes for almost 10000 helical membrane proteins [68, 69, 70]. Thus, bioinformatics is challenged to help bridge the gap between the information we want and the information we have.

The lipid bilayer simplifies the prediction problem. Fortunately, the task of predicting aspects of structure for the membrane regions of proteins is simplified by strong environmental constraints on transmembrane proteins: the lipid bilayer of the membrane reduces the degrees of freedom to such an extent that 3D-structure formation becomes almost a 2D problem. However, this constraint does not apply to the other class of membrane proteins, the porin-like proteins that form pores by β-strand barrels [71, 72, 73]. Since there is not much experimental information available on different porin-like membrane proteins, it is difficult to develop prediction methods and to estimate prediction accuracy for this class.

Here, we summarise both the state-of-the-art and, to some extent, the history of attempts in computational biology and bioinformatics to predict a protein's transmembrane regions. We focus on the concepts and on the resulting methods that are available for everyday sequence analysis. We also discuss some of the major problems and practical aspects of these methods. The major problem in the field of membrane protein prediction, however, is the lack of experimental high-resolution data. Consequently, published estimates for prediction accuracy are perhaps overly optimistic. Here, we additionally suggest estimates that are as realistic as possible.




Fig. 1: Two types of membrane proteins.
A: The X-ray structure of the photo-reaction centre (PDB code 7prc) was the first high-resolution structure of an α-helical membrane protein [162]. Represented in red are α-helices, in yellow β-strands (only in the non-membrane regions). The lipid membrane bilayer is crossed by the 11 helices in the middle of the structure. The N-terminus of the H (heavy) chain is marked by an arrow (left, middle); the beginning of that chain is highlighted in cyan (including the only membrane helix of that chain). The topology is defined by the orientation of the helices with respect to the membrane bilayer; here the upper part of the protein is located in the periplasm, the lower part in the cytoplasm. Hence, the membrane helix of the H chain has the topology OUT. B: The X-ray structure of the transmembrane part of the Outer Membrane Protein A (ompA, PDB: 1bxw) is an example of a β-barrel membrane protein [163]. The β-strands are given in yellow, the aromatic residues tryptophan and phenylalanine are in green. The protein contains only half as many (8) β-strands as most porins. Typically, membrane β-strands are amphipathic, i.e. residues i and i+2 are hydrophobic while residues i+1 and i+3 are hydrophilic, since one side of the strands points to the lipid bilayer, the other to the inside of the pore. We also indicated the specific band of aromatic residues that lines the interface between the core transmembrane regions and the exposed loops. The topology of beta-membrane proteins is typically determined by the location of the longest loops. Here, this loop is extracellular.

 

 




 

 

 

Concepts for predicting TM helix location and topology

Hydrophobicity scales provide simple criteria to predict membrane helices. Transmembrane helices (TMH) can be predicted based on the distinctive patterns of hydrophobic (transmembrane) and polar (non-membrane) regions within the sequence. These patterns are as follows: (1) TM helices are predominantly apolar and between 12 and 35 residues long [74]. (2) Globular regions between membrane helices are typically shorter than 60 residues [68, 70]. (3) Most TMH proteins have a specific distribution of the positively charged amino acids arginine and lysine, coined by Gunnar von Heijne as the 'positive-inside-rule' [75, 76]. Connecting 'loop' regions on the inside of the membrane have more positive charges than 'loop' regions on the outside ( Fig. 3 ). (4) Long globular regions (> 60 residues) differ in their composition from those globular regions subject to the 'inside-out-rule'. These simple facts have been at the heart of a variety of prediction methods developed over the last two decades. Methods have improved over time and a great number of ideas have been thrown at the problem. Here, we focus on some of the major methods.

Hydrophobicity scales were introduced 20 years ago. Kyte and Doolittle developed one of the first methods that evaluated the hydrophilicity and hydrophobicity of a protein along the amino acid sequence [4]. They defined a hydropathy scale that associated a hydropathy value with each amino acid. In order to identify membrane regions, they implemented a moving-window approach in which they simply summed the hydrophobicity scale over w adjacent residues in the native sequence [4]. They tested window lengths of w = 9-12 adjacent residues, and found windows of 19 residues to discriminate best between membrane and globular proteins. As for most subsequent methods, Kyte & Doolittle (KD) then had to define some threshold T to label a segment as 'membrane helix': if the sum over the hydrophobicity exceeded T, the segment was predicted to be a membrane helix. In particular, Kyte and Doolittle suggested a threshold of T > 1.6 for the average over 19 residues. Around the same time, Eisenberg and colleagues developed the helical hydrophobic moment as a measure of the amphiphilicity of a helix. This hydrophobic moment differed between transmembrane and globular helices and could thus be exploited to predict transmembrane regions [77].
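The moving-window idea can be illustrated with a minimal Python sketch. It uses the published Kyte-Doolittle hydropathy values, a window of w = 19 residues, and the threshold of 1.6 on the window average mentioned above; the function names and the toy test sequence are illustrative assumptions, not part of the original method description.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_averages(seq, w=19):
    """Mean KD hydropathy for every window of w adjacent residues."""
    values = [KD[aa] for aa in seq]
    if len(values) < w:
        return []
    total = sum(values[:w])
    averages = [total / w]
    # Slide the window by one residue: add the new value, drop the oldest.
    for i in range(w, len(values)):
        total += values[i] - values[i - w]
        averages.append(total / w)
    return averages


def candidate_windows(seq, w=19, threshold=1.6):
    """0-based start positions of windows whose mean exceeds the threshold."""
    return [i for i, avg in enumerate(window_averages(seq, w)) if avg > threshold]


# Hypothetical toy sequence: a 19-residue leucine stretch between polar flanks.
toy = "D" * 10 + "L" * 19 + "K" * 10
hits = candidate_windows(toy)
```

Adjacent windows above the threshold would normally be merged into one predicted segment; that post-processing step is left out here.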

Predictions improve by processing simple hydrophobicity scales. Klein, DeLisi and colleagues combined a discriminant function, similar to the one introduced by Barrantes [78], with the hydrophobic analysis of KD [1]. In particular, they applied a quadratic discriminant function to the KD hydropathy scale and summed over a window of w = 17 residues; proteins with a discriminant value < 0 were classified as integral membrane proteins. Nakai & Kanehisa applied the same concept of filtering the simple scales through a quadratic discriminant function in their method ALOM2 [79]. The rationale of ALOM2 is that it first tentatively evaluates the number of putative membrane helices using a low threshold of 0.5. Then, it refines the predicted number by using a more stringent threshold of -2.0. After the transmembrane regions are predicted, ALOM2 applies a modified positive-inside rule developed by Hartmann, Rapoport and Lodish [80] to predict the protein's topology, which in the realm of membrane proteins refers to the orientation of its N-terminus with respect to the lipid bilayer. Gunnar von Heijne introduced the 'positive-inside rule' reflecting the observation that non-membrane regions inside have more positively charged residues than the regions outside [75]. Hartmann, Rapoport and Lodish [80] altered this rule slightly by omitting the region flanking the first helix from the compilation.
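The basic positive-inside rule can be sketched in a few lines of Python. This is a simplified illustration of the unmodified rule only: it does not implement the Hartmann-Rapoport-Lodish variant or the ALOM2 thresholds, and the function name, the (start, end) helix encoding, and the tie-breaking choice are assumptions made for the example.

```python
def predict_n_terminus(seq, helices):
    """Guess whether the N-terminus is 'in' or 'out' by counting Lys/Arg.

    helices: sorted list of (start, end) pairs, 0-based with end exclusive,
    giving the already predicted membrane helices.
    """
    # Cut the sequence into the non-membrane segments between the helices.
    loops = []
    previous_end = 0
    for start, end in helices:
        loops.append(seq[previous_end:start])
        previous_end = end
    loops.append(seq[previous_end:])

    positives = [loop.count("K") + loop.count("R") for loop in loops]
    # Loops 0, 2, 4, ... lie on the same side of the membrane as the N-terminus.
    same_side = sum(positives[0::2])
    other_side = sum(positives[1::2])
    # Assumption: ties are resolved arbitrarily in favour of 'in'.
    return "in" if same_side >= other_side else "out"


# Hypothetical two-helix protein with positively charged terminal regions.
toy = "KKRK" + "L" * 20 + "DDEE" + "L" * 20 + "RRKK"
orientation = predict_n_terminus(toy, [(4, 24), (28, 48)])
```

Because consecutive loops alternate between the two sides of the bilayer, fixing the side of the N-terminus fixes the topology of the whole protein.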

More refined indices improve predictions. Hydropathy-based methods still appear to be effective in predicting transmembrane segments. One drawback is that such methods fail to discriminate accurately between membrane regions and highly hydrophobic globular segments. The PRED-TMR algorithm uses a standard hydrophobicity analysis with an emphasis on the detection of potential helix ends [12]. Using propensities of amino acid residues at the termini of transmembrane helices collected by the authors, PRED-TMR compiles scores for the termini of each putative segment. Based on the two termini scores, on a hydropathy score of a TMH, and on a length constraint, Pasquier and colleagues developed a scoring function used to find the best prediction. In contrast, Jayasinghe et al. attempted to improve hydropathy analysis by directly improving the hydropathy scales [23]. The commonly used hydrophobicity scales neglect the thermodynamic constraints that α-helices impose on transmembrane stability. Hence, Jayasinghe et al. derived a whole-residue hydropathy scale from the Wimley-White experiments that took into account the backbone constraints. Another new hydrophobicity scale was at the heart of the TMFinder method [19]. The scale (Liu-Deber scale) was based on the HPLC retention time of peptides and on nonpolar phase helicity; the latter measured the propensity of an amino acid to be in an alpha-helical state based on circular dichroism.

Amino acid preferences for membrane- and non-membrane proteins can be used for prediction. Rather than using the observation that hydrophobic residues are abundant in transmembrane helices, we could conceive a more general strategy to infer from known membrane helices which amino acids have the highest preference for that state. Such a simple statistical evaluation was already the base for the first methods predicting secondary structure for globular proteins   [81, 82, 83] . TMpred is one of the methods using such statistical preferences to predict membrane helices taken from an expert-compiled data set of membrane proteins   [21] . TMpred combines several matrices for scoring. Juretic et al. integrated multiple scales for amino acids for the prediction of transmembrane regions in their method SPLIT   [84, 15, 85] . The authors derived amino acid preferences for the 'state' membrane helix from a data set of integral membrane proteins with partially known secondary structure. They also extracted preferences for beta-strand, turn and non-regular secondary structure based on sets of soluble proteins of known structure. The comparison with hydrophobicity plots suggested that the preference profiles were more accurate, exhibited higher resolution, and had less noise. Shorter unstable or movable membrane-helices were often missed by the hydrophobicity analyses in proteins with transport functions. In contrast, they were predicted by the combination of preferences. For instance, the N-terminal TM helices of voltage-gated ion channels and glutamate receptors were correctly identified by SPLIT.

Incorporating more information into methods improves prediction accuracy. A considerably more complex scheme for post-processing hydrophobicity scales was implemented in TopPred [22]. TopPred predicted the complete topology of membrane proteins by using hydrophobicity analysis, automatic generation of possible topologies and ranking of these topologies by the positive-inside rule. First, the method introduced a particular sliding trapezoid window to detect segments of outstanding hydrophobicity using the GES-scale [86]. The two bases of the trapezoid were chosen to be 11 and 21 residues long. The authors used the shape of a trapezoid to combine the favourable noise-reduction of a triangular window [87] with a more physically relevant rectangular window that represents the central non-polar region of the lipid bilayer. Next, TopPred exploited the positive-inside rule. This rule simply states the observation that positively charged residues (Arg and Lys) are more abundant on the inside of membranes [75]. Generally, this fact allows for membrane protein topology prediction. However, TopPred went a step further by dynamically adjusting the thresholds for considering a segment to be a membrane helix such that the difference between the number of positively charged residues on the inside and on the outside became highest. All these refinements implemented in TopPred led to a major improvement of prediction accuracy [22]. SOSUI combined a variety of physico-chemical parameters to detect transmembrane proteins [14]. In particular, the following parameters are used to detect membrane helices: KD hydropathy, amphiphilicity, relative and net charges, and protein length.
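A trapezoid window of the kind described above can be sketched as follows. The two bases (11 and 21 residues) are taken from the text; the linear ramp on the flanks, the normalisation of the weights to sum to one, and the caller-supplied scale are assumptions for illustration and may differ from the actual TopPred implementation.

```python
def trapezoid_weights(core=11, full=21):
    """Weights that are flat over the core and ramp down linearly on both flanks."""
    if (full - core) % 2 != 0 or core > full:
        raise ValueError("full - core must be a non-negative even number")
    flank = (full - core) // 2
    # Assumed ramp: 1/(flank+1), 2/(flank+1), ..., flank/(flank+1).
    ramp = [k / (flank + 1) for k in range(1, flank + 1)]
    weights = ramp + [1.0] * core + ramp[::-1]
    total = sum(weights)
    return [w / total for w in weights]


def trapezoid_profile(seq, scale, core=11, full=21):
    """Weighted mean hydrophobicity for every window of `full` residues."""
    weights = trapezoid_weights(core, full)
    values = [scale[aa] for aa in seq]
    return [
        sum(w * v for w, v in zip(weights, values[i:i + full]))
        for i in range(len(values) - full + 1)
    ]


# Hypothetical two-letter scale, used only to keep the example short.
toy_scale = {"L": 1.0, "D": 0.0}
profile = trapezoid_profile("D" * 21 + "L" * 21 + "D" * 21, toy_scale)
```

Setting core equal to full recovers a plain rectangular window, which makes it easy to compare the two window shapes on the same sequence.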

Increasing the complexity by implementing dynamic programming improves performance. In 1994, MEMSAT [5] implemented statistical tables (log likelihoods) compiled from well-characterised membrane protein data and a dynamic programming algorithm to recognise membrane topology models by expectation maximisation. Residues are classified as being in one of five structural states: Li (inside loop), Lo (outside loop), Hi (inside helix end), Hm (helix middle), and Ho (outside helix end). Helix end caps are defined to span four adjacent residues (one helical turn). Next, the authors extracted the propensity of each amino acid for each of these five states from experimentally well-described membrane proteins. Using these propensities, MEMSAT calculates a score relating a given sequence to a predicted topology and arrangement of membrane helices. The particular feature of MEMSAT is that it finds the optimal score through dynamic programming, i.e. an algorithm also used to find the optimal pairwise sequence alignment [88, 89]. Thus, MEMSAT finds the best out of a great number of possible predictions [5].
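The principle of finding the best-scoring helix arrangement by dynamic programming can be shown with a deliberately reduced sketch. It is not the MEMSAT algorithm: it uses only two states (helix and loop) instead of five, takes per-residue helix scores as given, and the helix length limits, the requirement of at least one loop residue between helices, and all names are assumptions made for illustration.

```python
def best_helices(scores, min_len=17, max_len=25):
    """Choose non-overlapping helices that maximise the summed helix score.

    scores: per-residue scores, positive values favouring the helix state.
    Returns (best total score, list of (start, end) with end exclusive).
    """
    n = len(scores)
    prefix = [0.0]
    for s in scores:
        prefix.append(prefix[-1] + s)

    # best[i]: best score for residues 0..i-1 when residue i-1 is a loop
    # residue (or i == 0). Index n+1 stands for a virtual loop residue after
    # the last real one, so that a helix may end at the C-terminus.
    best = [0.0] * (n + 2)
    back = [None] * (n + 2)
    for i in range(1, n + 2):
        best[i] = best[i - 1]  # residue i-1 is a loop residue, no helix ends here
        for length in range(min_len, max_len + 1):
            start = i - 1 - length
            if start < 0:
                break
            # Helix covers residues start..i-2, residue i-1 is a loop residue.
            candidate = best[start] + prefix[i - 1] - prefix[start]
            if candidate > best[i]:
                best[i] = candidate
                back[i] = length

    # Trace back from the virtual final position to recover the helices.
    helices = []
    i = n + 1
    while i > 0:
        if back[i] is None:
            i -= 1
        else:
            start = i - 1 - back[i]
            helices.append((start, i - 1))
            i = start
    helices.reverse()
    return best[n + 1], helices


# Hypothetical score profile with two clearly favourable 20-residue stretches.
toy_scores = [-1] * 10 + [2] * 20 + [-1] * 10 + [2] * 20 + [-1] * 5
total, segments = best_helices(toy_scores)
```

The same recursion extends to more states (for example helix caps and inside or outside loops) by keeping one table per state, which is closer in spirit to the five-state scheme described above.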

Evolutionary information from protein families raises accuracy further. Until 1996, automatic methods based their predictions of membrane regions on the properties of single protein sequences. From predicting secondary structure for globular proteins, we know that using alignment information improves prediction accuracy significantly [90, 91, 92]. PHDhtm was the first method that used information from protein families for membrane predictions [7, 93, 9]. In the initial version, location and topology of membrane helices were simply predicted by a system of neural networks [7]. PHDhtm was then refined [93, 9] by post-processing the neural network output through a dynamic-programming-like algorithm similar to the one introduced by Jones et al. The combination of various algorithms and multiple alignment information resulted in what is still one of the most accurate prediction methods today. TMAP was another early application of multiple sequence alignments to determine membrane-spanning segments [18]. TMAP was based on propensity values determined for segments of 21 consecutive residues in transmembrane segments (Pm) and for the flanking four-residue caps (ends) of membrane helices (Pe). Residues with high Pm tended to be hydrophobic whereas those with high Pe tended to be basic and polar residues. The compositional difference between the protein segments exposed to the two surfaces of a membrane was determined for twelve important residues. Ratios were calculated for Asn, Asp, Gly, Phe, Pro, Trp, Tyr, and Val (mostly found at the outside of membranes) and for Ala, Arg, Cys, and Lys (mostly inside). The consensus over these twelve residues was used to predict topology. Multiple alignments improve prediction accuracy. However, for 20-30% of all proteins, there are no homologues in current databases [70]. In response to this situation, the so-called dense alignment surface (DAS) method was developed [2]. DAS is based on the RreM scoring matrix originally introduced to improve alignments for G-protein coupled receptors. It compares low-stringency dot-plots of the query protein against the background representing the universe of non-homologous membrane proteins using the RreM scoring matrix.

Grammatical rules reflect global aspects of membrane regions. The lipid bilayer constrains the structure of the membrane-passing regions of proteins in many ways. TMHMM pioneered building models of predicted membrane proteins considering a variety of such constraints in one consistent methodology [20, 69]. A similar concept was implemented in HMMTOP [3, 94]. TMHMM and HMMTOP realise their models through hidden Markov models (HMMs). TMHMM implements a cyclic model with seven states for transmembrane-helix (TMH) core, TMH-caps on the N- and C-terminal sides, non-membrane regions on the cytoplasmic side, two non-membrane regions on the non-cytoplasmic side, and a globular domain state in the middle of each non-membrane region. The two non-membrane regions on the non-cytoplasmic side model short and long loops respectively, which correspond to two different membrane insertion mechanisms. In contrast, HMMTOP uses a hidden Markov model distinguishing the following five structural states: inside non-membrane region, inside TMH-cap, membrane helix, outside TMH-cap, and outside non-membrane region. Conceptually, this model is similar to the one used in MEMSAT [5]. It differs in the placement and interpretation of TMH-caps, which Tusnady et al. interpret as not being in the membrane [3].

Helical caps can be predicted by molecular dynamics. Molecular dynamics methods attempt to represent protein conformations and have been used together with energy minimisation to simulate protein folding [95, 96, 97, 98, 99, 100, 101]. In practice, both the enormous complexity of the free parameters and the inaccuracy in experimentally determining the fundamental constants seriously hamper the success of such methods. However, they sometimes yield accurate predictions for short peptides such as membrane helices. Molecular dynamics simulations in an explicit lipid and water environment have been used to define the precise ends of TM helices [102, 103]. Molecular dynamics typically generates many possible models rather than unambiguously pointing to one single model. Briggs and colleagues present a new approach to select among candidate models [104]. They assume that neutral amino acid substitutions do not affect the stability of a native structure but may destabilise the non-native structures. Applying this assumption to the α-helical transmembrane domains of two homodimers (human glycophorin A and human CD3-zeta), they in fact identify a single model by their simulation.

 

 

Concepts for predicting TM beta-sheet proteins

There is a structural variety of beta-membrane proteins. β-barrel membrane proteins are found in the outer membranes (OMs) of Gram-negative bacteria and likely in the OMs of mitochondria and chloroplasts. In prokaryotes, they mediate non-specific, passive transport of ions and small molecules or can selectively pass molecules such as maltose and sucrose [105, 106, 107, 108, 109]. In eukaryotic organelles, β-barrel membrane proteins have been suggested to be involved in voltage-dependent anion channels [110]. This wide range of functions is associated with a wide range of structural variants: β-barrel membrane proteins with barrel sizes from small 8-stranded to large 22-stranded β-barrels and with different topologies [111]. Of the β-barrel membrane proteins, porins are the best studied. Many porin barrels are trimers and contain 16 anti-parallel β-strands; maltoporin from E. coli contains 18 strands [107]. A band of hydrophobic residues encircles the trimer [112, 113, 107] ( Fig. 1 ). Porins also contain a central channel that is partially blocked by a loop that folds inwardly and is attached to the inner side of the barrel wall [109]. This arrangement forms an "eyelet" which defines the size of solute molecule that can traverse the channel. Currently, high-resolution structures are only available for bacterial OM proteins [114].

Membrane strands are difficult to predict. Unlike for α-helical membrane proteins, there are no simple low-resolution experiments that yield large amounts of data for β-barrel membrane proteins. This has constrained the ability to develop prediction methods. Many β-strands contain alternating hydrophobic and hydrophilic side-chains. However, this simple pattern usually does not suffice to identify membrane strands [111]. Methods that implement physico-chemical properties were applied successfully only in the context of experimental information [115, 116, 117]. All early attempts to predict membrane strands employed the amphipathicity and hydrophobicity of β-strands. Paul and Rosenbusch attempted a minimal approach to predict and identify segments causing polypeptides to reverse their direction (turn identification), but they avoided hydrophobicity parameters [115]. In contrast, Jahnig suggested that a generalisation of hydrophobicity analysis was sufficient to predict membrane-spanning amphiphilic α-helices and β-strands [118]. Unfortunately, membrane strands have no long stretch of consecutive hydrophobic residues. In fact, the overall hydrophobicity for β-barrel membrane proteins is similar to that of soluble proteins. Welte and colleagues compared the hydrophilicity profiles and sequences of porin from Rhodobacter capsulatus with those of OmpF and PhoE from Escherichia coli. They determined a set of specific insertions and deletions in the alignments of these proteins, and inferred that OmpF and PhoE have similar structures in their membrane-spanning regions. Their experimental work verified this prediction [116]. Cowan and colleagues [119] suggested using the mean hydrophobicity of one side of a putative β-strand by averaging over hydrophobic moments [120] of every second residue within a sliding window [121, 117]. To improve the signal-to-noise ratio, they accounted for the band of aromatic residues in flanking positions of the β-strands.
Another method that was considered for predicting β-membrane spanning regions was a rule-based approach. Gromiha and colleagues combined amino acid preferences for β-strands with the surrounding hydrophobicity of the respective residues to predict β-strands [122, 123]. With their method they reproduced about 82% of the residues in structurally known membrane regions.
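The idea of scoring only one face of a putative membrane strand can be sketched as follows. This is a simplified illustration that averages a plain hydrophobicity value over every second residue in a sliding window; it does not use hydrophobic moments or the aromatic-band correction mentioned above, and the window length of 10, the function name, and the caller-supplied scale are assumptions.

```python
def one_sided_hydrophobicity(seq, scale, window=10):
    """For each window, return the higher mean over the two alternating faces."""
    if window < 2:
        raise ValueError("window must cover at least two residues")
    values = [scale[aa] for aa in seq]
    profile = []
    for i in range(len(values) - window + 1):
        segment = values[i:i + window]
        face_a = segment[0::2]  # residues i, i+2, i+4, ...
        face_b = segment[1::2]  # residues i+1, i+3, i+5, ...
        profile.append(max(sum(face_a) / len(face_a), sum(face_b) / len(face_b)))
    return profile


# Hypothetical two-letter scale using the Kyte-Doolittle values for Val and Ser.
toy_scale = {"V": 4.2, "S": -0.8}
alternating = one_sided_hydrophobicity("VSVSVSVSVS", toy_scale)
```

An alternating strand scores as highly as a fully hydrophobic one on its lipid-facing side, although its overall mean hydrophobicity is much lower, which is why a whole-window average tends to miss membrane strands.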

Non-linear statistics enable the prediction of membrane beta-strands. Diederichs and colleagues proposed using a neural network to predict the topology of the bacterial OM β-strand proteins and to locate residues along the axes of the pores [124]. The neural network predicts the z-coordinate of C-alpha atoms in a coordinate frame with the outer membrane in the xy-plane, such that low z-values indicate periplasmic turns, medium z-values indicate transmembrane β-strands, and high z-values indicate extracellular loops. Most recently, Jacoboni, Fariselli, Casadio and colleagues applied a method combining neural networks and dynamic programming to predict the location of membrane strands [125]. The networks used alignment information as input and predicted whether or not a particular residue is part of a membrane strand. In the second step, the method simply finds the optimal path through the network prediction, much like the methods applied to predict membrane helical proteins [5, 7, 93]. Finally, the topology is assigned based on the location of the longest loop, which is taken to be exterior. The authors attempted to cross-validate their beta-membrane prediction method and estimated that their system correctly predicts about 93% of all known membrane strands.

 

 

Practical Aspects



Availability

Most methods described are available through public servers. A list of URLs and contact addresses is given in Table 1. Most programs (except for ALOM2, Eisenberg, KD, KKD, PRED-TMR, TMAP, TMpred, and WW) are also available through META-PP, which provides a single interface to simultaneously access many high-quality servers [6]. This concept of accessing many servers through one was pioneered by the BCM-Launcher [126], which supposedly accesses the largest number of different methods. Other combinations are given by NPSA [127], META-Poland [128], and ProSAL [129]. In contrast to all others, META-PP attempts to (i) return as few results as possible by filtering out technical messages and to (ii) combine only high-quality methods. A generalisation of the 'common interface' idea is implemented in the sequence retrieval system SRS [16, 17], which enables simultaneous access to most existing databases. SRS is gradually beginning to incorporate direct access to prediction methods as well.



Table 1: Availability of prediction methods

Method | Server | Program (contact)
Helical membrane proteins
ALOM | psort.nibb.ac.jp/form.html | Kenta Nakai knakai@ims.u-tokyo.ac.jp
DAS | www.sbc.su.se/~miklos/DAS | miklos@bip.bham.ac.uk
HMMTOP | www.enzim.hu/hmmtop | Gábor E. Tusnády tusi@enzim.hu
MEMSAT | insulin.brunel.ac.uk/psipred | David Jones d.jones@cs.ucl.ac.uk
KD | fasta.bioch.virginia.edu/fasta/grease.htm | William Pearson wrp@virginia.edu
PHDhtm | cubic.bioc.columbia.edu/predictprotein | Burkhard Rost rost@columbia.edu
SOSUI | sosui.proteome.bio.tuat.ac.jp/ | Mitaku Group sosui@proteome.bio.tuat.ac.jp
SPLIT | www.mbb.ki.se/tmap/index.html | Davor Juretic juretic@mapmf.pmfst.hr
TMAP | www.mbb.ki.se/tmap/index.html | Bengt Persson Bengt.Persson@ibp.vxu.se
TMHMM | www.cbs.dtu.dk/services/TMHMM-1.0 | Anders Krogh krogh@cbs.dtu.dk
TMpred | www.ch.embnet.org/software/ |
TopPred2 | www.sbc.su.se/~erikw/TopPred22 | Gunnar von Heijne gunnar@dbb.su.se
WW | blanco.biomol.uci.edu/mpex/ | Stephen White blanco@helium.biomol.uci.edu
Beta-sheet membrane proteins
Beta-strand predictor | www.biocomp.unibo.it (upon request) |




 

 



Prediction accuracy

Performance of prediction methods has been over-estimated significantly! For all the methods described in this review, high levels of prediction accuracy have been reported. Frequently, authors were daring enough to claim that their methods correctly predicted more than 90% of all membrane helices. We cannot estimate the accuracy of existing methods since they have all been developed using the known membrane proteins. However, we can estimate an upper limit for prediction accuracy. This limit suggests that developers have over-rated their methods by 15-50% [74]. How could this have happened? There are a variety of reasons. (1) We do not have enough high-resolution structures to allow a statistically significant analysis [74]. With this bottleneck, training/development sets and test sets may share identical or homologous members. (2) To get around this problem, developers include low-resolution experimental data and structures in their data sets. This practice assumes that low-resolution experiments (e.g. gene fusion) are sufficiently similar to high-resolution structures (crystallography). Unfortunately, this is not the case. In fact, low-resolution experiments differ from high-resolution experiments almost as much as prediction methods do [74]. Hence, low-resolution experiments are not sufficient to evaluate prediction accuracy. (3) All methods optimise some parameters. Since there are so few high-resolution structures, all methods use as many of the known ones as possible. However, methods perform much better on the proteins on which they were developed than on new proteins, and this was overlooked in a recent analysis of prediction methods [130]. (4) Methods using evolutionary information failed due to the surprising fact that membrane helices are not entirely conserved across species. This observation is surprising since it implies either that these proteins do not perform similar cellular functions (e.g. G-protein coupled receptors) or that the function can actually be realised with a different number of membrane regions in some cases. (5) Finally, published levels of prediction accuracy can often not be compared appropriately between methods since they are frequently based on different measures for prediction accuracy and on different data sets. The latter prompted Möller, Apweiler, and colleagues to collect a set of well-characterised integral membrane proteins [59]. Each protein has been assigned a reliability index depending on the available structural and biochemical data. Currently, from the total set of 320 proteins in the data set, there are 33 membrane proteins with known structures, 24 with biochemical characterisation, and 142 with partial biochemical evidence. The data set can be accessed via ftp://ftp.ebi.ac.uk/databases/testsets/transmembrane.

Most methods get the number of helices right for most membrane proteins. In terms of per-residue scores, the best current methods correctly predict more than 65% of the observed TMH residues ( Table 2 ). All methods based on advanced algorithms tend to under-predict transmembrane helices ( Table 2 : %obs < %prd, i.e. the percentage of observed TMH residues that are predicted is lower than the percentage of predicted TMH residues that are correct). Thus, about 86% of the TMH residues predicted by the best methods in this category, PHDhtm and DAS, are correctly predicted. If we consider a helix to be predicted correctly when the predicted and the observed helix differ by less than three residues, the best current methods predict all membrane helices correctly for less than 75% of the proteins ( Table 2 ). However, the topology is predicted correctly for only about half of all proteins ( Table 2 ). The only exception is HMMTOP2 (Table 2); however, all proteins tested here were used to train HMMTOP2. Hence, the level of 61% accuracy in topology prediction may be significantly over-estimated. Although the results summarised in Table 2 are similar to those recently compiled on a non-unique set of low- and high-resolution structures [130], most estimates still constitute over-estimates, since only a few of the methods shown (DAS, PHDhtm, TopPred2) did NOT use most of the proteins to optimise prediction accuracy.



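Table 2: Accuracy of popular prediction methods.

Columns 2-5 give per-segment accuracy, columns 6-10 per-residue accuracy; all values are percentages.

| Method | Qok | Observed helices correct | Predicted helices correct | TOPO | Q2 | Observed TMH residues correct | Predicted TMH residues correct | Observed non-TMH residues correct | Predicted non-TMH residues correct |
|---|---|---|---|---|---|---|---|---|---|
| ERROR | ± 9 | ± 7 | ± 7 | ± 9 | ± 3 | ± 6 | ± 7 | ± 4 | ± 4 |
| DAS | 79 | 99 | 96 |  | 72 | 48 | 94 | 97 | 62 |
| HMMTOP2 | 83 | 99 | 99 | 61 | 79 | 70 | 89 | 88 | 71 |
| PHDhtm07 | 61 | 86 | 83 | 37 | 80 | 72 | 87 | 82 | 74 |
| PHDhtm08 | 56 | 80 | 77 | 35 | 80 | 72 | 87 | 82 | 74 |
| PHDpsihtm07 | 64 | 95 | 94 | - | 80 | 76 | 83 | 86 | 80 |
| PRED-TMR | 61 | 84 | 90 |  | 76 | 58 | 85 | 94 | 66 |
| SOSUI | 71 | 88 | 86 |  | 75 | 66 | 74 | 80 | 69 |
| TMHMM1 | 71 | 90 | 90 | 45 | 80 | 68 | 81 | 89 | 72 |
| TopPred2 | 75 | 90 | 90 | 54 | 77 | 64 | 83 | 90 | 69 |
| WW | 54 | 95 | 91 |  | 71 | 71 | 72 | 67 | 67 |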

Abbreviations used:

• Dataset: Sequence-unique subset of 36 high-resolution membrane helical proteins from PDB [164]. Note: this is the largest subset of all 105 high-resolution membrane chains which fulfils the condition that no pair in the set has significant sequence similarity as defined in [159].
• Methods: see abbreviations at the beginning of the article.
• Per-segment accuracy: Qok gives the percentage of proteins for which all TM helices are predicted correctly (allowed deviation of up to 3 residues); this is followed by the percentage of all observed helices that are correctly predicted and the percentage of all predicted helices that are correctly predicted; TOPO gives the percentage of proteins for which the topology (orientation of helices) is correctly predicted (note: empty for methods that do not predict topology).
• Per-residue accuracy: Q2 gives the percentage of correctly predicted residues in two states (membrane helix / non-membrane helix); this is followed by the percentage of all observed TMH residues that are correctly predicted, of all predicted TMH residues that are correctly predicted, of all observed non-TMH residues that are correctly predicted, and of all predicted non-TMH residues that are correctly predicted.
• ERROR: the estimates for per-segment accuracy resulted from a bootstrap experiment with M = 100 and K = 18; the estimates for per-residue accuracy were obtained from standard deviations over Gaussian distributions for the respective score.
• Numbers in italics: two standard deviations below the numerically highest value in each column (set in bold letters).
• Note of caution: all methods are tested on the same set of proteins. However, the numbers are NOT from a cross-validation experiment, i.e. some methods may have used some of the proteins for training. Generally, newer methods are more likely to be over-estimated than older ones. In particular, HMMTOP2, TMHMM1, and WW have been developed using ALL the proteins listed here.
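
The scores defined above can be illustrated with a short sketch. It is not the evaluation code behind Table 2: the string encoding ('H' for a membrane-helix residue), the function names, and in particular the rule that a helix counts as correct when both of its ends lie within 3 residues of the observed ends are assumptions made here for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Helix = Tuple[int, int]  # (first residue, last residue), inclusive


def per_residue_scores(observed: str, predicted: str) -> Dict[str, float]:
    """Two-state per-residue scores; 'H' marks a membrane-helix residue."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == "H" and p == "H")  # helix in both
    tn = sum(1 for o, p in pairs if o != "H" and p != "H")  # non-helix in both
    n = len(pairs)
    n_obs = observed.count("H")
    n_prd = predicted.count("H")

    def pct(num: int, den: int) -> float:
        return 100.0 * num / den if den else float("nan")

    return {
        "Q2": pct(tp + tn, n),            # residues correct in two states
        "obs_tmh": pct(tp, n_obs),        # observed TMH residues predicted
        "prd_tmh": pct(tp, n_prd),        # predicted TMH residues correct
        "obs_non": pct(tn, n - n_obs),    # observed non-TMH residues predicted
        "prd_non": pct(tn, n - n_prd),    # predicted non-TMH residues correct
    }


def all_helices_correct(observed: Sequence[Helix],
                        predicted: Sequence[Helix],
                        max_shift: int = 3) -> bool:
    """Assumed criterion: same helix count, both ends within max_shift."""
    if len(observed) != len(predicted):
        return False
    return all(
        abs(o_start - p_start) <= max_shift and abs(o_end - p_end) <= max_shift
        for (o_start, o_end), (p_start, p_end)
        in zip(sorted(observed), sorted(predicted))
    )


def q_ok(proteins: List[Tuple[Sequence[Helix], Sequence[Helix]]]) -> float:
    """Percentage of proteins for which all helices are predicted correctly."""
    hits = sum(all_helices_correct(obs, prd) for obs, prd in proteins)
    return 100.0 * hits / len(proteins)
```

Under-prediction shows up in this sketch as a low `obs_tmh` combined with a high `prd_tmh`, since few residues are predicted but most of those are right.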



 

Simple hydrophobicity scales are less accurate than advanced methods. A surprising, recently published result suggested that simple hydrophobicity scales predict membrane helices almost as accurately as the most advanced current prediction methods [130]. We tested 20 different hydrophobicity scales on various data sets and could not confirm this optimism [74]. Rather, the Wimley-White scale (WW), given as an example in Table 2, appeared to be one of the best simple hydrophobicity scales, although it predicts all membrane helices correctly for only 54% of the proteins tested. In fact, most hydrophobicity scales locate all helices without over-prediction for less than 40% of the proteins [74].
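
To make concrete what a "simple hydrophobicity scale" prediction involves, the following minimal sketch averages the Kyte-Doolittle hydropathy values [4] over a sliding window and flags a sequence when any window exceeds a cut-off. The window of 19 residues and the cut-off of 1.6 follow the commonly quoted Kyte-Doolittle recommendation; the function names are hypothetical, and this is not the protocol used to benchmark the 20 scales discussed above.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence.

    Raises KeyError for residues outside the 20 standard amino acids.
    Returns an empty list if the sequence is shorter than the window.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def has_candidate_membrane_helix(sequence, window=19, threshold=1.6):
    """True if any window average exceeds the threshold."""
    return any(v > threshold for v in hydropathy_profile(sequence, window))
```

Because the decision rests on average hydrophobicity alone, any sufficiently hydrophobic stretch triggers a prediction, which is one way to see why such scales over-predict on globular proteins and signal peptides.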

All methods confuse membrane helices with signal peptides. Signal peptides that are cleaved off secreted proteins usually contain stretches of hydrophobic residues resembling membrane helices [131, 132, 133, 134]. Hence, most methods confuse signal peptides with membrane helices. The best separation is achieved by ALOM2, a method optimised to sort proteins into classes of sub-cellular localisation [79, 135]. The most accurate specialists for membrane prediction (TMHMM and PHDhtm) appear to falsely predict membrane helices for 30-40% of all the signal peptides we tested [74]. The Wolfenden scale for hydrophobicity is surprisingly accurate in rejecting signal peptides [136]. All other hydrophobicity scales predict more than 90% of the signal peptides as membrane helices [74].

Many methods predict membrane helices in globular proteins. Surprisingly, the ability of most methods to distinguish between globular and membrane proteins has also been over-estimated significantly. Particularly poor is the distinction by hydrophobicity-based methods, which have reached levels of nearly 100% false positives [74]. In fact, the only scales we tested that incorrectly detected membrane helices in less than 80% of all globular proteins [74] were: Wolfenden = 2%, WW = 32%, and Eisenberg scale = 66%. SOSUI, TMHMM1, and PHDhtm currently distinguish best between membrane and non-membrane proteins. These three predict membrane helices in less than 2% of the globular proteins. Similar results were reported on globular proteins taken from SWISS-PROT [130].

 



Genome analysis

Despite the over-estimated performance, predictions of transmembrane helices are valuable tools to quickly scan the proteomes of entirely sequenced organisms for membrane proteins. As stated above, hydrophobicity-based methods mostly fail to distinguish membrane and globular proteins [74]. Nevertheless, the averages of helical membrane proteins published for entire genomes are surprisingly similar between different authors [137, 9, 138, 139, 140, 68, 70]. Apparently, about 10-30% of all proteins contain membrane helices. One crucial difference between the results from different groups is that more cautious estimates do not find a statistically significant difference in the percentages of TMH proteins between the three kingdoms: eukaryotes, prokaryotes and archaea [141]. Thus, the overall content of helical membrane proteins appears not to correlate with the postulated complexity of an organism (eukaryotes more complex than prokaryotes, prokaryotes more complex than archaea). However, eukaryotes have significantly more proteins with over 10 membrane helices than all other species. Furthermore, the three kingdoms also differ in the types of membrane proteins that are most abundant. For example, eukaryotes have more 7TM proteins (receptors), while prokaryotes have more 6- and 12TM proteins (ABC transporters) [68, 70].

 

 

Emerging and future developments

Membrane-helix predictions can be improved by averaging over many methods. The prediction of secondary structure for globular proteins can be improved by combining many prediction methods   [92, 142] . Applying a similar average, Promponas and colleagues developed their method CoPreTHi, a Web-based application that uses the results from DAS, ISREC-SAPS, PHDhtm, PRED-TMR, SOSUI, TMpred and TopPred2   [143] . CoPreTHi combines the results into a joint prediction histogram; residues are predicted as transmembrane if they are identified as such by at least three methods. Nilsson and colleagues   [144] explored consensus predictions for membrane protein topology to derive a reliability for the prediction. In particular, they used five methods (TMHMM, HMMTOP, MEMSAT, TopPred2, and PHDhtm) to evaluate a test set of 60 Escherichia coli inner membrane proteins with experimentally determined topologies. They found that prediction performance varies strongly with the number of methods that agree, and that the topology of nearly half of all inner membrane proteins can be predicted with high reliability (>90% correct predictions) by a simple majority vote. When only two methods agree on topology, none of the topologies were found to be correct.
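
The two consensus ideas described above, a per-residue vote and a majority vote on topology, can be sketched as follows. This is an illustration under stated assumptions rather than the CoPreTHi or Nilsson et al. implementation: the 'H'/'-' string encoding, the topology labels, and the function names are hypothetical; only the "at least three methods" threshold and the simple majority rule are taken from the text.

```python
from collections import Counter


def consensus_residues(predictions, min_votes=3):
    """Per-residue consensus over several methods.

    predictions: equal-length strings, one per method, 'H' = transmembrane.
    A residue is called transmembrane if at least min_votes methods agree.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    return "".join(
        "H" if sum(p[i] == "H" for p in predictions) >= min_votes else "-"
        for i in range(length)
    )


def majority_topology(topologies):
    """Return (topology, votes) when a strict majority of methods agree.

    Returns (None, votes_of_most_common_label) when no strict majority exists.
    The vote count can serve as a crude reliability index.
    """
    label, votes = Counter(topologies).most_common(1)[0]
    if votes > len(topologies) / 2:
        return label, votes
    return None, votes
```

For example, `majority_topology(["in", "in", "in", "out", "out"])` returns `("in", 3)`; here the labels stand for the predicted location of the N-terminus, which is an encoding chosen only for this sketch.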

Identifying amphiphilic α-helices may improve predictions. A number of α-helix forming peptides have been reported to promote membrane fusion and other biological events related to the disruption of the hydrophobic/hydrophilic interface induced by the hydrophobicity gradient along the central helical axis. This hydrophobicity gradient may facilitate the penetration of a membrane and may thus destabilise the packing of the lipids in the membrane bilayer and/or of the protein/water interface. This could then disrupt the interface and could promote related biological events [145, 146, 147, 148, 149, 150]. To facilitate more detailed descriptions of amphiphilic α-helices, quantitative methods have been developed that measure the overall amphiphilicity of helices. Examples are the 'Depth Weighted Insertion Hydrophobicity' (DWIH) method [151] and the commonly used hydrophobic moment introduced by Eisenberg and colleagues [77, 120]. Harris et al. [152] improved the identification of obliquely orientated α-helices through a hydrophobic moment plot. In particular, they found a linear association between the mean hydrophobic moment <μH> and the corresponding mean hydrophobicity <H0>. The association was described by the least squares regression line: <μH> = 0.508 - 0.422 <H0>. Hence, helices that fall along this line are putative obliquely orientated α-helices. The results suggested that obliquely orientated α-helices may possess a characteristic balance between the amphiphilicity and the hydrophobicity of their structures [152].
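
As a rough illustration of the quantities involved, the sketch below computes the mean hydrophobicity and the mean hydrophobic moment of a helical segment in the usual Eisenberg form (a vector sum of residue hydrophobicities with 100 degrees between successive residues) and reports how far the segment lies from the regression line quoted above. The hydrophobicity scale is supplied by the caller; the function names and the idea of using the residual as a screening value are assumptions for illustration, not the procedure of Harris et al.

```python
import math


def mean_hydrophobicity(segment, scale):
    """Average per-residue hydrophobicity of the segment."""
    return sum(scale[residue] for residue in segment) / len(segment)


def mean_hydrophobic_moment(segment, scale, angle_deg=100.0):
    """Mean hydrophobic moment: length of the vector sum of residue
    hydrophobicities, each rotated by angle_deg per residue, divided by
    the number of residues. 100 degrees corresponds to an alpha-helix."""
    delta = math.radians(angle_deg)
    sin_sum = sum(scale[r] * math.sin(i * delta) for i, r in enumerate(segment))
    cos_sum = sum(scale[r] * math.cos(i * delta) for i, r in enumerate(segment))
    return math.hypot(sin_sum, cos_sum) / len(segment)


def residual_from_oblique_line(segment, scale, intercept=0.508, slope=-0.422):
    """Observed mean moment minus the value expected from the regression
    line <muH> = 0.508 - 0.422 <H0>; values near zero lie close to the line."""
    expected = intercept + slope * mean_hydrophobicity(segment, scale)
    return mean_hydrophobic_moment(segment, scale) - expected
```

The regression coefficients only make sense with the hydrophobicity scale for which they were fitted, so the choice of `scale` is left explicit rather than hard-coded; how small a residual must be to call a helix "oblique" is likewise not specified here.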

Helical-membrane and signal peptide predictions have to be combined explicitly. One of the problems with some of the current methods is that they falsely predict signal peptides as transmembrane helices. The best signal peptide identification tool appears to be SignalP [132, 133, 134]. Trivially, this method can be incorporated into a post-prediction filter to remove predicted helices in the signal peptide region. Two approaches have been developed that work in this direction. (1) PSORT [135] uses a variety of predictions and sequence motifs to group proteins according to their sub-cellular localisation, thereby implicitly combining membrane predictions and signal peptide predictions. (2) HMMTOP and TMHMM implicitly use known signal peptides to refine their predictions. However, a more thorough combination is still missing.
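
A post-prediction filter of the kind mentioned above could look like the following sketch. It does not call SignalP or any other tool: the end of the predicted signal peptide is assumed to be supplied by the caller, and both the function name and the rule of discarding a helix when at least half of its residues fall inside the signal peptide are hypothetical choices made for illustration.

```python
def drop_signal_peptide_helices(helices, signal_end, min_fraction=0.5):
    """Remove predicted helices that lie mostly inside the signal peptide.

    helices:      list of (start, end) pairs, 1-based and inclusive.
    signal_end:   last residue of the predicted signal peptide, or None
                  if no signal peptide was predicted.
    min_fraction: fraction of a helix that must overlap residues
                  1..signal_end for the helix to be discarded (assumed rule).
    """
    if signal_end is None:
        return list(helices)
    kept = []
    for start, end in helices:
        length = end - start + 1
        inside = max(0, min(end, signal_end) - start + 1)
        if inside / length < min_fraction:
            kept.append((start, end))
    return kept
```

With a signal peptide ending at residue 24, `drop_signal_peptide_helices([(5, 22), (60, 80)], 24)` keeps only `(60, 80)`. Such a filter changes the helix count and may therefore also change the predicted topology, which is one reason an explicit joint model would be preferable.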

There are databases for particular families of membrane proteins and sequence motifs. Databases of protein signatures, i.e. relatively short sequence motifs, are becoming increasingly valuable diagnostic resources. While PROSITE [153] annotates single motifs that have been unravelled experimentally, PRINTS encodes groups of motifs in the form of fingerprints [45]. For instance, receptor subtype fingerprints comprise different parts of the terminal, loop, and TM regions of G-protein-coupled receptors (GPCRs). Databases such as these can certainly be incorporated into membrane protein prediction methods to help identify novel receptors [154]. The strong interest in GPCRs has also led to specialised bioinformatics tools that identify GPCRs. Kim et al. [155] presented an algorithm, dubbed quasi-periodic feature classifier (QFC), that characterises the physico-chemical properties of membrane proteins with multiple helices. They apply a non-parametric linear discriminant function to their variables describing the 'feature space' and thus separate GPCRs from non-GPCRs. The expected advantage of this approach is that it may find more remotely similar homologues than methods purely based on sequence similarity. Unfortunately, a thorough cross-validation of the method that would underpin this hope is still missing.

Membrane-specific substitution matrices improve database searches. Database searches are based on alignment methods that need to score the match of amino acid X in protein A with amino acid Y in protein B. A variety of substitution matrices are used for this purpose [156, 157]. All these substitution matrices were developed based on data sets of globular proteins. It is then not surprising that these matrices are not optimal for aligning membrane regions. Ng, Henikoff and Henikoff have recently addressed this problem by developing the membrane-helix specific substitution matrix PHAT [158]. They demonstrated that this matrix aligned membrane proteins more accurately than globular matrices. The PHAT matrix series used target frequencies from PHDhtm matrices (i.e. from transmembrane regions) and background frequencies from the Persson-Argos matrix (i.e. from hydrophobic regions) with corresponding relative entropy. Obviously, the necessary next step is to implement the following cycle: (1) predict membrane helices based on standard alignments, (2) use PHAT for the predicted membrane region to re-align, (3) use the PHAT alignment to refine the prediction, and (4) possibly repeat steps 2-3. Such a refined search may allow the automatic detection of distant similarities in the twilight zone [159] that otherwise remain hidden until the experimental structure is available.
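
For readers unfamiliar with how target and background frequencies combine into a substitution matrix, the generic log-odds form is sketched below. The symbols are assumptions introduced here: q_ij is the target frequency of aligning residues i and j, e_ij the frequency expected from the background, and λ a scaling constant. Whether PHAT normalises its frequencies in exactly this way is not stated in the text above.

```latex
s_{ij} = \frac{1}{\lambda}\,\ln\frac{q_{ij}}{e_{ij}},
\qquad
H = \sum_{i,j} q_{ij}\, s_{ij}
```

Here H is the relative entropy of the matrix, i.e. the average score per aligned residue pair under the target distribution; matrices built for different regions are usually compared at matching relative entropy.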

 

 

Conclusions

Optimist: membrane predictions are relatively accurate and useful. Overall, prediction methods are more accurate and more useful for membrane proteins than they are for globular proteins. For helical membrane proteins, the best current methods appear to correctly predict all membrane helices for more than 60% of all proteins. Furthermore, all advanced methods that are not based solely on hydrophobicity incorrectly detect membrane helices in less than 10% of all globular proteins [130, 74]. In contrast, most methods based only on hydrophobicity go wrong for more than 80% of all globular proteins [74]. Even the best current methods often confuse signal peptides and membrane helices [74]. Nevertheless, most often the best methods correctly reject signal peptides [74]. Most prediction errors constitute the over- or under-prediction of a single membrane helix. While this has an important impact on functionally classifying the protein, the good news is again that most often the good methods correctly predict the number of membrane helices, i.e. they may help in providing a first clue about aspects of function in the context of genome analysis. Recently, a number of tools have addressed the problem of predicting beta-membrane proteins. The estimated levels of prediction accuracy are promising. Unfortunately, there is no accurate method yet that detects beta-membrane proteins in the context of entire genome searches.

Pessimist: all methods have been over-estimated significantly. A number of recently determined high-resolution structures of membrane proteins revealed that the accuracy of low-resolution experiments may have been over-estimated. However, the accuracy of prediction methods was over-estimated much more seriously [74]. Particular problems for prediction methods result from the following observations [74]: (1) many membrane helices span more than 30 residues, and (2) membrane helices are not as well conserved as they appeared to be in the much smaller sequence databases of a decade ago. A seemingly 'trivial' flaw of many estimates published by various groups was that they compared results based on different data sets and different scores measuring accuracy. However, the most important problem may have been that developers were not careful enough in avoiding over-fitting to the few experimentally known proteins. In fact, this reality strongly constrains the estimates of accuracy provided in this analysis: only the methods published before 1998 (DAS, PHD, and TopPred2) did not use most of the proteins for which the results are given in Table 2. Thus, the actual prediction accuracy may even be lower. Current prediction methods are still valuable both for everyday sequence analysis and for the analysis of entire proteomes. However, it seems that 'simple' predictions of the location of helices are not as simple as anticipated. A lot of work remains to be done before we reach the levels of accuracy that optimists may have believed were reached a decade ago.

The ultimate solution: we need more high-resolution experiments! Promising new strategies may yield more high-resolution structures of membrane proteins. The frequently observed instability of membrane proteins outside of a lipid bilayer may call for crystallising these proteins in membrane-like environments [160]. Such a membrane system, which consists of lipid, water, and protein in appropriate proportions, forms a complex three-dimensional lipidic array that provides nucleation sites ("seeding") and supports growth by lateral diffusion of protein molecules in the membrane ("feeding"). Future developments may include the use of different lipids, the inclusion of various additives, the development of different types of crystallisation screens, and the rational introduction of covalent or non-covalent lattice contacts [161]. Although structural genomics for membrane proteins is still far away, we hope that with every dozen new high-resolution structures solved, prediction methods will gradually evolve. How many years will it then take until prediction methods reach the levels of accuracy that had been mistakenly published already in the last millennium? The answer depends on the number of surprises about non-canonical features of membrane proteins that await us on the road ahead! Clearly, the surprising details in the high-resolution structures of the last five years have re-opened the field of simply predicting topology and location for membrane proteins.

 

 

Acknowledgements

Thanks to Jinfeng Liu (Columbia) for computer assistance. Thanks to Henry Bigelow for comments and help with the part discussing beta-membrane prediction methods. Particular thanks to Volker Eyrich (Columbia) for programming and maintaining most of the immensely valuable software that runs the EVA and META-PredictProtein servers! The work of BR was supported by the grants 1-P50-GM62413-01 and RO1-GM63029-01 from the National Institutes of Health. Last but not least, thanks to all those who enable the development of prediction methods by depositing experimental information in public databases and to all those who maintain such databases.

 

References

1. Klein, P., Kanehisa, M. & De Lisi, C. (1985). The detection and classification of membrane-spanning proteins. Biochim. Biophys. Ac., 815, 468-476.
2. Cserzö, M., Wallin, E., Simon, I., von Heijne, G. & Elofsson, A. (1997). Prediction of transmembrane α-helices in prokaryotic membrane proteins: the dense alignment surface method. Prot. Engin., 10, 673-676.
3. Tusnady, G. E. & Simon, I. (1998). Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J. Mol. Biol., 283, 489-506.
4. Kyte, J. & Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol., 157, 105-132.
5. Jones, D. T., Taylor, W. R. & Thornton, J. M. (1994). A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochem., 33, 3038-3049.
6. Eyrich WWW, V. & Rost, B. (2000). The META-PredictProtein server.
7. Rost, B., Casadio, R., Fariselli, P. & Sander, C. (1995). Prediction of helical transmembrane segments at 95% accuracy. Prot. Sci., 4, 521-533.
8. Rost, B. (1996). PHD: predicting one-dimensional protein structure by profile based neural networks. Meth. Enzymol., 266, 525-539.
9. Rost, B., Casadio, R. & Fariselli, P. (1996). Topology prediction for helical transmembrane proteins at 86% accuracy. Prot. Sci., 5, 1704-1718.
10. Rost, B., Sander, C. & Schneider, R. (1994). PHD - an automatic server for protein secondary structure prediction. CABIOS, 10, 53-60.
11. Rost WWW, B. (2000). PredictProtein - internet prediction service.
12. Pasquier, C., Promponas, V. J., Palaios, G. A., Hamodrakas, J. S. & Hamodrakas, S. J. (1999). A novel method for predicting transmembrane segments in proteins based on a statistical analysis of the SwissProt database: the PRED-TMR algorithm. Prot. Engin., 12, 381-385.
13. Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z. et al. (1997). Gapped Blast and PSI-Blast: a new generation of protein database search programs. Nucl. Acids Res., 25, 3389-3402.
14. Hirokawa, T., Boon-Chieng, S. & Mitaku, S. (1998). SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics, 14, 378-379.
15. Juretic, D., Zucic, D., Lucic, B. & Trinajstic, N. (1998). Preference functions for prediction of membrane-buried helices in integral membrane proteins. Comput. Chem., 22, 279-94.
16. Etzold, T. & Argos, P. (1993). SRS - an indexing and retrieval tool for flat file data libraries. Comput. Appl. Biosci., 9, 0-0.
17. Etzold, T., Ulyanov, A. & Argos, P. (1996). SRS: Information retrieval system for molecular biology data banks. Meth. Enzymol., 266, 114-128.
18. Persson, B. & Argos, P. (1996). Topology prediction of membrane proteins. Prot. Sci., 5, 363-371.
19. Deber, C. M., Wang, C., Liu, L. P., Prior, A. S., Agrawal, S. et al. (2001). TM Finder: a prediction program for transmembrane protein segments using a combination of hydrophobicity and nonpolar phase helicity scales. Prot. Sci., 10, 212-9.
20. Sonnhammer, E. L. L., von Heijne, G. & Krogh, A. (1998). A hidden Markov model for predicting transmembrane helices in protein sequences. In Sixth International Conference on Intelligent Systems for Molecular Biology (ISMB98), pp. 175-182.
21. Hofmann, K. & Stoffel, W. (1993). TMBASE - a database of membrane spanning protein segments. Biol. Chem. Hoppe-Seyler, 374, 166.
22. von Heijne, G. (1992). Membrane protein structure prediction. J. Mol. Biol., 225, 487-494.
23. Jayasinghe, S., Hristova, K. & White, S. H. (2001). Energetics, stability, and prediction of transmembrane helices. J. Mol. Biol., 312, 927-934.
24. Stack, J. H., Horazdovsky, B. & Emr, S. D. (1995). Receptor-mediated protein sorting to the vacuole in yeast: roles for a protein kinase, a lipid kinase and GTP-binding proteins. Annu Rev Cell Dev Biol, 11, 1-33.
25. Chapman, R., Sidrauski, C. & Walter, P. (1998). Intracellular signaling from the endoplasmic reticulum to the nucleus. Annu Rev Cell Dev Biol, 14, 459-85.
26. Le Borgne, R. & Hoflack, B. (1998). Protein transport from the secretory to the endocytic pathway in mammalian cells. Biochim. Biophys. Ac., 1404, 195-209.
27. Chen, X. & Schnell, D. J. (1999). Protein import into chloroplasts. Trends in Cell Biology, 9, 222-227.
28. Hettema, E. H., Distel, B. & Tabak, H. F. (1999). Import of proteins into peroxisomes. Biochim. Biophys. Ac., 1451, 17-34.
29. Pahl, H. L. (1999). Signal transduction from the endoplasmic reticulum to the cell nucleus. Physiol Rev, 79, 683-701.
30. Truscott, K. N. & Pfanner, N. (1999). Import of carrier proteins into mitochondria. Biol. Chem., 380, 1151-6.
31. Bauer, M. F., Hofmann, S., Neupert, W. & Brunner, M. (2000). Protein translocation into mitochondria: the role of TIM complexes. TICB, 10, 25-31.
32. Ito, A. (2000). Mitochondrial processing peptidase: multiple-site recognition of precursor proteins. TICB, 10, 25-31.
33. Soltys, B. J. & Gupta, R. S. (2000). Mitochondrial proteins at unexpected cellular locations: export of proteins from mitochondria from an evolutionary perspective. Int Rev Cytol, 194, 133-96.
34. Thanassi, D. G. & Hultgren, S. J. (2000). Multiple pathways allow protein secretion across the bacterial outer membrane. Curr. Opin. Cell Biol., 12, 420-430.
35. Heusser, C. & Jardieu, P. (1997). Therapeutic potential of anti-IgE antibodies. Curr Opin Immunol, 9, 805-813.
36. Bettler, B., Kaupmann, K. & Bowery, N. (1998). GABAB receptors: drugs meet clones. Curr Opin Neurobiol, 8, 345-350.
37. Moreau, J. L. & Huber, G. (1999). Central adenosine A(2A) receptors: an overview. Brain Res Brain Res Rev, 31, 65-82.
38. Saragovi, H. U. & Gehring, K. (2000). Development of pharmacological agents for targeting neurotrophins and their receptors. Trends Pharmacol Sci, 21, 93-98.
39. Sedlacek, H. H. (2000). Kinase inhibitors in cancer therapy: a look ahead. Drugs, 59, 435-476.
40. Dewji, N.N. a. S. S. (1997). The seven transmembrane spanning topography of the Alzheimer disease-related presenilin proteins in the plasma membranes of cultured cells. Proc. Natl. Acad. Sci. USA, 94, 14024-14030.
41. Hildebrand, J.G. S. a. G. M. S. (1997). Mechanisms of olfactory discrimination: converging evidence for common principles across phyla. Annu. Rev. Neurosci., 20.
42. Stadel, J. M. et al. (1997). Orphan G protein-coupled receptors: a neglected opportunity for pioneer drug discovery. Trends Pharmacol. Sci., 22, 162-165.
43. Marchese, A. et al. (1999). Novel GPCRs and their endogenous ligands: expanding the boundaries of physiology and pharmacology. Trends Pharmacol. Sci., 20, 370-375.
44. Gudermann, T., Nurnberg, B. & Schultz, G. (1995). Receptors and G proteins as primary components of transmembrane signal transduction. Part 1. G-protein-coupled receptors: structure and function. J. Mol. Med., 73, 51-63.
45. Attwood, T. K., Croning, M. D., Flower, D. R., Lewis, A. P., Mabey, J. E. et al. (2000). PRINTS-S: the database formerly known as PRINTS. Nucl. Acids Res., 28, 225-227.
46. Attwood, T. K. (2001). A compendium of specific motifs for diagnosing GPCR subtypes. TIPS, 22, 162-165.
47. Kihara, D., Shimizu, T. & Kanehisa, M. (1998). Prediction of membrane proteins based on classification of transmembrane segments. Prot. Engin., 11, 961-970.
48. Kihara, D. & Kanehisa, M. (2000). Tandem clusters of membrane proteins in complete genome sequences. Genome Res., 6, 731-743.
49. Fu, R. & Cross, T. A. (1999). Solid-state nuclear magnetic resonance investigation of protein and polypeptide structure. Annu. Rev. Biophys. Biophys. Chem., 28, 235-268.
50. de Groot, H. J. (2000). Solid-state NMR spectroscopy applied to membrane proteins. Curr. Opin. Str. Biol., 10, 593-600.
51. Marassi, F. M. & Opella, S. J. (2000). A solid-state NMR index of helical membrane protein structure and topology. J Magn Reson, 144, 150-155.
52. McDermott, A., Polenova, T., Bockmann, A., Zilm, K. W., Paulson, E. K. et al. (2000). Partial NMR assignments for uniformly (13C, 15N)-enriched BPTI in the solid state. J Biomol NMR, 16, 209-219.
53. Riek, R., Pervushin, K. & Wuthrich, K. (2000). TROSY and CRINEPT: NMR with large molecular and supramolecular structures in solution. TIBS, 25, 462-468.
54. Sanders, C. R. & Nagy, J. K. (2000). Misfolding of membrane proteins in health and disease: the lady or the tiger? Curr. Opin. Str. Biol., 10, 438-442.
55. Arora, A. & Tamm, L. K. (2001). Biophysical approaches to membrane protein structure determination. Curr. Opin. Str. Biol., 11, 540-547.
56. Fernandez, C., Hilty, C., Bonjour, S., Adeishvili, K., Pervushin, K. et al. (2001). Solution NMR studies of the integral membrane proteins OmpX and OmpA from Escherichia coli. FEBS Lett., 504, 173-178.
57. Opella, S. J., Ma, C. & Marassi, F. M. (2001). Nuclear magnetic resonance of membrane-associated peptides and proteins. Meth. Enzymol., 339, 285-313.
58. Wuthrich, K. (2001). The way to NMR structures of proteins. Nat. Struct. Biol., 8, 923-925.
59. Möller, S., Kriventseva, E. V. & Apweiler, R. (2000). A collection of well characterised integral membrane proteins. Bioinformatics, 16, 1159-1160.
60. McGovern, K., Ehrmann, M. & Beckwith, J. (1991). Decoding signals for membrane proteins using alkaline phosphatase fusions. EMBO J., 10, 2773-2782.
61. Hennessey, E. S. & Broome-Smith, J. K. (1993). Gene-fusion techniques for determining membrane-protein topology. Curr. Opin. Str. Biol., 3, 524-531.
62. Traxler, B., Boyd, D. & Beckwith, J. (1993). The topological analysis of integral membrane proteins. J. Membrane Biol., 132, 1-11.
63. van Geest, M. & Lolkema, J. S. (2000). Membrane topology and insertion of membrane proteins: search for topogenic signals. Microbiol. Mol. Biol. Rev., 64, 13-33.
64. McGuigan, J. E. (1994). Antibodies to complementary peptides as probes for receptors. Immunomethods, 5, 158-166.
65. Jermutus, L., Ryabova, L. A. & Pluckthun, A. (1998). Recent advances in producing and selecting functional proteins by using cell-free translation. Curr Opin Biotechnol, 9, 534-548.
66. Morris, G. E., Sedgwick, S. G., Ellis, J. M., Pereboev, A., Chamberlain, J. S. et al. (1998). An epitope structure for the C-terminal domain of dystrophin and utrophin. Biochem., 37, 11117-11127.
67. Amstutz, P., Forrer, P., Zahnd, C. & Pluckthun, A. (2001). In vitro display technologies: novel developments and applications. Curr Opin Biotechnol, 12, 400-405.
68. Wallin, E. & von Heijne, G. (1998). Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Prot. Sci., 7, 1029-1038.
69. Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E. L. (2001). Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol., 305, 567-580.
70. Liu, J. & Rost, B. (2001). Comparing function and structure between entire proteomes. Prot. Sci., 10, 1970-1979.
71.von Heijne, G. (1996). Prediction of transmembrane protein topology. InProtein structure prediction (Sternberg, M. J. E., eds.), pp. 101-110, OxfordUniv. Press, Oxford.
72.Seshadri, K., Garemyr, R., Wallin, E., von Heijne, G. & Elofsson, A.(1998). Architecture of beta-barrel membrane proteins: analysis of trimericporins. Prot. Sci., 7, 2026-2032.
73.Buchanan, S. K. (1999). b-Barrel proteins from bacterial outer membranes: Structure, functionand refolding. Curr. Opin. Str. Biol., 9, 455-461.
74.Chen, C. P., Kernytsky, A. & Rost, B. (2002). Myths of transmembranehelix predictions. Prot. Sci.,submitted.
75.von Heijne, G. (1986). The distribution of positively charged residues inbacterial inner membrane proteins correlates with the trans-membrane topology. EMBOJ., 5, 3021-3027.
76.von Heijne, G. (1989). Control of topology and mode of assembly of apolytopic membrane protein by positively charged residues. Nature, 341, 456-458.
77.Eisenberg, D. W., R.M., and T.C. Terwilliger (1982). The helical hydrophobicmoment: a measure of the amphiphilicity of a helix. Nature, 299, 371-374.
78.Barrantes, F. (1975). The nicotinic cholinergic receptor: different compositions evidenced by statistical analysis. Biochem Biophys Res Commun., 62, 407-14.
79.Nakai, K. & Kanehisa, M. (1992). A knowledge base for predicting protein localization sites in eukaryotic cells. Genomics, 14, 897-911.
80.Hartmann, E., Rapoport, T. A. & Lodish, H. F. (1989). Predicting the orientation of eukaryotic membrane spanning proteins. Proc. Natl. Acad. Sci. USA, 86, 5786-5790.
81.Schulz, G. E. (1988). A critical evaluation of methods for prediction of protein secondary structures. Annu. Rev. Biophys. Biophys. Chem., 17, 1-21.
82.Fasman, G. D. (1989). The development of the prediction of protein structure. In Prediction of protein structure and the principles of protein conformation (Fasman, G. D., ed.), pp. 193-303, Plenum Press, New York, London.
83.Rost, B. & Sander, C. (2000). Third generation prediction of secondary structure. Meth. Mol. Biol., 143, 71-95.
84.Juretic, D., Lee, B., Trinajstic, N. & Williams, R. W. (1993). Conformational preference functions for predicting helices in membrane proteins. Biopolymers, 33, 255-273.
85.Juretic, D., Jeroncic, A. et al. (1999). Sequence analysis of membrane proteins with the web server SPLIT. Croatica Chemica Acta, 77, 975-997.
86.Engelman, D. M., Steitz, T. A. & Goldman, A. (1986). Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu. Rev. Biophys. Biophys. Chem., 15, 321-353.
87.Claverie, J.-M. & Daulmiere, C. (1991). Smoothing profiles with sliding windows: better to wear a hat! CABIOS, 7, 113-115.
88.Needleman, S. B. & Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48, 443-53.
89.Sellers, P. H. (1974). On the theory and computation of evolutionary distances. SIAM J. Appl. Math., 26, 787-793.
90.Rost, B. & Sander, C. (1993). Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol., 232, 584-599.
91.Rost, B. & Sander, C. (1994). Combining evolutionary information and neural networks to predict protein secondary structure. Proteins, 19, 55-72.
92.Rost, B. (2001). Protein secondary structure prediction continues to rise. J. Struct. Biol., 134, 204-218.
93.Rost, B., Casadio, R. & Fariselli, P. (1996). Refining neural network predictions for helical transmembrane proteins by dynamic programming. In Fourth International Conference on Intelligent Systems for Molecular Biology (States, D., Agarwal, P., Gaasterland, T., Hunter, L. & Smith, R. F., eds.), pp. 192-200, Menlo Park, CA: AAAI Press, St. Louis, M.O., U.S.A.
94.Tusnady, G. E. & Simon, I. (2001). Topology of membrane proteins. J Chem Inf Comput Sci, 41, 364-8.
95.Levitt, M. & Warshel, A. (1975). Computer simulation of protein folding. Nature, 253, 694-698.
96.Hagler, A. T. & Honig, B. (1978). On the formation of protein tertiary structure on a computer. Proc. Natl. Acad. Sci. U.S.A., 75, 554-558.
97.Levitt, M. (1983). Molecular dynamics of native proteins: I. Computer simulation of trajectories. J. Mol. Biol., 168, 595-620.
98.Karplus, M. & Petsko, G. A. (1990). Molecular dynamics simulations in biology. Nature, 347, 631-639.
99.Berendsen, H. J. C. (1991). Molecular dynamics studies of proteins and nucleic acids. Curr. Opin. Str. Biol., 1, 191-195.
100.Dill, K. A. (1993). Folding proteins: finding a needle in a haystack. Curr. Opin. Str. Biol., 3, 99-103.
101.van Gunsteren, W. F. (1993). Molecular dynamics studies of proteins. Curr. Opin. Str. Biol., 3, 167-174.
102.Forrest, L. R., Tieleman, D. P. & Sansom, M. S. (1999). Defining the transmembrane helix of M2 protein from influenza A by molecular dynamics simulations in a lipid bilayer. Biophys. J., 76, 1886-1896.
103.Sajot, N. & Genest, M. (2000). Structure prediction of the dimeric neu/ErbB-2 transmembrane domain from multi-nanosecond molecular dynamics simulations. Eur. Biophys. J., 28, 648-662.
104.Briggs, J. A., Torres, J. & Arkin, I. T. (2001). A new method to model membrane protein structure based on silent amino acid substitutions. Proteins, 44, 370-5.
105.Nikaido, H. (1994). Porins and specific diffusion channels in bacterial outer membranes. J. Biol. Chem., 269, 3905-3908.
106.Schirmer, T., Keller, T. A., Wang, Y. F. & Rosenbusch, J. P. (1995). Structural basis for sugar translocation through maltoporin channels at 3.1 Å resolution. Science, 267, 512-514.
107.Meyer, J. E. W., Hofnung, M. & Schulz, G. E. (1997). Structure of Maltoporin from Salmonella typhimurium ligated with a Nitrophenyl-maltotrioside. J. Mol. Biol., 266, 761-775.
108.Forst, D., Welte, W., Wacker, T. & Diederichs, K. (1998). Structure of the sucrose-specific porin ScrY from Salmonella typhimurium and its complex with sucrose. Nat. Struct. Biol., 5, 37-46.
109.Schirmer, T. (1998). General and specific porins from bacterial outer membranes. J. Struct. Biol., 121, 101-109.
110.Mannella, C. A. (1998). Conformational changes in the mitochondrial channel protein, VDAC, and their functional implications. J. Struct. Biol., 121, 207-218.
111.Schulz, G. E. (2000). beta-Barrel membrane proteins. Curr. Opin. Str. Biol., 10, 443-447.
112.Weiss, M. S. & Schulz, G. E. (1992). Structure of porin refined at 1.8 Å resolution. J. Mol. Biol., 227, 493-509.
113.Pebay-Peyroula, E., Garavito, R. M., Rosenbusch, J. P., Zulauf, M. & Timmins, P. A. (1995). Detergent structure in tetragonal crystals of OmpF porin. Structure, 3, 1051-1059.
114.Tamm, L. K., Arora, A. & Kleinschmidt, J. H. (2001). Structure and assembly of beta-barrel membrane proteins. J. Biol. Chem., 276, 32399-32402.
115.Paul, C. & Rosenbusch, J. P. (1985). Folding patterns of porin and bacteriorhodopsin. EMBO J., 4, 1594-1597.
116.Welte, W., Weiss, M. S., Nestel, U., Weckesser, J., Schiltz, E. et al. (1991). Prediction of the general structure of OmpF and PhoE from the sequence and structure of porin from Rhodobacter capsulatus: Orientation of porin in the membrane. BBA, 1080, 271-274.
117.Schirmer, T. & Cowan, S. W. (1993). Prediction of membrane spanning beta-strands and its application to maltoporin. Prot. Sci., 2, 1361-1363.