Secondary structure assignment
1 CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA, rost@columbia.edu
2 Center for Biological Sequence Analysis, BioCentrum Bldg. 208, The Technical University of Denmark, 2800 Lyngby, Denmark, ca2@cbs.dtu.dk
* Corresponding author: rost@columbia.edu, http://cubic.bioc.columbia.edu/ Tel: +1-212-305-3773, fax: +1-212-305-7932
Key words: protein secondary structure assignment, hydrogen bonds, continuous secondary structure, structural genomics
| 3D | three-dimensional |
| DEFINE | method assigning secondary structure from 3D co-ordinates based on linear distance masks to ideal secondary structure [1] |
| DSSP | program and database assigning secondary structure and solvent accessibility for proteins of known 3D structure from hydrogen bonding patterns [2] |
| DSSPcont | continuous assignment of secondary structure for proteins of known 3D structure [3] |
| NMR | nuclear magnetic resonance |
| P-Curve | curvature based assignment of secondary structure from 3D [4] |
| PDB | Protein Data Bank of experimentally determined 3D structures of proteins [5] |
| rmsd | root-mean square deviation |
| STRIDE | secondary STRuctural IDEntification method to assign secondary structure from 3D using hydrogen bonds and torsion angles [6] |
The automatic assignment of protein secondary structure from 3D co-ordinates is an important and in principle a simple bioinformatics tool. The assignments are used in 3D structure visualisations to simplify the presentation of a protein in order to highlight functional aspects. Structural comparisons of proteins are performed faster when first comparing secondary structure. Secondary structure has also been used to improve sequence searches. Hence, secondary structure is important to assure the optimal yield of an experimental structure and to cleverly select the targets for structural genomics. Here, we review the principles of the most popular assignment methods DSSP, STRIDE, DEFINE and P-Curve. We also compare these methods and suggest evaluation criteria for 'good' assignments. Finally, we describe an extension from discrete to continuous assignment of secondary structure.
The task. When we look at a protein three-dimensional (3D) structure, we notice regular macro-elements that are repeated in all known structures: helices and strands. There is no unique physical definition to systematically assign secondary structure from 3D co-ordinates. Instead, there are many differing definitions, each capturing some aspects of 'reality'. The relative spatial distances and orientations between two or more secondary structure segments are typically referred to as 'super-secondary' structure. Here, we reviewed a number of the existing concepts to assign secondary structure from co-ordinates, i.e. to label the secondary structure state for each residue. The terms 'class', 'state', and 'regular secondary structure' are not used consistently in the literature. We used the following notation: (1) 'states' are the types of secondary structure defined by a particular method, e.g. G in DSSP, (2) 'classes' are the groups of similar states, e.g. the DSSP states H, G, and I all describing helices, and (3) 'regular secondary structure' as positively defined state. Note that 'non-regular' is usually defined as a negation, i.e. by the absence of all the other criteria applied by a method to define the regular states.
The rôle of secondary structure assignment in structural genomics. Typically, structural biologists assume the protein fold to be the basic unit for structure classification (see chapter 4) [7]. The fold and other basic structural elements are classified by automatic systems, such as SCOP, CATH, FSSP, MMDB [8, 9, 10, 11, 12, 13] . When classified by experts, the particular features of a given fold are often described by the overall secondary structure arrangements [7, 14, 15] , which therefore constitute a substantial step in protein classification. Functional aspects of proteins are also reflected in the secondary structure and occasionally function can be derived from secondary structure alone (see chapter 19) [16, 17, 3] . There are four main uses of secondary structure: (1) it is indicative of the fold, (2) it is an intuitive means of visualising protein structures, (3) it influences the sequence alignment, and (4) it is related to function. In the context of structural genomics these four features are important. One practical application is to use secondary structure segments to speed up large-scale all-against-all alignments of 3D structures. Another is the use of secondary structure segments for comparative modelling [18, 19, 20] and threading [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34] . In turn, comparative modelling techniques and more sensitive sequence searches through threading are relevant for structural genomics. Firstly, these techniques assure that each experimental structure has the highest possible impact. Secondly, both methods are important to determine the areas of protein space that need to be explored as part of the target selection for structural genomics [ 35; 36; 37; 38; 39; 40; 41].
The rôle of secondary structure in sequence searches and structure prediction. The relevance of secondary structure also explains why secondary structure prediction from sequence has become one of the most ardently pursued tasks in bioinformatics [42, 43, 44, 45, 46, 47] . Various secondary structure assignment schemes exist, which differ considerably (as described below), so how do we evaluate and compare them? One idea is to use the particular secondary structure assignment that (i) agrees most between proteins of similar structure, and/or (ii) is the most predictable from sequence. Obviously, the concepts 'agree most' and 'most predictable' have to be put into perspective: An assignment of X to all residues would be completely conserved and easy to predict while it would not carry any information. Hence, we would have to account for the information and relevance contained in an assignment. However, this simple concept has not been realised, yet. In fact, we have found that secondary structure prediction methods are reaching a level of accuracy at which the assignment problem becomes relevant [3] . Secondary structure prediction methods become increasingly important for prediction of general aspects of protein structure and function [48, 45, 49, 46, 47] and for database searches [23, 27, 50, 51, 52, 53] . Thus, the assignment problem also influences these important fields of bioinformatics indirectly.
History: from expert to automatic assignment of protein secondary structure. Pauling and colleagues correctly predicted the idealised protein secondary structures of alpha-helices [54], π-helices [54], and beta-sheets [55] based on intra-backbone hydrogen bonds. Five decades later, we know that on average about half of the residues in proteins participate in helices or sheets [5]. Pauling and colleagues incorrectly predicted that 3₁₀-helices would not occur in proteins, due to unfavourable bond angles; however, approximately 4% of the residues are observed in this conformation [56]. Initially, crystallographers assigned secondary structure by eye from the 3D structures. At the time this was the only way to assign secondary structure. However, it lacked consistency, since experts occasionally disagree. This was particularly problematic when comparing secondary structure predictions, and it was the primary motivation for Kabsch & Sander to automate the assignment in their DSSP program [2, 57]. Originally developed to improve secondary structure prediction, DSSP has remained the standard in the field, most popular for its relatively reliable assignments. Curiously, the prediction method for which Kabsch & Sander originally needed the automatic assignment was never published [58].
Since hydrogen bonds are used by many methods as the defining elements for secondary structure, we introduce both the concept of the hydrogen bond and the ways to define it. Pauling established the hydrogen bond as an important principle in chemistry [59]. The rich network of hydrogen bonds in water creates a very particular environment in which polar molecules participate, while non-polar molecules disrupt the network of hydrogen bonds. This results in missing water-water hydrogen bonds and therefore a relative energy cost compared to the hydrogen bonded case ([...] and Leucine when compared to Glycine [60]). This energy cost is in the order of two hydrogen bonds (hydrogen bonds are in the range of -2 [...]) [...] non-polar molecules, thereby resulting in the hydrophobic effect.
For proteins the packing of non-polar residues in the core is believed to be the main driving force in tertiary structure formation of proteins, while the specific secondary structures are governed by intra-protein hydrogen bonds [61] . Packing the non-polar residues in the core also means burying the polar backbone atoms and breaking the water-backbone hydrogen bonds. To avoid this heavy energy cost the polarities are paired (forming hydrogen bonds) in the protein core, thus fixing the protein conformation. If the protein backbone instead were non-polar, the protein core elements would be free to move around changing the protein structure and thereby preventing the protein from functioning reliably and efficiently.
Approximately 90% of the backbone C=O and NH groups have hydrogen bonds [62] . Using the Coulomb hydrogen bond definition (see below), we found that approximately 62% of the backbone C=O and NH groups have intra-backbone hydrogen bonds [56] . Pauling defined secondary structure by the intra-backbone hydrogen bonds, which has later become the prevalent means of assigning secondary structure. Thus, for simplicity we refer to intra-backbone hydrogen bonds when using the term 'hydrogen bond'.
There are many different angles and distances that can be measured and used to identify the hydrogen bond. Baker and Hubbard [62] assigned hydrogen bonds according to the angle N-H-O = θ and to the H-O distance r_HO in the hydrogen bond. A hydrogen bond is assigned when:
θ > 120° and r_HO < 2.5 Å (Eq. 1)
This is similar to other rigid distance and angle constraints published [63, 64] . Although a rather crude way of assigning hydrogen bonds, it has sufficed for several decades. In most applications hydrogen bonds were only assigned visually for a few proteins, i.e. explicit definitions of hydrogen bond energies were not necessary.
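The rigid criterion of Eq. 1 can be illustrated with a short Python sketch. The function names and the coordinate representation (plain (x, y, z) tuples in Å, with an explicit H-atom position) are assumptions made for this illustration and are not taken from Baker and Hubbard.

```python
import math

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_angle))

def is_hbond_geometric(n, h, o, min_angle=120.0, max_dist=2.5):
    """Eq. 1: the N-H-O angle exceeds 120 degrees and r_HO is below 2.5 Angstrom."""
    theta = angle_deg(n, h, o)   # angle at the hydrogen atom
    r_ho = math.dist(h, o)       # H-O distance
    return theta > min_angle and r_ho < max_dist
```

A perfectly linear N-H...O arrangement with r_HO = 2.0 Å is accepted, whereas the same atoms at r_HO = 3.0 Å or with a 90° angle are rejected.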
One way of finding hydrogen bonds is by calculating the Coulomb energy in the bond, as applied in DSSP [2], focusing on the electrostatic attraction (Fig. 1). The Coulomb energy for the attraction and repulsion is given by:
$$E = f\,\delta^{+}\delta^{-}\left(\frac{1}{r_{NO}} + \frac{1}{r_{HC'}} - \frac{1}{r_{OH}} - \frac{1}{r_{NC'}}\right)$$ (Eq. 2)
where f = 332 Å kcal/mol is the dimensional factor and δ+ and δ- are the magnitudes of the polar charges given in units of the elementary electron charge e. A cut-off has been set for the weakest acceptable hydrogen bond, so that in DSSP the energy must satisfy E < -0.5 kcal/mol. The H-atom position that is needed to calculate the two distances r_OH and r_HC' in Eq. 2 is usually not given in PDB files. Hence, it must be extrapolated. DSSP uses an approximate position, assuming that the covalent O=C' bond is parallel to the covalent N-H bond adjacent to the same peptide bond. The H-atom is placed by starting from the position of the N-atom and moving along the direction of the O=C' vector, with the length set to 1 Å, i.e., the length of the N-H bond [60]. These approximations simplify the calculation of the H-atom position and appear to be rather accurate: compared to the original bond angles and distances [60], we found the DSSP approximation to yield an average error around 0.07 Å [56]. Both in the DSSP extrapolation and in our test the trans-peptide bond, giving rise to the rigid peptide plane, was assumed.
Partitioning ab initio energy calculations of the hydrogen bond into classical components showed that about 75% is electrostatic (Coulombic) and less than 5% comes from polarisation and charge-transfer, for moderate strength bonds [64]. Note that the Coulomb energy term does not incorporate atom-atom repulsion to penalise steric clashes and does not give rise to a characteristic hydrogen bond length.

Fig. 1: Distances used to calculate the Coulomb hydrogen bond.
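The Coulomb hydrogen bond and the H-atom extrapolation can be sketched as follows. This is a minimal illustration and not the DSSP source code. The partial charges 0.42 e (on C', O) and 0.20 e (on N, H) are taken from the original DSSP publication [2] and are not stated in the text above. The function names and the tuple-based coordinates are assumptions.

```python
import math

# Assumed constants: charges from the original DSSP publication, f as in the text.
Q_CO = 0.42      # partial charge magnitude on C' and O, in units of e
Q_NH = 0.20      # partial charge magnitude on N and H, in units of e
F = 332.0        # dimensional factor, Angstrom * kcal/mol
E_CUTOFF = -0.5  # kcal/mol, weakest accepted hydrogen bond

def place_h(n, c_prev, o_prev, nh_length=1.0):
    """Extrapolate the amide H: start at N and move 1 Angstrom along the
    O -> C' direction of the preceding residue (trans peptide assumed)."""
    d = [c - o for c, o in zip(c_prev, o_prev)]
    norm = math.sqrt(sum(x * x for x in d))
    return tuple(ni + nh_length * x / norm for ni, x in zip(n, d))

def coulomb_hbond_energy(c, o, n, h):
    """Electrostatic energy (kcal/mol) between a C'=O group and an N-H group."""
    return Q_CO * Q_NH * F * (
        1.0 / math.dist(n, o) + 1.0 / math.dist(h, c)
        - 1.0 / math.dist(h, o) - 1.0 / math.dist(n, c)
    )

def is_hbond_coulomb(c, o, n, h):
    return coulomb_hbond_energy(c, o, n, h) < E_CUTOFF
```

For a collinear test geometry with r_OH = 1.9 Å and r_NO = 2.9 Å the sketch yields roughly -2.9 kcal/mol, well below the cut-off.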
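An empirical hydrogen bond energy calculation can be derived from the hydrogen bond geometry in crystal structures or from polypeptides, peptides, amino acids and small organic compounds [65, 66], as applied in STRIDE (see below). The total energy E_hb depends on the N-O distance energy E_r, and on three bonding angles through the expressions E_p and E_t:
$$E_{hb} = E_r \cdot E_t \cdot E_p$$ (Eq. 3)
The distance dependency is similar to the Lennard-Jones potential for the van der Waals interaction, but uses powers of 8 and 6 instead of 12 and 6:
$$E_r = E_m\left[4\left(\frac{r_m}{r}\right)^{6} - 3\left(\frac{r_m}{r}\right)^{8}\right]$$
where r is the N-O distance, r_m is the optimal distance and E_m the optimal energy. For intra-backbone hydrogen bonds r_m = 3.0 Å and E_m = -2.8 kcal/mol are used. The two angular dependent terms are:
$$E_p = \cos^2\theta$$
$$E_t = \begin{cases} (0.9 + 0.1\sin 2t_i)\cos t_o, & 0^\circ < t_i \le 90^\circ \\ K_1\,(K_2 - \cos^2 t_i)^3 \cos t_o, & 90^\circ < t_i \le 110^\circ \\ 0, & t_i > 110^\circ \end{cases}$$
with K_1 = 0.9/cos⁶ 110° and K_2 = cos² 110°, where the angles θ, t_i and t_o are specified in Fig. 2.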

Fig. 2: Angles and distances defining the empirical hydrogen bond. Note: figure similar to the one in [6] .
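The distance term of the empirical energy follows from the properties stated above: an 8-6 form whose minimum E_m lies at r = r_m. The sketch below derives the two coefficients from these conditions. The function name and the coefficient names c and d are illustrative assumptions, and the angular terms are deliberately left out.

```python
def stride_distance_energy(r, r_m=3.0, e_m=-2.8):
    """8-6 distance term E_r (kcal/mol) for an N-O distance r in Angstrom.

    The coefficients are fixed by requiring E_r(r_m) = e_m and
    dE_r/dr = 0 at r = r_m.
    """
    c = -3.0 * e_m * r_m ** 8   # repulsive coefficient (positive for e_m < 0)
    d = -4.0 * e_m * r_m ** 6   # attractive coefficient (positive for e_m < 0)
    return c / r ** 8 - d / r ** 6
```

At r = 3.0 Å the function returns the optimal energy of -2.8 kcal/mol; shorter and longer N-O distances give a weaker (less negative) energy, and very short distances become repulsive.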
The so-called Dictionary of Secondary Structure of Proteins (DSSP) by Kabsch and Sander [2] performs its sheet and helix assignments solely on the basis of backbone-backbone hydrogen bonds. The DSSP method defines a hydrogen bond when the bond energy is below -0.5 kcal/mol from a Coulomb approximation of the hydrogen bond energy (eqn. 2). The structure assignments are defined such that visually appealing and unbroken structures result. In case of overlaps, alpha-helix is given first priority, followed by beta-sheet. This procedure does not affect the Coulomb approximation; rather, the realisation of 'unbroken structures' addresses the step from individual hydrogen bonds to assigning macro-structures to groups of such bonds.
An alpha-helix assignment (DSSP state 'H') starts when two consecutive amino acids have i → i+4 hydrogen bonds, and ends likewise with two consecutive i-4 ← i hydrogen bonds. The same definition is used for 3₁₀-helices (state 'G' with i → i+3 hydrogen bonds) and for π-helices (state 'I' with i → i+5 hydrogen bonds). The helix definition does not assign the edge residues that carry the initial and final hydrogen bonds of the helix. A minimal helix must have two consecutive hydrogen bonds; single helix hydrogen bonds are left out and assigned as turns (state 'T').
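The step from individual hydrogen bonds to helix segments can be sketched as below. This is a simplified illustration of the rule described above and not the DSSP program: priorities between overlapping states, chain breaks and the other DSSP states are ignored. The representation of hydrogen bonds as a set of (i, j) index pairs (C=O of residue i bonded to N-H of residue j) is an assumption.

```python
def assign_helix(n_res, hbonds, n=4, symbol="H"):
    """Assign helix and turn states from i -> i+n hydrogen bonds.

    hbonds: set of (i, j) pairs, C=O of residue i bonded to N-H of residue j.
    n = 4 with symbol 'H' for alpha-helix; 3/'G' and 5/'I' work the same way.
    """
    turn = [(i, i + n) in hbonds for i in range(n_res)]
    states = [" "] * n_res
    # Two consecutive bonds starting at i-1 and i make residues i..i+n-1
    # helical; the edge residues i-1 and i+n stay unassigned.
    for i in range(1, n_res):
        if turn[i - 1] and turn[i]:
            for k in range(i, min(i + n, n_res)):
                states[k] = symbol
    # Residues enclosed by a single bond that are not helical become turns.
    for i in range(n_res):
        if turn[i]:
            for k in range(i + 1, min(i + n, n_res)):
                if states[k] == " ":
                    states[k] = "T"
    return "".join(states)
```

With bonds 1 → 5, 2 → 6 and 3 → 7 in a 12-residue chain, residues 2 to 6 are labelled 'H' while residues 1 and 7, which carry the initial and final hydrogen bonds, remain unassigned.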
Beta-sheet residues (state 'E') are defined as either having two hydrogen bonds in the sheet, or being surrounded by two hydrogen bonds in the sheet. This implies three sheet residue types: anti-parallel and parallel with two hydrogen bonds or surrounded by hydrogen bonds. The minimal sheet consists of two residues at each partner segment. Isolated residues fulfilling this hydrogen bond criterion are labelled as beta-bridge (state 'B'). The recurring H-bonding patterns connecting the partnering strands in a beta-sheet are occasionally interrupted by one or more so-called beta-bulge residues. In DSSP these residues are also assigned as beta-sheet 'E' and may comprise up to four residues on one strand and up to one residue on the partnering strand. These interruptions in the beta-sheet H-bonding pattern are only assigned as sheet if they are surrounded by H-bond forming residues of the same type, i.e. either parallel or anti-parallel. The remaining two DSSP states ‘S’ and ‘ ’ (space) indicate a bend in the chain and unassigned/other, respectively.
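The bridge criterion can be made concrete with the hydrogen bond patterns given in the original DSSP publication [2]; they are not spelled out in the text above and are reproduced here as an assumption. In this sketch hb(a, b) is a hypothetical predicate that is true when the C=O of residue a is hydrogen bonded to the N-H of residue b.

```python
def bridge_type(i, j, hb):
    """Classify the residue pair (i, j) as a parallel or anti-parallel bridge.

    hb(a, b) -> True if C=O of residue a is bonded to N-H of residue b.
    Returns 'parallel', 'antiparallel' or None.
    """
    # Parallel: one residue is surrounded by two bonds to the other.
    if (hb(i - 1, j) and hb(j, i + 1)) or (hb(j - 1, i) and hb(i, j + 1)):
        return "parallel"
    # Anti-parallel: either two direct bonds between i and j, or the
    # pair is surrounded by bonds between its neighbours.
    if (hb(i, j) and hb(j, i)) or (hb(i - 1, j + 1) and hb(j - 1, i + 1)):
        return "antiparallel"
    return None
```

A single bridge found in isolation corresponds to state 'B'; consecutive bridges of the same type form the ladders that are labelled 'E'.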
The secondary STRuctural IDEntification method (STRIDE) by Frishman and Argos [6] uses an empirically derived hydrogen bond energy ( eqn. 3 ) and phi-psi torsion angle criteria to assign secondary structure. Torsion angles are given alpha-helix and beta-sheet propensities according to how close they are to their regions in Ramachandran plots (see chapter 2B) [67] . The method fixes five internal parameters for alpha-helix and four for beta-sheets. The parameters are optimised to mirror visual assignments made by crystallographers for a set of proteins. However, crystallographers often disagree in their assignment of secondary structure. This fact may challenge the concept of STRIDE. The annotations from crystallographers may be more similar to one another than all of them are to automatic assignments from e.g. DSSP. However, this remains to be shown. Since the secondary structure categories have different parameters, their assignment thresholds are independent for the hydrogen bond and phi-psi torsion angles. By construction, the STRIDE assignments agreed better with the expert assignments than DSSP, at least for the data set used to optimise the free parameters. In particular, the authors reported that every 11th beta-sheet and every 32nd alpha-helix were more in register with the expert assignments for the data set used.
Like DSSP, STRIDE assigns the shortest alpha-helix ('H') if it contains at least two consecutive i → i+4 hydrogen bonds. In contrast to DSSP, helices are elongated to comprise one or both edge residues if they have acceptable phi-psi angles; similarly, a short helix can be vetoed if the phi-psi angles are unfavourable. This implies that hydrogen bond patterns may be ignored if the phi-psi angles are unfavourable. The sheet category does not distinguish between parallel and anti-parallel sheets. The minimal sheet ('E') is composed of two residues, each in one of five possible hydrogen bond conformations, i.e. two more than for DSSP. The dihedral angles are incorporated into the final assignment criterion as was done for the alpha-helix. Bulges are accepted applying the same criterion as DSSP. Single residue sheets, i.e. beta-bridges, are labelled as 'B' for the three DSSP hydrogen bond conformations and as ‘b’ for the remaining two. 3₁₀-helices ('G') and π-helices ('I') are implemented according to the DSSP scheme, but with the empirical hydrogen bond criterion. Turns are assigned according to the phi-psi angles of residues i+1 and i+2 as described in [68]. The ‘C’ symbol is used whenever none of the above structure requirements are met.
The algorithm 'DEFINE' by Richards and Kundrot [1] assigns secondary structures by matching Cα-coordinates with a linear distance mask of the ideal secondary structures. First strict matches are found, which subsequently are elongated and/or joined allowing moderate irregularities or curvature. The algorithm locates the starts and ends of alpha- and 3₁₀-helices, beta-sheets, sharp turns and omega-loops. With these classifications the authors are able to assign 90-95% of all residues to at least one of the given secondary structure classes.
To assign alpha-helices the linear mask is matched with each row in the distance matrix of the query protein (Fig. 3). If a segment longer than four residues matches the mask within the allowed cumulative discrepancy limit (ε = 1 Å) it is assigned. Assigned alpha-helices are checked whether they start or end with a 3₁₀-helix, but individual 3₁₀-helices and π-helices are not investigated.
In order to assign beta-sheets as a single category, the authors have applied a linear distance mask taken from ideal anti-parallel sheets. The problems of the backbone bendability inside sheets and of the curvature for larger sheets have been 'solved' by excluding non-rigid sheets from the definition. The minimum length of sheets is set to be four residues. According to Pauling's definition of a beta-sheet, each strand must pair to another strand to form a sheet. In contrast, DEFINE may assign unpaired strands.

Fig. 3: DEFINE. The linear distance mask approach is visualised for an alpha-helix. The mask is compared to the distances in the query protein. If the mask fits a certain segment, then this segment is assigned as alpha-helix. The allowed root-mean-square difference between the distances in the mask and the ones observed in the query protein is determined by the cumulative discrepancy limit.
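The distance mask idea can be sketched in a few lines. This is only an illustration of the principle and not the DEFINE algorithm: elongation, joining of segments and the sheet mask are omitted. The ideal helix geometry (radius 2.3 Å, rise 1.5 Å and 100° rotation per residue for the Cα trace) consists of textbook values assumed here, and all function names are hypothetical.

```python
import math

def ideal_helix_ca(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """C-alpha coordinates of an ideal alpha-helix (assumed textbook geometry)."""
    pts = []
    for k in range(n):
        a = math.radians(turn_deg * k)
        pts.append((radius * math.cos(a), radius * math.sin(a), rise * k))
    return pts

def distance_mask(ca, max_sep):
    """Distances from the first C-alpha to the next max_sep C-alpha atoms."""
    return [math.dist(ca[0], ca[k]) for k in range(1, max_sep + 1)]

def matches_mask(ca, start, mask, eps=1.0):
    """True if the distance-matrix row beginning at `start` fits the mask
    with a root-mean-square difference of at most eps Angstrom."""
    if start + len(mask) >= len(ca):
        return False  # not enough residues left to compare
    sq = 0.0
    for k, d_ideal in enumerate(mask, start=1):
        d = math.dist(ca[start], ca[start + k])
        sq += (d - d_ideal) ** 2
    return math.sqrt(sq / len(mask)) <= eps
```

An ideal helix matches its own mask at every position, while a fully extended chain with 3.8 Å spacing between consecutive Cα atoms exceeds the discrepancy limit by a wide margin.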
Sklenar, Etchebest and Lavery [4] based their assignment scheme P-Curve on a mathematical analysis of protein curvature. Using differential geometry, they calculated a helicoidal axis on the basis of the fixed axis systems of a series of peptide planes. The secondary structure assignments are performed by motif matching, where the parameters in the motif are the radius of the helicoidal system along with a series of tilting, rolling and twisting measures describing geometrical differences between two peptide planes. This parameter analysis is achieved mainly by the use of the Cα-coordinates. The P-Curve assignment differs significantly from those performed from phi/psi angles or H-bonds, since different parameters are used (e.g. helicoidal radius, tilting, rolling, twisting). Furthermore, the degrees of freedom allowed when matching a P-Curve motif are quite different from those allowed when matching a DEFINE linear distance mask. For example, while the linear distance mask of DEFINE fits poorly to a curved beta-strand, the local P-Curve parameters are likely to fit better.
The assigned secondary structures are recognised by matching known structural motifs. These motifs are based on standard values for the helicoidal parameters. The following motifs are used: right- and left-handed alpha-helix, 3₁₀- and π-helix, parallel and anti-parallel beta-sheets and some other structures of little interest here. Note that like DEFINE, P-Curve may assign the category sheet to unpaired strands.
Continuous DSSP is a novel secondary structure assignment scheme described below (section 'Emerging and Future Developments').
All methods described above have been implemented as programs. Some of these programs are publicly available (DSSP, STRIDE, and DSSPcont; Table 1). For these programs, pre-computed assignments are also available for all proteins deposited in PDB (Table 1). The output formats are explained in Fig. 4 (DSSP) and Fig. 5 (STRIDE). The output of DSSPcont differs from that of DSSP, on which it is based (see last section on 'Emerging and Future Developments'), only in the addition of eight extra columns giving the continuous assignment to each of the eight DSSP states (G, H, I, T, E, B, S and ‘ ’).
| Program | WWW | Platforms |
| DSSP | www.cmbi.kun.nl/gv/dssp | IRIX (SGI), SOLARIS (SUN), LINUX |
| STRIDE | www.embl-heidelberg.de/argos/stride/stride_info.html | IRIX (SGI) |
| DSSPcont | www.cbs.dtu.dk, cubic.bioc.columbia.edu/services/DSSPcont | IRIX (SGI), LINUX |

Fig. 4: Explanation of DSSP output.
Example: segment from Crambin. The first two columns contain the unique DSSP residue number and the corresponding PDB residue number. The third column (here empty) indicates the chain identifier if there are multiple chains. Then follows the amino acid ‘AA’ in one-letter code (note: lower case letters are all Cysteines, in order to mark Cysteine-bridges, e.g. residue 16 has a disulfide bond to residue 26). The ‘STRUCTURE’ section starts with the secondary structure synopsis (HBEGITS listed in order of priority in case of overlaps) and is followed by helix hydrogen bond indications for 3₁₀-, alpha- and π-helix hydrogen bonds, where ‘>’ indicates an acceptor, ‘<’ a donor and ‘X’ both. The bend and chirality are each given a column followed by the beta-bridge label columns (lower case labels are parallel beta-bridges and upper case are anti-parallel). The DSSP numbers of their partners are written in the ‘BP1’ and ‘BP2’ columns. Each beta-sheet is also given a label (independent of the beta-bridge labels) indicated in the adjacent column. The ‘ACC’ column contains the solvent accessible surface measured in Å² by estimating the number of water molecules in contact with the present residue. The two strongest backbone-backbone hydrogen bonds are then listed, where ‘N-H-->O’ are donor hydrogen bonds and ‘O-->N-H’ acceptor hydrogen bonds. The format indicates the relative position of the hydrogen bond partner followed by the energy in [...] DSSP number is 5 less than the present one and that the hydrogen bond energy is -0.8 [...] are all labelled: ‘TCO’ is the cosine of the angle between the present C=O vector and that of the previous residue (close to 1 for helices and -1 for sheets), ‘KAPPA’ is the bend angle [...] chirality ('+' when positive, '-' when negative), finally the ‘PHI’, ‘PSI’ angles are given followed by the (x,y,z) Cα-coordinates.

Fig. 5: Explanation of STRIDE output. The STRIDE output for Crambin is shown to explain the format and for comparison to Fig. 4. The format is simple and easily parsed, with ‘ASG’ as the first word in the lines used for assignment. The residue columns comprise the three-letter amino acid code, the chain identifier (‘-’ for single chains), the PDB residue number and the STRIDE residue number, which starts from one for every new chain. The two structure columns contain the one-letter structure assignment (HGIEBbTC) and its short description. The columns with the phi and psi angles are followed by the column with solvent accessibility (measured in Å²).
We used the simple structure of Crambin as an example to point out differences in three assignment schemes (Fig. 6; note that the P-Curve assignment was taken from the original publication [4]). The secondary structure assignments of STRIDE and DSSP are identical except for one residue at the end of an alpha-helix. P-Curve largely agrees with the positioning of the secondary structure elements, but not with their lengths. Looking at the sheet region in detail (Fig. 6B), we see that the residues 39 and 40, assigned sheet by P-Curve only, are distant from any residue on the putatively pairing strand. According to Pauling, such an assignment would not be valid. The first sheet assignment by P-Curve covers residues 1-4, where residues 1 and 3 have one and two hydrogen bonds in the sheet, respectively. The extension of the strands/sheet by both DSSP and STRIDE appears reasonable.

Fig. 6: Protein secondary structure for crambin. The structure of the small protein Crambin (PDB identifier:1crn [86] ) is shown from two angles: (A) the image of the two helices, and (B) the central short sheet. The automatic secondary structure assignment agrees well between the three methods shown (C).
Are the discrepancies observed for Crambin representative? Colloc'h and colleagues have compared DSSP, P-Curve and DEFINE on a low homology data set consisting of 28,266 residues in 154 protein chains [69]. The allowed cumulative discrepancy limit for DEFINE was set to ε = 0.75 Å for helix and to ε = 0.5 Å for sheet, in order to avoid an excess of secondary structure assignment. The authors found that all three algorithms agreed on the assignments of alpha-helix, beta-sheet and non-regular structure for only 63% of all residues. Most disagreements were found between non-regular and regular (helix and sheet) structure (Fig. 7). In pairwise comparisons DEFINE and P-Curve, and likewise DEFINE and DSSP, agreed for 74% of the residues, while P-Curve and DSSP agreed for 79% of all residues. We have found that DSSP and STRIDE agree for 96% of all residues, with 64% of the disagreements related to the helix assignment (unpublished results derived from a data set of 707 non-homologous protein chains).
Not considering any assignment schemes superior, in principle, Colloc'h and colleagues suggested applying a consensus assignment: if two methods agree, use that state, otherwise assign the non-regular state. They noticed several aspects of interest:

Fig. 7: Comparison of three assignment schemes. The occurrences of three assignment classes (alpha-helix, beta-sheet and non-regular) by three assignment methods: DSSP, P-Curve and DEFINE give the 10 [...] <<<0.01%). Data are taken from [69].
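The consensus rule suggested by Colloc'h and colleagues (use the state on which at least two methods agree, otherwise assign the non-regular state) and the pairwise agreement quoted above can be sketched as follows. The three-letter alphabet h/e/c, the function names and the equal-length string representation are assumptions made for this illustration.

```python
from collections import Counter

def consensus(assignments, non_regular="c"):
    """Per-residue consensus over several equal-length assignment strings.

    A state is kept if at least two methods agree on it; otherwise the
    residue is labelled as non-regular.
    """
    out = []
    for column in zip(*assignments):
        state, count = Counter(column).most_common(1)[0]
        out.append(state if count >= 2 else non_regular)
    return "".join(out)

def agreement(a, b):
    """Fraction of residues to which two methods assign the same class."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

For example, the three strings "hhhcee", "hhccec" and "hcceee" yield the consensus "hhccee".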
Although no systematic analysis has attempted to compare secondary structure assignment methods in terms of their consistency (see Introduction), DSSP continues to be the most widely used method. In fact, most prediction methods are based on DSSP assignments. Typically, the 8 DSSP states are converted into three classes using the following convention: [GHI] -> h, [EB] -> e, [TS' '] -> c.
Usually, 3₁₀-helices and beta-bridges constitute short secondary structure segments that have some structural similarity to alpha-helix and beta-strand, respectively. However, they do have different sequence characteristics. Prediction methods, in general, are more precise in the core of regular secondary structure segments than at the termini [70, 71]. Thus, 3₁₀-helices and beta-bridges are more difficult to predict than alpha-helices and beta-strands. Therefore an alternative conversion that has been used more recently yields a seemingly higher level of prediction accuracy: [H] -> h, [E] -> e, [GITS' '] -> c.
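The two conversions from eight DSSP states to three classes amount to simple lookup tables. In the sketch below the table and function names are illustrative. State 'B' is not listed in the second convention as given above; mapping it to 'c' there is an assumption.

```python
# Convention 1: [GHI] -> h, [EB] -> e, [TS' '] -> c
REDUCE_GHI_EB = {
    **dict.fromkeys("GHI", "h"),
    **dict.fromkeys("EB", "e"),
    **dict.fromkeys("TS ", "c"),
}

# Convention 2: [H] -> h, [E] -> e, everything else -> c
# ('B' -> 'c' is an assumption, see the lead-in).
REDUCE_H_E = {"H": "h", "E": "e", **dict.fromkeys("GIBTS ", "c")}

def reduce_states(dssp, table):
    """Convert a string of 8-state DSSP symbols into 3 classes."""
    return "".join(table[s] for s in dssp)
```

Applied to the string "HHHGGG EEB TS", the first table gives "hhhhhhceeeccc" and the second gives "hhhcccceecccc". The 3₁₀-helix and the bridge are counted as regular structure only under the first convention.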
Usually, NMR structures contain more than one model in a PDB file. By default, the available programs for DSSP and STRIDE read only the first model. Our recent work on extending secondary structure assignments to 'continuous secondary structure' (DSSPcont, see below), suggested that this simplification throws away important information.
The amino acids typically found in alpha-helices differ considerably from those found in beta-sheets (Fig. 8). Alanine and Leucine often occur in alpha-helices, while Proline and Glycine are rare. In beta-sheets Valine and Isoleucine are over-represented, while Glycine, Aspartic acid and Proline are under-represented. Shorter structures such as 3₁₀-helices and beta-bridges have distinct residue distributions. For 3₁₀-helices the Alanine and Leucine signal has disappeared; instead the sequences are dominated by Proline, which is often observed as a helix initiator and breaker. For beta-bridges, we no longer find a preference for Valine and Isoleucine. This indicates the role of the side chain in defining secondary and tertiary structure, an observation that can be built into new assignment methods (see below). In general, these preferences have long been the basis of secondary structure prediction methods [42, 72, 73, 43, 74, 75].

Fig. 8: Sequence distributions for secondary structure. The four graphs show alignment statistics for (A) alpha-helices, (B) 3₁₀-helices, (C) beta-sheets and (D) beta-bridges by the Kullback-Leibler information at positions surrounding the one assigned (position 0). The numbers of aligned segments are: (A) 41803, (B) 4952, (C) 27320, (D) 1851. These segments were retrieved from a data set of 707 non-homologous protein chains using the DSSP assignment. At a given position, we therefore observed the 20 amino acids with a certain frequency; the Kullback-Leibler information calculates the information content of the observed frequencies with respect to the background frequencies (irrespective of the structure). The more an observed set of frequencies differs from the background, the higher the respective letter. If an amino acid at a given position is observed less frequently than in the background, it is drawn upside-down and hollow.
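The Kullback-Leibler information used in Fig. 8 can be computed per alignment position as sketched below. The use of base-2 logarithms (bits), the dictionary representation of the frequencies and the function name are assumptions; how the letter heights in the figure were scaled is not specified in the text.

```python
import math

def kullback_leibler_bits(observed, background):
    """Kullback-Leibler information (in bits) of observed amino acid
    frequencies relative to background frequencies.

    Both arguments map one-letter amino acid codes to frequencies that
    sum to 1; unobserved amino acids (frequency 0) contribute nothing.
    """
    total = 0.0
    for aa, p in observed.items():
        if p > 0.0:
            total += p * math.log2(p / background[aa])
    return total
```

Identical distributions give zero information. In a toy alphabet of two equally likely letters, a position that shows only one of them carries exactly one bit.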
Super-secondary structure, such as Greek-key and Zinc-finger motifs [76], describes the interaction and position of a few secondary structure elements. The recently developed I-sites library [77] is a collection of structure motifs for small segments with specific amino acid propensities. The main idea was to mine the structure database for recurring structural motifs and assign them as individual I-sites along with an amino acid propensity matrix covering the segment in question. The I-sites procedure can be viewed as a data-driven assignment process that classifies segments into structure motifs in the range of secondary and super-secondary structure. It has achieved considerable success in predicting protein structure [78].
If we could predict helices accurately, it also seems possible to predict their tertiary arrangement [52] , i.e. the 3D structure. This gives rise to optimism, since secondary structure predictions are presently approaching a level of correctly predicting almost 80% of all residues in one of three classes: helix, strand, other [79, 80] . Most state-of-the-art prediction methods perform even better on a per-segment than on a per-residue basis. This is encouraging since the 'final assembly step' on the way from secondary structure to 3D structure is more sensitive to missing a helix than to getting the ends slightly wrong.
The physical basis for secondary structure formation has not yet been fully described. The backbone-backbone hydrogen bonds used to assign secondary structure do not involve the sidechains. Nevertheless, we observe strong preferences in the amino acids forming particular secondary structures. Simulating local interactions, Srinivasan & Rose found two competing forces that, taken together, explain this ostensible contradiction [81] . These are local attractive interactions - mainly hydrogen bonds - versus side chain conformational restrictions, which constitute the enthalpic and entropic contributions, respectively.
The standard method used to define line segments is to fit an axis through each secondary structure element (DSSP, STRIDE, DEFINE). This approach has difficulties, both with inconsistent definitions of secondary structure and with the problem of fitting a single straight line to a bent structure. STICK avoids these problems by finding a set of line segments independently of any external secondary structure definition [82] . This allows the segments to be used as a novel basis for secondary structure definition by taking the average rise/residue along each axis to characterise the segment. This practice has the advantage that secondary structures are described by a single (continuous) value that is not restricted to the conventional classes of alpha-helix, 3₁₀-helix, and beta-strand. This latter property allows structures without "classic" secondary structures to be encoded as line segments that can be used in comparison algorithms. When compared over a large number of pairs of homologous proteins, STICK was found to be slightly more consistent than a widely used method based on hydrogen bonds.
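The idea of characterising a segment by its average rise per residue can be illustrated with a short sketch. It is not the STICK algorithm: the axis is assumed to be given (finding it is the hard part that STICK addresses), and the idealised geometries below use approximate textbook values for the C-alpha radius, turn angle and rise. All names are hypothetical.

```python
import math

def rise_per_residue(ca_coords, axis):
    """Average displacement per residue of C-alpha atoms along a given axis (in Å)."""
    if len(ca_coords) < 2:
        raise ValueError("need at least two residues")
    norm = math.sqrt(sum(c * c for c in axis))
    unit = [c / norm for c in axis]
    # Project every C-alpha position onto the axis.
    projections = [sum(x * u for x, u in zip(point, unit)) for point in ca_coords]
    steps = [b - a for a, b in zip(projections, projections[1:])]
    return sum(steps) / len(steps)

def ideal_segment(n_residues, radius, turn_degrees, rise):
    """C-alpha trace of an idealised regular segment wound around the z-axis."""
    return [
        (radius * math.cos(math.radians(i * turn_degrees)),
         radius * math.sin(math.radians(i * turn_degrees)),
         i * rise)
        for i in range(n_residues)
    ]

z_axis = (0.0, 0.0, 1.0)
# Approximate textbook geometries (assumptions, in Å and degrees).
alpha_helix = ideal_segment(12, radius=2.3, turn_degrees=100.0, rise=1.5)
helix_3_10 = ideal_segment(12, radius=1.9, turn_degrees=120.0, rise=2.0)
beta_strand = ideal_segment(12, radius=1.0, turn_degrees=180.0, rise=3.3)

for name, segment in [("alpha", alpha_helix), ("3-10", helix_3_10), ("beta", beta_strand)]:
    print(name, round(rise_per_residue(segment, z_axis), 2))
```

Because the result is a single continuous number, a bent or irregular segment simply yields an intermediate value instead of being forced into one of the conventional classes.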
"Good" secondary structure assignments are those that differ only between regions of protein structure not conserved between different NMR models for the same protein or between close homologues, and that distinguish between regions of thermal motion and less flexible regions [3] .
This concept led us to develop a continuous extension of DSSP [3] . This continuous assignment is based upon multiple runs of DSSP with different hydrogen bond thresholds. We then compile a weighted average over the individual DSSP assignments to assign secondary structure to each residue. We determined the weights by applying the above criterion for 'good' assignments, starting with structural homologues from the FSSP [83] database. Inspecting the structural alignments in detail, we noted a number of possible reasons for observed structural differences:
Our objective was a secondary structure assignment method de-emphasising the effects of 1-3 while capturing differences caused by sequence changes. However, for structural alignments of homologues, we cannot separate these effects, as illustrated by a comparison between two related structures: periplasmic binding protein (PDB: 4mbp [84] ) and putrescine binding protein (PDB: 1pot [85] ). The structural alignment was obtained from FSSP with a Z-score of 23.2 and an RMSD of 3.6 Å over 303 residues. We will focus on a small ten-residue segment ( Fig. 9 A) that has spiralling structure (alpha-helix, 3₁₀-helix or turn) and a beta-bridge at the penultimate position. Based on the assignment alone, one might characterise the differences as problems in the assignment process, since both segments have 3₁₀-helix hydrogen bonds over the entire stretch. On the other hand, 1pot has no alpha-helix hydrogen bonds, which results in the assignment of 3₁₀-helix. The results from three high-quality prediction methods ( Fig. 9 B) suggest that the structural differences resulted from the sequence divergence. This means that the secondary structure assignments of the two segments should not necessarily be the same. This line of reasoning can be extended from short helices to short sheets and to the N- or C-terminal ends of helices and strands (caps). Therefore, we chose to optimise the weights for DSSPcont based on the comparisons between different NMR models for the same protein.
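The weighted averaging over several DSSP runs can be sketched as follows. The sketch assumes that DSSP has already been run once per hydrogen bond threshold and that each run is available as a string of one-letter states; it does not call DSSP itself. The example strings and weights are made up for illustration and are not the optimised DSSPcont weights; the function names are hypothetical.

```python
from collections import defaultdict

def continuous_assignment(runs, weights):
    """Combine discrete assignments from several runs into per-residue state weights.

    runs    -- equal-length strings of one-letter states, one per threshold
    weights -- one non-negative weight per run (normalised internally)
    Returns one dict per residue mapping state -> fraction of the total weight.
    """
    if not runs or len(runs) != len(weights):
        raise ValueError("need one weight per run")
    length = len(runs[0])
    if any(len(run) != length for run in runs):
        raise ValueError("all runs must cover the same residues")
    total = float(sum(weights))
    profile = []
    for i in range(length):
        states = defaultdict(float)
        for run, weight in zip(runs, weights):
            states[run[i]] += weight / total
        profile.append(dict(states))
    return profile

def most_likely_states(profile):
    """Collapse the continuous profile back to one discrete state per residue."""
    return "".join(max(states, key=states.get) for states in profile)

# Toy input (assumption): three runs with increasingly permissive thresholds.
runs = ["-HHHH-", "-HHH--", "--GGG-"]
weights = [0.5, 0.3, 0.2]
profile = continuous_assignment(runs, weights)
for residue, states in enumerate(profile):
    print(residue, {s: round(w, 2) for s, w in states.items()})
print(most_likely_states(profile))
```

Residues on which all runs agree receive a single state with weight 1, whereas residues at segment ends typically receive a mixture.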

Fig. 9: DSSP assignments for similar structures: 4mbp and 1pot. (A) The DSSP assignment for two segments taken from two structurally similar proteins (periplasmic binding protein 4mbp [84] and putrescine binding protein 1pot [85] ) illustrates that the observed differences between these segments may originate from sequence differences. The boxed letters shown in the column next to the amino acid sequence give the final DSSP assignment: G = 3₁₀-helix, H = alpha-helix, T = turn, B = beta-bridge, and S = bend. The next column shows the hydrogen bonds (>: hydrogen bond acceptor, <: hydrogen bond donor and X: both), with indications of the hydrogen bond length, i.e. i → i+(3,4) for 3₁₀- and alpha-helices, respectively. (B) All the predictions from PSIPRED [87] , SSpro [88] and PROFphd [89, 90] (see chapter 28) correctly spot the alpha-helix signal in 4mbp, while missing this signal for 1pot. This may indicate that the altered sequence changed the structure significantly in this region. Here, 'h' refers to the DSSP class helix (H or G) and 'c' to the DSSP non-regular class. Note: the predictions are cut out from those for the entire protein.
Our work on DSSPcont is still in progress. In the following, we briefly summarise a few of the important results (for more details see [3] ). We found that the single-residue RMSD between models of high-quality NMR structures correlated well with thermal fluctuations in water as independently measured by the order parameter. The resulting continuous DSSP assignments were constructed to reflect the differences between NMR models of the same protein, so that the assignments reflect segments with thermal fluctuations ( Fig. 10 ). This means that the more a sequence segment fluctuates, the lower the probability for the assigned helix/sheet becomes. Information of this type can also be obtained directly from crystal structures. Overall, we found that the continuous assignment of secondary structure reflected the average occupancy of secondary structure assignments. In particular, our continuous assignment for a single NMR structure is similar to the average obtained over all models. This may indicate that for short intervals of time the concept of 'discrete secondary structure states' is appropriate. However, thermal fluctuations change these states slightly. Hence, the average we actually observe is a continuous secondary structure. Our analysis also indicated that most secondary structure regions are crystal solid!
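A single-residue RMSD between NMR models can be computed in several ways; the following minimal sketch uses one simple possibility, the root-mean-square deviation of each residue from its mean position across the ensemble. It assumes that the models have already been superimposed and that one representative atom (e.g. C-alpha) per residue is used. The function name and the toy coordinates are hypothetical, and this is not necessarily the exact definition used in [3] .

```python
import math

def per_residue_rmsd(models):
    """RMS deviation of each residue from its mean position across models.

    models -- list of models; each model is a list of (x, y, z) tuples,
              one per residue, already superimposed on a common frame.
    """
    n_models = len(models)
    if n_models < 2:
        raise ValueError("need at least two models")
    n_residues = len(models[0])
    if any(len(model) != n_residues for model in models):
        raise ValueError("all models must have the same number of residues")
    result = []
    for i in range(n_residues):
        mean = [sum(model[i][k] for model in models) / n_models for k in range(3)]
        mean_sq_dev = sum(
            sum((model[i][k] - mean[k]) ** 2 for k in range(3))
            for model in models
        ) / n_models
        result.append(math.sqrt(mean_sq_dev))
    return result

# Toy ensemble (assumption): residue 0 is rigid, residue 1 moves by 2 Å.
models = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
]
print(per_residue_rmsd(models))  # [0.0, 1.0]
```

Residues with large values in such a profile are the candidates for a lowered helix/strand weight in a continuous assignment.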

Fig. 10: Protein motion and secondary structure. Using one set of coordinates from an ensemble of NMR models, the continuous DSSP assignment reproduces the segments in proteins that experimentally had a high degree of motion due to thermal fluctuations in water. Figure reproduced from [3] . Protein motion has been independently measured by the order parameter 1-S², i.e. by the tumbling of the N-H backbone bond vector. 1-S² is low when the amino acid is fixed, as in the protein core, and it is high when the residue fluctuates. 1-S² is shown versus the continuous DSSP assignment grouping helices (GHI) and strands (EB). The points are averages over a window segment of three consecutive residues; the line gives an average of helix/strand assignments.
Assigning secondary structure from 3D co-ordinates is an important problem. Many successful solutions have been proposed over the last 20 years. One of the oldest solutions is DSSP. There are many reasons why that program has become the standard in the field. In fact, secondary structure assignment may be one of the rare examples of tools in structural biology and bioinformatics that have not been revolutionised by the recent explosion of data. For most residues, most of the available methods agree in their assignment. Methods tend to differ mainly in locating the ends of regular secondary structure segments and in distinguishing between more subtle differences (e.g. alpha-, 3₁₀-, or pi-helix). To oversimplify the data: the residues for which the assignment methods differ tend to be the residues for which structural homologues or different NMR models differ, too. This idea has recently led to a number of new concepts for the assignment task. Two of these new concepts introduce the idea of continuous secondary structure assignments (STICK and DSSPcont). This novel interpretation of secondary structure breaks with the early idea that there are secondary structure 'states'. Since accurate secondary structure assignments are at the base of accurate comparisons between structures and predictions of protein structure, the story will continue.
Thanks to Jinfeng Liu (CUBIC, Columbia) for computer assistance and to Søren Brunak (CBS, Copenhagen) and Phil Borne (SCRIPPS) for helpful comments on the manuscript. The work of BR was supported by the grants 1-P50-GM62413-01 and RO1-GM63029-01 from the National Institutes of Health. Last but not least, thanks to all those who deposit their experimental data in public databases, and to those who maintain these databases.
| 1. | Richards, F. M. & Kundrot, C. E. (1988). Identification of structural motifs from protein coordinate data: secondary structure and first-level supersecondary structure. Proteins, 3, 71-84. |
| 2. | Kabsch, W. & Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen bonded and geometrical features. Biopolymers, 22, 2577-2637. |
| 3. | Andersen, C. A. F., Palmer, A. G., Brunak, S. & Rost, B. (2001). Continuous secondary structure assignment correlates with protein flexibility. Structure, submitted. |
| 4. | Sklenar, H., Etchebest, C. & Lavery, R. (1989). Describing protein structure: a general algorithm yielding complete helicoidal parameters and a unique overall axis. Proteins, 6, 46-60. |
| 5. | Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N. et al. (2000). The Protein Data Bank. Nucl. Acids Res., 28, 235-242. |
| 6. | Frishman, D. & Argos, P. (1995). Knowledge-based protein secondary structure assignment. Proteins, 23, 566-579. |
| 7. | Lesk, A. M. & Rose, G. D. (1981). Folding units in globular proteins. Proc. Natl. Acad. Sci. USA, 78, 4304-4308. |
| 8. | Hogue, C. W. & Bryant, S. H. (1998). Structure databases. Methods Biochem Anal, 39, 46-73. |
| 9. | Marchler-Bauer, A., Addess, K. J., Chappey, C., Geer, L., Madej, T. et al. (1999). MMDB: Entrez's 3D structure database. Nucl. Acids Res., 27, 240-243. |
| 10. | Orengo, C. A., Pearl, F. M., Bray, J. E., Todd, A. E., Martin, A. C. et al. (1999). The CATH Database provides insights into protein structure/function relationships. Nucl. Acids Res., 27, 275-279. |
| 11. | Lo Conte, L., Ailey, B., Hubbard, T. J., Brenner, S. E., Murzin, A. G. et al. (2000). SCOP: a structural classification of proteins database. Nucl. Acids Res., 28, 257-259. |
| 12. | Yang, A. S. & Honig, B. (2000). An integrated approach to the analysis and modeling of protein sequences and structures. I. Protein structural alignment and a quantitative measure for protein structural distance. J. Mol. Biol., 301, 665-678. |
| 13. | Pearl, F. M., Martin, N., Bray, J. E., Buchan, D. W., Harrison, A. P. et al. (2001). A rapid classification protocol for the CATH domain database to support structural genomics. Nucl. Acids Res., 29, 223-227. |
| 14. | Lesk, A. M. (1991). Protein Architecture - A Practical Approach. Oxford University Press, Oxford, New York, Tokyo. |
| 15. | Murzin, A. G. (1996). Structural classification of proteins: new superfamilies. Curr. Opin. Str. Biol., 6, 386-394. |
| 16. | Przytycka, T., Aurora, R. & Rose, G. D. (1999). A protein taxonomy based on secondary structure. Nat. Struct. Biol., 6, 672-682. |
| 17. | Young, M., Kirshenbaum, K., Dill, K. A. & Highsmith, S. (1999). Predicting conformational switches in proteins. Prot. Sci., 8, 1752-1764. |
| 18. | Sternberg, M. J., Bates, P. A., Kelley, L. A. & MacCallum, R. M. (1999). Progress in protein structure prediction: assessment of CASP3. Curr. Opin. Str. Biol., 9, 368-73. |
| 19. | Marti-Renom, M. A., Stuart, A., Fiser, A., Sanchez, R., Melo, F. et al. (2000). Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct., 29, 291-325. |
| 20. | Sauder, J. M., Arthur, J. W. & Dunbrack Jr, R. L. (2000). Large-scale comparison of protein sequence alignment algorithms with structure alignments. Proteins, 40, 6-22. |
| 21. | Rost, B. (1995). TOPITS: Threading One-dimensional Predictions Into Three-dimensional Structures. In Third International Conference on Intelligent Systems for Molecular Biology (Rawlings, C., Clark, D., Altman, R., Hunter, L., Lengauer, T. et al., eds.), pp. 314-321, Menlo Park, CA: AAAI Press, Cambridge, England. |
| 22. | Sippl, M. J. (1995). Knowledge-based potentials for proteins. Curr. Opin. Str. Biol., 5, 229-235. |
| 23. | Fischer, D. & Eisenberg, D. (1996). Fold recognition using sequence-derived properties. Prot. Sci., 5, 947-955. |
| 24. | Russell, R. B., Copley, R. R. & Barton, G. J. (1996). Protein fold recognition by mapping predicted secondary structures. J. Mol. Biol., 259, 349-365. |
| 25. | Sippl, M. J. & Floeckner, H. (1996). Threading thrills and threats. Structure, 4, 15-19. |
| 26. | Rice, D. W. & Eisenberg, D. (1997). A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence. J. Mol. Biol., 267, 1026-1038. |
| 27. | Rost, B., Schneider, R. & Sander, C. (1997). Protein fold recognition by prediction-based threading. J. Mol. Biol., 270, 471-480. |
| 28. | Jaroszewski, L., Rychlewski, L., Zhang, B. & Godzik, A. (1998). Fold prediction by a hierarchy of sequence, threading, and modeling methods. Prot. Sci., 7, 1431-1440. |
| 29. | de la Cruz, X. & Thornton, J. M. (1999). Factors limiting the performance of prediction-based fold recognition methods. Prot. Sci., 8, 750-759. |
| 30. | Di Francesco, V., Munson, P. J. & Garnier, J. (1999). FORESST: fold recognition from secondary structure predictions of proteins. Bioinformatics, 15, 131-140. |
| 31. | Jones, D. T. (1999). GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J. Mol. Biol., 287, 797-815. |
| 32. | Jones, D. T., Tress, M., Bryson, K. & Hadley, C. (1999). Successful recognition of protein folds using threading methods biased by sequence similarity and predicted secondary structure. Proteins, 37, 104-111. |
| 33. | Kolinski, A., Rotkiewicz, P., Ilkowski, B. & Skolnick, J. (1999). A method for the improvement of threading-based protein models. Proteins, 37, 592-610. |
| 34. | Xu, Y., Xu, D., Crawford, O. H., Einstein, J. R., Larimer, F. et al. (1999). Protein threading by PROSPECT: a prediction experiment in CASP3. Prot. Engin., 12, 899-907. |
| 35. | Rost, B. (1998). Marrying structure and genomics. Structure, 6, 259-263. |
| 36. | Sali, A. (1998). 100,000 protein structures for the biologist. Nat. Struct. Biol., 5, 1029-32. |
| 37. | Burley, S. K., Almo, S. C., Bonanno, J. B., Capel, M., Chance, M. R. et al. (1999). Structural genomics: beyond the human genome project. Nat. Gen., 23, 151-157. |
| 38. | Blundell, T. L. & Mizuguchi, K. (2000). Structural genomics: an overview. Prog Biophys Mol Biol, 73, 289-295. |
| 39. | Shapiro, L. & Harris, T. (2000). Finding function through structural genomics. Curr. Opin. Biotech., 11, 31-35. |
| 40. | Liu, J. & Rost, B. (2001). Comparing function and structure between entire proteomes. Prot. Sci., 10, 1970-1979. |
| 41. | Liu, J. & Rost, B. (2001). Target space for structural genomics revisited. Bioinformatics, submitted. |
| 42. | Schulz, G. E. (1988). A critical evaluation of methods for prediction of protein secondary structures. Annu. Rev. Biophys. Biophys. Chem., 17, 1-21. |
| 43. | Barton, G. J. (1995). Protein secondary structure prediction. Curr. Opin. Str. Biol., 5, 372-376. |
| 44. | Lupas, A. (1996). Coiled coils: new structures and new functions. TIBS, 21, 375-382. |
| 45. | Rost, B. & Sander, C. (1996). Bridging the protein sequence-structure gap by structure predictions. Annu. Rev. Biophys. Biomol. Struct., 25, 113-136. |
| 46. | Rost, B. & O'Donoghue, S. I. (1997). Sisyphus and prediction of protein structure. CABIOS, 13, 345-356. |
| 47. | Rost, B. (2001). Protein secondary structure prediction continues to rise. J. Struct. Biol., 134, 204-218. |
| 48. | Jones, D. T., Orengo, C. A. & Thornton, J. M. (1996). Protein folds and their recognition from sequence. In Protein structure prediction (Sternberg, M. J. E., eds.), pp. 173-206, Oxford Univ. Press, Oxford. |
| 49. | Finkelstein, A. V. (1997). Protein structure: what is it possible to predict now? Curr. Opin. Str. Biol., 7, 60-71. |
| 50. | Xu, H., Aurora, R., Rose, G. D. & White, R. H. (1999). Identifying two ancient enzymes in Archaea using predicted secondary structure alignment. Nat. Struct. Biol., 6, 750-4. |
| 51. | Lindahl, E. & Elofsson, A. (2000). Identification of related proteins on family, superfamily and fold level. J. Mol. Biol., 295, 613-625. |
| 52. | Fain, B. & Levitt, M. (2001). A novel method for sampling alpha-helical protein backbones. J. Mol. Biol., 305, 191-201. |
| 53. | Jennings, A. J., Edge, C. M. & Sternberg, M. J. (2001). An approach to improving multiple alignments of protein sequences using predicted secondary structure. Prot. Engin., 14, 227-231. |
| 54. | Pauling, L., Corey, R. B. & Branson, H. R. (1951). The Structure of Proteins: Two Hydrogen-bonded Helical Configurations of the Polypeptide Chain. Proc. Natl. Acad. Sci. U.S.A., 37, 205-234. |
| 55. | Pauling, L. & Corey, R. B. (1951). Configurations of Polypeptide Chains with Favored Orientations Around Single Bonds: Two New Pleated Sheets. Proc. Natl. Acad. Sci. U.S.A., 37, 729-740. |
| 56. | Andersen, C. A. F. (2001). Protein structure and the diversity of hydrogen bonds. The Technical University of Denmark, Ph.D. Thesis. |
| 57. | Kabsch, W. & Sander, C. (1983). How good are predictions of protein secondary structure? FEBS Lett., 155, 179-182. |
| 58. | Kabsch, W. & Sander, C. (1983). Segment83. unpublished. |
| 59. | Pauling, L. (1939). The nature of the chemical bond. Cornell University Press, New York. |
| 60. | Creighton, T. (1993). Proteins: structures and molecular properties. W. H. Freeman, New York. |
| 61. | Hvidt, A. & Westh, P. (1998). Different views on the stability of protein conformations, and hydrophobic effects. J. Solution Chem., 27, 395-402. |
| 62. | Baker, E. N. & Hubbard, R. E. (1984). Hydrogen bonding in globular proteins. Prog. Biophys. molec. Biol., 44, 97-179. |
| 63. | Bordo, D. & Argos, P. (1994). The role of side-chain hydrogen bonds in the formation and stabilization of secondary structure in soluble proteins. J. Mol. Biol., 243, 504-519. |
| 64. | Jeffrey, G. A. & Saenger, W. (1994). Hydrogen bonding in biological structures. Springer, Berlin. |
| 65. | Boobbyer, D. N., Goodford, P. J., McWhinnie, P. M. & Wade, R. C. (1989). New hydrogen-bond potentials for use in determining energetically favorable binding sites on molecules of known structure. J Med Chem, 32, 1083-1094. |
| 66. | Wade, R. C., Clark, K. J. & Goodford, P. J. (1993). Further development of hydrogen bond functions for use in determining energetically favorable binding sites on molecules of known structure. 1. Ligand probe groups with the ability to form two hydrogen bonds. J Med Chem, 36, 140-147. |
| 67. | Ramachandran, G. N. & Sasisekharan, V. (1968). Conformation of polypeptides and proteins. Adv. Prot. Chem., 23, 284-438. |
| 68. | Wilmot, C. M. & Thornton, J. M. (1990). Turns and their distortions: a proposed new nomenclature. Protein Engin., 3, 479-494. |
| 69. | Colloc'h, N., Etchebest, C., Thoreau, E., Henrissat, B. & Mornon, J.-P. (1993). Comparison of three algorithms for the assignment of secondary structure in proteins: the advantages of a consensus assignment. Prot. Engin., 6, 377-382. |
| 70. | Rost, B. & Sander, C. (1994). 1D secondary structure prediction through evolutionary profiles. In Protein Structure by Distance Analysis (Bohr, H. & Brunak, S., eds.), pp. 257-276, IOS Press, Amsterdam, Oxford, Washington. |
| 71. | Cuff, J. A. & Barton, G. J. (1999). Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins, 34, 508-519. |
| 72. | Fasman, G. D. (1989). The development of the prediction of protein structure. In Prediction of protein structure and the principles of protein conformation (Fasman, G. D., eds.), pp. 193-303, Plenum Press, New York, London. |
| 73. | Richardson, J. S. & Richardson, D. C. (1989). Principles and patterns of protein conformation. In Prediction of protein structure and the principles of protein conformation (Fasman, G. D., eds.), pp. 1-98, Plenum Press, New York, London. |
| 74. | Rost, B. & Sander, C. (2000). Third generation prediction of secondary structure. Methods in Molecular Biology, 143, 71-95. |
| 75. | Rost, B. (2001). Rising accuracy of protein secondary structure prediction. In Protein structure determination, analysis, and modeling for drug discovery (Chasman, D., eds.), pp. in press, Dekker, New York. |
| 76. | Brändén, C. & Tooze, J. (1991). Introduction to Protein Structure. Garland Publ., New York, London. |
| 77. | Bystroff, C. & Baker, D. (1998). Prediction of local structure in proteins using a library of sequence-structure motifs. J. Mol. Biol., 281, 565-577. |
| 78. | Lesk, A. M., Lo Conte, L. & Hubbard, T. J. P. (2001). Assessment of novel folds targets in CASP4: Predictions of three-dimensional structures, secondary structures, and interresidue contacts. Proteins, in press. |
| 79. | Eyrich, V., Martí-Renom, M. A., Przybylski, D., Fiser, A., Pazos, F. et al. (2001). EVA: continuous automatic evaluation of protein structure prediction servers. Bioinformatics, in press. |
| 80. | Rost, B. & Eyrich, V. (2001). EVA: large-scale analysis of secondary structure prediction. Proteins, in press. |
| 81. | Srinivasan, R. & Rose, G. D. (1999). A physical basis for protein secondary structure. Proc. Natl. Acad. Sci. U.S.A., 96, 14258-14263. |
| 82. | Taylor, W. R. (2001). Defining linear segments in protein structure. J. Mol. Biol., 310, 1135-1150. |
| 83. | Holm, L. & Sander, C. (1998). Touring protein fold space with Dali/FSSP. Nucl. Acids Res., 26, 318-321. |
| 84. | Quiocho, F. A., Spurlino, J. C. & Rodseth, L. E. (1997). Extensive features of tight oligosaccharide binding revealed in high-resolution structures of the maltodextrin transport/chemosensory receptor. Structure, 5, 997. |
| 85. | Sugiyama, S., Matsuo, Y., Maenaka, K., Vassylyev, D. G., Matsushima, M. et al. (1996). The 1.8-Å X-ray structure of the Escherichia coli PotD protein complexed with spermidine and the mechanism of polyamine binding. Prot. Sci., 5, 1984-1990. |
| 86. | Teeter, M. M. (1984). Water structure of a hydrophobic protein at atomic resolution. Pentagon rings of water molecules in crystals of crambin. Proc. Natl. Acad. Sci. U.S.A., 81, 6014. |
| 87. | Jones, D. T. (1999). Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195-202. |
| 88. | Baldi, P., Brunak, S., Frasconi, P., Soda, G. & Pollastri, G. (1999). Exploiting the past and the future in protein secondary structure prediction. Bioinformatics, 15, 937-946. |
| 89. | Rost, B. (1996). PHD: predicting one-dimensional protein structure by profile based neural networks. Meth. Enzymol., 266, 525-539. |
| 90. | Rost, B. (2001). Predicting protein structure: better data, better results! J. Mol. Biol., in submission. |
| Contact: rost@columbia.edu | Version: Nov 28, 2001 |