On the evolutionary conservation of hydrogen bonds made by buried polar amino acids: the hidden joists, braces and trusses of protein architecture

Background
The hydrogen bond patterns between mainchain atoms in protein structures not only give rise to regular secondary structures but also satisfy mainchain hydrogen bond potential. However, not all mainchain atoms can be satisfied through hydrogen bond interactions that arise in regular secondary structures; in some locations sidechain-to-mainchain hydrogen bonds are required to provide polar group satisfaction. Buried polar residues that are hydrogen-bonded to mainchain amide atoms tend to be highly conserved within protein families, confirming that mainchain architecture is a critical restraint on the evolution of proteins. We have investigated the stabilizing roles of buried polar sidechains on the backbones of protein structures by performing an analysis of solvent inaccessible residues that are entirely conserved within protein families and superfamilies and hydrogen bonded to an equivalent mainchain atom in each family member.

Results
We show that polar and sometimes charged sidechains form hydrogen bonds to mainchain atoms in the cores of proteins in a manner that has been conserved in evolution. Although particular motifs have previously been identified where buried polar residues have conserved roles in stabilizing protein structure, for example in helix capping, we demonstrate that such interactions occur in a range of architectures and highlight those polar amino acid types that fulfil these roles. We show that these buried polar residues often span elements of secondary structure and provide stabilizing interactions of the overall protein architecture.

Conclusions
Conservation of buried polar residues and the hydrogen-bond interactions that they form implies an important role for maintaining protein structure, contributing strong restraints on amino acid substitutions during divergent protein evolution. Our analysis sheds light on the important stabilizing roles of these residues in protein architecture and provides further insight into factors influencing the evolution of protein families and superfamilies.


Background
As Pauling and Corey realised, satisfaction of the hydrogen bonding potential of polypeptide mainchain functions is one of the major factors that give rise to the β-strand and α-helix [1,2]. These regular elements of secondary structure give their names to the main features of protein structure: classical β-sheets, α-helical bundles, the αβ-Rossmann fold, the αβ-barrel and many others. Hydrogen bonding also plays important roles in the intricate and sometimes elaborate arches and turns which link α-helices and β-strands [3][4][5].
However, these elegant architectures still leave many mainchain functions unsatisfied in their potential to form hydrogen bonds: an early survey of hydrogen bonding in proteins revealed that ~40% of mainchain atoms do not form hydrogen bonds with other mainchain atoms [6]. In general these occur in four different circumstances: (1) Where strands and helices terminate, requiring "capping" [6][7][8][9][10].
(3) In polyproline or irregular, twisted strands [15,16]. (4) In arches and turns [3][4][5][17,18]. Water molecules or sidechains can usually satisfy the hydrogen bonding potential of mainchain functions that are at the protein surface in a variety of ways and so the residues are often substituted in evolution. However, in the smaller proportion of functions that must be satisfied from the core of the protein, this is achieved by buried sidechains of polar residues.
Analysis of the substitution patterns of amino acids within homologous protein families has revealed that buried polar residues that are hydrogen-bonded to mainchain amide atoms are highly conserved, more so than those polar residues forming hydrogen bonds to mainchain carbonyl atoms or other sidechains [19,20]. Furthermore, analysis of the median sequence entropy of buried amino acid residues has shown that buried polar sidechains, for which the hydrogen bond capacity is satisfied, are the most conserved amino acid residues within proteins [21]. The number of hydrogen bonds to mainchain amide groups also influences the conservation of buried satisfied polar residues, with those forming two or more being significantly more conserved than those forming only one or none [21]. Together, these results imply that the hydrogen bond functions maintained by these conserved buried polar groups have an important role in maintaining protein architecture. Figure 1 shows an example of conservation of sequence and local environment for the beta/gamma crystallin family. In the crystallins, the hydrogen bonds provided by a buried and conserved serine help to stabilize a β-hairpin structure; this is the serine that recurs in each of the four domains of β and γ crystallins and is part of the signature motif that has allowed recognition of distant homologues [22].
Previous in silico analyses of the stabilizing roles that polar sidechains have on the backbone of protein structures have tended to focus on a particular architectural context [13,23,24]. Bordo and Argos [25] identified recurring patterns and amino acid types involved in sidechain-to-sidechain and sidechain-to-mainchain interactions. However, the conservation of polar residues and the three-dimensional (3D) arrangements of the sidechain-to-mainchain hydrogen bonds were not considered. What then are the features of sidechain-to-mainchain hydrogen bonds formed by polar sidechains? Which amino acids are involved? What kinds of structures do these buried polar residues maintain? Are they local to a secondary structure or do they link between different helices and strands, stabilizing tertiary structure?
In this report we focus purely on buried polar residues that are entirely conserved within protein families and superfamilies, hydrogen bonding to a mainchain atom in each family member. We hypothesise that such buried sidechain-to-mainchain hydrogen bonds satisfy mainchain hydrogen bonding potential where secondary structures cannot be formed, and in so doing become irreplaceable elements of the overall architecture. In order to test this hypothesis we characterize the nature and tertiary structural context of these conserved and buried polar residues. We show that polar sidechains which bridge to mainchain functions in the cores of proteins have conserved tertiary structural roles in homologues. Like the elements of secondary structure, they are born of the need to satisfy hydrogen bonding but, in achieving this, they become key, conserved structural features of many well-known protein architectures. Some are joists or braces, spanning the helices and strands, while others form truss-like structures that support complex loop structures (Figure 2).

Results and Discussion
Buried polar residues stabilizing protein architecture through conserved interactions
In HOMSTRAD [26], a database of structurally aligned families, 143 families have five or more members with high resolution structures, 131 of which are non-redundant, i.e. their sequence alignments do not overlap (see Additional file 1, Table S1). Of these, 65 have conserved and buried polar residues, providing a total of 233 alignment positions where the equivalent residue in each structure forms a hydrogen bond through its sidechain to a mainchain atom (see Additional file 2, Table S2). The frequencies of occurrence of the polar amino acids at these 233 alignment positions are shown in Table 1. We have examined the propensity with which such conserved and buried polar residues participate in various architectural motifs (Table 2). We have focused on interactions that are conserved in families, on the assumption that these have had a selective advantage and may teach us about important factors that determine protein architectures.
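The propensities reported in Table 2 compare how often each polar residue type takes part in a given motif. The exact formula is not given in this section, so the following is only a minimal sketch under an assumed, commonly used definition: the frequency of a residue type within a motif divided by its frequency in a background set. The function name `propensity` and both input sequences are hypothetical.

```python
from collections import Counter


def propensity(motif_residues, background_residues):
    """Frequency of each residue type in a motif relative to a background set.

    ASSUMPTION: this ratio is a generic propensity definition chosen for
    illustration; it is not taken from the paper.
    """
    motif = Counter(motif_residues)
    background = Counter(background_residues)
    n_motif = sum(motif.values())
    n_background = sum(background.values())
    if n_motif == 0 or n_background == 0:
        raise ValueError("both residue sets must be non-empty")
    result = {}
    for aa, n_bg in background.items():
        # (n_aa_motif / n_motif) / (n_aa_bg / n_background), as a single division
        result[aa] = (motif[aa] * n_background) / (n_motif * n_bg)
    return result


# Toy data: 4 Cys and 6 Asp in the motif; 10 Cys and 90 Asp in the background.
example = propensity("CCCCDDDDDD", "C" * 10 + "D" * 90)
```

Under this definition a value above 1 indicates that a residue type is over-represented in the motif relative to the background, and a value below 1 that it is under-represented.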

Interactions with the N-terminal regions of α-helices
For conserved and buried polar residues making hydrogen bonds to mainchain NH functions in the N-terminal regions of α-helices, cysteine has the highest propensity to form such interactions, followed by negatively charged aspartate, histidine and glutamate (Table 2 and see Additional file 3, Figure S1A, grey bars); surprisingly, neutral residues such as serine, threonine and asparagine have higher propensities when solvent accessible positions are considered (Table 2 and see Additional file 3, Figure S1A, white bars) [8,27,28]. This may reflect the importance of the charged hydrogen bond in regions of low dielectric strength, as well as its interaction with the helix dipole [29].
Local capping effects of buried aspartates occurring either upstream (Figure 3A-B) or downstream (Figure 3C-D) from their hydrogen bonded partner have been well described, but less attention has been paid to aspartates that are hydrogen bonded to the N-terminal residue of a helix via a distant interaction (Figure 3B, E-F), providing structures that often resemble joists. Similar hydrogen bonded interactions are made by cysteines with N-terminal residues, except that cysteine mostly occurs upstream (Figure 3D, G) or interacts distantly (Figure 3H) and is rarely observed to occur downstream from the N-terminal residue (Figure 3I).
Figure 1 (partial caption) ... 1bd7]. The conservation of these sidechain-to-mainchain interactions implies that they have an important role in the mainchain architecture of these proteins. G) Selected regions of a multiple sequence alignment of the β/γ crystallins containing the four conserved and buried serine residues (highlighted by red stars). The local structural environment of each residue in the alignment is displayed using JOY annotation [32]. Pictures of protein structures were produced using PyMOL and clipping was used for improving figure clarity [35].

Interactions with the C-terminal regions of α-helices
In a similar way to the aspartates that interact with N-terminal regions of α-helices, the charged residue arginine has the highest propensity to form capping interactions that are both conserved and buried at the C-termini of α-helices, while at the same time compensating for the helix dipole (Table 2 and see Additional file 3, Figure S1B, grey bars). Interestingly, all conserved, buried arginine residues that interact with C-terminal residues do so distantly (Figure 4), often with the arginine itself also being found within a capping region of a different helix (Figure 4D-F). This feature often occurs when the C-termini of multiple helices are aligned (Figure 4D-F), no doubt providing favourable interactions with the negative helix dipoles by helping to offset charge repulsion between two or more helix C-termini.

Interactions with edge strands
The polar amino acids with the highest propensities for interacting with edge strands are arginine, asparagine, glutamine and cysteine (Table 2 and see Additional file 3, Figure S2A, white bars). However, of conserved, buried polar residues making hydrogen bonds to mainchain atoms in edge strands, tyrosine has the highest propensity to form such an interaction, followed by cysteine, arginine, asparagine and threonine, although the propensities are rather low (Table 2 and

Interactions from within edge strands
Arginine has the highest propensity to form hydrogen bonds to mainchains from within edge strands, followed by tyrosine and threonine (Table 2 and see Additional file 3, Figure S2B, white bars). However, amongst conserved and buried residues within edge strands, tryptophan has the highest propensity, followed by glutamine, histidine and asparagine (Table 2 and see Additional file 3, Figure S2B, grey bars). Asparagine and tryptophan often interact (locally) with regions connecting regular secondary structures, e.g. β-turns and β-hairpins (Figure 6A-C), while glutamine can bridge the gap between two strands in β-barrel structures (Figure 6E-F).

Interactions with centre strands
The mainchains in centre strands are sometimes unable to form hydrogen bonds with the neighbouring strand. Examples include where two strands curve away from each other (Figure 7A), where the neighbouring strand is shorter than the central strand in question (Figure 7B-E), or where the mainchain atom is at the terminus of a strand (Figure 7B,C,E) or part of a β-barrel (Figure 6E). These mainchain functions are often satisfied by sidechain hydrogen bonds. Of polar residues that are conserved and buried and carrying out this role, cysteine, glutamine, threonine, asparagine and serine have the highest propensity to form such interactions (Table 2 and see Additional file 3, Figure S2C, grey bars). In some cases the sidechains act as "braces"; for example, the threonines of the conserved Asp-Thr-Gly triplet of the aspartic proteinases, where the strands diverge after the threonine on either side of the pseudo dyad in the eukaryotic enzymes or the dyad of the dimeric retroviral enzymes (Figure 7F).
Figure 2 Schematic diagrams of: A) a joist that spans two columns and supports the roof above; B) vertical K-bracing, which is used to provide stability to walls; C) tri-bearing, D) Polynesian and E) double cantilever trusses, which are used to support structures such as roofs and bridges.
Table 1 (note) The analysis only considered conserved positions containing polar amino acids; therefore the frequency of occurrence for nonpolar amino acids is zero.

Interactions from within centre strands
Of conserved, buried polar residues within centre strands forming hydrogen bonds to mainchain atoms, tyrosine has the highest propensity to form such interactions, followed closely by arginine, asparagine, serine, aspartate and glutamate (Table 2 and see Additional file 3, Figure S2D, grey bars). We see a different pattern, however, when we consider all polar amino acids in centre strands that form hydrogen bonds to mainchain atoms: arginine has the highest propensity to form this type of interaction, followed by cysteine, tyrosine, threonine and asparagine (Table 2 and see Additional file 3, Figure S2D, white bars). Asparagine, aspartate, glutamate, serine and tyrosine are more commonly found to form hydrogen bonds to mainchain atoms from within centre strands when conservation and solvent accessibility are considered, whereas threonine and cysteine are less common. The conserved, buried polar residues within centre strands that form hydrogen bonds to mainchain atoms tend to occur at the termini of strands more often than in the middle of the strand (Figure 8). They often interact with coils (Figure 8A-D), β-turns (Figure 8E) and polyproline, forming truss-like structures that support the coil-like regions they are interacting with. Others are observed to interact with helix capping regions (Figure 8F-G) and neighbouring strands in β-barrels, forming structures that resemble joists (Figure 8H-I).

Interactions to residues within 3₁₀ helices
Cysteine has the highest propensity of buried, conserved polar residues to form hydrogen bonds to mainchain atoms in 3₁₀ helices, followed by tyrosine, tryptophan, aspartate and arginine (Table 2 and see Additional file 3, Figure S3, grey bars). This differs from the pattern for all polar amino acids interacting with 3₁₀ helices, where arginine, histidine, cysteine and asparagine have the highest propensities (Table 2 and see Additional file 3, Figure S3, white bars). The 3₁₀ helices show a less clear preference for hydrogen bonding with particular polar sidechains than α-helices do, probably due to the greater plasticity of these helices, which usually comprise only two or three turns (Figure 9).

Interactions with beta hairpins
In β-hairpins, mainchain atoms that are hydrogen-bonded to conserved and buried sidechains have a high propensity to interact with aspartate, cysteine, tryptophan and serine (Table 2 and see Additional file 3, Figure S4, grey bars). We see a similar pattern when we consider all polar amino acids forming hydrogen bonds to mainchain atoms in β-hairpins; asparagine has the highest propensity to form this type of interaction followed by aspartate, arginine, serine and threonine (Table 2 and see Additional file 3, Figure S4, white bars). Therefore, although asparagine, arginine and threonine often form hydrogen bonds to mainchain atoms within β-hairpins, these interactions tend not to be conserved in buried positions.
Table 2 (note) Columns headed "Con" display the propensities of conserved buried polar residues. Columns headed "All" display the propensities of all polar residues forming the indicated interaction (regardless of solvent accessibility or conservation).
The conserved buried polar residues that form hydrogen bonds to mainchain atoms in β-hairpins almost always interact distantly with mainchain atoms that would otherwise form no hydrogen bonds (Figure 10). Some of the β-hairpin structures are extremely long and complex (Figure 10A-C).

Interactions with polyproline
From the set of conserved, buried polar residues hydrogen-bonded to mainchain atoms of polyproline-type helices, arginine is most common, followed by histidine, tyrosine and tryptophan (Table 2 and see Additional file 3, Figure S5, grey bars). Arginine also has the highest propensity to form this interaction when we consider all residues forming this type of interaction, followed by glutamine, asparagine and histidine (Table 2 and see Additional file 3, Figure S5, white bars). A similar result has previously been observed where hydrogen bonds from sidechains to mainchains in polyproline were most frequently formed by arginine followed by glutamine, asparagine, serine and threonine [15].
Polyproline helices are extended and most often occur on the surface of proteins [30]; it is therefore not surprising that the conserved, buried residues that form hydrogen bonds come from a residue distant in the sequence. Typical examples are shown in Figures 11A and 11B from the α/β hydrolases and the alcohol dehydrogenases, respectively. In such a mode, the polar residues form truss-like structures that help to stabilize the irregular polyproline helices.
Figure caption (fragment): In the latter case a second arginine within a β-sheet also interacts with C-terminal residues of a third short helix.

Interactions with coil regions
Cysteine and aspartate clearly have the highest propensity to form hydrogen bonds to coil regions out of buried conserved polar residues (Table 2 and see Additional file 3, Figure S6, grey bars). However, arginine has the highest propensity to perform this role when all positions are considered, followed by asparagine and aspartate (Table 2 and see Additional file 3, Figure S6, white bars). A previous analysis of intra-coil sidechain-to-mainchain hydrogen bonds revealed that aspartate, serine, asparagine and threonine are the polar residues that most commonly form this type of interaction, with 80% of these cases being at solvent-exposed sites [25].
Polar sidechains frequently form hydrogen bonds to coil regions, often in very elaborate loop structures that form extended turns and arches [3][4][5] (Figures 3A,C-D,H; 4A-B,E; 5D,F; 6A,C,D; 8A-F). However, there are also instances where the conserved and buried residues only form hydrogen bonds with mainchain atoms in coil regions, indicating that stabilization of these irregular regions by polar sidechains is important enough for them to be conserved during evolution (Figure 12).

Conclusions
We have previously demonstrated that buried polar residues, although small in number, tend to be more conserved when their hydrogen-bonding potential is satisfied or where they form hydrogen bonds to mainchain atoms [21]. Conservation of these residues and the interactions that they form implies that they are important for maintaining protein structure and hence provide restraints on amino acid substitutions during divergent evolution. We have shown that conserved, buried polar residues have conserved roles in stabilizing the tertiary structure of proteins by forming hydrogen bonds to mainchain atoms. The conservation of these sidechain-to-mainchain hydrogen bonds implies that mainchain architecture is a crucial restraint on the evolution of proteins and that the interactions are retained as an essential part of the protein fold. The structural motifs that we have examined have been shown to have particular propensities for polar residues which form hydrogen bonds with mainchain atoms. Although local sidechain-to-mainchain interactions have been the focus of most previous studies, the need for sidechain-to-mainchain hydrogen bond formation is often met by a distant interaction. For example, we observe that arginine frequently caps the C-termini of α-helices through a distant interaction. We have shown that buried polar residues maintain 3D relationships between secondary structures where mainchain-to-mainchain hydrogen bonds cannot play a role and that similar stabilizing structures recur in different architectures. The key roles of these stabilizing interactions in maintaining protein structures have been previously demonstrated in a few cases, for example in the tyrosine corner [31], but we have shown here that there are many others important for maintaining protein stability.
Although it is generally unfavourable to bury hydrophilic amino acids in the core of proteins, this is counterbalanced by the need to satisfy mainchain atom hydrogen-bond potential. The interactions that the polar residues form when providing these supporting roles are often quite complex and can be thought of as analogous to features in our own built 3D environment. Many form joists, bridging between the elements of secondary structure (for example, Figures 3B, 4D-F, 5B-C, 7A-E), analogous to those that bridge columns and support structures above them in man-made buildings (Figure 2A). Other sidechains act as braces, tethering two strands at the point at which they diverge (Figure 7F and Figure 2B). Buried hydrogen bonded polar sidechains often maintain triangulated structures, supporting distorted helices and complex loop structures (Figures 3I, 6A,C, 8A-C, 11A-B): these provide a striking parallel with the trusses supporting the roofs of buildings (Figure 2C-E). Remarkably, in both proteins and buildings these structural features have been highly conserved in their respective architectural histories, despite the variation in surface structures. In both, they are hidden from view and remain unappreciated, except by the cognoscenti. We hope that this paper will help bring understanding of these important structural features of protein architecture to a wider audience.

Dataset
Protein families containing five or more members were selected from HOMSTRAD where the family alignment contained a conserved, buried polar residue and where the sidechain of the polar residue forms a hydrogen bond to a mainchain atom in each family member. The JOY [32] alignment of each family within HOMSTRAD was used to identify families that met these criteria. JOY's default relative accessibility cut-off (7% or less) was used to define solvent inaccessible (buried) residues. In order to avoid redundancy, where protein families overlapped, the family with the highest sequence coverage was chosen for the analysis.
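The selection criteria above can be illustrated with a short sketch. It assumes the alignment has already been parsed into columns of per-residue records; the `Site` record, the `conserved_buried_polar` function and the `POLAR` residue set are hypothetical names introduced here, and the choice of which residue types count as polar is an assumption rather than the set used in the analysis. The sketch also simplifies "hydrogen bond to a mainchain atom in each family member" to a per-residue boolean.

```python
from dataclasses import dataclass

# ASSUMPTION: one-letter codes treated as polar for illustration only.
POLAR = set("CDEHKNQRSTWY")
BURIED_CUTOFF = 7.0  # relative accessibility (%), the JOY default cited in the text


@dataclass
class Site:
    residue: str              # one-letter amino acid code
    rel_accessibility: float  # relative solvent accessibility in percent
    sc_mc_hbond: bool         # sidechain hydrogen-bonded to a mainchain atom


def conserved_buried_polar(columns, min_members=5):
    """Return indices of alignment columns that satisfy every criterion."""
    hits = []
    for index, column in enumerate(columns):
        if len(column) < min_members:
            continue  # family too small
        residue_types = {site.residue for site in column}
        if len(residue_types) != 1:
            continue  # not entirely conserved
        if not residue_types <= POLAR:
            continue  # conserved but not polar
        if all(s.rel_accessibility <= BURIED_CUTOFF and s.sc_mc_hbond
               for s in column):
            hits.append(index)
    return hits


# Toy alignment: only the first column passes all filters.
toy = [
    [Site("S", 2.0, True)] * 5,
    [Site("S", 2.0, True)] * 4 + [Site("T", 2.0, True)],
    [Site("D", 2.0, True)] * 4 + [Site("D", 20.0, True)],
]
selected = conserved_buried_polar(toy)
```

A fuller implementation would also check that the hydrogen-bonded mainchain atom is equivalent across family members, which this sketch does not attempt.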

Identification of hydrogen bond partners
Hydrogen bond partner(s) to the conserved, buried polar residues were identified using the program HBOND (J. Overington, unpublished). HBOND identifies all possible hydrogen bonds based on a distance criterion (3.5 Å between donor and acceptor).
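A distance criterion of this kind can be sketched as follows. This is not the HBOND program, which is unpublished; it is only a minimal illustration that pairs donor and acceptor atoms lying within 3.5 Å of each other. The function name `find_hbonds`, the atom labels and the coordinates are hypothetical, and treating the cut-off as inclusive is an assumption.

```python
import math

HBOND_MAX_DIST = 3.5  # donor-acceptor distance cut-off in Å, as stated in the text


def find_hbonds(donors, acceptors, cutoff=HBOND_MAX_DIST):
    """Return (donor, acceptor, distance) for every pair within the cut-off.

    donors and acceptors are lists of (label, (x, y, z)) tuples.
    ASSUMPTION: the cut-off is applied inclusively (distance <= cutoff).
    """
    pairs = []
    for donor_label, donor_xyz in donors:
        for acceptor_label, acceptor_xyz in acceptors:
            distance = math.dist(donor_xyz, acceptor_xyz)
            if distance <= cutoff:
                pairs.append((donor_label, acceptor_label, round(distance, 2)))
    return pairs


# Toy coordinates: one acceptor is close enough, the other is not.
donors = [("SER10:OG", (0.0, 0.0, 0.0))]
acceptors = [("GLY40:O", (2.8, 0.0, 0.0)), ("ALA55:O", (4.0, 0.0, 0.0))]
bonds = find_hbonds(donors, acceptors)
```

A purely distance-based test is permissive by design, which is consistent with the text's description of identifying all possible hydrogen bonds; angular criteria are deliberately left out of this sketch.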

Identification of structural motifs
We used the program PROMOTIF to identify the structural context of the conserved polar residues and their interaction partners [33]. The following motifs were identified:
1. α-helices (N-terminal and C-terminal residues were identified based on the following positional criteria: N-(N+1) to N-(N+3) for N-terminal residues and N-3 to N+1 for C-terminal residues, where N is the length of the helix).
2. 3₁₀ helices
3. β-strands: edge strands were distinguished from centre strands by the number of hydrogen bonding partner strands. Strands with more than one hydrogen bonding partner strand were defined as centre and all others as edge.
4. β-hairpins
5. Coil regions
We also identified polyproline helices using the program SEGNO [34].
Figure 9 Examples of hydrogen bond interactions from conserved, buried residues to mainchain atoms in 3₁₀ helices. Representative structures were chosen for each family based on resolution; residues are coloured by atom type with buried, polar residues shown in magenta. Hydrogen bonds are shown in black. A) Two arginines in the cyclodextrin glycosyltransferases family which hydrogen bond to two 3₁₀ helices [PDB: 1d3c]. B) A tryptophan that forms a hydrogen bond to a 3₁₀ helix in the papain family cysteine proteinases [PDB: 1mem]. C) Two aspartates that form a hydrogen bond to each other's respective mainchain amide atom group in a 3₁₀
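Two of the criteria listed above, the edge versus centre strand rule and the C-terminal helix window, are simple enough to sketch directly. The function names are hypothetical, and the numbering convention (helix positions counted from 1 at the first helix residue, so that position N is the last helix residue and N+1 the residue after it) is an interpretation rather than something stated in the text. The N-terminal window is left out of this sketch.

```python
def classify_strand(n_partner_strands):
    """Centre strands have more than one hydrogen bonding partner strand."""
    return "centre" if n_partner_strands > 1 else "edge"


def c_terminal_window(helix_start, helix_length):
    """Residue numbers covering helix positions N-3 to N+1.

    ASSUMPTION: helix positions are 1-based, so position 1 corresponds to
    residue number helix_start and position N to the last helix residue.
    """
    n = helix_length
    positions = range(n - 3, n + 2)  # N-3, N-2, N-1, N, N+1
    return [helix_start + position - 1 for position in positions]


# A 10-residue helix starting at residue 20 ends at residue 29.
window = c_terminal_window(helix_start=20, helix_length=10)
strand_kinds = [classify_strand(k) for k in (0, 1, 2)]
```

Under these assumptions the C-terminal window covers the last four helix residues plus the first residue after the helix.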