Geometrical criteria for left-handed twists within protein beta-strands

Using a statistical analysis on beta-sheet structures from the Protein Data Bank, characteristic angles within beta-strands were correlated to the nature of the side chains. The twists were computed from the atomic coordinates of five consecutive amino acids’ alpha carbons from single beta-strand sequences. Conditions on the angles for twists to be mainly left-handed are given together with the frequency of occurrence for these non-standard geometrical properties within protein beta-strands. Applications in protein structure prediction and CASP challenges in particular are envisioned by making use of the probabilities of occurrence in protein structures of angle value ranges for given amino acids.


INTRODUCTION
The number of protein sequences is growing at an ever-increasing pace along with developing DNA sequencing facilities [1][2][3]. Determination of the three-dimensional structures of these proteins at atomic resolution relies mainly on time-consuming approaches such as NMR or X-ray diffraction. Protein structure prediction has made remarkable progress over the last decades [4,5]. In particular, template-based methods provide an efficient approach to obtain three-dimensional models for proteins. Contact maps, correlated mutations and large sets of homologous sequences have also been used to establish protein structure models [6][7][8][9][10]. Based on known three-dimensional structures from the Protein Data Bank (PDB) [11,12], protein structural characteristics derived from statistical analyses facilitate the quality assessment of models, which is essential in protein structure prediction.
In an attempt to facilitate protein structure prediction, we investigated geometrical properties of protein structures at a scale which is sufficiently large to be connected to topological properties of the proteins and sufficiently small to be linked to the amino acid side chains characterizing protein sequences. In this report, we focus on the geometry of beta-strands within beta-sheets by measuring angles within sets of consecutive alpha carbons in beta-strands.

METHODS
Pdb21, available at http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::pdb21, is a program written in Perl. As input, it takes PDB files (pdbxxxx.ent), lists of PDB files corresponding to structures of proteins or protein domains, which may be bound to other molecules, or files of protein structural models written in the PDB file format. The program eliminates from the lists the files associated with polypeptides of fewer than fifty amino acids. It takes into consideration the first protein chain of each file. The output file (.xls) gives, for each amino acid of the proteins within the list, its number defining the position within the sequence, its location within the secondary structure elements and the set of angles, given as integers in degrees, as defined in Figure 1.
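The input filtering described above can be illustrated with a short Python sketch. It is not the Pdb21 code (which is written in Perl and is not reproduced here); the function name `first_chain_ca` and its parameters are hypothetical. The sketch only relies on the fixed-column layout of ATOM records in the PDB file format and assumes that the first chain ends at the first TER record, at the first change of chain identifier or at the end of the first model.

```python
def first_chain_ca(pdb_lines, min_residues=50):
    """Return [(res_name, res_seq, (x, y, z)), ...] for the alpha carbons
    of the first chain, or [] if that chain has fewer than min_residues."""
    chain_id = None
    residues = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "ENDMDL" or (record == "TER" and residues):
            break  # assumption: first model and first chain only
        if record != "ATOM" or len(line) < 54:
            continue
        # Columns 13-16: atom name; column 17: alternate location indicator.
        if line[12:16].strip() != "CA" or line[16] not in " A":
            continue
        if chain_id is None:
            chain_id = line[21]          # column 22: chain identifier
        elif line[21] != chain_id:
            break                        # a second chain starts here
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.append((line[17:20].strip(), int(line[22:26]), xyz))
    return residues if len(residues) >= min_residues else []
```

Only ATOM records are read, so calcium ions (HETATM records whose atom name is also "CA") are not mistaken for alpha carbons.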
To avoid biases due to closely related conformations of the same protein with different PDB references, the proteins were chosen randomly among more than 70,000 protein structures of the PDB. In addition, two amino acids with identical chemical formulas and the same position numbering, within a protein domain starting at the same position and ending at the same position, and with identical values for the angle α', were considered as a single amino acid for the statistical analysis. To eliminate biases such as redundancy in the experimental data extracted from the PDB [46], the sorting of information from the PDB files was therefore not carried out at the level of proteins according to their sequence identity, but at the level of individual amino acids. Each list was composed of a set of about 800 randomly chosen protein structures from the PDB. The probabilities were derived from the analysis of these lists. The errors on the probabilities given in the tables were calculated from two distinct lists.
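The residue-level redundancy filter described above can be sketched as follows. This is a minimal illustration under assumptions: the record layout and the field names (`residue`, `position`, `domain_start`, `domain_end`, `alpha`) are hypothetical and do not come from the Pdb21 output format.

```python
def deduplicate(records):
    """Keep one record per (residue type, position, domain bounds, alpha') key.

    records: iterable of dicts with the hypothetical keys
    'residue', 'position', 'domain_start', 'domain_end' and 'alpha'
    (alpha' as an integer number of degrees).
    """
    seen = set()
    unique = []
    for rec in records:
        key = (rec["residue"], rec["position"],
               rec["domain_start"], rec["domain_end"], rec["alpha"])
        if key not in seen:      # first occurrence of this residue
            seen.add(key)
            unique.append(rec)
    return unique
```

Because the angles are stored as integers in degrees, an exact comparison of α' values is meaningful here; with floating-point angles a tolerance would be needed.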
Another list considered resulted from the removal of protein sequences with significant sequence identity [47]: known as the 25% list, it is available online as the "recent.pdb_select25.nsigma3" file. All angles were annotated with primes in this work to avoid any confusion with earlier work [48,49].

Characteristic Angles within Beta-Strands
Alpha carbons of amino acids are numbered along the protein sequence and their coordinates in three-dimensional space are given in PDB files. Links are drawn between adjacent alpha carbons. Two links allow an angle to be defined (Figure 1). In Figure 1, P is the plane containing the atoms 1, 2 and 3 and Q is the plane containing the atoms 3, 4 and 5; α' is the angle between these two planes. The other angles are defined by two links between two alpha carbons. The amino acid side chains and the angle β' are not represented in the scheme for clarity; the amino acid given in the single-letter code in the tables is located at position 3 and is therefore part of both planes P and Q.
For each amino acid alpha carbon located at the position noted 3 within a beta-strand (Figure 1), the five angles α', β', γ', δ' and ε' are calculated from the alpha carbons' atomic coordinates using the scalar product of the corresponding normed vectors. While the angle α' is defined as an oriented angle between two planes (i.e. between their normal vectors) with values between −180˚ and 180˚, the other four angles are defined by values between 0˚ and 180˚ using the vectors corresponding to the following pairs of alpha carbons.
The two planes P and Q are defined by the carbon atoms 1, 2, 3 and 3, 4, 5 respectively. The sign of the angle α' was defined as the sign of the scalar product (.) noted below by making use of the vector product (^) between vectors p and q, which are respectively normal to the planes P and Q.
A positive sign for the angle α' (Figures 1 and 2, Tables 1(a) and 1(b)) then corresponds to the right-handed twist well known for beta-strands within beta-sheets. Conversely, a left-handed twist is characterized by a negative angle α'. The angles between these virtual bonds, defined at a coarser scale than the known dihedral angles Phi and Psi between chemical bonds, were necessary to define the notion of twist for amino acids found in a single beta-strand within a sheet. Unique atoms within amino acids, such as the alpha or beta carbon, were used in other works for the comparison of protein structures [35,50,51].
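The calculation of α' from five consecutive alpha carbons can be sketched in Python as follows. Several choices here are assumptions and not the exact formulas of this work: the orientation of the normal vectors p and q, and the reference vector used for the sign (here the vector from alpha carbon 2 to alpha carbon 4, combined with the vector product of p and q), are hypothetical, so the sign may have to be flipped to match the convention in which a positive α' is a right-handed twist. The helper `link_angle` gives the unsigned angle between two links; which pairs of alpha carbons define β', γ', δ' and ε' is left to the caller.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def angle_deg(u, v):
    """Unsigned angle (0 to 180 degrees) from the scalar product of the
    normed vectors; u and v must be non-zero."""
    c = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def link_angle(a, b, c, d):
    """Unsigned angle between the link a->b and the link c->d."""
    return angle_deg(sub(b, a), sub(d, c))

def alpha_prime(c1, c2, c3, c4, c5):
    """Oriented angle (-180 to 180 degrees) between planes P and Q."""
    p = cross(sub(c2, c1), sub(c3, c2))  # normal to P (atoms 1, 2, 3)
    q = cross(sub(c4, c3), sub(c5, c4))  # normal to Q (atoms 3, 4, 5)
    magnitude = angle_deg(p, q)          # fails if three atoms are collinear
    # ASSUMPTION: sign taken from (p ^ q) . (C2 -> C4); hypothetical choice.
    sign = dot(cross(p, q), sub(c4, c2))
    return -magnitude if sign < 0 else magnitude
```

With this construction a planar zigzag of five alpha carbons gives α' = 0, and two mirror-image strands give opposite signs, which is the property needed to distinguish the two twist directions.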

Statistical Analysis of the Angles within Protein Beta-Strands
Previously, the twist was defined by the angles Phi and Psi [28] or by a dihedral angle (theta) between alpha and beta carbons of odd residues [31]. Here, the twist was defined for five alpha carbons on a single beta-strand within a beta-sheet by measuring the angle α' between the two planes P and Q (Figure 1). The distribution of the angles α' is similar for all amino acids, except for proline. The right-handed twist is characterized here by the distribution of the angle α' typically found around +30˚ (Figure 2(a)). The probability of finding a proline within a beta-strand associated with an angle α' between 0˚ and 22˚ is low (0.03), while it is about 3 to 5 times higher for the other amino acids (Table 2(a)).
The classical notion of right-handed twist is not valid in all cases: exceptions are found for extreme values of the angles β' and ε' (Table 1(a)).
Figure 2. On the y-axis is the probability for an angle to be found within ranges of 22.5˚ for α' and within ranges of 10˚ for the angles β' and δ' (x-axis). On the x-axis, α' ranges from −180˚ to +180˚ and β' and δ' range from 0˚ to +140˚.
The angle β' described earlier was used for protein secondary structure determination, in the program DSSP in particular [48,52]. Its distribution (Figure 2(b)) is anomalous for both glycine and proline when compared to the other 18 canonical amino acids found within proteins: the probability for β' to have a value between 0˚ and 22.5˚ is almost two times lower than for most other amino acids (Table 2(a)). Notably, almost half of the β' values for proline (47%) are found between 22.5˚ and 45˚ (Table 2(a)). This observation has to be linked to the distribution of the angle δ' (Figure 2(c)), which has values within the same range (22.5˚ to 45˚) for only 1% of the prolines within beta-strands, i.e. ten to sixty times less often than for other amino acids (Table 2(b)).
In beta-strands, the angle δ' for glycine is found in more than half of the cases (57%) between 22.5˚ and 45˚, that is about two to six times more frequently than for most other amino acids (Table 2(b)). Accordingly, we noted that glycine represents 5.0% of the amino acids in beta-strands, while it represents 56% of the amino acids within beta-strands satisfying the condition (δ' < 31˚). Glycine and proline, whose conformations in proteins have been extensively described using the dihedral angles Phi and Psi [53,54], also appear as special cases at a coarser scale among the canonical amino acids because of their altered δ' and β' distributions within beta-strands. As illustrated in Figure 3, three successive alpha carbons centered around glycine tend to be almost aligned more frequently than for other amino acids.
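The probabilities per range of angle values discussed in this section can be obtained by counting, for each amino acid, the angles falling within each range. The following Python sketch illustrates the idea; the function name, the input layout and the choice of half-open ranges (lower bound included, upper bound excluded) are assumptions and are not taken from the original analysis.

```python
import math
from collections import Counter, defaultdict

def range_probabilities(observations, width=22.5, lower=-180.0):
    """observations: iterable of (amino_acid, angle_in_degrees).

    Returns {amino_acid: {range_start: probability}}, where each range is
    [range_start, range_start + width). Use lower=-180 for alpha' and
    lower=0 for the unsigned angles.
    """
    counts = defaultdict(Counter)
    for aa, angle in observations:
        start = lower + width * math.floor((angle - lower) / width)
        counts[aa][start] += 1
    return {
        aa: {start: n / sum(c.values()) for start, n in sorted(c.items())}
        for aa, c in counts.items()
    }

# Hypothetical delta' values, for illustration only (not data from the PDB).
example = [("G", 9), ("G", 30), ("G", 41), ("P", 70), ("P", 95)]
probabilities = range_probabilities(example, lower=0.0)
```

Comparing the probabilities obtained from two independent lists of proteins, as done for the tables, gives an estimate of the error on each value.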

Application in Structural Model Quality Assessment
Coarse-grained structural models of proteins may consist of the alpha carbon coordinates for all or most amino acids. It is then of interest to have an estimate of the twist values that does not rely on the angles Phi and Psi, but only on alpha carbon coordinates.
The statistical analysis of protein structures reported above may be further used to evaluate the quality of structural models. As an example, given that the δ' value at prolines is less than 45˚ in only about 1% of the occurrences within structures reported in the Protein Data Bank, it is unlikely that the δ' value at a proline is less than 45˚ in a predicted structural model of interest. The higher the number of prolines for which the δ' value is less than 45˚, the less likely the predicted structural model. This approach, combined with the use for the twenty amino acids of further statistically relevant observations within beta-sheets and within other structures such as those reported recently [25,40], will allow the likelihood of structural models to be estimated using new criteria: it should improve the assessment of protein structure model quality.
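The proline example above can be turned into a simple score, sketched here in Python. The scoring scheme is an assumption made for illustration and is not a method described in this work: each proline is treated as independent, and the value 0.01 is the "about 1%" quoted in the text. The function name and the input layout are hypothetical.

```python
import math

P_PRO_DELTA_BELOW_45 = 0.01  # "about 1%" of prolines, as quoted in the text

def proline_delta_score(residues, p_low=P_PRO_DELTA_BELOW_45, threshold=45.0):
    """residues: iterable of (amino_acid, delta_prime_in_degrees) for the
    beta-strand residues of a structural model.

    Returns (number of prolines with delta' < threshold, log-likelihood).
    ASSUMPTION: prolines are scored independently of each other.
    """
    n_low = n_other = 0
    for aa, delta in residues:
        if aa != "P":
            continue
        if delta < threshold:
            n_low += 1
        else:
            n_other += 1
    score = n_low * math.log(p_low) + n_other * math.log(1.0 - p_low)
    return n_low, score

# Two hypothetical models of the same strand: model_b is far less likely.
model_a = [("P", 60.0), ("V", 55.0), ("P", 72.0)]
model_b = [("P", 30.0), ("V", 55.0), ("P", 40.0)]
```

A lower (more negative) score indicates a model whose prolines adopt δ' values that are rarely observed in known structures; the same scheme could be extended to other amino acids and angles with the probabilities of Table 2.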

CONCLUSION
A statistical analysis of a large number of protein structures from the Protein Data Bank allowed the definition of geometrical criteria generally associated with left-handed twists in protein beta-sheets, while a right-handed twist is common for protein beta-sheets. These statistical results on protein structures may be further implemented as probabilities of occurrence for given sets of angles within structure prediction algorithms or used as restraints in protein modeling approaches [55]. These statistical results may also be used within programs for the evaluation of structural model quality [56][57][58] and contribute to the improvement of protein structure prediction strategies, which are evaluated every second year by the critical assessment of protein structure prediction methods (CASP) [59,60].

Figure 1. Scheme of alpha carbons within a beta-strand highlighting the angles calculated.

Figure 3. A typical structure of glycine within a beta-strand.

Table 1. Distributions of the angle α' for different ranges of values for β', γ' and ε'. The average error on the probability for two sets of 800 proteins is given in parentheses. Aanb is the number of amino acids considered for this study of 1605 proteins of the PDB. 381 proteins were considered for the protein repertoire deriving from PDBSelect25 (sigma 3.0), whose results are labeled in this table by an asterisk.
The classical notion of right-handed twist is no longer valid in two extreme cases, characterized by high values for β' (β' > 67.5˚) or by low values for ε' (ε' < 33.5˚). These situations occur in 6.0% or 6.1% (688/11365 or 5457/89130) and in 2.1% or 2.2% (236/11365 or 1961/89130) of the amino acids within beta-strands, respectively (Table 1(a)). The angle α' is then most frequently found to have negative values: the local twist is generally left-handed for about 8% of the amino acids within beta-strands, characterized here by extreme values for the angles β' and ε' (Table 1(a)). Another relation between the distribution of α' and the values of γ' was noted: more than half of the amino acids (56% or 59%) are associated with an α' value of less than 45˚; but in the case of minimal values for the angle γ' (γ' < 33.5˚), more than three quarters of the α' values are found to be above 45˚ (Table 1(b)).
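The two conditions and the frequencies quoted above can be checked with a few lines of Python. This is an illustrative sketch: the function and constant names are hypothetical, and adding the two percentages assumes that the two conditions rarely apply to the same amino acid, which is not stated explicitly in the text.

```python
BETA_PRIME_MIN = 67.5     # degrees: "high values for beta'"
EPSILON_PRIME_MAX = 33.5  # degrees: "low values for epsilon'"

def left_handed_twist_likely(beta_prime, epsilon_prime):
    """True when one of the two extreme conditions is met, i.e. when a
    negative alpha' (left-handed twist) is the most frequent outcome."""
    return beta_prime > BETA_PRIME_MIN or epsilon_prime < EPSILON_PRIME_MAX

def percent(count, total):
    return 100.0 * count / total

# Counts quoted in the text for the two sets of amino acids.
high_beta = [percent(688, 11365), percent(5457, 89130)]
low_epsilon = [percent(236, 11365), percent(1961, 89130)]

# ASSUMPTION: the two conditions are treated as non-overlapping.
combined = [b + e for b, e in zip(high_beta, low_epsilon)]

for b, e, c in zip(high_beta, low_epsilon, combined):
    print(f"beta' > 67.5: {b:.2f}%  epsilon' < 33.5: {e:.2f}%  sum: {c:.2f}%")
```

In a structural model, amino acids flagged by `left_handed_twist_likely` are those for which a right-handed local twist should not be expected.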

Table 2. Distribution of the angles α', β', γ', δ' and ε'. The values for α' are between −180˚ and +180˚, while the other angles have values between 0˚ and 180˚. Ranges of 22.5˚ were defined and the probability for an angle to have a value within the range indicated on the first lines is given within the table for each of the twenty canonical amino acids noted in the single-letter code. The average uncertainty is 0.012 for α', 0.027 for β', 0.023 for γ', δ' and ε' (average ratio of the error on the probability divided by the probability). 1605 proteins were analyzed.
In the Dactylium dendroides galactose oxidase protein structure [61] (PDB reference: 1gog), within the beta-strand extending from amino acids 160 to 166, the values for α', β', γ', δ' and ε' at glycine 162 are respectively −173˚, 20˚, 58˚, 9˚ and 70˚.