Statistical analysis of conformational properties of periodic dinucleotide steps in nucleosomes

Deformability of DNA is important for its superhelical folding in the nucleosome and has long been thought to be facilitated by periodic occurrences of certain dinucleotides along the sequences, with the period close to 10.5 bases. This study statistically examines the conformational properties of dinucleotides containing the 10.5 base periodicity and those without that periodicity through scanning all nucleosome structures provided in PDB. By categorizing performances on the distribution of step parameter values, averaged net values, standard deviations and deformability based on step conformational energies, we give a detailed description as to the deformation preferences correlated with the periodicity for the 10 unique types of dinucleotides and summarize the possible roles of various steps in how they facilitate DNA bending. The results show that the structural properties of dinucleotide steps are influenced to various extents by the periodicity in nucleosomes and some periodic steps have shown a clear tendency to take specific bending or shearing patterns.


INTRODUCTION
Numerous studies of nucleosome positioning have demonstrated that the arrangement of nucleosomes on DNA is nonrandom. The periodic occurrence of certain base pairs or motifs has been shown to be ubiquitous in nucleosomes [1][2][3][4][5]. The periodicity of dinucleotide steps can be considered an important signal for nucleosome identification, and it is widely considered to be closely related to the superhelical structure of nucleosomal DNA [6][7][8][9]. On the other hand, the stereochemical characteristics of DNA fragments decide their individual deformation behavior when they are located at certain sites along the DNA sequence [1,10,11,12,13]. Anisotropic deformation along a superhelical path implies that some dimers may play the role of a "hinge", while others help the "hinges" adhere to the histone octamer core or simply follow the "hinge" in wrapping around the core. From this point of view, some base steps are geometrically more significant than others, and this significance is mainly reflected in the special structural parameter settings of these important "building blocks".
Various methods, such as molecular dynamics simulations, energy surface calculations and deformability statistics based on dimer energy functions, have already been used in previous studies to decipher the conformational roles of the ten independent types of dinucleotides [11,13,14]. However, statistics that focus on a large number of nucleosome samples whose crystal structures have been experimentally determined are still scarce. Here, we choose 35 crystal structures of nucleosomes published in the Protein Data Bank (PDB) as the subject of the statistical survey. The aim of our research is to observe the overall conformational patterns, measured by the distribution and variability of base-pair step parameters and step conformational energies, and to build interrelationships between the periodicity and the deformability of base-pair steps.

Periodicity and Deformability of Dinucleotide Steps
The periodic occurrence of dinucleotides observed in nucleosomes has long been thought to be closely related to the sequence-dependent helical anisotropy of DNA. AA/TT was first thought to be a step with intrinsic curvature when it is periodically repeated [16]. Correctly phased repeats (10.5 bp) of AG/CT, CG, GA/TC and GC can also cause appreciable curvature [10]. The 10.5 bp periodicity, which is widely acknowledged to be closely related to the superhelical structure, tends to be 11 bp in bacteria and 10 bp in archaea and eukaryotes [17]. Even for the same type of dinucleotide in the same nucleosome, the location along the core DNA sequence, such as at the two ends or in the middle section, can make the periodicity fluctuate slightly [18]. To take this diversity into account, a separation of 10 ~ 11 bp is used as the criterion: any step separated from another step of the same type by 10 ~ 11 bp is recognized as a periodic step and marked with "1", whereas a step whose distance to the neighbouring steps of the same type falls outside this range is marked with "0". The ten types of steps collected from the 35 nucleosomes are separated into the corresponding "0" and "1" groups accordingly.
In the above method for extracting the periodic dinucleotides, we consider only one period. That is, any neighbouring occurrence of the same type of dinucleotide at a distance of 10 ~ 11 bp is considered a desired periodic pattern. More sophisticated methods, such as the matched mirror position filter (MMPF) [19], can be used to take several periods and their relations into account in a long DNA sequence. Since this paper deals with short nucleosome sequences from the PDB, there are not many long periodic patterns, and we therefore detect one-period patterns only. Noise and bias in these patterns and the related parameter values can reasonably be assumed to be random and should not affect the overall distribution.
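The one-period labelling rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `label_periodic_steps`, the pooling of complementary dinucleotides read along a single strand, and the reading that a step is periodic when any other step of the same type lies 10 or 11 bp away are all assumptions made for the example.

```python
from collections import defaultdict

# Map each dinucleotide read along one strand to one of the 10 unique step types.
STEP_NAMES = {
    "AA": "AA/TT", "TT": "AA/TT", "AC": "AC/GT", "GT": "AC/GT",
    "AG": "AG/CT", "CT": "AG/CT", "AT": "AT", "CA": "CA/TG",
    "TG": "CA/TG", "CG": "CG", "GA": "GA/TC", "TC": "GA/TC",
    "GC": "GC", "GG": "GG/CC", "CC": "GG/CC", "TA": "TA",
}


def label_periodic_steps(sequence, min_gap=10, max_gap=11):
    """Return (position, step type, label) for every step in `sequence`.

    Assumption: a step gets label 1 when another step of the same type
    starts min_gap..max_gap positions upstream or downstream, else 0.
    """
    sequence = sequence.upper()
    positions = defaultdict(list)
    for i in range(len(sequence) - 1):
        name = STEP_NAMES.get(sequence[i:i + 2])
        if name is not None:  # skip ambiguous bases such as N
            positions[name].append(i)

    labels = {}
    for name, idxs in positions.items():
        occupied = set(idxs)
        for i in idxs:
            periodic = any(
                (i + gap) in occupied or (i - gap) in occupied
                for gap in range(min_gap, max_gap + 1)
            )
            labels[i] = (name, 1 if periodic else 0)

    return [(i, labels[i][0], labels[i][1]) for i in sorted(labels)]


# Toy sequence (hypothetical): two CG steps exactly 10 bp apart.
for position, step_type, label in label_periodic_steps("CG" + "A" * 8 + "CG"):
    print(position, step_type, label)
```

A stricter reading, in which only consecutive occurrences of the same type are compared, would replace the set lookup with a comparison of adjacent entries in each position list.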

Separation of the Groups in Plots
Statistical analysis of the structural characteristics of the "0" and "1" groups is carried out by producing value distribution histograms of the six base-pair step parameters. By categorizing the distribution trends, we can give a detailed description of the deformation preferences of dinucleotide steps in terms of angular and translational parameters and summarize the possible roles of the significant steps that facilitate DNA bending.
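As an illustration of this kind of comparison, the sketch below bins the values of one step parameter for a "0" group and a "1" group and lists the counts side by side. The bin width, the function names and the sample values are hypothetical; the source does not state which binning was used.

```python
import math
from collections import Counter


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of a bin."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {round(k * bin_width, 10): counts[k] for k in sorted(counts)}


def compare_groups(group0, group1, bin_width):
    """Rows of (bin lower edge, count in group "0", count in group "1")."""
    h0 = histogram(group0, bin_width)
    h1 = histogram(group1, bin_width)
    return [(b, h0.get(b, 0), h1.get(b, 0)) for b in sorted(set(h0) | set(h1))]


# Hypothetical Shift values (in angstroms) for one step type.
shift_group0 = [0.1, -0.2, 0.3, -0.4]
shift_group1 = [1.1, -1.2, 0.9, -0.8]
for row in compare_groups(shift_group0, shift_group1, bin_width=0.5):
    print(row)
```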

Calculation of Deformability based on Step Conformational Energy
The conformational energies reflect the fluctuations and correlations of the structural parameters and describe the deformability of dinucleotides at the global level rather than in one dimension. The conformational energy of each base-pair step is estimated by a function based on the fluctuations of the step parameters from their equilibrium values [14]:

$$E = \frac{1}{2}\,\Delta\Theta\,F\,\Delta\Theta^{T}$$

All the dinucleotides collected from the 35 nucleosomes are considered as a set of experimental observations, and the parameter values averaged over this dataset represent the equilibrium geometrical states of the steps. Thus the deviation vector $\Delta\Theta$ and its transpose $\Delta\Theta^{T}$ can be obtained: $\Delta\Theta = (\Delta\theta_1, \ldots, \Delta\theta_6)$ and $\Delta\theta_i = \theta_i - \theta_i^{\circ}$ ($i = 1, \ldots, 6$). The covariance matrix $M$ of the step parameters, calculated over the same set of DNA structures, is used to deduce the dimer stiffness matrix $F$: $M = kT\,F^{-1}$ [13]. For simplicity, the product of Boltzmann's constant $k$ and the absolute temperature $T$ is set to 1, because the relative deformability of steps is not influenced by the value of $kT$. In this sense, the result of the calculation is an energy score rather than a real energy in joules.
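The energy score can be sketched as follows, assuming the usual quadratic form in which the energy is one half of the deviation vector multiplied on both sides by the stiffness matrix F = kT M^-1. The helper names `solve` and `energy_scores`, the use of the population covariance (division by n) and the pure-Python linear solver are choices made for this example, not details given in the text.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def energy_scores(steps, kT=1.0):
    """Energy score of every step; each step is a sequence of parameter values."""
    n, p = len(steps), len(steps[0])
    # Equilibrium values: parameter means over the whole dataset.
    mean = [sum(row[j] for row in steps) / n for j in range(p)]
    dev = [[row[j] - mean[j] for j in range(p)] for row in steps]
    # Covariance matrix M (population form, an assumption of this sketch).
    cov = [[sum(d[i] * d[j] for d in dev) / n for j in range(p)] for i in range(p)]
    # F = kT * M^-1, so E = 0.5 * kT * d . (M^-1 d); M^-1 d is found by solving M x = d.
    return [0.5 * kT * sum(di * xi for di, xi in zip(d, solve(cov, d))) for d in dev]


# Hypothetical two-parameter toy data; real input would have six parameters per step.
toy_steps = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 3.0)]
print(energy_scores(toy_steps))
```

Solving M x = d for each step avoids forming the inverse explicitly. With the population covariance, the scores averaged over the dataset that defines M equal half the number of parameters, which gives a quick sanity check.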

Frequency Distribution of Step Parameter Values
The number of CG steps is especially limited in the 35 nucleosomes. 1KX4 is a special case in which four CG steps are found. In 25 nucleosomes there are only two CG steps in each sequence, separated by an interval of 7 ~ 8 bp, while in the remaining 9 nucleosomes no CG step can be found at all. This explains why all the CG steps in the 35 nucleosomes are categorized into the CG0 group. The values of Shift, Slide, Roll and Tilt are distributed over both negative and positive ranges. Twist predominantly takes positive values, with two exceptions: -138.4° and -179.1° for TC/GA, occurring in 1S32 and 3C1C respectively. Rise is generally considered the most conserved parameter, not only within each step type but also between types, because it maintains the hydrophobic interaction between two base pairs when a dinucleotide conformation changes [20,21]. This is also the case in nucleosomes, since the averaged Rise values for the 10 types of steps are restricted to 3.2 ~ 3.5 Å with very small SD values. However, there still exist a few steps with negative Rise values as well as values exceeding 6 Å. Finally, for all the 10 unique steps, the "0" groups have distributions very similar to those of their respective "1" groups for Rise, Tilt and Twist, and hence the "0" versus "1" differences in the value distribution patterns of these three parameters are not discussed here.
For the value of Shift, AA/TT, AC/GT, AG/CT, AT, CA/TG and TA have no significantly different distribution patterns between the "0" and "1" groups, and the overall sign preferences of these steps are not obvious in the distribution plots. For GA/TC, GC and GG/CC (Figures 1(a) to 1(c)), however, there is a distinction between the "0" and "1" groups: the "1" group tends to assume extreme values in both directions. In comparison with the relatively evenly distributed values of GG/CC0, the GG/CC1 group has two modes around 1 Å and -1 Å, meaning that most GG/CC steps complying with the 10 ~ 11 bp periodicity tend to take large Shift values, and GC behaves in the same way. Although the peaks of the GA/TC1 distribution do not lie at values as large as those of GG/CC1 or GC1, the "1" group of GA/TC still shows a clear preference for non-zero Shift values. In addition, most CG steps tend to take positive Shift values.
For the values of Slide, AA/TT, AC/GT, AT, CA/TG, GA/TC and TA have no significantly different distribution patterns between the "0" and "1" groups. On the other hand, AG/CT, GC and GG/CC have very similar Slide distribution modes: the "1" groups are mainly distributed over the positive Slide range, while the "0" groups span a relatively wider range in both directions (Figure 1).
For the value of Roll, the "0" groups of AA/TT, AC/GT, CA/TG, GA/TC and GG/CC have distribution patterns similar to those of their corresponding "1" groups. On the other hand, AG/CT, AT, GC and TA show relatively prominent differences between their respective "0" and "1" groups (Figures 1(g) to 1(j)). In particular, the Roll values of the "1" group of AG/CT mainly fall into the negative range, while most Roll values in the "0" group are positive. The "1" group of GC also chiefly takes negative Roll values, but its "0" group has a more even distribution in both the positive and negative directions. The distribution patterns of AT and TA, in which the majority of the "1" groups are positive and the "0" groups incline subtly to negative values, seem to be the opposite of those of AG/CT and GC.

Average and Standard Deviation of the Absolute Values of Parameters
Table 1 summarizes the average values and standard deviations of the absolute values of base-pair parameters for the 10 unique sequential base-pair steps in their "1" and "0" groups. Calculations on the absolute values of the parameters ignore the effects of rotational and translational direction and take only the degree of deformation into consideration. The behavior of the average degree of deformation, in terms of net rotational and translational parameters, can be divided into the following four kinds: 1) GC: steps with periodicity have larger net values than those without periodicity on five of the six parameters. 2) TA, AG/CT and GG/CC: steps with periodicity have larger net values on four of the six parameters. 3) CA/TG and AC/GT: steps with periodicity exceed those without periodicity on only three of the six parameters; in other words, on the other three parameters, steps without periodicity have greater net values than steps with periodicity. 4) AA/TT, GA/TC and AT: steps without periodicity exceed those with periodicity on five of the six parameters, which is exactly contrary to 1). It might also be noted that CG has remarkable average values of Shift, Slide and Twist. The averaged net Roll value and its SD for the GG/CC0 group are significantly higher than those of the GG/CC1 group. Similarly, the GA/TC0 group has a much larger averaged net Tilt value and SD than the GA/TC1 group.
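The per-group statistics behind such a table can be sketched as below: the mean and SD of the absolute parameter values for each group, followed by a count of the parameters on which the "1" group has the larger mean. The parameter order, the function names, the use of the population SD and the sample values are assumptions for illustration only.

```python
from statistics import fmean, pstdev

PARAMS = ("Shift", "Slide", "Rise", "Tilt", "Roll", "Twist")


def abs_summary(steps):
    """Mean and SD of |value| for each parameter within one group of steps."""
    summary = {}
    for j, name in enumerate(PARAMS):
        magnitudes = [abs(step[j]) for step in steps]
        summary[name] = (fmean(magnitudes), pstdev(magnitudes))
    return summary


def count_params_where_periodic_larger(group0, group1):
    """Number of parameters on which group "1" has the larger mean |value|."""
    s0, s1 = abs_summary(group0), abs_summary(group1)
    return sum(s1[name][0] > s0[name][0] for name in PARAMS)


# Hypothetical steps, one tuple per step in the order given by PARAMS.
group0 = [(0.1, -0.2, 3.3, 1.0, -2.0, 35.0), (-0.3, 0.4, 3.4, -1.0, 4.0, 33.0)]
group1 = [(1.0, 0.8, 3.3, 0.5, -6.0, 36.0), (-1.2, 1.0, 3.2, -0.5, -8.0, 38.0)]
print(abs_summary(group1)["Shift"])
print(count_params_where_periodic_larger(group0, group1))
```

In this toy data the signed Shift values of `group1` nearly cancel while their magnitudes do not, which is the reason for working with absolute values when only the degree of deformation matters.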

Helical Parameters Reflect Structural Features of Dinucleotide Steps with Periodicity
In our study, the value distribution frequencies of the "0" and "1" groups of the 10 independent dinucleotide steps indicate that apparent differences between dinucleotides with periodicity and those without periodicity exist mainly in three helical parameters: Shift, Slide and Roll. These can be recognized as the key parameters that distinguish the structural variability of periodic steps from that of steps of the same type without periodicity. Our finding supports the theories of Suzuki and Tolstorukov [9,21] that Roll and Slide are the most important media by which particular dinucleotides exert their deformation properties on the overall structure of naked DNA or nucleosomes. Moreover, for types such as AG/CT, GC, GG/CC, AT and TA in nucleosomes, regular occurrences with the 10.5 bp periodicity along the DNA sequence endow these two parameters with unusual values and distribution trends. Twist is indeed an especially important parameter for describing the local "kink" behavior of dinucleotides. "Kinks" affect the overall stretching of the DNA sequence and to some extent influence dinucleotide periodicity. However, our results show that the differences in Twist between periodic and non-periodic steps are not that obvious. Shift, which used to be excluded from the collection of key parameters, is in our conclusion another essential indicator of the periodicity-dependent conformational attributes of dinucleotides. Although the mean Shift values are very small for all groups, the standard deviations are considerable, which means that Shift values vary greatly even within each type and within each group. This combination of small mean values and large SDs can be explained by the fact that large positive values and large negative values cancel each other out. Statistics on the value range and the averaged absolute value support this interpretation as well, and indicate that in nucleosomes Shift is nearly comparable to Slide with respect to the degree of deformation and value variability.
Notes to Table 1: The absolute value of every step on each parameter is calculated, and then the mean parameter value and standard deviation for each group are calculated. The subscript number represents the standard deviation. * Maximum mean value for each parameter selected from all groups. ‡ Situation in which the mean value of the "1" group on one parameter is higher than that of the "0" group.

Deformability of Steps
The six structural parameters are in fact interdependent, and a one-dimensional study is quite limited in characterizing the flexibility of the various dinucleotide steps. The step conformational energies incorporate these structural features and outline the deformability of the periodic and the non-periodic dinucleotides well. For a certain type of dinucleotide in each nucleosome, the representative energy score for the "0" or "1" group of this type is defined as the average product of energy values of all the steps in this group. A complete list of dinucleotide energy scores calculated in this way over the 35 nucleosomes is given in the Appendix. For dinucleotides of the AA/TT, AT, AG/CT, CA/TG and GC types, the number of nucleosomes in which the periodic group has the higher energy score is close to the number in which the non-periodic group of the same type has the higher energy score. In a statistical sense, the observed deformabilities of periodic dinucleotides of these types are almost identical to those of their non-periodic counterparts. It is also found that in most of the 35 nucleosomes the GA/TC0 and GG/CC0 groups have greater energy scores than the corresponding GA/TC1 and GG/CC1 groups, with proportions of up to 88.5% and 79.4% of the total samples respectively. On the contrary, the nucleosomes in which the energy scores of the AC/GT1 and TA1 groups exceed those of the AC/GT0 and TA0 groups account for 80% and 87.8% of the total number of nucleosomes respectively. It is concluded that periodic GA/TC and GG/CC steps have greater deformability than their non-periodic counterparts, while periodic AC/GT and TA steps appear more rigid than non-periodic AC/GT and TA steps.
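The nucleosome-by-nucleosome comparison can be sketched as follows. This is a hedged illustration: the representative score of a group is taken here to be the arithmetic mean of its step energies, nucleosomes lacking one of the two groups are skipped, and the function name and structure identifiers are hypothetical.

```python
from statistics import fmean


def fraction_group0_higher(scores_by_nucleosome):
    """Fraction of nucleosomes in which group "0" has the higher energy score.

    `scores_by_nucleosome` maps a structure identifier to a dict holding the
    step energies of one dinucleotide type under the keys "0" and "1".
    """
    wins0 = comparable = 0
    for groups in scores_by_nucleosome.values():
        if not groups.get("0") or not groups.get("1"):
            continue  # this type lacks one of the groups in this structure
        comparable += 1
        if fmean(groups["0"]) > fmean(groups["1"]):
            wins0 += 1
    return wins0 / comparable if comparable else float("nan")


# Hypothetical energy scores for one step type in four structures.
toy_scores = {
    "nucA": {"0": [3.1, 2.9], "1": [1.2, 1.6]},
    "nucB": {"0": [2.0], "1": [2.5, 2.7]},
    "nucC": {"0": [4.0, 3.0, 3.5], "1": [2.2]},
    "nucD": {"0": [1.0], "1": []},
}
print(fraction_group0_higher(toy_scores))
```

A higher score means that more energy is needed to reach the observed conformation, so a large fraction here corresponds to non-periodic steps that are more rigid than their periodic counterparts.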

SUMMARY
AA/TT steps belong to the type whose conformational settings are not very susceptible to the 10.5 bp periodicity. In the parameter value frequency histograms, AA/TT steps with periodicity show no clear differences from those without periodicity. The mean absolute values and the value variabilities of the periodic and non-periodic steps are also very close to each other. Although the statistics on step conformational energy reveal that the non-periodic group is more likely than the periodic group to require the higher amount of energy to deform, this difference is still not predominant enough to discriminate the non-periodic group from the periodic group. These results suggest that the periodicity of AA/TT steps, on the whole, does not produce particular effects on their conformational features. AA/TT steps or A-tracts are most likely to exert context influences on their neighbouring dinucleotides and to occupy specific positions that facilitate the bending of DNA around the histone core [22,23].
AC/GT can also be categorized into the periodicity-unsusceptible type. The relations between periodic and non-periodic AC/GT steps on the value frequency plots, averaged net values and standard deviations of the six parameters are quite similar to those of AA/TT. Compared with the non-periodic steps, periodic AC/GT steps are to some extent more rigid.
AG/CT steps can be characterized as very susceptible to the 10.5 bp periodicity. Firstly, there are obvious differences between the periodic and non-periodic AG/CT steps in the plots of value occurrences for Slide and Roll. Secondly, periodic AG/CT steps clearly differ from non-periodic ones in averaged net values and standard deviations, further indicating that the structural features of AG/CT in terms of some parameters are correlated with periodic step occurrences in the DNA sequence.
AT steps have limited structural susceptibility to the 10.5 bp periodicity. On the frequency plots of parameter values, periodic steps differ from non-periodic steps on the parameter Roll, but non-periodic AT steps exceed periodic ones on five of the six parameters in terms of mean net values. It can be concluded that the 10.5 bp periodicity is a structural feature of AT but may not necessarily contribute to sharp deformation in nucleosomes.
CA/TG steps are also susceptible to the 10.5 bp periodicity. Although the periodic group does not show any apparent differences from the non-periodic group in the distribution plots of value occurrences and in averaged values, the periodic group has much larger standard deviations than the non-periodic group. This implies that CA/TG steps following the 10.5 bp periodicity have very large parameter value variability. CA/TG is acknowledged by most reports as the most flexible step, one that may act as a "hinge" fitting the duplex to the protein surface owing to its great structural variability and low energy cost for bending [9,10,14,20,24].
CG steps are all marked as non-periodic, but notably they have distinct preferences in parameter value occurrences. Most CG steps take negative Shift and Roll values and positive Slide values. CG also has a large averaged net Twist value, which is only slightly lower than that of CA/TG1, while the corresponding standard deviation of the net Twist values is much smaller than that of CA/TG1. This means that CG steps uniformly have a large degree of Twist.
GA/TC steps also have a certain degree of structural susceptibility to the 10.5 bp periodicity. Compared with non-periodic ones, GA/TC steps with the 10.5 bp periodicity display a slightly different value distribution on Shift and are obviously more deformable.
The GC step is another kind of dinucleotide that is particularly sensitive to the 10.5 bp periodicity. Periodic GC steps have distribution patterns clearly distinct from those of non-periodic steps in the value occurrence plots of Shift, Roll and Slide. The periodic group also has larger averaged net values and standard deviations than the non-periodic group on five of the six parameters.
The GG/CC steps also belong to the periodicity-susceptible type. Steps with the 10.5 bp periodicity have distributions of value occurrences on Shift and Slide that differ from those of non-periodic steps. The averaged net values of the periodic steps are higher than those of the non-periodic ones on four of the six parameters. The periodic GG/CC steps are also obviously more deformable than the non-periodic ones. However, it should still be noted that the standard deviations of the mean net values of non-periodic GG/CC steps are higher than those of periodic ones on all six parameters, and that the non-periodic group has a far larger averaged net Roll value and standard deviation than the periodic group. Thus, the influence of periodicity on GG/CC is interpreted as partially reinforcing the deformability while restricting the bending and shearing variability.
TA steps have a certain degree of sensitivity to the 10.5 bp periodicity. Differences between periodic and non-periodic steps can be observed in the Roll value distribution plots but are relatively subtle. The mean net values of periodic steps exceed those of the non-periodic ones on four of the six parameters, whereas the standard deviations of the two groups are quite close to each other on each parameter. The energy score calculation also reveals that periodic TA steps are more rigid than non-periodic ones. By all counts, the periodicity has limited influence on TA steps.
To summarize the above analysis, the dinucleotide steps AG/CT, GC and GG/CC are most immediately affected by the 10.5 bp periodicity. Periodic occurrences along the nucleosomal DNA sequence assign them distinct shearing and bending preferences and greater degrees of deformation and variability. Steps CA/TG, GA/TC and TA can be placed in a second category that is modestly influenced by the 10.5 bp periodicity. Special conformational trends do appear in some respects for the periodic steps in this category; compared with the first category, however, the separation of periodic from non-periodic steps is not prominent enough, or not all the separation criteria can be satisfied at the same time. Steps AA/TT, AC/GT and AT fall into a third category that is least structurally influenced by the 10.5 bp periodicity, which means that the structural attributes of periodic steps in this category are similar to those of non-periodic ones. Finally, the susceptibility of CG to the 10.5 bp periodicity cannot be evaluated because of the lack of sufficient nucleosome samples, but as a YR-type dimer it should have considerable influence on DNA deformation.

Figure 1. Frequency distribution of step parameter values.

Table 1. Average and standard deviation of the absolute values of base-pair parameters for the "0" and "1" groups of the 10 dinucleotide steps.
This work is supported by the Hong Kong Research Grant Council (Project CityU 123408).