Geometrical study on disease-related ncRNAs based on the Z-curve method

The Z curve is a very useful method for visualizing and analyzing DNA sequences. It is a three-dimensional space curve that constitutes a unique representation of a given DNA sequence. Studying non-coding regions has become increasingly important in recent years. In this paper, 15 disease-related ncRNA sequences associated with Alzheimer disease, together with some snoRNA and miRNA sequences, were selected from the NONCODE database and studied with the Z curve method. The Z curves of the studied ncRNA sequences were mapped and compared, and their statistical features were obtained. These features indicate that ncRNA sequences playing the same roles in cellular processes have almost the same Z curves, and that the base content of these sequences is also almost the same.


INTRODUCTION
It is widely accepted that non-coding sequences play important roles in the process of translation in organisms ranging from bacteria to mammals [1,2,3]. At present, research on non-coding regions and their functions is still a hot field all over the world. Among studies of non-coding sequences, the study of non-protein-coding RNAs (ncRNAs) is becoming increasingly important and has already made great progress.
Traditionally, most RNA molecules were regarded as carriers conveying information from the gene to the translation machinery [4]. However, since the late 1990s, it has been widely acknowledged that other types of non-protein-coding RNA molecules are present in organisms ranging from bacteria to mammals, and that they affect a large variety of processes, including plasmid replication, phage development, bacterial virulence, chromosome structure, DNA transcription, RNA processing and modification, developmental control and others [5]. These observations suggest that the traditional view of the structure of the genetic regulatory systems in organisms is far from complete. Moreover, the considerable number of ncRNAs detected in the past few years was largely unexpected [6].
As new members and classes of ncRNAs are progressively discovered, the importance of ncRNAs in basic cellular processes is increasingly understood. Although the functions of many recently identified ncRNAs remain mostly unknown, increasing evidence supports the notion that ncRNAs represent a diverse and important functional output of most genomes [7].
Furthermore, the recognition of ncRNAs as central components of various cellular processes has risen sharply in recent years. However, many problems in this field remain unsolved, and many ncRNAs still have uncharacterized functions.
Several diseases that threaten human health, such as Alzheimer disease, cancers, diabetes and heart diseases, are related to different ncRNAs [8]. Among these diseases, Alzheimer disease has become the fourth leading illness threatening the lives of elderly people, after cancers, heart diseases and cerebrovascular diseases. Alzheimer disease is a progressive degenerative disorder of the brain characterized by a slow, progressive decline in cognitive function and behavior. As the disease advances, persons with Alzheimer disease have difficulty with daily activities such as using the phone, cooking, handling money or driving a car. The disease is more common in the elderly population. It is estimated that Alzheimer disease affects 15 million people worldwide and approximately 4 million Americans [9]. The neuropathologic hallmarks of the disorder are amyloid-rich senile plaques, neurofibrillary tangles and neuronal degeneration.
SciRes Copyright © 2009 HEALTH

It has been reported that three genes with autosomal dominant mutations have been identified that may lead to Alzheimer symptoms in carriers before they reach age 60 [10]. Because the clinical features of Alzheimer disease overlap with common signs of aging and with other types of dementia, diagnosis remains difficult. We use the Z curve method, proposed by Professor Zhang Chun-ting, to analyze ncRNAs related to Alzheimer disease. The Z curve is a geometrical approach to studying DNA sequences; based on it, some global and local features of a sequence can be detected in a perceivable way [11].
In this work, we downloaded from the NONCODE database 15 specific ncRNA (BC200 RNA) sequences that are related to Alzheimer disease and come from different organisms. The Z curves of the selected sequences were mapped and are shown below. By analyzing and comparing the Z curves, we identified their common features, which may serve as a criterion for studying disease-related ncRNAs of the same type.

Material
The NONCODE database is an integrated knowledge database designed for the analysis of non-coding RNAs (ncRNAs). Since NONCODE was first released 3 years ago [15], the number of known ncRNAs has grown rapidly, and there is growing recognition that ncRNAs play important regulatory roles in most organisms. In the updated version of NONCODE (NONCODE v2.0), the number of collected ncRNAs has reached 206 226, including a wide range of microRNAs, Piwi-interacting RNAs and mRNA-like ncRNAs. The improvements to the database include not only new and updated ncRNA data sets but also a BLAST alignment search service and access through a custom UCSC Genome Browser [12].
All ncRNAs in NONCODE were filtered automatically from GenBank and the literature and then manually curated. With the exception of rRNAs and tRNAs, all classes of reported ncRNAs are included. In addition to sequence data, NONCODE provides a user-friendly interface, a visualization platform and a convenient search option, allowing efficient retrieval of sequences, regulatory elements in the flanking sequences, related publications and other information [13,14].
We selected 15 ncRNA (BC200 RNA) sequences and 20 snoRNA sequences from this database; they belong to specific ncRNAs and are related to Alzheimer disease. In addition, we selected miRNA sequences of human and virus from the miRNA database. All selected sequences can be downloaded directly from the web pages.

Method
The Z curve is a unique three-dimensional space curve representation of a given DNA sequence, in the sense that each can be uniquely reconstructed from the other. Consider a DNA sequence of N bases read from the 5'-end to the 3'-end. Inspect the sequence one base at a time, beginning with the first base, and denote the step number by n, i.e., n = 1, 2, ..., N. In the nth step, count the cumulative numbers of the bases A, C, G and T occurring in the subsequence from the first to the nth base, and denote them by $A_n$, $C_n$, $G_n$ and $T_n$, respectively. The Z curve is composed of a series of nodes $P_0, P_1, P_2, \ldots, P_N$, whose coordinates $x_n$, $y_n$ and $z_n$ (n = 0, 1, 2, ..., N, where N is the length of the DNA sequence being studied) are uniquely determined by the Z-transform of the DNA sequence:

$$
\begin{cases}
x_n = (A_n + G_n) - (C_n + T_n), \\
y_n = (A_n + C_n) - (G_n + T_n), \\
z_n = (A_n + T_n) - (G_n + C_n),
\end{cases}
\qquad x_n, y_n, z_n \in [-N, N], \quad n = 0, 1, 2, \ldots, N,
$$

where $A_0 = C_0 = G_0 = T_0 = 0$, so that $P_0$ is the origin.

The three components of the Z curve, i.e., $x_n$, $y_n$ and $z_n$, represent three independent distributions that completely describe the DNA sequence being studied. Furthermore, each component has a clear biological meaning [11]: $x_n$ describes the distribution of purine (A, G) versus pyrimidine (C, T) bases, $y_n$ that of amino (A, C) versus keto (G, T) bases, and $z_n$ that of weak hydrogen-bond (A, T) versus strong hydrogen-bond (G, C) bases. The Z curve defined above is generally not smooth at its nodes; where a smooth curve is needed, B-spline functions are used to smooth it. For more details about the Z curve, please refer to references [16,17,18].
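The node computation can be sketched in a few lines of Python. Each base adds a fixed step to $(x_n, y_n, z_n)$, read directly off the three defining formulas. This is a minimal sketch, not the authors' Z-plotter implementation: mapping U to T for RNA input is an assumption, the function names are hypothetical, and the smoothing function uses one simple B-spline construction (a uniform cubic B-spline with the nodes as control points, evaluated at its knots) because the paper does not say which variant it used.

```python
from typing import Dict, List, Tuple

# Step added to (x_n, y_n, z_n) by each base, read off the definitions
#   x_n = (A_n + G_n) - (C_n + T_n)   purine vs pyrimidine
#   y_n = (A_n + C_n) - (G_n + T_n)   amino vs keto
#   z_n = (A_n + T_n) - (G_n + C_n)   weak vs strong H-bond
STEP: Dict[str, Tuple[int, int, int]] = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq: str) -> List[Tuple[int, int, int]]:
    """Return the nodes P_0, ..., P_N of the Z curve as (x_n, y_n, z_n).

    Assumption: U is mapped to T so RNA sequences can be passed directly.
    Any other symbol (e.g. N) raises KeyError.
    """
    x = y = z = 0
    nodes = [(0, 0, 0)]  # P_0, because A_0 = C_0 = G_0 = T_0 = 0
    for base in seq.upper().replace("U", "T"):
        dx, dy, dz = STEP[base]
        x, y, z = x + dx, y + dy, z + dz
        nodes.append((x, y, z))
    return nodes


def smooth_cubic_bspline(values: List[float]) -> List[float]:
    """Smooth one coordinate series (hypothetical choice of B-spline).

    Treats the values as control points of a uniform cubic B-spline and
    evaluates it at the knots, (v[i-1] + 4*v[i] + v[i+1]) / 6; the two
    end points are kept unchanged.
    """
    if len(values) < 3:
        return [float(v) for v in values]
    inner = [
        (values[i - 1] + 4 * values[i] + values[i + 1]) / 6
        for i in range(1, len(values) - 1)
    ]
    return [float(values[0])] + inner + [float(values[-1])]
```

For example, `z_curve("ACGT")` yields the nodes (0, 0, 0), (1, 1, 1), (0, 2, 0), (1, 1, -1), (0, 0, 0), which agrees with the formulas since $A_4 = C_4 = G_4 = T_4 = 1$. Each component is smoothed separately, e.g. `smooth_cubic_bspline([p[2] for p in nodes])` for $z_n$.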
In summary, the Z curve is a unique representation of a given DNA sequence in three-dimensional space, and each can be uniquely reconstructed from the other. It offers an intuitive and convenient approach to studying DNA sequences geometrically.
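The reconstruction claim can be checked directly. From the Z-transform, the bases A, C, G and T move the curve by the four distinct steps (1, 1, 1), (-1, 1, -1), (1, -1, -1) and (-1, -1, 1), so the difference $P_n - P_{n-1}$ identifies the nth base. A small sketch under that reading, with hypothetical names, taking the nodes as $(x_n, y_n, z_n)$ tuples starting at $P_0 = (0, 0, 0)$:

```python
from typing import Dict, Sequence, Tuple

# Each base moves the curve by a distinct step vector, so the map can be inverted.
BASE_FOR_STEP: Dict[Tuple[int, int, int], str] = {
    (1, 1, 1): "A",
    (-1, 1, -1): "C",
    (1, -1, -1): "G",
    (-1, -1, 1): "T",
}


def sequence_from_z_curve(nodes: Sequence[Tuple[int, int, int]]) -> str:
    """Rebuild the sequence from nodes P_0, ..., P_N.

    The nth base is identified by the difference P_n - P_{n-1}; a
    KeyError means the nodes are not a valid Z curve.
    """
    bases = []
    for (x0, y0, z0), (x1, y1, z1) in zip(nodes, nodes[1:]):
        bases.append(BASE_FOR_STEP[(x1 - x0, y1 - y0, z1 - z0)])
    return "".join(bases)
```

Because the four steps are distinct, no two different sequences can produce the same list of nodes, which is the sense in which the representation is unique.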
Using Z-plotter and Origin 7.5 software, the Z curves of the 15 selected sequences were mapped, and some typical curves are shown in Figures 1-6. In addition to mapping the Z curves, we calculated the base (A, C, G, T and GC) content of each studied sequence based on Z curve theory. Typical results are shown in Table 1, where sequences 1, 14 and 15 belong to BC200 RNA and the other sequences belong to BC200-alpha RNA. All of these sequences act as regulators in cellular processes, but they differ in length and come from different organisms.

We also selected snoRNAs and microRNAs of human, Arabidopsis thaliana and virus from NONCODE and the miRNA database, respectively, and then mapped and analyzed the corresponding Z curves using the Z curve method. The results are shown in Figures 7 and 8.

Discussion
We selected some typical Z curves of the studied sequences and compared them (see Figures 1-6). The corresponding curves of the BC200-alpha RNA sequences show almost no disparity: they have the same shapes and the same tendencies (see Figures 1-3). The same holds for the BC200 RNA sequences (see Figure 4). However, the Z curves of the BC200 RNA sequences differ obviously from those of the BC200-alpha RNA sequences (see Figures 5 and 6). Although all the studied sequences are related to the same disease, their functions differ, and so do their Z curves. This suggests that the shape and tendency of the Z curves are related to the functions of the ncRNA sequences.

Next, in the $z_n$-$n$ curves of the BC200 RNA and BC200-alpha RNA sequences, $z_n < 0$ throughout (see Figure 1), meaning that strong H-bond bases (G/C) are in excess of weak H-bond bases (A/T). This indicates that this type of ncRNA has a stable structure and does not mutate easily. In addition, the $y_n$-$n$ curves for the studied sequences show a global maximum at about position 120 bp (BC200 RNA) or 190 bp (BC200-alpha RNA).
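Both observations ($z_n < 0$ at every step and the position of the global maximum of $y_n$) can be read mechanically off the node coordinates. A sketch, assuming the nodes are given as a list of $(x_n, y_n, z_n)$ tuples starting with $P_0 = (0, 0, 0)$; the function names are illustrative and not from the paper:

```python
from typing import Sequence, Tuple


def gc_excess_everywhere(nodes: Sequence[Tuple[int, int, int]]) -> bool:
    """True if z_n < 0 for every n >= 1 (P_0 is the origin and is skipped).

    Since z_n = (A_n + T_n) - (G_n + C_n), this means G + C outnumbers
    A + T in every prefix of the sequence.
    """
    return all(z < 0 for _, _, z in nodes[1:])


def y_global_max_position(nodes: Sequence[Tuple[int, int, int]]) -> int:
    """Return n (in bases) where y_n attains its global maximum.

    If the maximum is reached more than once, the first n is returned.
    """
    ys = [y for _, y, _ in nodes]
    return ys.index(max(ys))
```

For the short sequence GCA, the nodes are (0, 0, 0), (1, -1, -1), (0, 0, -2), (1, 1, -1): every $z_n$ with $n \ge 1$ is negative, and $y_n$ peaks at $n = 3$.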

We then calculated the content of A, C, G, T and GC in the studied sequences (see Table 1). For both the BC200 RNAs and the BC200-alpha RNAs, the results are 33%-35%, 28%-29%, 24%-25%, 13%-14% and 52%-53%, respectively. There is thus no obvious disparity in base content between the two types of studied sequences; that is, the base content of the two types of BC200 RNA sequences is almost equal.
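The text says the base content was calculated from Z curve theory but does not give the procedure. One way to do it, sketched here as an assumption, is to invert the definitions of $x_n$, $y_n$ and $z_n$ at the final node $P_N$: $A_N = (N + x_N + y_N + z_N)/4$, $C_N = (N - x_N + y_N - z_N)/4$, $G_N = (N + x_N - y_N - z_N)/4$ and $T_N = (N - x_N - y_N + z_N)/4$. In particular, since $z_N = N - 2(G_N + C_N)$, the GC content depends on $z_N$ alone. The function name is hypothetical.

```python
from typing import Dict


def base_content_from_endpoint(n: int, x: int, y: int, z: int) -> Dict[str, float]:
    """Percentages of A, C, G, T and GC from the final node P_N = (x, y, z).

    n is the sequence length N. Uses the inverse of the Z-transform:
      A = (N + x + y + z) / 4,  C = (N - x + y - z) / 4,
      G = (N + x - y - z) / 4,  T = (N - x - y + z) / 4.
    """
    if n <= 0:
        raise ValueError("sequence length must be positive")
    counts = {
        "A": (n + x + y + z) / 4,
        "C": (n - x + y - z) / 4,
        "G": (n + x - y - z) / 4,
        "T": (n - x - y + z) / 4,
    }
    content = {base: 100 * c / n for base, c in counts.items()}
    # z_N = (A + T) - (G + C) = N - 2(G + C), so GC content needs z_N only.
    content["GC"] = 100 * (n - z) / (2 * n)
    return content
```

For the six-base sequence GGCATT, whose final node is (0, -2, 0), this gives about 16.7% each of A and C, about 33.3% each of G and T, and 50% GC.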
In addition, we mapped and compared the Z curves of snoRNAs and microRNAs. The Z curves of one type of ncRNA (miRNA) are very similar (see Figure 7), and the same holds for the snoRNA sequences (see Figure 8). The base content is also almost equal within each type of ncRNA sequence.

CONCLUSIONS
Based on the above comparison and analysis, a preliminary conclusion can be drawn that all kinds of Z curves of ncRNA sequences playing the same roles in cellular processes are almost the same. On top of this, the $y_n$-$n$ curves for the studied sequences show a global maximum at about position 120 bp (BC200 RNA) or 190 bp (BC200-alpha RNA). Furthermore, the nearly identical content of each base in the two types of ncRNA sequences indicates that base content is related to their functions, or the roles they play. Moreover, $z_n < 0$ holds for all the curves of the BC200 RNA sequences. Unfortunately, we do not yet know the biological significance of these results, and much work remains to be done in our future research.
We carried out more tests on other types of ncRNAs to check this conclusion. Mapping and comparing the Z curves of snoRNAs and microRNAs shows that these types of ncRNAs have the same statistical characteristics as the BC200 RNAs, both in the shape of their Z curves and in the base content of their sequences.

ACKNOWLEDGMENT
This work was supported by the Chinese National Key Fundamental Research Project (Grant No. 90403120) and the Shandong Fundamental Research Project (Grant No. Y2005D12). We are grateful to the Key Lab for Biophysics in Universities of Shandong for their help. We also thank our colleagues for advice and for sharing protocols.

Figure 1. The $z_n$-$n$ curves for two BC200 RNA sequences 2 and 10 (from orangutan and crab-eating macaque, respectively). The curves are very similar both globally and locally.

Figure 3. The 3D curves for two BC200 RNA sequences 2 and 5 (from human and orangutan, respectively). The curves are the same both globally and locally.

Figure 4. The $z_n$-$n$ curves for sequences 1 and 14. The curves are very similar both globally and locally, and all $z_n < 0$.

Figure 7. The Z curves of microRNAs of human, Arabidopsis thaliana and virus. The curves are very similar globally.