How is biological information arranged in the genome?

The four nucleotides (bases) A, T, G and C are arranged in a sophisticated manner in a single strand of genomic DNA, with three structural features: 1) reverse-complement symmetry of bases or base sequences, 2) bias of the four bases, and 3) multiple fractality of the distribution of each of the four bases depending on the distance, seen in a double logarithmic plot (power spectrum) of L (the distance from a base to the next base) vs. P(L) (the probability of the base distribution at L), regardless of species, form, genome size and GC content. In small genomes such as those of viruses and plasmids, the multiple fractality may occasionally be hard to distinguish clearly in the power-law-tail region (multi-fractal dimension) because of the low number of bases. In this review article, the author shows that 1) these structural features of biologically active genomic DNA are observed in all living cells, including organelle and viral genomes, 2) a new analytical method for genome structure based on appearance frequency, the Sequence Spectrum Method (SSM), can potentially be used to analyze DNA, RNA and protein on the genome, and 3) the structural features of the genome might be related to biological complexity. These findings might be extremely useful for understanding living cells, and the entire genome needs to be analyzed as a “field” of biological information.


WHAT IS THE GENOME?
The term "genome" refers to the DNA portion of the chromosome, which is composed of proteins and DNA in living cells. The author would reiterate that a molecular view of the genome is necessary to understand living cells. Essentially, genomic DNA carries biological information: a large number of genes are present on the genome, and each gene is transcribed into mRNA and translated into protein on the ribosome according to the "Central Dogma" [1]. However, the presence of genes alone is not enough to recreate an organism, because a mere collection of genes is not organized and therefore cannot reproduce living cells. Other parts of the genome or chromosome, generally called the non-coding regions, are needed to express the gene(s) precisely, rapidly and steadily for the organism; i.e., the four bases composing genomic DNA are arranged in a sophisticated manner on the genome or chromosome for the creation of organisms [2][3][4][5]. Although most scientists accept the individual molecular events used to explain biological phenomena, at the level of the genome this molecular view tends to fade without being noticed.
Recent progress in the genome biology of prokaryotic and eukaryotic cells, including archaea and viruses, has tended to overshadow the question of what the genome is. The following reports are suggestive for understanding the genome.

Gene Repetition on Chromosome
The gene for the α-subunit (F1α) of the F1F0-ATPase complex, ATP1 [6], was found to be repeated three times, together with its adjacent DNA sequences, on the left arm of chromosome II in the yeast Saccharomyces cerevisiae, using the prime clones 70113 and 70804 from ATCC. The same base sequences of genomic DNA were obtained from strains DC5, SEY2102, LL20, W303-1A and S288C in 1995 [7]. Long-PCR analysis between the three copies of ATP1 (ATP1a-ATP1b and ATP1b-ATP1c) revealed that the distances between them differ [8]. We have also confirmed the ATP1 repetition from the base sequence of the 35-kb region of chromosome II, which is repeated together with the adjacent DNA sequences including ORFs [9].
ATP2 and ATP3, the genes for the β-subunit (F1β) and the γ-subunit (F1γ) of the F1F0-ATPase complex in S. cerevisiae, respectively, were also shown, by transformation using homologous recombination [12], to be repeated three times and twice with their adjacent DNA sequences on chromosome X (right arm) and chromosome II (right arm), respectively, although the sizes (base numbers) of the repeated units differ from that of ATP1 [10,11]. These multiple gene copies (ATP1s, ATP2s and ATP3s) are all active and are not pseudogenes. In addition, the three ATP1s (ATP1a, ATP1b, ATP1c) and the three ATP2s (ATP2a, ATP2b, ATP2c) each maintain identical activity.
Recently, the two ATP16 copies, ATP16a and ATP16b, were shown by RT-PCR to be expressed differently [13] and to affect mitochondrial DNA differently, as do the ATP3 copies. These results suggest that such multiple gene copies might play active roles in biological phenomena related to the energy transformation of the F1F0-ATPase complex in living cells [11,13].

Cloned Sheep, "Dolly"
Wilmut and his colleagues isolated the genomic DNA from a sheep and injected it into an egg of another sheep. After pregnancy, the first cloned sheep, "Dolly", was born in 1996 as a result of their study [14]. "Dolly" then contributed to the birth of the second cloned sheep, "Polly", and of other cloned animals [15]. This series of striking studies by Wilmut and his colleagues showed that the genomic DNA of the sheep presumably carries the biological information needed to generate an individual organism, although many biological phenomena remain unknown.

Disagreement between Gene-Scale and Chromosome-Scale Experiments
Discrepancies between gene-scale and chromosome-scale experiments have been pointed out by some scientists [16,17].
In 2004, Olson, Reeves and their co-workers reported in Science that mice carrying the 5.4 Mb critical region of the Down syndrome genes from H. sapiens chromosome 21 did not develop the disease, whereas mice carrying chromosome 21 lacking this region were able to develop Down syndrome. These results suggest that the expression of the Down syndrome genes might be affected by the conformation of the region, which depends on its environment, or by the structure of the entire chromosome 21; their series of reports on Down syndrome genes in chromosome 21 using mice is suggestive for future functional genome analysis [18,19]. It is not enough to analyze only the genes for Down syndrome. Specifically, to account for the expression of the syndrome, other viewpoints that treat chromosome 21 as a molecule might be necessary. In other words, a gene might be expressed together with its neighboring DNA sequences and be regulated differently by the higher-order structure of the genome that results from a large deletion of the base sequence of chromosome 21.
Hereafter, discrepancies between chromosome-scale and gene-scale data, such as that for the Down syndrome critical region (DSCR), may increase as the higher-order structure of the genome and the functions of the non-coding regions of genomic DNA are studied.

MOLECULAR ASPECT OF DNA
Watson and Crick deduced that DNA has a double-helical structure with complementary and anti-parallel strands [20], based on Chargaff's finding of equal amounts of adenine (A) and thymine (T), and of guanine (G) and cytosine (C) [21], and on the X-ray diffraction patterns of DNA fibers obtained by R. Franklin and M. Wilkins [22,23]. Later, Chargaff and co-workers also observed that a single strand of Bacillus subtilis DNA holds equal amounts of A and T, and of G and C ([24], Chargaff's second parity rule, 1968). About forty years later, the sequencing of the genomes of many organisms, described below, accelerated, and a bacterial genome (582,970 bp) was chemically synthesized based on that of Mycoplasma genitalium [25], although some unreadable regions still remain in each genome. In addition, many epigenetic studies on DNA methylation, histone modification in the nucleosome structure and the effect of small non-coding RNAs on transcription have recently been carried out in relation to cell development and cancer [26][27][28][29][30][31][32][33]. In other words, structural analysis of DNA based on the entire genome base sequence is necessary to understand living cells. To this end, we have characterized the structural features of genomic DNA.
Genome projects have so far been completed to obtain the base sequences of prokaryotic organisms such as Escherichia coli [34], Bacillus subtilis [35] and Synechocystis sp. [36], and of eukaryotic organisms such as Saccharomyces cerevisiae [37], Caenorhabditis elegans [38], Drosophila melanogaster [39] and Homo sapiens [40][41][42][43], as well as many other organisms [44][45][46]. Following the first round of genome sequencing and functional analysis, genome projects will be accelerated by analysis of the internal structure of the genome and its association with the biological processes of living cells. Structural analysis of the entire genomic DNA based on the nucleotide (base) sequence is necessary to study living cells. To this end, many studies are being performed from the viewpoint of protein function by proteome, transcriptome and functional genomic analyses [47][48][49][50].
However, as long as we hold fast to the gene(s) as the core of the genome, even for chromosome-scale variations in Homo sapiens and other species, the more precisely we study the genes or their respective proteins, the further we step away from living cells. Why? Presumably, one cause of this estrangement is a view of the gene(s) on the genome or chromosome that disregards the genome organization of living cells. As the genome projects revealed, the base sequence of genomic DNA gives a glimpse of dynamic and flexible characteristics [2-11,13,16-19], and an individual gene is an integral part of a genome.
There are many genes and associated regulatory regions that are expressed, replicated, transcribed and translated into proteins, and all participate in biological phenomena. An individual gene, i.e., a protein produced through the gene, is a part of the genome (Figure 1). Each gene is converted to its respective protein through the maturation of mRNA according to the "Central Dogma" [1]. Genes might be organized with the support of the other regions of the chromosome, generally called the non-coding regions (space in Figure 1), which regulate gene expression in living cells as a biological system. If so, we should face up to the entire genome as a molecule, covering not only the coding regions but also the non-coding regions. The genome, including both the coding and the non-coding regions, might be organized in living cells as a biological system that has grown with the passage of time. Therefore, we should first review the entire genome as a systematized molecule to understand living cells.
One example comes from S. cerevisiae. PHO2 is a gene encoding a transcription factor, Pho2p, which regulates several genes such as PHO5 together with another transcription factor, Pho4p [51][52][53]. It is well known that Pho2p interacts cooperatively with Pho4p, and ref. [51] reported that the amino acids around S230 of Pho2p play an important role in the interaction with Pho4p. The interactive regions of Pho2p, Pho4p and Pho5p could not be identified from the coding sequences alone, but were identified from the appearance frequencies of successive base sequences in the entire yeast genomic DNA [4,5]. In other words, analysis of the PHO2 gene on chromosome IV can be used to study its regulation of phosphate metabolism in S. cerevisiae: Pho2p binds cooperatively with Pho4p to the PHO5 promoter, and phosphorylation of Pho2p facilitates its interaction with Pho4p.
Other protein interactions can be identified in the same way (Sequence Spectrum Method, described later; refs. [4,5]).

RELATIONSHIP BETWEEN GENE AND CHROMOSOME
In prokaryotic cells, as well as in viruses and bacteriophages, most of the genome is occupied by coding regions, whereas in eukaryotic cells the coding regions make up a smaller part of the entire genome and vary with the genome size (the number of bases composing the genomic DNA); for example, the coding regions occupy only several percent of H. sapiens genomic DNA [44,69]. Furthermore, each gene on a chromosome or genome is arranged with a particular order, a particular direction (using either the Watson strand or the Crick strand for transcription), and a particular distance to the genes on both sides (Figure 2). When one of these three characteristics of a gene on the genome (order, direction, distance) is changed, for example by a chromosomal translocation [70][71][72], the living cells become different ones and are forced to live in their surroundings as such. Therefore, the coding regions alone, i.e., the genes, cannot explain all the biological phenomena in living cells, especially in eukaryotic cells (Table 1) [2].

[Figure: The genome is a "field" of genes. Genes shown in different colors are different genes. Genes on a chromosome are arranged in a sophisticated manner with three features: order, direction and distance.]

Genomic DNA might also be regarded as "a molecule with four aligned bases, A, T, G and C, and with three dimensions", even if it is huge. Thus, if a large region is deleted, the DNA presumably becomes a molecule with a different conformation, which affects gene expression and the ability to interact with biological materials such as bioorganic compounds, proteins, nucleic acids, sugars and fatty acids. To express the gene(s), all or some of the regulatory elements on genomic DNA are necessary: the promoter (trigger), the SAR (scaffold), the insulator (boundary), the poly-A signal (stability), ncRNAs (controller) and so on [69,73-78]. Thus, both the coding and the non-coding regions are necessary to express the gene(s) precisely, rapidly and steadily, so that the various biological phenomena can be carried out under varying surrounding conditions.
Therefore, to express a gene, upstream element(s) (base sequences) such as the promoter and downstream base sequence(s) such as the terminator are necessary. These elements may be located far from the coding region; in mammalian cells, a regulatory element of a gene can be located more than 10,000 nt from the start codon ATG [79,80]. In addition, a protein translated from a gene can interact with multiple proteins and genes [44-46,81,82]. Such biological phenomena, i.e., regulation, might increase as the ratio of the non-coding regions in genomic DNA increases [69,78]. Previously, we have shown that the base sequences of genomic DNA possess fractality [2,3], and, using the SSM, that the homology of the sequence spectrum is closely associated with the interaction of the transcription factor Gal4p with the promoters of GAL1, GAL2, GAL7 and GAL10 [4,5]. From these results, we conclude that the sequence spectrum of a gene can be homologous not only with the sequence spectrum of its own base sequence, but also with that of the entire region of elements needed to express the gene on the genomic base sequence.
The four bases are arranged in a sophisticated manner on the genome.

STRUCTURAL FEATURES OF GENOMIC DNA (GENERATION RULE)
Some studies of base bias in the genome have reported that the base ratio is localized in the genome in correlation with function and with neighboring genes and sequences [83][84][85]. Many genes exist in a genome, for example approximately 6500 in S. cerevisiae and approximately 25,000 in H. sapiens [44,69,78]. If all the genes were cloned into different vectors and all of them were inserted into appropriate cells, could the S. cerevisiae or H. sapiens cells be restored as the originals? The answer might be "no", because the genes would be disordered, not organized on the genome. They could not be expressed as precisely, rapidly and steadily as in the original cells under the surrounding conditions. The image of a genome might be that of a "field" composed of the four bases A, T, G and C, which are arranged to form genes (ORFs), regulatory sequences to express the gene(s), introns, SINEs, LINEs, ncRNAs, and so on [2,69,73-78]. Each gene on a genome is a) ordered in each organism, b) transcribed using either the Watson strand or the Crick strand, and c) located at a certain distance from the next gene on both sides (Figures 1, 2). In addition, this arrangement might depend on the number and size of intervening sequences (introns, black box in Figure 1).
Using the databases of NCBI [44], the Sanger Institute [45], SGD [46] and MIPS [86], the following structural features were revealed in a single strand of genomic DNA: 1) reverse-complement symmetry of bases or base sequences, 2) bias of the four bases, and 3) multiple fractality of the distribution of each of the four bases depending on the distance. These three structural features of the base sequence should exist simultaneously in each single strand of active genomic DNA [2,3].
Surprisingly, these structural features (1)-(3) also apply to viral and organelle genomes [3], although the sizes (nucleotide numbers) of viral and organelle genomes are far smaller than those of prokaryotic and eukaryotic genomes [44]. In other words, these three structural features of the single-strand DNA (or RNA, in the genomes of some viruses) can be used to identify the interactive regions for DNA-DNA, protein-protein, DNA-protein and protein-RNA interactions. Therefore, such analyses should be common to, and useful for, living cells [4,5].

The Genome Base Sequence Had Reverse-Complement Symmetry Even in a Single Strand of DNA
Genomic DNA is composed of four different bases, A, T, G and C. The number of bases (nt) and the GC content of each genome and chromosome of S. cerevisiae and several other organisms were calculated and are shown in Table 1. The number of A bases was equal to the number of T bases, and that of G was equal to that of C, in each genome and chromosome, even in a single strand of DNA. This symmetry within a single strand of DNA agrees with Chargaff's second parity rule [24]. The results also indicate that single-stranded genomic DNA might sometimes have a closed structure with partial hydrogen bonding (stem-loops), as seen in RNA secondary structure [55][56][57][58].
To demonstrate the base symmetry in a single strand of DNA more precisely, we calculated the appearance frequencies of successive base sequences of various lengths in an entire genome. Three successive bases correspond to the species-dependent genetic codons (triplets) [2,3,66,87], which in turn correspond to the 20 amino acids. The sum of the appearance frequencies of the 64 triplets in the 16 chromosomes and mitochondrial (mt) DNA of the S. cerevisiae genome is shown in Table 2(a). The total appearance frequency of all the triplets was 12,155,004.
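The triplet table described above can be sketched in a few lines. This is a minimal illustration, not the author's program: it assumes that triplets are counted with a window that slides one base at a time along a single strand, and the function name `kmer_counts` and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

def kmer_counts(seq, k=3):
    """Count every overlapping k-mer on one strand, read 5' to 3'.

    Assumption: the window slides one base at a time; windows that
    contain symbols other than A, C, G, T (e.g. N) are skipped.
    """
    seq = seq.upper()
    # Start from all 4**k possible k-mers so absent ones show up as 0.
    counts = Counter({"".join(p): 0 for p in product("ACGT", repeat=k)})
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    return counts

# Toy sequence standing in for a chromosome (hypothetical data).
toy = "ATGGCGTACGCTAGCTAGGATCCATGCATGC"
table = kmer_counts(toy, k=3)
print(len(table), sum(table.values()))
```

For a genome made of several chromosomes, the per-chromosome tables would simply be added together, which corresponds to the summation over the 16 chromosomes and mtDNA mentioned in the text.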
In protein-protein interactions, such as those involving the coding regions for the RecA protein and adenine-nucleotide binding proteins [88,89] and the mitochondrial targeting signal of mitochondrial proteins [6,90,91], it can be speculated that apparently different amino acid residues form the same functional conformation within a molecule, specifically by providing the same building blocks in the molecule [4,5].
These results indicate that genome base sequences have a high level of reverse-complement symmetry even in a single strand of DNA. Thus, the reverse-complement symmetry in a single strand of DNA is observed not only in the ratio of the single bases (A/T, G/C), as proposed by Chargaff et al. [24], but also in the ratio of sequences of 1-9 successive bases to their reverse-complement sequences in the genome [2,3].
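One way to check this kind of symmetry on a single strand is to compare the count of every k-mer with the count of its reverse complement. The sketch below is illustrative only; the score definition (shared counts divided by total counts) and the names `reverse_complement` and `symmetry_score` are assumptions made here, not the measure used in refs. [2,3].

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(s):
    """Reverse complement of a DNA string made of A, C, G, T."""
    return s.translate(COMPLEMENT)[::-1]

def symmetry_score(seq, k):
    """Fraction of k-mer occurrences matched by reverse-complement occurrences.

    Returns 1.0 when every k-mer on the strand is exactly as frequent as
    its reverse complement on the same strand, and 0.0 when none is matched.
    (Hypothetical score, chosen for illustration.)
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    valid = {km: n for km, n in counts.items() if set(km) <= set("ACGT")}
    total = sum(valid.values())
    if total == 0:
        return 0.0
    shared = sum(min(n, valid.get(reverse_complement(km), 0))
                 for km, n in valid.items())
    return shared / total

# A strand joined to its own reverse complement is perfectly symmetric.
half = "ATGGCGTACGCTAGCTAGGAT"
perfect = half + reverse_complement(half)
print(symmetry_score(perfect, 3), symmetry_score("AAAAAA", 1))
```

Running `symmetry_score` for k = 1 to 9 on a real chromosome would give one number per k, which is a compact way to look at the trend described in the paragraph above.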
The reverse-complement symmetry of sequences of 10-12 successive bases can be observed in huge genomes such as vertebrate genomes, e.g., H. sapiens (22 chromosomes + 2 sex chromosomes + mtDNA) and M. musculus (19 chromosomes + 2 sex chromosomes + mtDNA) [2]. In other words, a single strand of DNA of the genome, or of each chromosome, might essentially maintain the reverse-complement symmetry of base sequences that is necessary to generate many double-helical stems in a genomic DNA molecule.

Table 2. Appearance frequencies of three successive base sequences (64 triplets) in a single strand of DNA from the S. cerevisiae and SV40 genomes. (a) Appearance frequencies of three successive base sequences (genetic codons) in the S. cerevisiae genome, calculated as the sum of the base frequencies of the 16 chromosomes (in numeric order) plus mtDNA; (b) appearance frequencies of three successive base sequences in SV40 (5243 nt). Each triplet is read from left (5'-) to right (3'-). Amino acids (in parentheses) corresponding to the triplets are expressed in single-letter codes.

The Genome Bases Were Localized
We calculated the distribution of the bases in S. cerevisiae chromosome IV and H. sapiens chromosome 22 (Figures 3(a), (c)). For each chromosome, a counterfeit sequence (artificial chromosome IV or 22) was created using random numbers so that it had the same appearance frequencies of triplets (3 successive bases), the same molar ratio of the four bases and the same number of bases (genome size) as the real sequence. The four bases were localized on each real chromosome (Figures 3(a), (c)), whereas the "A", "T", "G" and "C" frequencies were distributed uniformly on the artificial chromosome IV or 22 (Figures 3(b), (d)). The distributions of the "G" and "T" bases had the same characteristics.
These results indicate that there might be many A-T and G-C hydrogen bonds within the single-strand DNA of a chromosome, in both eukaryotes and prokaryotes. In addition, each artificial genome or chromosome showed the reverse-complement symmetry, in accordance with the equal molar contents of A and T and of G and C, but its four bases were distributed uniformly in the genomic DNA molecule [2,3].
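The contrast between a localized and a uniform base distribution can be illustrated with a shuffled control. Note the simplification: the artificial chromosomes in the text preserve triplet frequencies, whereas the sketch below only shuffles single bases, which preserves length and base composition but not triplet frequencies. The function names, the window size of 100 and the toy sequence are all hypothetical.

```python
import random
from statistics import pvariance

def shuffled_control(seq, seed=0):
    """Randomly reorder the bases; length and base composition are kept.

    Simplified stand-in for the artificial chromosomes of the text
    (triplet frequencies are NOT preserved by a plain shuffle).
    """
    bases = list(seq)
    random.Random(seed).shuffle(bases)
    return "".join(bases)

def window_fraction(seq, base, window):
    """Fraction of `base` in consecutive non-overlapping windows."""
    return [seq[i:i + window].count(base) / window
            for i in range(0, len(seq) - window + 1, window)]

# Toy "localized" sequence: an A-rich block followed by a G-rich block.
real = "AAAT" * 250 + "GGGC" * 250
fake = shuffled_control(real)

# A localized base shows a much larger window-to-window variance.
var_real = pvariance(window_fraction(real, "A", 100))
var_fake = pvariance(window_fraction(fake, "A", 100))
print(var_real > var_fake)
```

A control that also preserves triplet frequencies would need a different generator, for example sampling each base conditioned on the two preceding bases; that variant is not shown here.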

The Genome Bases Had Multiple Fractality
The real chromosomes had both the base symmetry (reverse-complement symmetry) and the base bias, whereas the artificial chromosomal sequences had only the reverse-complement symmetry, not the base bias. We could not find any open reading frames (ORFs) in the artificial chromosomes (data not shown). Given these results, how are the four bases A, T, G and C placed on a single strand of DNA in a genome? To address this question, we investigated the fractal characteristics of the real and artificial chromosomes based on the distribution of the base distance (L). Each base-distribution curve P(L) expresses the distribution of the distance L between a base and the next identical base; for the base "A", the L-value is the number of bases from one "A" to the next "A" in the genomic DNA, and P(L) is the summed probability of that base distance in the genomic DNA [2,3].
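The definition of P(L) given above can be turned into a short calculation. The sketch below is a minimal reading of that definition, assuming L is the difference between the positions of consecutive occurrences of the same base; the function name `gap_distribution` and the toy sequence are hypothetical, and the detailed procedure of refs. [2,3] may differ.

```python
import math
from collections import Counter

def gap_distribution(seq, base):
    """P(L): probability that the next occurrence of `base` is L positions away."""
    positions = [i for i, b in enumerate(seq.upper()) if b == base]
    gaps = Counter(b - a for a, b in zip(positions, positions[1:]))
    total = sum(gaps.values())
    if total == 0:
        return {}
    return {L: n / total for L, n in sorted(gaps.items())}

# Toy example: "A" occurs at positions 0, 1, 4, 7 and 10.
p = gap_distribution("AATGACCAGTA", "A")
print(p)

# Points for the double logarithmic plot of log P(L) vs. log L.
loglog = [(math.log10(L), math.log10(pl)) for L, pl in p.items()]
```

As a point of comparison, in a sequence whose positions are independent and in which the base has frequency q, the gaps follow the geometric distribution P(L) = q(1 - q)^(L-1); deviations from this baseline at large L are what the analysis in this section is concerned with.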
A simple distinction between uni-fractality and multi-fractality of the base distribution in a sequence was made by fractal analysis, examining the power law in the double logarithmic plot (power spectrum) of log P(L) vs. log L, where L is the distance from a base to the next base and P(L) is the probability of the base distribution at L. When the plot of log P(L) vs. log L gave an exponential curve, the fractality was uni-fractal; in contrast, when the plot gave a straight line, the fractality was multi-fractal. Details of the calculation of P(L) are described in refs. [2,3].
Figure 4(a) shows the distribution curve of adenine bases "A" in the S. cerevisiae genome. Over L-values of 1 through 58, the distribution curve P(L) varied with the L-value but could not be fitted to an exponential equation (data not shown).
We then partitioned the L-values. For L-values of 1 through 15, the distribution curve P(L) of adenine (A) was fitted to the exponential equation $y = a e^{-bx}$ (Eq. 1; x = log L, y = log P(L); a and b are constants). For adenine "A" in the S. cerevisiae genome, the a- and b-values calculated from Eq. 1 were 0.3837 and 0.3446, respectively (Figure 4(a), black square ■).
In contrast, when the L-value was 16 or more, P(L) showed multi-fractality (Figure 4(a), red diamond ♦). The identification of multiple fractality in the bases of the S. cerevisiae genome was confirmed by the f(α) spectrum [92,93]. When f(α) varies as a function of α, the fractality must be multi-fractal (red diamond ♦); in contrast, when f(α) is constant at the α-value, the fractality must be uni-fractal (black square ■) [2,3].
The other three bases, thymine "T", guanine "G" and cytosine "C", in the S. cerevisiae genome behaved in a similar manner to "A", with multiple fractality at the boundary of the L-value. In addition, the a- and b-values of A and T, and of G and C, were identical. These fractal characteristics of a single strand of genomic DNA were also obtained for other species [2,3].
In contrast, in the artificial genome sequences, neither the bias of the four bases nor the multiple fractality was observed, regardless of the distance in the base distribution (L-value = 16 or more). Thus, the bases of the counterfeit (artificial) genome sequences were distributed only uni-fractally even when L was 16 or more, and multiple fractality was not observed anywhere in the sequences, although the number of bases (nt) and the appearance frequencies of the base sequences were the same as in each real (active) genome.
Many studies using parts of the genomic DNA of E. coli and other model DNA sequences have reported that genomic DNA carries fractality [94][95][96][97]. However, these studies were probably based on prokaryotic genomes, because the fractality of large genomes such as those of S. cerevisiae and H. sapiens had not yet been analyzed at that time, and so multiple fractality might not have been observed in the previously published literature.
It should be noted that in small genomes such as those of viruses or eubacteria, the number of bases is not large and the reverse-complement symmetry is somewhat lower. Because the power-law-tail region is therefore short, the multiple fractality might be hard to observe at the same partition of the L-value (Table 2(b), Figure 4(b), SV40 = 5243 nt). Essentially, however, all genomes or chromosomes might maintain these three structural features (symmetry, bias and multiple fractality) regardless of size (number of bases). In the case of SV40, it was better to partition at an L-value of 10 (Figure 4(b)) [3].
These three structural features of the single-strand DNA of genomes were observed only in real (active) genomes, and not in an individual gene, a short DNA or a randomly ordered DNA such as the artificial chromosomes. When these three structural features coexist, the gene(s) on the genome can be expressed, and the resulting product(s) might function in a timely and proper manner in living cells. The bases of genomes are not placed randomly, but seem to be placed in a sophisticated manner according to generation rules as a single strand of genomic DNA in individual living cells. It might be possible that two single-strand DNAs with the structural features described above assemble into the anti-parallel, complementary, double-strand DNA that we know.
The structural features of a single strand of genomic DNA might have implications for DNA replication, transcription, translation and other biological processes, because the information might be present in the genome base sequence [2,3].
Previously, Crick and his co-workers raised a question about DNA structure [98,99]. They presented data showing that the base sequence of DNA is necessary to understand the detailed structure of DNA. Now we can speculate about the detailed structure of DNA molecules because the complete base sequences of several genomes are available.
Essentially, the reverse-complement symmetry of the base sequence should be observed anywhere on a single strand of DNA in a genome. Because this base symmetry is observed in a single strand of DNA in a genome, the DNA might be able to close on itself and form stem-loop structures. Previously, the biological role of non-coding sequences and stem-loop structures was discussed [42][43][44][45][46]. Now that the genome sequences of many organisms have been revealed, we should analyze the genome to understand living organisms.
Therefore, to understand biological phenomena in living cells, we need new approaches that analyze genomes, including both the coding and the non-coding regions, as a large intact molecule.
Based on the above structural features of genomic DNA, the Sequence Spectrum Method (SSM) was developed and proposed [4,5]. The SSM is a new analytical method for the entire genome based on the appearance frequencies of the nucleotide (base) sequences of the genome.

A NEW ANALYTICAL METHOD OF GENOME TO UNDERSTAND ORGANISMS
As described above, the four bases A, G, C and T are arranged in a sophisticated manner on genomic DNA [2,3]. The SSM is an analytical method for genome structure that calculates the appearance frequency of a key base sequence in the genome according to the structural features of the genome. In a series of studies on the analysis of biological information, we have demonstrated that genomic DNAs, including virus genomes, are arranged in a single strand with the structural features of 1) reverse-complement symmetry of bases or of sequences of 1-12 bases (Tables 1, 2), 2) bias of the four bases (Figure 3), and 3) multiple fractality of the distribution of each of the four bases depending on the distance in the double logarithmic plot (power spectrum) of L (the distance from a base to the next base) vs. P(L) (the probability of the base distribution at L) (Figure 4), although the virus genomes are composed of small numbers of bases and their base symmetry is rather lower than that of prokaryotic and eukaryotic cells [2-5].
The outline of the SSM is as follows [2,3]. A given DNA base sequence in the genome is converted to a spectrum based on the appearance frequencies of given successive base sequences (key sequences, of size d) (Figure 5). The key sequence used in Figure 5 was three successive bases (triplet = genetic codon). In the figure, the vertical axis of the sequence spectrum, f_si, is not labeled and is scaled arbitrarily, because only the shape of the sequence spectrum is meaningful in this manuscript. The horizontal axis is the base sequence number i (i = 1, ..., M); it is also omitted in the following figures because it is easily derived from the base sequence size M.
The controllable parameters of the sequence spectrum are the base size d of the key sequence, the average width m, and the size factor p (the number of skipped bases). The parameter d determines the highest resolution for extracting the structural features of the base sequence. In this report, we used a key sequence of d = 3 (appearance frequency table of triplets, Table 2) for the numerical experiments on homologous structure discussed in the following sections.
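The roles of the three parameters d, m and p can be made concrete with a small sketch. This is only one plausible reading of the outline above and not the published SSM implementation: it assumes that the spectrum value at position i is the genome-wide relative frequency of the d-base key sequence starting at i, smoothed by a moving average of width m and sampled every p positions. The function names and the toy "genome" are hypothetical.

```python
from collections import Counter

def frequency_table(reference, d=3):
    """Relative appearance frequency of every d-base key sequence in the set S."""
    counts = Counter(reference[i:i + d] for i in range(len(reference) - d + 1))
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}

def sequence_spectrum(seq, table, d=3, m=10, p=1):
    """Sketch of a sequence spectrum f_si (assumed definition, see lead-in).

    d: size of the key sequence; m: width of the moving average;
    p: sampling step, i.e. every p-th smoothed value is kept.
    """
    raw = [table.get(seq[i:i + d], 0.0) for i in range(len(seq) - d + 1)]
    smooth = [sum(raw[i:i + m]) / m for i in range(len(raw) - m + 1)]
    return smooth[::p]

# Toy "genome" (entire set S) and a 40-base region taken from it.
genome = "ATGGCGTACGCTAGCTAGGATCCATGCATGC" * 4
table = frequency_table(genome, d=3)
spectrum = sequence_spectrum(genome[:40], table, d=3, m=10, p=2)
print(len(spectrum))
```

Under this reading, a larger m gives a smoother curve suited to long regions, and a larger p shortens the spectrum of a large target so that it can be compared with a small reference; how the comparison itself (the homology factor) is computed is not specified here and is left out of the sketch.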
In Figure 6, m = 10 for the mitochondrial targeting signal of F1α (Figure 6(a)), m = 60 for a gene (Figure 6(b)), and m = 8000 for a chromosome (Figure 6(c)). The size factor p was adjusted to the base sequence size, especially when the homology factor between a small reference and a large target was calculated. The possible appearance frequencies f_i of the key sequences k_i were calculated in advance for the entire set S. The appearance frequency table depends on the entire set S, which in general is the genome of the target species.
The key base sequence is usually selected from 1-12 successive bases according to the number of bases of the genomic DNA and the biological phenomenon of interest. Three successive bases (d = 3, the same as the genetic codon) were used in this manuscript. Because the sequence spectrum is homologous with that of the related region, the SSM can be applied to interactions involving the base sequences of DNA [4,5]. Even where the amino acid sequences do not appear to align, the spectra of their base sequences are homologous, and their interactive regions can be identified by the SSM [2][3][4][5]. These analyses are presumably related to the reverse-complement symmetry of the genome base sequences, and are in progress.
As SSM faithfully reflects the biological information [1], the conservation of the base sequences of genomic DNA is also maintained in the translated amino acid sequence of the protein (Figure 6). Therefore, SSM can be applied to identify the interactive region of DNA to DNA, or DNA to protein [2-5]. The appearance frequencies of three successive base sequences (d = 3) correspond directly to the species-dependent genetic codes (triplets) [2,3,66], which in turn correspond to the 20 amino acids. Because any genomic DNA has fractality (Figure 7) [2-5], any gene or chromosome can be analyzed by optionally skipping bases, regardless of the number of bases (DNA size). Therefore, SSM can be useful for analyzing not only small viral and eubacterial genomes but also huge chromosomes such as those of mammals, including primates. Biological phenomena may be reflected in the appearance frequency of the bases of the genome. On this basis, various biological phenomena, especially the interactive sites of protein-protein and protein-DNA interactions, were analyzed using SSM. The molar ratio of the four bases, A, T, G and C, of genomic DNA in one organism is constant.
Based on the appearance frequency of the key sequences of the genome and the structural features of genomic DNA, any DNA sequence on the genome can be expressed as a sequence spectrum together with the adjoining base sequences, which can be used to study the corresponding biological phenomena [2-5].

COMPLEXITY THROUGHOUT THE GENOME
The genome is a "field" of various genes, as described above. In prokaryotic cells, the genes are very crowded on the field; in addition, the intergenic regions are smaller, and multiple fractality is hard to observe; specifically, the multi-fractality is hidden behind the uni-fractality. In prokaryotic cells, most of the genome is occupied by coding regions, whereas in eukaryotic cells the field is large, the genome is composed of a great number of bases, and the distance between genes is large. As a result, the multiple fractality, both the uni-fractality and the multi-fractality, is clearly observed in the genome. The non-coding regions of the genome are composed of promoters, MARs, insulators, poly(A) signal sequences, SINEs, LINEs, ncRNAs, introns and so on [69,73-78]. These elements are known to regulate gene expression in biological phenomena. The more complex the organism, the larger the non-coding regions of its genome may be [69,78].
Throughout the genome, including these regulatory elements of gene expression, the base sequences of the genomic DNA maintain the structural features in a single strand: the reverse-complement symmetry, the bias, and the multiple fractality [2-5].
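Two of these features, the bias and the reverse-complement symmetry, can be checked on a single strand with a short Python sketch. It makes one assumption about the measure: symmetry is scored as the largest difference between the relative frequency of a d-base sequence and that of its reverse complement on the same strand. The function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a sequence written with A, C, G and T."""
    return seq.translate(COMPLEMENT)[::-1]

def base_composition(strand):
    """Relative frequency of A, C, G and T on one strand (the bias)."""
    counts = Counter(b for b in strand.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}

def symmetry_gap(strand, d=1):
    """Largest difference in relative frequency between a d-base sequence
    and its reverse complement on the same strand (0.0 = fully symmetric)."""
    strand = strand.upper()
    counts = Counter(strand[i:i + d] for i in range(len(strand) - d + 1))
    counts = {k: c for k, c in counts.items() if set(k) <= set("ACGT")}
    total = sum(counts.values())
    return max(
        (abs(c - counts.get(reverse_complement(k), 0)) / total
         for k, c in counts.items()),
        default=0.0,
    )

# Usage: equal A/T and G/C counts on one strand give a gap of 0.0 at d = 1.
composition = base_composition("AATTGGCC")
gap = symmetry_gap("AATTGGCC", d=1)
```

With d = 1 the gap compares A with T and G with C; larger d extends the same comparison to longer base sequences.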
We can approach studies of the entire genome through the appearance frequencies of the bases in the genome, in other words, through how the base sequence is used in the genome. We have studied many genome sequences, including those of eukaryotes, prokaryotes, organelles and viruses, downloaded from databases such as NCBI [44]. When an organism has several chromosomes, we calculated the base frequencies of the chromosomes in numeric order. In addition, individual chromosomes were used for H. sapiens because the personal computer could not process the sum of chromosomes 1-22, X and Y owing to its limited capacity.
The genome data were drafts, as described above, but the unreadable areas are a very small part of the huge entire chromosome. So, when a chromosome contained an unreadable region, we skipped that region in calculating the base frequencies of the chromosome or genome, because the number of unreadable bases in each chromosome is negligible compared with the large number of bases of genomic DNA. The complexity of an organism may depend on the capacity of the non-coding region in the entire genome.
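The frequency calculation described above can be sketched in Python under two assumptions: unreadable bases appear as symbols other than A, C, G and T (for example N), and chromosomes are supplied one at a time as plain strings. The function names and the window handling are illustrative only.

```python
from collections import Counter

def genome_base_frequencies(chromosomes):
    """Base frequencies over several chromosomes, processed one at a time,
    counting only readable bases (A, C, G, T)."""
    counts = Counter()
    for seq in chromosomes:
        seq = seq.upper()
        for base in "ACGT":
            counts[base] += seq.count(base)
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGT"}

def windowed_base_frequencies(seq, window=10_000):
    """Base frequencies in consecutive windows; unreadable bases are skipped
    and windows with no readable base are dropped."""
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        counts = {base: chunk.count(base) for base in "ACGT"}
        readable = sum(counts.values())
        if readable == 0:
            continue  # fully unreadable window
        rows.append((start, {b: counts[b] / readable for b in "ACGT"}))
    return rows

# Usage: the middle window below is unreadable and is skipped.
rows = windowed_base_frequencies("AAAANNNNCCGG", window=4)
whole = genome_base_frequencies(["AAN", "CCGT"])
```

Because only running counts are kept, each chromosome can be read, counted and discarded in turn, so the whole genome never has to be held in memory at once.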

Figure 1. Structure of genomic DNA. A gene on the genome is transcribed using the W-strand (left to right) or the C-strand (right to left).

Figure 2. The genome is a "field" of genes. Each color represents a different gene. Genes on a chromosome are arranged sophisticatedly with three features: order, direction and distance.

Figure 3. Difference in the frequency of the four nucleotides (bases), A, T, G and C, between S. cerevisiae chromosome IV, Homo sapiens chromosome 22 and their counterfeit sequences (artificial chromosomes). (a) Distribution of the four bases, A (indigo), T (red), G (yellow) and C (blue), in S. cerevisiae chromosome IV (real chromosome IV, top); (b) Its counterfeit sequence (artificial chromosome IV, bottom), consisting of 1,531,927 bases each, generated as described in the Method section. The vertical axis is the frequency of the four bases in the base sequence of S. cerevisiae chromosome IV. The horizontal axis is the base number from the top (5'-end) of the base sequence of S. cerevisiae chromosome IV, in steps of 10,000 nt (window width = 10,000 nt); (c) Distribution of the four bases, A (indigo), T (red), G (yellow) and C (blue), in Homo sapiens chromosome 22 (real chromosome 22, top); (d) The counterfeit (artificial chromosome 22, bottom) of Homo sapiens chromosome 22. The window width for Homo sapiens chromosome 22 is 400,000 nt because chromosome 22 is large (33,476,901 nt).

Figure 4. Distribution curve of adenine (A) and the fractal analysis in the S. cerevisiae and Simian virus 40 genomes. (a) S. cerevisiae genome (16 chromosomes plus mitochondrial DNA = 12,155,038 nt). L = 1-15, f(α)-value of the region of the adenine distribution curve shown by Eq. 1, black square ■; L = 16-58, f(α)-value of the region of the adenine distribution curve shown by Eq. 2, red diamond ♦; (b) Simian virus 40 genome (5243 nt). L = 1-10, f(α)-value of the region of the adenine distribution curve shown by Eq. 1, black square ■; L = 11-23, f(α)-value of the region of the adenine distribution curve shown by Eq. 2, red diamond ♦. The x axis (L) expresses the distance from a base to the next appearance of the same base in the genome sequence, and the y axis P(L) expresses the probability of the distribution function of the base in the genome base sequence. W: the intercept of the tangent to the curve gives the value of y.

Figure 6(a) shows the sequence spectrum of the mitochondrial targeting signal portion deduced from the base sequence of ATP1.

Figure 6(c) shows the sequence spectrum of S. cerevisiae chromosome II; i.e., Figures 6(a) and (b) can be compared with respect to the F1α protein, whereas Figures 6(b) and (c) can be compared with respect to ATP1 and chromosome II. In other words, the base sequence on the genome can be analyzed by SSM irrespective of whether it concerns protein, DNA or RNA.

Figure 6. Homologous structure exists in the functional region of the gene, the protein (translated from the gene) and the entire chromosome in the genome. The homology of the sequence spectrum is closely associated with the interaction of protein and DNA. (a) The base sequence of the ATP1 gene (111 nt) encoding the mitochondrial targeting signal of the F1F0-ATPase complex α subunit (F1α), m = 10, p = 1; (b) ATP1 gene (1638 nt), the structural gene of the F1α subunit, m = 60, p = 7; (c) Chromosome II, m = 8000, p = 100. The ATP1 gene is located in the left arm (83,200 nt. ca. 35,000 nt far from the left telomere) of chromosome II in the yeast Saccharomyces cerevisiae. The d-value (key-sequence) used in this manuscript is 3.

Figure 7. Interaction based on the structure (base sequence) of the genome. (a) Gene (protein) vs. chromosome; homologous structure between the interactive regions, according to the self-similarity (fractality) of the base sequence, even between base sequences of different sizes; (b) Gene A (protein A = reference) vs. gene B (protein B = target); the interactive regions (red area) of a gene (a protein) showed homologous sequence spectra of the base sequences appearing on the genomic DNA.

Table 1. Appearance frequency of the four nucleotides (bases), A, T, G and C, in a single strand of DNA from various genomes.