When the contents of the nucleotides (G, C, T and A), based on
complete genomes, are plotted against the content of each
nucleotide among various organisms, their relationships
can clearly be expressed by a linear formula, y = ax + b,
where y and x represent nucleotide contents, and “a” and
“b” are constants. These constant values differ between
the coding and non-coding regions. This linear relationship
is obtained from the complete single-stranded DNA
forming the nuclear genome [11,67]. The values of “a”
and “b” in either the coding or the non-coding region differ
slightly among kingdoms, such as bacteria, archaea and
eukaryotes [11]. Thus, nucleotide alterations are governed
by slightly different rules among different kingdoms.
Among these linear regression lines, the constant
value “b” has never been zero, and the regression coefficients
have never been one. This confirms that the formulas
differ from Chargaff’s formulas, while differences
in regression lines among different kingdoms are the
results of biological divergence.
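As a minimal sketch of how such a line can be obtained, the following Python snippet fits y = ax + b by ordinary least squares. The function name `fit_line` and the five data points are hypothetical illustrations, not values from the cited studies; `r` is the Pearson correlation coefficient, reported here only as a measure of how closely the points follow the line.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b, r)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    a = sxy / sxx                 # slope
    b = mean_y - a * mean_x       # intercept
    r = sxy / sqrt(sxx * syy)     # Pearson correlation coefficient
    return a, b, r


# Hypothetical G and C contents (fractions of all nucleotides) for five
# imaginary genomes. These are NOT data from the paper.
g_content = [0.15, 0.20, 0.25, 0.30, 0.35]
c_content = [0.16, 0.20, 0.26, 0.29, 0.35]

a, b, r = fit_line(g_content, c_content)
print(f"C = {a:.3f} * G + {b:.3f}  (r = {r:.3f})")
```

With real data, each point would be one organism, so the fitted constants describe a relationship across species rather than within a single genome.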
As the relationships between two nucleotide contents
are expressed by empirical linear formulas among
various organisms, the determination of any one nucleotide
content essentially allows the estimation of all
four nucleotide contents. In addition, because the relationships
between nucleotide content and the usages of the 64 codons
are also governed by linear formulas, the usages of the 64 codons
in the coding region can be estimated from the content of
just one nucleotide (Figure 7).
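The estimation step described above can be sketched as follows. The slopes and intercepts in `COEFFS` are hypothetical placeholders, not the constants reported in [11]; they are chosen so that the slopes sum to zero and the intercepts sum to one, which keeps the four estimates summing to one for any G content.

```python
# Hypothetical (slope, intercept) pairs for y = a * G + b.
# These are illustrative assumptions, NOT the fitted constants from the paper.
COEFFS = {
    "G": (1.0, 0.0),
    "C": (0.9, 0.02),
    "T": (-0.95, 0.49),
    "A": (-0.95, 0.49),
}


def estimate_contents(g, coeffs=COEFFS):
    """Estimate all four nucleotide contents from the G content alone."""
    if not 0.0 <= g <= 1.0:
        raise ValueError("G content must be a fraction between 0 and 1")
    return {base: a * g + b for base, (a, b) in coeffs.items()}


est = estimate_contents(0.20)
for base, value in est.items():
    print(f"{base}: {value:.3f}")
print(f"sum: {sum(est.values()):.3f}")
```

The same pattern would extend to codon usages by holding one (slope, intercept) pair per codon, assuming such constants were available from a fit.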
Figure 7. Codon usage patterns and amino acid compositions of Homo sapiens. Codon usage (bar) and
amino acid composition (radar chart) were expressed as percentages of total codons and amino acids,
respectively. Upper and lower panels represent genomic and estimated data, respectively. These figures
were reproduced from Kenji Sorimachi and Teiji Okayasu (2008) Codon evolution is governed by linear
formulas, Amino Acids, 34, 661-668.
K. Sorimachi / Natural Science 1 (2009) 107-119
Copyright © 2009 SciRes. OPEN ACCESS
Figure 8. Correlation of G content to C content in various organisms based on their complete genomes.
Red, blue and green symbols represent 112 bacteria, 15 archaea and 18 eukaryotes, respectively.
Each line was drawn computationally. This figure was reproduced from Kenji Sorimachi and
Teiji Okayasu (2008) Codon evolution is governed by linear formulas, Amino Acids, 34, 661-668.
In mitochondria and chloroplasts, nucleotide alterations
are also expressed by similar linear formulas with
slightly different constant values representing the slope
and the intercept [12]. All nucleotide alterations in nuclei,
mitochondria and chloroplasts are expressed by
linear formulas with different constant values resulting
from organelle characteristics among various organisms.
Namely, a certain nucleotide content “y” can be expressed
across species by a linear formula, y = ax + b,
based on a single nucleotide content “x”. Among the four
equations representing the four nucleotide contents after
normalization, the sum of the slopes, “a”,
is zero and the sum of the intercepts, “b”, is one [11].
This relationship is mathematically definitive and independent
of the correlations among the four nucleotide
contents. Chargaff’s parity rules, G/C = 1, A/T = 1 and
(A + G)/(C + T) = 1, can be rewritten as follows: G = G, C = G,
T = -G + 0.5, and A = -G + 0.5. Thus, Chargaff’s parity
rules, even those governing the DNA of a single species, are
derived from the general formula, y = ax + b, when the slope,
“a”, is 1 or -1, and when the intercept, “b”, is 0 in the
equations with a = 1 and 0.5 in the equations with a = -1.
On the other hand, the values of
“a” and “b” in both codon evolution [11] and organelle
evolution [68] shifted from 1 or -1 and from 0 or 0.5,
respectively, because of biological divergence, and the
regression coefficient also shifted from one. The shift of the
regression coefficient from one represents biological
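The normalization argument above can be written out explicitly. The derivation below is a sketch under the assumption that the four contents are fractions summing to one and that all four are expressed as linear functions of the same content x, observed at two or more distinct values; the index i and the symbols a_i, b_i are notation introduced here for illustration.

```latex
\begin{align}
y_i &= a_i x + b_i, \qquad i \in \{G, C, T, A\}, \\
\sum_i y_i &= \Big(\sum_i a_i\Big)\, x + \sum_i b_i = 1
  \quad \text{for every observed } x \\
&\Longrightarrow \quad \sum_i a_i = 0, \qquad \sum_i b_i = 1 . \\[4pt]
\text{Chargaff case } (x = G):\quad
& G = x,\; C = x,\; T = -x + 0.5,\; A = -x + 0.5, \\
& \sum_i a_i = 1 + 1 - 1 - 1 = 0, \qquad
  \sum_i b_i = 0 + 0 + 0.5 + 0.5 = 1 .
\end{align}
```

The implication holds because a linear function of x that equals one at two distinct values of x must have zero slope and unit intercept.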
It has been thought that cellular organelles such as
mitochondria [68] and chloroplasts [69] were derived during
biological evolution from proteobacteria and cyanobacteria,
respectively, and their evolutionary processes
appear to differ from nuclear genome evolution,
as mentioned above. In addition, it is known that the mutation
rate is remarkably high in mitochondrial DNA [70].
In our study, the amino acid compositions of chloroplasts and
plant mitochondria resemble those of nuclear DNA,
whereas those of vertebrate mitochondria differ from
those of other organelles [12]. In particular, the content of
Leu was extremely high in animal mitochondria [12].
Comparing the shapes of the radar charts based on
amino acid compositions, that of the ancient fish, the
coelacanth (Latimeria chalumnae), more closely resembles
those of salamanders and birds than those
of other fish (Diodon holocanthus) [12]. In a further study
using multivariate analysis based on amino acid compositions,
the lungfish (Neoceratodus forsteri) and the coelacanth
were both found to belong to the cluster representing a
reptile, a cluster separate from the one representing
other fish (carp, rainbow trout and killifish). These results
are consistent with the already established phylogenetic
concept.
The apparently great divergence of Homo sapiens from
bacteria can be expressed by linear formulas with small
deviations based on the complete genome in biological
evolution. Thus, biological evolution seems to be
the result of mere nucleotide substitutions based
on simple mathematical principles, while natural selection
affects species preservation after nucleotide alterations.
This conclusion is consistent with the idea that
evolution is based on neutral mutation [71,72]. Therefore,
natural selection does not directly regulate nucleotide
substitutions, but is indirectly involved in biological
The present paper reveals that the analytical method using
the ratios of the numbers of individual amino acids to
the total number of amino acids presumed from the
whole genome, or those of the numbers of individual nucleotides
to the total number of nucleotides in the whole
genome, is useful for genome research, as are methods
using the sequences of amino acids or nucleotides.
These ratios based on nucleotide sequences can exclude
deviations in certain calculations. The fact that genome
structures are homogeneous with regard to amino acid
compositions or codon usages makes it possible to
compare various genomes with different sizes and genes.
Namely, a large data set obtained from the complete genome
can be expressed by just a single point on a graph.
Thus, using the ratios of amino acids or nucleotides to
their total numbers seems to be an excellent method for
genome research based on extremely large data sets. In
addition, even a gene assembly of a certain size can be
used instead of the complete genome for limited purposes.
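The ratio-based representation described above can be illustrated with a short Python sketch. The function name `nucleotide_fractions` and the toy sequences are hypothetical; ignoring symbols other than G, C, T and A is an assumption made here for simplicity, not a rule stated in the paper.

```python
from collections import Counter


def nucleotide_fractions(sequence):
    """Return each nucleotide count divided by the total of G, C, T and A.

    Symbols other than G, C, T and A (for example N) are ignored; this is an
    assumption of this sketch.
    """
    counts = Counter(base for base in sequence.upper() if base in "GCTA")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no G, C, T or A")
    return {base: counts[base] / total for base in "GCTA"}


# Two toy sequences of very different length (NOT real genomes).
toy_short = "ATGCGCATTA"
toy_long = toy_short * 1000

short_fractions = nucleotide_fractions(toy_short)
long_fractions = nucleotide_fractions(toy_long)
print(short_fractions)
print(long_fractions)
```

Because the counts are divided by the total, sequences of very different lengths reduce to comparable composition vectors, which is what allows a whole genome to be plotted as a single point.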
In prebiotic evolution, amino acid composition might
have been the strongest factor determining the characteristics
of biopolymers used for the establishment of
primitive life forms, whereas since the establishment of
the codon system, biological evolution has proceeded
through nucleotide alterations expressed by linear formulas
based on nucleotide contents, as shown in Figure
8. Thus, the usages of the 64 codons can be estimated from just one
nucleotide content (Figure 7), and the characteristic
amino acid composition is expressed by the “star-shape”
(Figures 1-7), not only in cell analysis, but also in genome
analysis. This fact strongly suggests that this
“star-shape” may be conserved in both primitive life
forms and future organisms, because all organisms must
be governed by universal rules on Earth, without exception.
Thus, this amino acid composition represented by
the “star-shape” may reflect the “Amino Acid World”.
We, Homo sapiens, stand merely in the middle of a
line (Figure 8). We are not at the end of the line, nor do we
have an “ultimate” status. Therefore, we have been and
will be exposed to natural selection without exception.
The author expresses his thanks to Professor Kuo-Chen
Chou, Editor-in-Chief of Natural Science, for the opportunity
to write this review; to Professor Hiroto Naora,
Research School of Biological Sciences, Australian National
University, Professor Makoto Miyaji, Chiba University,
and Dr. Emiko Furuta, Institute of Comparative
Immunology, for encouragement given in respect of the
author’s genome research; to Dr. Teiji Okayasu, Dokkyo
Medical University, for help with computer analysis of
genomic data; and to Dr. Kazumi Akimoto, Dokkyo
Medical University, for taking care of cell cultures.
[1] Sanger, F. and Thompson, E.O. (1953) The amino acid
sequence in the glycyl chain of insulin. I. The identifica-
tion of lower peptides from partial hydrolysates. Biochem.
J., 53, 353-366.
[2] Sanger, F. and Thompson, E.O. (1953) The amino acid
sequence in the glycyl chain of insulin. II. The investigation
of peptides from enzymic hydrolysates. Biochem. J.,
53, 366-374.
[3] Sanger, F. and Coulson, A.R. (1975) A rapid method for
determining sequences in DNA by primed synthesis with
DNA polymerase. J. Mol. Biol., 94, 441-446.
[4] Maxam, A.M. and Gilbert, W. (1977) A new method for
sequencing DNA. Proc. Natl. Acad. Sci. USA, 74,
[5] Zuckerkandl, E. and Pauling, L.B. (1962) Molecular
disease, evolution, and genetic heterogeneity. In: Kasha, M.
and Pullman, B. (eds.) Horizons in Biochemistry,
Academic Press, New York, 189-225.
[6] Sorimachi, K. (2009) A proposed solution to the historic
puzzle of Chargaff’s second parity rule. Open Genom. J.,
2, 12-14.
[7] Chou, K-C. and Zhang, C-T. (1992) Diagrammatization
of codon usage in 339 HIV proteins and its biological
implication. AIDS Research and Human Retroviruses, 8,
[8] Zhang, C-T. and Chou, K-C. (1993) Graphic analysis of
codon usage strategy in 1490 human proteins. J. Prot.
Chem., 12, 329-335.
[9] Sorimachi, K. and Okayasu, T. (2004) An evolutionary
theory based on genomic structures in Saccharomyces
cerevisiae and Encephalitozoon cuniculi. Mycoscience,
45, 345-350.
[10] Sorimachi, K. and Okayasu, T. (2007) Genomic structure
is homogeneous based on codon usages. Curr. Top. Pep.
Protein Res., 8, 19-24.
[11] Sorimachi, K. and Okayasu, T. (2008) Codon evolution is
governed by linear formulas. Amino Acids, 34, 661-668.
[12] Sorimachi, K. and Okayasu, T. (2008) Universal rules
governing genome evolution expressed by linear formu-
las. Open Genom. J., 1, 33-43.
[13] Chou, K-C. (1983) Advances in graphical methods of
enzyme kinetics. Biophys. Chem., 17, 51-55.
[14] Chou, K-C. (1989) Graphical rules in steady and
non-steady enzyme kinetics. J. Biol. Chem., 264, 12074-
[15] Chou, K-C. (1990) Review: Applications of graph theory
to enzyme kinetics and protein folding kinetics.
Steady and non-steady state systems. Biophys. Chem., 35,
[16] Chou, K-C. (1993) Graphic rule for non-steady-state
enzyme kinetics and protein folding kinetics. J. Math.
Chem., 12, 97-108.
[17] Lin, S.X. and Neet, K.E. (1990) Demonstration of a slow
conformational change in liver glucokinase by fluores-
cence spectroscopy. J. Biol. Chem., 265, 9670-5.
[18] Zhou, G.P. and Deng, M.H. (1984) An extension of
Chou's graphical rules for deriving enzyme kinetic equa-
tions to system involving parallel reaction pathways.
Biochem. J., 222, 169-176.
[19] Althaus, I.W., Chou, J.J., Gonzales, A.J. et al. (1993)
Kinetic studies with the nonnucleoside HIV-1 reverse
transcriptase inhibitor U-88204E. Biochemistry, 32,
[20] Chou, K-C., Kezdy, F.J. and Reusser, F. (1994) Review:
Steady-state inhibition kinetics of processive nucleic acid
polymerases and nucleases. Anal. Biochem., 221, 217-
[21] Qi, X.Q., Wen, J. and Qi, Z.H. (2007) New 3D graphical
representation of DNA sequence based on dual nucleotides.
J. Theoret. Biol., 249, 681-690.
[22] MacGregor, I.M., Truswell, J.F. and Eriksson, K.A.
(1974) Filamentous alga from the 2,300 m.y. old Trans-
vaal Dolomite. Nature, 247, 538-539.
[23] Nagy, L.A. and Zumberge, J.E. (1976) Fossil microor-
ganisms from the approximately 2800 to 2500 million-
year-old Bulawayan stromatolite: Application of ultrami-
crochemical analyses. Proc. Natl. Acad. Sci. USA, 73,
[24] Schopf, J.W., Barghoorn, E.S., Maser, M.D. and Gordon,
R.O. (1965) Electron microscopy of fossil bacteria two
billion years old. Science, 149, 1365-1367.
[25] Johanson, D.C. and Taieb, M. (1976) Plio-Pleistocene
hominid discoveries in Hadar, Ethiopia. Nature, 260,
[26] Watson, J.D. and Crick, F.H.C. (1953) Genetical implica-
tions of the structure of deoxyribonucleic acid. Nature,
171, 964-967.
[27] Sueoka, N. (1961) Correlation between base composition
of deoxyribonucleic acid and amino acid composition in
proteins. Proc. Natl. Acad. Sci. USA, 47, 1141-1149.
[28] Sorimachi, K. (1999) Evolutionary changes reflected by
the cellular amino acid composition. Amino Acids, 17,
[29] Sorimachi, K., Itoh, T., Kawarabayasi, Y., Okayasu, T.,
Akimoto, K. and Niwa, A. (2001) Conservation of the
basic pattern of cellular amino acid composition during
biological evolution and the putative amino acid compo-
sition of primitive life forms. Amino Acids, 21, 393-399.
[30] Sorimachi, K., Okayasu, T., Akimoto, K. and Niwa, A.
(2000) Conservation of the basic pattern of cellular
amino acid composition during biological evolution in
plants. Amino Acids, 18, 193-196.
[31] Sorimachi, K. (2002) The classification of various or-
ganisms according to the free amino acid composition
change as the result of biological evolution. Amino Acids,
22, 55-69.
[32] Woese, C.R. (1965) Order in the genetic code. Proc. Natl.
Acad. Sci. USA, 54, 71-75.
[33] Crick, F.H.C. (1968) The origin of genetic code. J. Mol.
Biol., 38, 367-379.
[34] Wong, J.T-F. (1975) A co-evolutionary theory of the
genetic code. Proc. Natl. Acad. Sci. USA, 72, 1909-1912.
[35] Lahav, N., White, D. and Chang, S. (1978) Peptide for-
mation in the prebiotic era: thermal condensation of gly-
cine in fluctuating clay environments. Science, 201,
[36] Sorimachi, K. and Okayasu, T. (2007) Mathematical
proof of the chronological precedence of protein forma-
tion over codon formation. Curr. Top. Pep. Protein Res.,
8, 25-34.
[37] Miller, S.L. (1953) A production of amino acids under
possible primitive earth conditions. Science, 117, 528-
[38] Kvenvolden, K., Lawless, J., Pering, K., Peterson, E.,
Flores, J., Ponnamperuma, C., Kaplan, I.R. and Moore, C.
(1970) Evidence for extraterrestrial amino-acids and hy-
drocarbons in the Murchison meteorite. Nature, 228,
[39] Wolman, Y., Haverland, W. and Miller, S.L. (1972) Nonprotein
amino acids from spark discharges and their
comparison with the Murchison meteorite amino acids.
Proc. Natl. Acad. Sci. USA, 69, 809-811.
[40] Sorimachi, K. and Ui, N. (1975) Ion-exchange chroma-
tographic analysis of iodothyronines. Anal. Biochem., 67,
[41] van der Walt, B. and Cahnmann, H.J. (1982) Synthesis of
thyroid hormone metabolites by photolysis of thyroxine
and thyroxine analogs in the near UV. Proc. Natl. Acad.
Sci. USA, 79, 1492-1496.
[42] Shizuka, H., Sorimachi, K., Morita, T., Nishiyama, K.
and Sato, T. (1971) Photochemical oxidation of
4,5,9,10-tetrahydropyrenes. Bull. Chem. Soc. Japan, 44, 1983-
[43] Sorimachi, K., Morita, T. and Shizuka, H. (1974)
Photocyclization of [2,2]metacyclophane at 2537 Å. Bull.
Chem. Soc. Japan, 47, 987-990.
[44] Gilbert, W. (1986) The RNA World. Nature, 319, 618.
[45] Sorimachi, K. and Okayasu, T. (2003) Gene assembly
consisting of small units with similar amino acid compo-
sition in the Saccharomyces cerevisiae genome. Myco-
science, 44, 415-417.
[46] Hochberg, Y. and Tamhane, A.C. (1987) Multiple com-
parison procedures, In Probability and Mathematical Sta-
tistics (eds. Y. Hochberg and A.C. Tamhane), John Wiley
& Sons, New York, 274-309.
[47] Sorimachi, K., Okayasu, T., Ebara, Y. and Nakagawa, T.
(2005) Mathematical proof of genomic amino acid composition
homogeneity based on putative small units.
Dokkyo J. Med. Sci., 32, 99-100.
[48] Bergey’s Manual of Systematic Bacteriology.
[49] Fleischmann, R.D., Adams, M.D., White, O., Clayton,
R.A., Kirkness, E.F., Kerlavage, A.R. et al. (1995)
Whole-genome random sequencing and assembly of
Haemophilus influenzae Rd. Science, 269, 496-512.
[50] International Human Genome Sequencing Consortium.
(2001) Initial sequencing and analysis of the human genome.
Nature, 409, 860-921.
[51] Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural,
R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A.,
Holt, R.A. et al. (2001) The sequence of the human ge-
nome. Science, 291, 1304-1351.
[52] Sorimachi, K. and Okayasu, T. (2004) Classification of
eubacteria based on their complete genome: where does
Mycoplasmataceae belong? Proc. R. Soc. Lond. B
(Suppl.), 271, S127-S130.
[53] Dayhoff, M.O., Park, C.M. and McLaughlin, P.J. (1977)
Building a phylogenetic trees: cytochrome C. In: Atlas of
protein sequence and structure. National Biomedical
Foundation, Washington, D.C., 5, 7-16.
[54] Sogin, M.L., Elwood, H.J. and Gunderson, J.H. (1986)
Evolutionary diversity of eukaryotic small subunit rRNA
genes. Proc. Natl. Acad. Sci. USA, 83, 1383-1387.
[55] DePouplana, L., Turner, R.J., Steer, B.A. et al. (1998)
Genetic code origins: tRNAs older than their synthetases?
Proc. Natl. Acad. Sci. USA, 95, 11295-11300.
[56] Doolittle, W.F. and Brown, J.R. (1994) Tempo, mode, the
progenote, and the universal root. Proc. Natl. Acad. Sci.
USA, 91, 6721-6728.
[57] Maizels, N. and Weiner, A.M. (1994) Phylogeny from
function: evidence from the molecular fossil record that
tRNA originated in replication, not translation. Proc. Natl.
Acad. Sci. USA, 91, 6729-6734.
[58] Sakagami, M., Nakayama, T., Hashimoto, T. et al. (2006)
Phylogeny of the centrohelida inferred from SSU rRNA,
tubulin, and actin genes. J. Mol. Evol., 61, 765-775.
[59] Okayasu, T. and Sorimachi, K. (2008) Organisms can
essentially be classified according to two codon patterns.
Amino Acids, 36, 261-271.
[60] Bentley, S.D., Chater, K.F., Cerdeño-Tárraga, M.A.,
Challis, G.L., Thompson, N.R., James, K.D., Harris, D.E.,
Quail, M.A., Kieser, H., Harper, D. et al. (2002) Com-
plete genome sequence of the model actinomycete
Streptomyces coelicolor A3(2). Nature, 417, 141-147.
[61] Glass, J.I., Lefkowitz, E.J., Glass, J.S., Heiner, C.R.,
Chen, E.Y. and Cassell, G.H. (2000) The complete se-
quence of the mucosal pathogen Ureaplasma urealyticum.
Nature, 407, 757-762.
[62] Sueoka, N. (1988) Directional mutation pressure and
neutral molecular evolution. Proc. Natl. Acad. Sci. USA,
85, 2653-2657.
[63] Chargaff, E. (1950) Chemical specificity of nucleic acids
and mechanism of their enzymatic degradation. Experi-
entia, VI, 201-209.
[64] Rudner, R., Karkas, J.D. and Chargaff, E. (1968) Separation
of B. subtilis DNA into complementary strands. 3.
Direct analysis. Proc. Natl. Acad. Sci. USA, 60, 921-922.
[65] Nikolaou, C. and Almirantis, Y. (2006) Deviations from
Chargaff’s second parity rule in organelle DNA insights
into the evolution of organelle genomes. Gene, 381,
[66] Bell, S.J. and Forsdyke, D.R. (1999) Deviations from
Chargaff’s second parity rule with direction of transcrip-
tion. J. Theor. Biol., 197, 63-76.
[67] Mitchell, D. and Bridge, R. (2006) A test of Chargaff’s
second rule. Biochem. Biophys. Res. Commun., 340,
[68] Gray, M.W., Burger, G. and Lang, B.F. (1999) Mito-
chondrial evolution. Science, 283, 1476-1481.
[69] Raven, J.A. and Allen, J.F. (2003) Genomics and chloro-
plast evolution: what did cyanobacteria do for plants?
Genom. Biol., 4, 209-215.
[70] Brown, W.M., George, M. Jr. and Wilson, A.C. (1979)
Rapid evolution of animal mitochondrial DNA. Proc.
Natl. Acad. Sci. USA, 76, 1967-1971.
[71] Kimura, M. (1983) The neutral theory of molecular evolution.
Cambridge, Cambridge Univ. Press.
[72] Van Nimwegen, E., Crutchfield, J.P. and Huynen, M.
(1999) Neutral evolution of mutational robustness. Proc.
Natl. Acad. Sci. USA, 96, 9716-9720.