Genomic data provides simple evidence for a single origin of life

One hundred and fifty years ago, Charles Darwin's On the Origin of Species explained how species evolve by natural selection. To date, there is no simple piece of evidence demonstrating this concept across species. Chargaff's first parity rule states that complementary bases are in equal proportion across the two DNA strands. Chargaff's second parity rule, inconsistently followed across species, states that complementary bases are in approximately equal proportion within a single DNA strand [G ≈ C, T ≈ A and (G + A) ≈ (C + T)]. Using genomic libraries, we analyzed the extent to which DNA samples followed Chargaff's second parity rule. In organelle DNA, nucleotide relationships were heteroskedastic. After classifying organelles into chloroplasts and mitochondria, and then into plant, vertebrate, and invertebrate I and II mitochondria, nucleotide relationships were expressed by linear regression lines. All regression lines based on nuclear and organelle DNA crossed at the same point. This is a simple demonstration of a common ancestor across species.


INTRODUCTION
On the Origin of Species was published in 1859, stemming from observations Charles Darwin made during a voyage on HMS Beagle. According to his theory, all organisms have a common ancestor and a single origin. Since publication, evidence for this theory has accumulated. Although molecular clock research, using amino acid or nucleotide replacement rates [1], has enabled scientists to draw a phylogenetic tree representing biological evolution [2][3][4][5][6][7], the "Origin of Life" has not yet been drawn using these methods. During the past two decades, advances in genomics have enabled the sequencing of entire genomes [8,9]; the first complete genome to be sequenced was that of Haemophilus influenzae [10]. The complete human genome was sequenced early this century by two groups [11,12] and, to date, more than 2,000 species' genomes have been completely sequenced. Based on complete genome data, codon evolution has been precisely analyzed [13], and organisms have consequently been classified [14].
The double-stranded DNA structure is the principal information-containing component of the genome [15]. Based on structural knowledge alone, Chargaff's first parity rule [16] [G = C, A = T and (G + A) = (T + C)] makes intuitive sense. However, Chargaff's second parity rule [17], in which the same nucleotide relationships are retained within single DNA strands, makes less intuitive sense. The biological significance of Chargaff's second parity rule has not been elucidated because of its unclear logical foundation. In the 40 years since its publication, researchers have not known whether Chargaff's second parity rule is relevant to biological evolution. However, a recent publication has solved this historic puzzle [18]. The solution is based on the facts that genome structure is homogeneous regarding nucleotide composition over the genome [19], and that the forward and reverse strands have almost the same composition [20]. Using the complementary relationship between the two strands, both G and C contents are mathematically expressed by the same G + C formula in a single strand, and eventually G ≈ C and T ≈ A [18]. Thus, the first parity rule comes from the inherent characteristics of nucleotides, and the second from the similarities of nucleotide composition between forward and reverse strands. These two rules represent different phenomena. The former is mathematically definitive and independent of biological significance, and the latter is less definite, and may or may not have biological significance.
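The argument summarized above can be written out as a short derivation. This is a sketch in our own notation, not the formulation used in [18]: the symbols n_X^f and n_X^r (the number of nucleotide X on the forward and reverse strands) are introduced here only for illustration.

```latex
% Base pairing (first parity rule): each base on one strand pairs with its complement on the other.
n_G^{r} = n_C^{f}, \qquad n_C^{r} = n_G^{f}, \qquad n_A^{r} = n_T^{f}, \qquad n_T^{r} = n_A^{f}

% Assumption: forward and reverse strands have nearly the same composition.
n_G^{f} \approx n_G^{r}, \qquad n_A^{f} \approx n_A^{r}

% Combining the two statements gives the second parity rule within one strand.
n_G^{f} \approx n_G^{r} = n_C^{f} \;\Rightarrow\; G \approx C,
\qquad
n_A^{f} \approx n_A^{r} = n_T^{f} \;\Rightarrow\; A \approx T
```

The first line holds exactly for any double-stranded molecule, whereas the second is an empirical property of the genome, which is why the second rule is only approximate.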
Recently, Mitchell and Bridge examined a wide selection of biological DNA samples to determine whether they fitted Chargaff's second parity rule [21] (1,495 viral, 835 organelle, 231 bacterial and 20 archaeal genomes; and 164 sequences from 15 eukaryotes). Only single DNA strands that formed genomic double-stranded DNA obeyed Chargaff's second parity rule; organelle DNA and single viral DNA strands did not [21]. Nikolaou and Almirantis reported that mitochondrial DNA could be classified into three groups based on the proportions of G-C and A-T content [22]. They found that mitochondrial DNA deviated from Chargaff's second parity rule, and that chloroplasts shared the same relative nucleotide compositions as bacterial genomes [22]. Similar deviations from Chargaff's second parity rule were reported by Bell and Forsdyke [23]. My research group previously examined nuclear and organelle DNA nucleotide correlations, and found that nucleotide contents are correlated with each other in coding, non-coding, and complete nuclear DNA [20]; consistent results were obtained from chloroplast and plant mitochondrial DNA, and only homonucleotide contents are correlated with each other between the coding or non-coding regions and the single DNA strand in animal mitochondria [24]. These results indicate that biological evolution can be expressed by linear formulae [20]. If evolutionary processes are expressed by a single equation, it would suggest that evolutionary processes proceeded under the same rule. However, if this is the case, we cannot determine whether evolution diverged from a single or multiple origins, because all species are located on the same single line. If multiple equations are required, the positions of the regression lines would indicate either a single or multiple evolutionary origins.

MATERIALS AND METHODS
Genome data were obtained from the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/sites). Chloroplasts, plant mitochondria and animal mitochondria were examined. The list of organelles examined has been described in our previous paper [24]. Using the same species, we examined newly collected data alongside previous data [24]. For animal mitochondria, species were classified as follows: group I invertebrates contained echinodermata (starfish), mollusca (octopus and squid) and arthropoda (insects); group II invertebrates contained cnidaria (coral), porifera (sponge) and protozoa (flagellate). All calculations were carried out using Microsoft Excel 2003 (Microsoft, Redmond, WA, USA).

Chloroplasts
After normalization, the four nucleotide contents can be expressed by the following equation: G + C + T + A = 1. The nucleotide content of each species was expressed by a linear formula, y = ax + b, where "y" and "x" are the nucleotide contents, and "a" and "b" are constant values (expressing the nucleotide alternation rate among species and the original nucleotide content at the vertical intercept).
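The normalization and the linear fit described above can be sketched in a few lines of Python. This is only an illustration: the calculations in this study were carried out in a spreadsheet, the function names `nucleotide_contents` and `fit_line` are hypothetical, the toy sequences stand in for complete genomes, and we assume that the reported R corresponds to the Pearson correlation coefficient.

```python
from collections import Counter


def nucleotide_contents(seq):
    """Return normalized G, C, T and A contents of one strand (they sum to 1)."""
    counts = Counter(seq.upper())
    total = sum(counts[n] for n in "GCTA")  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no G, C, T or A")
    return {n: counts[n] / total for n in "GCTA"}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                   # slope: alternation rate among species
    b = mean_y - a * mean_x         # intercept: content at x = 0
    r = sxy / (sxx * syy) ** 0.5    # assumed to correspond to the reported R
    return a, b, r


# Toy sequences standing in for complete genomes (hypothetical data).
genomes = ["GGCCAATT", "GCATATAT", "GGGCCCAT", "GCGCGCTA"]
contents = [nucleotide_contents(s) for s in genomes]

# G content (y) expressed by C content (x), one point per genome.
a, b, r = fit_line([c["C"] for c in contents], [c["G"] for c in contents])
```

In this toy data set every sequence has G = C, so the fitted line is G = C with slope 1 and intercept 0.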
In our previous study [20], this linear formula was shown to be applicable across species. Nucleotide contents based on the complete chloroplast genome were plotted against C content (Figure 1, upper panel).
Two lines representing G/C content and C/C content overlapped, as did lines representing T/C and A/C content. These relationships obeyed Chargaff's second parity rule. Thus, in chloroplast evolution, the G/C content alternations obey the same rule against C content, as does T/A content. This shows that G ≈ C and T ≈ A, and that the four kinds of nucleotide alternations occur synchronously. The former (G and C) alternation is attributed to the latter (T and A) alternation in normalized values. G and C exchanges or T and A exchanges do not occur simultaneously under this rule. The equations, represented by regression lines and regression coefficients, are shown in Table 1. Each regression coefficient is approximately 0.9 or higher. This demonstrates an almost complete correlation between nucleotide contents. The slopes in the equations were close to 1 and -1, and the constant values at the vertical intercept were close to 0 and 0.5, respectively.
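The slopes and intercepts reported here are what the normalization predicts when the second parity rule holds exactly. The following short derivation is a sketch under that idealized assumption (G = C and T = A exactly), not a fit to the data.

```latex
% Normalization of the four nucleotide contents.
G + C + T + A = 1

% Idealized second parity rule.
G = C, \qquad T = A

% Substituting: 2C + 2A = 1, so every content is a linear function of C.
G = 1 \cdot C + 0, \qquad T = A = -1 \cdot C + 0.5
```

Plotted against C content, the G line therefore has slope 1 and intercept 0, whereas the T and A lines have slope -1 and intercept 0.5.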

Plant mitochondria
Plotting nucleotide contents against C content, the C/G and A/T lines almost overlapped (Figure 1, lower panel). This demonstrates that the alternations of the four nucleotide contents occurred synchronously. G/C content alternations obey the same rule in plant mitochondrial evolution, as do T/A alternations.
The characteristics representing linear equations are shown in Table 2. The absolute values of the slope were close to 1 in many equations, whereas that of line T expressed by A was 0.576; line A expressed by T was 0.708. In these two equations, the correlations were slightly reduced and the regression coefficients were 0.67.
Plotting the ratios of C/G or T/A against the genome size in plant mitochondria, deviations from 1 were observed in the small genomes (less than 1 × 10^5 nucleotides), while the ratios were fixed to 1 in the larger genomes (more than 1 × 10^5 nucleotides); this rule was followed without exception in the data we used (Figure 2).
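The ratio-versus-size comparison can be illustrated with a minimal Python sketch. The function names and the 0.05 tolerance used to flag a deviation from 1 are hypothetical choices made for this example; no numerical threshold was applied in the present study.

```python
def parity_ratios(seq):
    """Return (genome_size, C/G ratio, T/A ratio) for a single strand."""
    s = seq.upper()
    g, c, t, a = (s.count(n) for n in "GCTA")
    if g == 0 or a == 0:
        raise ValueError("G or A count is zero, so the ratio is undefined")
    return g + c + t + a, c / g, t / a


def deviates_from_parity(ratio, tolerance=0.05):
    """True if the ratio differs from 1 by more than the (hypothetical) tolerance."""
    return abs(ratio - 1.0) > tolerance


# Toy strand standing in for a small organelle genome (hypothetical data).
size, cg, ta = parity_ratios("GGCCAATTGGGC")
```

For real data, each genome would contribute one (size, C/G) point and one (size, T/A) point to a plot such as Figure 2.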

Animal Mitochondria
Relationships between nucleotide contents were also examined in animal mitochondria, including vertebrates and invertebrates (Figure 3). The relationships were notably heteroskedastic. The values obtained from plotting G content against C content were classified into two groups by line C, which represents y(C) = x(C). The two groups (invertebrates I and II) are located below and above line C; this suggests that they diverged from this crossing point. Regression lines representing nucleotide content relationships in vertebrates and invertebrates I and II are shown in Tables 3-5. Vertebrate mitochondria belonged to the same group as invertebrate I mitochondria, and the C content of vertebrate mitochondria was relatively high. Nucleotide contents in vertebrate mitochondria were plotted against C content. T/C contents were correlated, while G and A (purines) were not correlated against C content (Figure 4). This finding may be due to the narrow range of the vertebrate distribution and its variation. Line characteristics representing regression lines are shown in Table 3. Even in invertebrate mitochondria, when nucleotide contents were plotted against G or A (purine) contents, G/A contents were correlated, while C and T (pyrimidines) were not correlated against G or A (purine) content (Tables 4 and 5).
Group I invertebrate mitochondria were examined and are plotted in Figure 5 (upper panel). Various nucleotide content relationships are shown, plotted against C content. The regression coefficients for the equations expressing other nucleotide contents against C content were 0.7-0.8 (Table 4). Extended lines representing G and C content converged at 0.06, forming a clear cuneiform (wedge) shape. Similarly, A and T lines converged at around 0.05. These results indicate that separations of G from C started at around 0.05 C content, and around 0.45 for T and A content. Regression values are shown in Table 4.
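The point at which two extended regression lines converge follows directly from their slopes and intercepts. The formula below is standard algebra written in our own notation (a_1, b_1, a_2 and b_2 are introduced here for illustration); the second line is the special case in which the G line is compared with line C, y = x.

```latex
% Two regression lines plotted against the same horizontal axis.
y = a_1 x + b_1, \qquad y = a_2 x + b_2, \qquad a_1 \neq a_2

% Crossing point.
x^{*} = \frac{b_2 - b_1}{a_1 - a_2}, \qquad y^{*} = a_1 x^{*} + b_1

% Special case: G = a_G C + b_G compared with line C (slope 1, intercept 0).
C^{*} = \frac{b_G}{1 - a_G}
```

Lines with equal slopes never cross, so the formula applies only when the two slopes differ.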
Group II invertebrate mitochondria were examined using the same procedure as above. When G, A and T content was plotted against C content, there was a correlation between G and C content (Figure 5, middle panel). A and T lines also converged when C content was 0.10, although the extended C and G lines crossed when C content was 0.02. When C content was plotted against G content, C and G lines converged when G content was 0.16. Regression lines are shown in Table 5.

Origin of Life
When G/C contents were plotted for various organelles and nuclei, all extended regression lines converged when C content was 0.03 ± 0.02 (mean ± s.d.) (Figure 6). Vertebrate mitochondria (a relatively recent group) are located towards the right of the slope. This confirms the evolutionary direction (left to right), and confirms that all organisms diverged from the same origin. In fact, Ureaplasma urealyticum, which has the smallest genome size [25], is located towards the left of the slope, though this position is not absolute because of reversible nucleotide alternations on the genome.
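One way to summarize where a set of extended regression lines converges is to compute every pairwise crossing point and report the mean and standard deviation of the horizontal coordinates. The Python sketch below illustrates that idea only; it is an assumption about how such a summary could be computed, not a record of the procedure used here, and the slopes and intercepts in the example are invented values, not those of Tables 1-5.

```python
from itertools import combinations
from statistics import mean, stdev


def crossing_x(line1, line2):
    """x-coordinate where y = a1*x + b1 meets y = a2*x + b2, or None if parallel."""
    (a1, b1), (a2, b2) = line1, line2
    if a1 == a2:
        return None
    return (b2 - b1) / (a1 - a2)


def convergence_summary(lines):
    """Mean and sample s.d. of all pairwise crossing x-coordinates."""
    xs = []
    for p, q in combinations(lines, 2):
        x = crossing_x(p, q)
        if x is not None:  # parallel lines contribute no crossing point
            xs.append(x)
    if len(xs) < 2:
        raise ValueError("need at least two crossing points")
    return mean(xs), stdev(xs)


# Hypothetical (slope, intercept) pairs for G content expressed by C content.
example_lines = [(1.0, 0.0), (2.0, -0.03), (0.5, 0.015)]
centre, spread = convergence_summary(example_lines)
```

The three invented lines all pass through C = 0.03, so the summary returns a mean of 0.03 with a spread of essentially zero; real regression lines would give a non-zero spread.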

DISCUSSION
This study used recent genomic data and knowledge of Chargaff's second parity rule to demonstrate common ancestry across species.
Although evolution by natural selection applies to all organelles, animal mitochondrial evolution seems to differ from both nuclear evolution and plant organelle evolution. Brown et al. previously reported the rapid evolution of animal mitochondrial DNA [26]. Animal mitochondria do not follow Chargaff's second parity rule, but this study revealed that they evolved from a common ancestor. We previously showed that plasmids (not compartmentalized from the nucleus) have codon frequencies that resemble those of the parent organism, although there is no evidence that plasmids pass nuclear genomic material across generations [27]. Thus, the compartmentalization of cellular organelles strongly influences the characteristics of organelle evolution.
Although deviations from Chargaff's second parity rule have been previously discussed [22,23], the results obtained here either demonstrate evolutionary phenomena or are caused by other confounding factors. In the present study, deviations from Chargaff's second parity rule in plant mitochondria depended on the genome size and disappeared in the larger genomes (Figure 2). Thus, differences in gene density between the cytosine-rich light and guanine-rich heavy strands affect Chargaff's second parity rule in the relatively small animal mitochondria, while they were cancelled out in the larger plant mitochondria. In fact, the ratios (C/G and T/A) were extremely close to 1 in the chloroplast DNA where genome sizes were more than 5 × 10^5 nucleotides; no exceptions were observed in the samples examined (unpublished data). This fact clearly shows that genome size is an important factor in Chargaff's second parity rule [22]. In the Treponema pallidum genome, although the gene density differs between the forward and reverse strands [28], this organism obeys Chargaff's second parity rule [21]. The nuclear genome of Ureaplasma urealyticum, which also obeys Chargaff's second parity rule, consists of 7.5 × 10^5 nucleotides [25]. This reflects the fact that plant mitochondrial genome sizes are much smaller than plant nuclear genomes.
Animal mitochondria did not obey Chargaff's second parity rule, even after classification into vertebrate, invertebrate I and invertebrate II mitochondrial genes. This suggests that nuclear, chloroplast and plant mitochondrial evolution is governed by the same rule, while animal mitochondrial evolution is governed by different rules.
The fact that evolution is expressed by linear formulas suggests that it proceeded linearly. The crossing of two regression lines suggests two distinct evolutionary processes, and a crossing point suggests either divergence or convergence at a single origin. The degree of difference between two evolutionary processes is expressed by the difference in linear regression slopes: small and large differences are expressed by acute and obtuse angles, respectively. A single evolutionary process is expressed by a single regression line. The appearance of many regression lines which have the same slope but different intercept values would indicate multiple evolutionary origins. A previous study found that regression lines representing nucleotide relationships in the coding region were almost identical in chromosomal DNA among bacteria, archaea and eukaryotes [20]. In our previous study [24], two regression lines representing homonucleotide contents in chloroplasts and plant mitochondria converged at the top of the cuneiform in both coding and non-coding regions. This suggests that chloroplasts and plant mitochondria diverged from the same origin. As research suggests that the former are derived from cyanobacteria [29] and the latter are derived from proteobacteria [30], both organelles are likely to be derived from the same origin. In addition, the formation of the cuneiform is obtained naturally in the comparison between coding and non-coding regions, because both fragments belong to the same strand [24].
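The angle between two regression lines can be obtained from their slopes. The short Python sketch below is an illustration only; the function name `angle_between` is hypothetical, and the angle is measured between the directions in which the two lines extend towards larger values on the horizontal axis, so that it ranges from 0 to just under 180 degrees.

```python
import math


def angle_between(slope1, slope2):
    """Angle in degrees between two lines, measured between their rightward directions."""
    return math.degrees(abs(math.atan(slope1) - math.atan(slope2)))


# Example with the idealized chloroplast slopes: the G line (slope 1)
# and the T or A line (slope -1) plotted against C content.
right_angle = angle_between(1.0, -1.0)

# Two lines with similar slopes meet at a small (acute) angle.
small_angle = angle_between(1.0, 0.9)
```

Identical slopes give an angle of zero, which corresponds to a single regression line or to parallel lines with different intercepts.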

CONCLUSIONS
When evolutionary direction is discovered, elucidating whether it occurs by divergence or convergence is not straightforward. In invertebrate mitochondria, as more recently evolved (and more advanced) vertebrates were located on the end of invertebrate I data, results indicated that invertebrate I and II evolution diverged from the opposite side of vertebrates. Nuclear, chloroplast and plant mitochondrial evolution is expressed by the same regression line based on Chargaff's second parity rule (Figure 6). In nuclei, chloroplasts and mitochondria from plants, amino acid compositions deduced from complete genome data were very similar, although they differed from animal mitochondria [24]. In the present study, regression lines based on plant chloroplasts, mitochondria and nuclei overlapped, while animal mitochondrial regression lines converged at the same single point. Finally, all extended regression lines representing chromosomes, chloroplasts, plant mitochondria, vertebrates and invertebrates I and II converged at the same point (Figure 6). Therefore, I conclude that there is one single origin of life from which all organisms derived. This is consistent with the chemical conditions during prebiotic evolution, in which primitive replicators such as ribosomes would have formed [31], and in which primitive life forms would have similar cellular amino acid compositions presumed from those of present organisms [32,33]. Thus all advanced forms of life, as deduced using genomic data in this study, descended from a single origin.

Figure 1. Nucleotide relationships in normalized values. Upper panel, chloroplast; lower panel, plant mitochondria. Blue diamonds, G; pink squares, C; red triangles, T; and green triangles, A. Each nucleotide was plotted against C content. The vertical axis represents the four nucleotide contents; the horizontal axis represents C content.

Figure 2. Ratios of nucleotide contents in plant mitochondrial genomes. The horizontal axis represents the number of total nucleotides and the vertical axis represents the ratios (G/C and A/T). Red squares, G/C; and blue diamonds, A/T.

Figure 3. Nucleotide relationships in animal mitochondria. Nucleotide contents were normalized, and G content was plotted against C content. Red squares represent C content against C content. The vertical axis represents G and C content and the horizontal axis represents C content.

Table 1. Regression lines based on chloroplasts. The numbers in parentheses represent the sample number examined. R represents the regression coefficient.
0.755 A + 0.409 G = -0.821A + 0.445 T = 0.576 A + 0.146 A

Table 4. Regression lines based on invertebrate I mitochondria. The numbers in parentheses represent the sample number examined. R represents the regression coefficient.

Figure 4. Nucleotide relationships in vertebrate mitochondria. Nucleotide contents were normalized and plotted against C content. The horizontal axis represents C content, and the vertical axis represents the four nucleotide contents. Pink square, C; blue diamond, G; green triangle, T; and red triangle, A.

Figure 5. Regression lines representing nucleotide alternations in various organelles. Upper panel, invertebrate I mitochondria; middle panel, invertebrate II mitochondria; and lower panel, invertebrate I plus vertebrate mitochondria. The vertical axis represents the four nucleotide contents and the horizontal axis represents C content. Blue diamond, G; pink square, C; green diamond, T; red triangle, A; dark red squares, chloroplasts; and large black square, vertebrates.

Figure 6. C content (horizontal axis) and G content (vertical axis) in nuclei and various organelles. Blue diamonds, invertebrate I and vertebrate mitochondria; pink diamonds, invertebrate II mitochondria; red squares, plant mitochondria; green triangles, chloroplasts; and black squares, nuclei.

Table 2. Regression lines based on plant mitochondria.

Table 3. Regression lines based on vertebrate mitochondria.