Haplogroup R1a as the Proto Indo-Europeans and the Legendary Aryans as Witnessed by the DNA of Their Current Descendants

This article aims at reconstructing the history of ancient R1a1 migrations between 20,000 and 3500 years before present (ybp). Four thousand four hundred sixty (4460) haplotypes of haplogroup R1a1 were considered in terms of base (ancestral) haplotypes of R1a1 populations and timespans to their common ancestors in the regions from South Siberia and northern/northwestern China in the east to the Hindustan and further west across the Iranian Plateau, Anatolia, Asia Minor and to the Balkans in Europe, including along the way Central Asia, South India, Nepal, Oman, the Middle East, the Comoros Islands, Egypt, etc. This study provides support for the theory that haplogroup R1a arose in Central Asia, apparently in South Siberia and/or neighboring regions, around 20,000 ybp. Not later than 12,000 ybp bearers of R1a1 were already in the Hindustan, then went across Anatolia and the rest of Asia Minor apparently between 10,000 and 9000 ybp, and around 9000-8000 ybp they arrived in the Balkans and spread over Europe east to the British Isles. On this migration route or before it, bearers of R1a1 (or the parent, upstream haplogroups) developed the Proto Indo-European language and carried it along during their journey to Europe. The earliest signs of the language left by bearers of R1a1 passing through Anatolia were picked up by linguists and dated at 9400-9600-10,100 ybp, which fairly coincides with the data of DNA genealogy described in this work. At the same time as bearers of the brother haplogroup R1b1a2 began to populate Europe after 4800 ybp, haplogroup R1a1 moved to the Russian Plain around 4800-4600 ybp. From there R1a1 migrated (or moved as military expeditions) to the south (Anatolia, Mitanni and the Arabian Peninsula), east (South Ural and then North India), and south-east (the Iranian Plateau) as the historic legendary Aryans. Haplotypes of their direct descendants are strikingly similar, up to 67 markers, to those of contemporary ethnic Russians of haplogroup R1a1. Dates of those Aryan movements from the Russian Plain in said directions are also strikingly similar, between 4200 and 3600 ybp.


Introduction
This study focuses on the origin of Indo-Europeans and the Aryans who entered India (the Hindustan), Iran (Iranian plateau), and Anatolia (Mesopotamia) approximately 3500 years ago.
The research findings, described in this study, demystify the origin of the Aryans. For nearly two centuries, the "Aryan problem" (essentially: Who were the Aryans? Where did they come from? Where did they disappear? Were they a particular human race, different from others?) has posed many challenges, often controversial and conflicted, for researchers, archeologists and linguists; however, this study opens new ground for our consideration and is based on the data provided by DNA genealogical test results.
The methodology of DNA genealogy, including considerations of extended 67 marker haplotypes, is described in detail in the preceding paper in this journal (Rozhanskii & Klyosov, 2011) and in the Materials and Methods section of this article. The 67 marker haplotypes were introduced to the scientific domain and personal usage several years ago, and available databases containing tens of thousands of 67 marker haplotypes are listed in (Rozhanskii & Klyosov, 2011) and in this paper (Appendix).
First, two 67 marker haplotypes of haplogroup R1a1 are presented, belonging to the two authors of this paper. Two Indian haplotypes contain 24 and 21 mutations with respect to the first haplotype (mutations are shown in bold, and rules of their counting are explained in the paper cited above), and 28 and 36 mutations with respect to the second one. This produces on average 27.25 ± 6.60 pairwise cross-mutations between all four haplotypes; that is, 27.25/.12 = 227 → 292 ± 55 conditional generations (25 years each) = 7300 ± 1400 years between two haplotypes on average, or 3650 ± 700 years to a common ancestor of all the four haplotypes (.12 here is the mutation rate constant for 67 marker haplotypes, and the arrow denotes the correction for back mutations; see the preceding paper cited above). According to all historical accounts, the Aryans arrived in India in the middle of the 2nd millennium BC, which is approximately 3500 years ago.
This simplified calculation is based on four haplotypes that belong to different subclades of the R1a1 haplogroup (Z280, M458 and L342.2). Of the four haplotypes, the first two are from the current Russian-Ukrainian (Indo-European) group, and the second two are from the Indian (Indo-European) group. Both groups are similar and belong to the same R1a1 haplogroup. Currently, up to 72% of the upper castes in India are bearers of the same R1a1 haplogroup (Sharma et al., 2009).
This simplified calculation is given here for illustrative purposes, though four 67 marker haplotypes contain as many as 268 markers, which is quite statistically informative in a first approximation. A much more detailed analysis of Indian and ethnic Russian extended series of R1a1 haplotypes is given in (Klyosov, 2009b, 2011b), and principally the same results were obtained with respect to patterns of mutations in haplotypes, migration routes, and their chronology.
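The simplified calculation described above can be retraced in a few lines. The sketch below is only an illustration: it uses the four pairwise mutation counts, the .12 rate constant and the 25-year conditional generation quoted in the text, while the variable names are hypothetical. The back-mutation correction (227 → 292) is not recomputed; the corrected value is copied from the text.

```python
from statistics import mean, stdev

# Pairwise mutation counts quoted in the text: each Indian haplotype
# against the first (24, 21) and the second (28, 36) author haplotype.
pairwise_mutations = [24, 21, 28, 36]

RATE_67 = 0.12             # mutations per 67 marker haplotype per generation (from the text)
YEARS_PER_GENERATION = 25  # length of a "conditional generation" (from the text)

average = mean(pairwise_mutations)   # 27.25
spread = stdev(pairwise_mutations)   # sample standard deviation, 6.5 (the text quotes 6.60)

# Uncorrected number of conditional generations separating two haplotypes.
uncorrected_generations = average / RATE_67

# ASSUMPTION: the correction for back mutations is taken from the text as is.
CORRECTED_GENERATIONS_FROM_TEXT = 292
years_between = CORRECTED_GENERATIONS_FROM_TEXT * YEARS_PER_GENERATION
years_to_common_ancestor = years_between / 2

print(average, round(uncorrected_generations), years_between, years_to_common_ancestor)
```

The halving in the last step reflects that the mutations between two haplotypes accumulated along both lineages since their common ancestor.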
This brings closure to the question of the DNA-related origin of the Aryans who entered India during the middle of the 2nd millennium BC. They belonged to the R1a1 haplogroup, which is the prevalent one in present-day Eastern Europe (Russia, Poland, Ukraine, Belarus; in the first one up to 62% of the total male population, in the latter three up to 55% of the total male population [Klyosov, 2009b, 2011b and references therein]).
There is merit in comparing the Indian haplotypes with the R1b1a2 haplotypes, a group that populates ~60% of Europe, living primarily in the British Isles, Spain, France, Belgium, Germany, the Netherlands, and other Central and Western European countries. The typical ancestral haplotype in the R1b1a2 haplogroup, dated about 4,800 years before present, is given in (Klyosov, 2011a). There are 48 and 44 mutations between that haplotype and the Indian R1a1 haplotypes considered earlier. This formally places their common ancestor at more than 10,000 years before present and, in fact, much earlier, at least 15,000 years ago. R1b1a2 bearers were not among the Aryans coming to India, and it is very likely that they were not Indo-Europeans then. Specifically, there is no supporting evidence that 4000 years before present (ybp) bearers of R1b1a2 spoke Indo-European (IE) languages. On the contrary, Central Europe was likely populated by R1b1a2 speakers of non-IE languages. Moreover, there are very few bearers of the R1b haplogroup in India, mostly on its Arabian Sea coast, and there were none of the R1b haplogroup among the 367 tested Indian Brahmins (Sharma et al., 2009). Therefore, it is highly unlikely that bearers of the R1b1 (as well as R1b1a2) haplogroup were among the Aryans, and, hence, they were not among those carrying the Indo-European languages elsewhere in those times.
As described in (Klyosov & Rozhanskii, 2011), Europeoids (Caucasoids) appeared ~58,000 ybp. They gradually branched into downstream haplogroups and migrated to the west, south and east. Haplogroup NOP, which was among them, arose ~48,000 ybp, and moved eastward, presumably towards South Siberia and/or adjacent regions. Haplogroup P arose ~38,000 ybp, apparently in South Siberia, and gave rise to haplogroup R and then R1 ~30,000-26,000 ybp (see the diagram in Klyosov & Rozhanskii, 2011). The timing of haplogroup R1a's appearance can be reconstructed from series of R1a haplotypes, made available from the databases (see the list in Materials and Methods and the Appendix). The most ancient common ancestors of this haplogroup lived in northern and northwestern China (in particular, the Xinjiang region, which is the south Altai area), in southern Siberia, in the Eastern Himalayas, India and Pakistan, the Comoros Islands, and in Europe, where their bearers apparently migrated from the east both in the remote past and later, for example, with the Scythians.

Northern China R1a Haplotypes
Apparently, the most ancient source of R1a1 haplotypes is provided by the people now living in northern China. It was shown (Bittles et al., 2007) that for a number of Chinese populations, such as Hui, Bonan, Dongxiang, Salars, the percentage of R1a1 haplotypes reached 18%-32%. Their haplotypes were not provided in the paper, but the author, Professor Alan H. Bittles, kindly sent us a list of 31 five-marker haplotypes typed as R1a1, the tree of which is shown in Figure 1. The haplotypes vary tremendously in their alleles, which already indicates that their common ancestor lived in ancient times. For example, values of DYS19 varied between 14 and 17, DYS388 between 12 and 14, and DYS393 between 10 and 13. It should be noted that mutations in the last two markers occur on average once in 4,545 and 1,320 generations, respectively. With a correction for back mutations (Klyosov, 2009a) they occur once in 8500 and 2400 generations. The 31 haplotypes contain 99 mutations from the deduced 5 marker base haplotype, shown here in the 12 marker FTDNA format with missing alleles indicated:

13 X 14 X X X X 12 X 13 X 30

This extent of mutations, which can be presented as 99/31/5 = .639 ± .064 mutations per marker, is a very high value. Actually, it is a measure of how ancient a common ancestor might be (for a comparison, contemporary European R1a1 and R1b1a2 haplotypes are separated by .250-.270 mutations per marker from their common ancestors, see Klyosov, 2011a, 2011b). It can also be presented as 99/31/.00677 = 472 → 683 conditional generations; that is, it is 17,100 ± 2,400 years to the common ancestor of these 31 haplotypes (explanations and examples of calculations are given in Materials and Methods). The value of .00677 mutations per haplotype per conditional generation is the mutation rate constant for the 5 marker haplotypes (Klyosov, 2009a).
Since these haplotypes descend from such an ancient common ancestor and contain numerous mutations, their deduced base (ancestral) haplotype is rather uncertain. Therefore, the quadratic permutation method was employed for the same set of haplotypes (Klyosov, 2009a). This method does not require either a base haplotype or a correction for back mutations. The obtained timespan is 19,625 ± 2,800 years to a common ancestor (see Materials and Methods for calculations). This result agrees, within the margin of error, with that calculated by the above linear method.
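The first two steps of the linear calculation for the 31 northern Chinese haplotypes can be sketched as follows. This is a minimal illustration rather than the procedure of Materials and Methods: the function name is hypothetical, and the step that corrects for back mutations (472 → 683) is not reproduced, so the corrected value is taken from the text.

```python
def linear_estimate(mutations, haplotypes, markers, rate_per_haplotype):
    """Return (mutations per marker, uncorrected conditional generations)."""
    per_haplotype = mutations / haplotypes
    per_marker = per_haplotype / markers
    generations = per_haplotype / rate_per_haplotype
    return per_marker, generations

# 99 mutations in 31 five-marker haplotypes; .00677 is the rate constant from the text.
per_marker, generations = linear_estimate(99, 31, 5, 0.00677)

# ASSUMPTION: the back-mutation-corrected generation count is copied from the text.
CORRECTED_GENERATIONS_FROM_TEXT = 683
years = CORRECTED_GENERATIONS_FROM_TEXT * 25   # 17,075, which the text rounds to 17,100

print(round(per_marker, 3), round(generations), years)
```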
Therefore, haplogroup R1a arose approximately 20,000 ybp in a territory that geographically belongs to Central Asia.

R1a1 Haplotypes from Altay
Thirteen Altay R1a1 haplotypes were listed in (Underhill et al., 2009), 12 of which showed a rather recent base haplotype (the last marker is DYS461):

13 26 16 11 X X X 12 11 14 11 31-10

These 12 haplotypes have only 7 mutations per 120 markers from the above base haplotype, which gives 7/120/.0018 = 32 → 33 generations; that is, 825 ± 320 years to a common ancestor. The same set of the Altayan haplotypes in a different format is given in (Järve et al., 2009), with the base haplotype (not listed in the cited paper)

13 26 16 11 11 17 X X 11 14 11 31

and the same 7 mutations per 120 markers. Therefore, this corresponds exactly to the same timespan to the common ancestor as given above. The same set of the Altayan haplotypes was given in (Järve et al., 2009) in a significantly more extended format, in which the second panel of the base haplotype represents DYS 458, 437, 448, GATAH4, 456, 438, 594, 411S1 (two alleles), 596, 643, 645, 635, YPenta1, YPenta2. All 12 haplotypes in the extended format collectively contain the same 7 mutations. In other words, the added 15 "slow" markers did not produce mutations, which is an indicator of a quite recent common ancestor of the haplotype dataset. However, this base haplotype differs by 6 mutations from the base haplotype of the Russian Plain in the same format (a more extended 67 marker base haplotype of the Russian Plain is shown below). These 6 and 12 mutations exactly fit the difference between the respective mutation rate constants for the two haplotype formats, equal to .020 and .0404 mutations per haplotype per generation, respectively (see Materials and Methods). These mutation differences place a common ancestor of the Altayan and the Russian Plain haplotypes at 8100 ybp.
What emerges from the analysis of the data is that the Altayan and the Tuva haplotypes apparently have the same ancient R1a1 common ancestor, who lived 10,000-10,400 ybp. However, the surviving DNA lineages, which "surfaced" only recently, particularly in Tuva, are different in Tuva and in Altay, though all coalesce to said ancient common ancestor.

R1a Haplotypes in the Eastern Himalayas
Five R1a1 haplotypes were listed in (Kang et al., 2011), which showed a rather recent base haplotype (in the format used, the last two markers are DYS437 and DYS438). These 5 haplotypes have only 4 mutations from that base haplotype, which gives 4/5/.0215 = 37 → 38 generations; that is, 950 ± 480 years to a common ancestor. However, the base haplotype has alleles that are very unusual for the R1a haplogroup, DYS426 = 13 and DYS388 = 14, and differs by 5 mutations from the Russian Plain base haplotype. This places their common ancestor at 6650 ybp. This is clearly a separate branch of ancient R1a haplotypes in the Eastern Himalayas.

R1a Haplotypes in India and Pakistan
There are two principal sources of haplotypes of haplogroup R1a in the Hindustan. One was brought by the Aryans in the middle of the 2nd millennium BC, as described above and supported below with more extended series of Indian haplotypes. The timespan to the most recent common ancestor of these haplotypes varies between 4000 and 4600 ybp, and is often around 4050 ybp, depending on the particular haplotype dataset. The base (ancestral) haplotype of the Aryan (Indo-European) haplotypes in its 12 marker format is

13 25 16 10 11 14 12 12 10 13 11 30

This haplotype is nearly identical to the Russian Plain base haplotype (see below), except that the latter came from a common ancestor who lived between 4,600 and 5,000 ybp, as determined using different haplotype datasets (Klyosov, 2009a; Klyosov, 2011b).
A more ancient source is presumably the South Siberian and/or Central Asian haplotypes brought to the Hindustan during the westward migrations of R1a bearers between 20,000 and 10,000 ybp. Some studies alleged that the most ancient common ancestors of R1a haplotypes were Indian; however, the results were flawed by erroneous calculations of timespans using incorrect "population mutation rates" (see their description and discussion in Klyosov, 2009a, 2009c, and references therein), which routinely converted the actual 3600-4000 ybp ("Indo-European" R1a1 in India) into 12,000-15,000 ybp. This was erroneously claimed as the proof of the "origin of R1a in India." Furthermore, high percentages of R1a in some regions in India or in some ethnic and/or religious groups (such as Brahmins) were incorrectly claimed as the proof of the origin of R1a in India (Kivisild et al., 2003; Sengupta et al., 2006; Sahoo et al., 2006; Sharma et al., 2009; Thanseem et al., 2006; Fornarino et al., 2009). The application of the flawed approach resulted in confusion amongst researchers in the field of human population genetics over the last decade. The course of research is hopefully corrected by the application of the most recent developments of DNA genealogy, which utilizes a principally different methodology (Klyosov, 2009a, 2009b, 2009c; Rozhanskii & Klyosov, 2011; Klyosov, 2011b).
Forty-six 6 marker R1a1 haplotypes of three different tribal populations of Andhra Pradesh, South India (tribes Naikpod, Andh, and Pardhan), listed in (Thanseem et al., 2006) and shown in the haplotype tree in Figure 2, contain 126 mutations; that is, .457 ± .041 mutations per marker (the mutation rate constant equals .0123 mutations per haplotype per generation, Klyosov, 2009a). This gives 7200 ± 960 years to a common ancestor (see Materials and Methods for calculations). The base (ancestral) haplotype of those south Indian populations in the FTDNA format is as follows:

13 25 17 9 X X X X X 14 X 32

This differs from the north-Indian "Indo-European" haplotype (see above) by four mutations on six markers, which places their most recent common ancestor at approximately 11,600 ybp (see Materials and Methods for calculations).
The ancient north China R1a1 base haplotype (see above) differs from the Andhra Pradesh R1a1 base haplotype by at least 5 mutations on the 5 available markers, which places their common ancestor at approximately 22,000 ybp. Within the margin of error, it can be deduced that this is the same common ancestor as that of the north China haplotypes. This mutational difference neatly fits the chronology and direction of the migration, which continues from the ancient (non Indo-European) Indian haplotypes to the Indo-European Indian haplotypes, with their common ancestor (non-IE and IE) having lived approximately 11,600 ybp. Also, these data dovetail with the timing of the follow-up migration of R1a1 bearers from the Hindustan via Asia Minor (with a detection of the proto Indo-European language in Anatolia with an estimated divergence time of 9400-9600-10,100 ybp, see Gray & Atkinson, 2003; Renfrew, 2000; Gamkrelidze & Ivanov, 1995) to Europe (with the arrival 10,000-8000 ybp, see below) and then to the Russian Plain (5000-4800 ybp, see below).
The analysis of these data and findings essentially unites most, if not all, concepts of the "origin of Indo-European language" which have, at various times, placed the "origin" anywhere from India to Iran, Anatolia, the Balkans, and the Russian Steppes (Gimbutas, 1973, 1994; Mallory, 1989; Dixon, 1997; Anthony, 2007), except that these places were related not to the "origin," but to the areas passed by the R1a1 migration.
Population geneticists typically mix DNA lineages and branches in their analysis, whereby "phantom common ancestors" emerge. This is exemplified with 110 10-marker R1a1 haplotypes of various Indian populations, both tribal and Dravidian and Indo-European castes, listed in (Sengupta et al., 2006). The resulting mixed haplotype tree is shown in Figure 3. It contains 344 mutations, which is .313 mutations per marker, and results in a "phantom" 5275 years to a "common ancestor," which falls between the 7180 ± 960 years for non-IE and 4050 ± 500 years for IE Indian haplotypes shown above.

Figure 3. The 10 marker haplotype tree for R1a1 haplotypes in India (mixed population, including tribes and castes). The 110-haplotype tree was composed from data listed in (Sengupta et al., 2006). The article contains 114 Indian R1a1 haplotypes; however, four of them were incomplete.

For a comparison, consider the Pakistani R1a1 haplotypes listed in the Sengupta et al. (2006) paper (Figure 4). Forty-two haplotypes contain 166 mutations, which gives .395 ± .031 mutations per marker, and 6800 ± 860 years to a common ancestor. This value agrees within the margin of error with the "south-Indian" 7200 ± 960 ybp; however, the base (ancestral) haplotypes differ significantly. The base Pakistani haplotype is as follows (in the FTDNA format plus DYS461):

13 25 17 11 X X X 12 10 13 11 30-9

It differs from the south Indian "non-IE" and the north Indian "IE" base haplotypes by four and two mutations on six markers, respectively. This places a common ancestor of the Pakistani and the south Indian "non-IE" R1a1 populations at approximately 12,980 ybp, which agrees within the margin of error with the 11,600 ybp reported above as the migration time through the Hindustan westward. The two mutations place a common ancestor of the Pakistani and the "Indo-European" Indian populations more recently, at 7800 ybp. This chronological trend might also point in the direction of the ancient migration of R1a1 westward.
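The mutation differences between the base haplotypes quoted in this section can be recounted with a short script. It is a sketch under two assumptions that are not stated in this section: a mutational difference is counted as the sum of absolute allele differences over the markers typed in both haplotypes, and DYS389II is compared only after subtracting DYS389I. The function names are hypothetical; the haplotype strings are copied from the text.

```python
# Positions (0-based) of DYS389I and DYS389II in the 12 marker FTDNA order.
I389I, I389II = 9, 11

def parse(haplotype):
    """Split a 12 marker string into integers; 'X' (untyped) becomes None."""
    alleles = [None if a == "X" else int(a) for a in haplotype.split()]
    # ASSUMPTION: DYS389II includes DYS389I, so only the remainder is compared.
    if alleles[I389I] is not None and alleles[I389II] is not None:
        alleles[I389II] -= alleles[I389I]
    else:
        alleles[I389II] = None
    return alleles

def mutation_distance(h1, h2):
    """Return (summed allele step differences, number of shared markers)."""
    shared = [(a, b) for a, b in zip(parse(h1), parse(h2))
              if a is not None and b is not None]
    return sum(abs(a - b) for a, b in shared), len(shared)

SOUTH_INDIAN = "13 25 17 9 X X X X X 14 X 32"
NORTH_INDIAN_IE = "13 25 16 10 11 14 12 12 10 13 11 30"
PAKISTANI = "13 25 17 11 X X X 12 10 13 11 30"   # trailing DYS461 allele omitted

print(mutation_distance(PAKISTANI, SOUTH_INDIAN))      # four mutations in the text
print(mutation_distance(PAKISTANI, NORTH_INDIAN_IE))   # two mutations in the text
print(mutation_distance(SOUTH_INDIAN, NORTH_INDIAN_IE))
```

Under these assumptions the counts agree with the four, two and four mutations reported in the text; the Pakistani and north Indian haplotypes share nine typed markers rather than six, and the three additional markers carry no differences.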
A more detailed consideration of the Pakistani R1a1 haplotypes, including separate calculations of each of the four branches in Figure 4, results in a timespan of 8650 years to a common ancestor for all of these branches (Klyosov, 2010a).In all, it does not change the principal conclusions of this section.

R1a1 Haplotypes in Central Asia
Ten 10-marker Central Asian haplotypes were listed in (Sengupta et al., 2006). They contain 27 mutations from the base haplotype

13 25 16 11 X X X 12 10 13 11 31-9

which gives .270 ± .052 mutations per marker, and 4300 ± 940 years to a common ancestor. It is the same value that we have found for the Russian Plain and "Indo-European" Indian base R1a1 haplotypes.
Both the Central Asian base haplotype and the dating of a common ancestor described above are supported by the latest data on extended 67 marker haplotypes that were collected in November 2011 in the R1a1 FTDNA Project, from which a 67 marker Central Asian base haplotype was obtained. The result is compelling and provides an exact fit with the expected migration pattern of the R1a1 haplogroup from the Russian Plain (~4600-4400 ybp) to Central Asia (3650 ± 590 ybp) on their way to the South Urals and to the Hindustan.
What these findings suggest is that there are two different subsets of the Indian R1a1 haplotypes. One was brought by European bearers known as the Aryans, seemingly on their way from the Russian Plain through Central Asia in the middle of the 2nd millennium BC. The other was much more ancient and migrated from South Siberia/northern China to India 12,000 years ago. This migratory wave continued through the Iranian Plateau westward (via Anatolia and the rest of Asia Minor), to the Balkans and then further into the European continent.
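The margins of error quoted in this and the preceding sections appear to follow a simple pattern, which the sketch below reproduces. This is an inference from the reported numbers, not a rule stated here: the relative error of the mutation count is taken as 1/sqrt(N) for N mutations, and it is combined in quadrature with an assumed 10% uncertainty of the mutation rate constant. Function and constant names are hypothetical.

```python
from math import sqrt

RATE_UNCERTAINTY = 0.10  # ASSUMPTION: 10% relative uncertainty of the rate constant

def per_marker_error(per_marker, mutations):
    """Statistical error of the mutations-per-marker value."""
    return per_marker / sqrt(mutations)

def margin_of_error(years, mutations):
    """Error of a timespan, combining counting error and rate uncertainty."""
    relative = sqrt(1.0 / mutations + RATE_UNCERTAINTY ** 2)
    return years * relative

# Central Asia: 27 mutations, .270 mutations per marker, 4300 years in the text.
print(round(per_marker_error(0.270, 27), 3))   # text: .052
print(round(margin_of_error(4300, 27)))        # text: 940
# South India (126 mutations) and Pakistan (166 mutations).
print(round(margin_of_error(7200, 126)))       # text: 960
print(round(margin_of_error(6800, 166)))       # text: 860
```

The computed values fall within a few years of the quoted margins, which is consistent with rounding in the text.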

R1a1 Haplotypes of the Comoros Islands
Fifteen R1a haplotypes have been found among 381 tested men on the Islands. Three of them were R1a*-SRY10831a, and twelve were R1a1 (Msaidie et al., 2011). The cited study did not generate any chronological estimates based on the haplotypes, and considered only 8 marker haplotypes (for a typical "population genetics" analysis without separation of haplogroups), although it had in fact determined 17 marker haplotypes.
The base haplotype for said 12 R1a1 haplotypes is as follows:

13 24/25 15 11 12 14 X X 10 13 11 18-16/17 14 19 12 15 11 23

Together they have 104 mutations; that is, .51 ± .05 mutations per marker. This high value points to a significantly more ancient common ancestor compared with that in the Russian Plain, Central Asia and the Indo-European Indian R1a1 populations (.28, .27, .24 mutations per marker, respectively). Furthermore, it is more ancient compared with the old south Indian and Pakistani R1a1 populations (.457 and .395 mutations per marker, respectively, see above). Indeed, a common ancestor of the Comoros R1a1 haplotypes lived .51/.002 = 255 → 340 conditional generations ago; that is, 8500 ± 1190 ybp.
For a comparison, the Russian Plain base haplotype in the same format is as follows:

13 25 16 11 11 14 X X 10 13 11 17-15 14 20 12 16 11 23

It differs by as many as 7 mutations from the Comoros base haplotype. This places their common ancestor at 9900 ybp. It is reasonable to suggest that this common ancestor was one of those R1a1 who were moving westward along the Iranian plateau and Asia Minor almost 10,000 ybp. Indeed, the dating around 9900 ybp is rather typical for archaeological settlements in Asia Minor with known dates of 10,200, 9900 and 9000 ybp (Myres et al., 2010). It is not necessarily true that the bearers of R1a1 were in the Comoros Islands 9900 ybp, since it is known that Persian traders had expanded their maritime routes to Madagascar by 700-900 AD (Msaidie et al., 2011).
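The Comoros figures can be retraced as follows. This is an illustrative sketch that assumes a rate of .002 mutations per marker per generation for the 17 marker format, the value that reproduces the quoted 255 generations; the corrected value of 340 generations is copied from the text, not recomputed.

```python
mutations, haplotypes, markers = 104, 12, 17

per_marker = mutations / haplotypes / markers   # about .51 mutations per marker

RATE_PER_MARKER = 0.002   # ASSUMPTION: per-marker rate that yields the quoted 255
uncorrected_generations = per_marker / RATE_PER_MARKER

# ASSUMPTION: the back-mutation-corrected generation count is taken from the text.
CORRECTED_GENERATIONS_FROM_TEXT = 340
years = CORRECTED_GENERATIONS_FROM_TEXT * 25

print(round(per_marker, 2), round(uncorrected_generations), years)
```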

R1a1 Haplotypes in the Arabian Peninsula
Sixteen R1a1 10 marker haplotypes from Qatar and the United Arab Emirates were published (Cadenas et al., 2008). They split into two branches, and their base haplotypes

13 25 15 11 11 14 X Y 10 13 11 30
13 25 16 11 11 14 X Y 10 13 11 31

differed by only one mutation. Their common ancestor lived 3750 ± 825 ybp. Since a common ancestor of R1a1 haplotypes in Armenia and Anatolia lived 4500 ± 1040 and 3700 ± 550 ybp, respectively (Klyosov, 2008), the three dates do not conflict with each other. They were not part of the ancient migrations of 12-9 thousand ybp, but they most likely resulted from the military expeditions of the (Aryan) R1a1 from the Russian Plain southward through Anatolia, Mitanni, and to the Middle East and the Arabian Peninsula around 4000-3600 ybp. As mentioned earlier, today there are between 3% and 9% of R1a1 in those regions, among them members of famous tribes such as Quraish/Quraysh (Muhammad, the founder of the religion of Islam, was born into the Quraysh tribe), Al Tamimi (Banu Tamim) and others.
Much more reliable data are obtained with extended 67 marker haplotypes from the Arabic FTDNA project. Twenty-seven haplotypes from Qatar, Kuwait, Saudi Arabia, UAE, Oman, Bahrain and Syria form a separate branch on the haplotype tree and result in the following base haplotype:

13 25 16 11 11 14 12 12 10 13 11 30-15 9 10 11 11 24 14 20 32 12 15 15 16-11 11 19 23 15 16 18 19 35 38 13 11-11 8 17 17 8 12 10 8 11 10 12 22 22 15 10 12 12 13 8 14 23 21 12 12 11 13 11 11 12 13

These haplotypes contain 499 mutations, and 499/27/.12 = 154 → 182 generations; that is, 4550 ± 500 years from a common ancestor of the Arabic haplotypes, practically the same as that for the Russian Plain R1a1 common ancestor (see above). The two base haplotypes differ by only 1.4 mutations in all the 67 markers; that is, 1.4/.12 = 12 generations apart, a ~300 year difference between the Russian Plain R1a1 common ancestor and the Arabic haplotypes' common ancestor. The exception is that the Arabic haplotypes are typically coupled with the downstream L342 SNP mutation. The difference places their common ancestor at ~4825 ybp, which is the Russian Plain base (ancestral) haplotype. This is the same Aryan haplotype that was brought ~4500 ybp from the Russian Plain in a star-like manner to India, Iran, Anatolia, and the Arabian Peninsula, to arrive there a thousand years later, in the middle of the 2nd millennium BC.
Recent developments in the phylogeny of R1a1 haplotypes coupled with the DNA genealogy analysis have shown that the migrations of R1a1 from the Russian Plain in the described star-like manner were accompanied by the R1a1-L342 subclade (around 4400 ybp) and then its downstream L657 subclade. The L342 subclade is almost absent on the Russian Plain, and it appears in the Bashkir population in the east, in Kazakhstan (L342 → L657) in the south-east, in India (L342 → L657), and in the Middle East (including the Arabian Peninsula, L342 → L657). It shows the primary directions of the Aryan (R1a1) migrations after ~4800 ybp.
The Arabian R1a1-L657 haplotypes, along with all known Iranian, Indian and Kazakh L657 haplotypes, share a common L657 base haplotype, whose common ancestor lived 3000 ± 400 ybp. This base haplotype differs by 9.85 mutations from the Russian Plain base haplotype (some mutations are fractional ones), which places the common ancestor of R1a1-L657 and the Russian Plain R1a1 at 5000 ± 600 ybp.
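The step from a mutational difference between two base haplotypes to the age of their common ancestor can be sketched as below. The averaging formula is an inference that reproduces the quoted ~4825 ybp, not a formula given in this section; it assumes that the age of the Russian Plain common ancestor is taken as 4800 ybp (the upper end of the 4600-4800 ybp range quoted in this article) and that no back-mutation correction is needed for so few generations. Function names are hypothetical.

```python
def years_between(mutation_difference, rate_per_haplotype, years_per_generation=25):
    """Time gap implied by the mutations separating two base haplotypes."""
    generations = round(mutation_difference / rate_per_haplotype)
    return generations * years_per_generation

def common_ancestor_age(gap_years, age_a, age_b):
    """ASSUMPTION: the gap and both lineage ages are averaged over the two lineages."""
    return (gap_years + age_a + age_b) / 2

gap = years_between(1.4, 0.12)   # 12 generations, ~300 years
ARABIC_AGE = 4550                # ybp, from the text
RUSSIAN_PLAIN_AGE = 4800         # ybp, ASSUMPTION (upper end of the quoted range)

print(gap, common_ancestor_age(gap, ARABIC_AGE, RUSSIAN_PLAIN_AGE))
```

The intuition behind the averaging is that the path from one present-day population through the common ancestor to the other has a total length equal to the sum of the three timespans, and the common ancestor sits at its midpoint.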

R1a1-L342 Bashkir and Szekely/Seklers (Hungarian) Haplotypes
As noted in the preceding section, migrations of the ancient Aryans eastward (and in some cases westward, as illustrated below with the Szekely L342 R1a1 haplotypes) have resulted in the appearance of the downstream R1a1 subclades, such as L342, among the Bashkirs. The common ancestor of the respective L342 base haplotype lived only 1300 ± 250 ybp; however, the base haplotype differs by as many as 14 mutations from the Russian Plain base haplotype. This places their common ancestor, for the Bashkirs and the Russian Plain, at 4700 ± 500 ybp. This is again the Aryan R1a1 common ancestor on the Russian Plain.
There is quite a distant L342 lineage among descendants of Hungarian Szekely servicemen, recorded in the first 1602 military census.The lineage is only 675 ± 260 years "old".However, its base haplotype

R1a Haplotypes along the Ancient Migration Path from South Siberia to Europe
Three principal studies have been published recently that contain hundreds of R1a1 haplotypes from all over the world (Underhill et al., 2009; Zhong et al., 2010; Shou et al., 2010). Analysis of those haplotypes and the chronology of their common ancestors have not been undertaken by the authors of these studies. Figures 5-7 show general views of R1a1 haplotype trees that were calculated from the data. The purpose of including pictures of these trees was not to analyze their fine structure in detail (Klyosov, 2010a, 2010b), but to demonstrate their complex multi-branch structure and, hence, ancient origins. For example, relatively young trees (young "age" of their common ancestor) are often rather symmetrical and relatively uniform, such as the Russian Plain R1a1 haplotype tree with a common ancestor 4600 ybp (Figure 8).
Analysis of R1a1 haplotypes and their branches on the trees in Figures 5-7 shows that their ancient common ancestors lived in south Siberia and Altay (belonging to both south Siberia and Central Asia). Their ancient descendants carried the R1a1 haplogroup while migrating from North and North-Western China, across Tibet and Hindustan, and then along the Iranian Plateau, through Asia Minor and finally into Europe. Some remnants of ancient R1a1 were left in Cambodia, Nepal, Oman, Israel, Iraq, Egypt, Crete, the Caucasus, Russia, Estonia (the respective haplotypes are recovered from data published in Underhill et al., 2009; Zhong et al., 2010; Shou et al., 2010). The dynamics of mutations in these haplotypes differ significantly from those in the contemporary European R1a1, except one ancient and distinct lineage of R1a1 in Europe (see below). Their common ancestors, as thus reconstructed, lived from 20,000 ybp in south Siberia/northern China through 12,000-11,000 ybp in Hindustan and 6900 ybp among the Uyghurs in north-western China.
Typically, ancient common ancestors are recognized by the distinct DYS392 = 13, unlike the typical DYS392 = 11 in most European (and other) R1a1 haplotypes. The study by Underhill et al. (2009) listed four Egyptian R1a1 haplotypes, two ...

The very top of the tree (Figure 6) contains 18 base haplotypes, which are identical to each other, and expressed in the 9 marker format as follows:

13 25 16 11 X X X 12 10

In this particular case these identical haplotypes come from Russia, Turkey, Ukraine, Slovakia, Iran, Nepal, India, and Hungary. The short haplotype format does not allow them to be resolved any further, but with the available 9 markers this base haplotype is an exact (albeit partial) reproduction of the base haplotype of the Russian Plain. Furthermore, the tree in Figure 8 produces exactly the same 67 marker base haplotype of the Russian Plain. The whole tree contains 148 haplotypes with 2748 mutations from the base haplotype. It produces 2748/148/67 = .277 ± .005 mutations per marker, and 2748/148/.12 = 155 → 183 generations; that is, 4575 ± 470 years to a common ancestor of the Russian Plain base haplotype.
The whole pattern of ancient migrations of R1a1 haplotypes shows that after they had arrived in Europe via Asia Minor, as described above, between 11,000 and 8000 ybp (see below), they moved to the Russian Plain in the beginning of the 3rd millennium BC. It coincided time-wise with the arrival of bearers of R1b1a2 haplotypes in Europe. From there R1a1 split into three principal streams. One stream migrated south, over the Caucasus to Anatolia, the Middle East and the Arabian Peninsula. The second stream went eastward to South Ural, the Andronovo and Sintashta archaeological cultures in the 2nd millennium BC, between 4000 and 3000 ybp, and then split into two migration paths. One went south to India as the legendary Aryans, another went further east to Altay and Northern China. This closed the loop of the ancient migrations of R1a1. The third stream went south-east to the mountainous terrain of Middle Asia ~4000 ybp, and some 500 years later moved to Iran as what are otherwise known as the "Avesta Aryans."

Legendary Aryans in India
The R1a1 FTDNA Project in November 2011 contained [...] Indian haplotypes. Their base haplotype in the 25 marker format 13 2[...] 32 12 15 15 16 contained only 1.4 mutations [...] Russian Plain base haplotype (see above). This translates into 1.4/.046 = 30 → 31 generations, or ~775 years between their common ancestors. In terms of time, this is a close distance between the Russian Plain and the Indian base haplotypes, and it fits with the timespans for the Russian Plain R1a1 common ancestor (4600-4800 ybp) and the Indian common ancestor (~4050 ybp), determined independently. This is the historical Aryan base haplotype.
Figure 9 shows that the tree contains five principal branches. Their base haplotypes in the 25 marker format are as follows (clockwise from the top): [...] Indian "Indo-European", the Aryan R1a1 base haplotype. All five base haplotypes differ collectively by 12 mutations from their ancestral (see above) base haplotype, which translates to 4050 ± 500 years to their ancestral haplotype.
It should be noted that datasets of Indian R1a1 haplotypes are difficult to analyze, because they typically represent a superposition of haplotypes from various sources, including those from the ancient (pre-Indo-Aryan) ancestors, from the Russian Plain, Central Asia, the Middle East, etc. Since they are all present in various amounts and proportions, only analysis of their haplotype trees can give meaningful results.

Other Scattered Ancient R1a1 Haplotypes in Asia
A "patchy" pattern of R1a1 was created by the territori[...]ending of ancestors from the very ancient R1a1 (from more than 10,000-15,000 ybp) to the rather recent, the Aryan migrations. The tree in Figure 5 presents some haplotypes from Nepal, which differ by 5 mutations from the Russian Plain base haplotype, pointing to a common ancestor of 7200 ybp. Some Indian haplotypes show a 7-mutation difference from the Russian Plain haplotype, with a common ancestor of 10,200 ybp. A Cambodian haplotype differs by 9 mutations, which places a common ancestor of the Russian Plain base haplotype and the Cambodian haplotype at 14,000 ybp. Some haplotypes from Pakistan, Iran, Oman and Arab Emirates show 5-6 mutations, pointing to a common ancestor of 7000-9000 ybp.
A group of ethnic minorities from north-western China (Tu, Xibe, Tatars, Uyghurs, Yugurs, Salars, Bonan and others) typically have their collective R1a1 common ancestor of 6900 ybp (Klyosov, 2010b). All of them reportedly have the following base haplotype, obtained from the tree in Figure 7: 13 25 16 11 X X X 12 X 14 12. The value of DYS392 = 12 is a su[...]ual one for R1a1, including those from Central Asia. However, almost all Asian haplotypes in (Shou et al., 2010) are reported as having this "12" allele. The difference of 2.85 mutations from the base Russian Plain or the IE Indian base haplotype (if DYS392 = 12 is correct) or 1.85 mutations (if DYS392 = 11 is correct) places a common ancestor of the north-western Central Asian R1a1 haplotypes and the Russian/Indian haplotypes at either 9350 ybp or 7925 ybp, respectively. In any case they are significantly more ancient compared with the majority of European haplotypes.
Lastly, there is an additional comp[...]ation, and it regards the Central European branch (Rozhanskii & Klyosov, 2009) [...] Herein the difference is 9, 15, 25 and 34 mutations, respectively. Their common ancestor lived ~7700 ybp.
A series of 67 haplotypes of haplogroup R1a1 from the Balkans was published (Barac et al., 2003a,b; Pericic et al., 2005). In print, they were presented in the 9 marker format, and the respective haplotype tree is shown in Figure 10. Most of the tree contains typical European haplotypes with a common ancestor of ~4500 ybp. However, the left branch is distinctive since it contains R1a1 haplotypes with DYS392 = 13, such as
12 24 16 10 12 15 X X X 13 13 29
12 24 15 11 12 15 X X X 13 13 29
13 24 14 11 11 11 X X X 13 13 29
This branch was calculated using both the linear and the quadratic permutation methods. It gave .598 ± .071 mutations on average per marker, which resulted in 11,425 ± 1780 and 11,650 ± 1550 years to a common ancestor, respectively (Klyosov, 2009a).
Calculations with short haplotypes are subject to high margins of error compared with extended 67 marker haplotypes. The authors of this study consider the timespans to a common ancestor of R1a1 haplotypes in Europe of 7400-7700 ybp to be more reliable in comparison to the 11,000-12,000 ybp. Haplotype databases continue to expand, and future studies will reveal the lower limit of "age" of haplogroup R1a in Europe. Thus, dismissing the data of Figure 10 would be premature.
Figure 10. The 67-haplotype tree for the Balkans, haplogroup R1a1. The 9-marker tree was composed from data (Barac et al., 2003a, 2003b; Pericic et al., 2005).

Materials and Methods
Four thousand four hundred sixty (4460) R1a1 haplotypes were collected from the FTDNA, YSearch and SMGF (Sorenson Database) databases, from the private databases by Martin Voorwinden (987 of the Tenths, DYS388 = 10 haplotypes) and the IRAKAZ (2018 of R1a1 haplotypes, regularly updated by Alexander Zolotarev and Igor Rozhanskii), and from peer-reviewed publications.
The methodology of haplotype datasets analysis was described in (Rozhanskii & Klyosov, 2011). In this study the linear and the quadratic permutation methods were used, the latter when the base haplotype could not be decisively determined, as described in (Klyosov, 2009a). The mutation rate constants are listed in (Klyosov, 2009a; Rozhanskii & Klyosov, 2011), and for a number of cases are given in the text of this paper.
The mutation rate constant for the non-standard 25 marker haplotype discussed in the "Haplotype from Altay" was calculated using the respective father-son pair frequencies of transmissions listed in (Burgarella & Navascues, 2011). We have shown in (Rozhanskii & Klyosov, 2011) that for the 12 marker haplotype panel the mutation rate constant equals .0200 mutation/haplotype/generation. Indeed, in the (Burgarella & Navascues, 2011) paper the sum of the respective ten frequencies (those for DYS385a, b were not determined) was equal to .0201 mutation/haplotype/generation.
In other words, our calibrated mutation rate constant for the 12-marker panel fits fairly well to the sum of average frequencies in father-son pairs. For the 15-marker second half of the panel (from DYS458 to YPenta2) the sum of the frequencies of father-son pairs (Burgarella & Navascues, 2011) was equal to .0204 mutation/generation, which makes the mutation rate constant for the whole 25 marker haplotype .0404 mutation/haplotype/generation.
Haplotype trees were composed using software PHYLIP [...]
Base haplotypes in the dataset were determined by minimization of mutations; by definition, the base haplotype is one which has the minimum collective number of mutations in the dataset. The base haplotype is the ancestral haplotype or the closest approximation to the latter.
A timespan to the common ancestor of two base haplotypes is determined as follows: 1) count the number of mutations between the two base haplotypes; 2) divide the obtained number by the mutation rate constant; 3) introduce a correction for back mutations, calculated using the following formula (Adamov & Klyosov, 2008; Klyosov, 2009a):
$$\lambda = \frac{\lambda_{obs}}{2}\left(1 + \exp(\lambda_{obs})\right) \qquad (1)$$
where $\lambda_{obs}$ = observed average number of mutations per marker in a dataset (or in a branch, if the dataset contains several branches/lineages), $\lambda$ = average (actual) number of mutations per marker corrected for back mutations; 4) add the obtained value, multiplied by 25 (years), which represents the "lateral" timespan between the times of appearance of the two base haplotypes, to the TMRCAs for both base haplotypes and divide by 2. The result represents the TMRCA (time to the most recent common ancestor) for the two base haplotypes under study.
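The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' software: the function names are hypothetical, the back-mutation correction is taken in the form λ = (λ_obs/2)(1 + exp(λ_obs)), which reproduces the coefficient of 1.2422 quoted in Example 1, and λ_obs for two base haplotypes is assumed to be the number of mutations between them divided by the number of markers. The check uses the figures given in this paper for the Indian and Russian Plain base haplotypes (1.4 mutations in the 25 marker format, .046 mutation/haplotype/generation).

```python
import math

YEARS_PER_GENERATION = 25  # conditional generation used in this paper


def back_mutation_coefficient(lambda_obs):
    """Ratio lambda / lambda_obs = (1 + exp(lambda_obs)) / 2 (assumed form of formula 1)."""
    return (1.0 + math.exp(lambda_obs)) / 2.0


def lateral_generations(mutations, markers, rate_per_haplotype):
    """Steps 1-3: generations separating two base haplotypes.

    Returns (uncorrected, corrected) numbers of generations.
    """
    uncorrected = mutations / rate_per_haplotype
    lambda_obs = mutations / markers  # observed mutations per marker (assumption)
    corrected = uncorrected * back_mutation_coefficient(lambda_obs)
    return uncorrected, corrected


def tmrca_of_two_base_haplotypes(lateral_years, tmrca_a, tmrca_b):
    """Step 4: add the lateral timespan to both TMRCAs and divide by 2 (years)."""
    return (lateral_years + tmrca_a + tmrca_b) / 2.0


# Indian vs. Russian Plain base haplotypes, 25 marker format.
raw, corrected = lateral_generations(mutations=1.4, markers=25,
                                     rate_per_haplotype=0.046)
print(round(raw), round(corrected))             # 30 31
print(round(corrected) * YEARS_PER_GENERATION)  # 775
```

The last function covers step 4: given the lateral timespan in years and the two independently determined TMRCAs, it returns their combined estimate.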
Example 1: Calculation of a timespan to a common ancestor when the average number of mutations per marker is equal to .395 (a series of Pakistani 10 marker haplotypes in this paper), and the mutation rate constant for the 10 marker haplotype is .018 mutation/haplotype/generation (25 years), or .0018 mutation/marker/generation. Thus, .395/.0018 = 219 generations without a correction for back mutations. Since the observed number of mutations per marker is .395, we employ formula (1) and obtain
$$\lambda = \frac{.395}{2}\left(1 + \exp(.395)\right)$$
In order to calculate exp(.395), that is $e^{.395}$, we need to find a number, the natural logarithm of which is equal to .395. This number is 1.4844. Then we have
$$\lambda = \frac{.395}{2}\left(1 + 1.4844\right) = .395 \times 1.2422$$
The obtained number of 1.2422 is the coefficient of the correction for back mutations. Therefore, by multiplying 219 × 1.2422, we obtain that the corrected number of generations is 272, that is, 272 × 25 = 6800 years. This is usually designated as 219 → 272 (generations). Since for 166 mutations in the dataset the margin of error is 12.66% (calculated as explained in Klyosov, 2009a), we at last obtain that the timespan to a common ancestor of the Pakistani haplotypes is equal to 6800 ± 860 years.
Example 2: Forty-six (46) of 6 marker haplotypes of Andhra Pradesh, South India contain 126 mutations; that is, 126/46/6 = .457 ± .041 mutations per marker (the mutation rate constant equals .0123 mutation/haplotype/generation; that is, .00205 mutation/marker/generation, Klyosov, 2009a). The number of generations to the common ancestor equals .457/.00205 = 223 generations (25 years each), without a correction for back mutations. As explained in Example 1, since the observed number of mutations per marker is .457, formula (1) gives us the exponent equal to 1.5794, the coefficient of the correction equal to 1.2897, and the number of generations to the common ancestor 223 → 288; that is, 7200 years before present.
Example 3: Thirty-one (31) of 5 marker haplotypes of northern China, calculated by the quadratic permutation method. For the given series the sum of squared differences between each allele in each marker equals 10,184. It should be divided by the square of the number of haplotypes in the series (961), by the number of markers in the haplotype (5) and by 2, since the squared differences between alleles in each marker were taken both ways. This gives an average number of mutations per marker of 1.060. After division of this value by the mutation rate of .00135 mutation/marker/generation (for 25 years per generation), 19,625 ± 2800 years to a common ancestor is derived. This result is within the margin of error of that calculated by the linear method: 99/31/5/.00135 = 472 → 683 conditional generations; that is, 17,100 ± 2400 years ago to the common ancestor of these 31 haplotypes. The values of .00677 and .00135 are the mutation rate constants for the 5 marker haplotypes expressed in mutation/haplotype and mutation/marker, respectively, for a conditional generation of 25 years (Klyosov, 2009a).
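The arithmetic of Examples 1-3 can be checked with a short script. The following is a sketch under stated assumptions, not the authors' code. The function names are hypothetical. The linear method is taken as mutations per marker divided by the per-marker mutation rate constant, multiplied by the back-mutation coefficient (1 + exp(λ_obs))/2. The quadratic permutation method follows the verbal description in Example 3. The margin of error combines the reciprocal square root of the number of mutations with the 10% uncertainty of the mutation rate constant, as described in this section. The three-haplotype list at the end is invented toy data.

```python
import math

YEARS_PER_GENERATION = 25  # conditional generation used in this paper


def linear_generations(lambda_obs, rate_per_marker):
    """Linear method: (uncorrected, back-mutation corrected) generations."""
    uncorrected = lambda_obs / rate_per_marker
    return uncorrected, uncorrected * (1.0 + math.exp(lambda_obs)) / 2.0


def margin_of_error(total_mutations, rate_sd=0.10):
    """Relative one-sigma error: sqrt(SD^2 + rate_sd^2), SD = 1/sqrt(mutations)."""
    sd = 1.0 / math.sqrt(total_mutations)
    return math.sqrt(sd ** 2 + rate_sd ** 2)


def quadratic_lambda_from_sum(sum_sq, n_haplotypes, n_markers):
    """Mutations per marker from the sum of squared allele differences
    taken over all ordered pairs of haplotypes (hence the division by 2)."""
    return sum_sq / n_haplotypes ** 2 / n_markers / 2.0


def quadratic_lambda(haplotypes):
    """Same quantity computed directly from equal-length haplotypes."""
    sum_sq = sum((a - b) ** 2
                 for h1 in haplotypes for h2 in haplotypes
                 for a, b in zip(h1, h2))
    return quadratic_lambda_from_sum(sum_sq, len(haplotypes), len(haplotypes[0]))


# Example 1: the text rounds to 219 generations before applying the coefficient.
raw, corrected = linear_generations(0.395, 0.0018)
coeff = corrected / raw
print(round(raw), round(coeff, 4), round(round(raw) * coeff))  # 219 1.2422 272
print(round(100 * margin_of_error(166), 2))                    # 12.66

# Example 2: overall margin of error for 126 mutations, in percent.
print(round(100 * margin_of_error(126), 1))                    # 13.4

# Example 3: quadratic permutation method from the published sum of squares.
lam = quadratic_lambda_from_sum(10184, 31, 5)
print(round(lam / 0.00135 * YEARS_PER_GENERATION))             # 19625

toy = [[13, 25], [13, 26], [14, 25]]  # invented 2 marker haplotypes
print(quadratic_lambda(toy))
```

The quadratic method needs no base haplotype, which is why the paper applies it when the base haplotype cannot be decisively determined.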

Standard deviation (SD, one sigma) was calculated for an average number of mutations per marker, which is a reciprocal square root of the total number of mutations in the dataset (Klyosov, 2009a). The square root of the sum of SD² and .10² (the last figure corresponds to the square of the standard deviation of the mutation rate constant) gives the margin of error of the timespan to the common ancestor of the haplotypes in the dataset, provided that all of them are derived from the same most recent common ancestor. For the dataset with 126 mutations (see above) the standard deviation is 8.91%, and the overall margin of error for the timespan of 7200 years is 13.4%; that is, 7200 ± 960 years.
A detailed examination of many "classical" genealogies has shown that the above procedural results are the best fit with actual data.
Assignments of haplotypes were based on their SNP classification, as provided in the databases. In some instances it was additionally supported by calculating their position on the phylogenetic trees from their respective STR data.

Conclusion
The results of this study lend support to the theory that haplogroup R1a arose in Central Asia, apparently in South Siberia or the neighboring regions, such as Northern and/or Northwestern China, around 20,000 years before present. The preceding history of the haplogroup is directly related to the appearance of Europeoids (Caucasoids) ~58,000 ybp, likely in the vast triangle that stretched from Western Europe through the Russian Plain to the east and to the Levant to the south, as was suggested in the preceding article (Klyosov & Rozhanskii, 2011). A subsequent sequence of SNP mutations in the Y chromosome, with the appearance of haplogroups NOP ~48,000 ybp and P ~38,000 ybp in the course of their migration eastward to South Siberia, eventually gave rise to haplogroup R ~30,000 ybp and R1 ~26,000 ybp, and then to haplogroups R1a and R1a1 ~20,000 ybp. The timeframe between the appearance of R1a and R1a1 is uncertain.
At some point in time, the bearers of R1a1 began migration to the west, over Tibet and the Himalayas, and not later than 12,000 ybp they were in the Hindustan. They continued their way across the Iranian Plateau, along Anatolia and Asia Minor, apparently between 10,000 and 9000 ybp. By 9000-8000 ybp they arrived in the Balkans and spread westward over Europe and to the British Isles. At that point, R1a1 still had DYS392 = 13 in their haplotypes, as did their brother haplogroup R1b1. This marker is very slow, and mutates on average once in 3500 conditional generations. Somewhere on this extended timescale, bearers of R1a1 (or the parent, upstream haplogroups) developed the Proto-IE language and carried it along during their journey from Central Asia to Europe. The earliest signs of the language in Anatolia were detected by linguists and dated to 9400-9600-10,100 ybp, which coincides with the data of DNA genealogy described in this paper.
The arrival of R1a1 in Europe might be marked by known archaeological cultures in the Balkans and Central/Eastern Europe, dated back to 9000-7000 ybp. Yet they can also be attributed to other ancient haplogroups, such as I, J, E, G.
As the bearers of haplogroup R1b1a2 began to populate Europe after 4800 ybp (the Bell Beakers and other R1b1 migratory waves to Europe, including perhaps the Kurgan people, though their identification and haplogroup assignment remain unclear), haplogroup R1a1 had moved to the Russian Plain around 4800-4600 ybp. From there R1a1 migrated (or moved as military expeditions) to the south, east, and south-east as the historic Aryans. Dates for these movements are strikingly similar, and they span 4200 to 3600 ybp. As a result, in Anatolia and Mitanni, South Ural, Iran, India, and beyond the Ural Mountains, in South Siberia, in all those areas today's linguists find the same languages: the Aryan, or the Indo-European language, or the Iranian family of languages. They all have the same Aryan roots. They founded common horse breeding terminology and shared essentially the same vocabulary for household items, gods and religious terms, although sometimes twisted due to the "human factor" as found in India and Iran.
Currently, most of those with European R1a1 live in Eastern Europe, primarily in Russia (up to 62% of the population) and Poland, Ukraine, Belarus (up to 55% of the population in the last three countries). In-depth reports on their haplotype branches and distinct SNPs (characteristic mutations in the DNA) will be presented in forthcoming publications.
[...] indebted to Laurie Sutherl[...]le help with the preparation of the manuscript.
[...]ani, A., Gonzalez, A. M., L[...] V. M., & Underhill, P. A. (2009). Saudi Arabian Y-chromosome diversity and its relationship with nearby regions. BMC Genetics, 10, 59. doi:10.1186/1471-2156-10-59
Adamov, D., & Klyosov, A. A. (2008

Figure 2. The 6-marker haplotype tree for R1a1 haplotypes in Andhra Pradesh (tribes Naikpod, Andh, and Pardhan), South India. The 46-haplotype tree was composed from data listed in (Thanseem et al., 2006). The designations of haplotypes are those used in the article.
Figure 3. The 10 marker haplotype tree for R1a1 haplotypes in India (mixed population, including tribes and castes). The 110-haplotype tree was composed from data listed in (Sengupta et al., 2006). The article contains 114 Indian R1a1 haplotypes; however, four of them were incomplete.

Figure 4. The 10-marker haplotype tree for R1a1 haplotypes in Pakistan. The 42-haplotype tree was composed from data listed in (Sengupta et al., 2006).
[...]as 15 mutations in the first 37 markers from the respective L342 Bashkir base haplotype. This places their common ancestor at 3500 ± 400 ybp. It apparently reflects migrations of R1a1-L342 bearers from the Ural region westward to Transylvania along with Finno-Ugric migrations of those times.

Figure 5. R1a1 10 marker 638-haplotype tree from all over the world, composed based on data published by Underhill et al. (2009).

Figure 6. R1a1 8 marker 365-haplotype tree from all over the world, composed based on data published by Zhong et al. (2010).

Figure 7. R1a1 8 marker 131-haplotype tree collected in North-Western China, composed based on data published by Shou et al. (2010).

Figure 8. R1a1 67 marker 148-haplotype tree collected in Russia and Ukraine.
[...] four haplotypes have their common ancestor of ~13,275 ybp.
Figure 9. R1a1 53-haplotype 25-marker tree collected in the Indian FTDNA Project database (November 2011). The Project contained 101 of 12 marker haplotypes, but only 53 of them were in the 25 marker format.