Family and/or Friends? Gene Mapping at Crossroads

Mapping the gene(s) underlying a specific trait gives plant breeders an opportunity to apply marker assisted selection. All gene mapping approaches except LD mapping use family based segregating populations developed by crossing two or more parents. These family based gene mapping approaches include simple interval mapping, composite interval mapping, multiple interval mapping and Bayesian mapping. Each approach has its own advantages and disadvantages depending on the type of population and the underlying statistical model. Unlike family based approaches, LD mapping uses a population of unrelated individuals, which are like friends belonging to different family backgrounds. The relative pros and cons of family and friends based approaches make them complementary to each other. Family based approaches identify a wide chromosomal region underlying the trait of interest with relatively low marker density and therefore have low mapping resolution. Conversely, friends based LD mapping identifies the chromosomal region of interest with higher resolution using higher marker density. The integration of family and friends based approaches combines their respective strengths and offsets their weaknesses, enhancing mapping resolution for more valid application of marker assisted selection.


Introduction
Germplasm is often exploited to develop improved crop varieties for changing needs and environments [1]. The vast genetic variation present in germplasm can be exploited best once the traits of economic importance have been mapped with molecular markers [2]. Gene mapping is the estimation of the order of genes and their relative positions on a particular chromosome. The objective of gene mapping is to find molecular markers that are inherited in an unbiased manner and are closely linked to the genes governing quantitative traits because they lie within or in close proximity to those genes. Mapping a nucleotide sequence underlying a specific trait offers plant breeders an opportunity to apply marker assisted selection (MAS). Most yield contributing traits are controlled by many loci, and their molecular characterization and genetic mapping is called quantitative trait loci mapping (QTL mapping).
Quantitative variation may be explained as the combined action of many discrete genes, each having a small effect on the overall phenotype and being influenced by the environment. The contribution of each quantitative locus at the phenotypic level is expressed as an increase or decrease in trait value, and the effects of loci acting in this manner cannot be distinguished from one another on the basis of phenotypic variation alone. Furthermore, the effect of a particular environmental variable is also expressed as a quantitative increase or decrease in the final trait value. The same amount of total genetic variation can be produced by allelic variation at many loci, each having a small effect on the trait, or at a few loci having larger effects. As both genetic and environmental factors contribute in the same positive or negative manner to trait value, it is generally not possible, from the phenotypic distribution of the trait alone, to distinguish the effects of genetic factors from those of environmental factors as sources of variation. Therefore, breeding for quantitative traits tends to be a less efficient and more time-consuming process. Tools for directed genetic manipulation of quantitative traits have undergone a revolution since the late 1980s with the development of molecular markers. As a result, interchange between molecular biology and quantitative genetics, which had developed independently for many years, has become apparent since the 1990s [3]. Since then, high-density molecular maps have been constructed in many crops, and genome-wide mapping and marker-based manipulation of genes affecting quantitative traits have become possible. Traits which were improved largely by conventional breeding and biometrical methods in the past can now be manipulated using molecular markers. The location and effect of the genes controlling a quantitative trait can be determined by marker-based genetic analysis. A chromosomal region linked to or associated with a marker which affects a quantitative trait was defined as a quantitative trait locus (QTL) [4]. A QTL that has a large effect and can explain a major part of the total variation can be analyzed genetically as a major gene in most cases.
The efficiency of a gene mapping (also called QTL mapping) approach is judged on the accuracy of QTL identification while minimizing the occurrence of false negatives and false positives. A false negative (Type II error) is the failure to detect a marker-trait association that in fact exists, and a false positive (Type I error) is the declaration of a marker-trait association that in fact does not exist.
A brief overview of the statistics of all QTL mapping approaches is given by Xu [5]. For a detailed statistical description, the following references are highly recommended: [6][7][8][9]. Furthermore, many freely accessible websites offer courses on statistical genomics and QTL mapping (e.g. http://www.stat.wisc.edu/yandell/statgen/course/). This review is confined to the comparative advantages and disadvantages of QTL mapping approaches, with special focus on the newly developed nested association mapping (NAM) approach.

Family Based Approaches
The majority of gene mapping approaches use family based segregating populations developed by crossing two or more parents, such as F2 populations, doubled haploids (DHs), recombinant inbred lines (RILs), recombinant inbred chromosomal lines (RICLs) and near isogenic lines (NILs). These family based gene mapping approaches include simple interval mapping, composite interval mapping, multiple interval mapping and Bayesian mapping. The comparative advantages and disadvantages of these approaches are briefly described here.

Single Marker-Based Approaches
The single marker approach (also referred to as single point analysis or single factor analysis of variance) has been extensively used with isozyme markers [10]. A single factor ANOVA is carried out for each marker independently, and an F-test is used to test the significance of differences between marker genotype classes. Although the statistical computations for this approach are simple, it has some major drawbacks: 1) the probability of QTL detection decreases markedly as the distance between marker and QTL increases; 2) the approach cannot discriminate whether a marker is associated with one or with several QTLs; 3) the QTL effects are likely to be miscalculated because they are confounded with recombination frequencies.
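The single factor ANOVA described above can be illustrated with a minimal sketch. The function name, the data layout and the trait values are hypothetical, and only the F statistic is computed; obtaining a p-value would additionally require the F distribution with the returned degrees of freedom.

```python
from statistics import mean

def single_marker_f(phenotypes_by_genotype):
    """One-way ANOVA F statistic for a single marker.

    phenotypes_by_genotype maps each marker genotype class (e.g. "AA")
    to the list of trait values observed in that class.
    Returns (F, (df_between, df_within)).
    """
    groups = [g for g in phenotypes_by_genotype.values() if g]
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation explained by marker genotype class
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Residual variation inside each genotype class
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

# Hypothetical F2 trait values grouped by genotype at one marker
marker = {
    "AA": [10.0, 12.0, 11.0],
    "Aa": [13.0, 14.0, 15.0],
    "aa": [17.0, 16.0, 18.0],
}
f_stat, dof = single_marker_f(marker)
```

In a genome scan the same test would be repeated for every marker, which is why the result for any one marker cannot separate QTL effect from QTL distance.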

Simple Interval Mapping
The simple interval mapping approach was developed by Lander and Botstein [11] to exploit the full benefits of linkage maps. The approach tests for marker-trait associations at multiple points within the targeted interval between two adjacent marker loci. The log of odds ratio (LOD) is used to test for the presence of a QTL. If the LOD value for a QTL exceeds a critical threshold value, the QTL is considered to be significantly associated with the trait under study. A formula for setting significance levels suitable for simple interval mapping for a given number of marker intervals, number of chromosomes, genome size and false positive rate was devised by Lander and Botstein [11]. Simple interval mapping has been the most widely used approach because its calculations are implemented in the statistical software MAPMAKER/QTL (ftp://ftp-genome.wi.mit.edu/distribution/software/newqtl/). In this approach, recombination between QTL and marker can be compensated for by using tightly linked markers. Thus, the probability of detecting a QTL and of obtaining an accurate estimate of its effect is increased. However, simple interval mapping fails to take into account the genetic variance of all QTLs when multiple QTLs are segregating in the population. In such cases, simple interval mapping suffers from the same limitations as single marker analysis.
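As an illustration of the LOD test, the sketch below uses the regression form of the statistic, in which the LOD at a test position equals (n/2) log10(RSS0/RSS1) under normally distributed residuals, with RSS0 and RSS1 the residual sums of squares without and with a QTL at that position. This regression shortcut, the function names and the threshold of 3.0 are assumptions made for illustration; they are not the maximum likelihood procedure or the threshold formula of Lander and Botstein [11].

```python
import math

def lod_from_rss(rss_null, rss_qtl, n):
    """LOD score from residual sums of squares of two nested models.

    rss_null: residual sum of squares of the no-QTL model
    rss_qtl:  residual sum of squares with a QTL at the test position
    n:        number of phenotyped individuals
    """
    if rss_null <= 0 or rss_qtl <= 0 or n <= 0:
        raise ValueError("rss values and n must be positive")
    return (n / 2.0) * math.log10(rss_null / rss_qtl)

def is_significant(lod, threshold=3.0):
    """Compare a LOD score with a critical threshold (3.0 is illustrative)."""
    return lod > threshold

# Hypothetical scan result at one position within a marker interval
lod = lod_from_rss(rss_null=120.0, rss_qtl=100.0, n=100)
declared = is_significant(lod)
```

In practice the LOD is evaluated at many positions along each interval and the peak of the resulting profile is taken as the most likely QTL position.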

Composite Interval Mapping
Composite interval mapping combines the interval mapping approach for a single QTL in an interval with multiple regression on markers associated with a few other QTLs [12]. This approach has been used to develop precise models for two or three linked QTLs [13,14]. Each analysis takes into account one marker interval and some other chosen single markers. Consequently, on a chromosome with n markers, n-1 tests of interval-QTL association are carried out. The advantages of composite interval mapping over single marker analysis and simple interval mapping are: 1) multiple QTLs can be mapped simultaneously; 2) the QTL association tests are not affected by QTLs outside the specified interval because linked markers are used as cofactors (this characteristic increases the accuracy of QTL mapping); 3) the power of QTL detection is further increased because residual variance is reduced by removing the variance due to unlinked QTLs.
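The combination of an interval test with marker cofactors can be written as a single regression model. The notation below is an assumption chosen for illustration and is not taken from [12]: y_j is the trait value of individual j, x*_j codes the putative QTL genotype at the position being tested, and x_jk codes the genotype at the k-th cofactor marker outside the tested interval.

```latex
y_j = b_0 + b^{*} x^{*}_j + \sum_{k} b_k \, x_{jk} + e_j ,
\qquad e_j \sim N(0, \sigma^2)
```

The test for a QTL in the interval is the test of b* = 0, while the cofactor terms absorb variation caused by QTLs elsewhere in the genome.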

Multiple Interval Mapping
Multiple interval mapping is an advancement on interval mapping, just as multiple regression extends analysis of variance. With this approach we can infer the location of QTLs between markers, handle missing data properly and determine interactions between QTLs. Three different statistical approaches are used for multiple interval mapping: 1) maximum likelihood [15] with sequential testing to search the model space; 2) multiple imputation, which uses pairwise plots, Bayesian log of odds (LOD) values and sequential testing [16]; 3) Markov chain Monte Carlo (MCMC) to search the model space [14]. Multiple interval mapping is a multiple-QTL analysis which combines QTL mapping with the analysis of the genetic architecture of quantitative traits through an algorithm that identifies the number, positions, effects and interactions of QTLs.

Bayesian Mapping
The Bayesian paradigm, which has been used successfully in different contexts, provides a logical approach to statistical modeling [17]. Bayesian analysis treats every factor as an unknown variable with a prior distribution. It classifies variables into two classes: observable variables and unobservable variables. The observable variables include phenotypic data, pedigrees and marker data.
The Bayesian approach gained popularity in QTL mapping because of the availability of Markov chain Monte Carlo (MCMC) algorithms. The MCMC approach achieves many analytic goals which are otherwise difficult to achieve [18]. The Bayesian mapping approach can also use prior knowledge of QTLs. With MCMC approaches, linkage analysis can be performed with any number of marker loci, multiple trait loci and multiple genomic regions. At the same time, MCMC allows the use of complex pedigrees of arbitrary size.

In Silico Mapping
In silico mapping was developed to identify genes by concurrently exploiting the genotypic, phenotypic and pedigree data available in genomic databases and breeding programs, without designed mapping experiments. This approach was first used to explore whether chromosomal segments underlying quantitative traits could be predicted from the SNP database and existing phenotypic data from mouse inbred strains [19]. The genotypic and phenotypic data were analyzed in silico to discover candidate QTL intervals, and the potential of the computational method to detect QTL intervals accurately was tested. The results of 19 out of 26 experiments verified the QTL intervals detected by in silico mapping. Hence, once a large amount of relevant data is publicly available, in silico mapping can reduce the many months to years of field and laboratory work required to phenotype and genotype experimental progenies to milliseconds of computation.

Family Based Linkage Analysis Mapping
This is the classical approach, in which LD is created by developing a population from a cross between a few founders. For family mapping, the first step is to establish mapping populations such as F2 populations, doubled haploids, backcrosses, recombinant inbred lines and near isogenic lines, which are then phenotyped to determine the segregation of the trait in different environments. In the next step, DNA markers showing polymorphism between the parents and among the segregants are identified. For this, a set of markers is screened for polymorphism and the polymorphic markers are used to generate genotypic data for constructing a linkage map, which gives the order (position) of the molecular markers used for genotyping and the relative genetic distances between them. The genetic map is constructed by estimating recombination frequencies between the markers. The markers located on the linkage map are then tested for association with the phenotypic data of the trait(s) being studied, and markers significantly correlated with a phenotypic trait are considered to be closely linked with the QTL region affecting the trait being mapped.
In family mapping, the accuracy of mapping a gene relies on the size of the mapping population, the genetic variation covered by the population, and the number of molecular markers applied. Once the QTLs underlying a specific trait are precisely tagged with molecular markers using the linkage analysis mapping approach, the markers can be used to transfer the gene of interest from a donor line to the target genotype (marker assisted selection). Even though linkage mapping is being used for gene mapping in crop plants, it is very costly, has low resolution, evaluates only a few alleles simultaneously and takes a relatively long time [21][22][23][24]. The low resolution of linkage analysis mapping is due to the small number of meiotic events that have occurred since the recent experimental cross [25].
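The step from recombination frequencies to map distances can be sketched with the standard Haldane and Kosambi mapping functions. The function names and the progeny counts are hypothetical, and the simple recombinant count assumes a population type in which recombinants can be scored directly (for example a backcross or doubled haploid population).

```python
import math

def recombination_fraction(recombinants, total):
    """Estimate r as the proportion of recombinant progeny between two markers."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return recombinants / total

def haldane_cm(r):
    """Haldane map distance in cM (assumes no crossover interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

def kosambi_cm(r):
    """Kosambi map distance in cM (allows for crossover interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

# Hypothetical marker pair: 20 recombinants among 200 progeny
r = recombination_fraction(20, 200)
d_haldane = haldane_cm(r)
d_kosambi = kosambi_cm(r)
```

For small r both functions give distances close to 100r cM; they diverge as r approaches 0.5, where markers behave as unlinked.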

OPEN ACCESS AJPS
Linkage analysis in plants typically localizes QTLs to 10 to 20 cM intervals because of the limited number of recombination events that occur during the construction of mapping populations, but it requires relatively fewer markers than genome-wide association mapping. This advantage of linkage analysis can be used to identify putative genomic regions that serve as prior information for fine mapping by association mapping. Association mapping is an alternative and/or complementary approach to identify marker-trait associations and has been extensively employed in animal and human genetics [26,27], where reasonably large segregating populations cannot be developed.

Friends Based LD Mapping/Association Mapping
Current gene mapping efforts are shifting from conventional linkage analysis based mapping to LD based association mapping [28], which is the most effective approach to utilize natural variation in the form of ex situ conserved crop genetic resources. In association mapping, a natural population of unrelated individuals, which can be called friends, is surveyed to determine marker-trait associations using LD [21]. LD refers to the non-random association of specific alleles at different loci, which is maintained historically by a reduced level of recombination between them. The extent of LD can be measured statistically and has therefore been extensively used in humans to tag and finally clone genes controlling complex quantitative traits [29][30][31][32]. This approach was extended to plants in 2001, and a substantially increased mapping resolution over F1-derived mapping populations was reported [33]. Association mapping offers several advantages over family based mapping [34]. The availability of huge genetic variation in the form of germplasm provides broader allele coverage, saves the time and cost of establishing tedious and expensive bi-parental mapping populations, and most importantly offers higher resolution because it exploits the relatively higher number of meiotic events that have occurred throughout the history of germplasm development. Association mapping also offers the possibility of using historically measured phenotypic data [35,36]. However, covering the whole genome with sufficient mapping resolution requires thousands of markers; therefore, the strategy of targeting individual linkage groups is being successfully adopted [37,38].
The general approach of association mapping includes six steps as described by Almaskri et al. [39] and adopted by Sajjad et al. [40]: 1) a collection of diverse genotypes is selected that may include land races, elite cultivars, wild relatives and exotic accessions; 2) comprehensive and precise phenotyping of the selected genotypes is performed for traits such as yield, stress tolerance or quality in multiple replications and years/environments; 3) the genotypes are then scanned with suitable molecular markers (AFLPs, SSRs, SNPs); 4) population structure and kinship are determined to avoid false positives; 5) the extent of LD is quantified using statistics such as D, D' or r2; and finally, 6) genotypic and phenotypic data are correlated using appropriate statistical software, allowing the tagging of molecular markers positioned in close proximity to the gene(s) underlying a specific trait. Consequently, the tagged gene can be mobilized between different genotypes and/or cloned and annotated for a precise biological function.
In a set of unrelated individuals, the mapping power of the association mapping (AM) approach is the probability of detecting the true marker-trait associations, which depends on 1) the evolution and extent of LD in the mapping population and in the genomic region harboring the loci for the trait(s) being mapped; 2) the type of gene action of the trait; 3) the size and composition of the population; 4) the field design and the accuracy of phenotyping, genotyping and data analysis. The power of AM can be increased by better data recording and analysis and by increasing population size. In AM there are specific statistical methods to control false positives (Type I errors), such as permutation [41] or false discovery rates [42]. For association mapping in the presence of population structure, Pritchard et al. [43] established a useful technique called structured association (SA). SA uses a Bayesian approach [44] to infer sub-populations and uses the resulting Q-matrix to avoid false positives. Population structure (Q-matrix) and kinship coefficients (K-matrix) can be estimated in subpopulations using the program STRUCTURE (Pritchard and Wen 2004). Yu et al. [45] later established another approach, the mixed linear model (MLM), which accounts for both population structure information (Q-matrix) and kinship information (K-matrix) in AM analysis. The Q+K MLM model subsequently performed better, even in a highly structured population of Arabidopsis, than any other model that used the Q-matrix or K-matrix alone [46]. Some mixed model approaches also combine QTL and LD information, where QTLs or already known genes are used as a priori information in population mapping [47]. This is an effective approach in association mapping that reduces the number of markers and the population size required. This approach also increases the precision and power of marker-trait associations [48].
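Step 5 above, the quantification of LD, can be sketched for a pair of biallelic loci using the standard definitions of D, D' and r2. The function name, the 0/1 allele coding and the haplotype counts are hypothetical, and the sketch assumes phased haplotypes are available; D' is returned as an absolute value.

```python
def ld_statistics(haplotypes):
    """Compute D, |D'| and r^2 for two biallelic loci.

    haplotypes: list of (a, b) tuples, where a and b are 0 or 1 and
    give the allele carried at locus A and locus B on one haplotype.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n    # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n    # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b                       # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max                   # D scaled to its maximum
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical sample of ten phased haplotypes
sample = [(1, 1)] * 4 + [(0, 0)] * 4 + [(1, 0)] + [(0, 1)]
d, d_prime, r2 = ld_statistics(sample)
```

Plotting r2 against the physical or genetic distance between marker pairs gives the LD decay curve that determines how many markers a genome scan needs.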
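The way a mixed linear model uses the Q-matrix and K-matrix together can be summarized in one equation. The notation below is a generic form assumed for illustration, not a transcription of [45]: y is the vector of phenotypes, β contains fixed effects including the marker being tested, v contains the fixed effects of subpopulation membership, u contains random polygenic effects whose covariance is proportional to the kinship matrix K, and X, Q and Z are the corresponding design matrices.

```latex
y = X\beta + Qv + Zu + e ,
\qquad u \sim N\!\left(0,\; 2K\sigma_g^{2}\right),
\qquad e \sim N\!\left(0,\; I\sigma_e^{2}\right)
```

Dropping the Zu term gives a Q-only model and dropping the Qv term gives a K-only model, which are the reduced models the Q+K model is compared with.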

Family and Friends Based Nested Association Mapping (NAM)
The most commonly used approaches to genetic mapping are family based linkage analysis and LD based association mapping [49]. Considering their advantages and disadvantages, these two approaches are complementary. Linkage analysis identifies a wide chromosomal region underlying the trait of interest with relatively low marker coverage and therefore has low mapping resolution. Association mapping, on the other hand, identifies the chromosomal region of interest with high resolution using higher marker density [33]. Nested association mapping integrates family based linkage analysis mapping and association mapping to combine their respective advantages and enhance mapping resolution without using very dense marker maps. The creation of a NAM population is a prerequisite for a NAM study. The first NAM population was developed in the model crop maize (Zea mays L.) because of the immediate availability of highly diverse germplasm and the possibility of generating segregating progenies and selfing them to produce immortal RIL genotypes [50].
The NAM strategy is to generate an immortal common mapping population that can be exploited efficiently by researchers using genomic, genetic and systems biology tools to dissect complex traits. The first NAM population of Pakistani wheats is being developed at the Department of Plant Breeding and Genetics, PMAS-Arid Agriculture University Rawalpindi. The procedure for developing this mapping population is outlined in Figure 1.
The steps being adopted at the Department of Plant Breeding and Genetics, PMAS-Arid Agriculture University Rawalpindi to develop the wheat nested association mapping population are given below.
1) Selection of highly diverse wheat genotypes as founders and crossing them with a reference genotype (e.g. Inqlab-91, the most successful cultivar), followed by self-pollination of each hybrid for six generations and selection of 200 homozygous recombinant inbred lines (RILs) per family (6000 RILs in total).
2) Genotyping of each founder with a large number of molecular markers for which Inqlab-91 will have rare alleles.
3) Genotyping of both the founders and the progenies with a smaller number of tagging markers to identify the inheritance of chromosome segments and to project the high-density marker information from the founders onto the progenies.
4) Phenotyping of progenies for various complex traits.
5) Conducting genome-wide association analysis connecting phenotypic traits with high-density markers of the progenies.
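The projection of high-density founder marker information onto the progenies in step 3 can be sketched as follows. This is a deliberately simplified rule assumed for illustration, not the procedure used for the Pak-NAM population: a dense SNP lying between two tagging markers is assigned the allele of the parent from which both flanking tagging markers were inherited, and is left missing when the flanking markers disagree, because a recombination event lies somewhere between them. All names and values are hypothetical.

```python
from bisect import bisect_left, bisect_right

def project_snps(tag_pos, tag_origin, snp_pos, founder_allele, reference_allele):
    """Project dense founder SNPs onto one RIL using sparse tagging markers.

    tag_pos:          sorted tagging-marker positions on one chromosome
    tag_origin:       parental origin of each tagging marker in this RIL,
                      "F" for the founder or "R" for the reference parent
    snp_pos:          positions of the dense SNPs to be projected
    founder_allele:   founder allele at each dense SNP
    reference_allele: reference-parent allele at each dense SNP
    Returns one projected allele per SNP, or None where it is ambiguous.
    """
    projected = []
    last = len(tag_pos) - 1
    for pos, f_al, r_al in zip(snp_pos, founder_allele, reference_allele):
        # Nearest tagging marker at or before, and at or after, the SNP;
        # SNPs beyond the outermost tag fall back to that single tag.
        left = max(bisect_right(tag_pos, pos) - 1, 0)
        right = min(bisect_left(tag_pos, pos), last)
        if tag_origin[left] != tag_origin[right]:
            projected.append(None)      # recombination inside the interval
        elif tag_origin[left] == "F":
            projected.append(f_al)
        else:
            projected.append(r_al)
    return projected

# Hypothetical RIL: first half of the chromosome from the founder
alleles = project_snps(
    tag_pos=[0.0, 10.0, 20.0, 30.0],
    tag_origin=["F", "F", "R", "R"],
    snp_pos=[5.0, 15.0, 25.0],
    founder_allele=["A", "C", "G"],
    reference_allele=["T", "T", "A"],
)
```

Because each RIL carries only a few recombination breakpoints per chromosome, a small number of tagging markers is enough to assign most dense SNPs, which is what allows association analysis at high marker density without genotyping every progeny densely.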


Conclusion
The integration of family based linkage analysis and LD mapping approaches in the form of nested association mapping would enhance the resolution and power of QTL mapping, resulting in precise marker-trait associations. Since a NAM population is stable and immortal, multi-location and multi-year phenotyping would enhance the validity of QTLs, leading to more accurate marker assisted selection in future.

Figure 1. Diagrammatic presentation of the development of the wheat Pak-NAM population.