Genome Mapping to Enhance Efficient Marker-Assisted Selection and Breeding of the Oil Palm (Elaeis guineensis Jacq.)

The oil palm (Elaeis guineensis Jacq.) is one of the major cultivated crops among the economically important palm species. It is cultivated mainly for its edible oil. For a perennial crop like oil palm, marker-assisted selection (MAS) helps to shorten the breeding cycle and improve the economic products. Genetic and physical maps are important for sequencing experiments since they show the positions of genes and other distinctive features in the chromosomal DNA. This review assesses the role of genome mapping in oil palm breeding and discusses the major factors affecting such mapping. The generation of a high-density map is governed by several factors; marker type, marker density, size of the mapping population, and the software used are the major issues treated. The general conclusion is that genome mapping, and in particular the construction of genetic linkage maps, is pivotal because it helps to detect QTL and identify genes that control quantitative traits in oil palm. In perspective, high-density molecular markers, large mapping populations, and up-to-date software are necessary for oil palm genome mapping.


Introduction
The oil palm (Elaeis guineensis Jacq., Arecaceae) originated from West Africa [1] [2]. It is a diploid (2n = 2x = 32), perennial monocot plant, and the most productive oil-producing crop in the world. It is mainly cultivated in humid tropical zones of the world [3] [4]. It is naturally cross-pollinated, monoecious, and allogamous. The economic life span of the oil palm is about 30 years [5] [6].
According to the United States Department of Agriculture (USDA) [7], the total world vegetable oil production as of 2020/2021 was 72,769 MT, led by soybean oil (362.64 MT), followed by palm oil (75.19 MT), rapeseed oil (69.17 MT), and sunflower seed (49.66 MT). Oil palm produces on average 4 metric tons of oil per hectare every year [8] [9], approximately 10 times the yield of soybean. Palm oil has two major applications: the food industry, with over 80% of the market, and the chemical industry, which uses the rest for the formulation of paints, inks, resins, varnishes, and plasticizers, for biodiesel production, etc. [3] [10].
Despite its wide adaptation and importance, oil palm production and productivity are generally far from their potential due to several biotic and abiotic constraints. Climate change and shortages of land and labor are major factors that hinder yield and palm oil quality across the world [11]. Moreover, oil palm breeding is made difficult by the perennial nature of the crop, which limits the rate at which palm oil yield and quality can be increased. Equally, Herrero et al. [12] reported that conventional breeding of oil palm needs considerable space and time for selecting promising crosses, especially when parental biodiversity is being increased. To alleviate this problem and improve oil palm yield and quality, breeders need to implement molecular breeding techniques.
In marker-assisted selection (MAS), molecular markers linked to quantitative trait loci (QTL) support phenotypic screening and address the limitations of traditional breeding methods. MAS improves the accuracy and efficiency of selection [13]. The method is particularly effective for traits with low to moderate heritability, which are difficult to improve by traditional breeding. In most cases, MAS requires knowledge of the distribution of QTL for the targeted trait within the genome. In many crop species, including the oil palm, MAS has been instrumental in the genetic improvement of several agronomic traits [13]. In oil palm, the use of MAS has been discussed since the 1990s [14]. In whole genome sequencing research, linkage maps, molecular markers, and QTL maps are crucial for MAS. In several crop species, linkage maps and large numbers of DNA markers have been developed, and QTL for major traits have been identified [15].
Genetic linkage maps express the actual inheritance of loci by offspring, based on the patterns of recombination during meiosis [16]. In the oil palm, genetic linkage maps have been constructed from numerous families, with remarkable results for MAS breeding studies. Genetic maps have been constructed for the oil palm using amplified fragment length polymorphism (AFLP) [17] [18], restriction fragment length polymorphism (RFLP) [14] [18], random amplified polymorphic DNA (RAPD) [19], simple sequence repeats (SSR) [20] [21] [22] [23], and single nucleotide polymorphism (SNP) [12] [24] [25] [26] [27]. In the oil palm as well, numerous quantitative trait loci (QTL) mapping reports have revealed the existence of major-effect QTLs for many traits [12] [30]. The objective of this review is to assess and highlight the role of genome mapping in oil palm breeding and to discuss the major factors affecting genome mapping.

Types of Genome Maps
Several types of genome map exist, including cytogenetic, physical, and genetic maps.

Cytogenetic Map
A cytogenetic map represents the visual appearance of a chromosome when it is stained and/or labeled and examined under a microscope [31] [32]. Its units are fractions of a chromosomal arm or centiMcClintocks (cMc) (Figure 1) [33]. The map is obtained by visualizing distinct regions marked by light and dark bands, which give each chromosome a unique appearance. It shows positions on the chromosome relative to these bands, i.e. a band map (genome deletion panel) [33] [34]. Hozier and Davis [35] showed that integrating this method with other molecular genetic mapping methods allows the study and mapping of different mammalian genomes. Azhaguvel et al. [36] reported that this is the earliest type of map and that it was first used for the fruit fly and corn. Shah [34] showed that cytogenetic mapping is advantageous for genome analysis, chromosome mapping, and the analysis of somaclonal variation in tissue culture.

Physical Map
Physical mapping reflects the actual physical distance between molecular markers in base pairs (bp) or multiples thereof, for example kilobases (kb, i.e. bp × 1000) (Figure 1) [37] [38]. Such maps are increasingly used to gain molecular insight into genes and their evolution [36]. According to O'Rourke [16], physical maps provide an effective tool to isolate and study genes: where they are, what they do, and how they interrelate. A better understanding of these maps allows a marker to be located on the chromosome relative to the centromere and telomeres and permits the detection of mutations such as insertions, deletions, and translocations [39]. With current advances in sequencing technology, interest in these maps is constantly increasing, mainly because large, fragmented genomes are difficult to assemble without a good physical map [40]. Dixit [33] stated that, unlike genetic maps, the construction of a physical map requires molecular biology techniques; it represents the entire genome as a set of overlapping cloned DNA fragments ordered with respect to a reference map (such as a genetic map). In the same light, Deonier [38] reported the usefulness of wide-ranging physical maps for studying the characteristics of both sequenced and unsequenced genomes. These maps are used in the genomic study of oil palm in addition to genetic maps. For instance, Herrero et al. [12] reported that a physical map can be used for the analysis of quantitative trait loci for traits of interest in crop breeding in general and oil palm breeding in particular.

Genetic Map
Genetic maps show the positions of genes and other sequence features such as DNA markers in the genome, based on how often the genes or markers are inherited together [31] [32] [37]. A genetic map shows the map distance, in cM, that separates any two loci, and the position of these loci relative to all other mapped loci (Figure 1) [41]. It is constructed based on meiotic recombination between homologous chromosomes [36]. High-resolution genetic maps are highly pertinent since they link breeding and genome sequencing. Likewise, O'Rourke [16] outlined that in genetic and genomic studies of plants or animals, high-density genetic maps rich in polymorphic markers are key for marker-assisted selection. Similarly, Li [42] noted that the construction of high-density and high-resolution genetic maps is vital to the structural and functional understanding of the genome and genes of interest through linkage analysis. Dixit [33] showed that the resolution of a genetic map is determined by several factors, such as the number of crossovers scored in a plant species or the number of progeny available in humans, and that the type of molecular marker used also affects the resolution of the map. Details of the differences between genetic and physical maps are presented in Table 1. The genetic map is also known as a linkage map. It describes genes or loci within a chromosome based on recombination rate [36]. This mapping concept was first developed by Sturtevant [43], who linearly ordered five sex-linked genes on the X chromosome of the fruit fly (Drosophila melanogaster). It provides an approximate distance between loci in the genome in terms of recombination rate, determined from the number of crossovers [33].
O'Rourke [16] showed that the crossing-over frequency between genes or DNA markers is proportional to the chromosomal distance between them: the closer the genes, the lower the crossover frequency, and vice versa.
The genetic map allows the establishment of linkage associations between genes or DNA markers and is a baseline for physical mapping, thereby opening a door for map-based gene isolation. This map does not allow the study of particular chromosomes in the genome, but rather of a set of polymorphic genetic marker loci or genes [36]. In this map, the genetic distance between two molecular markers is computed from the number of recombination events, without precision on the actual physical distance [33]. To address this limitation, physical mapping has to be performed.
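The link between crossover frequency and map distance described above is usually formalized through a mapping function. As a minimal sketch, assuming the standard Haldane and Kosambi functions (neither is named in the studies reviewed here, and the function names are illustrative), a recombination fraction r can be converted to a distance in cM as follows:

```python
import math

def haldane_cm(r):
    """Haldane map distance in cM (assumes no crossover interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

def kosambi_cm(r):
    """Kosambi map distance in cM (allows for some crossover interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

if __name__ == "__main__":
    # Compare both functions at small, moderate, and large recombination fractions.
    for r in (0.01, 0.10, 0.40):
        print(f"r = {r:.2f}: Haldane {haldane_cm(r):.2f} cM, Kosambi {kosambi_cm(r):.2f} cM")
```

For small r both functions return values close to 100 × r, which is why 1% recombination is commonly read as roughly 1 cM; at large r the two functions diverge and both exceed 100 × r.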
A physical map has a linearly ordered set of molecular markers (DNA fragments) covering the whole genome or a particular genomic region of interest [31] [32]. Azhaguvel [36] classified this map into two types. The first is the macro-restriction map, which gives information on DNA fragments at the chromosome level. The second is the ordered clone map, which consists of an overlapping collection of cloned DNA fragments, such as those held in yeast and bacterial artificial chromosomes (YACs and BACs). It determines the actual distance between DNA markers on the chromosomes in base pairs [33]. The ratio between genetic and physical distance varies significantly from one chromosome region to another. It depends mainly on the nature of the chromosome and the frequency of recombination in that region [36]. For example, the estimated ratio in oil palm ranges from 21.37 Mb/cM to 68.44 Mb/cM [25].

Table 1. Differences between genetic and physical maps.

Genetic map | Physical map
It is the calculated map distance based on the crossover percent between two linked genes | The actual physical distance between linked genes
This map distance varies greatly as the frequency of crossing over varies in different segments of chromosomes, and it is only a predicted value | The physical distance of linked genes bears no direct relationship to the map distance calculated from crossover percentages
The distance is measured in map units or centiMorgans | The distance is measured in base pairs (bp, Kbp, Mbp)
Linear order is identical to that in the physical map | Linear order is identical to that in the genetic map
The relative distance between two genes | The exact location of genes in the chromosomes

The need for integrated physical and genetic maps has increased steadily in the past decade. The two maps are used together for gene cloning and for DNA sequencing of whole genomes and specific genomic regions [31] [32]. A genetic map is constructed to identify the target gene, and closely linked DNA markers are used to screen the large library used to construct the physical map [36]. The newly produced DNA markers are then used to identify clones for genetic fine mapping. O'Rourke [16] reviewed the correlation of genetic and physical maps and noted that physical maps consist of ordered libraries of DNA pieces covering entire genomes or chromosomes, whereas genetic maps are constructed from the recombination analysis of molecular markers, with the main aim of identifying and cloning genes. The integration of physical and genetic maps is used to identify genomic regions that are recombination hot spots and regions with repressed recombination [44]. Azhaguvel [36] noted that such integration reveals a great deal about genome sequences. This opens a new door to develop DNA markers and to identify genes, quantitative trait loci (QTLs), expressed sequence tags (ESTs), regulatory sequences, and repeat elements.
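One way to picture the integration of the two maps is to compare, interval by interval, the genetic and physical positions of shared markers. The sketch below uses entirely hypothetical marker names and positions (not data from any oil palm map) and reports cM per Mb for each interval between adjacent markers; intervals with high values would be candidate recombination hot spots, and intervals with low values candidate regions of repressed recombination.

```python
def local_recombination_rates(markers):
    """Return (left, right, cM_per_Mb) for each pair of adjacent markers.

    markers: list of (name, genetic_position_cM, physical_position_bp)
    tuples for a single chromosome. Markers are ordered by physical position.
    """
    ordered = sorted(markers, key=lambda m: m[2])
    rates = []
    for (name1, cm1, bp1), (name2, cm2, bp2) in zip(ordered, ordered[1:]):
        interval_mb = (bp2 - bp1) / 1e6
        if interval_mb <= 0:
            continue  # skip markers that share a physical position
        rates.append((name1, name2, (cm2 - cm1) / interval_mb))
    return rates

# Hypothetical markers, for illustration only.
example_markers = [
    ("m1", 0.0, 1_000_000),
    ("m2", 5.0, 2_000_000),
    ("m3", 5.5, 7_000_000),
    ("m4", 12.0, 8_000_000),
]

if __name__ == "__main__":
    for left, right, rate in local_recombination_rates(example_markers):
        print(f"{left}-{right}: {rate:.2f} cM/Mb")
```

In this made-up example the middle interval is long physically but short genetically, the pattern expected where recombination is repressed.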

Molecular Markers Used for Mapping in Oil Palm Populations
Molecular markers are now widely used in plant breeding programs to track loci and genomic regions [36]. They make it possible to identify major genes controlling quantitative traits in crop plant genomes. To this end, genetic mapping techniques are used to retrieve and locate important genes and genomic information responsible for a particular trait [31] [32]. Genetic maps have been established for a wide range of plant species using various molecular marker systems such as RFLP [45], RAPD [46], simple sequence repeats (SSR) or microsatellites [47], sequence-tagged sites (STS) [48], AFLP [49], single-nucleotide polymorphisms (SNP) [50], sequence-characterized amplified regions (SCAR) [51], and cleaved amplified polymorphic sequences (CAPS) [52]. Depending on the purpose of gene mapping, each type of molecular marker has its pros and cons (Table 2). However, RFLP, RAPD, SSR, and AFLP markers are most commonly used in plant species for genetic mapping [16]. In the oil palm, different molecular marker types have been used for the construction of genetic linkage maps (  [55]. In this paper and for the first time, we review the primary results of oil palm genome mapping as summarized in Table 3. This table outlines the major studies on oil palm genome mapping with their different features.

Oil Palm Genome Mapping
Genetic linkage maps reflect the actual inheritance of loci from parents to their offspring based on the patterns of recombination during meiosis. In oil palm for   [12].
In oil palm, the first genetic linkage map, based on RFLP markers from genomic libraries, was published in 1997 [14]. This map placed 97 RFLP markers (84 probes) using a selfed guineensis cross (tenera x tenera), spanned a total genetic distance of 860 cM, and comprised 24 linkage groups at a LOD score of 4 and a recombination fraction of 0.4. According to the study [14], more than 95% of the markers could be linked to at least one other marker, suggesting good genome coverage, and the shell thickness gene (Sh) was positioned at a distance of 9.8 cM on group 10. Mayes et al. [14] concluded that this map enables the mapping of genes controlling major commercial oil palm traits. Likewise, Rance [29] used 153 RFLP markers to construct a genetic linkage map from an F2 population of 84 palms derived by self-fertilization, in order to detect major genes influencing shell thickness. The result confirms that QTL mapping helps to detect genes that influence a large proportion of the total phenotypic variance in both large and small populations.
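The LOD thresholds quoted in these mapping studies can be illustrated with the textbook two-point LOD score: the log10 ratio of the likelihood of the data at a recombination fraction theta to the likelihood under free recombination (theta = 0.5). This is a minimal sketch for fully informative, phase-known meioses; it is not the procedure used in [14] or [29], and the counts in the example are hypothetical.

```python
import math

def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    recombinants: number of recombinant offspring observed (k)
    meioses: total number of informative meioses (n)
    theta: recombination fraction being tested, with 0 < theta <= 0.5
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    # log10 likelihood at theta: theta^k * (1 - theta)^(n - k)
    log_l_theta = (recombinants * math.log10(theta)
                   + (meioses - recombinants) * math.log10(1.0 - theta))
    # log10 likelihood under no linkage: 0.5^n
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null

if __name__ == "__main__":
    # Hypothetical data: 2 recombinants among 20 meioses, tested at theta = 0.1.
    print(f"LOD = {lod_score(2, 20, 0.1):.2f}")
```

A LOD score of 4 means that the data are 10^4 times more likely under linkage at the tested recombination fraction than under independent assortment.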
RAPD is another marker type used to construct genetic linkage maps in the oil palm. The first RAPD map was developed by Moretzsohn [19], who combined a pseudo-testcross mapping strategy with the RAPD assay to construct genetic linkage maps for the different fruit types (shell thickness) in the F1 progeny of a tenera (sh+ sh-) x pisifera (sh- sh-) cross.
The map used a total of 48 RAPD markers and 308 F1 progeny, and produced 12 linkage groups with a map distance ranging from 399.7 to 449.3 cM at a LOD score of 5.0. By considering the projected Elaeis total map distances and genome sizes, relationships between physical and genetic distance were established (1.06 Mbp/cM for tenera and 1.09 Mbp/cM for pisifera). Genome coverage with the two maps was limited (28.0% for tenera and 25.6% for pisifera). This result showed the value of RAPD markers for genetic linkage mapping: markers close to the sh+ locus helped to detect the gene responsible for shell thickness and were a step forward for MAS for shell thickness in the oil palm.
AFLP is another prominent marker type used to construct genetic linkage maps in the oil palm. The first AFLP-based genetic map in oil palm was developed by Billotte et al. [17] involving a cross between a thin-shelled E. guineensis (tenera) palm and a thick-shelled E. guineensis ( their findings, Seng [22] concluded that the application of a genomic map in oil palm helps to validate against a closely related population and helps to identify yield-related QTLs. Likewise, Ting et al. [30] and Ukoskit [23] also used AFLP markers to construct genetic linkage maps in oil palm. Similar studies have used SSRs for the mapping of the oil palm genome. Examples include the identification of QTLs associated with callogenesis and embryogenesis [30], QTL mapping for oil yield in African oil palm [54], a linkage map and QTL analysis for sex ratio and related traits [23], genetic map construction for two independent oil palm hybrids [55], linkage mapping and identifica- Currently, single nucleotide polymorphisms (SNPs) are the most highly preferred high-density markers in oil palm; they are used to study genetic diversity and population structure, to construct high-density genetic maps, and to provide genotypes for genome-wide association [56] and genomic selection studies [5]. probably, also in other species.

Factors Limiting Oil Palm Genome Mapping
For the last two decades, numerous findings on oil palm genome mapping have been reported (Table 3). These have remarkably improved knowledge on the genetic improvement of the oil palm using marker-assisted selection strategies. However, the success of genome mapping in the oil palm depends strongly on several factors. For instance, Mayes et al. [14] reported that the choice of mapping population is one of the major determinants, and that several criteria should be considered in selecting it, such as the simplicity of the cross for allele scoring and linkage analysis, the representation of alleles within breeding materials, and the availability of phenotypic data. The variation is very clear between Asian and African types of oil palm genetic materials. Another factor is the type of genetic marker used for mapping. In this regard, the studies in Table 3 show that marker polymorphism creates variation in the number of linkage groups obtained. For instance, the first RFLP map by Mayes et al. [14] produced a total of 24 LGs, while the first SSR map by Billotte et al. [17] produced 16 LGs. Very recently, high-density markers like SNPs have brought more light to the genetic mapping of oil palm. In addition to marker type, marker density also causes variation in genome mapping of the oil palm. For example, Mayes et al. [14] used a total of 97 RFLP markers while Rance [29] used a total of 153 RFLP markers. A total of 49 additional marker loci resulted in a change from 24 linkage groups [14] to 22 linkage groups [29], and the total map length also differed: as the number of markers increased from 97 to 153, the map length decreased from 860 cM to 852 cM. Conversely, Billotte et al. [17] used a combination of 255 SSR and 688 23% of the filled gaps were covered by this marker relative to the SSR based map.
In the same vein, Billotte et al. [17] combined 255 SSRs with other low-density markers, resulting in a total map length of 1743 cM with an average marker density of 7 cM, whereas Billotte [20] more recently used independent high-density markers, i.e. 251 SSRs, and obtained a total map length of 1479 cM with an average marker density of 6 cM. The difference is that the latter study used a single high-density marker type for mapping.
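The average marker densities quoted above are consistent with dividing the total map length by the number of SSR loci. The sketch below assumes that this simple ratio is how the figures were obtained, which the source does not state; the function name is illustrative.

```python
def average_marker_spacing(map_length_cm, n_markers):
    """Average spacing in cM per marker, as total map length / marker count."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return map_length_cm / n_markers

if __name__ == "__main__":
    # Map lengths and SSR counts as quoted in the text for [17] and [20].
    studies = [
        ("Billotte et al. [17]", 1743, 255),
        ("Billotte [20]", 1479, 251),
    ]
    for label, length_cm, n_ssr in studies:
        spacing = average_marker_spacing(length_cm, n_ssr)
        print(f"{label}: {spacing:.1f} cM per marker")
```

Rounded to whole numbers, the two ratios agree with the 7 cM and 6 cM reported. A stricter definition of average interval would subtract one marker per linkage group from the denominator, which is not done here.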
The size of the mapping population is another factor that causes variation in genome mapping of the oil palm. Singh [18] reported that the use of a high number of markers did not by itself result in a fine map, and attributed this to the small sample size of the F1 progeny. Equally, Billotte [20] reported that map detection power varies with population size, and that a large multi-parent population provides greater detection power for QTL than biparental and small populations. By the same token, Ukoskit [23] reported that differences in map length are due to variation in population size. In general, a large number of markers combined with large population sizes (pedigree populations) results in better genome mapping [57].
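The effect of population size on map resolution can be illustrated with a simple probability. Assuming that each progeny contributes one informative meiosis (a simplification), the chance of observing at least one recombinant between two markers separated by a recombination fraction r among n progeny is 1 - (1 - r)^n. In the sketch below, the population size of 84 is the F2 population size mentioned earlier in this review, while 500 is a hypothetical larger population.

```python
def prob_at_least_one_recombinant(r, n):
    """Probability of seeing one or more recombinants among n meioses."""
    if not 0 <= r <= 0.5:
        raise ValueError("r must be in [0, 0.5]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - r) ** n

if __name__ == "__main__":
    r = 0.01  # two markers roughly 1 cM apart
    for n in (84, 500):
        p = prob_at_least_one_recombinant(r, n)
        print(f"n = {n}: P(at least one recombinant) = {p:.3f}")
```

Under these assumptions, two markers about 1 cM apart would be separated by a recombinant in only somewhat more than half of populations of 84 progeny, whereas a population of 500 would almost always separate them. Adding markers cannot resolve intervals in which no recombination event has been sampled.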
Last but not least, genome mapping is also strongly governed by the software used. In the oil palm, various software programs have been used to build genetic linkage maps (  [62] and Lep-MAP3 [57]. Because the oil palm is perennial and out-crossing and has a long generation time, it is difficult to obtain enough genetic materials or mapping populations. To overcome these limitations, consensus genetic maps are obtained by integrating multiple unrelated genetic maps that share common markers. Such consensus maps can be constructed with different linkage map software (Table 3). Lep-MAP3 (LM3) is a recent linkage map construction software suite that can handle millions of markers and thousands of individuals, possibly from multiple families [57].

Conclusion
In perennial crops like oil palm, obtaining new or improved varieties through conventional breeding is difficult because it is time-consuming and costly, owing to the long generation cycles, large plant size, and long evaluation period of 10-15 years. Marker-assisted breeding techniques help to minimize these constraints. The construction of genetic linkage maps plays a major role in the genetic analysis and molecular breeding programs of oil palm. Such maps have been used to identify genetic loci for traits such as yield and its components, oil quality, and abiotic stress, resulting in better genetic improvement and more cost-effective breeding. High-throughput molecular marker and sequencing technology now helps to raise both genetic and physical maps to a new level by providing an increased sequence pool from which to build genetic maps and assemble genome sequences. Most research findings of the last two decades show that genome mapping helps in the identification of major genes that control quantitative traits such as yield and quality in oil palm. Furthermore, the literature on this crop shows that the outcome of genome mapping varies with several factors, for instance marker type, marker density, size of the mapping population, and the software used. Genome mapping plays a crucial role in this crop. To obtain better maps in the future, oil palm genome mapping should focus on high-density molecular marker types, large mapping populations, and up-to-date software, which can help to map and detect more quantitative traits related to both yield and oil quality.

Authors' Contributions
Essubalew Getachew SEYUM developed the idea and wrote the manuscript; all other authors contributed by commenting on, suggesting changes to, and rearranging the manuscript.