Molecular Markers for Genetic Diversity Studies in African Leafy Vegetables

African leafy vegetables are becoming important crops in tackling nutrition and food security in many parts of sub-Saharan Africa, since they provide important micronutrients and vitamins, and help resource-poor farm families bridge lean periods of food shortage. Genetic diversity studies are essential for crop improvement programmes as well as germplasm conservation efforts, and research on genetic diversity of these vegetables using molecular markers has been increasing over time. Diversity studies have evolved from the use of morphological and biochemical markers to molecular markers. Molecular markers provide valuable data, since they detect mostly selectively neutral variations at the DNA level. They are well established and their strengths and limitations have been described. New marker types are being developed from a combination of the strengths of the basic techniques to improve sensitivity, reproducibility, polymorphic information content, speed and cost. This review discusses the principles of some of the established molecular markers and their application to genetic diversity studies of African leafy vegetables with a main focus on the most common Solanum, Amaranthus, Cleome and Vigna species.


Introduction
African leafy vegetables include those native to Africa [1], as well as introduced vegetable crops that have been integrated into local food cultures and have become indigenized. They grow spontaneously in the wild, or are semidomesticated or domesticated in home gardens. They are important emerging crops on the continent because of their contribution to nutrition and food security: they help people bridge seasons of food shortage because they mature early, are robust and are easy to cultivate. Moreover, from our experience, they are important sources of income for resource-poor people, mainly women, in rural and peri-urban areas in sub-Saharan Africa. In the past, these crops received little research attention, but nowadays they are studied in various respects, genetic diversity being one main research focus. In order to establish efficient and sustainable breeding programmes and to decide on sound conservation strategies, knowledge of the existing genetic diversity is essential. Genetic diversity can be detected by morphological markers, biochemical constituents (e.g. secondary metabolites), and/or macromolecules (proteins and deoxyribonucleic acid, DNA). Molecular markers detect mostly selectively neutral variations at the DNA level and are an indispensable tool for genetic diversity studies with a high level of precision and reproducibility.
Morphological traits have been used for diversity analyses in African leafy vegetables such as Solanum spp., Vigna unguiculata, Amaranthus spp., and Cleome gynandra [2]-[5]. The limitations of these traits result from the plasticity of certain traits and from modifications caused by environmental conditions. Moreover, morphological markers are too few in number to cover all genome regions of a plant species and are therefore less suitable as markers. A correlation of morphological and molecular markers, however, can provide basic information to explain the genetics of phenotypic variation [6]. Other classical strategies for the evaluation of genetic diversity, such as isozymes, have also been complemented by DNA markers because of several drawbacks: the limited number of suitable loci in a given genome, the requirement for fresh tissue at a high developmental stage, and their susceptibility to environmental variation. Today, molecular markers are an indispensable tool for genetic diversity studies with a high level of precision and reproducibility.
An ideal molecular marker should have the following qualities [7]: 1) Highly polymorphic and reproducible; 2) Adequate resolution of genetic differences, e.g. high information content; 3) Simple, quick and inexpensive; 4) Needing small amounts of tissue and DNA; 5) Linkage to distinct phenotypes; and 6) Requiring no prior information about the genome of an organism. Finding all these qualities in a single marker technique is nearly impossible. Hence, the selection of a marker or a combination of markers will depend on the type of study and the species. Well-established DNA marker based techniques such as Restriction Fragment Length Polymorphisms (RFLPs), Random Amplified Polymorphic DNAs (RAPDs), Simple Sequence Repeats (SSRs), Inter-Simple Sequence Repeats (ISSRs), Amplified Fragment Length Polymorphisms (AFLPs) and Single Nucleotide Polymorphisms (SNPs) have been used in genetic diversity studies in African leafy vegetables. These markers differ in their genomic abundance, level of polymorphism, locus specificity, reproducibility, dominance or codominance, technical requirements and financial investment. In general, molecular markers are classified as either non-polymerase chain reaction (PCR) based markers, i.e. RFLPs, or the more commonly used PCR-based markers, i.e. RAPDs, AFLPs, SSRs, ISSRs and SNPs [8]. This review aims at discussing the principles of some of the established molecular markers and their application to genetic diversity studies of African leafy vegetables with a main focus on the most common Solanum, Amaranthus, Cleome and Vigna species.

Allozymes
Allozymes are variants of enzymes which differ in one or a few amino acids due to allelic differences of the encoding orthologous genes [7]. For allozyme analyses, proteins are extracted from plant tissues and separated by electrophoresis according to their net charge, conformation and size. Mutations in the DNA may result in the replacement of an amino acid, and hence in a modification of the net electric charge and the overall shape (conformation) of the protein. These protein modifications affect the migration rate of the proteins in an electric field, allowing allelic variation to be detected by gel electrophoresis and subsequent enzyme-specific stains. These stains contain the respective substrate for the enzyme, co-factors and an oxidized salt (e.g. nitro-blue tetrazolium). Thus, the allozymes become visible as bands in the gel, and their number reflects the number of loci and alleles (homozygous or heterozygous), as well as in some cases the number of subunits of the protein that can be separated [7].
The allozyme technique is simple since it does not require DNA extraction or sequence information. However, in some species considerable optimization effort has been reported for certain enzymes. Allozymes can be applied at low cost if no expensive enzyme staining reagents are required. Allozymes are codominant markers that are highly reproducible. Some limitations, however, include the restricted number of suitable allozyme loci that can be used, the requirement for large amounts of fresh tissue and sometimes limited variation due to identical electrophoretic mobility even for distantly related germplasm [9].
Allozymes combined with RAPDs were used in analyses of genetic diversity in Vigna luteola and V. marina in Nigeria [10]. Seven out of thirteen tested allozyme loci were found to be polymorphic. Mainly variation among different populations was detected, whereas the genetic diversity within both species based on the allozyme analysis was low. Using RAPDs, a higher resolution was obtained [10]. Other genetic diversity studies in the genus Vigna using allozymes include a collection of wild and cultivated cowpea (V. unguiculata) accessions from different parts of Africa [11]. The results of the study confirmed previous morphological classifications and allowed a clear separation of the different breeding systems (outcrossing and selfing) present in the species. A collection of wild cowpea populations from West Africa with a focus on Ghana, Burkina Faso and Nigeria, studied by Kouam et al. (2012) [12] using nine polymorphic allozyme loci, revealed pronounced genetic differences between populations and low levels of genetic diversity within populations due to the prevailing inbred breeding system.
In Amaranthus, Di Renzo et al. (2000) [13] used seven enzyme systems to examine the genetic diversity of commercial cultivars and experimental strains representing seven different species from an amaranth breeding programme. Based on genetic diversity indices, it was shown that 60% of the variation was interspecific while 40% was intraspecific.
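The allozyme studies above repeatedly split variation into a component among groups (populations or species) and a component within them. One common way to express such a split for codominant markers is Nei's G_ST, computed from allele frequencies. The sketch below is illustrative only: it assumes a single locus, equally weighted groups, and made-up allele frequencies, and it is not claimed to be the index used in [10]-[13].

```python
def gene_diversity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def gst(populations):
    """Nei's G_ST for one locus.

    `populations` is a list of allele-frequency lists, one per group,
    all in the same allele order. Groups are weighted equally (assumption).
    """
    n_pops = len(populations)
    n_alleles = len(populations[0])
    # H_S: mean gene diversity within groups
    h_s = sum(gene_diversity(p) for p in populations) / n_pops
    # H_T: gene diversity of the pooled (mean) allele frequencies
    mean_freqs = [sum(p[i] for p in populations) / n_pops for i in range(n_alleles)]
    h_t = gene_diversity(mean_freqs)
    if h_t == 0:
        return 0.0  # no variation at all, so nothing to partition
    return (h_t - h_s) / h_t


# Hypothetical two-allele locus scored in two populations
print(round(gst([[0.8, 0.2], [0.2, 0.8]]), 2))  # 0.36
```

A value near 1 means that almost all variation lies among the groups, as expected for strongly inbreeding populations; a value near 0 means the groups share the same allele frequencies.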

Random Amplified Polymorphic DNAs (RAPDs)
RAPDs [14] involve the use of short random oligonucleotide primers to reveal DNA polymorphisms arising from mutations at the primer sites or from length mutations between them. The key innovation in this technique was the use of an arbitrary primer that requires no prior sequence information of the genome. The RAPD primers, like all PCR primers, have to meet two criteria: a minimum GC content of 40% and the absence of a palindromic sequence (a base sequence that reads exactly the same from left to right and from right to left), as suggested by Williams et al. (1990) [14]. Polymorphisms in RAPD fragments are detected mainly as the presence or absence of bands. The simplicity and low cost of the technique are reasons for its wide application. However, the RAPD technique is highly sensitive to the reaction conditions, hence reproducibility between different laboratories is low [15].
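The two primer criteria can be checked mechanically. The following is a minimal sketch under stated assumptions: the function names and the example primer sequences are hypothetical, and the check covers both the literal left-to-right/right-to-left reading described above and the reverse-complement sense of "palindromic" that is also common in molecular biology.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def reads_same_both_ways(primer):
    """Literal palindrome: same sequence read left-to-right and right-to-left."""
    p = primer.upper()
    return p == p[::-1]


def is_reverse_complement_palindrome(primer):
    """Palindrome in the double-strand sense: equal to its reverse complement."""
    p = primer.upper()
    return p == p.translate(COMPLEMENT)[::-1]


def passes_rapd_criteria(primer, min_gc=0.40):
    """True if the primer has at least `min_gc` GC content and is not palindromic."""
    if gc_content(primer) < min_gc:
        return False
    return not (reads_same_both_ways(primer) or is_reverse_complement_palindrome(primer))


# Hypothetical 10-mer and two rejected examples
for seq in ("GGTGCGGGAA", "GGATCC", "AATTAATTAA"):
    print(seq, round(gc_content(seq), 2), passes_rapd_criteria(seq))
```

Here the first sequence passes (70% GC, not palindromic), the second fails because it equals its own reverse complement, and the third fails on GC content.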
Genetic diversity and relationships among and within six Amaranthus species from different regions of the Indo-Gangetic plains were analysed using RAPD markers [16]. Genetic diversity within the different amaranth species varied widely, with similarity coefficients ranging between 0.16 and 0.97.
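Similarity coefficients of this kind are computed from presence/absence (1/0) band profiles. As a hedged illustration, the sketch below uses the Jaccard coefficient, which is widely used for dominant markers because it ignores shared absences; the source does not state which coefficient was used in [16], and the band profiles are invented.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard similarity of two 0/1 band profiles scored at the same positions.

    Shared bands / bands present in at least one accession.
    Shared absences (0, 0) are ignored.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must be scored for the same set of bands")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 or b == 1)
    # If neither accession shows any band, similarity is undefined; return 0.0 here.
    return shared / either if either else 0.0


# Hypothetical RAPD profiles of two accessions over six band positions
accession_1 = [1, 1, 0, 1, 0, 1]
accession_2 = [1, 0, 0, 1, 1, 1]
print(jaccard_similarity(accession_1, accession_2))  # 0.6
```

A matrix of such pairwise values is the usual input for cluster analyses and dendrograms of the accessions.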
In cowpea, Nkongolo (2003) [17] characterised 38 Malawian landrace accessions using twenty RAPD primers and scored 143 DNA fragments. The study revealed that the variation among accessions within geographic regions or groups accounted for 96% of the total molecular variance and that this was due to uncontrolled gene flow among the populations. Ba et al. (2004) [18] characterised domesticated cowpea and its wild progenitors, represented by wild accessions obtained from West, East and Southern Africa, using 202 RAPD marker bands. RAPD analysis showed wild accessions from East Africa, the proposed area of origin, to be more diverse within the species V. unguiculata var. spontanea. Moreover, the primitive cultivars expressed a higher genetic diversity than the more advanced ones. Zannou et al. (2008) [19] used RAPD markers to reveal genetic diversity in 70 cowpea accessions in Benin. Although based on a rather low number of only 32 polymorphic bands, their study detected a high genetic diversity among the accessions, a confirmation of a previous more detailed study involving 120 primers by Mignouna et al. (1998) [20]. They suggested that the high variation may be an indication of different ancestry of the accessions studied. A combined approach using morphological traits, ISSR and RAPD markers by Ghalmi et al. (2010) [21] compared twenty cowpea landrace accessions in Algeria. The RAPD markers alone did not correlate with the morphological data to cluster the accessions, but when combined with the ISSR data a significant correlation was found. Malviya et al. (2012) [22] used 18 sets of RAPD primers resulting in 148 polymorphic bands to analyse the genetic diversity among 10 Indian cultivars of cowpea. The relatively narrow genetic base detected was suggested to be due to a single domestication event at the origin of this crop. The authors also concluded that seed conservation practices led to only limited exchange of germplasm throughout India and that genotypes from foreign sources were not integrated into local breeding programmes.
RAPDs and morphological markers were also used to study genetic variation and phylogenetic relationships of 12 accessions belonging to the section Solanum by Poczai et al. (2010) [5]. The 210 polymorphic RAPD markers and the morphological markers clustered the accessions into similar groups; however, the morphological differences observed in some S. scabrum var. scabrum accessions were not correlated with the respective molecular data. In spider plant (C. gynandra), K'Opondo et al. (2009) [23] studied four morphotypes (defined by the coloration of stems and petioles) collected from small-scale farmers or wild-growing plants in Kenya based on 31 polymorphic RAPD bands and could separate all four morphotypes.

Amplified Fragment Length Polymorphisms (AFLPs)
The AFLP technology [24] has received recognition as one of the most efficient marker systems currently available. It has the capacity to concurrently screen representative DNA regions distributed randomly throughout the genome. It combines the strength of RFLP with the flexibility of PCR-based technology by ligating adaptor cassettes with primer recognition sequences to the restricted DNA and selectively amplifying restriction fragments by PCR using a limited set of primers. AFLP generates fingerprints of any DNA and does not require prior knowledge of the DNA sequences. Reproducibility is also high in AFLP markers. Disadvantages include the need for pure, high molecular weight DNA, the need for dominant scoring of this marker type, and the possible non-homology of co-migrating fragments belonging to different loci [8]. In addition, due to the high number and different intensity of bands per primer combination, there is the need to adopt certain strict but subjectively determined criteria for acceptance of bands in the analysis. Detection of the AFLP fragments is done on denaturing polyacrylamide gels, mostly using automated sequencers or automatic capillary sequencers, but silver-stained gels can also be evaluated visually.
AFLPs have been used to study genetic diversity among several species of African leafy vegetables. In Solanum spp., Jacoby et al. (2003) [25] compared the genetic relationships of 14 genotypes, mostly of the S. nigrum complex, established by morphological markers to those obtained in AFLP analyses including 222 polymorphic markers. Both methods separated the different genotypes and clustered them into similar groups. Dehmer and Hammer (2004) [26] employed two AFLP primer combinations to characterise the genetic diversity present in a collection of 44 accessions of the S. nigrum L. complex in the Gatersleben genebank in Germany. They were able to classify taxonomically unknown material, and to correlate the clustering of the examined accessions with their geographic origin. Moreover, different levels of genetic diversity were detected in the four identified groups, with higher infraspecific variation within S. americanum than interspecific variation of the remaining species. In order to gain new insight into the taxonomically difficult S. nigrum complex, Olet et al. (2011) [27] assessed the genetic relationships among 107 accessions (90 collected in Uganda and 17 from a genebank) representing eight Solanum species using AFLPs. Although out of 510 AFLP bands only 31 were polymorphic, Olet et al. (2011) [27] were able to conclude that the accessions analysed represented only five species and that most species must have been introduced to Uganda.
Cowpea genetic diversity has also been examined in several studies using AFLPs. Coulibaly et al. (2002) [28] evaluated genetic relationships within a total of 117 cowpea accessions, including both domesticated and wild forms from eastern and western Africa. They found AFLPs to be superior to allozymes that had been applied in earlier studies. Higher genetic diversity was observed in wild than in cultivated accessions, and the higher diversity of wild materials from eastern Africa supported the idea that the species originated in eastern Africa. Moreover, the AFLP data were in line with a proposed unique domestication event in northern Africa [28]. With three AFLP primer combinations yielding 253 bands in total, the different subspecies of V. unguiculata could not be fully resolved due to pronounced intra-accession variability [29]. Fang et al. (2007) [30] also examined genetic relationships among 60 advanced breeding lines of cowpea from different breeding programmes and 27 landrace accessions of different origins by scoring 382 AFLP bands in total (207 polymorphic). Despite the diverse origin of the materials analysed, very high genetic similarity was observed. Since the accessions clearly clustered according to the breeding programmes, the authors recommended an exchange of materials, especially with West African breeding programmes.
Dendrograms based on AFLP (and ISSR) analyses of 30 accessions and cultivars of the Amaranthus species revealed a clear assignment of all taxa to different cluster groups, which was not possible by sequence comparisons due to low sequence variation in the ITS (internal transcribed spacer) region [31].

Simple Sequence Repeats (SSRs)
SSR markers, also known as microsatellites or short tandem repeats, are tandem repetitions of very short motifs (1 to 5 nucleotides) occurring as interspersed repetitive elements in all eukaryotic genomes [32]. Schlotterer and Tautz (1992) [33] suggested strand slippage during DNA replication to cause the variation in the number of tandem repeat units, since the repeats allow matching via excision or addition of repeats. SSRs utilize either unlabelled primer pairs or one radioactively or fluorescently labelled primer. Unlabelled SSR-PCR products are analysed using polyacrylamide or sometimes agarose gels. Detection by the lasers of automated sequencers has enabled an efficient and high-throughput application of fluorescently labelled microsatellite primers [34], but on the other hand this has made the SSR technique relatively costly compared to other markers [6]. SSR primer development and testing is also time-consuming, especially for species in which primers have to be designed anew or in which existing primers prove inadequate, as in previously unstudied groups. These markers, however, have a number of advantages such as codominance of alleles, high reproducibility, a low amount of required template DNA that does not need to be of high quality, a high genomic abundance in eukaryotes and a random distribution throughout the genome [6].
In recent times, SSRs have been used for most of the genetic diversity studies in African leafy vegetables. Van Biljon et al. (2010) [35] studied accessions of the S. nigrum complex and their progenies using SSR primers that had partly been transferred from other economically important Solanum species. They detected a close relationship among the accessions, which confirmed that the S. nigrum complex is taxonomically difficult to resolve.
Using only five of 27 polymorphic SSR markers, Li et al. (2001) [36] distinguished 88 of 90 cowpea breeding lines. For the complete set of 27 primers, between two and seven alleles were detected per primer, but the primers differed strongly in their PIC (polymorphic information content), which ranged between 0.02 and 0.73 (average: 0.45) [36]. Using the same primer set, Diouf and Hilu (2005) [37] found SSR markers to be a more powerful tool than RAPDs to elucidate the relationships between cowpea local cultivars and breeding lines from Senegal. Later, Asare et al. (2010) [38] designed primers from sequence reads for studying the diversity of 141 cowpea accessions from nine geographical locations in Ghana. They found a PIC range of 0.07 to 0.66 with an average of 0.38, and an average heterozygosity of 0.19. The accessions clustered into five main clades, but only a loose correlation with the geographical origin was observed. Phylogenetic relationships and genetic diversity among 16 cowpea genotypes that were used in breeding programmes for resistance to Striga gesnerioides in Burkina Faso were investigated using 16 SSR primer combinations [39]. The Striga-resistant cultivars were found to be very similar in their SSR profiles, and only very few primer combinations revealed polymorphisms that could discriminate resistant from susceptible cultivars. SSRs were also used to assess the genetic diversity among 22 local cowpea cultivars and inbred lines collected in Senegal [40]. Forty-four polymorphic primer combinations deduced from expressed sequence tags showed a lower PIC range of 0.08 to 0.33 compared to the earlier studies, and the cultivars clustered in groups which were characterised by certain morphological traits.
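The PIC and heterozygosity values reported for these SSR studies are functions of the allele frequencies at each locus. The sketch below shows one standard way to compute them: PIC in the formulation of Botstein et al. (1980) and expected heterozygosity as 1 minus the sum of squared allele frequencies. It is an illustration with invented allele counts; the cited studies may have used a different formulation (some report the simpler 1 - Σp² under the name PIC).

```python
from itertools import combinations


def allele_frequencies(allele_counts):
    """Convert allele counts at one locus into frequencies."""
    total = sum(allele_counts)
    return [count / total for count in allele_counts]


def expected_heterozygosity(freqs):
    """Gene diversity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content (Botstein et al. 1980 formulation):
    1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2.
    """
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return expected_heterozygosity(freqs) - cross_term


# Hypothetical SSR locus with three alleles counted 50, 30 and 20 times
freqs = allele_frequencies([50, 30, 20])
print(round(expected_heterozygosity(freqs), 4))  # 0.62
print(round(pic(freqs), 4))                      # 0.5478
```

PIC is always at most equal to the expected heterozygosity and approaches it as the number of alleles grows, which is why loci with only two alleles cannot exceed a PIC of 0.375.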

Inter-Simple Sequence Repeats (ISSRs)
ISSRs are DNA fragments of about 100 to 3000 bp in length which are located between flanking microsatellite regions. The ISSR technique [41] involves amplification of DNA segments present between two identical microsatellite regions that are oppositely oriented to each other. With single microsatellite primers, ISSRs of different sizes are amplified in a PCR targeting multiple genomic loci. Thus, fragments of several loci are generated at once, separated by gel electrophoresis and scored for presence or absence. ISSRs are mostly dominant markers, though occasionally a few of them exhibit codominant inheritance. ISSRs are similar to AFLPs and RAPDs in that they do not require sequence data for primer construction [42]. They also require low quantities of template DNA for PCR and are randomly distributed throughout the genome. In addition, they are simple and quick, and the use of radioactivity is not necessary. Despite these advantages, ISSRs can have problems in reproducibility, and their dominant inheritance and the uncertain homology of co-migrating amplification products result in problems similar to those of RAPDs. According to Kojima et al. (1998) [43], ISSRs show high levels of polymorphism, but this depends on the method of detection used. Polyacrylamide gel electrophoresis (PAGE) in combination with radioactively labelled primers was shown to be most sensitive, followed by PAGE with AgNO3 staining and then agarose gels with detection by ethidium bromide staining.
ISSRs in combination with RAPDs and SSRs were applied to assess the genetic diversity in 31 Amaranthus accessions [44]. All markers separated the accessions, but resulted in different cluster structures, making further analyses necessary. Another study [45] used ISSRs in diversity analyses of 56 Amaranthus accessions belonging to three species and separated the Amaranthus species based on 11 ISSR primers with only a few exceptions.
In Vigna, Ajibade et al. (2000) [46] applied ISSRs to investigate 62 taxa within the genus with a focus on the species V. unguiculata. Of the 19 primers tested, 15 were very effective and resulted in 63 DNA fragments. It was possible to distinguish the taxa at the species level and below; however, ISSRs were not suitable to clearly differentiate subgenera within Vigna. Ghalmi et al. (2010) [21] characterised 20 landrace accessions of cowpea in Algeria using 12 ISSR primers and found a correlation of the genetic data with the geographical distribution. Vila-Nova et al. (2014) [47] studied the genetic variability and weevil resistance among 27 cowpea cultivars. Ten ISSR primers uncovered a large genetic variability and sufficient polymorphisms to discriminate all 27 cowpea cultivars. However, the genetic variability could not be related to the resistance of the cultivars tested.
Poczai and Hyvönen (2011) [48] applied ISSR markers combined with start codon targeted polymorphisms (SCoT) and chloroplast sequence data as well as morphological traits to analyse the genetic relationship among diploid, tetraploid and hexaploid Solanum species of the S. nigrum complex. They found that all the accessions of the diploid species shared a cluster with all the polyploid species, and concluded that the polyploid species originated from a few combinations of genetically differentiated diploids.

Single Nucleotide Polymorphisms (SNPs)
SNPs are single base pair positions in the genome of two or more individuals at which different sequence alternatives (alleles) occur in a population. SNPs are generally abundant, but their density differs considerably between different regions of a genome, between genotypes of a species, and even more so between species. For instance, Sachidanandam et al. (2001) [49] reported that the average density of SNPs in the human genome was estimated at about 1 in 1000 bp, but it is considerably higher in some genomic areas such as the noncoding human leukocyte antigen (HLA) regions [50]. In the plant species analysed so far, approximately one SNP was usually present per 200 to 500 bp, with large differences between different plant species. For instance, maize has 1 SNP per 60 to 120 bp [51]. The SNP technique combines two elements, namely the generation of allele-specific products and the analysis of the products. Detection methods for SNPs are either direct hybridization techniques or those involving the generation and separation of allele-specific products [52]. The high cost of developing SNP markers by comparative sequencing of a large number of genotypes is a limitation for the use of SNPs in plants of limited economic importance. SNPs stand out because of their total number per genome, their comparatively low mutation rates, their distribution across the genome and their relative ease of detection. Their applicability in high-throughput genotyping methods such as DNA chips makes SNPs attractive as genetic markers. Automation with SNPs is also possible and is used, for example, for the identification of genotypes and the construction of high-density genetic maps [53]. Only recently did the first application of SNPs in genetic diversity studies in African leafy vegetables appear, aiming at genotyping a worldwide collection of landraces and African ancestral wild cowpeas using 1200 SNPs [54]. In contrast to earlier statements, this study revealed the presence of two major gene pools (eastern and western Africa) and divergent domestication events in cultivated cowpeas in Africa. Further reduction in the cost of next generation sequencing will allow low-cost SNP detection and GBS (genotyping by sequencing) or related techniques to be used even in crops of minor economic importance in the near future.

Conclusion
The 33 studies (published between 1998 and 2015) examined in this review indicate that RAPDs have been used most frequently (25% of the studies), while SNPs were applied in only 3%, allozymes in 12%, AFLPs in 22%, and SSRs and ISSRs in 19% each. RAPDs have been used particularly in the investigation of genetic relationships among accessions of a single species from different geographic areas and of phylogenetic relationships among species. Allozymes, AFLPs, SSRs and ISSRs have been used at the interspecific and intraspecific levels. Most of the studies covered Vigna (58%), Solanum (24%), and Amaranthus (15%), whereas C. gynandra has received little research attention (only 3% of all studies) as far as genetic diversity studies are concerned. Molecular tools will become increasingly important for genetic studies of these and other African leafy vegetables, addressing questions regarding the evolutionary origin, centers of diversity, domestication, genetic structure of populations, characterization of germplasm and the establishment of markers for important agronomic traits. The marker techniques used for this will change with further advances in technical development. If the PCR-based markers are compared, RAPDs have been used frequently because they are cheap and need neither sequence information nor high-quality DNA. However, AFLPs and SSRs, and possibly SNPs, are recommended for future studies due to their better reproducibility and higher information content. The results will considerably support germplasm collection and maintenance strategies and enable the development of improved breeding methods.