Molecular Footprint of Kenya’s Gene Bank Repositories Based on the cp-Genome Signatures

While the mutational processes that subsume biological diversity can be revealed in great detail through phylogenetic inferencing using plastid markers, few studies document their use. Accurate phylogenic inference can provide a framework for addressing a host of important evolutionary questions including a context to reconstruct molecular evolution of an organism. Despite the obvious utility of plastid markers in illuminating biological enquiry, many important questions still abound. The use of cp-DNA gene sequence data for phylogenetic inference can have an enormous impact on plant phylogenetics and systematics. The repertoire of genetic diversity of Kenya’s Gene Bank repositories can be explored based on cp-genome signatures. This is because cp-DNA-based mutational changes are an important additional tool to the previous evidence available on plant evolution yet to be explored in biodiversity studies in Kenya. Taken together, these evolutionary changes can inspire development of realistic algorithms for phylogenetic inferencing based on molecular data. Phylogenetic reconstructions are at the very core of molecular evolution. Comparative sequence analyses of plastid markers can have utility beyond the study of phylogeny. The pattern of nucleotide substitution observed over evolutionary time can reflect functional constraints imposed due to natural selection. In line with this, it is possible to detect subtle anatomical variations associated with small fitness effects that can account for genetic diversity at varietal level. The lack of sequence information in Kenyan cowpea has limited the robust advancement of molecular markers use in dissecting diversity based on the putative plastid markers [1]. The present study sought to generate and upscale novel technologies such as genomics, DNA barcoding and bio-informatics in understanding molecular diversity of cowpea accesHow to cite this paper: Patrick, O., John, M., Dennis, O., Mustafa, B. and Paul, A. (2018) Molecular Footprint of Kenya’s Gene Bank Repositories Based on the cp-Genome Signatures. American Journal of Molecular Biology, 8, 215-244. https://doi.org/10.4236/ajmb.2018.84019 Received: June 14, 2018 Accepted: October 27, 2018 Published: October 30, 2018 Copyright © 2018 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/ Open Access

important questions still abound.The use of cp-DNA gene sequence data for phylogenetic inference can have an enormous impact on plant phylogenetics and systematics.The repertoire of genetic diversity of Kenya's Gene Bank repositories can be explored based on cp-genome signatures.This is because cp-DNA-based mutational changes are an important additional tool to the previous evidence available on plant evolution yet to be explored in biodiversity studies in Kenya.Taken together, these evolutionary changes can inspire development of realistic algorithms for phylogenetic inferencing based on molecular data.Phylogenetic reconstructions are at the very core of molecular evolution.Comparative sequence analyses of plastid markers can have utility beyond the study of phylogeny.The pattern of nucleotide substitution observed over evolutionary time can reflect functional constraints imposed due to natural selection.In line with this, it is possible to detect subtle anatomical variations associated with small fitness effects that can account for genetic diversity at varietal level.The lack of sequence information in Kenyan cowpea has limited the robust advancement of molecular markers use in dissecting diversity based on the putative plastid markers [1].The present study sought to generate and upscale novel technologies such as genomics, DNA barcoding and bio-informatics in understanding molecular diversity of cowpea acces-

Introduction
Occurrence of cp-DNA microsatellites in the chloroplast genome has been widely utilized for delineating and reconstruction of phylogenetic relationships, taxonomic studies and the identification of maternal patterns in polyploids.The National Gene Bank of Kenya holds a repository of cowpea accessions from diverse phylogeographic backgrounds [1].They carry a repertoire of genetic diversity, not adequately characterized.However, this immense potential needs to be unlocked using a suite of novel technologies, such as genomics.Vascular plants have a relatively slow rate of molecular evolution, and their frequent exposure to hybridization and introgression, often makes it difficult to discriminate them.
Previous studies have examined these constraints in narrow geographic or taxonomic contexts, but the present investigation expanded analyses to consider the performance of seven plastid markers in molecular and phylogenetic clustering.
With many genomic tools and resources becoming increasingly available, a more detailed and in-depth genome characterization of cowpea is crucial for their genetic improvement.The current genetic profile of cowpea displays an inadequate level of characterization.High variance in taxonomic scope, biogeographic focus, the number of DNA barcode markers employed and the methodologies used for making taxonomic assignments makes comparisons among past studies American Journal of Molecular Biology difficult.In fact, no prior study has involved a large-scale comparative analysis of the capacity of the seven standard barcode markers to deliver a varietal level identification for different biogeographic gene pools of cowpea using standard barcode library.

The cp-Genome
The cp-DNA architecture of plants is currently a focus of research in plant molecular evolution, phylogenetics and systematics.Several unique genome profile of cp-DNA confers plants excellent molecular evolutionary analyses.Foremost is the fact that, the cp-DNA genome is relatively small besides constituting an abundant component of cellular DNA.Further, it is not in doubt that the cp-DNA genome has been extensively characterized at the molecular level which has had the resultant effect of providing the basic information that support comparative evolutionary research and analyses.More importantly, the cp-DNA has relatively slow rates of nucleotide substitution which has the advantage of providing the requisite window of resolution for studying plant phylogeny and systematics at deep levels of molecular evolution.Despite a fairly conservative rate of genetic evolution and relatively stable gene content, comparative molecular analyses reveal complex patterns of mutational change.Noncoding regions of cp-DNA diverge through insertion/deletion changes that are site dependent.Rates of molecular change are often reported to vary among plant families and in a manner that violates the assumption of a simple molecular clock.Protein-coding genes exhibit patterns that are known to reveal subtle anatomical and functional relationships.Comparative studies of molecular sequences have the resolution to reveal this underlying complexity.To fully understand the mechanisms of evolutionary change and in formulating realistic models of mutational processes, a complete description of the complexity of molecular signature change is necessary.The conservation of cp-DNA gene content and a relatively slow rate of nucleotide substitution in coding genes have made the cp-genome an ideal focus for studies of plant evolutionary history [2].The cp-DNA gene loci can enable the reconstruction of plant evolutionary history at a level of detail that is unprecedented in molecular systematics (Figure 1).Early work on cp-DNA sequences suggested that relative rates of nucleotide substitution do not follow a constant molecular clock but rather that substitution rates in chloroplast loci vary among evolutionary lineages.
Several DNA fingerprinting and genotyping assays based on molecular markers have been developed in the past and are still in use today [3].Phylogenetic inference of a data set composed of matK and rbcL, sequences from basal angiosperms demonstrated parsimony informative traits and significantly more phylogenetic structure on average per parsimony informative site than the highly conserved chloroplast gene rbcL [4].In the same study, sequence information from matK alone generated phylogenies as robust as those constructed from data sets comprised of 2 -11 other genes combined [5].The molecular information generated from matK locus has been used to resolve many phylogenetic relationships from shallow to deep taxonomic levels among species [6] [7].This locus distinguishes itself among plastid genes used in plant systematics due to its distinct mode and tempo of evolution.The matK rate of substitution is three times higher at the nucleotide level and is reported to be six times higher at the amino acid level than that of rbcL, denoting it as a rapidly evolving gene locus [8] [9].The alignable and coding nature can facilitate character based analyses and allow inference of DNA barcode based diversity among phylogeographically diverse groups.

The trnH-psbA Intergenic Spacer
The trnH-psbA region is a straightforward region easily amplifiable across land plants, and is one of the most variable intergenic spacers [10].It has been used successfully in a range of barcoding studies.[11] reports that the trnH-psbA, non-coding intergenic region exhibits significant sequence divergence with notable insertion/deletion rates.Studies by [12] indicate that this plastid region has highly conserved coding sequences that makes it an attractive marker.These attributes make trnH-psbA an important plant barcode for species discrimination.However, the complex molecular evolution and considerable length variation of trnH-psbA limits it as a barcode singly [13].However, trnH-psbA is reported to suffer high rates of insertion or deletion in larger families of angiosperms.The trnH-psbA putative gene loci albeit a standard barcode region in most plants has been reported to suffer frequent inversions in some lineages of plants and singly as a barcode marker, may result to over estimation of genetic divergence and consequently inaccurate assignment of phylogenetic position.
The trnH-psbA is the most widely used plastid marker with highly conserved coding sequences on both sides making the design of universal primers feasible.
The non-coding intergenic region is reported to exhibit most sequence divergence with high rates of insertion/deletion [14].These attributes make trnH-psbA highly suitable as a plant barcode for species discrimination.Alignment of the trnH-psbA spacer can be highly ambiguous because of its complex molecular evolution and considerable length variation [15], high rates of insertion or deletion in larger families of angiosperms [16].

The matK Gene Locus
The matK gene has been used in the reconstruction of grass phylogenies and to reveal polymorphisms.This gene sequence is one of the seven DNA candidate loci widely recommended for the DNA barcoding of plants [17].Chloroplast genes and intergenic spacers have been amplified and amplicons used to reveal polymorphisms following direct sequencing [18].The matK gene loci is the only putative group II intron maturase encoded in the chloroplast genome of plants and is the only plastid gene containing this putative maturase domain in higher plants [19].The matK locus is maturase-kinase gene region, a plastid gene which is responsible for the chloroplast post-transcriptional processing in plants.It has an unusual evolutionary tempo, with relatively high substitution rates at both nucleotide and amino acid levels according to [20].The strong phylogenetic signal from matK gene renders it invaluable gene loci in plant systematics and evolutionary studies at various evolutionary depths.This gene locus has 1500 base pair nested in the group II intron of the 50 and 30 exons of trnK in the large single copy region of the chloroplast genome of green plants.The matK gene sequence is one of the seven putative gene loci widely utilized in the DNA barcoding of land plants.Phylogenetic  This study therefore considered matK alongside six barcode markers.

The rbcL Locus
The cp-gene rbcL-RuBisCo large subunit is an ideal candidate barcode region and is considered the most abundant protein on earth, RuBisCo (Ribulose-

The atpF-atpH Intergenic Spacer
The second International Barcode of Life Conference proposes that atpF-atpH intergenic spacer is a potential plant barcode region.The fact that atpF-atpH marker has not been widely used in plant systematic and phylogeographics has led to paucity of data on its performance as barcode loci.However, the CBoL Plant Working Group indicate that atpF-atpH has relatively modest discriminatory power, intermediate sequence quality and universality and could be used as a plant DNA barcode.Recent studies document positive reports on the performance of atpF-atpH as a plant barcode region [25].Studies on duckweeds [26] also demonstrated that atpF-atpH; a noncoding spacer could serve as a universal DNA barcoding marker for species-level identification.In the study on duckweeds, the utility of this non cording region in identification of new species by reason of its ease of amplification, straightforward sequence alignment and rates of DNA variation was reported [27].In the same study, it's documented that studies.The current study therefore sought to among others test the in formativeness of atpF-atpH loci in delineating cowpea diversity at sub-species level.

psbK-psbI Intergenic Spacer
The psbK and psbI locus encodes two low molecular weight polypeptides, K and I respectively, for the photosystem II and are conserved from algae to land plants [28].The potential of the psbK-psbI intergenic spacer as a barcode for plants was tested in the flora of the Kruger National Park, South Africa.The results revealed its high PCR amplification and sequencing performances (98% of taxa) and ease in the alignment of sequences.Accordingly, psbK-psbI was proposed in conjunction with markers such as matK, trnH-psbA and atpF-atpH as appropriate for plant barcoding.The CBOL-Plant Working Group also observed that the species discriminatory power of this locus was better than that of matK and other loci, except trnH-psbA.However, due to the inconsistency in getting bidirectional unambiguous sequences, this has only been considered as a supplementary locus.

The rpoB and rpoc1 Gene Loci
The loci rpoB and rpoc1 are reported to encode three out of the four subunits of the cp-DNA.Genome wide substitution analyses in previous studies reveal that rpoB, rpoC1 and rpoC2 are accumulating higher amount of nonsynonymous substitutions, which indicates either positive or relaxed selection.Their high substitution rate makes these genes highly suitable for phylogenetic inferencing.
The locus rpoC has been reported as suitable for phylogenetic analysis.Currently, rpoB has been considered as the core gene for phylogenetic analyses and identification, especially when studying closely related isolates alongside 16S rRNA gene, the rpoB locus helps delineate new and refine bacterial flora [29].These loci rpoA, rpoB, rpoC1 and rpoC2 collectively, are considered ideal for phylogenetic studies.After extensive studies, these genes have been proposed for barcoding either individually or in combination by various research teams.The CBOL-Plant Working Group, on the other hand, observed that the species discrimination of rpoC1 was the least (43%) among the seven loci tested.Nonetheless, in recent years, rpoC1 has been found highly useful for barcoding the bryophytes [30].In a nutshell, further research on these gene sequences is necessary in determining their suitability as an ideal barcode.

Molecular Phylogeny
Similarity of Biological functions and molecular mechanisms in living organisms often suggests an ancestral lineage of an organism.Molecular phylogeny explores the structure and function of molecules and how they change over time to help infer evolutionary relationships.Automated DNA based approaches can accelerate cataloguing of organisms.The primary objective of molecular phylo-geny studies is reconstruction of the order of evolutionary events and proceed to represent such events in evolutionary trees which then graphically depict relationships among species or genes over time.Phylogenies are a fundamental tool for organizing knowledge of Biological diversity, for structuring classifications and for providing insight into events during evolutionary process.Furthermore, phylogenetic trees show descent from a common ancestor, thereby providing overwhelming evidence supporting the theory of evolution.
Molecular phylogenies can be generated from character datasets that provides evolutionary content and context of an organism.Bio molecular sequence alignments can also reveal character data of DNA.Molecular markers such as Single Nucleotide Polymorphisms (SNPs) can also inform genetic diversity.Evolution is modeled as a process that changes the state of a character, such as the type of nucleotide (AGTC) at a specific locus in a DNA sequence in which each character is a function that can map a set of taxa to distinct states [31] [32].The idea dates back to Darwin, but the numerical calculation of trees using quantitative methods is relatively recent [33], and their application to molecular data even more so [34].In the age of rapid and rampant gene sequencing, molecular phylogeny has come into its own, emerging as a major tool in Biodiversity studies.Genome-wide data can provide novel opportunities for resolving phylogenetic relationships.Multiple sequence alignment as one of the many heuristic methods for aligning sequences exist and improved algorithms continue to emerge.
Phylogenetic inference has recently gained popularity among molecular taxonomists and is used to describe relationships between paralogues within a gene family [35], histories of populations [36], the evolutionary and epidemiological dynamics of pathogens [37], the genealogical relationship of somatic cells during differentiation and cancer development [38] and the evolution of language [39].In recent years, molecular phylogeny has become an indispensable tool for genome comparisons notably in classification of metagenomics [40]; to identify genes, regulatory elements and non-coding RNAs in newly sequenced genomes [41]; to interpretation of modern and ancient individual genomes [42] and reconstruction of ancestral genomes.Currently, phylogenetic inference using sequenced data has received great attention by molecular Biologists.The development of the coalescent theory [43] and the widespread availability of gene sequences for multiple individuals from the same species have prompted the development of genealogy-based inference methods.These methods have revolutionized Computational Biology.Knowledge about phylogenetic inference based on a suite of current methodologies for phylogenetic inference.
While molecular phylogeny is large and complex [44], it is nonetheless an important tool in taxonomy and systematics.Phylogeny reconstruction methods are either distance or character-based.In distance matrices, the distance between every pair of sequences is calculated, and the resulting matrix is used for tree reconstruction.Algorithms such as NJ [45] can apply a cluster algorithm to the distance matrix to resolve a phylogeny.Character-based matrices such as maxi-mum parsimony, maximum likelihood is key in phylogenetic inference.In the current study, we explore a suite of current methodologies for phylogenetic inference based on sequence data of closely related cowpea variants.Genetic markers found on specific locations on a chromosome are considered landmarks for genome analyses [46].However, molecular markers reveal greater polymorphisms at the protein or at DNA level as opposed to morphological traits.

DNA Amplification
Seven polymorphic cp-DNA genome signatures were used to screen and amplify three intergenic spacers' atpF-atpH, psbK-psbL and trnH-psbA and four genes: matK, rbcL, rpoB, rpoC1.PCR products of seven primer pairs with different dyes coloaded together in 96-well working plate vortexed and spined then an aliquot utilised.PCR products were resolved on a polyacrylamide gel (1%), using 0.5× TBE containing 1 mg/mL ethidium bromide with a vertical electrophoresis apparatus at 300 v. PCR amplification in a 0.2-mL PCR tube with a reaction volume of 25 μL, containing 2.5 μL 10× PCR buffer, 1 μM of each primer, 1 mM of each dNTPs, 0.5 U TaqDNA polymerase and 50 ng DNA.Tubes placed in an Eppendorf Master Cycler Gradient thermocycler programmed for initial denaturation at 94˚C for 1 min followed by 35 cycles of 30 s at 94˚C, 30 s at 55˚C, 1 min at 72˚C, and a final extension of 10 min at 72˚C.The PCR conditions were as follows: initial denaturation at 94.0˚C for 15 minutes; 30 cycles of denaturation at 95.0˚C for 1.0 minute, annealing temp of 50˚C for 40.0 seconds, extension at 72.0˚C for 1.0 min 30 sec and final extension at 72.0˚C for 5.0 minutes.The 30 µL total reaction mixture for the amplification of either the seven candidate cp-DNA contained 1.0 mL × PCR buffer, 1.67 mM MgCl 2 , 0.3 mM dNTPs, 0.3 µM of each primer and 2.5 U TaqDNA together with 2.0 µL of DNA template.The PCR conditions were as follows: initial denaturation at 95.0˚C for 5 minutes; 35 cycles of denaturation at 95.0˚C for 1.0 minute, annealing temp at 56.0˚C for 30 seconds, extension at 72.0˚C for 1.0 min 15 sec; and final extension at 72.0˚C for 7 minutes.

Gel Electrophoresis
The amplicons were resolved on 1% agarose gel at 80 V for 48 minutes.The gels were observed for bands based on a UV trans-illuminator (FotoDyne model 3 3500 Foto-Prep).Photographs of the bands were taken using the software "Strata-gene Eagle View" that was integrated with the digital camera on the UV trans-illuminator.The gel was rinsed with distilled water and air dried.An aliquot of the PCR product was checked by agarose gel electrophoresis.The resolved products were extracted from the gel and purified using the Qiagen DNA purification kit according to the prescribed protocol.Table 1.1).The amplicon sizes are shown.PCR reaction conditions followed guidelines from the CBOL plant-working group.

Sequence Alignment
the highest Identity score was considered to be a similar sequence.BLAST used our sample sequence as the query sequence to search NCBI databases, to look for similar sequences, and then used a similarity matrix to measure the similarity between sequences and the possibility that the similarity could be due to chance based on the nucleotide sequence of the query versus its target.The higher the bit score, the more closely related that sequence was to the query sequence.The E value, on the far right, was the number of search matches to the current non-redundant sequence database expected by chance alone.The smaller the E value of the BLAST hit, the more likely that the similarity reflected a common descent and not that it occurred by chance.Sequences with an E value ≤ 10 −5 were considered homologues.For sequence annotation and gene ontology, the contigs were analysed to predict Open Reading Frame (ORF).The NCBI taxonomy tool (https://www.ncbi.nih.gov/taxonomy) was used to infer the complete classification of the sequence which was confirmed in the International Plant Names Index website.The sequences were then named based on their molecular character.The sequences were then assembled into a single Fasta file format in BioEdit software version 7 (Thomas Hall & Abbott).This was followed by performing a local alignment using ClustalW in Mega 6 software with UPGMA as the clustering method.Consensus sequences were generated and sequences of the candidate DNA barcodes aligned using ClustalW and verified by BioEdit.Sequence alignment was initially performed using ClustalW and manually adjusted using MEGA v.7.0.Phylogenetic analyses were performed using maximum-parsimony (MP) approaches.Maximum-parsimony (MP) analyses involved a heuristic search strategy based on 1000 replicates of random addition of sequences.Genetic distance matrices were calculated on the basis of Kimura 2-Parameter (K2P) substitution model for the seven chloroplast candidate DNA loci and the average values between subpopulations inferred.The distance matrices were inferred based on the Kimura 2 parameter substitution model for the seven chloroplast candidate DNA loci and the average values between subpopulations inferred.

Evolutionary Relationship
Analyses of DNA barcode sequences for cluster recognition provides an efficient approach for recognizing putative operational taxonomic units (OTUs).Sequence reads were edited by ChromasLite to collapse redundancies and unknown bases and to generate consensus sequences from sequence fragments with subsequent nucleotide alignments of partial cp-DNA loci and intergenic spacers to generate a consensus sequence using the multiple MAFFT online alignment.BLAST search on the Gene Bank database was performed to decipher similarities.The evolutionary history was inferred using the Neighbor-Joining algorithm [47].The bootstrap consensus tree inferred from the 1000 replicates was considered to represent the evolutionary history of the sub-populations.Branches corresponding to partitions reproduced in less than 50% bootstrap replicates were collapsed [48].The percentage of replicate trees in which the subpopulations clustered in the bootstrap test (1000 replicates) is indicated above the branches [49].The evolutionary distances were computed using the Jukes-Cantor method [50] and are in the units of the number of base substitutions per site.The analysis involved 54 nucleotide sequences.Codon positions included were 1st + 2nd + 3rd + Noncoding.All positions containing gaps and missing data were collapsed.In estimating evolutionary divergence between sequences, the numbers of base substitutions per site are indicated.In order to confirm identities and show evolutionary relationships between operational taxonomic units (OTUs), the sequences were independently subjected to phylogenetic analyses.Initial alignment was done by BioEdit v. 7.0.and the rest of the analyses done in MEGA 6 software.Study sequences were subjected to estimation of the evolutionary divergence based on pairwise-distance tool in MEGA 6.The sequences were arranged in order of the plastid genes targeted in the study and were aligned, cut to size and concatenated using MAFFT v. 6.857b software.The concatenated sequences were transferred to MEGA 7.0 software and a combined phylogenetic tree constructed.To calculate the overall mean evolutionary divergence.Tajima's D test was conducted in MEGA to detect the nature of gene selection, nucleotide variation per site and nucleotide divergence per sequence.

Molecular Phylogenetic Analyses
MEGA 6 software was applied in phylogenetic analyses.Subsequent phylogenetic analyses and concatenation revealed between and within cowpea variants degree of diversity and similarity.Each of the raw sequence files were inspected and corrected prior to phylogenetic analysis using BioEdit software version 7.0.These sequences were compared to those in Gene Bank (National Centre for Biotechnology Information; http://www.ncbi.nlm.nih.gov/) using BLAST to verify identity and followed by phylogenetic data analyses.Consensus sequences from the forward and reverse primer pairs were generated in BioEdit ver.7.0 [51] and subsequent nucleotide alignment generated using CLUSTAL W (Thompson et al., 1994) and implemented in BioEdit ver. 7. The alignment file was subsequently loaded in MEGA 7 to infer evolutionary history using the algorithm of Neighbor-Joining method [52].The bootstrap consensus tree was inferred based on 1000 replicates which were taken to represent the evolutionary divergence of the analyzed accessions.Branches corresponding to partitions reproduced in less than 50% bootstrap replicates were collapsed and not included in the final analyses.The evolutionary distances were computed based on the Jukes-Cantor method and were in the units of the number of base substitutions per site.The analyses involved various nucleotide sequences depending on each putative DNA loci/intergenic spacer.Codon positions included were 1st, 2nd, 3rd and noncoding.All positions containing gaps and missing data were collapsed.DNA sequences were visualized and manually edited using BioEdit software to minimize sequencing errors and removal of gaps in the coding regions that could cause shifts in the open reading frames.The BLASTn algorithm (http://www.ncbi.nlm.nih.gov/BLAST) was used to perform sequence similarity searches against the non-redundant nucleotide databases of NCBI.The correspondence between the sequences of the PCR amplicons and the known sequences was inferred.Multiple sequence alignments were performed by BioEdit and the intra-and infraspecific genetic divergences were calculated using MEGA 6 beta software [53] according to the Kimura 2-parameter distance model [53].
Based on the pairwise nucleotide sequence divergences, the neighbor-joining (NJ) tree algorithm was used to infer diversity of all the accessions.A bootstrap analysis was conducted to infer the stability of the computed branches with 1000 resampling replicates.All nucleotide positions with gaps and missing bases were collapsed from the data set using complete deletion.Identification of accessions was based on generating a phylogenetic tree.The trees were built with MEGA 6 using the best algorithms methods of UPGMA and MP compared with other tree building techniques for DNA barcoding.UPGMA trees were made from K2P distances.The MP trees were constructed based on the close neighbour interchange (CNI) method with search level 1.Each tree contained the bootstrap value as generated by Mega 7.0 software.To confirm identities and evolutionary relationships between subpopulations of vigna, the sequences were independently subjected to phylogenetic analyses using MEGA 6 and the Neighbor-Joining method in a p-distance matrix model.To concatenate the plastid marker genes, sequences were arranged in a similar order based on on Fast Fourier Transform (MAFFT v. 7) and the genes combined manually prior to construction of a concatenated phylogenetic.

Phylogenetic Reconstruction
The use of a distance matrix of sequence similarity to produce a hierarchical clustering phenogram remains popular today [54].The argument is that distance matrices are fast and easily computable when dealing with large data sets and have a reputation of resolving easily.In order to confirm identities and show evolutionary relationships between variants of vigna spp, seven chloroplast genes were independently subjected to phylogenetic analyses by employing MEGA 7.0 and the NJ algorithm in a p-distance model.To confirm the tree topologies, maximum-likelihood method was inferred [55].MEGA 7.0 was used to estimate evolutionary divergence of the collections and concatenated chloroplast gene sequences were arranged in order using BioEdit v. 7.0.and alignment of the sequences done using multiple sequence alignment based on Fast Fourier Transform (MAFFT) and the genes manually combined to allow construction of a concatenated phylogenetic tree and subsequent estimation of mean evolutionary divergence Initial alignment was done by ClustalW in BioEdit v.7.0.and subsequently in MEGA 6 software.The study sequences were also subjected to estimation of the evolutionary divergence by using the "pairwise-distance" tool in MEGA 7.0.The sequences of the study accession samples were arranged in a similar order for each of the plastid genes targeted in the study.To employ a multi-tiered barcoding technique [56] [57] [58] [59], a subset of cowpea germplasms was tested at several genomic loci to determine polymorphic barcode markers at the intraspecific level.All the seven candidate loci chloroplast gene regions provided variable and informative in delineating the various cowpea germplasms at varietal level.The suitability of single locus ability to resolve phylogenetic relationships at the varietal level was clearly demonstrated by each marker but at varying degrees.

DNA Extraction and PCR Amplification
Good quality DNA was obtained which resolved well on 1% agarose gel.Robust amplification products were obtained (Figures 2-4).Overall, PCR amplifications were largely successful with all the primer pairs designed for each DNA region exhibiting clear amplicons.Although double PCR products were usually not detectable in the gel, any problems that may have arisen from multiple comigrating amplicons of similar size were eliminated or collapsed.

Phylogenetic Analyses
The Majority of the markers revealed between three to four clades represented by letters A, B, C, D and E. The most conserved marker was rpoB yielding only two clades while the most versatile and parsimonious region was trnH-psbA which revealed a total of five clades.In view of this, rpoB would not be a good marker for phylogenetic inferencing, because it is conserved.The trnH-psbA loci appear the most informative loci in inferring phylogenetic differences among the accessions (Figures 12-18).The marker trnH-psbA was the most parsimony informative in phylogenetic inferencing.distance analyses based on the NJ algorithm generated dendograms with similar topologies resolving into 2, 3, 4 and 5 major clades as the case would apply.The marker rpoB was the most conservative generating only two clades (Figure 17).In line with this, rpoB gene locus owing to its conservative nature cannot be a good barcode locus.The markers atpF-atpH, matK, psbK-psbL and rbcL all generated three clusters each.trnH_ psbK was the most versatile generating five distinct clusters (Figure 18).The phylogeny reconstruction through the NJ method resulted in an optimal tree with a sum of branch length of 0.096 for rpoB, 3.11 for atpF_atpH, 3.95 for rbcL, 0.81 for matK, and 1.15 for psbK_psbL.In the NJ bootstrap consensus tree, all the accessions were grouped into varying number of clades depending on each locus with distinct clades set at over 50% bootstrap support.The present study indicates that the dendograms based on MP analyses yielded similar topology as those of the NJ tree and gave parsimonious trees.The NJ tree allows conversion of sequence polymorphisms into genetic distance matrices based on nucleotide substitution models.Separate analyses for each barcode marker yielded NJ trees that correctly clustered together.In contrast, the NJ tree built for each barcode sequence did not show distinct uniqueness because tree phenograms generated were largely similar and this could be attributed to low divergence values among accessions perhaps because of the genetic homogeneity of cowpea as a crop.Overall, NJ phenograms constructed from the whole set of cp-DNA loci produced low discrimination among accessions owing perhaps to paucity of informative characters in some loci like rpoB that is highly conserved.This study documents 5 major clades (trnH_psbA) supported by high bootstrap values over 50%.However, each barcode loci identified several distinct haplotypes over all target regions corresponding to specific clades.The analysis of DNA barcode sequences for cluster recognition was explored for recognizing putative cowpea sub variants based on operational taxonomic units (OTUs).This approach accelerates and improves taxonomic workflows.

Discussion
Taken together, the study further reveals the NJ for the reconstruction of phylogenetic trees from evolutionary distance data placing the accessions in distinct clades.This algorithm uses general data clustering techniques to sequence analysis using genetic distance as a clustering metric.The principle behind this method was to establish pairs of operational taxonomic units (OTUs) that minimize the total branch length at each clustering stage of OTUs.NJ was used to determine the branch lengths as well as the topology of parsimonious trees.

Evolutionary Divergences
The evolutionary history was inferred using the Neighbor-Joining algorithm.
The percentage of replicate trees in which the associated groups clustered together in the bootstrap test of 1000 replicates is shown above the branches.The evolutionary distances were computed using the Kimura 2-parameter method and are represented in the units of the number of base substitutions per site.The analysis involved a total of 54 nucleotide sequences depending on the marker.
Codon positions included were 1st + 2nd + 3rd + non-coding.All ambiguous positions were removed for each sequence pair.Evolutionary analyses were conducted in MEGA 7.0.The phylogenetic tree topology generated using the NJ approach was consistent with the tree topology generated using the maximum likelihood method (ML).It was observed that there were a total of 5 clusters of the operational taxonomic units (OTUs) with bootstrap values greater than 50% supported at the internodes for trnH_psbA, apparently the most versatile and variable region.Phylogenetic trees indicate evolutionary relationships by means of bootstrap values.
However, it was regarded as critical to establish in numerical terms to what extent molecular markers differed in terms of their mean discriminatory power and nucleotide substitution between gene sequences.Hence, the gene sequences were subjected to mean divergence and "pairwise distance" estimation.The overall mean evolutionary divergence, computed using the distance menu of MEGA 6 was 0.19 for matK gene, 0.41 for rbcL, 0.24 for rpoB, and 0.64 for atpF_atpH.These findings are indicative of the superiority of concatenated trees in phylogenetic analysis.All the seven markers proved informative in discriminating between cowpea variants into distinct clusters albeit at varying degrees indicating the likelihood of cp-DNA and intergenic spacers of being used in delineating cowpea at varietal level.It would seem therefore that all the cp-DNA gene markers, which provided overall, mean evolutionary divergence with discriminatory power in delineating and clustering cowpea accessions into distinct clades.These findings are in agreement with Tajima's neutrality test findings reporting similar observations.Hence, the phylogeny based on the seven markers was complemented with phylogenetic analyses of DNA sequences of multiple cp-DNA genes and intergenic spacers in the multilocus sequence analyses.The MLSA approach is currently recommended for phylogenetic analysis of closely related species.The mutilocus sequence approach was tested in the current study to elucidate its ability in deciphering the phylogenetic relatedness among accessions.Overall, all the candidate loci were largely variable lead by trnH_psbA yielding five clades followed by rpoc1 gene yielding 4 clusters.The matK gene, psbK_psbL and rbcL gene revealed a total of 3 clusters each.However, rpoB was the most conservative gene loci revealing only 2 major clades.It would appear therefore that comparatively, rpoB would not be a good locus in phylogenetic analyses in the present study compared to trnH_psbA.

Figure 1 .
Figure 1.Schematic timeline of different DNA barcode markers.Colours (red = warm; blue = cool) represent an informal measure of enthusiasm among DNA barcoding researchers in the systematics community for CBOL and iBOL: Adapted from: PLoS ONE, http://www.plosone.org/.
DNA barcoding made significant contribution to the taxonomical structure in duckweeds as opposed to the less informative morphological classification and O. Patrick et al.DOI: 10.4236/ajmb.2018.84019221 American Journal of Molecular Biology therefore recommends atpF-atpH as an important barcode region in biodiversity DNA quantification was done by using a DNA Nano Drop 2000/2000 c Spectrophotometer.The bands containing DNA of interest was excised and the DNA purified using the DNA DOI: 10.4236/ajmb.2018.84019224 American Journal of Molecular Biology purification kit from Qiagen plant DNA extraction kit following manufacturer's guidelines 3.3.DNA Sequencing Cycle sequencing was done using Big Dye terminator v3.1 and sequencing on a 3130 xl genetic analyzer (Applied Biosystems, USA); electropherograms were edited using SEQUENCER 4.6 software (Genes Codes Corporation, USA) and DNA sequences aligned by BioEdit.Incomplete sequences at both ends were excluded from the analyses.Failed sequencing was exempted from the combined matrix in order to analyse complete matrices.The forward primer for each of the 7 markers was labelled at the 5' end of the oligonucleotide using fluorescent dyes for detection by the automated sequencer ABI 3730 genetic analyzers (Applied Biosystems).The 25 µL PCR reaction mixture for the amplification of the chloroplast gene contained 1.0× PCR buffer, 1.5 mM MgCl 2 , 1.0 µL (99%) dimethyl sulphoxide, 0.4 mM dNTPs, 0.4 µM of each primer and 2.5 U TaqDNA polymerase (Super-Therm) (all supplies obtained from Bioneer Company South Korea together with 1.0 µL of DNA template.Sequencing primers are as indicated in Figures 5-11 indicate aligned sequence contigs for various markers.
analyses involved a total of 298 sequences NCB published GenBank accessions [KX824129-KX824422].Codon positions included were the 1st, 2nd, 3rd, and non-coding positions.Evolutionary analyses were conducted in MEGA software v7.0.Boot strapping with 1000 replicates used to measure branch confidence.Overall, the consensus trees topologies are presented here for each marker demonstrating the number of clades (clusters) resolved by each marker.

Figure 13 .
Figure 13.rpoC 1 loci: Phenogram of NJ cluster analyses.The phylogenetic tree based on rpoc1gene sequences, constructed using the neighbor-joining algorithm.Bootstrap values of more than 50 are shown at the internodes.

Figure 14 .
Figure 14.matK loci: Phenogram of NJ cluster analyses.The phylogenetic tree based on matK gene sequences, constructed using the neighbor-joining algorithm.Bootstrap values of more than 50 are shown at the internodes.

Figure 15 .Figure 16 .Figure 17 .
Figure 15.psbK_psbL loci: Phenogram of NJ cluster analyses.The phylogenetic tree based on pbsK_pbsL gene sequences, constructed using the NJ algorithm.Bootstrap values of more than 50 are shown at the internodes.
on nucleotide substitution models.Based on the coalescence of conspecific populations the NJ tree clusters closely related accessions into single but phylogenetically distinct clades.