Genetic Diversity and Population Structure of Tomato (Solanum lycopersicum) Germplasm Developed by Texas A&M Breeding Programs


Genetic variation developed in plant breeding programs is fundamental to creating new combinations that result in cultivars with enhanced characteristics. Over the years, tomato (Solanum lycopersicum) breeding programs associated with the Texas A&M University system have developed morphologically diverse lines of tomatoes selected for heat tolerance, fruit quality, and disease resistance to adapt them to Texas growing conditions. Here we explored the intraspecific genetic variations of 322 cultivated tomato genotypes, including 300 breeding lines developed by three Texas A&M breeding programs, as an initial step toward implementing molecular breeding approaches. Genotyping by sequencing using low coverage whole-genome sequencing (SkimGBS) identified 10,236 high-quality single-nucleotide polymorphisms (SNPs) that were used to assess genetic diversity, population structure, and phylogenetic relationship between genotypes and breeding programs. Model-based population structure analysis, phylogenetic tree construction, and principal component analysis indicated that the genotypes were grouped into two main clusters. Genetic distance analysis revealed greater genetic diversity among the products of the three breeding programs. The germplasm developed at Texas A&M programs at Weslaco, College Station, and by Dr. Paul Leeper exhibited genetic diversity ranges of 0.175 - 0.434, 0.099 - 0.392, and 0.183 - 0.347, respectively, suggesting that there is enough variation within and between the lines from the three programs to perform selection for cultivar development. The SNPs identified here could be used to develop molecular tools for selecting various traits of interest and to select parents for future tomato breeding.

Kandel, D. , Bedre, R. , Mandadi, K. , Crosby, K. and Avila, C. (2019) Genetic Diversity and Population Structure of Tomato (Solanum lycopersicum) Germplasm Developed by Texas A&M Breeding Programs. American Journal of Plant Sciences, 10, 1154-1180. doi: 10.4236/ajps.2019.107083.

1. Introduction

Tomato (Solanum lycopersicum L.) is a major vegetable crop widely grown around the world [1] [2] [3]. In the USA, fresh and processed tomatoes accounted for more than $1.85 billion in US farm cash receipts in 2018 [4]. The two leading states for fresh-market tomato production are Florida and California, which together comprise almost two-thirds of the total US fresh tomato acreage. Historically, Texas grew as many acres of tomatoes as the leading producing states, with 13,315 ha planted in the 1960s. Because of the lack of adequate cultivars, pest/disease pressure, inefficient production practices, and competition from other production areas, however, Texas growers later migrated to other crops, largely abandoning fresh market tomatoes (harvesting only 304 ha in 2017) [5]. To satisfy customer demand, Texas imports an estimated 2.4 billion pounds/year [6]. A recent study found that Texas consumers frequently request tomatoes with “vine ripe” flavor, aroma, and texture, and will pay a premium for locally produced selections [7]. This situation represents a great opportunity for local producers to reclaim their share of the Texas tomato market. For this to be possible, however, new cultivars and production practices need to be developed to support the industry.

The Texas A&M AgriLife Research tomato breeding programs at Weslaco and College Station have been breeding tomatoes for heat tolerance, fruit quality, and disease resistance adapted to Texas growing conditions for the past five and eighteen years, respectively. Recognizing that high temperatures significantly impact tomato flavor and appearance, our work has focused on introgressing heat tolerance and high-temperature fruit set genes. The two tomato breeding programs combined have developed more than 400 families, including heat-tolerant lines with disease resistance and diverse heirloom selections with multiple fruit colors and unique flavors. Much of this work targeted improvements in both flavor and content of beneficial phytochemicals [8]. This high-yielding, heat-tolerant base germplasm was developed over a period of 37 years at Weslaco by former Texas A&M breeder Dr. Paul Leeper and has been used extensively by the current Texas A&M breeding programs and other programs in tropical regions, including the cultivars Chico III, Chico, El Monte, Monte Grande, La Pinta, Chico Grande, and Saladette [9] [10] [11] [12]. To date, elite lines in our breeding programs have been selected using conventional phenotypic selection approaches, in which large populations are screened for several breeding cycles. Although this approach has produced high-quality, high-yield cultivars, it is time-consuming and requires substantial resources to develop each new cultivar.

An alternative approach to improve breeding efficiency involves the use of modern molecular breeding techniques for population management, including methods to obtain desired genetic heterogeneity in the end-product cultivars. One of the first steps in implementing molecular breeding approaches is to estimate the genetic variation within the breeding lines. Genetic heritable variability is indispensable in plant breeding aimed at developing new cultivars that express desirable characteristics generation after generation [13] . Furthermore, the development of improved varieties is enhanced when parents are selected based on genetic heterogeneity [14] , making genetic variation estimation necessary in breeding programs to allow the selection of parental lines either to increase breeding population variation or to develop hybrids for cultivar release [15] .

Genetic variation between breeding lines can be effectively determined through the use of molecular markers. In tomato, genetic diversity has been extensively studied using a wide range of molecular data. Miller and Tanksley (1990) [16] used restriction-fragment-length polymorphism (RFLP) markers for genetic diversity analysis of self-incompatible and self-compatible tomato species. To unveil the genetic variations that underlie fruit sugar and organic acid production, Zhao et al. (2016) [17] conducted a genetic diversity analysis of 174 tomato accessions using simple sequence repeat (SSR) markers. To gain insight into the morphological traits of fruits, Sacco et al. (2015) [18] performed a genetic diversity analysis of 123 tomato genotypes using single-nucleotide polymorphisms (SNPs). Similarly, Lin et al. [19] and Aflitos et al. [20] performed an evolutionary study of tomato and its wild relatives involving SNPs.

The advent of next-generation sequencing technologies coupled with bioinformatics has led genetic diversity studies into a new era. Sequencing of tomato has resulted in the discovery of large numbers of SNPs distributed throughout the genome [20] [21] [22]. Furthermore, the cultivated tomato genome has been fully sequenced [23], and genotyping by sequencing (GBS) has emerged as a powerful tool for sequencing large populations. The availability of large numbers of SNPs distributed throughout the genome, a reference genome, and the GBS technique [23] [24] [25] has made large intraspecific studies possible. This is important because most prior studies focused on interspecific variations and only a few intraspecific studies have been performed [19] [20] [26] [27]. The SNPs identified in such intraspecific studies offer better clues to the genetic control of agronomic traits and can be used to deduce phylogenetic relationships. Parent selection based on such genetic information can greatly enhance breeding efficiency and help to achieve breeding goals such as high quality (flavor, color, shape), long shelf life, disease resistance, and heat tolerance.

In the present study, we used three representative sets of tomato breeding lines from the current Texas A&M AgriLife Research breeding programs at Weslaco and College Station and from former Texas A&M breeder Dr. Paul Leeper to assess genotypic intraspecific variations within Texas A&M germplasm. Sequencing of these lines yielded 10,236 high-quality polymorphic SNPs. Genetic distance analysis revealed that the tomato breeding lines developed by the Texas A&M breeding programs possess a high level of genetic diversity that, upon selection, can be used to develop high-yielding adapted cultivars for Texas production. Furthermore, intraspecific SNPs identified in the present study could be used to identify economically important traits in cultivated tomatoes. Finally, based on the results of phylogenetic and genetic distance analyses, hybridization strategies can be developed to increase diversity and optimize hybrid development within and between breeding programs.

2. Materials and Methods

2.1. Plant Material

A total of 322 tomato (Solanum lycopersicum) genotypes were evaluated in this study. Among them, 300 genotypes were developed by three independent tomato breeding programs in the Texas A&M University (TAMU) system. Of these, 127 were developed by Dr. Kevin Crosby’s breeding program at Texas A&M University, College Station, TX (designated TAM-CS); 125 by Dr. Carlos Avila’s breeding program at the Texas A&M AgriLife Research and Extension Center at Weslaco, TX (designated TAM-W); and 48 by the breeding program of Dr. Paul Leeper, a former TAMU breeder at Weslaco, Texas (designated TAM-L) (Table S1). These genotypes were developed by hybridizing Texas A&M germplasm with a diverse set of parents, including accessions from the USDA National Germplasm System and the other public breeding programs mentioned below, followed by selfing up to the F9 generation. Pedigree information for all the breeding lines developed in Leeper’s program and for some from Crosby’s program has been lost (Table S1). Breeding lines developed by all three breeding programs harbor good phenotypic variation in tomato fruit shape, size, and color. Besides the genotypes from the Texas A&M University breeding programs, 16 genotypes from the USDA collection, 3 from the Asian Vegetable Research and Development Center (AVRDC), and 3 developed by the University of Florida tomato breeding program (designated FLA) were also included in the present study (Table S1).

2.2. DNA Extraction

Leaves from twelve four-week-old seedlings of the respective genotypes were collected and combined into a single bulk sample. Tissue was lyophilized, homogenized, and stored at −20˚C until extraction. Genomic DNA was extracted from 50 mg of homogenized tissue using the CTAB method [28] . Qualitative and quantitative tests of the DNA were performed by electrophoresis and Qubit 2.0 fluorometry (Life Technologies, Carlsbad, CA), respectively. For each sample, 1.2 mg of DNA was sent to the Texas A&M Genomics and Bioinformatics services (College Station, TX) for sequencing.

2.3. GBS, SNP Discovery, and Population Structure

Genotyping of 322 tomato genotypes was performed using low-coverage whole-genome sequencing (SkimGBS [29]) with a paired-end approach (150 bp × 150 bp) (Illumina HiSeq 4000) at the Texas A&M Genomics and Bioinformatics service (College Station, TX). Raw sequences from the 322 genotypes were filtered to remove low-quality reads and adapter sequences. High-quality sequence data were mapped to the tomato reference genome (S. lycopersicum v3.00) [23] using bowtie2 [30]. The aligned BAM files were sorted, quality filtered for mapping, and filtered for duplicate reads using SAMtools [31] and Picard. The GATK HaplotypeCaller (HC) [32] was used for SNP calling from the aligned data of the 322 tomato genotypes. These raw polymorphic SNPs were filtered to remove SNPs with a high percentage of missing genotypes and low minor allele frequency (MAF). The resulting genotypes were imputed using Beagle (v4.00) [33]. The imputed genotypes were further filtered to keep only genotypes with probability ≥0.9. The polymorphic SNPs were subsequently filtered to remove the SNPs with >30% missing genotypes.
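The missing-rate and MAF filters described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the 0/1/2 genotype encoding, the function names, and the toy matrix are assumptions made for illustration, and the thresholds are passed as parameters so that any cut-off can be tried.

```python
from typing import List, Optional

# Assumed encoding: alternate-allele count (0, 1, 2) per diploid sample,
# or None when the genotype call is missing.
Genotype = Optional[int]


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of samples without a genotype call at one SNP."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """Minor allele frequency computed from the called genotypes only."""
    called = [c for c in calls if c is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(matrix: List[List[Genotype]],
                max_missing: float = 0.30,
                min_maf: float = 0.05) -> List[int]:
    """Return the row indices of SNPs that pass both thresholds."""
    return [
        i for i, calls in enumerate(matrix)
        if missing_rate(calls) <= max_missing
        and minor_allele_frequency(calls) >= min_maf
    ]


# Hypothetical toy matrix: rows are SNPs, columns are samples.
snps = [
    [0, 0, 2, 2, 1, None],           # kept: 1/6 missing, MAF 0.5
    [0, 0, 0, 0, 0, 0],              # dropped: monomorphic (MAF 0)
    [None, None, None, None, 0, 2],  # dropped: 4/6 missing
    [0, 0, 0, 0, 0, 1],              # kept: MAF 1/12
]
kept = filter_snps(snps)
print(kept)
```

In practice such filters are usually applied to VCF files with dedicated tools; the sketch only makes the two criteria explicit.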

The population structure and hybrid forms of tomato genotypes were inferred using the Bayesian model-based clustering program STRUCTURE (v2.3.4) [34] with the polymorphic SNPs obtained from the GBS analysis. To determine the number of populations among the genotypes, STRUCTURE was run with a burn-in period of 5000 followed by 5000 Markov-chain Monte Carlo (MCMC) steps, using an admixture model and correlated allele frequencies among populations. The program was run independently three times for each value of K ranging from 1 to 10. To detect the true value of K (the number of populations), we used the uppermost level of structure calculated using the ΔK method as described by Evanno et al. (2005) [35]. The tomato genotypes were assigned to each true population (Q) based on the value obtained for the proportion of population membership for a given K. The population structure of the 322 tomato genotypes was visualized as a bar plot (sorted by Q) using the Python matplotlib package.
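As a rough illustration of the ΔK criterion, the sketch below computes ΔK from replicate log-likelihoods for each K. It uses the common formulation based on run means, ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd(L(K)); the function name and the log-likelihood values are hypothetical and are not results from this study.

```python
from statistics import mean, stdev
from typing import Dict, List


def evanno_delta_k(lnp: Dict[int, List[float]]) -> Dict[int, float]:
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    lnp maps K to the ln P(D) values of the replicate runs at that K.
    """
    avg = {k: mean(values) for k, values in lnp.items()}
    delta: Dict[int, float] = {}
    for k in sorted(lnp):
        if k - 1 not in avg or k + 1 not in avg:
            continue  # Delta K is undefined at the ends of the K range
        second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
        sd = stdev(lnp[k])  # sample standard deviation across replicates
        if sd > 0:
            delta[k] = second_diff / sd
    return delta


# Hypothetical replicate log-likelihoods (three runs per K).
runs = {
    1: [-5000.0, -5001.0, -4999.0],
    2: [-4000.0, -4002.0, -3998.0],
    3: [-3900.0, -3910.0, -3890.0],
    4: [-3850.0, -3870.0, -3830.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

With these made-up values the likelihood rises sharply from K = 1 to K = 2 and then flattens, so ΔK peaks at K = 2.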

2.4. Phylogenetic and Principal Component Analysis

Phylogenetic analysis was performed using the unweighted pair-group method with arithmetic mean (UPGMA) algorithm implemented in TASSEL v5.2.52 [36] . The phylogenetic tree obtained from TASSEL was visualized using iTOL v4.3.3 and each population was annotated using customized annotation files [37] . The pairwise genetic distance matrix between each pair of genotypes was calculated using TASSEL v5.2.52 and visualized using the Python matplotlib package. The PCA was performed using the PCA function in TASSEL. The first three principal components were exported and visualized as a three-dimensional (3D) scatter plot using the Python matplotlib package.
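The distance and clustering steps can be sketched in a few lines. This is an illustration under stated assumptions, not a re-implementation of TASSEL: the distance is taken to be 1 − IBS (identity by state) on genotypes coded 0/1/2, the tree is returned as a nested string without branch lengths, and all names and genotypes are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence


def ibs_distance(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> float:
    """1 - IBS distance over the sites called in both samples (assumed definition)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("no sites called in both samples")
    return sum(abs(x - y) for x, y in shared) / (2 * len(shared))


def upgma(names: List[str], dist: List[List[float]]) -> str:
    """Average-linkage (UPGMA) clustering; returns a nested, Newick-like string."""
    label: Dict[int, str] = dict(enumerate(names))
    size: Dict[int, int] = {i: 1 for i in label}
    d = {frozenset(p): dist[p[0]][p[1]] for p in combinations(label, 2)}
    next_id = len(names)
    while len(label) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        a, b = sorted(pair)
        for c in label:
            if c in (a, b):
                continue
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            # Size-weighted mean keeps the distance an average over all members.
            d[frozenset((next_id, c))] = (
                (size[a] * dac + size[b] * dbc) / (size[a] + size[b])
            )
        del d[pair]
        label[next_id] = f"({label[a]},{label[b]})"
        size[next_id] = size[a] + size[b]
        for old in (a, b):
            del label[old], size[old]
        next_id += 1
    return next(iter(label.values()))


# Hypothetical genotypes for four lines at four SNPs.
geno = {"A": [0, 0, 2, 2], "B": [0, 0, 2, 1], "C": [2, 2, 0, 0], "D": [2, 2, 0, 1]}
names = list(geno)
matrix = [[ibs_distance(geno[p], geno[q]) for q in names] for p in names]
tree = upgma(names, matrix)
print(tree)
```

The same distance matrix can also feed a PCA or a heat map; the sketch stops at the tree topology.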

3. Results

3.1. Generation of High-Quality Tomato GBS Data

We generated a total of ~598 million sequence reads (paired-end, 150 bp) using low-coverage (average ~0.37×) whole-genome sequencing across 322 tomato genotypes. The raw sequence data were filtered to remove low-quality bases, adapter contamination, and uncalled bases to produce high-quality sequence data (~522 million reads). On average, ~95% of high-quality reads mapped to the tomato reference genome for SNP discovery. In total, we obtained ~3.2 million SNPs from tomato SkimGBS data from the 322 genotypes, which we subsequently filtered to remove SNPs with >50% missing, rare alleles with MAF <5% across all 322 tomato genotypes, and SNPs with low genotype probability (<0.9) (Figure S1 and Figure S2). We used the remaining 10,236 high-quality SNPs for downstream analysis. SNPs were not distributed evenly across all chromosomes (Figure 1). Chromosomes 12 and 1 had the highest numbers of identified SNPs, with 1337 and 1208, respectively, whereas chromosomes 6 and 4 had the lowest numbers, with 173 and 255, respectively. In addition, 1279 SNPs were mapped to unanchored scaffolds (Chr00).

3.2. Genetic Distance between Tomato Genotypes

We calculated the pairwise genetic distance matrix for the 322 tomato genotypes in TASSEL v5.2.52. Genetic distance between tomato genotypes ranged from 0.092 to 0.443, with an average distance of 0.270 (Table 1 and Table S2). Among them, the combination of genotypes TAM-CS-138 and USDA-273 revealed the smallest genetic distance (0.092). Genotype TAM-CS-138 is an F5 inbred heirloom type with large, pink fruit, developed by the Texas A&M College Station breeding program, and genotype USDA-273 is a cherry tomato that produces small red fruit, from the USDA germplasm bank (Table S1). Among all 51,681 possible pairwise combinations of the 322 genotypes, the largest genetic distance (0.443) was observed between genotypes TAM-CS-111 and TAM-W-322 (Table 1). Genotype TAM-CS-111 is an F5 inbred that produces small, round red fruit, from the Texas A&M College Station breeding program, while TAM-W-322 is an F9 inbred that produces medium Roma-type pink fruit, developed by the Texas A&M AgriLife breeding program at Weslaco.

Figure 1. Distribution of 10,236 SNPs across tomato chromosomes. Unanchored scaffolds (Chr00) refers to SNPs not mapped to any chromosome.

Table 1. Genetic distances among tomato genotypes developed by different groups.

Note: AVRDC = Asian Vegetable Research and Development Center; TAM-CS = Dr. Kevin Crosby’s breeding program at Texas A&M at College Station; TAM-W = Dr. Carlos Avila’s breeding program at Texas A&M AgriLife Research and Extension Center at Weslaco, TX; TAM-L = Dr. Paul Leeper’s breeding program at Texas A&M AgriLife Research and Extension Center at Weslaco, TX; FLA = Florida Tomato Breeding Program; USDA = United States Department of Agriculture.

The genetic distances between the genotypes from the three Texas A&M University Breeding programs and control outgroups from the USDA, AVRDC, and Florida are presented in Table 1. Breeding lines developed by the Texas A&M College Station, Weslaco, and Leeper programs had overall intra-program genetic distance means of 0.282, 0.264, and 0.255, respectively (Table 1). The genetic diversity between germplasms from the different Texas A&M breeding programs indicates a high potential for introducing variability between programs. In regard to within-program variation, among lines from the Texas A&M AgriLife breeding program at Weslaco, the largest genetic distance (0.434) was between genotypes TAM-W-172 and TAM-W-322; among lines from the Texas A&M College Station breeding program, the largest genetic distance (0.392) was between TAM-CS-111 and TAM-CS-165; and for those from the Texas A&M Leeper program, the largest genetic distance (0.347) was between TAM-L-51 and TAM-L-16. The genotypes developed by the USDA were overall the least diverse group, with a mean genetic distance of 0.234 and a range of 0.192 - 0.292. Within that group, the genotype combination of USDA-259 and USDA-320 showed the largest genetic distance (Table 1). The sets of genotypes from the AVRDC and Florida breeding programs used in the present study showed mean genetic diversities of 0.296 and 0.298, respectively (Table 1).
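Within-program ranges and means of this kind can be derived from a pairwise distance table by restricting it to pairs that share a source. The sketch below shows one way to do so; the line names, group labels, and distances are invented for illustration and do not come from Table 1 or Table S2.

```python
from statistics import mean
from typing import Dict, List, Tuple


def within_group_summary(
    dist: Dict[Tuple[str, str], float],
    groups: Dict[str, str],
) -> Dict[str, Tuple[float, float, float]]:
    """Return (minimum, maximum, mean) distance per group.

    Only pairs whose two members belong to the same group are used.
    """
    by_group: Dict[str, List[float]] = {}
    for (a, b), value in dist.items():
        if groups[a] == groups[b]:
            by_group.setdefault(groups[a], []).append(value)
    return {g: (min(v), max(v), mean(v)) for g, v in by_group.items()}


# Hypothetical lines and distances (not data from this study).
groups = {"w1": "W", "w2": "W", "w3": "W", "c1": "CS", "c2": "CS"}
dist = {
    ("w1", "w2"): 0.125, ("w1", "w3"): 0.25, ("w2", "w3"): 0.375,
    ("c1", "c2"): 0.5,
    ("w1", "c1"): 0.75, ("w2", "c2"): 0.625,  # between-group pairs are ignored
}
summary = within_group_summary(dist, groups)
print(summary)
```

Swapping the equality test for an inequality would give the corresponding between-program summary.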

3.3. Population Structure

We explored the population structure of tomato genotypes using a model-based clustering method implemented with STRUCTURE v2.3.4. The maximum value for ΔK was observed when K = 2 (Figure 2(a)), indicating the presence of two main population clusters (Q1 and Q2) (Figure 2(b)). Out of the entire population evaluated in this study, 32 tomato genotypes (9.9%) were grouped into Q1, while the remaining 290 genotypes were placed into Q2 (90.1%) (Table 2). Of the two clusters, the genetic diversity assessment indicated that Q1 is more diverse, and it included the two genotypes with the largest genetic distance observed (genotypes TAM-W-322 and TAM-CS-111, Figure 3(a)). The range of genetic distances between genotypes assigned to cluster Q1 was 0.288 - 0.443, and the mean was 0.346 (Figure 3(a)). In cluster Q2, the range of genetic distances between genotypes was 0.092 - 0.334, with a mean of 0.268, and this cluster included the two genotypes with the smallest genetic distance (0.092), TAM-CS-138 and USDA-273 (Figure 3(b)).

Figure 2. Identification of population structure using a model-based clustering method. (a) Calculated ΔK values for each number of expected populations (K); (b) The 322 tomato genotypes were assigned to two populations (Q1 and Q2) by model-based clustering. The distributions of genotypes in each population are represented in orange (Q1) and blue (Q2) based on their population membership.

The population structure analysis also revealed that genotypes from the breeding programs were distributed between the Q1 and Q2 clusters, while all evaluated genotypes from the USDA collection belonged to the Q2 cluster (Table 2 and Table S1). The majority of genotypes (62%) in the Q1 cluster were developed by the Texas A&M College Station breeding program, while germplasm developed by the Texas A&M AgriLife Weslaco and Leeper breeding programs accounted for 18.75% and 12.5%, respectively (Table 2). On the other hand, the Weslaco and College Station breeding programs contributed roughly equally (41% and 36.89%, respectively) to cluster Q2, with the Leeper program accounting for 15.17% (Table 2). The Q2 cluster included most of the genotypes from each of the three breeding programs, accounting for 95.2% of those from the Weslaco program, 84.26% of those from the College Station program, and 91.66% of those from the Leeper program.

Figure 3. Distribution of genetic distance between genotypes in two populations. (a) Distribution of genetic distance in population Q1. The maximum genetic distance (0.443) occurred between TAM-CS-111 and TAM-W-322 and the minimum (0.282) between TAM-CS-150 and TAM-CS-120; (b) Distribution of genetic distance in population Q2. The maximum genetic distance (0.334) was between TAM-CS-104 and TAM-W-204 and the minimum (0.092) between TAM-CS-138 and USDA-273.

Table 2. Distribution of tomato genotypes from six different sources into two clusters. A model-based structure analysis performed on 322 genotypes divided them into two clusters Q1 and Q2.

Note: AVRDC = Asian Vegetable Research and Development Center; TAM-CS = Dr. Kevin Crosby’s breeding program at Texas A&M at College Station; TAM-W = Dr. Carlos Avila’s breeding program at Texas A&M AgriLife Research and Extension Center at Weslaco, TX; TAM-L = Dr. Paul Leeper’s breeding program at Texas A&M AgriLife Research and Extension Center at Weslaco, TX; FLA = Florida Tomato Breeding Program; USDA = United States Department of Agriculture.

3.4. Phylogenetic Tree and Principal Component Analysis

Next, we constructed a phylogenetic tree based on the 10,236 SNPs and found that it also divided the 322 tomato genotypes into two groups and that these groups corresponded with the two population clusters Q1 and Q2 (Figure 4). Thus, the phylogenetic tree displayed consistency with the population structure revealed by the model-based clustering analysis with STRUCTURE v2.3.4 (Figure 2). Figure 4 shows that the genotypes producing the smallest genetic distance (USDA-273 and TAM-CS-138) had the shortest branches arising from the lowermost clade. Similarly, genotype TAM-W-322, which was one of the two genotypes producing the largest genetic distance with another, was placed on the extreme other side of the phylogenetic tree in the longest uppermost clade (Figure 4). The phylogenetic tree indicated that the genotypes TAM-W-322, TAM-CS-111, and TAM-L-16, from the Texas A&M Weslaco, College Station, and Leeper breeding programs, respectively, had the potential to yield greater genetic diversity when combined with other genotypes. We also performed PCA to check the number of population structure groups; Figure 5 presents the distribution of tomato genotypes in scatter plots of the first three principal components in a 3D space. This PCA also revealed that the tomato genotypes clustered into two groups, with some overlap indicative of the small genetic distances between some genotypes in Q1 and Q2.

Figure 4. Phylogenetic analysis of 322 tomato genotypes built using the UPGMA hierarchical clustering method. The branches of the two predicted populations Q1 and Q2 are highlighted in orange and blue, respectively.

4. Discussion

Genetic diversity studies have increased in recent years due to advances in high-throughput sequencing technologies and the availability of high-resolution SNPs. For example, 5.4 million SNPs were identified between wild and cultivated tomato genomes during the sequencing of the tomato reference genome from the cultivar Heinz 1706 [23]. Likewise, 11.6 million SNPs were found from the sequencing of 360 accessions that included both cultivated and wild tomato species [19] and 180,000 - 350,000 SNPs from the sequencing of four large-fruited cultivated tomato accessions [38]. In the present study, sequencing of 322 tomato genotypes from cultivated S. lycopersicum resulted in the discovery of 3.2 million SNPs. After filtering on the basis of quality parameters, 10,236 high-quality SNPs were obtained and used for genetic diversity analysis. Among them, the largest number of SNPs was observed in chromosome 12, followed by chromosomes 1 and 10 (Figure 1). The existence of unanchored scaffolds (Chr00) and the large number of SNPs mapped to them (1279 in total) indicate that numerous genomic regions have not yet been properly placed in chromosomes [2]. It also highlights the importance of identifying new intraspecific SNPs in the tested tomato breeding lines.

Figure 5. Principal component analysis (PCA) of 322 tomato genotypes. The first three principal components (PCs) are shown using a 3D scatter plot. The PCA clustered the 322 genotypes into two distinct clusters (populations) Q1 and Q2, represented by orange and blue dots, respectively. Most of the genotypes were assigned to the Q2 population.

Past efforts to develop diverse breeding populations in Texas A&M University breeding programs are reflected in the range of genetic diversity within and between the groups of tomato lines developed by the three Texas A&M programs as well as in comparison to the outgroup lines. High ranges were expected, since some of these lines were generated from diverse sets of parents, including some that the USDA, Florida, and AVRDC programs contributed to Texas A&M diversity (Table 1 and Table S1). Among the breeding programs, the highest range and mean of genetic diversity were detected among the genotypes from the Texas A&M Weslaco (genetic distance range 0.175 - 0.434, mean 0.264) and College Station (range 0.099 - 0.392, mean 0.282) breeding programs (Table 1). These results can be explained by the possibility that a significant proportion of common parents has been shared between Texas A&M breeding programs, along with subsequent selections between programs. The largest genetic diversity was achieved from the combination of genotypes from the Weslaco and College Station programs (genetic distance of 0.443 between genotypes TAM-W-322 and TAM-CS-111), indicating that crossing germplasm from the two programs should generate more variation for cultivar development. However, in looking at the genetic diversity between the Weslaco and College Station breeding programs, we found that there was in general greater genetic diversity within than between programs, perhaps because the recently initiated program at Weslaco used College Station material for breeding population development. The broad range of genetic diversity of breeding lines within a breeding program was also reflected in the population structure analysis and the phylogenetic tree (Figure 3 and Figure 4). Genotypes from all three Texas A&M breeding programs and also from AVRDC and Florida lines were observed in both the Q1 and Q2 clusters (Figure 4). Additionally, the grouping of genotypes into two clusters with some overlaps was further validated by the PCA.

Several inbred lines developed by the Texas A&M Weslaco and College Station breeding programs were developed from the hybridization of heirloom tomato parents with morphologically diverse fruit characteristics, including color, size, and shape, in an attempt to improve quality. Though distinct in nature, heirloom tomatoes possess comparatively low genetic diversity [2] [39] [40] . Thus, a genotype developed by hybridizing two heirloom tomato strains is expected to have low genetic diversity compared to genotypes evolved from contemporary lines since heirloom genotypes that are different only in shape and color may differ only by a handful of genes [41] [42] [43] . This may have contributed to the lower genetic diversity in some of the Texas A&M Weslaco and College Station breeding program lines in the Q2 cluster.

On the other hand, some of the breeding lines were developed by introgressing one or more disease-resistance genes. Disease-resistance genes are primarily introgressed from wild relatives, which have been reported to carry 20 times higher genetic diversity than that of cultivated tomato [19] [20], and which thereby contributed to the high genetic diversity in Texas A&M breeding populations. Some examples of introgressed resistance genes in the Texas A&M AgriLife breeding population include Mi-1, which confers resistance against root-knot nematodes (Meloidogyne spp.) and was introgressed from Solanum peruvianum [44]; Sw-5, which confers resistance to the tomato spotted wilt virus (TSWV) and was introgressed from S. peruvianum [45] [46]; Ty-2 and Ty-3, which confer resistance to tomato yellow leaf curl virus (TYLCV) and were introgressed from S. habrochaites [47] [48] and S. chilense [49], respectively; and I-2 and I-3, which confer resistance to vascular wilt caused by Fusarium oxysporum race 2 (Fol2) and by Fol1, Fol2, and Fol3, and were introgressed from S. pimpinellifolium [50] and S. pennellii [51] [52], respectively. Thus, introgressions of disease-resistance genes during hybridization could have played an important role in producing the genetic diversity among breeding lines observed in the present study and thus in grouping the genotypes into two clusters.

The present study revealed that the tomato breeding lines developed by the Texas A&M breeding programs possess a high level of genetic diversity and thus should be capable, upon selection, of yielding a variety of cultivars adapted for Texas production. Furthermore, the broad genetic base of the breeding lines and the higher recombination generated through hybridization could be utilized to uncover QTLs for complex traits. As the SNPs identified here were intraspecific, they could be valuable for uncovering economically important traits within cultivated tomato. Finally, our work here suggests that through the use of a phylogenetic tree and genetic distances, it is possible to develop crossing strategies to increase diversity and encourage hybrid development within and between breeding programs.


Acknowledgments

We are grateful to the USDA ARS Plant Genetic Resources Unit, the AVRDC, and Samuel Hutton at the University of Florida for providing seeds for this study. We would like to thank the Texas A&M Genomics and Bioinformatics services, College Station, TX, for performing the GBS on tomato genotypes. We appreciate the help of Samantha Serna, Alexandra Hernandez, and Alondra Menchaca for planting tomato genotypes in the greenhouse, and Henry Awika and Sonia del Rio for tissue collection, lyophilization, and homogenization. This research was funded by Texas A&M AgriLife Research seed grant FY18-124353 assigned to C.A.A. and K.C. and Texas A&M AgriLife Research Insect-vectored Disease Seed Grant (114190-96210) to K.M.


Table S1. The details of 322 tomato genotypes used in genetic diversity analysis. All the genotypes were from cultivated tomato Solanum lycopersicum.

AVRDC = Asian Vegetable Research and Development Center; TAM-CS = Dr. Kevin Crosby’s breeding program at Texas A&M at College Station; TAM-W = Dr. Carlos Avila’s breeding program at Texas A&M AgriLife Research and Extension Center at Weslaco, TX; TAM-L = Dr. Paul Leeper’s breeding program at Texas A&M AgriLife Research and Extension Center at Weslaco, TX; FLA = Florida tomato breeding program; USDA = United States Department of Agriculture; *Fruit size: XS = extra small, S = small, M = medium, L = large, XL = extra large.

Table S2. Distance matrix of 322 genotypes based on identified SNP markers.

Figure S1. Distribution of the SNP missing rate (a) before imputation and (b) after imputation. SNPs with >50% missing, rare alleles with minor allele frequency (MAF) < 5% across all 322 tomato genotypes, and SNPs with low genotype probability (<0.9) were imputed.

Figure S2. Average missing rate of SNPs across 322 tomato genotypes (a) before imputation and (b) after imputation. SNPs with >50% missing, rare alleles with minor allele frequency (MAF) < 5% across all 322 tomato genotypes, and SNPs with low genotype probability (<0.9) were imputed.


AVRDC = Asian Vegetable Research and Development Center

GBS = Genotyping by Sequencing

HC = Haplotype Caller

MAF = Minor Allele Frequency

PCA = Principal Component Analysis

SNP = Single-Nucleotide Polymorphism

TAMU = Texas A&M University

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.


[1] Foolad, M.R. (2007) Genome Mapping and Molecular Breeding of Tomato. International Journal of Plant Genomics, 2007, Article ID: 64358.
[2] Menda, N., Strickler, S.R. and Mueller, L.A. (2013) Advances in Tomato Research in the Post-Genome Era. Plant Biotechnology, 30, 243-256.
[3] Rothan, C., Diouf, I. and Causse, M. (2019) Trait Discovery and Editing in Tomato. Plant Journal, 97, 73-90.
[4] National Agricultural Statistics Service (2018) Vegetables 2018 Summary. United States Department of Agriculture, Washington DC.
[5] National Agricultural Statistics Service (2017) Census of Agriculture. United States Department of Agriculture, Washington DC.
[6] Texas Department of Agriculture (2014) Production versus Consumption Surplus and Deficit. Texas Agricultural Commodities.
[7] Segovia-Coronel, M. (2014) Measuring Consumer Acceptance and Willingness to Pay for Specialty Tomatoes: Impact of Product Taste and Health Features. Department of Agricultural Economics, Texas A & M, College Station.
[8] Lee, J.H.J., Jayaprakasha, G.K., Avila, C.A., Crosby, K.M. and Patil, B.S. (2019) Metabolomic Studies of Volatiles from Tomatoes Grown in Net-House and Open-Field Conditions. Food Chemistry, 275, 282-291.
[9] Lazarte, J.E., Leeper, P.W. and Barringer, R.K. (1981) High-Temperature Fruit Set of Tomatoes. HortScience, 16, 289-289.
[10] Leeper, P.W. and Cox, E.L. (1986) “Freshmarket 9” Tomato. HortScience, 21, 156.
[11] Lazarte, J.E., Leeper, P.W. and Barringer, R.K. (1981) High-Temperature Effects on Tomato Fruit-SET. HortScience, 16, 444-445.
[12] Leeper, P.W. (1969) TAMU “Chico III”: A Pear Type, Disease Resistant Tomato Designed for Machine Harvest. T.A.M.U. Texas Agricultural Experimental Station, College Station.
[13] Acquaah, G. (2012) Variation: Types, Origin and Scale. In: Principles of Plant Genetics and Breeding, 2nd Edition, John Wiley & Sons, Ltd., Hoboken.
[14] Bisen, A., Khare, D., Nair, P. and Tripathi, N. (2015) SSR Analysis of 38 Genotypes of Soybean (Glycine max (L.) Merr.) Genetic Diversity in India. Physiology and Molecular Biology of Plants, 21, 109-115.
[15] Denwar, N.N., Awuku, F.J., Diers, B., Addae-Frimpomaah, F., Chigeza, G., Oteng-Frimpong, R., Puozaa, D.K. and Barnor, M.T. (2019) Genetic Diversity, Population Structure and Key Phenotypic Traits Driving Variation within Soybean (Glycine max) Collection in Ghana. Plant Breeding.
[16] Miller, J.C. and Tanksley, S.D. (1990) RFLP Analysis of Phylogenetic Relationships and Genetic Variation in the Genus Lycopersicon. Theoretical and Applied Genetics, 80, 437-448.
[17] Zhao, J.T., Xu, Y., Ding, Q., Huang, X.L., Zhang, Y.T., Zou, Z.R., Li, M., Cui, L. and Zhang, J. (2016) Association Mapping of Main Tomato Fruit Sugars and Organic Acids. Frontiers in Plant Science, 7, 1286.
[18] Sacco, A., Ruggieri, V., Parisi, M., Festa, G., Rigano, M. M., Picarella, M.E., Mazzucato, A. and Barone, A. (2015) Exploring a Tomato Landraces Collection for Fruit-Related Traits by the Aid of a High-Throughput Genomic Platform. PLoS ONE, 10, e0137139.
[19] Lin, T., Zhu, G.T., Zhang, J.H., Xu, X.Y., Yu, Q.H., Zheng, Z., Zhang, Z.H., Lun, Y.Y., Li, S., Wang, X.X., Huang, Z.J., Li, J.M., Zhang, C.Z., Wang, T.T., Zhang, Y.Y., Wang, A.X., Zhang, Y.C., Lin, K., Li, C.Y., Xiong, G.S., Xue, Y.B., Mazzucato, A., Causse, M., Fei, Z.J., Giovannoni, J.J., Chetelat, R.T., Zamir, D., Stadler, T., Li, J.F., Ye, Z.B., Du, Y.C. and Huang, S.W. ( 2014) Genomic Analyses Provide Insights into the History of Tomato Breeding. Nature Genetics, 46, 1220-1226.
[20] Aflitos, S., Schijlen, E., de Jong, H., de Ridder, D., Smit, S., Finkers, R., Wang, J., Zhang, G., Li, N., Mao, L., Bakker, F., Dirks, R., Breit, T., Gravendeel, B., Huits, H., Struss, D., Swanson-Wagner, R., van Leeuwen, H., van Ham, R.C., Fito, L., Guignier, L., Sevilla, M., Ellul, P., Ganko, E., Kapur, A., Reclus, E., de Geus, B., van de Geest, H., Lintel Hekkert, B.T., van Haarst, J., Smits, L., Koops, A., Sanchez-Perez, G., van Heusden, A.W., Visser, R., Quan, Z., Min, J., Liao, L., Wang, X., Wang, G., Yue, Z., Yang, X., Xu, N., Schranz, E., Smets, E., Vos, R., Rauwerda, J., Ursem, R., Schuit, C., Kerns, M., van den Berg, J., Vriezen, W., Janssen, A., Datema, E., Jahrman, T., Moquet, F., Bonnet, J. and Peters, S. (2014) Exploring Genetic Variation in the Tomato (Solanum Section Lycopersicon) Clade by Whole-Genome Sequencing. Plant Journal, 80, 136-148.
[21] Phan, N.T., Trinh, L.T., Rho, M.Y., Park, T.S., Kim, O.R., Zhao, J., Kim, H.M. and Sim, S.C. (2019) Identification of Loci Associated with Fruit Traits Using Genome-Wide Single Nucleotide Polymorphisms in a Core Collection of Tomato (Solanum lycopersicum L.). Scientia Horticulturae, 243, 567-574.
[22] Ruggieri, V., Francese, G., Sacco, A., D’Alessandro, A., Rigano, M.M., Parisi, M., Milone, M., Cardi, T., Mennella, G. and Barone, A. (2014) An Association Mapping Approach to Identify Favourable Alleles for Tomato Fruit Quality Breeding. BMC Plant Biology, 14, 337.
[23] The Tomato Genome Consortium (2012) The Tomato Genome Sequence Provides Insights into Fleshy Fruit Evolution. Nature, 485, 635-641.
[24] Chung, Y.S., Choi, S.C., Jun, T.H. and Kim, C. (2017) Genotyping by Sequencing: A Promising Tool for Plant Genetics Research and Breeding. Horticulture Environment and Biotechnology, 58, 425-431.
[25] Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S. and Mitchell, S.E. (2011) A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species. PLoS ONE, 6, e19379.
[26] Labate, J.A., Sheffer, S.M., Balch, T. and Robertson, L.D. (2011) Diversity and Population Structure in a Geographic Sample of Tomato Accessions. Crop Science, 51, 1068-1079.
[27] Mazzucato, A., Papa, R., Bitocchi, E., Mosconi, P., Nanni, L., Negri, V., Picarella, M.E., Siligato, F., Soressi, G.P., Tiranti, B. and Veronesi, F. (2008) Genetic Diversity, Structure and Marker-Trait Associations in a Collection of Italian Tomato (Solanum lycopersicum L.) Landraces. Theoretical and Applied Genetics, 116, 657-669.
[28] Weigel, D. and Glazebrook, J. (2002) Arabidopsis: A Laboratory Manual Cold Spring Harbor Laboratory Press, Cold Spring Harbor.
[29] Golicz, A.A., Bayer, P.E. and Edwards, D. (2015) Skim-Based Genotyping by Sequencing. Methods in Molecular Biology, 1245, 257-270.
[30] Langmead, B. and Salzberg, S.L. (2012) Fast Gapped-Read Alignment with Bowtie 2. Nature Methods, 9, 357-359.
[31] Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Martha, G., Abecasis, G. and Durbin, R. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25, 2078-2079.
[32] McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M. and DePristo, M.A. (2010) The Genome Analysis Toolkit: A MapReduce Framework for Analyzing Next-Generation DNA Sequencing Data. Genome Research, 20, 1297-1303.
[33] Browning, S.R. and Browning, B.L. (2007) Rapid and Accurate Haplotype Phasing and Missing-Data Inference for Whole-Genome Association Studies by Use of Localized Haplotype Clustering. American Journal of Human Genetics, 81, 1084-1097.
[34] Pritchard, J.K., Stephens, M. and Donnelly, P. (2000) Inference of Population Structure Using Multilocus Genotype Data. Genetics, 155, 945-959.
[35] Evanno, G., Regnaut, S. and Goudet, J. (2005) Detecting the Number of Clusters of Individuals Using the Software Structure: A Simulation Study. Molecular Ecology, 14, 2611-2620.
[36] Bradbury, P.J., Zhang, Z., Kroon, D.E., Casstevens, T.M., Ramdoss, Y. and Buckler, E.S. (2007) TASSEL: Software for Association Mapping of Complex Traits in Diverse Samples. Bioinformatics, 23, 2633-2635.
[37] Letunic, I. and Bork, P. (2019) Interactive Tree of Life (iTOL) v4: Recent Updates and New Developments. Nucleic Acids Research, 47, W256-W259.
[38] Causse, M., Desplat, N., Pascual, L., Le Paslier, M.C., Sauvage, C., Bauchet, G., Berard, A., Bounon, R., Tchoumakov, M., Brunel, D. and Bouchet, J.P. (2013) Whole Genome Resequencing in Tomato Reveals Variation Associated with Introgression and Breeding Events. BMC Genomics, 14, 791.
[39] Blanca, J., Montero-Pau, J., Sauvage, C., Bauchet, G., Illa, E., Diez, M.J., Francis, D., Causse, M., van der Knaap, E. and Canizares, J. (2015) Genomic Variation in Tomato, from Wild Ancestors to Contemporary Breeding Accessions. BMC Genomics, 16, 257.
[40] Rick, C.M. (1988) Tomato-Like Nightshades-Affinities, Autoecology, and Breeders Opportunities. Economic Botany, 42, 145-154.
[41] Fray, R.G. and Grierson, D. (1993) Identification and Genetic Analysis of Normal and Mutant Phytoene Synthase Genes of Tomato by Sequencing, Complementation and Co-Suppression. Plant Molecular Biology, 22, 589-602.
[42] Rodriguez, G.R., Munos, S. anderson, C., Sim, S.C., Michel, A., Causse, M., Gardener, B.B., Francis, D. and van der Knaap, E. (2011) Distribution of SUN, OVATE, LC, and FAS in the Tomato Germplasm and the Relationship to Fruit Shape Diversity. Plant Physiology, 156, 275-285.
[43] Ronen, G., Carmel-Goren, L., Zamir, D. and Hirschberg, J. (2000) An Alternative Pathway to Beta-Carotene Formation in Plant Chromoplasts Discovered by Map-Based Cloning of Beta and Old-Gold Color Mutations in Tomato. Proceedings of the National Academy of Sciences, 97, 11102-11107.
[44] Smith, P.G. (1944) Embryo Culture of a Tomato Species Hybrid. Proceedings of the American Society for Horticultural Science, 44, 413-416.
[45] Stevens, M.R., Lamb, E.M. and Rhoads, D.D. (1995) Mapping the Sw-5 Locus for Tomato Spotted Wilt Virus-Resistance in Tomatoes Using RAPD and RFLP Analyses. Theoretical and Applied Genetics, 90, 451-456.
[46] Stevens, M.R., Scott, S.J. and Gergerich, R.C.J.E. (1994) Evaluation of Seven Lycopersicon Species for Resistance to Tomato Spotted Wilt Virus (TSWV). Euphytica, 80, 79-84.
[47] Kalloo and Banerjee, M.K. (1990) Transfer of Tomato Leaf Curl Virus-Resistance from Lycopersicon hirsutum to L. esculentum. Plant Breeding, 105, 156-159.
[48] Hanson, P.M., Bernacchi, D., Green, S., Tanksley, S.D., Muniyappa, V., Padmaja, S., Chen, H.M., Kuo, G., Fang, D. and Chen, J.T. (2000) Mapping a Wild Tomato Introgression Associated with Tomato Yellow Leaf Curl Virus Resistance in a Cultivated Tomato Line. Journal of the American Society for Horticultural Science, 125, 15-20.
[49] Ji, Y., Schuster, D.J. and Scott, J.W. (2007) Ty-3, a Begomovirus Resistance Locus near the Tomato Yellow Leaf Curl Virus Resistance Locus Ty-1 on Chromosome 6 of Tomato. Molecular Breeding, 20, 271-284.
[50] Ori, N., Eshed, Y., Paran, I., Presting, G., Aviv, D., Tanksley, S., Zamir, D. and Fluhr, R. (1997) The I2C Family from the Wilt Disease Resistance Locus I2 Belongs to the Nucleotide Binding, Leucine-Rich Repeat Superfamily of Plant Resistance Genes. Plant Cell, 9, 521-532.
[51] Bournival, B.L., Scott, J.W. and Vallejos, C.E. (1989) An Isozyme Marker for Resistance to Race-3 of Fusarium oxysporum sp lycopersici in Tomato. Theoretical and Applied Genetics, 78, 489-494.
[52] Scott, J.W. and Jones, J.P. (1989) Monongenic Resistance in Tomato to Fusarium oxysporum F. sp. lycopersici Race-3. Euphytica, 40, 49-53.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.