Genome Wide Association Study of Yield and Yield-Related Traits in Elite Spring Bread Wheat Genotypes Grown under High Temperature Environment in Sudan ()
1. Introduction
Heat stress caused by high temperature is one of the most important wheat yield limiting factor affecting more than 30 million hectares of wheat production area annually in the world and thereby causing significant grain yield reduction [1] . The effect of high temperature is even expected to be more detrimental stress in the future because of a rise in global temperature by 1˚C - 4.5˚C over the next 50 years associated with climate change [2] . In addition to lowering grain yield, the effect of climate change is also evident in the quality of wheat as increased heat results in shriveled wheat grains [3] . Heat stress can cause partial or total breakdown of anatomy, morphology, biochemistry and physiology of the crop. It is a function of the magnitude and rate of temperature increase, as well as the duration of exposure to the raised temperature [4] . Genetic variabilities for heat stress have been previously reported [5] . Association mapping using phenotypic and genotypic data of association panels has become an important approach in identifying molecular markers linked to traits of interest for potential use in marker selection. Its advantage over the more conventional bi-parental mapping approach lies mainly in the greater extent of allelic variation which can be surveyed, while also avoiding the need to establish a customized mapping population [6] . Association mapping or genome-wide association studies (GWAS) may be a suitable complementary approach to bi-parental populations for identifying genetic markers associated with traits of interest [7] . Compared to other marker types, single nucleotide polymorphisms (SNPs) are generally more abundant, stable, efficient, high-res genetic maps and genomic selection and cost effective [8] . The objectives of this study were to 1) Genetic diversities of 250 bread wheat genotypes and identify heat tolerant genotypes; 2) Identify significant MTAs and QTLs for heat tolerance in Sudan; 3) Select the best high yielding and heat tolerant genotypes.
2. Materials and Methods
2.1. Phenotyping
2.1.1. Experimental Site
The experiments were conducted in two seasons (2016/17 and 2017/18) using two dates for planting; first sowing date (1st SD) on 20 November (optimum) and second sowing date (2nd SD) on 12 December (late) at the Gezira Research Staion Farm (GRSF) of the Agricultural Research Corporation (ARC), Wad Medani, Sudan. (GRSF) is located at latitude of 14˚-24'N and longitude of 29˚-33'E and altitude of 407 masl. The soil at this station is classified as heavy clay soil, pH of about 8.0 - 8.4, low organic matter (0.05), deficient in nitrogen (380 ppm), and phosphorus (ESP, 4 ppm).
2.1.2. Plant Materials and Experimental Design
Experiments were conducted using 250 genotypes selected from different spring bread wheat advanced yield trials introduction from ICARDA, including three checks (namely Imam, Goumria and Nebta). Experiments were arranged in an alpha lattice design with two replications. Plots consisted of four rows, 3 m long and 0.2 m apart. Seeds were sown at the rate of 120 kg/ha.
2.1.3. Cultural Practices
The recommended dose of fertilizer (43 kg P2O5/ha) was applied prior to sowing and 86 kg N/ha as urea was applied with the second and fourth irrigations. The experiments were irrigated (flood irrigation) at 10 - 12 days interval for a total of 9 irrigations.
2.1.4. Measurements and Statistical Analysis
Data were recorded for days to heading and maturity, plant height, number of spikes/m2, number of seeds/spike, 1000 seed weight (gm), biomass (kg/ha), harvest index and grain yield (kg/ha). The phenotypic data were statistically analyzed using Genestat software version 18.
2.1.5. Metrological Data
In the first season, the temperature in the 1st SD was higher than the 2nd SD, while in the second season it was the opposite (Figure 1). In the first season, the mean temperature in the first sowing date at the critical period of growth (from 20 December -20 January) was higher than the second sowing date at the critical period (from end of January - end of February). In the first season, the maximum and minimum temperatures were lower by 3˚C in the second sowing date respectively when compared to the first sowing date of the critical stage of crop growth, while in the second season, maximum and minimum temperatures were lower by 5˚C in the first sowing date when compared to the second sowing date.
Figure 1. Maximum, minimum and mean temperatures (˚C) at the critical stages of wheat genotypes grown at two sowing dates for two seasons (2016/17 and 2017/18) at Gezira Research Station Farm (GRSF), Wad Medani, Sudan.
2.2. Genotyping
2.2.1. DNA Extraction
DNA extraction was carried out following standard protocol. Genome-wide markers were obtained for the genotypes using wheat 15 k Single Nucleotide Polymorphism (SNP) markers for yield and traits in 231 bread wheat genotypes. Analysis was carried out at Trait Genetics Company, Germany (specialized in the development and analysis of molecular markers for biodiversity and plant breeding research).
2.2.2. Genome-Wide Association Analysis (GWAS)
A GWAS was performed using the mixed linear models (MLM) in TASSEL (version 5.2. 37). The significant between 15K SNPs and traits were identified using a model with a Q matrix as the fixed effect and a kinship matrix as the random effect (Q + K) as well as a model with a principal component (PC) matrix as the fixed effect and a kinship matrix as the random effect (PCA + K), at a threshold of P < 0.001.
The likelihood distribution (LD) is the nonrandom association of alleles at different sites. For association studies the appropriate marker density depends on the background levels of LD. The low LD population need higher marker densities and high LD population require low marker densities. LD among markers was calculated using observed vs. expected allele frequencies of the markers in TASSELv.5.2.37. Only mapped markers were used for LD calculation both for the entire panel and for model-based subgroups. The critical r2-value beyond which LD is due to true physical linkage was determined by taking the 95th percentile of the square root transformed r2 data of unlinked markers [9] . The percentage of marker pairs significant at different critical r2-values (0.2) and P < 0.001 was determined for each chromosome to compare the degree of LD among genes.
3. Results and Discussion
3.1. Phenotypic Variation
3.1.1. Yield and Yield Components Variations
Highly significant genotypes differences were fund for yield and yield components for two seasons in tow sowing. Generally were found wide range between maximum and minimum values of yield and yield components. These variations among genotypes for their traits reflect their different genetic backgrounds. These results were in agreement with those of [10] [11] [12] . Maximum grain yield ranged from 4016 - 5120 kg/ha for 1st SD and 2nd SD respectively in first season and from 5386 - 3926 kg/ha for 1st SD and 2nd SD respectively in second season and the biomass ranged from 11,032 - 13,625 kg/ha for 1st SD and 2nd SD respectively in first season and from 15,625 - 11,161 kg/ha for 1st SD and 2nd SD respectively in second season (Table 1). The maximum difference values among above traits due to furthermore genetic backgrounds to the difference temperature in sowing date in each environment (in the first season, generally the temperature in the first sowing date higher than the second sowing date, while the second season the opposite). Significant phenotypic variability for different traits such number of grain per spike and harvest index. Genotypes differed significantly for these traits among these experiments indicating the presence of sufficient genetic variability for selection to identify the pest genotypes.
3.1.2. Morphological Traits Variations
Highly significant genotypes differences were fund for days to heading, days to maturity, plant height and number of spike/m2 for two seasons in tow sowing dates (Table 2). Wide range variations in these traits, days to heading ranged from 41 - 70 dyes for 1st SD and 2nd respectively in first season and from 46 - 79 dyes for 1st SD and 2nd respectively in second season, days to maturity ranged from 73 - 102 dyes for 1st SD and 2nd respectively in first season and from 76 - 116 dyes for 1st SD and 2nd respectively in second season, plant height from 44 - 87 cm for 1st SD and 2nd respectively in first season and from 66 - 100 cm for 1st SD and 2nd respectively in second season and number of spike/m2 ranged from
Table 1. Mean, minimum and maximum values for yield and yield components of the 250 elite spring bread wheat genotypes grown in two sowing dates in GRF in Wad Medani, Sudan for seasons 2016/17 and 2017/18.
Sig Level = significant level and *** = very highly significant at P < 0.001.
Table 2. Mean, minimum and maximum values for morphological traits of the 250 elite spring bread wheat genotypes grown in two sowing dates in GRF in Wad Medani, Sudan season 2016/17.
Sig Level = significant level and *** = very highly significant at P < 0.001.
202 - 586 for 1st SD and 2nd respectively in first season and from 280 - 595 for 1st SD and 2nd respectively in second season. This variation depends on genotypes. Very large variation in the degree of response of bread wheat to heat stress was observed for various traits including days to heading and maturity plant height [11] .
3.2. Association Mapping
Genome wide association analysis was performed using 15 k SNPs markers for yield and traits in 231 bread wheat genotypes (13 missing for genotypic data) grown for two seasons. The associations between genotype and phenotype were worked out using the mixed linear models (MLM) in TASSEL. The significance between 15 K SNPs and traits was identified using a model with a Q matrix as the fixed effect and a kinship matrix as the random effect (Q + K) as well as a model with a principal component (PC) matrix as the fixed effect and a kinship matrix as the random effect (PCA + K), at a threshold of P < 0.001.
3.2.1. Population Structure
STRUCTURE analysis showed that the number of sub-populations ranged from zero to ten when using the diploid genotyping model. Population structure analysis of the genotypes indicated that the likely number of sub populations was four (Figure 2). The cluster analysis also provided evidence for the presence of four subpopulations (Figure 3). The similarity between clusters and the subgroups of model-based analysis in STRUCTURE indicated that higher genetic
Figure 2. Population structure for 231 genotypes in a spring wheat association mapping.
Figure 3. Cluster for 231 genotypes in a spring wheat association mapping panel.
variance among population. Population structure caused by admixture of different sub-population should be in consideration because GWAS has become more complicated due to complex breeding history of germplasm along with limited gene flow between different sub populations; this can be reduced by taking population structure as covariant in GWAS studies [13] [14] .
3.2.2. Linkage Disequilibrium (LD)
In Genome-Wide Association Studies (GWAS), the concept of linkage disequilibrium is important as it allows identifying genetic markers that tag the actual causal variants. LD was estimated as squared allele frequency correlations (R2), and only P-values ≥ 0.01 for each pair of loci were considered significant analysis using TASSEL software. LD decay distance increases with minor allelic frequency, the increase is very significant at minor allelic frequency ≥ 0. In this study LD decay rate was determined on chromosome level, however, a clear and rapid decline in LD with distance were observed (Figure 4) also showed (Figure 5). LD
Figure 4. Linkage disequilibrium (r2) plot of all chromosomes of the genome in 231 genotypes of a spring wheat association mapping panel. Plot of pair-wise single-nucleotide polymorphism LD r2 values as a function of intermarker map distance (Mbp). The blue curve represents the model fit to LD decay. The green line represents the 51.26 (Mbp) confidence interval for the quantitative trait loci regions in which LD r2 = 0.2, beyond which LD is likely due to linkage.
Figure 5. Linkage disequilibrium (r2) of all chromosomes with highest significant comparison of marker (LD at P < 0.001) for 231 genotypes in across seasons grown at GRF, Wad Medani, Sudan.
decay rate evaluation was conducted at the genome and individual chromosome level. The genome level LD decayed below r2 = 0.2 at about 6 cM for the A genome, 5 cM for the B genome and 16 cM for D genome (Figures 6-8, respectively). The results showed extended LD in the D genome than in the A and B genomes. Similar finding using 394 genetically mapped SNP markers on 478 spring and winter wheat cultivars and reported more extended LD in the D genome than in the A and B genomes could be the introduction of new haplotypes, which can increase the extent of LD from Aegilops tauschii (D genome donor) into the D genome of hexaploid wheat germplasm through synthetic wheats [15] . Extent of LD depends upon genetic distance and it determines the number of markers required for association mapping, the low LD population need higher
Figure 6. Linkage disequilibrium (r2) plot of A genome.
Figure 7. Linkage disequilibrium (r2) plot of B genome.
Figure 8. Linkage disequilibrium (r2) plot of D genome. The green line represents the 6 (CM) for A genome 5 (CM) for B genome and 16 (CM) for D genome confidence interval for the quantitative trait loci regions in which LD r2 = 0.2, beyond which LD is likely due to linkage.
marker densities and high LD population require low marker densities [16] . This indicates that fewer markers are needed for GWAS in the D genome than for the A and B genomes.
3.2.3. Marker-Traits Associations (MTAs) of Grain Yield
A total of 282, 355 and 475 significant (P ≤ 0.05) markers associated with grain yield were observed in the first, second and across seasons, respectively. MTAs for grain yield were highly significantly associated with the chromosomes 6B, 4B and 6A in the first season, were 6B, 6A, 7A and 3A in the second season and 6B and 6A in the across seasons (Table 3). Chromosomes 6B and 6A were stable in
Table 3. Top markers with high association with yield of 231 wheat genotypes evaluated under heat stress in each season and across seasons at GRF, Wad Medani, Sudan.
Chro = chromosome number.
all seasons. This result was in agreement with [17] reported that chromosomes 6B and 6A associated with yield under temperature irrigated environments. Chromosome 6B was significantly associated with yield under drought stress [18] . Manhattan plots of the GWAS results are shown in Figures 9-11 for yield in the first, second and across seasons, respectively. Marker associated with grain yield in chromosomes 6B, 4B and 6A in the first season, 6B and 6A in the second season and 6B, 6A and 2B across seasons. The grain yields were highly significantly (P ≤ 0.001) associated with the markers 6B/Kukri_c27662_645 in each season and across seasons (Table 3), with markers 6B/. BobWhite_c23771_525, 6B/Excalibur_c32219_491, 6B/RAC875_c31299_13021302, 6B/BS00046264_51, 6A/wsnp_Ex_rep_c69373_68312188 and 6A/Tdurum-contig12123_1650 in the first season and across seasons and with markers 6BKukri_c27662_675 and 6A/BobWhite_c15802_72 in the second season and across seasons. Chromosome 6B was significantly associated with yield under rain-fed conditions [19] .
Figure 9. Manhattan plots of the GWAS results using 15 K SNPs markers in grain yield kg/ha for 231 wheat genotypes in first season grown at GRF, Wad Medani, Sudan.
Figure 10. Manhattan plots of the GWAS results using 15 K SNPs markers in grain yield kg/ha for 231 wheat genotypes in second season.
Figure 11. Manhattan plots of the GWAS results using 15 K SNPs markers in grain yield kg/ha for 231 genotypes in across seasons grown at GRF, Wad Medani, Sudan.
3.2.4. MTAs of Morphological Traits in Optimum and Late Sowing Dates
Days to heading (DH) was associated with two markers located on chromosome 3B as well as with three chromosomes 4B, 6B, and 5A in optimum sowing dates in both seasons. Chromosome 3B was significantly associated with DH under rain-fed conditions [20] . Several QTL were identified by multi-environment single trait analysis for DH on some chromosome including 6B (Lopes et al., 1013). QTL that affect flowering time in wheat have been reported on chromosomes 5A, and 3B by [21] . In the late sowing dates DH associated with markers on chromosomes 1A and 1B (Table 4), Several chromosomes associated with days to maturity (DM) in optimum and late sowing dates but markers on chromosomes 2B associated in late in two seasons. Earlier studies have reported QTLs for maturity on chromosome 2B in a parental popula6-1-7tion [22] . The MTAs for DH and DM on chromosome 2B corresponded to a previously identified chromosomal region [23] . Plant height was associated with markers on chromosomes 6B, 1A, and 2A in optimum sowing dates in two seasons. Chromosome 1A was significantly associated with plant height under irrigated environments [17] and chromosome 2A is most likely the QTL reported by [24] . Plant height in late sowing dates was associated with 6B in two seasons. Many markers on chromosomes were associated with number of seed/spike but the chromosome 2A was stable under optimum and late. Chromosome 2A, and most of the associated markers were located near QTL for multiple traits such as number of spikelets per spike and number of grains per spike [25] [26] . Several chromosomes like 6A, 2A, 7A, 3B, 4B, 5B, 6B, 1B, 4D and 6D associated with number of spikes/m2 [19] found QTL on chromosomes 1B, 2A, 3B, 4DS, 5D, 6D and 7A for tiller number per m2. The thousand kernel weight associated with four markers located on chromosome 6B in optimum and late sowing dates and also the trait associated with chromosomes 4B, 6D, 3B, 4A, 3A, 1B, 5A, 2D, 7A and 2B. QTL for kernel weight on chromosomes 6B, 2B, 2D, 3B, 3A, 5A and 1B [19] .
Table 4. Chromosomes with high association with agronomic traits of 131 wheat genotypes evaluated under heat stress seasons 2016/17 and 2017/18 at GRSF, Wad Medani, Sudan.
DH = days to heading, DM = days to maturity, PH = plant height, Spi/m2 = spikes/m2, Se/Spi = number of seed/spike, TKW = thousand seed height, Bio kg/ha = biomass kg/ha and HI = harvest index.
Multi-environment single trait QTL analysis for kernel weight detected significant QTL on chromosomes 6B and 6D [27] . Biomass was associated with chromosome 5A and was stable in optimum and late, [24] reported that chromosome 5A associated with biomass under drought and heat conditions. Chromosomes 2B and 7A were associated with harvest index in optimum in two seasons while, in late the trait was associated with chromosomes 5B, 6B and 7B. QTL for harvest index on chromosome 5B [28] .
3.2.5. Yield Covariate with Days to Heading
Heading time is an important trait for adaptation of wheat to target environments [29] . It is regulated by three well-characterized groups of loci namely, vernalization (VRN), photoperiod (Ppd) and earliness (Es) [30] . The VRN genes are dominant for the spring growth habit whereas the VRN2 genes are dominant for the winter growth habit [29] . Days to heading as coverable to identify QTLs which is independent and more reliable reported by [5] . When used as covariate of yield days to heading in this study, markers on chromosome 2B associated with yield and were stable in the first, second and across seasons (Table 5). The marker of (Tdurum_contig71387_65) on chromosome 2B more stable in the first, second and across season and markers (Tdurum_contig10994_617 and RFL_Contig1483_1765) were found on chromosome 2B in the first season and
Table 5. Chromosomes with high association yield covariate with days to heading in wheat genotypes evaluated in seasons 2016/17 and 2017/18 and across seasons at GRSF, Wad Medani, Sudan.
Chro = chromosome number.
across season. Days to heading was one trait specific SNP on chromosome 2B presented on unmapped chromosome were found under heat and drought stress [30] . In the second season markers on chromosomes 7A, 6A, 2B, 6B and 1A were found. MTAs on 6B were associated with yield. QTL cluster for yield with heading were detected on chromosome 2B, 7A, 5A, 6B, and 1A [29] . Yield under stress is mainly controlled by the QTLs controlling heading and maturity time because the duration of the crop life cycle affects the timing and intensity of the stress experienced by the plants [31] .
4. Conclusion
The results of phenotypic variations for all the traits studied, indicated that the variability among the genotypes considered in this study. Results of genome-wide association study (GWAS) indicated that several chromosomes were detected for yield and related traits in this population. The most important chromosomes findings were:
· The chromosomes 6B and 6A were important for grain yield in two seasons and in across season.
· More than 10 markers on chromosomes were highly associated with grain yield at least at two seasons but marker (Kukri_c27662_645) was stable in the two seasons and across seasons. The markers (wsnp_Ex_c4125_7456528, BobWhite_c23771_525, Excalibur_c32219_491, wsnp_Ex_rep_c69373_68312188, RAC875_c31299_1302, RAC875_c31299_1302, Tdurum_contig12123_1650, BS00046264_51, BobWhite_c15802_72, RAC875_c14887_829 and wsnp_Ex_c3025_5587183) are associated with grain yield in this study can be important targets for marker-based breeding and fine mapping of functional genes in the future.
· Using yield as covariate with days to heading, chromosome 2B was associated with yield and more stable in the two seasons and across seasons.
· The study showed that there were more than 10 genotypes showed good performance with high yield potential and heat tolerance and the pedigree of the best 10 genotypes are (WBLL1*2/BRAMBLING//SHAMIEKH-4 (27), ZARAFA-5/
FLAG
-6//MILAN/PASTOR (37), CHAM-10/3/PASTOR//MUNIA/ALTAR 84/4/PFAU/MILAN (242), HUBARA-15/ICBW 206971//(4) EALME4SA–464 (139), ABU-REYAA-1/LEITH-1 (192), CHAM-10/3/PASTOR//MUNIA/ALTAR84/4/PFAU/MILAN (152), DEBEIRA//MILAN/ASTOR/4/URES/BOW//OPATA/3/HD2206/HORK'S' (20), WAXWING*2 (WAXWING*2/KUKUNA//SHUHA-4/CHAM-8 (94), KAUZ//KAUZ/STAR/3/ETBW4919/4/TEMPORALERAM87*2/KONK and CHAMRAN/4/OPATA/BOW//BAU/3/OPATA/BOW/5/SAMIRA-9.
5. Recommendation
· The chromosomes 6B and 6A were associated for grain yield in two seasons and in across season.
· More than 10 markers on chromosomes highly associated with grain yield, can be use these markers for fine mapping of functional genes in the future.
· 10 genotypes showed good performance with high yield potential and heat tolerance, it recommended to evaluate these genotypes in advanced trials to the release some of them.
The currently identified MTAs should be further validated using other elite set of genotypes so that they can be used for MAS in the breeding program. The high yielding and heat tolerant genotypes identified in this study will be evaluated across key locations in Sudan for potential release in Sudan. Furthermore, they will be used as parents in the breeding.
Acknowledgements
A lot of thanks and appreciation to the International Center for Agricultural Research in the Dry Areas (ICARDA) especially to Dr. Tadesse Wuletaw senior wheat breeder at ICARDA and to Dr. Charles Kleinermann (Head, Capacity Development Unit at ICARDA) for their support to this study. Especial thanks to Mr. Samira Elhanafi and Dr. Sawsan Tawkaz. I also thank to Gezira University (GU) especially to Dr. Siddig Eisa main and prof Abu Elhassan Salih Ibarhim. I also thank to Agricultural Research Corporation (ARC), Sudan and wheat research program especially to Dr. Izzat Sid Ahmed and Wheat Research Program team in (ARC), Sudan for their support and help.