Genome-Wide SNPs Identification and Determination of Proteins Associated with Stress Response in Sorghum (Sorghum bicolor L. Monech) Accessions

Current efforts in sorghum breeding programs are exploiting genotyping-by-sequencing (GBS) data to provide full-genome scans for desired traits. The aim of this study was to use the GBS approach to identify genomic regions associated with stress response in sorghum (Sorghum bicolor L. Moench) accessions. DNA samples of twenty sorghum accessions with different responses to drought were used to prepare GBS libraries for sequencing. SNPs were called using the TASSEL-GBS pipeline; only tags present at least 10 times in the dataset were considered and aligned to the reference genome of Sorghum bicolor. The positions of the identified SNPs were compared with those of genes from the published sorghum transcriptome related to stress response. Overall, 94.40% of tags were aligned and 69,736 putative SNP positions were identified. BLAST searches revealed homology to annotated heat- and drought-tolerance-associated genes that code for ATPases, peroxidase, hydrophobic protein LTI6A, aquaporin SIP2-1, aconitate hydratase and phosphatidylinositol-4-phosphate-5-kinase. The phylogeny of the 20 accessions was constructed from the generated SNP data. Phylogenetic analysis showed that the phenotypically tolerant line (El9) formed a separate cluster; likewise, the accessions HSD8653 and HSD5612 clustered separately, near the cluster that includes most accessions with known post-flowering drought tolerance (HSD7410, HSD10033, HSD8552, GESHEISH and HSD8849). Post-flowering drought-sensitive accessions (Tabat, Wadahmed, HSD6468 and HSD6478) formed a separate cluster, while the sensitive accession HSD9959 and the tolerant accessions HSD8511 and HSD9566 were distributed between the two clusters. Thus, cluster analysis confirmed the variation among accessions in post-flowering drought tolerance.

How to cite this paper: Fadoul, H.E., El Siddig, M.A., Abdalla, A.W.H. and El Hussein, A.A. (2017) Genome-Wide SNPs Identification and Determination of Proteins Associated with Stress Response in Sorghum (Sorghum bicolor L. Monech) Accessions. American Journal of Plant Sciences, 8, 1624-1631. https://doi.org/10.4236/ajps.2017.87112
Received: April 26, 2017. Accepted: June 19, 2017. Published: June 23, 2017.
Copyright © 2017 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/ Open Access

Introduction
Sorghum is the dietary staple food for about 500 million people in over 30 countries and is especially important in semi-arid regions of Africa, along with pearl millet and cassava [1]. It is believed that sorghum yields and resistance to biotic and abiotic factors could be improved by using suitable improved varieties, preferably developed from environmentally suited local landraces and/or wild/weedy varieties [2]. Important abiotic stresses in sorghum are broadly grouped into three categories: water, temperature, and nutrition. Although sorghum possesses excellent drought resistance compared to most other crops, drought stress is the primary factor that reduces sorghum production worldwide [3]. Drought response in sorghum has been classified into two distinct stages: pre-flowering drought response, which occurs prior to anthesis, and post-flowering drought response, which is observed when water limitation occurs during the grain-filling stage. Post-flowering drought stress tolerance is indicated when plants remain green and fill grain normally [4].
Marker-assisted selection was employed to improve the stay-green trait involved in the drought tolerance of sorghum [5]. Recent advances in sorghum genomics, including the availability of an aligned sorghum genome sequence [6] and newer marker techniques such as DArT and single-nucleotide polymorphisms [7], have strengthened the foundation for better integration of molecular marker technologies in applied sorghum breeding programs [8].
GBS is a simple, highly multiplexed system for constructing reduced-representation libraries for the Illumina NGS platform, developed in the Buckler lab [7].
It generates large numbers of SNPs for use in genetic analyses and genotyping [9].
Current efforts in sorghum are exploiting genotyping-by-sequencing data to provide full-genome scans across >100,000 SNP (single-nucleotide polymorphism) loci in each member of a portion of a global reference collection of sorghum germplasm that has been phenotyped with a lysimetric system to explore the allelic variation available for key components of the drought resistance phenotype.
The objective of the present study was to identify significant SNPs of genomic regions associated with stress response in 20 sorghum accessions collected from Sudan, using GBS. These accessions had been classified for drought response from morphological characters previously measured in the field (Table 1).

DNA Extraction
Genomic DNA was isolated from leaf tissues using the CTAB method [10]. The DNA was eluted in 50 µl of nuclease-free water. The samples were quantified using a Qubit fluorometer and a NanoDrop. Sample quality was checked by 0.8% agarose gel electrophoresis.

Preparation of GBS Libraries
Barcoded adapters were added to 100 ng of gDNA, which was then digested with the ApeKI enzyme for 2 h at 75 °C. The barcoded and common adapters were ligated to the sticky ends of the digested DNA with T4 DNA ligase at 22 °C for 60 min, followed by heat inactivation. The ligated products were pooled into two groups of 10 samples each. Each pool of ligated products was size-selected to 700 bp to 1 kb on a 2% gel. The pool was then PCR-amplified to generate the final library pool.

Quantity and Quality Check (QC) of Library on Bioanalyzer
The library pool was analyzed on a Bioanalyzer 2100 (Agilent Technologies) using a High Sensitivity (HS) DNA chip as per the manufacturer's instructions. The library pool was sequenced on the Illumina NextSeq platform.

Data Analysis
SNPs were called from Illumina FASTQ files using the TASSEL-GBS pipeline.
Only 75 bp tags present at least 10 times in the dataset were considered. Reads were aligned to the reference genome with the Bowtie 2 tool, using only the best-hit alignment. SNPs with >95% missing data were discarded. SNPs were filtered by minor allele frequency, as rare SNPs are especially useful for inferring differences between accessions after alignment to the reference genome.
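The filtering steps described above can be illustrated with a minimal Python sketch. It is not the TASSEL-GBS implementation: the function names, the toy tags and genotypes, and the min_maf value are assumptions made for illustration only, since the text does not state the minor allele frequency cutoff. The tag-count threshold (10) and the missing-data threshold (95%) follow the text.

```python
from collections import Counter


def filter_tags(tags, min_count=10):
    """Return {tag: count} for tags seen at least min_count times."""
    counts = Counter(tags)
    return {tag: n for tag, n in counts.items() if n >= min_count}


def missing_rate(calls):
    """Fraction of accessions with no genotype call (None) at one SNP."""
    return sum(call is None for call in calls) / len(calls)


def minor_allele_frequency(calls):
    """Frequency of the second most common allele among called genotypes."""
    alleles = Counter(a for call in calls if call is not None for a in call)
    total = sum(alleles.values())
    if len(alleles) < 2:
        return 0.0  # monomorphic or entirely missing site
    return sorted(alleles.values(), reverse=True)[1] / total


def filter_snps(snps, max_missing=0.95, min_maf=0.01):
    """Keep SNPs that pass the missing-data and MAF cutoffs.

    max_missing=0.95 follows the text; min_maf is a hypothetical value.
    """
    return {
        name: calls
        for name, calls in snps.items()
        if missing_rate(calls) <= max_missing
        and minor_allele_frequency(calls) >= min_maf
    }


# Toy data (hypothetical): tag sequences and diploid calls for four accessions.
tags = ["CAGCA"] * 12 + ["CTGCT"] * 9
snps = {
    "snp1": [("A", "A"), ("A", "G"), None, ("G", "G")],
    "snp2": [("C", "C"), ("C", "C"), ("C", "C"), ("C", "C")],
    "snp3": [None, None, None, None],
}
kept_tags = filter_tags(tags)
kept_snps = filter_snps(snps)
```

In this toy input only the tag seen 12 times and the polymorphic, mostly called SNP survive; the monomorphic and the fully missing sites are dropped.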

Identification of Stress Response-Associated SNPs
To identify stress response genes, published stress response-related genes from a reference sorghum plant were used. A total of 587 genes were obtained from the published sorghum transcriptome [11]. The positions of these 587 genes were compared with the SNP positions obtained from the 20 sorghum samples to identify SNP positions related to stress response genes.
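The position comparison described above amounts to an interval overlap test, which might be sketched as follows. This is an illustrative sketch only: the gene identifiers, chromosome names and coordinates are hypothetical, and the text does not say which tool was actually used for the comparison.

```python
from collections import defaultdict


def index_genes(genes):
    """Group (gene_id, chrom, start, end) records by chromosome."""
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))
    return by_chrom


def snps_in_genes(snps, genes):
    """Return (chrom, pos, gene_id) for every SNP inside a gene interval.

    Coordinates are treated as inclusive on both ends (an assumption).
    """
    by_chrom = index_genes(genes)
    hits = []
    for chrom, pos in snps:
        for start, end, gene_id in by_chrom.get(chrom, []):
            if start <= pos <= end:
                hits.append((chrom, pos, gene_id))
    return hits


# Hypothetical gene intervals and SNP positions.
genes = [("geneA", "chr1", 1000, 2000), ("geneB", "chr2", 500, 900)]
snps = [("chr1", 1500), ("chr1", 2500), ("chr2", 500), ("chr3", 10)]
hits = snps_in_genes(snps, genes)
```

A plain nested loop is adequate at the scale reported here (587 genes against tens of thousands of SNP positions); a sorted index would only matter for much larger gene sets.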

Phylogenetic Analysis
The accession data were used for phylogenetic analysis with the cladogram function in TASSEL-GBS.
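Tree-building methods of this kind typically start from a pairwise distance matrix between accessions. The sketch below shows one common choice, an allele-sharing distance computed over sites called in both accessions. It is an assumption for illustration and not necessarily the distance used by the TASSEL cladogram function; the accession names and genotypes are hypothetical, with genotypes coded as the number of alternate alleles (0, 1 or 2) and None for missing calls.

```python
from itertools import combinations


def pairwise_distance(g1, g2):
    """Mean allele difference (0 to 1) over sites called in both accessions."""
    shared = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not shared:
        return None  # no sites in common, distance undefined
    return sum(abs(a - b) for a, b in shared) / (2 * len(shared))


def distance_matrix(genotypes):
    """Build a symmetric nested-dict distance matrix for all accessions."""
    dist = {name: {name: 0.0} for name in genotypes}
    for a, b in combinations(genotypes, 2):
        d = pairwise_distance(genotypes[a], genotypes[b])
        dist[a][b] = dist[b][a] = d
    return dist


# Hypothetical genotypes at four SNPs for three accessions.
genotypes = {
    "acc1": [0, 0, 2, 1],
    "acc2": [0, 0, 2, None],
    "acc3": [2, 2, 0, 1],
}
dist = distance_matrix(genotypes)
```

A clustering step such as neighbor joining or UPGMA would then take this matrix as input to produce the tree.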

Results and Discussion
The gDNA extracted and purified from the 20 sorghum accessions was digested with the ApeKI enzyme to prepare the GBS library; the library pool was sequenced independently on the Illumina NextSeq platform. Next-generation sequencing of the accessions' samples was performed using 2 × 150 bp chemistry; read statistics for the generated data are shown in Table 2. A total of 61,656,784 raw reads were retained. The reads were not uniformly distributed among accessions: the highest number of reads (4,486,288) was recorded for HSD8163, while the lowest (965,338) was recorded for HSD10033 (Table 2). About three million quality reads were generated per sample. The TASSEL-GBS pipeline was used to identify and call SNPs from the sequenced accessions against a reference genome; as a result, a total of 814,859 distinct tags were identified for all 20 sorghum accessions (Table 3). Overall, 94.40% of tags were aligned to the reference genome of Sorghum bicolor (http://www.ncbi.nlm.nih.gov/genome/?term=sorghum) using the Bowtie 2 tool, and 716,829 cut positions were determined. SNP calling from the aligned tags resulted in 69,736 SNP positions across all 20 accessions. A total of 587 genes associated with stress response were obtained from the published sorghum transcriptome [11].
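As a quick check on the read statistics above, the mean read count per accession and the spread between the best- and worst-covered accessions can be computed directly from the reported totals. The variable names are illustrative; all numbers are those given in the text.

```python
# Values reported in the text (Table 2 summary).
total_reads = 61_656_784
n_accessions = 20
max_reads = 4_486_288  # highest read count, HSD8163
min_reads = 965_338    # lowest read count, HSD10033

mean_reads = total_reads / n_accessions
fold_range = max_reads / min_reads

print(f"mean reads per accession: {mean_reads:,.0f}")
print(f"highest / lowest read count: {fold_range:.2f}")
```

The mean is consistent with the figure of about three million reads per sample, while the more than fourfold spread illustrates the uneven read distribution among accessions.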

Stress Response Gene and Associated SNPs
BLAST searches of the SNP flanking sequences against NCBI databases and the reference genome B35 (BT × 623) returned hits that were considered significant and revealed homology to annotated genes with functions in stress adaptation. These genomic regions were found to code for AAA+ ATPases, peroxidase, hydrophobic protein LTI6A, aquaporin SIP2-1, aconitate hydratase and phosphatidylinositol-4-phosphate-5-kinase.
According to [12], such a spatial arrangement suggests a significant role for linkage group A in both drought stress tolerance and yield in sorghum. Linkage group A harbors many important genes encoding key photosynthetic enzymes, heat shock proteins, cell membrane ATPase and ABA-responsive genes. The SNPs identified in this study correspond to genes with sequence similarity to ATPases, peroxidase, hydrophobic protein LTI6A, aquaporin SIP2-1, aconitate hydratase and phosphatidylinositol-4-phosphate-5-kinase. The plasma membrane proton pump is an enzyme whose activity is altered significantly in response to a number of environmental factors. Besides regulating growth and development, the plasma membrane proton pump also plays a role in the plastic adaptation of plants to changing conditions, especially stress conditions. Adaptation is a complex process. Some of the modifications in plants subjected to abiotic stress are considered adaptive: physiological modifications that are caused by environmental stress and allow plant functions to continue are described by plant physiologists as adaptive [13]. The GBS approach has also been used by Thurber et al. (2015) to detect three genomic regions necessary for temperate adaptation across 1160 sorghum conversion lines, containing the Dw1, Dw2, and Dw3 loci on chromosomes 9, 6, and 7, respectively. Also, [14]
Table 3. Summary of tag alignment against the Sorghum bicolor (sorghum) genome.
To investigate genetic relationships among these accessions, a phylogenetic analysis was performed using the marker data generated by GBS (Figure 1). Generally, the tolerant line El9 formed a separate cluster. HSD8653 and HSD5612 were placed near the cluster that includes most accessions with post-flowering drought tolerance (HSD7410, HSD10033, HSD8552, GESHEISH and HSD8849). Post-flowering drought-sensitive accessions (Tabat, Wadahmed, HSD6468 and HSD6478) formed a separate cluster, while the sensitive accession HSD9959 and the tolerant accessions HSD8511 and HSD9566 were distributed between the two clusters. Thus, cluster analysis suggested the presence of great variation among accessions in post-flowering drought tolerance.
The population structure and genome-wide linkage disequilibrium in 478 spring wheat cultivars from 17 populations across the United States and Mexico were studied by [15] using 1536 SNPs; nine clusters were identified, and the authors concluded that the previously inferred populations share a common genetic identity.
Also, [16] analyzed 194 accessions of sorghum using Diversity Array Technology (DArT) markers; the clustering of the segregating populations reflected the genetic relationships among the parental lines with regard to the variation spanned by the diversity panel.
Seeds of the twenty sorghum accessions used in this study were obtained from the Agricultural Plant Genetic Resources Conservation and Research Center, Agricultural Research Corporation (ARC), Wad Medani, Sudan, and from the Department of Agronomy, Faculty of Agriculture, University of Khartoum, Sudan. The accessions were classified by the Agricultural Research Corporation into three groups (tolerant, intermediate and sensitive) depending on morphological characters previously measured in the field (Table 1).

Figure 1. Phylogenetic analysis of sorghum accessions based on GBS results.

Table 1. Sorghum accessions used in this study.

Table 2. Summary of sequenced raw reads recorded for each sorghum accession.