Identification of Quantitative Trait Loci (QTL) Underlying Protein, Oil, and Five Major Fatty Acids’ Contents in Soybean
1. Introduction
Soybean (Glycine max L. Merr.) is one of the world's major economic crops, serving as a source of edible oil and animal feed. Traditionally, protein and oil have been the primary seed quality traits of interest in soybean, but demand has recently increased for altered oil components, i.e., fatty acids, for biofuel and human consumption. Protein and oil contents in modern elite soybean cultivars range from 34.9% to 39.6% and from 19.0% to 23.5%, respectively [1,2]. Oil quality depends on fatty acid composition, which affects the nutritional value, flavor and stability of soybean oil. The term “total fatty acid content” refers to the sum of the five major fatty acid components found in soybeans, namely palmitic (C16:0), stearic (C18:0), oleic (C18:1), linoleic (C18:2) and linolenic (C18:3) acids [3]. The human body can synthesize palmitic, stearic and oleic acids through metabolism. A low percentage of palmitic acid is desirable in edible soybean oil because it may reduce the risk of coronary disease [4,5]. A diet high in oleic acid, such as one rich in olive oil, reduces cholesterol, which can have a positive effect on arteriosclerosis and heart disease [6]. Another use for modified soybean oils is the production of biodiesel, which has emerged as a potential renewable energy source that could help alleviate environmental concerns. A high percentage of oleic acid is desirable in biodiesel because it enhances oxidative stability [7]. However, high percentages of linolenic and linoleic acids are desirable to increase the energy content of biodiesel and lower its gelling temperature. Humans cannot synthesize linoleic and linolenic acids, so these two essential fatty acids must be obtained from food. Linolenic acid is also essential for photosynthesis and pollen development in plants, so it cannot be eliminated from the seed oil [3]. Rancidity and short shelf life in soybean oil are due to linolenic acid, which is easily oxidized at its three double bonds [3].
Separate breeding efforts have been initiated to alter the fatty acid profile of soybean oil for human consumption with high oleic acid and low palmitic and linolenic acids and for biodiesel with the reverse profile [5].
Protein and oil contents are polygenic quantitative traits that result from interactions between multiple genes and the environment [8]. Gelderman (1975) [9] first referred to such polygenes by the acronym QTL (Quantitative Trait Loci), where a QTL is a region of the genome associated with an effect on a quantitative trait. Abundant molecular markers covering the whole soybean genome are necessary to identify significant QTL and candidate genes for seed quality traits. Currently, SoyBase (2013) [10] contains more than 250 QTL each for seed protein and oil, mapped in many different populations and environments. Recently, we identified two QTL for protein and six QTL for oil in the recombinant inbred line (RIL) population of PI 438489B by “Hamilton” [11]. SoyBase (2013) [10] also contains more than 245 QTL for the fatty acid components, viz. palmitic, stearic, oleic, linoleic, and linolenic acids, mapped in many different populations and environments.
Different types of molecular markers have been used to construct linkage maps and identify QTL in soybean, such as single nucleotide polymorphisms (SNP), restriction fragment length polymorphisms (RFLP), simple sequence repeats (SSR), amplified fragment length polymorphisms (AFLP), and random amplified polymorphic DNA (RAPD) [11-19]. Here, a genetic linkage map of soybean built from SNP markers genotyped with the Illumina Infinium BeadChip array was used. The objective of this study was to identify QTL associated with protein, oil and the five major fatty acids in the MD96-5722 by ‘Spencer’ recombinant inbred line (RIL) population.
2. Materials and Methods
2.1. Plant Material and Seed Analysis for Protein, Oil, and Fatty Acids
Ninety-two F5:7 RILs, developed by crossing the breeding line MD96-5722 with the cultivar ‘Spencer’, were used to generate phenotypic and genotypic data. The population was grown in a field on the Fayetteville State University (FSU) campus, Fayetteville, NC, in 2012 with a row spacing of 25 cm, giving a plant density of 160,000 plants/ha. No additional fertilizers or insecticides were used. The experiment was conducted from May 1 to October 4, 2012. Seeds at the harvest maturity stage were analyzed for protein, oil, and fatty acids. About 25 g of seed from each line was ground using a Laboratory Mill 3600 (Perten, Springfield, IL), and the ground samples were analyzed by near infrared reflectance using a diode array feed analyzer AD 7200 (Perten, Springfield, IL) [20-22]. Perten’s Thermo Galactic Grams PLS IQ software, initially developed by the University of Minnesota, was used for calibrations. Protein and oil were analyzed on a seed dry matter basis, and fatty acids were analyzed on a total oil basis [21,23].
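As a quick consistency check on the planting geometry, the stated row spacing and plant density together imply a within-row plant spacing. The sketch below is only arithmetic on those two reported figures; the within-row spacing it derives is an inference and is not a value reported in the study, and the function name is illustrative.

```python
# Arithmetic check of the planting geometry described above.
ROW_SPACING_M = 0.25       # 25 cm between rows (from the text)
PLANTS_PER_HA = 160_000    # stated plant density
M2_PER_HA = 10_000         # square metres in one hectare

def within_row_spacing_m(row_spacing_m: float, plants_per_ha: float) -> float:
    """Return the in-row distance between plants implied by the density."""
    area_per_plant_m2 = M2_PER_HA / plants_per_ha
    return area_per_plant_m2 / row_spacing_m

spacing = within_row_spacing_m(ROW_SPACING_M, PLANTS_PER_HA)
print(f"Implied within-row spacing: {spacing:.2f} m")
```

Under these two figures each plant occupies 0.0625 m², which corresponds to plants about 25 cm apart within the row.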
2.2. Genetic Map and QTL Identification
The MD96-5722 by ‘Spencer’ population was genotyped with 5,376 SNPs using the SoySNP6K Illumina Infinium BeadChip array. A genetic linkage map [24] was constructed with JoinMap 4 (Kyazma BV, Wageningen, Netherlands) [25]. Composite interval mapping (CIM) was used to detect QTL from the genotypic and phenotypic data using WinQTLCart 2.5 software (http://statgen.ncsu.edu/qtlcart/WQTLCart.htm) [26]. WinQTLCart was run using Model 6 with four parameter settings: forward and backward stepwise regression, a 10 cM window size, a 1 cM step size and five control markers [26]. The LOD significance threshold was determined by permutation tests with 1000 iterations.
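The idea behind a permutation-derived threshold can be sketched as follows. This is a minimal illustration under stated assumptions, not the WinQTLCart procedure: a single-marker regression LOD stands in for CIM Model 6, the data are simulated (92 lines as in the study, 50 hypothetical markers), and the 5% genome-wide significance level is an assumption because the text does not state the level used.

```python
import math
import random

def max_marker_lod(centered_markers, marker_ss, phenotype):
    """Largest single-marker regression LOD across all markers."""
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    y = [v - mean_y for v in phenotype]
    syy = sum(v * v for v in y)
    best = 0.0
    for g, sxx in zip(centered_markers, marker_ss):
        sxy = sum(a * b for a, b in zip(g, y))
        r2 = sxy * sxy / (sxx * syy)           # squared marker-trait correlation
        best = max(best, -(n / 2.0) * math.log10(1.0 - r2))
    return best

def permutation_threshold(markers, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide LOD threshold from the permutation null of the maximum LOD."""
    rng = random.Random(seed)
    centered = []
    for m in markers:
        mean_g = sum(m) / len(m)
        centered.append([g - mean_g for g in m])
    marker_ss = [sum(g * g for g in m) for m in centered]
    null_max = []
    shuffled = list(phenotype)
    for _ in range(n_perm):
        rng.shuffle(shuffled)                  # breaks any marker-trait association
        null_max.append(max_marker_lod(centered, marker_ss, shuffled))
    null_max.sort()
    n_exceed = round(alpha * n_perm)           # permutations allowed above the threshold
    return null_max[n_perm - n_exceed - 1]

# Simulated stand-in data: alleles coded 0/1, trait drawn from a normal distribution.
data_rng = random.Random(1)
markers = [[data_rng.randint(0, 1) for _ in range(92)] for _ in range(50)]
phenotype = [data_rng.gauss(0.0, 1.0) for _ in range(92)]
threshold = permutation_threshold(markers, phenotype)
print(f"Empirical 5% genome-wide LOD threshold: {threshold:.2f}")
```

Taking the maximum LOD over the whole genome in each permutation is what makes the resulting threshold genome-wide rather than per-marker.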
3. Results
Table 1 shows that protein, oil and the five major fatty acid contents were approximately normally distributed in the F5:7 seeds. Both the skewness and kurtosis values for these traits were <1.00. Variation among the RILs was narrow for protein (CV = 3.57%) and oil (CV = 4.74%) but wider for the fatty acid components. For example, the CV was 12.77% for oleic acid and 15.88% for linolenic acid.
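The summary statistics named above can be computed as in the following sketch. It assumes moment-based skewness and excess kurtosis and a sample standard deviation for the CV; the text does not say which estimators were used, and the protein values shown are hypothetical, not data from Table 1.

```python
import statistics

def describe(values):
    """Mean, CV (%), skewness and excess kurtosis of one trait."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)                      # sample standard deviation
    m2 = sum((v - mean) ** 2 for v in values) / n      # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n      # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n      # fourth central moment
    return {
        "mean": mean,
        "cv_percent": 100.0 * sd / mean,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }

# Hypothetical protein contents (%) for a handful of lines, for illustration only.
protein = [36.1, 37.4, 35.8, 38.0, 36.9, 37.2, 36.5, 37.8]
summary = describe(protein)
print({name: round(value, 2) for name, value in summary.items()})
```

Skewness and excess kurtosis are both zero for a normal distribution, which is why small absolute values are read as evidence of approximate normality.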
Composite interval mapping (CIM) was used to identify candidate QTL. The QTL names for each trait, linkage group (LG)/chromosome (Chr_), QTL positions with SNP marker intervals, LOD scores, r2 percentages and additive values are presented in Table 2. Figure 1 also presents the positions of the QTL on each linkage group. Eight linkage groups, namely LG-N (Chr_3), LG-A1 (Chr_5), LG-K (Chr_9), LG-F (Chr_13), LG-B2 (Chr_14), LG-E (Chr_15), LG-J (Chr_16), and LG-G (Chr_18), contained 28 QTL for protein, oil and the five major fatty acid components. No QTL was found on the other 12 linkage groups of soybean.

Table 1. Means and ranges of protein, oil, palmitic, stearic, oleic, linoleic and linolenic acid contents (% on a dry basis) of the soybean breeding line MD96-5722, the cultivar ‘Spencer’ and their F5:7 RIL population.

Table 2. Chromosomal locations and parameters associated with the quantitative trait loci (QTL) for protein, oil and major fatty acid components in the MD96-5722 by ‘Spencer’ recombinant inbred line population of soybean.
One significant QTL for protein (qPro001) was identified on LG-B2 (Chr_14) with a LOD score of 4.13. Eleven QTL (qOil001 to qOil011) were identified for oil content. Of these, two were on LG-N (Chr_3) and one each was on LG-A1 (Chr_5), LG-K (Chr_9), and LG-F (Chr_13). Of the remaining six, three were identified on each of LG-B2 (Chr_14) and LG-J (Chr_16). The LOD scores for the oil QTL ranged from 2.51 to 4.67.
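To give a feel for the scale of these LOD scores, the sketch below converts a LOD score into the likelihood ratio it represents and into a rough proportion of phenotypic variance explained. The conversion R2 = 1 - 10^(-2 LOD / n) is an assumption borrowed from single-QTL regression; it ignores the control markers used in CIM, so its output need not match the r2 values in Table 2. The function names are illustrative.

```python
def lod_to_odds(lod: float) -> float:
    """Likelihood ratio (QTL present vs. absent) implied by a LOD score."""
    return 10.0 ** lod

def lod_to_r2(lod: float, n_lines: int) -> float:
    """Approximate variance explained under a single-QTL regression model."""
    return 1.0 - 10.0 ** (-2.0 * lod / n_lines)

N_LINES = 92  # number of RILs in the population

# LOD scores quoted in the text: the protein QTL and the range for the oil QTL.
examples = [("qPro001", 4.13), ("lowest oil LOD", 2.51), ("highest oil LOD", 4.67)]
for name, lod in examples:
    odds = lod_to_odds(lod)
    r2 = lod_to_r2(lod, N_LINES)
    print(f"{name}: odds about {odds:,.0f}:1, approximate R2 {r2:.1%}")
```

Because the approximation depends on population size, the same LOD score corresponds to a larger share of variance in a small population such as this one than it would in a larger one.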
A total of 16 QTL were identified for fatty acids: two for palmitic acid (qPal001 on LG-N/Chr_3 and qPal002 on LG-G/Chr_18), one for stearic acid (qSte001 on LG-J/Chr_16), three for oleic acid (qOle001 on LG-F/Chr_13; qOle002 and qOle003 on LG-J/Chr_16), three for linoleic acid (qLinl001 on LG-N/Chr_3; qLinl002 and qLinl003 on LG-J/Chr_16), and seven for linolenic acid (qLinn001 and qLinn002 on LG-F/Chr_13; qLinn003 on LG-B2/Chr_14; qLinn004 and qLinn005 on LG-E/Chr_15; qLinn006 and qLinn007 on LG-J/Chr_16). The results of the QTL analysis suggest that the parental lines MD96-5722 and ‘Spencer’ each carry beneficial alleles at different loci controlling protein, oil and fatty acid concentrations.
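The QTL counts reported in this section can be cross-checked with a short tally. The dictionary below is transcribed from the counts per trait and linkage group given in the text; only the bookkeeping is new, and the variable names are illustrative.

```python
from collections import Counter

# QTL counts per trait and linkage group, transcribed from the Results text.
QTL_BY_TRAIT = {
    "protein":   {"B2": 1},
    "oil":       {"N": 2, "A1": 1, "K": 1, "F": 1, "B2": 3, "J": 3},
    "palmitic":  {"N": 1, "G": 1},
    "stearic":   {"J": 1},
    "oleic":     {"F": 1, "J": 2},
    "linoleic":  {"N": 1, "J": 2},
    "linolenic": {"F": 2, "B2": 1, "E": 2, "J": 2},
}

# Number of QTL per trait and per linkage group.
per_trait = {trait: sum(groups.values()) for trait, groups in QTL_BY_TRAIT.items()}
per_group = Counter()
for groups in QTL_BY_TRAIT.values():
    per_group.update(groups)

total_qtl = sum(per_trait.values())
fatty_acid_qtl = total_qtl - per_trait["protein"] - per_trait["oil"]

print("QTL per trait:", per_trait)
print("QTL per linkage group:", per_group.most_common())
print("Total:", total_qtl, "| fatty acids:", fatty_acid_qtl, "| linkage groups:", len(per_group))
```

Under this transcription the totals agree with the text (28 QTL, 16 of them for fatty acids, on eight linkage groups), and LG-J (Chr_16) carries the most, with 10.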
4. Discussion
Linkage mapping has assisted soybean breeding programs extensively by identifying QTL for protein and oil contents across a range of genetic backgrounds and environments [20,27-29]. Various soybean lines, such as wild and cultivated soybeans and genotypes from different countries, have also been used to explore seed protein QTL [26,27,29]. Here, a QTL for protein was identified on LG-B2 (Chr_14) that overlapped with one of the oil QTL (qOil006) on the same linkage group (Table 2, Figure 1). This overlap, or pleiotropic effect, of protein and oil QTL within homeologous regions suggests a rearrangement of the QTL in homeologous pairs that may have occurred through a duplication event [30,31]. Previously, six QTL for oil content (qOil001 - qOil006) were identified with SNPs in the PI 438489B by Hamilton population [11]; among these six, two QTL (qOil007 and qOil008) were identified on LG-B2 (Chr_14). However, their positions differed from those of the QTL identified in this study. Another recent study, using SSR markers in a RIL population of “OAC Wallace” and “OAC Glencoe”, identified 11 QTL for oil concentration on nine different chromosomes, specifically 1 (LG-D1a), 7 (LG-M), 9 (LG-K), 12 (LG-H), 13 (LG-F), 14 (LG-B2), 16 (LG-J), and 17 (LG-D2) [32]. Many of these linkage groups also carried oil QTL in our study, but the cM positions are different.
The QTL analysis detected a total of 16 QTL for fatty acids (Table 2, Figure 1). Among these, two QTL associated with palmitic acid were identified on LG-N (Chr_3) and LG-G (Chr_18) with LOD scores of 3.21 and 4.80, respectively. Reinprecht et al. also reported a QTL for palmitic acid on LG-N, but at a different position [33]. Palmitic acid QTL that were not identified here have been reported earlier on LG-D2, LG-K, and LG-L by Moongkanna et al., on LG-A1 and LG-B2 by Hyten et al., on LG-J and LG-M by Diers and Shoemaker, on LG-A1 and LG-M by Li et al., and on LG-D1b and LG-A2 by Panthee et al. [3,34-37]. One QTL, with associated SNP markers, was found for stearic acid content on LG-J (Chr_16) (Table 2, Figure 1). Previously, Diers and Shoemaker [35] and Panthee et al. [37] also reported QTL for stearic acid composition on LG-J (Chr_16). Other groups identified such QTL on different linkage groups; for example, Panthee et al., Brummer et al., and Spencer et al. identified them on LG-B2 [37-39], and Moongkanna et al. located QTL on LG-A1, C2, E, and O [3]. Further stearic acid QTL were mapped on LG-C2 and L by Hyten et al., on LG-F, G, and M by Reinprecht et al., and on LG-B2 by Spencer et al. [33,34,39].
Three QTL for oleic acid were identified on LG-F (Chr_13) and LG-J (Chr_16) with LOD scores ranging from 3.39 to 3.60 (Table 2, Figure 1). These oleic acid QTL differ from those in many other reports. For example, Moongkanna et al. located 8 QTL linked to high oleic acid percentage on LG-A1, G and H [3]; Panthee et al. reported a QTL on LG-E [37]; Monteros et al. reported 2 QTL on LG-A1, D2 and G [40]; Hyten et al. (2004) reported 2 QTL on LG-D1b and L [34]; and Bachlava et al. reported QTL on LG-I, L and O [41]. Three QTL underlying linoleic acid content were identified on two different linkage groups, LG-F (Chr_13) and LG-J (Chr_16) (Table 2, Figure 1). Hyten et al. identified one QTL at a different position but on the same LG-F; they also detected another QTL on LG-L [34]. No other past studies have so far reported linoleic acid QTL at similar positions or on the same linkage groups. For example, Moongkanna et al. identified seven SSR markers associated with linoleic acid on LG-A1, G, and H, and Panthee et al. found a QTL on LG-E [3,37]. Based on our CIM analysis, 7 linolenic acid QTL were identified, located on LG-F (Chr_13), LG-B2 (Chr_14), LG-E (Chr_15), and LG-J (Chr_16) (Table 2, Figure 1). QTL for linolenic acid were reported earlier on LG-E, G, and H by Moongkanna et al. [3], on LG-B2 by Spencer et al. and Byrum et al. [39,42], on LG-C2, E, H, and O by Shibata et al. [43], on LG-E and K by Diers and Shoemaker [35], on LG-E and G by Panthee et al. [37], on LG-F and L by Hyten et al. [34] and on LG-E and K by Reinprecht et al. [33]. The linolenic acid QTL found here on LG-E and LG-F are in agreement with some of the earlier reports cited above.
Some of the 28 QTL identified here were on linkage groups or chromosomes in common with past studies, but the positions of the QTL were different. This may be because many of the previous QTL were discovered through simple linear regression methods (SIM) rather than composite interval mapping (CIM), and because those studies used different types of markers and different populations grown in different environments. In past studies, few identical QTL were identified even when the same experimental population was grown in different environments by different researchers or in different years [44,45]. Although a more saturated genetic linkage map would improve the chance of identifying more QTL for protein, oil and fatty acids, especially within the gaps in our current genetic map, multi-location experiments are needed to determine the environmental stability of the QTL.
Acknowledgements
The authors would like to thank the Department of Defense (DoD) for funding this work through grant # W911NF-11-1-0178 to M.A.K. and S.K. We thank Ms. Pam Ratcliff and the rest of the undergraduate student crew at FSU for taking care of the plants in the greenhouse and field, and Sandra Mosley at USDA-ARS, Stoneville, MS, for lab assistance with the protein, oil and fatty acid analyses. This research was partially funded by United States Department of Agriculture, Agricultural Research Service project number 6402-21220-012-00D. The US Department of Agriculture (USDA) prohibits discrimination in all its programs and activities on the basis of race, color, national origin, age, disability, and where applicable, sex, marital status, familial status, parental status, religion, sexual orientation, genetic information, political beliefs, reprisal, or because all or part of an individual’s income is derived from any public assistance program. (Not all prohibited bases apply to all programs.) Persons with disabilities who require alternative means for communication of program information (Braille, large print, audiotape, etc.) should contact USDA’s TARGET Center at (202) 720-2600 (voice and TDD). To file a complaint of discrimination, write to USDA, Director, Office of Civil Rights, 1400 Independence Avenue, S.W., Washington, D.C. 20250-9410, or call (800) 795-3272 (voice) or (202) 720-6382 (TDD). USDA is an equal opportunity provider and employer.