A Differentially Expressed Gene from a High Oil Producer Cultivar of Castor Bean (Ricinus communis) Is Involved in the Biosynthesis of Ricinoleic Acid

Ricinus communis (castor bean) is a non-edible oilseed plant cultivated worldwide for the high castor oil content of its seeds and the many industrial uses of that oil. Increasing its oil content and production efficiency is difficult, which makes it necessary to understand the molecular mechanisms underlying oil synthesis in the seed. Here, a combined analysis of protein-protein interaction networks was performed using public data on differential gene expression in castor bean seeds at different stages of development. From this analysis, four key enzymes in the polyunsaturated fatty acid pathways were selected, and the expression of their genes was subsequently quantified during seed development in a Colombian cultivar that produces high amounts of oil and contrasted with a lower-producing cultivar. The gene encoding FAH12 was differentially expressed in the early stages of seed development in the high oil-producing cultivar and carries the amino acid differences A242V and Q319H. The analysis presents this gene as one of those responsible for early ricinoleic acid synthesis, making it a candidate for use in crop genetic improvement programs to increase the oil content of castor bean.

Germplasm collections are among the most important sources of genetic resources in the world; with them it is possible to study genetic profiles that explain the phenotypic plasticity of plants [9]. In recent years, the Colombian Agricultural Research Corporation (in Spanish, Corporación Colombiana de Investigación Agropecuaria or AGROSAVIA) has advanced in the evaluation of wild-type castor bean materials with high oil content and adaptation to tropical climates, creating a germplasm bank suitable for studying genes related to the biosynthesis of fatty acids under tropical conditions [10].
The biosynthesis of ricinoleic acid in castor bean starts with the biosynthetic pathway of triacylglycerol (TAG) in the plastids and endoplasmic reticulum (ER), a common pathway in most flowering plants that have been studied [11].
In a subsequent step, the enzyme oleate Δ12-hydroxylase (FAH12, EC 1.14.13.26), located in the ER membrane [12], catalyzes the direct hydroxylation of oleic acid (C18:1) to ricinoleic acid (C18:1-OH) [13]. The gene encoding the castor bean FAH12 enzyme has already been isolated and characterized [14], and Arabidopsis plants subsequently transformed with it produced low levels of ricinoleic acid [15]. Several studies that expressed genes related to seed fatty acid production pathways in model plants have likewise obtained new and unusual fatty acids, information considered the basis for new studies in non-model plants [16]. This suggests that, in addition to key genes, other genes such as transcription factors or accessory enzymes are also needed for the large-scale production of ricinoleic acid in seeds [17]. Evidence of this was found when researchers studied diacylglycerol acyltransferase (DGAT, EC 2.3.1.20) isoenzymes in castor bean in vitro plants and identified that diricinolein and C18:1-OH-CoA are needed as substrates [18]. When the genes encoding these enzymes were introduced into Arabidopsis expressing the oleate Δ12-hydroxylase, the content of C18:1-OH in seeds increased by approximately 30% [19].
Therefore, it is necessary to increase knowledge about the gene expression associated with the biosynthetic routes of fatty acids and their regulation. Such knowledge could provide a better understanding of the mechanisms that control oil synthesis and help increase its production efficiency [20]. Nowadays, high-throughput sequencing technologies make it possible to study the gene expression of complete transcriptomes in plants [21]. This strategy was used in castor bean to elucidate its genome [22], its organellar genome [23], and its transcriptome in tissues during seed development [24] [25]. All this genetic information supports the knowledge-based in silico reconstruction of interaction networks between genes, proteins, and metabolites using model organisms as a reference [26].
In this research, a protein-protein interaction (PPI) network was reconstructed for the castor bean plant based on the orthologous proteins of Arabidopsis. Expression values of protein-coding genes during different seed stages were loaded onto the network, and key biosynthetic genes involved in the production of fatty acids were identified and verified by RT-qPCR.
Reads were mapped to the reference genome (R. communis, http://castorbean.jcvi.org/index.php) using HISAT [28]. Coding genes were annotated by comparison against the TIGR database [29]. After the reads were mapped to the gene models of the R. communis genome, seven files with read counts per gene were obtained using Cufflinks version 0.9.3 [30]. Finally, the differential expression ratio of each gene was calculated by contrasting the different stages (II to V) of endosperm development using Cuffdiff [30]. Read counts from germinating seed, leaf, and developing male flowers were pooled and used as the control.
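The idea of a differential expression ratio against a pooled control can be illustrated with a minimal sketch. It does not reproduce the statistical model of Cuffdiff; it only shows a log2 ratio between one endosperm stage and the control tissues. The function name, the pseudocount, the use of the mean as the pooling rule, and the counts are all assumptions made for illustration.

```python
import math


def log2_fold_change(stage_count, control_counts, pseudocount=1.0):
    """Log2 ratio of one stage's read count to the pooled control count.

    Assumption: the control tissues are pooled by taking their mean, and a
    pseudocount avoids division by zero for unexpressed genes.
    """
    pooled_control = sum(control_counts) / len(control_counts)
    return math.log2((stage_count + pseudocount) / (pooled_control + pseudocount))


# Hypothetical counts for a single gene: one endosperm stage versus
# germinating seed, leaf, and male flower (illustrative values only).
ratio = log2_fold_change(1599, [99, 49, 149])
print(ratio)
```

A positive value indicates higher expression in the endosperm stage than in the pooled control, and a negative value indicates lower expression.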

In silico Construction and Analysis of PPI Network
We conducted a combined analysis of PPI networks (interactome) for R. communis, which was inferred by peer-to-peer orthology analysis with the proteins reported in version TAIR10. The interaction network was reconstructed by mapping orthologous proteins against the network reported for A. thaliana (ftp://ftp.arabidopsis.org/home/tair/Proteins/) using the software OrthoMCL [31]. The interaction network was then enriched with the differential expression ratios obtained in Section 2.1, which give the fold changes at different developmental stages of the endosperm. Finally, the nodes corresponding to the metabolic pathways involved in fatty acid synthesis and accumulation were analyzed, displaying the nodes (enzymes) that exhibited overexpression in the endosperm compared to the control.
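The orthology-based transfer of interactions described above can be sketched as follows. This is a simplified illustration, not the OrthoMCL workflow itself: an interaction is kept only when both reference proteins have an ortholog, and nodes are then filtered by their expression ratio. All identifiers, the one-to-one ortholog map, and the threshold are hypothetical.

```python
def project_network(reference_edges, ortholog_map):
    """Transfer reference interactions to the target species via orthologs.

    An edge is kept only if both of its proteins have a mapped ortholog.
    """
    projected = set()
    for protein_a, protein_b in reference_edges:
        if protein_a in ortholog_map and protein_b in ortholog_map:
            edge = frozenset((ortholog_map[protein_a], ortholog_map[protein_b]))
            projected.add(edge)
    return projected


def overexpressed_nodes(network, fold_changes, threshold=1.0):
    """Return network nodes whose expression ratio exceeds the threshold."""
    nodes = {node for edge in network for node in edge}
    return sorted(n for n in nodes if fold_changes.get(n, 0.0) > threshold)


# Hypothetical reference interactions and ortholog assignments.
reference_edges = [("AT_A", "AT_B"), ("AT_B", "AT_C"), ("AT_C", "AT_D")]
ortholog_map = {"AT_A": "RC_1", "AT_B": "RC_2", "AT_C": "RC_3"}
network = project_network(reference_edges, ortholog_map)

# Hypothetical log2 expression ratios (endosperm versus control).
fold_changes = {"RC_1": 3.4, "RC_2": 0.2, "RC_3": -1.0}
highlighted = overexpressed_nodes(network, fold_changes)
print(len(network), highlighted)
```

Storing each edge as a frozenset treats interactions as undirected, so the same pair is never counted twice.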

Plant Material
For this research, we used two castor bean (R. communis) cultivars: experimental cultivar Ricinus Corpoica 03 (VERC03, Colombian collection) and experimental cultivar Ricinus Corpoica 12 (VERC12, a result of genetic improvement from Brazil). Both are considered promising cultivars after having been studied for more than a decade and are reported as suitable for growing in dry agroclimatic areas [10]. These cultivars were obtained from the germplasm collection at the La Selva AGROSAVIA Research Center (Rionegro, Colombia) through Material Transfer Agreement 20162103285. The two cultivars were grown in an experimental field plot in the Bajo Cauca Antioqueño, Colombia (Latitude: 7°56'29''N, Longitude: 75°8'43''W), located at 150 meters above sea level, with 85% relative humidity and an average daytime temperature of 28 °C. The experimental area was uniformly fertilized according to [10] and irrigated at 10-day intervals. Plant material was collected for four different stages of seed development (Figure 1 and Figure S1

Fatty Acid Composition and Oil Content
Total saturated, monounsaturated, and polyunsaturated fats in castor bean seeds during different development stages were determined using gas chromatography (GC).
Figure 1. Development stages of castor bean seeds selected from the VERC03 cultivar. S1 to S4 represent seed stages based on the development time of the seeds and their average size: S1 was defined as 10-day-old seeds with an average size of 1.39 cm; in S2 the seeds are smaller but the testa color changes; in S3 the seed volume increases dramatically, up to 1.43 cm, and the testa is mature, with a brown and silver mosaic color; in S4 the capsules senesce and desiccate, and seed weight and capsule size decrease.
Synthesis Kit for RT-qPCR with dsDNase (Thermo Fisher Scientific) and cDNA products obtained were stored at −80 °C until the evaluation process.

Quantification by RT-qPCR
Primers were designed with the Primer 3 program following the parameters established by [32], seeking to amplify fragments of approximately 150 bp (Table 1) [33]. The specificity of the primers was verified using the Primer-BLAST pro-

FAD2 and FAH12 Cloning and Sequencing
Based on the analysis of PPI networks and validation by qPCR, four genes

Phylogenetic Analysis
Phylogenetic reconstruction of the desaturase/monooxygenase protein family was performed using the cloned sequence of the FAH12 gene from castor bean VERC03 to corroborate its identity. FAH12 and FAD2 gene sequences of R. communis, A. thaliana, Glycine max, and Jatropha curcas, among other phylogenetically close plant species, were retrieved from the GenBank database (see the accession numbers of the sequences in Figure 5).
First, the FAH12 sequence of castor bean VERC03 was translated with Virtual Ribosome 1.1 [38] using the standard genetic code. It was added to the amino acid dataset and then aligned using MAFFT v7.271 [39]. Second, nucleotide sequences were aligned with RevTrans2 [40]. Third, these alignments were repartitioned using GBLOCKS 1 [34], and the suggested partitions were compared to the Pfam [41] annotations for the location of the active protein site. Fourth, the best partitioning scheme was selected according to the evolution models using Partition Finder Protein 1.0.1 [42]. Finally, three topologies were reconstructed: 1) maximum likelihood with Garli (10 independent replicates supported with 1000 bootstrap pseudoreplicates), 2) maximum likelihood with RAxML (10 independent replicates supported with 1000 bootstrap pseudoreplicates), and 3) Bayesian inference with MrBayes (2 MCMC analyses with 10,000,000 generations and 4 chains each). All topologies were contrasted using both nucleotides and amino acids to detect inconsistencies.
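The translation step with the standard genetic code can be illustrated with a short sketch. This is not the Virtual Ribosome implementation; it is a minimal stand-in that builds the standard codon table and translates a coding sequence in frame 1. The function name and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds, to_stop=True):
    """Translate a coding sequence in frame 1 using the standard code.

    Codons with ambiguous bases become 'X'; a trailing partial codon is ignored.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for start in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE.get(cds[start:start + 3], "X")
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical short coding sequence (not taken from FAH12).
peptide = translate("ATGGCTGTTCAATAA")
print(peptide)
```

Translating before alignment, and then threading the nucleotides back onto the protein alignment, keeps codons intact in the nucleotide alignment.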

In silico PPI Network Construction and Analysis
A combined analysis of R. communis PPI networks (interactome) was conducted using an A. thaliana network as a reference, composed of 11,706 proteins (TAIR) with 24,417 interactions between them. Overall, 11,192 R. communis proteins with significant homology to A. thaliana proteins were identified as orthologs, with 23,777 interactions between them. In the protein network obtained for the castor bean plant, the FAH12 protein (oleate 12-hydroxylase, genome ID 28035.m000362, E.C. 1.14.18.4) was overexpressed 10.77-fold in advanced seed development stages compared with the initial development stages and other tissues (Figure 2). Furthermore, FAH12 also interacts with other proteins related to the production of fatty acids, which points to a possible role of this gene in the production of fatty acids in castor bean (Figure 2(b)). In addition to this gene, FAD2, DGAT2, and PDAT1-2 were identified as possible genes involved in two physiological processes: fatty acid biosynthesis and fatty acid accumulation.

Fatty Acid Composition and Oil Content -VERC03 and VERC12
The analytes detected in the chromatogram of the samples were mainly classified
The highest amounts of SL, ML, and PL (2.1226 g/100 g, 2.1342 g/100 g, and 2.6517 g/100 g, respectively) were detected and quantified in the fatty acids of the fruits at developmental stage II (S2) of the tropical VERC03 cultivar, revealing that the production and accumulation of triacylglycerols start at an early stage of fruit formation (Figure 3(a)). Likewise, a directly proportional relationship was found between the quantities of fatty acids and the size of the fruit until developmental stage III (S3). Analyses of acid types 18:1-OH and 18:2-OH in the tropical VERC03 and VERC12 cultivars showed an increase in oleic and linoleic acids (Figure 3(b)).

Quantification by RT-qPCR
in the stages S2 and S3 for only VERC03 cultivar (Figure 4). Expression of these genes in cultivar VERC12 (improved commercial cultivar from Brazil-Embrapa) was statistically invariable during the three developmental stages with respect to the UBI housekeeping gene, with the S1 stage as a reference. The DGAT transcript was invariable in all stages and cultivars (Figure 4).
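Relative expression against a housekeeping gene and a reference stage is commonly computed with the 2^-ΔΔCt method. The text does not name the calculation used, so the sketch below is an assumption: it takes UBI as the reference gene and S1 as the calibrator, and presumes roughly 100% amplification efficiency for both primer pairs. The Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change by the 2^-ddCt method (assumes ~100% primer efficiency).

    ct_target / ct_reference: Ct values in the stage of interest.
    ct_target_cal / ct_reference_cal: Ct values in the calibrator stage.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: a target gene and UBI at S2, calibrated to S1.
fold_s2_vs_s1 = relative_expression(
    ct_target=25.0, ct_reference=20.0,
    ct_target_cal=28.0, ct_reference_cal=20.0,
)
print(fold_s2_vs_s1)
```

A value of 1 means no change relative to S1, which is what a statistically invariable transcript would be expected to show across stages.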

Gene Sequencing
To validate the identity of the FAH12 and FAD2 genes, two fragments of 1166 bp and 1151 bp, respectively, were amplified from cDNA of the VERC03 cultivar and then sequenced. The sequences obtained were aligned and compared with sequences reported in GenBank. Differences were found in the amino acids A242V and Q319H in FAH12 and F88V in FAD2 (Figure S3).
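Differences such as A242V are read as reference residue, position in the reference protein, and observed residue. A minimal sketch of how such labels could be derived from a pairwise protein alignment is given below; the function name and the toy sequences are hypothetical and do not correspond to the FAH12 or FAD2 alignments.

```python
def list_substitutions(reference, query):
    """List substitutions between two aligned protein sequences.

    Positions are numbered along the ungapped reference; gaps ('-') in
    either sequence are not reported as substitutions.
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")
    substitutions = []
    position = 0
    for ref_residue, query_residue in zip(reference, query):
        if ref_residue == "-":
            continue  # insertion in the query: no reference position consumed
        position += 1
        if query_residue != "-" and query_residue != ref_residue:
            substitutions.append(f"{ref_residue}{position}{query_residue}")
    return substitutions


# Toy alignment (illustrative only).
changes = list_substitutions("MKA-LQ", "MKVGLH")
print(changes)
```

Numbering along the ungapped reference keeps the labels comparable with positions reported for the reference sequence in the database.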

Phylogenetic Analysis
A phylogenetic reconstruction was made for the desaturase/monooxygenase protein family using the sequences cloned from FAH12 and FAD2 of the VERC03 castor bean cultivar and accessions retrieved from GenBank. Two groups were formed with significant maximum likelihood support, one with the sequences of FAH12 genes and another derived from it, with the FAD2 group at the top. We corroborated the identity of the sequenced clones, which were located in the same clade as the corresponding reference sequences for R. communis (Figure 5).

Discussion
In this research, we studied the gene expression related to the biosynthesis pathway of triacylglycerols in fruits of two castor bean tropical cultivars in castor bean began to take relevance from 1995 where [14] determined an increase in the differential expression of this gene in the seeds with respect to the leaves. In 2007, [20] studied the expression profiles of genes involved in fatty acid and triacylglycerol synthesis in R. communis and determined an increased expression of FAH during seed development; therefore, a direct relationship between the increase in the expression of this gene and the production of ricinoleic acid was further determined [20]. [19]. With regard to the VERC12 cultivar, a progressive accumulation of polyunsaturated fatty acids was detected during fruit maturation, which translates into a greater amount of monounsaturated and polyunsaturated fats in the last stage of development evaluated. On the other hand, seed lipids were characterized in a natural castor bean mutant deficient in ricinoleic acid synthesis (OLE-1), and high levels of palmitic and linoleic acid were identified in the initial seed development stages, but during seed maturation the ricinoleic acid content rose to 80% of the total fatty acids [44].
Figure 5. Phylogenetic tree of FAH12 and FAD2 sequences obtained from tropical castor bean seed VERC03 against other phylogenetically close plant species. Alignment was performed with MAFFT v7.271 and the reconstruction of the tree was performed with maximum likelihood using RAxML software (10 independent replicates supported with 1000 bootstrap pseudoreplicates).
The results obtained here for DGAT2 in the VERC03 cultivar are comparable with those reported in 2010 by [27], who found the highest expression of this gene in fruits at intermediate stages of development. Similarly, analyses of cDNA expression in FAH12 transgenic plants revealed that the castor bean type-2 acyl-coenzyme A:diacylglycerol acyltransferase (RcDGAT2) could increase HFAs from 17% to nearly 30%. These results indicate that members of the DGAT2 gene family probably play a key role in increasing the hydroxy fatty acids in castor bean [19]. Moreover, our sequence analysis of the FAH12 gene identified two mutations, one of which (H319Q) is directly related to changes in its secondary structure and to enhanced ricinoleic acid production, as reported by [44], who located the H319Q residue in FAH12 His box III (domain IX), recognized as the enzyme catalytic site.
Finally, the phylogenetic reconstruction of FAD2 and FAH12 was consistent with what was reported by [14]. Our ML tree clustered both gene sequences with the same genes from Populus trichocarpa, which was also to be expected because of the taxonomic closeness of the two species. Their results suggest that FAD2 and FAH12, despite belonging to the same protein family, have a different function based on similar reaction mechanisms, which may imply an enzymatic evolution.

Conclusion
We successfully used interactome analysis, gas chromatography, qRT-PCR, and gene sequencing to develop efficient methods for predicting orthologous genes, determining analytes, measuring differential gene expression, and cloning key genes involved in initial and advanced seed development stages, in order to clarify the biosynthesis pathways and accumulation of fatty acids in two tropical cultivars of castor bean. Although we found four important genes involved in the pathway of ricinoleic acid synthesis in castor bean, FAH12 in particular was expressed more than 3-fold higher in advanced seed development stages than in the initial stages in the high oil-producing VERC03 cultivar. Additionally, differences were found in some amino acid residues of the FAH12 protein, differences involved in structural changes and enhanced ricinoleic acid production. The results obtained here are important for the future genetic characterization of promising castor bean materials, with a view to using these genes and their identified mutations as molecular markers in marker-assisted breeding of castor bean plants. Finally, we suggest the FAH12 gene as a key gene for plant breeding programs that seek to enhance ricinoleic acid production.
Figure S1. Development stages of the castor bean seeds selected from the cultivar VERC12. S1 to S4 represent seed stages based on the development time of the seeds and their average size: S1 was defined by 10 days of fruit development, with a white testa color; in S2 the seeds are smaller but the testa color changes; S3 was defined by 40 days of fruit development, with a mosaic brown and black testa color; and S4 was defined by 60 days of fruit development, when the capsules senesce and desiccate and seed weight and capsule size decrease.
Figure S2. Chromatograms obtained from different development stages of the castor bean seeds selected from the cultivar VERC03.