1. Introduction
Transcriptional and post-transcriptional regulation of gene expression controls many important biological processes in both monocots and dicots, such as cellular morphogenesis, signal transduction, and environmental stress responses [1] [2]. Transcription factors (TFs) regulate gene expression by binding to specific cis-regulatory elements in the promoter regions of their target genes [3]. Yanagisawa and Schmidt [4] were the first to isolate a Dof TF, in maize, and an array of Dof TF genes has subsequently been isolated and functionally characterized in many plants, including Arabidopsis thaliana and Oryza sativa [5] [6], Sorghum bicolor (L.) Moench [7], Brachypodium distachyon [8], Solanum lycopersicum [9], Ricinus communis L. [10], Cajanus cajan [11], Phyllostachys heterocycla [12], Chrysanthemum morifolium [13], Capsicum annuum [14], and Populus trichocarpa [15].
Dof (DNA-binding with one finger) proteins are plant-specific TFs that contain 200-400 amino acids and a single C2C2-type (CX2CX21CX2C-type) zinc-finger-like motif of 52 amino acid residues at the N-terminus, which specifically binds to a 5'-(A/T)AAAG-3' element [16] [17]. Dof TFs are involved in several important functions [18], such as root light signaling [19], germination [20], regulation of stomatal development [21], development of the vascular system [22], and responses to biotic [23] and abiotic [24] [25] stress. Because no study to date has identified the members of the Dof family in common bean, identifying and classifying this family is useful for future research on gene expression in this crop.
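The two sequence signatures described above can be written as simple patterns. The following is a minimal sketch, not the method used in this study: it assumes that "X" in CX2CX21CX2C stands for any residue, scans only the forward DNA strand, and uses synthetic toy sequences and hypothetical function names.

```python
import re

# Zinc-finger spacing quoted in the text: C-X2-C-X21-C-X2-C.
# Assumption: "X" means any single residue.
DOF_FINGER = re.compile(r"C.{2}C.{21}C.{2}C")

# Core binding element quoted in the text: 5'-(A/T)AAAG-3'.
DOF_SITE = re.compile(r"[AT]AAAG")


def find_dof_finger(protein):
    """Return (start, end) of the first CX2CX21CX2C match (0-based, end exclusive), or None."""
    match = DOF_FINGER.search(protein.upper())
    return None if match is None else (match.start(), match.end())


def find_dof_sites(dna):
    """Return 0-based start positions of (A/T)AAAG on the forward strand only."""
    return [match.start() for match in DOF_SITE.finditer(dna.upper())]


# Synthetic toy sequences, for illustration only (not real PvDof data).
toy_protein = "MA" + "C" + "PR" + "C" + "S" * 21 + "C" + "RR" + "C" + "WT"
toy_promoter = "GGTAAAGCCAAAAGTT"

print(find_dof_finger(toy_protein))
print(find_dof_sites(toy_promoter))
```

A pattern match of this kind only flags candidates; a real analysis would also search the reverse complement of the promoter and confirm the domain with a profile-based tool.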
Common bean (Phaseolus vulgaris L.) is one of the most important legume crops for human consumption and is an exceptional source of protein, carbohydrates, and other nutrients [26] [27]. Although Brazil is the world’s largest producer of common bean, with an average annual production of 3.5 million tons [28], its productivity is still considered low because of several factors, such as adverse climatic conditions and the occurrence of pests and diseases [29]. Therefore, considering the importance of Dof TFs and the lack of information about this gene family in P. vulgaris, we identified and characterized it using a computational approach. We identified Dof-coding sequences and characterized them at both the phylogenetic and structural levels in order to gain a better understanding of the genetic determinants of tolerance to abiotic and biotic stresses in this crop.
2. Materials and Methods
2.1. Identification and Annotation of Dof Genes in the Genome of P. vulgaris
Initially, we identified all Dof proteins among the A. thaliana genome sequences obtained from the TAIR database (http://www.arabidopsis.org/), whereas the sequences for O. sativa, G. max, and P. vulgaris were downloaded from the TIGR (http://rice.plantbiology.msu.edu/) and Phytozome v12 (http://www.phytozome.net) databases. To confirm the identity of the Dof genes, the sequences were compared with those in the GenBank database using BLASTP and BLASTX searches (National Center for Biotechnology Information [NCBI]: http://www.ncbi.nlm.nih.gov) [30]. Protein sequences were aligned using Clustal Omega v. 2.0.3 [31]. The physical and chemical characteristics of the Dof proteins in common bean, including the number of amino acids, the theoretical isoelectric point (pI), and the molecular weight (kDa), were determined using the ProtParam tool (http://web.expasy.org/protparam/). The subcellular location of all predicted Dof proteins was analyzed in silico using the WoLF PSORT algorithm (http://wolfpsort.org/).
2.2. Protein Alignment and Phylogenetic Analysis
Multiple sequence alignment of the full-length deduced amino acid sequences of the Dof proteins was performed with Clustal Omega v. 2.0.3 set to Hidden Markov Model (HMM) parameters [31]. Phylogenetic and molecular evolutionary analyses were conducted in MEGA v. 6.06 [32] using the Maximum Likelihood (ML) method with the complete deletion option. The reliability of the resulting tree was tested via bootstrapping with 1000 replicates.
2.3. Identification of Conserved Motifs
The conserved motifs of the Dof protein sequences were identified using Multiple Expectation Maximization for Motif Elucidation (MEME; http://meme-suite.org/) [33] with the following parameters: motif length set to 6-100, number of motif sites set to 2-120, and maximum number of motifs set to 25. The resulting motifs were checked against NCBI (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and PROSITE (http://www.expasy.org) to verify their significance.
2.4. Genomic Structure
We used the online Gene Structure Display Server (GSDS; http://gsds.cbi.pku.edu.cn/) [34] to predict the exon/intron organization of the Dof genes. Complete sequences of the corresponding genomic DNA and full-length transcripts of each gene were used.
2.5. Chromosomal Location and Calculation of the Duplication Events
A local BLAST search of the P. vulgaris genome sequence was performed to map the physical location of the 36 genes. The locations of the genes on the 11 chromosomes of common bean were mapped with MapChart 2.2 software [35]. The Plant Genome Duplication Database (http://chibba.agtec.uga.edu/duplication/) was used to estimate the synonymous (Ks) and non-synonymous (Ka) substitution rates, following the procedures described by [36], as well as the evolutionary constraints (Ka/Ks) between the duplicated pairs of PvDofs. The approximate dates of the duplication events were calculated with the equation T = Ks/2λ, assuming an average synonymous substitution rate (λ) of 8.46 × 10⁻⁹ [37].
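As a worked illustration of the dating equation T = Ks/2λ, the sketch below converts a Ks value into an approximate age in million years. It assumes that λ is expressed in substitutions per synonymous site per year; the function name is hypothetical and the input is an example value, not a row from Table 4.

```python
# Synonymous substitution rate quoted in the text [37].
# Assumption: units are substitutions per synonymous site per year.
LAMBDA = 8.46e-9


def duplication_age_mya(ks, rate=LAMBDA):
    """Approximate age of a duplication in million years, using T = Ks / (2 * lambda)."""
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    years = ks / (2.0 * rate)
    return years / 1e6


# Example: a pair with Ks = 0.44 dates to roughly 26 million years ago.
print(round(duplication_age_mya(0.44), 2))
```

Because tabulated Ks values are usually rounded, ages recomputed from them can differ slightly from ages derived from unrounded estimates.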
2.6. Synteny Analysis
The Plant Genome Duplication Database (PGDD; http://chibba.agtec.uga.edu/duplication/) [38] was used to search for orthologous genes between P. vulgaris and A. thaliana, P. vulgaris and O. sativa, and P. vulgaris and G. max. The resulting synteny maps were constructed using Circos software (http://circos.ca/) [39].
2.7. Gene Expression Analysis in Silico
Illumina RNA-seq datasets were downloaded from the Phytozome database (http://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Pvulgaris). The expression profiles of the PvDof genes were analyzed in tissue-specific libraries from plants at different stages of development, consisting of young pods, stem_10, stem_19, flower buds, flowers, root_10, nodules, root_19, green mature buds, leaves, and young trifoliates. The in silico expression profiles were calculated by Cufflinks in FPKM units (fragments per kilobase of transcript per million mapped fragments). FPKM values were log2 transformed and the heatmap was generated with CIMminer (http://discover.nci.nih.gov/cimminer).
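The log2 transformation step can be sketched as follows. The text does not state how zero FPKM values were handled, so the pseudocount of 1 used here is an assumption, as are the function names and the toy expression values.

```python
import math


def log2_fpkm(fpkm, pseudocount=1.0):
    """Return log2(FPKM + pseudocount).

    Assumption: the pseudocount keeps FPKM = 0 defined; the source
    does not say how zeros were treated.
    """
    if fpkm < 0:
        raise ValueError("FPKM must be non-negative")
    return math.log2(fpkm + pseudocount)


def transform_table(table, pseudocount=1.0):
    """Apply log2_fpkm to a {gene: {tissue: FPKM}} mapping."""
    return {
        gene: {tissue: log2_fpkm(value, pseudocount) for tissue, value in tissues.items()}
        for gene, tissues in table.items()
    }


# Hypothetical FPKM values, for illustration only.
toy_table = {
    "geneA": {"young pods": 15.0, "leaves": 0.0},
    "geneB": {"young pods": 3.0, "leaves": 7.0},
}

for gene, tissues in transform_table(toy_table).items():
    print(gene, tissues)
```

The transformed table is what a clustering tool would receive; compressing the dynamic range in this way prevents a few highly expressed genes from dominating the color scale of the heatmap.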
3. Results
3.1. Identification and Classification of PvDof Genes
Sequence homology analysis identified a total of 36 PvDof genes in the P. vulgaris genome (Table 1). By comparison, A. thaliana also has 36 Dof genes, whereas O. sativa has 30, G. max has 78, and S. lycopersicum has 34 (Table 1). All identified genes encoded proteins containing the Dof domain and were designated PvDof01 to PvDof36 based on their location on the chromosomes. The names of the PvDof genes, Dof gene accession numbers, gene locations, lengths of the coding sequences, and characteristics of the PvDof proteins are shown in Table 2. The full-length coding sequences of the PvDof genes ranged from 612 bp (PvDof33) to 1086 bp (PvDof26),
Table 1. Comparison of the number of Dof genes of each class among P. vulgaris, A. thaliana, O. sativa, G. max, and S. lycopersicum.
a [5], b [40], c [9].
Table 2. General physical and chemical characteristics of the 36 PvDof genes identified in P. vulgaris.
CDS: coding sequence; bp: base pairs; aa: amino acids; MW: molecular weight (kDa); pI: isoelectric point.
and their putative proteins contained between 203 and 361 amino acid (aa) residues, with an average of ~294 aa. The theoretical pI ranged from 5.03 (PvDof28) to 9.75 (PvDof07), and molecular weights ranged from 20.76 kDa (PvDof19) to 39.11 kDa (PvDof02). The majority of the PvDof proteins were predicted to localize to the nucleus, consistent with their role in transcriptional regulation.
Homologous sequences were analyzed through multiple alignment using the amino acid sequences containing Dof domains. All bean Dof domains had a typical DNA-binding domain of 55 residues spanning a single C2/C2 zinc finger (Figure 1). In general, the regions of the Dof domains had 55 basic residues located in the N-terminal region (Figure 1), with alignments of the other highly
Figure 1. Multiple sequence alignment of Dof domain sequences from the proteins of P. vulgaris. The typical features of Dof proteins showing four cysteine residues are indicated. Below the alignment, the conserved residues of amino acids are represented in blue in the upper boxes.
conserved residues belonging to the Dof family identified in P. vulgaris consisting of Cys-3, Pro-4, Arg-5, Cys-6, Ser-8, Thr-11, Lys-12, Phe-13, Cys-14, Tyr-15, Asn-17, Asn-18, Tyr-19, Gln-23, Pro-24, Pro-25, Phe-27, Cys-28, Cys-31, Arg-33, Trp-35, Thr-36, Gly-38, Gly-39, Arg-42, Pro-45, Gly-47, and Arg-51 (Figure 1). In addition, we observed several partially conserved amino acid residues, consisting of Tyr-16, Ser-20, Ser-22, His-26, Lys-29, Tyr-34, Leu-41, Asn-43, Val-44, Val-46, Gly-48, Gly-49, and Lys-52 (Figure 1).
3.2. Phylogenetic and Conserved Domain Analysis of Dof Proteins in P. vulgaris
A Maximum Likelihood tree was generated from the aligned amino acid sequences of the Dof genes in order to assess evolutionary relationships. Our analysis revealed a distinct clustering of Dof proteins, and the topology of the phylogenetic tree allowed us to classify the PvDof gene family into four major classes (A, B, C, D) and seven orthologous subclasses (A, B1, B2, C1, C2, D1, and D2, which contained 8, 7, 6, 7, 5, 2, and 1 genes, respectively) (Figure 2). Phylogenetic relationships within multigenic families may provide additional information about the evolution of the Dof genes [9]. We present detailed information
Figure 2. Phylogenetic relationships and organization of conserved motifs of Dof gene sequences in common bean. The phylogenetic tree of 36 PvDof proteins was constructed using MEGA; the motifs identified by MEME software are represented by colored boxes and their consensus sequences are shown in Table 3.
about the 25 putative motifs of the Dof gene sequences in P. vulgaris, including names, widths, and best possible matches, in Table 3. Each of these motifs is also illustrated in Figure 2, in which motif 1 corresponds to the Dof domain that is found in all bean protein sequences (Table 3). Motifs 12, 16, 19, 20, 23, 24, and 25 were observed in subclass A; motifs 2, 3, 5, 6, 7, 8, 12, 14, 15, 22, 23, and 24 were observed in subclass B1, which contained the highest number of motifs; motifs 7, 12, 16, 19, 20, 23, and 24 were observed in subclass B2; motifs 4, 10, 15, 17, 18, 21, and 23 were observed in subclass C1; motifs 4, 9, 10, 11, 13, 18, 19, and 25 were observed in subclass C2; and subclasses D1 and D2 contained two motifs each, namely motifs 5 and 21 and motifs 8 and 9, respectively. From these results, it can be seen that the majority of the members
Table 3. The MEME motif sequences and lengths in PvDof proteins.
1Motif 1 represents the Dof domain.
of this gene family are closely related and share common motif compositions, indicating that the structures of the gene members are highly conserved within the same subclass.
3.3. Gene Structure, Chromosomal Location, and Gene Duplication Events of PvDof Genes
The structural diversity and exon/intron organization of each Dof gene were evaluated (Figure 3). Genes in subclasses A and D2 contained no introns, whereas genes in subclasses B1, B2, C1, C2, and D1 all had one or two introns. When the gene structures were compared against the clades of the phylogenetic tree, members of the same subclass had similar structures, as in other plants, and thus likely perform similar functions.
Chromosomal location analysis revealed that the PvDof genes were located on 10 of the 11 chromosomes (Figure 4) and were unevenly distributed among them. The largest number of PvDof genes occurred on chromosome 2 (six PvDof genes), followed by five each on
Figure 3. Schematic diagram of exon, intron, and untranslated region (UTR) organization, as indicated by yellow rectangles, gray lines, and blue rectangles, respectively.
Figure 4. Physical map of PvDof genes showing their chromosomal locations. Vertical bars represent the chromosomes and numbers at the left indicate gene positions (the scale on the left is in megabases, Mb). The chromosome number is indicated on the top of each chromosome (vertical bar). Red and green lines reflect segmental and tandem duplications, respectively. Data extracted from Table 4.
chromosomes 3 and 6 (Figure 4). In addition, four genes were found on chromosome 9; chromosomes 1, 5, 10, and 11 each possessed three PvDof genes; and one gene was detected on chromosome 7 (Figure 4).
We examined the expansion of the Dof gene family in the P. vulgaris genome. Based on their chromosomal distribution and high sequence similarity, we determined that 26 duplication pairs arose from segmental and tandem duplication events; the lines in Figure 4 show the connections among these paralogs. Twenty-four of the paralog pairs were the result of putative segmental duplication events. Two pairs of paralogous genes occurred on the same chromosome, separated by only a short distance (<0.2 kb), which suggests that the gene pairs PvDof24/PvDof25 and PvDof34/PvDof35 represent tandem duplications (Figure 4 and Table 4). Our results indicate that segmental duplication predominated in the expansion of the PvDof gene family in common bean, but that tandem duplication was also involved.
We calculated Ka and Ks values, as well as the Ka/Ks ratio, and used Ks to estimate the dates of the duplication events (Table 4). Segmental duplication events of the Dof genes in common bean occurred from 2.13 mya (million years ago) (Ks = 0.04) to 26.06 mya (Ks = 0.44), with a mean of 11.54 mya. However, the dates of the tandem duplication events could not be estimated because these gene pairs (PvDof24/PvDof25 and PvDof34/PvDof35) differed only in their intron sequences, leaving no coding-sequence substitutions from which to estimate Ks. The Ka/Ks ratio of all duplication events was >0.3, which implies that significant functional divergence could have occurred after duplication. The Ka/Ks ratios of six duplicate pairs were <1.0, indicating that the PvDof genes evolved under negative selection acting against protein-coding changes. These results suggest that segmental/tandem expansion of the Dof gene family in common bean could be dated to relatively recent duplication events.
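The conventional reading of the Ka/Ks ratio can be summarized in a few lines. This is a minimal sketch using the standard thresholds (ratio below 1 for purifying selection, equal to 1 for neutral evolution, above 1 for positive selection); the function names are hypothetical and the inputs are illustrative values, not entries from Table 4.

```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ka < 0 or ks < 0:
        raise ValueError("Ka and Ks must be non-negative")
    if ks == 0:
        return None
    return ka / ks


def selection_mode(ka, ks, tol=1e-9):
    """Classify a duplicated pair by the conventional Ka/Ks thresholds."""
    ratio = ka_ks_ratio(ka, ks)
    if ratio is None:
        return "undefined"
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


# Illustrative values only.
for ka, ks in [(0.02, 0.10), (0.30, 0.10), (0.0, 0.0)]:
    print(ka, ks, selection_mode(ka, ks))
```

The "undefined" case corresponds to pairs with no synonymous substitutions, for which neither the ratio nor a duplication date can be computed.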
Table 4. Dates of duplication of the pairs of paralogous genes of the PvDof gene family. Ka represents the number of non-synonymous substitutions per non-synonymous site; Ks represents the number of synonymous substitutions per synonymous site; Ka/Ks represents the ratio of non-synonymous (Ka) to synonymous (Ks) substitutions.
3.4. Comparative and Synteny Analyses of the Dof Gene Families in P. vulgaris, A. thaliana, O. sativa, and G. max
To evaluate the evolutionary relationships of the Dof gene family among different plants, a phylogenetic tree was generated from the amino acid sequences of P. vulgaris, A. thaliana, O. sativa, and G. max. Maximum Likelihood analysis revealed a distinct clustering pattern of Dof proteins, and the topology of the phylogenetic tree allowed us to classify the Dof gene family into four major classes (A, B, C, and D) and nine orthologous subclasses (A, B1, B2, C1, C2, C3, D1, D2, and D3) (Figure 5). Of these, classes C and B were the largest, containing 63 and 41 orthologs and accounting for 36% and 23% of the total predicted number of Dof genes, respectively, whereas class A, the smallest class, contained only 35 members and accounted for 19% of predicted Dof genes. The number of clusters found here was similar to the results of previous research [5] [41]. Distribution among the subclasses was interwoven for the majority of the Dof members, indicating that Dof gene family expansion occurred prior to the divergence of common bean, Arabidopsis, soybean, and rice. The subclasses C3 and D3, which were specific to Arabidopsis and rice, respectively, may be the result of a gene loss event during dicot-monocot divergence [41] [42].
Figure 5. Phylogenetic tree of the amino acid sequences of Dof genes generated from 36 sequences of P. vulgaris, 36 sequences of A. thaliana, 30 sequences of O. sativa, and 78 sequences of G. max, using 1000 bootstrap replicates. Individual PvDof subgroups are identified by the different colors on the tree.
Synteny analysis was performed between the P. vulgaris Dof genes and those of two other plants, one a dicot (A. thaliana) and the other a monocot (O. sativa). In addition, synteny analysis was performed with G. max, a legume closely related to P. vulgaris [37]. As such, three comparative synteny maps were constructed, comparing P. vulgaris against A. thaliana, O. sativa, and G. max (Figure 6). A total of 123 pairs of orthologous genes with synteny relationships were identified. Seven pairs of Dof genes with synteny relationships were found between Arabidopsis and common bean, involving five AtDof genes and five PvDof genes (Supplementary Table S1). Only two syntenic Dof gene pairs were found between common bean and rice, involving two OsDof genes and one PvDof gene (Supplementary Table S2). A total of 114 syntenic pairs were found between soybean and common bean, involving 62 GmDof genes and 33 PvDof genes (Supplementary Table S3). However, no synteny was observed for the PvDof03, PvDof31, and PvDof35 genes, suggesting that these genes were formed following the divergence of P. vulgaris and G. max. It would appear that the Dof genes in P. vulgaris share an origin with those in A. thaliana, O. sativa, and G. max, but that subsequent expansion of the PvDof genes occurred following the monocot/dicot divergence. In addition, we observed clear losses and/or duplications of several of the Dof genes in the genomes of these plants.
3.5. Transcription Profiling of PvDof Genes in Different Tissues
We analyzed the transcriptional profiles of all 36 PvDof genes in 11 different plant tissues (young pods, stem_10, stem_19, flower buds, flowers, root_10, nodules, root_19, green mature buds, leaves, and young trifoliates) (Figure 7). The expression patterns indicated that the PvDof10, PvDof30, PvDof36, PvDof12, and PvDof27 genes, which belong to classes A and C, were preferentially expressed in young pod and stem tissues. We then examined the response of the
Figure 6. Genome-wide synteny analysis of Dof genes. (a) Comparative map between P. vulgaris and A. thaliana. (b) Comparative map between P. vulgaris and O. sativa. (c) Comparative map between P. vulgaris and G. max.
Figure 7. Heatmap showing the expression profiles of common bean PvDof genes across different tissues based on specific libraries. FPKM average values were used, and hierarchical clustering in the different tissues is represented by the color scale. Tissues included in the analysis consisted of young pods, stem_10, stem_19, flower buds, flowers, root_10, nodules, root_19, green mature buds, leaves, and young trifoliates.
PvDof23 and PvDof03 genes in subclass C1, as these were expressed only at very low levels in almost all of the tissues and organs of common bean (Figure 7).
4. Discussion
The Dof gene family, which is found in many plant species, is responsible for numerous transcriptional regulation functions associated with various biotic and abiotic stress responses. This gene family has been described in plants such as Arabidopsis spp. and O. sativa [5], G. max [40], S. lycopersicum [9], S. officinarum [43], and P. heterocycla [12]. In this study, we identified a total of 36 PvDof genes in P. vulgaris (Table 1). The number of PvDof homologs identified in this study was similar to that found previously in Arabidopsis, rice, sorghum, and poplar [5] [7] [15]. Our results indicated that the Dof genes in P. vulgaris are highly similar to those in other species. Our results also revealed that the conserved C2C2-Dof domain was present in all PvDof proteins. The presence of this domain is the criterion for considering a protein a functional TF of the Dof gene family [40] [44]. Although the same number of Dof genes was found in Arabidopsis (36) and common bean (36), the common bean genome, at 650 Mb [45], is considerably larger than the Arabidopsis genome, at 145 Mb [46]. As shown in Table 1, Cai et al. [9] found 34 genes in tomato (with a genome size of 950 Mb), indicating that the number of Dof genes is not proportional to genome size.
Accurate classification is important for understanding the structures, functions, and evolution of the PvDof genes. In order to gain further insight into the evolutionary relationships between PvDof genes in common bean, we evaluated the exon/intron structural organization of all PvDof genes. There were between zero and two introns in each gene, and most members of the same class/subclass shared similar intron/exon organization (Figure 3). Our results corroborate those found in other species, such as Arabidopsis [5], Cucumis sativus [47], and S. lycopersicum [9]. Divergence in the intron/exon structure can provide important information on evolutionary factors when assessing the phylogenetic relationships of multigenic families found in plants [48]. In addition, the MEME motif search tool was employed to identify and understand the diversity of the motifs in the PvDof genes, and we identified 25 different conserved motifs distributed among the Dof protein sequences in P. vulgaris. The majority of PvDof genes within the same subclass shared similar motifs, suggesting that these genes are closely related and implying functional similarities between the proteins (Figure 2). Analysis of gene structure and conserved motif position provides additional information about the evolutionary relationships of this family in P. vulgaris [11].
Gene family expansion in plants is primarily the result of segmental/tandem duplication and transposition events. Gene duplication on different chromosomes is often due to segmental duplication events, whereas the presence of two or more genes on the same chromosome indicates a tandem duplication event [49]. Thus, we analyzed the chromosomal distributions of the PvDof genes, which are shown in Table 4. We identified 24 pairs of paralogous genes scattered throughout the genome, which we considered to be evidence of segmental duplication, whereas two pairs of genes found on the same chromosome were considered to be evidence of tandem duplication events. Gene duplication plays an important role in gene family expansion and functional diversification [50]. Comparing the ratio of non-synonymous (Ka) to synonymous (Ks) substitutions provides a means of analyzing positive and negative selection of specific amino acid sites within the total length of Dof protein sequences between the different groups [11]. Analysis of the Ka/Ks ratio indicated that, despite differences between the Ka/Ks values, most were substantially less than or equal to one, which suggests that the sequences within each class are under strong purifying selection pressure, although positive selection may also have acted.
Phylogenetic comparison and the construction of synteny maps of common bean Dof proteins showed that they were most similar to soybean Dof proteins, which reflects the similarity between the genomes of the two species. We found extensive synteny between P. vulgaris and G. max, in which most of the genes identified in common bean (33 of 36 genes, or 91.67%) were in synteny with Dof genes in G. max. Previous studies have shown that P. vulgaris and G. max diverged from a common ancestor and shared a whole-genome duplication (WGD) event ~56.5 mya, and only diverged from one another ~19.2 mya [37] [51]. In addition, G. max experienced an independent WGD ~10 mya [37] [52]. This became evident when we compared the number of orthologous genes between these two species, in which 33 PvDof syntenic genes from the common bean genome exhibited a 1:2 mapping to 62 GmDof syntenic orthologous genes in soybean. The PvDof05, PvDof25, and PvDof35 proteins appear to be unique to common bean, suggesting that these genes may have specific regulatory functions in this species, and may be involved in different physiological processes, although confirmation of this hypothesis requires further research.
Expression profiles were analyzed to determine the tissue specificity of the Dof genes in common bean. This analysis revealed that most of the PvDof genes were expressed in different tissues; moreover, detailed analysis of the expression patterns indicated that most genes grouped in the same subgroup had similar expression profiles. As shown in Figure 7, the expression levels of the PvDof10, PvDof30, PvDof36, PvDof12, and PvDof27 genes, which belong to classes A and C, were relatively high in young pod and stem tissues, indicating that they may play important roles in the development of these tissues in bean. Wang et al. [14] reported that CaDofs28, CaDofs10, CaDofs14, and CaDof16 were primarily expressed in the stems of Capsicum annuum, which is perhaps unsurprising given that the stem contains abundant vascular tissue; Kim et al. [53] also observed, in Arabidopsis, that the AtDof5.1 gene was highly expressed in vascular tissues. These expression profiles suggest that PvDof genes may be involved in various physiological functions during plant development.
5. Conclusions
Here, we examined the genome sequence, classification, chromosomal locations, and conserved motifs of the 36 Dof genes in common bean via genome-wide analysis. The PvDof genes were distributed on 10 chromosomes, and the high degree of variation in their sequences provided potential evidence for diversifying functions. Multiple alignment of the PvDof sequences revealed highly conserved cysteine residues, which are considered to be a unique feature of Dof TFs. In addition, extensive in silico characterization of these proteins will provide insight into the diversity of their genetic structures in terms of intron/exon numbers and positions, as well as in terms of their functional diversity. Finally, phylogenetic comparisons of common bean Dof proteins with those found in Arabidopsis, rice, and soybean led to the identification of several orthologous and paralogous genes, which furthers our understanding of the evolutionary characteristics of this family of genes in P. vulgaris and other plant species. The results of this study provide additional information and potential biotechnological resources for further understanding the molecular basis of this gene family and, consequently, for the improvement of common bean crops.
Acknowledgements
The authors thank UNIPAR for the financial support. TMI and CBT thank CAPES for the fellowship.
Conflict of Interests
The authors declare no conflict of interest.
Supplementary
Table S1. Synteny between Phaseolus vulgaris and Arabidopsis thaliana Dof family genes.
Table S2. Synteny between Phaseolus vulgaris and Oryza sativa Dof family genes.