DOF transcription factors in developing peanut (Arachis hypogaea) seeds

DNA binding with one finger (DOF) transcription factors play important roles in storage material accumulation and morphogenesis in developing seeds. Oil and protein contents vary among cultivars of peanut, an important oil crop, but DOF proteins have not been studied in this crop. In this paper, we analyzed all the DOF genes expressed in developing seeds in a cDNA library of 20,000 transcripts, cloned and compared genes similar to GW391729 from eight peanut cultivars, and analyzed similar genes expressed in roots and leaves of control plants and of plants inoculated with Ralstonia solanacearum. The results indicate that eight types of DOF genes in total were expressed in developing seeds of cultivar 063103. Most of the expressed DOF transcription factors are involved in developmental processes in a complex way. Among them, GW391729 is possibly related to the number of seeds per fruit and possibly also to leaf spot resistance. The detailed functions of these DOF proteins need further study.


INTRODUCTION
DOF family proteins are plant-specific and are characterized by an N-terminal C2-C2 zinc finger that binds to the DNA motif T/AAAAG; they function in various biological processes in plants [1][2][3]. DOF proteins have been systematically analyzed in arabidopsis, rice (Oryza sativa subsp. japonica), poplar (Populus trichocarpa) and soybean (Glycine max) [1,2,4,5]. DOF proteins have been classified into seven groups [1], and each group contains a characteristic set of conserved sequences [2]. The function of a protein is composed of subfunctions, which are determined by the corresponding subdomains of the protein. The conserved DOF zinc finger domain binds to the cis DNA core element AAAG [6] and may also interact with other proteins, including other DOF proteins [7]. Differences in DOF zinc finger domains determine their binding affinity for specific cis AAAG motifs, which may be located in different genes or in different elements of one promoter [8]. The region outside the DOF zinc finger domain may have other functions, such as modifying the structure of the DOF zinc finger or interacting with other proteins [9,10]. Thus it is possible to infer the functions of uncharacterized DOF proteins bioinformatically by pairwise comparison of the amino acid sequences of their conserved domains.
DOF proteins play different regulatory roles at various developmental stages in plants. PBF (P-box binding factor) is the DOF protein studied in greatest detail in seed development and storage material formation. PBFs, which are expressed in the endosperm of maize (Zea mays), barley (Hordeum vulgare), rice and wheat (Triticum aestivum), can bind to cis-elements in endosperm-specific promoters to regulate the expression of endosperm-specific genes as well as typical storage protein genes [11,12]. The wheat PBF (WPBF), however, has been reported to be expressed in all tissue types and to interact with a TaQM protein from roots [13]. PBFs can bind to the dof box in the promoters of storage protein genes and also interact with other transcription factors that bind to adjacent regions of the same promoter, such as PBF with O2 [9], BPBF with GAMYB [10] and OBF with OBP [14]. It has also been shown that some genes contain more than one dof box in their promoters [15]. For example, ZmDof1 can bind to multiple sites in the promoters of the CYPPDK1 and PEPCZM2A genes [15]. The promoters of the peanut major storage protein genes ARAH1 and ARAH3 also contain multiple dof boxes, which account for more than one third of the known cis-elements in these promoters [16][17][18].
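Counting dof boxes in a promoter, as described above, can be illustrated with a short Python sketch. It is not the procedure used in [15-18]; it assumes the plain core motif AAAG (the longer T/AAAAG variant can be passed as two separate plain strings) and reports every occurrence on both strands, so a CTTT on the forward strand counts as a dof box on the reverse strand. The function names and the toy sequence are hypothetical.

```python
import re

# Complement table for DNA; N maps to N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_dof_boxes(promoter: str, core: str = "AAAG") -> list[tuple[int, str]]:
    """Return sorted (0-based start, strand) pairs for every core-motif hit.

    `core` must be a plain DNA string, not a regex. Reverse-strand hits are
    found as the reverse complement on the forward strand and reported in
    forward-strand coordinates. The zero-width lookahead lets overlapping
    hits be counted. (A palindromic core would be reported on both strands.)
    """
    seq = promoter.upper()
    core = core.upper()
    hits = []
    for strand, motif in (("+", core), ("-", reverse_complement(core))):
        for match in re.finditer(f"(?={re.escape(motif)})", seq):
            hits.append((match.start(), strand))
    return sorted(hits)


# Hypothetical toy promoter fragment, not a real peanut sequence.
print(find_dof_boxes("CCAAAGTTCTTTGG"))  # [(2, '+'), (8, '-')]
```

Under a uniform base composition, a specific 4-bp core is expected by chance about once every 4^4 = 256 bp per strand, so real analyses usually also weigh flanking sequence and known cis-element databases.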
In a cDNA library of developing seeds from peanut variety 063103, eight DOF cDNA clones were found among more than 20,000 clones. These clones were sequenced, translated and compared with AtDofs, GmDofs and ZmPBF. The AhDof proteins were structurally classified, and their functions were predicted through bioinformatic and theoretical analysis.

Materials
Eight DOF cDNA clones were obtained from a cDNA library of developing seeds of peanut line 06-3103, which is resistant to leaf spot disease. Additional DOF cDNA sequences similar to the DOF cDNAs from peanut seeds were screened from cDNA libraries of leaves and roots of control plants and of plants inoculated with Ralstonia solanacearum. The vector is pDNR-LIB (CLONTECH). The plasmids were purified, and the inserts were examined by PCR with M13F and M13R primers. The DOF clones were sequenced with M13R and T7 primers by BGI (Huada Gene) in Wuhan. DOF ESTs from developing seeds of peanut cultivars VBL6 and GT-C20 were obtained from GenBank with the blastn-est program [36]. The cDNA sequences were translated into protein sequences with AUGUSTUS (version 2.1) [37]. The AhDof ESTs from 06-3103 have been deposited in GenBank as GW391722, GW391723, GW391724, GW391725, GW391726, GW391727, GW391728 and GW391729.
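The cDNA-to-protein step above used AUGUSTUS [37]. As a much simpler stand-in, and not AUGUSTUS's gene-model approach, a partial cDNA can be translated in all six reading frames with the standard genetic code, keeping the longest stretch that starts with Met and runs to the first stop codon (or to the end of the read, which tolerates partial ESTs). The Python sketch below works under those assumptions; its function names are hypothetical.

```python
import re

# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate a DNA string codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf_protein(cdna: str) -> str:
    """Return the longest Met-initiated peptide found in any of the six frames.

    A peptide runs from an M to the next stop ('*') or to the end of the frame.
    Returns an empty string if no frame contains an M.
    """
    seq = cdna.upper()
    reverse = seq.translate(COMPLEMENT)[::-1]
    candidates = []
    for strand in (seq, reverse):
        for frame in range(3):
            protein = translate(strand[frame:])
            candidates.extend(re.findall(r"M[^*]*", protein))
    return max(candidates, key=len, default="")


print(longest_orf_protein("ATGGCCTAA"))  # MA
```

The regular expression starts each candidate at the first M after a stop, which is the longest Met-initiated peptide in that stop-delimited segment.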
Other peanut proteins and ESTs cited in the text were obtained from GenBank and Swiss-Prot. DNA was extracted from the materials listed in Table 3.

Sequence Comparison and Analysis
DOF cDNAs from peanut line 06-3103 were batch-searched with blastx against nr and with BLAST against Swiss-Prot to retrieve the most similar DOF proteins from arabidopsis and soybean, with maize PBF used as a reference [39]. The selected seed DOF proteins were then compared pairwise with GeneDoc software. Conserved domains were identified using the conserved motifs listed by Lijavetzky et al. [2]. The DOF proteins were classified by their similarity to those of arabidopsis according to Yanagisawa [1].
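GeneDoc was used for the pairwise comparisons described above. To illustrate what a pairwise comparison computes, the following Python sketch implements global (Needleman-Wunsch) alignment and percent identity. The scoring scheme (match +1, mismatch -1, linear gap -2) is an assumption chosen for brevity; it is not the GeneDoc or BLAST setting, which typically use substitution matrices such as BLOSUM62. The function names are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell, preferring the diagonal step.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical non-gap columns divided by alignment length, times 100."""
    identical = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * identical / len(aln_a)


aln_a, aln_b, s = needleman_wunsch("CPRC", "CPC")
print(aln_a, aln_b, s, percent_identity(aln_a, aln_b))  # CPRC CP-C 1 75.0
```

Identity can also be computed over the shorter sequence or over aligned (non-gap) columns only; different tools use different denominators, so reported percentages are only comparable under the same convention.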

Bioinformatic Analysis on Function of DOF Proteins
Information on the expression patterns of AtDofs similar to the AhDofs from developing seeds of 06-3103 was collected from iHOP (Information Hyperlinked over Proteins), http://www.ihop-net.org/ [40,41]. Information on the functions of related DOF proteins was also gathered from the literature by searching NCBI PubMed with the keyword "DOF" and with GenBank accession numbers, and by searching TAIR and linked resources.

Eight Different DOF Genes Expressed in Developing Peanut Seed
Eight different DOF genes were found in a cDNA library of more than 20,000 clones from developing seeds of cultivar 063103. The deduced DOF proteins belong to DOF types II, III, IV, V and VII, according to Lijavetzky et al. [2] (Table 1). GW391725, GW391726 and GW391727 are similar to three CYCLING DOF FACTOR (CDF) genes (type II), AtCDF3, AtCDF2 and AtCDF1 of arabidopsis, respectively (Figure 1).
[…] (Figure 1(c)) belong to type V. GW391724 is similar to AtDof3.4 and GmDof2 (Figure 1(d)) and belongs to type IV. GW391728 is similar to AhDof3 (ACF74278.1) and GmDof5 (Figure 1(e)) and belongs to type VII; GW391729 is similar, though at a lower level, to AtDof2.2 and GmDof12 (Figure 1(f)). All of these similar AtDof and GmDof proteins are expressed in developing seeds [19,40]. Different types of DOF proteins not only have distinct sequences in their N- and C-terminal regions (Table 1) but also have specific characteristics in their conserved DOF zinc finger regions (Figure 2). The key structure of the zinc finger domain consists of two CXXC bridges and, between them, a C residue surrounded by conserved amino acids. The four C residues of the two CXXCs coordinate one zinc ion, which gives this region a finger-like conformation. There are four variable motifs in the zinc finger domain among AhDofs, GmDofs and AtDofs. The first motif is N (or K/D/E) SM (or T/P/S) D (or E/N), located right after CPRC (Figure 2). The second is NVN (or NIN/NFS/NYS/SLS/SLT), which follows KFCY-YNNY, the middle conserved domain containing the middle C residue (Figure 2). The third variable region covers the second CXXC: H (or Y) FACKN (or A/S/T/K) CQ (or R/K) RYWTS (or A/H/K/R) (Figure 2). The fourth variable region is AGR (or GGS/GVS/GGC/GSY) (Figure 2), right before the possible nuclear localization signal (NLS) sequence RKN-KR [42,43]. The first CXXC is conserved as CPRC in all listed AhDofs, whereas the second CXXC varies: CKNC (CDFs), CKSC (GW391724, GW391722), CKAC (GW391729, GW391728 and maize PBF) and CKTC (GW391723) (Figure 2).
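The CXXC layout described above can be located with a regular expression. The Python sketch below is an illustration under stated assumptions: it takes the first C-X2-C, allows a spacer of 15 to 30 residues (an assumed range, not taken from the text) before the second C-X2-C, checks the spacer for the middle C residue, and labels the second CXXC with the proteins listed for each variant above. The function name is hypothetical, and the toy sequence is invented to follow this layout; it is not a real AhDof.

```python
import re

# Assumed layout: first C-X2-C, a 15-30 residue spacer (assumed range),
# then the second C-X2-C. The non-greedy spacer picks the nearest second CXXC.
DOF_CORE = re.compile(r"(C..C)(.{15,30}?)(C..C)")

# Second-CXXC variants and the proteins listed for each in the text.
SECOND_CXXC_GROUPS = {
    "CKNC": "CDFs (GW391725, GW391726, GW391727)",
    "CKSC": "GW391724, GW391722",
    "CKAC": "GW391729, GW391728, maize PBF",
    "CKTC": "GW391723",
}


def scan_dof_core(protein: str):
    """Summarize the first DOF-like zinc finger core in `protein`, or return None."""
    match = DOF_CORE.search(protein.upper())
    if match is None:
        return None
    first, spacer, second = match.groups()
    return {
        "start": match.start() + 1,  # 1-based position of the first C
        "first_cxxc": first,
        "second_cxxc": second,
        "spacer_length": len(spacer),
        "has_middle_c": "C" in spacer,  # the middle C in KFCY-YNNY
        "listed_with": SECOND_CXXC_GROUPS.get(second, "not listed in text"),
    }


# Invented toy sequence (not a real AhDof), built to follow the layout above.
toy = "KPEQKLKCPRCDSMDTKFCYYNNYNVNQPRHFCKNCQRYWTAGGALRNIPVGAGRRKNKR"
print(scan_dof_core(toy))
```

A motif-profile search (for example, against curated domain models) would be more robust than a single regular expression, which misses cores whose spacing falls outside the assumed range.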
In all CDFs, the four variable regions are coherent with one another and consistent among different CDFs (Figure 2). All four variable regions are relatively conserved in CDFs: N (or K) SMD (SME in arabidopsis), NV (or I) N, HFCKN (in peanut and soybean; A or K in arabidopsis) Q-A (or S), and AGR. The conserved amino acid combinations in CDFs are rich in N or Q (only one Q), which are polar, uncharged residues. In the key variable regions, the variable amino acids share the same physicochemical properties, such as D and E in the first region and V and I in the second region (Figure 2). Some residues are strictly conserved, such as M before D or E, N-N, Q after the second CXXC, and AGR before RKN. These common features of the CDF zinc finger DOF domain, located right before the nuclear localization sequences, possibly provide the structural basis for the common specific DNA binding property related to the function of CDF-type DOF proteins.
The second similarity group of DOFs contains GW391724, At5g66940 and GmDof2. Their variable region I is D (or E) ST, variable region II is NFS, variable region III is SCR-H, and variable region IV is G (or V) SR (Figure 2).
The specific conserved sequence of the GW391722 similarity group lies in region III, YFCKSCRRYWTK (Figure 2). GW391722, GW391723, GW391728 and GW391729 all contain the same N-glycosylation site (NYSL) overlapping the middle conserved DOF domain and variable region II. None of the three CDFs or GW391724 has an N-glycosylation site in the conserved DOF domain, but they have three overlapping N-glycosylation sites in the N-terminal region outside the conserved DOF domain.
From ESTs deposited in GenBank that were derived from various cDNA libraries of developing peanut seeds, partial sequences of four DOF cDNAs (GO332017, GO339137, GO324523, GO332293) from the VBL6 cDNA library, two DOF cDNAs (EE124644, EE125478) from the Luhua4 cDNA library, and one (ES703897) from the Tifton C20R5 cDNA library were obtained, translated and compared with the DOF proteins from 06-3103. GO332017 is a partial sequence of 215 amino acids, highly similar to the C-terminal domain of the AhCDF3s (GW391726, GW391725) (Figure 3(c)). Within its N-terminal 205 amino acid residues, only 13 amino acids differ from those of GW391726, and within its N-terminal 86 amino acid residues, only 9 amino acids differ from those of GW391725 (Figure 3(c)). GO339137 is highly similar to the C-terminal 67 aa of GW391722, with only 6 aa residues differing between them (Figure 3(a)). These differences possibly result from differences between species. GO324523 and GO332293 show little similarity to known AhDofs outside the DOF zinc finger domain.

GW391729 Was Expressed in Peanut Leaves Inoculated with Ralstonia solanacearum
DOF genes expressed in cDNA libraries from roots and leaves of control and Ralstonia solanacearum-inoculated plants of the same cultivar, 063103, were examined. GW391729 was expressed only in leaves inoculated with Ralstonia solanacearum and not in control leaves (Figure 4), indicating that GW391729 is possibly related to leaf resistance to Ralstonia solanacearum. In leaves of both control and inoculated plants, ESTs similar to but different from GW391724 were expressed (Figure 5); these ESTs do not encode the same DOF protein as GW391724 and are possibly related to leaf development. In roots inoculated with Ralstonia solanacearum, an EST similar to but different from GW391722 was expressed (Figure 6), and its function is possibly related to the root response to Ralstonia solanacearum. The functions of GW391724 and GW391722, which are expressed in seeds and differ from their counterparts in leaves and roots, respectively, need further investigation.

Evolutionary Origin of AhDofs from Developing Seeds
BLASTN searches of GenBank with the EST sequences of AhDofs from developing seeds of 063103 show that the most similar sequences are from the wild peanut species A. ipaensis (Table 2), the source of the BB genome of cultivated peanut [44]. This indicates that all eight AhDofs possibly derive from the peanut BB genome. Among the AhDofs, ESTs from wild Arachis species similar to GW391722, GW391725, GW391726 and GW391727 come only from A. ipaensis (Table 2), indicating that these four genes may be located only in the BB genome. For GW391723, GW391728 and GW391729, although the most similar ESTs are from A. ipaensis, there are also two similar ESTs for each from seeds of A. duranensis (Table 2), the origin of the AA genome of cultivated peanut [44]. This suggests that similar genes may also exist in the AA genome.
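The genome-origin reasoning above (a best hit from A. ipaensis suggests the BB genome; additional hits from A. duranensis suggest related AA-genome genes) can be written as a small Python sketch. The input format, a list of (query, subject species, bit score) tuples, and the function name are hypothetical; in practice such tuples would be parsed from BLAST tabular output. The example hits are invented for illustration and are not data from Table 2.

```python
from collections import defaultdict

# Diploid progenitors of cultivated peanut and the genome each contributed.
GENOME_OF = {"Arachis ipaensis": "BB", "Arachis duranensis": "AA"}


def assign_genomes(hits):
    """Summarize hits per query.

    hits: iterable of (query_id, subject_species, bit_score) tuples.
    Returns {query_id: (genome_of_best_hit, sorted genomes with any hit)}.
    Species not in GENOME_OF are reported as "unknown".
    """
    by_query = defaultdict(list)
    for query, species, score in hits:
        by_query[query].append((score, species))
    summary = {}
    for query, entries in by_query.items():
        _, best_species = max(entries)  # highest bit score wins
        genomes = sorted({GENOME_OF.get(sp, "unknown") for _, sp in entries})
        summary[query] = (GENOME_OF.get(best_species, "unknown"), genomes)
    return summary


# Invented example hits, for illustration only.
example = [
    ("query1", "Arachis ipaensis", 310.0),
    ("query2", "Arachis ipaensis", 280.0),
    ("query2", "Arachis duranensis", 190.0),
    ("query2", "Arachis duranensis", 175.0),
]
print(assign_genomes(example))
# {'query1': ('BB', ['BB']), 'query2': ('BB', ['AA', 'BB'])}
```

Here "query1" corresponds to the pattern reported for the BB-only genes and "query2" to the pattern with additional AA-genome hits; a real analysis would also apply an E-value or identity threshold before counting a hit.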

DISCUSSION
The integrated function of a protein is determined by the combination of its structure and sequence, and different domains of a protein may have different activities. DOF proteins have a relatively conserved zinc finger domain that binds to DNA promoter sequences, and variable C-terminal regions that differ in amino acid composition and length. Differences in the zinc finger domain determine their specific binding targets [7], whereas specific C-terminal sequences and structures determine activities other than DNA binding. Thus the specific function of DOF proteins can be deduced from their sequence similarity, domain by domain, to proteins of known function.
Sequences of GW391729 from eight cultivars were compared and analyzed. Most cultivars with three seeds per fruit have distinctive sequences that differ from those of cultivars with two seeds per fruit (Table 3). However, seed size and resistance to seed invasion do not appear to be related to GW391729.
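Finding the differential sites summarized in Table 3 amounts to scanning alignment columns for residues that vary among cultivars, then asking whether a variable site separates cultivars by a trait such as two versus three seeds per fruit. The Python sketch below assumes the sequences are already aligned to equal length. Its function names and toy data are hypothetical, and its strict one-to-one test is stricter than the observation above that most, not all, three-seeded cultivars differ.

```python
from collections import defaultdict


def differential_sites(aligned: dict[str, str]) -> list[tuple[int, dict[str, str]]]:
    """Return (1-based position, {cultivar: residue}) for every variable column."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    (length,) = lengths
    sites = []
    for pos in range(length):
        column = {name: seq[pos] for name, seq in aligned.items()}
        if len(set(column.values())) > 1:
            sites.append((pos + 1, column))
    return sites


def site_tracks_trait(column: dict[str, str], trait: dict[str, object]) -> bool:
    """True if residue state and trait value correspond one-to-one at this site."""
    residue_to_trait = defaultdict(set)
    trait_to_residue = defaultdict(set)
    for name, residue in column.items():
        residue_to_trait[residue].add(trait[name])
        trait_to_residue[trait[name]].add(residue)
    return all(len(v) == 1 for v in residue_to_trait.values()) and all(
        len(v) == 1 for v in trait_to_residue.values()
    )


# Invented toy alignment and seed counts, not data from Table 3.
aligned = {"cv1": "MKAT", "cv2": "MKAT", "cv3": "MRAS", "cv4": "MRAT"}
seeds_per_fruit = {"cv1": 2, "cv2": 2, "cv3": 3, "cv4": 3}
for pos, column in differential_sites(aligned):
    print(pos, column, site_tracks_trait(column, seeds_per_fruit))
```

With only eight closely related cultivars, a site that tracks a trait in this way is a lead for follow-up, not evidence of a causal link.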
GW391725, GW391726 and GW391727 are three highly similar DOFs among the AhDofs. They are most similar to AtCDF1 (At5g62430), AtCDF2 (At5g39660) and AtCDF3 (At3g47500), and they contain all the conserved motifs of group II DOF proteins [1] (Table 1). CDF genes were first identified in Arabidopsis in a yeast two-hybrid screen for proteins interacting with the C-terminal kelch repeats of FKF1 and of LKP2, which can degrade CDFs, but not with those of ZEITLUPE (ZTL), which functions in a different signal transduction pathway [45,46]; the three AtCDFs were isolated in this screen [45,46]. AtCDF1 was further found to bind to the promoter of the CO gene as a key negative factor in the flowering pathway. By degrading CDF1, FKF1 relieves the repression of CO expression by CDF1 and thus releases the inhibition of flowering [45,46]. Thus AtCDF1 is an inhibitor of flowering. AtCDF1 is most highly expressed in senescent cauline leaves and in the stem between the first and second internodes; intermediate expression levels are detected in the vegetative rosette, in early embryo stages 3 to 5 and pods at the same stages, in all flower organs of flower stages 12 to 15, and in mature pollen. AtCDF1 expression has also been reported to be restricted to the vasculature [29]. AtCDF2 and AtCDF3 have no effect on flowering [45], although another study indicates that AtCDF2 also functions in repressing photoperiod-induced flowering [47]. AtCDF2 and AtCDF3 are expressed more specifically in developing seeds, with expression increasing from the torpedo stage to its highest level at the curled cotyledon stage (stage 9) [40,41]. The expression of AtCDF2 is more specific to developing and dry seeds than that of AtCDF3. Besides developing seeds, AtCDF2 is also expressed in mature pollen and senescent cauline leaves. AtCDF3 is expressed at higher levels than AtCDF2 in senescent cauline leaves and in the stem between the first and second internodes; it is also expressed in vegetative leaves, stage 15 flowers/pedicels, and stage 15 petals and stamens, with lower expression levels in sepals and on the surface of early-stage pods [40,41]. All three AtCDFs (At3g47500, At5g39660 and At5g62430) are expressed in sperm cells, and of the 32 DOF transcription factors they are the only DOF genes expressed there [27]. During cold acclimation, AtCDF2 and AtCDF3 are two of the five upregulated DOF genes [29,30]. AtCDF3 is also known as H-protein promoter binding factor-2a. All three AtCDFs interact with FKF1 and LKP2, and all have long N-terminal domains preceding the DOF zinc finger domain, which lies just before the NLS and binds to DNA sequences in gene promoters. These long N-terminal domains may create bulk at the binding sites and possibly hinder the interaction of other transcription factors with the promoters. The function of CDFs may therefore be conserved in higher plants, where they act as inhibitors of the expression of certain genes.
GW391722 shares 94% and 91% similarity with At5g60200 and At2g28510, respectively, within the 56 aa residues of the conserved DOF domain, 85% with GmDof13 in a DOF domain of 97 aa residues, and 54% with GmDof11 in a DOF zinc finger-containing domain of 145 aa residues. It also has the group III DOF conserved motifs 13, 11 and 20 [1] and thus belongs to group III. Both group III DOFs, At5g60200 and At2g28510, have been reported to regulate vascular formation [19]. At2g28510 is upregulated by cytokinin [21] and GA4 [20]. Compared with At2g28510, higher expression of At5g60200 was detected in the shoot meristem [39,40]. The expression of At2g28510 is enriched in seeds imbibed for 24 h. Under control of the CaMV 35S promoter, soybean GmDof11 increased the lipid content in seeds of transgenic Arabidopsis plants by binding to a P-box-like cis DNA element, and it also regulates the expression of a set of genes [4]. This indicates that GW391722 may function in both vascular differentiation and metabolism by binding to the promoters of a set of genes.
GW391724 shares 81% sequence similarity with the GmDof2 zinc finger domain over 64 aa residues, 73% with the AtDof3.4 (At3g50410, OBP1) zinc finger domain over 63 aa residues, and 72% with the AtDof5.8 (At5g66940) zinc finger domain over 66 aa residues. It has the group IV DOF conserved motif 33 and an undetermined motif shared by GW391724, AtDof5.8 and GmDof2 (Table 1), so it possibly belongs to group IV [1]. The group IV DOF At5g66940 is highly expressed in the shoot apex at the inflorescence, vegetative and transition stages, in a patchy pattern in young floral organ primordia (later limited to stamens and carpels), and in early stages of developing seeds [39,40]. At5g66940 was downregulated in the strubbelig-like mutant (slm), and it was one of the genes responsive to SLM, which controls plant organogenesis [23]. AtOBP1 was recently reported to upregulate cell cycle-specific genes, shortening the G1 phase and the overall length of the cell cycle, and to function as a specific regulator of cell division [38].
Among the AhDofs, GW391728 shares the highest similarity with PBF in the N-terminal sequence including the zinc finger domain (65%, 55 of 84 aa); PBF belongs to group VII [1], so GW391728 is also possibly a P-box-like cis DNA element binding factor. In addition, GW391728 is most similar to ACF74278/DQ889513 from peanut line Luhua14, with a sequence identity of 169/181 aa (93%), but is much less similar to the protein sequence of GO324523 from peanut line Tifrunner (Figure 2(b)), and no similar ESTs were found for peanut GT-C20R or other cultivars in GenBank. Comparison of protein sequences from eight cultivars reveals that GW391729 is possibly related to the number of seeds per fruit. However, this pattern was observed only among eight evolutionarily close cultivars. GW391729 may also function in leaf disease resistance; thus its function may be related to development, possibly in cooperation with other DOF proteins.
The protein sequences of GW391723 and GW391729 share 48% (76/157 aa) and 59% (61/103 aa) similarity, respectively, with that of At2g28810. The amino acid sequence of GW391723 has the conserved group V motif 17 in its N-terminal domain (Table 1) and possibly belongs to group V [1], whereas GW391729 has no sequence outside the DOF zinc finger domain that resembles known DOF proteins. The group V protein At2g28810 is expressed at all stages except late pod stages [39,40]. It is preferentially expressed in phloem companion cells and thus may play a role in the differentiation of vascular tissue [24].
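The identity and similarity figures quoted with explicit residue counts in this section are consistent with a simple ratio of identical (or similar) residues to aligned residues, rounded to the nearest percent. Reading them this way is an assumption, since the alignment settings are not stated:

```latex
\[
\text{identity}\,(\%) = 100 \times \frac{N_{\text{identical}}}{L_{\text{aligned}}}
\]
\[
100 \times \tfrac{169}{181} \approx 93.37, \quad
100 \times \tfrac{55}{84} \approx 65.48, \quad
100 \times \tfrac{76}{157} \approx 48.41, \quad
100 \times \tfrac{61}{103} \approx 59.22
\]
```

Rounded to the nearest integer, these give 93%, 65%, 48% and 59%, matching the reported values.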

Figure 1. Phylogenetic trees of AhDofs with their similar DOF proteins.

Figure 2. Comparison of DOF domain sequences between similar AhDofs, AtDofs and GmDofs.

Figure 4. Comparison of GW391729 and cDNA F09 from leaf inoculated with Ralstonia solanacearum.

Figure 5. Comparison of GW391724 and similar cDNAs from leaf inoculated with Ralstonia solanacearum.

Figure 6. Comparison of GW391722 and similar cDNAs from root inoculated with Ralstonia solanacearum.

Table 1. Comparison of conserved domains outside the DOF domain among AhDofs, GmDofs and AtDofs.

Table 2. Similarity of DOF ESTs to those from the original (wild) species of the genus Arachis.

Table 3. Differential sites of GW391729 among eight cultivars.