Genome-Wide Identification, Classification and Evolutionary Expansion of KNOX Gene Family in Rice (Oryza sativa) and Populus (Populus trichocarpa)

The KNOX gene family codes for transcriptional regulators with a variety of functions in plant developmental and physiological processes. In this study, a genome-wide comparative analysis of KNOX genes in Poplar (Populus trichocarpa) and rice (Oryza sativa L. ssp. japonica) was carried out. Using comprehensive computational analyses that took into account gene structures, phylogeny and conserved motifs, 15 and 13 KNOX genes were identified in Poplar and rice, respectively. These KNOX genes were further divided into 3 groups. The Poplar gene POPTR_0012s04040 and the rice genes LOC_Os03g47042 and LOC_Os03g47022 were classified, together with KNATM, into a new group of KNOX genes without a homeobox domain, which were proposed to play a potential role in plant development and pluripotency. The identification of KNATM homologs in a monocotyledon (rice) provided strong support for the proposal that an ancient shuffling of a HOMEOBOX gene with a MEINOX gene took place in the KNOX phylogeny. Based on subcellular location information, GO (gene ontology) and expression profile analysis, KNOX genes in rice and Poplar were proposed to function similarly to their counterparts in Arabidopsis. Our observations may lay the foundation for future functional analysis of KNOX genes in rice and Poplar to unravel their biological roles in cellular pluripotency.



Introduction
Homeobox genes are key regulators in plant and animal development. In plants, homeobox genes are divided into several groups according to their sequences, one of which is the KNOX (knotted-like homeobox) family [1] [2]. Plant KNOX genes encode homeodomain-containing transcription factors that are required for meristem maintenance and organ initiation. There are 4 conserved motifs in KNOX proteins: KNOX1, KNOX2, ELK and Homeobox_KN [3] [4] [5]. KNOX1 and KNOX2, the two upstream domains collectively called MEINOX, are separated by a poorly conserved linker sequence [6].
In Arabidopsis, KNOX genes are divided into three classes, class I, class II and class M, depending on their sequences and expression patterns [7]. The class I subfamily consists of 4 members, SHOOT MERISTEMLESS (STM), KNAT1, KNAT2 and KNAT6, which are characteristically expressed in the shoot apical meristem (SAM). They play important roles in meristem function, control of leaf shape and hormone homeostasis, and act as either transcriptional activators or repressors [8] [9]. STM functions in shoot apical meristem maintenance and also regulates inflorescence architecture [10]. KNAT1 also shows cell type-specific expression patterns in the Arabidopsis root [11] and acts redundantly with STM [12] [13] [14]. KNAT2 is expressed during embryogenesis and marks the base of the SAM [15] [16], and is also active in root tissue [17]. KNAT6 is expressed in the embryonic SAM, the SAM boundaries [15] and the phloem tissue of roots [18]. By contrast, the functions of the four Arabidopsis class II genes, KNAT3, KNAT4, KNAT5 and KNAT7, are barely known, although their expression is more widespread [8] [9]. KNAT3, KNAT4 and KNAT5 show cell type-specific expression patterns in the Arabidopsis root [11]. The expression of KNAT3 is highest in young siliques, inflorescences and roots, and the strongest expression of KNAT4 is in leaves and young siliques. Both show reduced expression in etiolated seedlings [19]. KNAT5 is expressed in the young primordium and the newly developing elongation zone of roots, and its expression in the epidermis marks the boundary between cell division and elongation [11]. KNAT7 is highly expressed in the central part of the Arabidopsis root [11] [20] and is associated with secondary wall formation in Arabidopsis and Poplar [21].
Class M, represented by KNATM, is a novel class of the KNOX family in Arabidopsis characterized by the absence of the homeodomain [7]. KNATM has a role in leaf proximal-distal patterning and is expressed in proximal-lateral domains of organ primordia and at the boundary of mature organs [7].
Compared with the extensively investigated functions of Arabidopsis KNOXs, only three Poplar KNOX genes (ARBORKNOX1, ARBORKNOX2 and PoptrKNAT7) and nine rice KNOX genes (Oskn1/HOS59, Oskn2/OSH1, Oskn3/OSH15, OSH3, OSH6, OSH43, HOS58, OSH71 and HOS66) have been characterized to date. ARBORKNOX1 (ARK1) and ARBORKNOX2 (ARK2) both belong to the Poplar class I KNOX homeobox genes and are orthologous to the Arabidopsis STM and KNAT1 genes, respectively. ARK1 is expressed in the SAM and the vascular cambium, and is down-regulated in the terminally differentiated cells of leaves and secondary vascular tissues derived from these meristems [22]. ARK2 is expressed widely in the cambial zone and in terminally differentiating cell types, and influences terminal cell differentiation and cell wall properties during secondary growth [23]. Three rice KNOX genes, Oskn1, Oskn2 and Oskn3, are expressed in the SAM and are involved in regulating SAM formation [24]; OSH3 is expressed in the inflorescence meristem and is involved in morphogenesis [25] [26]. HOS66 is broadly expressed in root, leaf, flower and callus [27], while its function remains to be unraveled.
Although a survey and classification of KNOX genes in maize was carried out 20 years ago [1], and expression analyses [25] [28] and functional analyses of a few rice KNOXs [24] [29] have been reported, no detailed systematic analysis including subcellular location, gene structure and expression profiling has been conducted. To further reveal the evolution and function of KNOX genes in monocot and dicot plants, a genome-wide identification of KNOX genes was carried out in Poplar and rice, taken as references for dicots and monocots, respectively.
The sequence phylogeny, genome organization, gene structure, conserved motifs and expression profiles of these homologous genes were also analyzed to verify the evolutionary origins of Poplar KNOX genes and to confirm their tissue-specific expression patterns. This might provide some insights for future engineering of plant pluripotency and regeneration characteristics in rice and Poplar.

Database Search and Sequence Retrieval
Multiple database searches were performed to collect all members of the rice (Oryza sativa) and Poplar (Populus trichocarpa) KNOX family. The amino acid sequences of the 9 KNOX genes from Arabidopsis (Arabidopsis thaliana) were used as query sequences. BLAST programs (TBLASTN and BLASTP) were run against the Phytozome (http://www.phytozome.net/), TAIR (http://www.arabidopsis.org/), Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/), NCBI (http://www.ncbi.nlm.nih.gov/) and PlantGDB (http://www.plantgdb.org/) genome databases, and sequences with hits at an E-value of 1e−6 or less were treated as conserved genes. To extend the database search results, position-specific iterated BLAST [30] searches against the rice and Poplar databases on the NCBI web site were also performed. All sequences with conserved domains of KNOX proteins, except for those annotated as miRNA, retrotransposons or transposable elements, were considered KNOX candidates.
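The E-value cutoff described above can be illustrated with a short sketch. It assumes the BLAST hits were exported in the standard 12-column tabular format (-outfmt 6), in which the eleventh column holds the E-value; the export format, the function name and the sample rows (including the subject identifiers) are assumptions made for illustration and are not taken from the study.

```python
E_VALUE_CUTOFF = 1e-6  # threshold stated in the text


def filter_blast_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return the subject IDs that have at least one hit at or below the cutoff.

    Assumes BLAST tabular output (-outfmt 6), tab-separated columns:
    qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    """
    kept = set()
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        if float(fields[10]) <= cutoff:
            kept.add(fields[1])
    return kept


# Hypothetical rows for illustration only (not real search results)
EXAMPLE = "\n".join([
    "STM\tcandidate_A\t62.1\t300\t90\t4\t1\t300\t5\t304\t3e-95\t310",
    "STM\tcandidate_B\t31.0\t80\t50\t2\t10\t89\t1\t80\t0.002\t35.1",
    "KNAT1\tcandidate_A\t55.0\t280\t100\t5\t3\t282\t8\t287\t1e-70\t250",
])

print(sorted(filter_blast_hits(EXAMPLE)))  # ['candidate_A']
```

Collecting subject IDs in a set removes redundancy when several query sequences hit the same candidate, which mirrors the exclusion of redundant sequences before domain checking.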

Construction of the Phylogenetic Tree and Gene Structure Analysis
To generate the phylogenetic tree of each gene family, a multiple alignment

Protein Structure Prediction
The three-dimensional (3D) protein structures of KNOXs in Arabidopsis, rice and Poplar were predicted with the PHYRE2 (Protein Homology/AnalogY Recognition Engine) Protein Fold Recognition Server (www.sbg.bio.ic.ac.uk/phyre2/) [34]. PHYRE2 is a web-based service for protein structure prediction using the principles and techniques of homology modeling.
It is able to regularly generate reliable protein models when other widely used methods such as PSI-BLAST cannot.

GO Analysis and Subcellular Localization
All targets identified in this study were subjected to agriGO toolkit analysis (http://bioinfo.cau.edu.cn/agriGO) to investigate gene ontology [35].

Expression Analysis for KNOX Genes in Poplar and Rice
For Poplar, the expression profile for each KNOX gene was obtained by eva-

Identification and Phylogenetic Analysis of the KNOX Family Genes in Poplar and Rice
By BLAST (BLASTp and BLASTn) searches against multiple databases with the nine previously published KNOX genes from Arabidopsis as query sequences, candidate KNOX genes in rice and Poplar were obtained and further analyzed to confirm the presence of the KNOX1, KNOX2, ELK and Homeobox_KN domains using the Pfam program and the Conserved Domain Search Service (CD Search) at NCBI. After excluding redundant sequences, 13 rice (Oryza sativa L.) KNOX genes and 15 Poplar (Populus trichocarpa) KNOX genes were identified (Table 1).
According to the conserved amino acid sequences, all the KNOX genes in Arabidopsis, rice and Poplar were divided into three groups, class I, class II and class III, in agreement with the representative genes of Arabidopsis (Figure 1). Among them, 22 KNOX genes belonged to class I, including 4 Arabidopsis, 9 Populus and 9 rice KNOX genes; 13 genes belonged to class II and 2 genes belonged to class III. Class I was further divided into 3 subclasses, designated IA, IB and IC. Class IC comprised only 4 rice genes, LOC_Os03g56140, LOC_Os03g56110, LOC_Os03g47022 and LOC_Os03g47042, which clustered ambiguously relative to the other phylogenetic classes (Figure 1).
Class II included two subclasses, IIA and IIB. KNAT7, together with the rice gene LOC_Os03g03164 and the Poplar member POPTR_0001s08550, represented subclass IIA. Besides Arabidopsis KNAT3, KNAT4 and KNAT5, 3 rice members (LOC_Os08g19650, LOC_Os02g08544 and LOC_Os06g43860) and 4 Poplar members (POPTR_0006s20440, POPTR_0018s12200, POPTR_0006s27560 and POPTR_0018s02210) were classified into the class IIB clade (Figure 1). The Poplar KNOX gene POPTR_0012s04040 fell into class III together with the Arabidopsis KNOX member KNATM and showed large divergence from the other clades, indicating that class III was a new branch in the phylogeny (Figure 1).
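As a consistency check on the classification, the counts reported in this section can be tallied per class and per species. The sketch below is illustrative only: the class I split is stated directly in the text, while the per-species split for class II and class III is derived here from the gene names listed above, and the variable and function names are hypothetical.

```python
# Gene counts per class and species, as reported or listed in the text
CLASS_COUNTS = {
    "I": {"Arabidopsis": 4, "Poplar": 9, "rice": 9},
    "II": {"Arabidopsis": 4, "Poplar": 5, "rice": 4},
    "III": {"Arabidopsis": 1, "Poplar": 1, "rice": 0},
}


def totals_by_class(counts):
    """Sum the species counts within each class."""
    return {cls: sum(per_species.values()) for cls, per_species in counts.items()}


def totals_by_species(counts):
    """Sum the class counts for each species."""
    totals = {}
    for per_species in counts.values():
        for species, n in per_species.items():
            totals[species] = totals.get(species, 0) + n
    return totals


print(totals_by_class(CLASS_COUNTS))    # {'I': 22, 'II': 13, 'III': 2}
print(totals_by_species(CLASS_COUNTS))  # {'Arabidopsis': 9, 'Poplar': 15, 'rice': 13}
```

The class totals (22, 13 and 2) and the species totals (9 Arabidopsis, 15 Poplar and 13 rice genes) agree with the numbers given in the text.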

Comparison of Gene Structure in Arabidopsis, Poplar and Rice
Structural diversity of KNOXs in rice and Poplar was analyzed with GSDS based on the alignment of cDNA and corresponding genomic sequences. The analysis revealed that KNOX members within the same subfamily shared very similar gene structures in terms of intron numbers and exon lengths, except for class IC. KNOXs in class IA shared a structure with 5 exons, in which the third and fourth exons were separated by a long phase 1 intron. According to the sequences from Phytozome, the class III member KNATM had only one intron; whether it has 5' and 3' UTRs still needs to be determined (Figure 1). By contrast, the gene structures of the 4 rice members of subclass IC appeared to be more variable. LOC_Os03g56110 had the same structure as subclass IA, whereas LOC_Os03g56140 had 4 exons with a long phase 1 intron between the third and fourth exons. Both LOC_Os03g47042 and LOC_Os03g47022 had 3 exons separated by two phase 0 introns; the 5' and 3' UTRs of LOC_Os03g47042 were not available and LOC_Os03g47022 had a 3' UTR only, features similar to the gene structure of KNATM.
In addition, the intron phases were highly conserved in KNOX genes within each class. Introns in class I and class III were always in phase 0 or 1, while 50% of the genes in class II had introns in phases 2 and 0 (Figure 1), resulting in a significant excess of non-symmetrical exons. This suggested that splicing phases were also highly conserved during evolution and that there was a strong correlation between the phylogeny and the exon/intron structure of the KNOX gene family in rice and Poplar.
Although the intron phases with respect to codons were remarkably well conserved within the same subfamilies, there were striking differences in the arrangement of introns and intron phases among subfamilies of KNOX IB (Figure 1). The conservation of intron phases within KNOX subfamilies and the striking dissimilarity between subfamilies lend mutual support to the results from phylogenetic analysis and genome duplication, and provide further support for the definition of plant KNOX subfamilies.
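The notions of intron phase and exon symmetry used in this section can be made concrete with a minimal sketch. It assumes the standard definition, in which the phase of an intron is the number of coding bases preceding it modulo 3, and an internal exon is symmetrical when its two flanking introns share the same phase. The function names and the example exon lengths are hypothetical and are not taken from the KNOX gene models.

```python
def intron_phases(cds_exon_lengths):
    """Phase of each intron, given coding-sequence lengths of consecutive exons.

    The phase is the number of coding bases preceding the intron, modulo 3:
    0 = between two codons, 1 = after the first base of a codon,
    2 = after the second base. UTR bases must be excluded from the lengths.
    """
    phases = []
    coding_bases = 0
    for length in cds_exon_lengths[:-1]:  # no intron follows the last exon
        coding_bases += length
        phases.append(coding_bases % 3)
    return phases


def symmetric_internal_exons(phases):
    """For each internal exon, True if its flanking introns share a phase."""
    return [left == right for left, right in zip(phases, phases[1:])]


# Hypothetical coding lengths for a five-exon gene (illustration only)
example_lengths = [120, 91, 150, 62, 200]
example_phases = intron_phases(example_lengths)

print(example_phases)                            # [0, 1, 1, 0]
print(symmetric_internal_exons(example_phases))  # [False, True, False]
```

In this example only the third exon is flanked by introns of the same phase, so the other two internal exons are non-symmetrical.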

Multiple Alignment and Conserved Domain Analysis of KNOXs in Each Class
The domain conservation of the 13 rice and 15 Poplar KNOX proteins was analyzed by multiple sequence alignment with Arabidopsis KNOX amino acid sequences using ClustalW (Figure S1, Figure S2, Figure S3) and the MEME program. No KNOX2 domain was found in the rice gene LOC_Os03g56140.
Neither the ELK domain nor the Homeobox_KN domain was found in the rice sequences LOC_Os03g47042 and LOC_Os03g47022 (Figure 2). These two genes resembled the class III members, Arabidopsis KNATM and POPTR_0012s04040, which are characterized by the absence of the ELK and Homeobox_KN domains (Figure 2). For the other class I genes and the class II genes, the KNOX1, KNOX2, ELK and Homeobox_KN domains were highly conserved (Figure 2). The KNOX1 domain was completely conserved in all 22 class I sequences, with an E-value as low as 4.3e-309 (Figure S1). The domain variation of three genes in class IC was consistent with their divergence in the phylogenetic analysis. The domain similarities among LOC_Os03g47042, LOC_Os03g47022, KNATM and POPTR_0012s04040 suggested that gene cluster analysis sometimes cannot resolve highly divergent motifs, especially given their short length and the occurrence of many ancient paralogs when a large number of sequences are examined; domain analysis can complement it in such cases. The same results were obtained through CD (conserved domain) search at NCBI and Pfam (Table S1).

Protein 3D Structure Prediction of KNOX Family Genes in Rice and Poplar
However, the 3D structures of the 4 rice gene members in class IC showed high variance. LOC_Os03g56110 and LOC_Os03g56140 still had the typical 3-helix structure like other KNOX genes, while the rice gene LOC_Os03g47042 had a very short helix II, no typical loop structure between helices I and II, and no helix-turn-helix motif formed by helices II and III; LOC_Os03g47022 showed a single merged long helix, with no loop, no turn and no other two helices (Figure 3(c)). These atypical 3D protein structures, which departed from the 3-helix structure, agreed with the protein structure prediction of class III KNOXs (Figure 3(d)). The Arabidopsis gene KNATM (AT1G14760) had a structure with two longer helices connected by a short linker, and the Populus gene POPTR_0012s04040 had a structure with unlinked helices, the first helix being disconnected from the other two. The conservation and variance of 3D protein structure in class I was highly consistent with the phylogenetic analysis and the conserved domain analysis, which further supported the view that they represented highly diverged lineage-specific KNOX sequences. The atypical 3D protein structures of LOC_Os03g47022, LOC_Os03g47042 and POPTR_0012s04040 implied special biological functions and phylogenetic divergence.

Differential Expression Profile and Subcellular Location of KNOX Family Genes in Poplar and Rice
To understand the temporal and spatial expression patterns of rice and Poplar KNOX genes, we compared their expression patterns during development using RNA-seq expression data available at MSU and data from PopGenIE, respectively. The diverse expression patterns of rice KNOX genes in 11 different tissues showed that, apart from LOC_Os03g47042, for which no expression data were available, most of the class I genes were mainly expressed in pre-emergence inflorescence, post-emergence inflorescence, pistil and 25 DAP (days after pollination) embryo, with different expression dynamics (Table 2); LOC_Os05g03884 had its highest expression in pistil and LOC_Os01g19694 in 25 DAP embryo. LOC_Os07g03770 also had significant expression in anther, besides its highest expression in pre-emergence inflorescence, where the expression value was as high as 52.75 (Table 2). Compared with other class I genes, LOC_Os03g51710 was expressed in all 11 tissues, and its highest expression was in pre-emergence inflorescence, where the value reached 5.86 (Table 2).
All 4 class II genes were expressed in almost all of the 11 tissues analyzed and shared the same expression pattern, with strong expression in shoot, 4-leaf stage seedling and 20-day leaf; the gene LOC_Os02g08544 showed ubiquitous expression (Table 2).
For Poplar, the 5 class I KNOX genes (POPTR_0008s19300, POPTR_0013s01000, POPTR_0005s01720, POPTR_0004s00650 and POPTR_0002s11400) had strong expression in root, mature leaf, internodes, young leaf and nodes, while no expression data were available in these five tissues for POPTR_0010s05340, POPTR_0015s09030, POPTR_0012s08910, POPTR_0011s01600 and the class III gene POPTR_0012s04040. The gene POPTR_0008s19300 had stronger expression in nodes than in the other four tissues. The gene POPTR_0013s01000 had stronger expression in both internodes and nodes, with expression values of 9.37 and 9.47, respectively. The expression of POPTR_0005s01720, POPTR_0004s00650 and POPTR_0002s11400 in root, internodes and nodes was higher than in other tissues (Table 3). The class II KNOX genes showed even expression across root, mature leaf, internodes, young leaf and nodes, with the highest expression in mature leaf (Table 3). Protein subcellular localization is crucial for protein function prediction. The subcellular locations of KNOX members in rice (Table 2) and Poplar (Table 3) showed that, except for three rice class IB members whose location was still unknown, all were located in the nucleus, consistent with their function as transcription factors.

GO (Gene Ontology) Analysis of KNOX Target Genes in Rice and Poplar
To identify the biological processes in which these KNOXs might participate, and whether they differed between rice and Poplar, the 15 Poplar and 13 rice KNOX genes were subjected to AgriGO toolkit analysis to investigate gene ontology. The 13 rice genes were all involved in cellular process, regulation of biological process, biological regulation, metabolic process, transcription regulator activity and binding activity (Figure 4(a)). All 15 Poplar KNOX genes were involved in DNA binding activity, and more than 70% of the Poplar genes were involved in biological regulation and transcription regulator activity (Figure 4(b)). The enrichment of rice and Poplar KNOXs in DNA binding was consistent with the subcellular prediction that most of them were located in the nucleus.
According to GO analysis, the functional groups for transcription regulator activity, transcription factor activity and sequence-specific DNA binding were highly enriched among KNOX target genes in the rice and Poplar nucleus (Figure 5), but in Poplar the enrichment of the functional groups for DNA binding and transcription factor activity was not as high as that of sequence-specific binding. Taken together with the expression and subcellular analyses, this showed that all of the rice KNOXs and most of the Poplar KNOXs had roles in DNA binding and transcription regulator activity.
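The idea of a GO term being enriched in a gene set can be illustrated with a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of annotated genes by chance. This is a sketch under stated assumptions: the text does not say which statistical test was selected in the GO toolkit, so the choice of the hypergeometric test, the function name and the example numbers are all assumptions for illustration.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of annotated genes in the background
    annotated:  background genes carrying the GO term
    sample:     number of genes in the query list
    hits:       query genes carrying the GO term
    """
    upper = min(sample, annotated)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)


# Hypothetical numbers for illustration only (not taken from the study):
# 13 query genes, 12 of which carry a term found in 500 of 20000 background genes.
p = enrichment_p_value(population=20000, annotated=500, sample=13, hits=12)
print(f"p = {p:.3e}")
```

A small p-value indicates that the term is over-represented in the query list relative to the background, which is the sense in which the functional groups above are described as enriched.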
Apart from transcription factor activity and DNA binding, KNOX target genes in rice were highly functionally related to RNA metabolic process, regulation of RNA metabolic process and RNA biosynthesis process (Figure 5(a) and Figure 5(b)).
depending on their sequences and expression patterns [7]. According to the sequence searches and BLAST analyses in this study, KNOX homologs are scattered across the rice and Poplar chromosomes and can also be grouped into 3 classes based on sequences and gene structure (Figure 1). This indicated that KNOXs are an ancient gene family, highly conserved across species. However, class IC, which consisted of only four rice genes, clustered ambiguously between the other phylogenetic classes. These genes most likely represent highly diverged lineage-specific KNOX sequences, or the phylogenetic analysis could not resolve their evolutionary relationships. Two rice genes, LOC_Os03g47016 and LOC_Os03g47036, were classified into the KNOX gene family by Jain et al. (2008), but we found no KNOX1 or KNOX2 domains in them, so they were excluded from the KNOX family in this study. In contrast, two new KNOX genes, LOC_Os03g47042 and LOC_Os03g47022, were identified for the first time in this study and grouped into class IC according to their amino acid sequences; when gene structure, conserved domains and 3D protein structure were considered together, however, they appeared more similar to the Arabidopsis class M member and might be the KNATM orthologs in rice. Deep nodes and interclade relationships commonly show low statistical support and vary between phylogenetic methods, as is typically observed in protein phylogenies [40]. The ambiguous classification therefore indicated that gene cluster analysis sometimes cannot resolve highly divergent motifs, especially given their short length and the occurrence of many ancient paralogs when a large number of sequences are examined; domain analysis can complement it in such cases. KNATM homologs had previously been found only in dicots and were placed in a new class of the KNOX family [7]. The KNATM homolog in Poplar suggested that KNATM originated early in the evolution of dicotyledons. Moreover, two possible KNATM homologs were found in rice (Figure 1). This was the first identification of MEINOX proteins in a monocotyledon, and the discovery argued against the hypotheses that the KNATM homolog in monocotyledons is a canonical KNOX protein with a redundant homeodomain, or that KNATM originated in dicotyledons or was lost in monocotyledons. All of these findings argue against the hypothesis that KNATM is a pseudogene.

Functions of KNOXs in Rice and Poplar
As key effectors involved in transcriptional regulation and hormonal signaling, KNOX members of different classes perform diverse functions.
Class IA, IB and class II KNOXs in rice and Poplar all had 4 highly conserved domains, were located in the nucleus, were highly expressed in undifferentiated tissue (Table 2 and Table 3) and functioned in transcription factor activity (Figure 5). This suggested that they might participate in plant cell differentiation and plant morphogenesis, with functions similar to those in Arabidopsis.
In Arabidopsis, class I KNOX genes are mainly expressed in meristematic tissues and regulate hormonal pathways to maintain meristematic cells in an undifferentiated state [5]. Our expression analysis found that class I genes in either rice or Poplar were mainly expressed in pre-emergence inflorescence, post-emergence inflorescence, pistil and 25 DAP embryo (Table 2), showing expression patterns similar to those in Arabidopsis.
STM (AT1G62360) was classified into class IB in this study. It is expressed during early embryogenesis, its expression marks the entire SAM (Long et al. 1996), and it was proposed to be essential for SAM formation and maintenance [41] [42] because STM inhibits the cellular differentiation normally associated with organogenesis and permits the WUS-CLAVATA feedback loop to maintain the central stem cells [41] [43]. As the two Poplar genes POPTR_0004s00650 and POPTR_0011s01600 fell exactly into the STM clade and resembled STM in phylogeny, protein structure and expression patterns, they probably function in the same way as STM in Arabidopsis, and their roles in cellular differentiation and interactions with other genes are worth further investigation. Two of the three rice class IB KNOX genes, LOC_Os03g51690 and LOC_Os07g03770, had their highest expression in pre-emergence inflorescence (Table 2) and fell into the same clade as KNAT1. Expression of Oskn3 marked the boundaries of different embryonic organs following SAM formation (Postma-Haarsma et al. 1999). KNAT2 was expressed during embryogenesis and marked the base of the SAM [15] [16]. Its promoter has been reported to be active in root tissue [17]. KNAT6 was expressed in the embryonic SAM, the SAM boundaries [15] and the phloem tissue of roots [18]. The gene POPTR_0008s19300 had its strongest expression in nodes and might function in phloem tissue like KNAT6, as both fell into the same clade and had similar expression profiles.
The 4 conserved domains of KNOX proteins have different molecular functions. KNOX1 plays a role in suppressing target gene expression, KNOX2 is thought to be necessary for homo-dimerisation, the ELK domain is required for the nuclear localization of these proteins, and the Homeobox_KN domain is a homeobox transcription factor domain conserved from fungi to humans and plants [3] [4] [5]. The Poplar gene POPTR_0012s04040 and the rice genes LOC_Os03g47042 and LOC_Os03g47022 had only the MEINOX domain, like KNATM (Figure 2), which indicated that the functions attributed to the ELK and Homeobox_KN domains might be lost in these 3 genes. However, all of them were found to be located in the nucleus (Table 2 and Table 3) and showed transcription activities by GO analysis (Figure 4 and Figure 5). Although a strong functional relationship exists between the homeodomain (HD) and the MEINOX domain, some observations indicate that the MEINOX domain can also work in a homeodomain-independent fashion [44]. The class III member KNATM was found to participate in transcriptional regulation in a homeodomain-independent fashion [7]. This indicated that the KNATM homologs in rice and Poplar might also function independently of the homeobox domain. However, the BP-interacting domain reported by Magnani and Hake [7] was not conserved in the rice homologs (Figure 3). The mechanism of KNATM-BP interaction in the rice KNATM homologs needs further investigation.
Recent evidence suggested that auxin may play a major role in down-regulating KNOX genes during organ emergence [45]. Several clues also point to a hormone-dependent down-regulation of KNOX genes in the incipient primordium [46] [47], but the identity of the genetic factors responsible for this control is still unknown.
All of these observations hinted that the known and new KNOXs identified in this study may regulate the expression of their target genes to control cell differentiation and development in rice and Poplar. Although further experimental studies are still needed to unravel their biological roles, the genome-wide analysis of KNOXs in rice and Poplar will help to discover new KNOX genes and provide a valuable resource for further functional analysis.

Figure 1. Phylogenetic relationship and gene structure of the KNOX members in Arabidopsis, rice and Populus. The phylogenetic tree was constructed from the amino acid sequences of KNOX proteins from rice (Oryza sativa) (LOC_Os), Arabidopsis (Arabidopsis thaliana) (AT) and Populus (Populus trichocarpa) (POPTR) using the UPGMA method (rooted phylogenetic tree with branch length) in ClustalW. The gene structures were drawn with the GSDS program using sequences from Phytozome. Exon/intron structures are represented by green bars showing exons and grey lines showing introns. Blue bars show the 5' and 3' UTRs, and the numbers 0, 1 and 2 correspond to the intron phases. The sizes of exons and introns are proportional to their sequence lengths.

Figure 2. Conserved domain analysis of KNOXs with the MEME program. Based on domain conservation, all the KNOX genes in Arabidopsis, rice and Poplar were divided into three groups, class I, class II and class III. Class I was further divided into 3 subclasses, designated IA, IB and IC. Class II was further divided into subclasses IIA and IIB. Combined p-values and motif locations are shown. Only non-overlapping sites with a p-value better than 0.0001 are shown. The height of the motif "block" is proportional to -log(p-value), truncated at the height for a motif with a p-value of 1e−10.
The 3D protein structures of KNOXs in rice and Poplar predicted by PHYRE2 showed variance among subclasses. All members of class IA, class IB and class II had highly conserved 3D structures (Figure 3(a)) featuring 3 helices, among which helices I and II were typically connected by a loop structure, and helices II and III formed a helix-turn-helix motif. The structure of KNAT4 was slightly different, comprising a set of 3 helices at the N terminus and a set of 5 helices at the C terminus (Figure 3(b)). These two sets of helices formed a bipartite arrangement separated by a linker region. This structural configuration hinted at the functional diversity of KNAT4 in Arabidopsis development.

Figure 3. The 3D protein structure of KNOX genes. (a): Typical 3-helix structure of KNOX genes in this study; (b): 3D structure of KNAT4; (c): 3D structure of two class IC KNOX genes, LOC_Os03g47042 and LOC_Os03g47022; (d): 3D structure of class M. Image coloured by rainbow N → C terminus.

Figure 4. GO flash chart of biological process and molecular function from GO analysis of targets of KNOXs in rice (a) and Populus (b). Blue bars indicate the enrichment of Populus KNOX targets in GO terms. Green bars indicate the percentage of total annotated Populus genes mapping to GO terms.

It confirmed that rice KNOXs were involved in plant development as transcription factors. Poplar KNOX target genes were more weakly enriched in regulation of RNA metabolic process and RNA biosynthesis process (Figure 5(c) and Figure 5(d)).

Figure 5. Gene Ontology (GO) analysis of the KNOX genes. GO analysis according to AgriGO (http://bioinfo.cau.edu.cn/agriGO/index.php), with respect to molecular function ((a), (c)) and biological process ((b), (d)) in rice ((a), (b)) and Populus ((c), (d)). A p-value was assigned to each GO group based on the overabundance of significant genes. The block color from yellow to red is divided into 9 levels to represent, roughly, increasing enrichment strength.
The expression of KNAT1 covered the embryonic SAM, the post-embryonic SAM and the boundary of the inflorescence SAM (Venglat et al. 2002). LOC_Os03g51690 might also take part in inflorescence architecture, like LOC_Os07g03770 and KNAT1.

Table 1. KNOX gene information a.
a: The sequence information was from the Phytozome database (http://www.phytozome.net/). The longest isoform was chosen if more than one alternative splicing isoform was available.

Table 2. Rice expression a.
a: Expression support for each gene model was explored through the gene expression evidence search page (http://rice.plantbiology.msu.edu/expression.shtml) available at MSU. Expression data were derived from the NCBI Sequence Read Archive (SRA).

Table 3. Expression and subcellular location of Populus KNOX genes a.
a: The expression profile was obtained by evaluating its EST representation among 19 cDNA libraries derived from different tissues and/or developmental stages available at PopGenIE (http://www.popgenie.org/).