1. Introduction
Coffee inflorescences are glomerules produced by the differentiation of axillary meristematic buds above the opposite leaf insertion points, mainly in the nodes of plagiotropic branches. Two to five glomerules distributed on both sides of each node can be produced simultaneously, but not in perfect synchrony. Each glomerule holds five pentamerous flowers, displaying epipetalous stamens, bifid stigmas, and bilocular ovaries [1]. Nevertheless, weeks to months can pass after flower evocation, which is the transition of meristems from the vegetative to the reproductive state triggered by environmental signals, before flower bud emission and flower anthesis (blossoming) can be observed in the field. Flower bud emission means that tiny young buds, completely differentiated as flowers and displaying anthers but not yet microspores [2], become visible to the naked eye.
Based on observations of coffee plants in south to southeastern Brazil, it has been proposed that commitment to flowering takes place when days become shorter than nights and temperatures consistently decrease, which is followed by a resting period expected to occur anytime from July to August [3]. This indicates that coffee flower evocation events would be triggered around the autumn equinox. Flower evocation is followed by the determination of flower part identities, accomplished by meristem identity determination genes [4]. After identity determination, initial flower bud elongation leads to young flower bud emissions. This initial and gradual growth can reduce, but not eliminate, the asynchrony observed during coffee flowering [5], and is followed by resting. Resting is ecodormancy, determined by the intensity of environmental conditions, and relieved when the environment changes [6]. Coffee flower bud ecodormancy is commonly attributed solely to drought [1]. However, it partially coincides with the coldest weeks of the year in south to southeastern Brazil [3], and low temperatures can intensify ecodormancy [7], which can persist for months [1] [2] [8]. To withstand the abiotic stresses faced during dormancy, young flower buds rely on the covering provided by colleter exudation [9]. In general, Coffea arabica L. is more adaptable to low temperatures than C. canephora Pierre ex A. Froehner [10]. Post-dormancy resumption of coffee flower growth and blossoming are promoted by dormancy relief [11], and involve environmental signals other than those triggering flower evocation. In most plant species, dormancy relief includes the reopening of cell-to-cell communication by the removal of callose gates from plasmodesmata [6] [12].
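The date on which days become shorter than nights at a given latitude can be estimated with a standard astronomical approximation. The sketch below is only illustrative: the cosine formula for solar declination ignores atmospheric refraction and the equation of time, and the latitude of -21 degrees is a hypothetical value chosen to fall within southeastern Brazil, not a site taken from the cited studies.

```python
import math
from typing import Optional


def day_length_hours(latitude_deg: float, day_of_year: int) -> float:
    """Approximate hours of daylight for a latitude and a day of the year."""
    # Solar declination from a simple cosine approximation (in radians).
    declination = math.radians(-23.44) * math.cos(
        2.0 * math.pi * (day_of_year + 10) / 365.0
    )
    latitude = math.radians(latitude_deg)
    # Clamp the argument so polar day and polar night do not raise errors.
    x = max(-1.0, min(1.0, -math.tan(latitude) * math.tan(declination)))
    sunrise_hour_angle = math.acos(x)
    return 24.0 * sunrise_hour_angle / math.pi


def first_day_shorter_than_night(latitude_deg: float) -> Optional[int]:
    """First day of the year (1..365) with fewer than 12 h of daylight."""
    for day in range(1, 366):
        if day_length_hours(latitude_deg, day) < 12.0:
            return day
    return None


if __name__ == "__main__":
    # Hypothetical southern-hemisphere site.
    print(first_day_shorter_than_night(-21.0))
```

For the hypothetical latitude used here, the first day with less than 12 hours of daylight falls in late March, which agrees with the autumn equinox timing proposed in [3].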
In addition to ecodormancy, paradormancy (or latency) is the inhibition of bud growth by the surrounding tissues [6], as occurs in apical dominance. It can help evergreen perennials, such as coffee plants, to keep active meristems undifferentiated despite their exposure to environmental conditions that could lead to flower evocation [13]. This is necessary to enable ongoing vegetative growth throughout evergreens’ decades-long lives [14]. Additional variability can occur when meristems are incompletely committed to flowering, such as when the evocation signals from the environment are weak [15].
Regardless of location, development from flower bud emission to anthesis takes an estimated 120 days, resting included [8], but variation is expected [16] due to changes in some or all of the factors mentioned above, as well as altitude and the number of cloudy, colder days, for instance. Variation is under genetic control, beyond dormancy release by water availability, and is important for the genetic plasticity that supports adaptability [5]. In the last section of this review, a brief analysis of phenological data is provided to illustrate differences. Additional data about coffee flowering can be found in [17]-[19].
In contrast to flower evocation by the environment, coffee fruits (and beans) are frequently studied in detail, including related gene expression patterns [7] [20]-[28]. Fruits are harvested from May to June [3], more than a year after flower evocation events. Fruit development and maturation directly impact coffee quality, prices, and the economy. Brazil has been the largest coffee producer and exporter worldwide since 1840. C. arabica, native to Africa, was introduced from French Guiana early in the 18th century, and spread rapidly. Collections in the states of Rio de Janeiro, Minas Gerais, Bahia, Pernambuco, Amazonas, and Santa Catarina are reported as early as 1869 [29]. Currently, about two-thirds of all 60-kg bags of beans produced are from C. arabica, while C. canephora contributes approximately one-third [30].
A matter of concern is that, besides fruit formation [31] [32], flower evocation and development can be seriously affected by climate change [33], which interferes with species adaptability [34]-[42]. For coffee, unfavorable climate conditions can be harmful during both flowering and fruit set, and even non-picked fruits can exert negative feedback on subsequent flowering periods [43]. This feedback can be eliminated by timing harvests appropriately. However, if fruit maturation and flower evocation overlap too much during the autumn as a consequence of climate change, planning can become challenging. Selection of divergent genotypes could help.
Similar networks could operate in Coffea and Arabidopsis, despite differences in species distribution, to produce the transition from vegetative to reproductive meristems. Flowering mechanisms described for the model plant Arabidopsis have correspondence in many species (see below). Still, structural and functional analyses of genes responsive to environmental signals to trigger coffee flower evocation are very scarce. A few published reports, most regarding meristem identity determination, are reviewed in the next sections. A core network of genes and proteins interacting to trigger annual and perennial Arabidopsis flower evocation in response to light and temperature is reviewed and the genes are compared to their orthologs identified in Coffea genomes. Protein isoform preservation, gene organization in the chromosomes and their regulatory elements are discussed and taken as arguments to support further and deeper investigation.
2. Flower Evocation-Related Genes
Because of the importance of flowering to human survival, the existence of a molecule that could “bring blossoms”, later designated florigen [15], was methodically investigated for six decades before the role was assigned to the Arabidopsis thaliana FT (FLOWERING LOCUS T) protein [44]-[46]. Since then, the FT protein has been considered the most important switch for flower evocation in A. thaliana [47]. The gene networks implicated in flower evocation and meristem identity determination described for A. thaliana are similar in most species, from cereals to orchids [48]-[54], including tropical species such as the biofuel plant Jatropha curcas [55], cotton [56], Manihot esculenta [57], and mango [58].
Research focusing specifically on coffee flower evocation by environmental signals could be improved by evaluating the concerted expression of a minimal set of genes, and their paralogs, that could suit the phenological model [3] mentioned in the Introduction. For this review, CONSTANS (CO), FLOWERING LOCUS C (FLC), FLOWERING LOCUS T (FT or FLT), SUPPRESSOR OF OVEREXPRESSION OF CONSTANS (SOC), and a vernalization responsive triad (VRN1, VRN2, and VIN3) of genes were selected because they can interact as parts of a core network driving the comprehensive and quite disseminated mechanism for flower evocation by temperature and light, meaning day-length (Figure 1). A similar conceptualization was accomplished decades ago for A. thaliana, following the identification of core genes implicated in its flowering [47].
Figure 1. Networking to flower in response to environmental signals. Symbols are derived from A. thaliana genes active in flower evocation, from the perception of environmental signals to signal transduction and integration, which are necessary for meristem transition from the vegetative to the reproductive state. CO is for CONSTANS, FLC/FLM is for FLOWERING LOCUS C/M, FLT/FT is for FLOWERING LOCUS T, SOC is for SUPPRESSOR OF OVEREXPRESSION OF CONSTANS, and VRN/VIN is for VERNALIZATION homologs. Open arrows pointing down or up indicate decreases or increases, respectively, in the signal amount or gene transcription, and are set as supposedly necessary to induce coffee flower evocation under short days and decreasing temperatures. Solid lines indicate active interactions and dotted lines indicate silenced interactions between genes. At the line endings, arrowheads indicate induction and bars indicate repression. Meristem awareness refers to meristems that have received FLT/FT protein. Commitment to flower indicates that the transition has already started, and the meristems display reproductive-state morphology at the microscopic level.
To date, mainly coffee meristem identity-determination genes have been assessed. In situ expression patterns have been published [59], and a review including a genome-wide survey of meristem identity determination genes was published recently [60], stating that future studies should identify all coffee MADS-box genes. Indeed, a genome-wide identification of MADS-box genes in C. arabica was accomplished by the same research team [61]. Nevertheless, while these authors reviewed a large number of genes, only a small number of gene paralogs was assessed (only two FLC paralogs were mentioned), and no gene or gene family was clearly implicated in flower evocation by the environment.
One C. arabica FT ortholog was identified and subjected to a complete and in-depth expression analysis, including heterologous complementation of A. thaliana flowering-time mutants. The authors reported continuous expression from February to October and an expression peak in June in leaves of three different genotypes. Examining one paralog for each gene, the authors concluded that CaFT and the “environment-related floral regulators” CaCO and CaFLC were not co-expressed as expected [62]. This thorough investigation gave limited attention to paralog analyses, which have proved essential for most plant species, including A. thaliana [39] [63]-[65].
Alternatively, but less plausibly, C. arabica could be classified as an autonomous species, meaning that meristem identity determination and flowering would be driven by the contents of gibberellins, regardless of seasonal particularities. However, coffee vegetative and reproductive buds can occur on the same branches, and the frequencies of the two types of buds on the plagiotropic branches change depending on seasonal signals, to produce leaves or flowers (refer to the last section in this review for examples). Age and position along the axis release undifferentiated meristems on leaf axils from apical dominance [6] [11], allowing two types of responses—flowering or vegetation—and the response depends on genetic specifics and the environment. A genotype known for its capacity to flower almost continuously throughout the year regardless of season was designated C. arabica var. Semperflorens, which means “always flowering,” for this reason. It can produce flowers when neighboring coffee plants of other genotypes cannot, and displays a bimodal fruit production pattern, peaking in April and November in south to southeastern Brazil, with lower fruit production from June to August. Continuous flowering is so unique that it is possible for breeders to guarantee that progenies originating in certain months result exclusively from self-pollination, even when C. arabica var. Semperflorens plants are surrounded by other compatible genotypes [66] [67]. These Semperflorens plants could be considered autonomous with regard to their flower evocation. Their lack of interaction with environmental signals throughout the year differs completely from the reactions described in the final section for other C. arabica genotypes.
3. Proteins Functioning in Flower Evocation: Much to Be Learned from Coffea
Aiming to provide theoretical support for the functioning of the gene network in Figure 1 with data available in silico, all Coffea spp. and Arabidopsis proteins were aligned and surveyed together in order to compare function/family versus species of origin as agents defining cluster composition. The function/family was the prevailing agent, followed by the species of origin (Figure 2). Proteins from the three Coffea species as well as A. thaliana orthologs in the same function/family clustered together in high-order clusters. Inside these high-order clusters, sub-clusters held C. eugenioides proteins and the proteins encoded in the chromosomes contributed by ancestral C. eugenioides to the primordial C. arabica plants, as well as C. canephora proteins and their paralogs in the chromosomes contributed by ancestral C. canephora to C. arabica. This pattern suggests the possibility of divergence and sub-functionalization according to the species during evolution in tropical environments, without loss of function. Examples include CcFLT1/CaFLT1C and CeFLT1/CaFLT1E (sub-cluster 3b, Figure 2), CcVIN3/CaVIN3C, and CeVIN3/CaVIN3.1E and 2E (cluster 7, Figure 2; Cc and C = C. canephora, Ce and E = C. eugenioides, and Ca indicates C. arabica specimens).
Figure 2. Proteins implicated in A. thaliana flower evocation in response to environmental signals and their orthologs in Coffea spp. For this review, regardless of genus, the isoforms identified online were aligned all together and clustered using the maximum likelihood method with the Jones-Taylor-Thornton model of amino acid changes to produce this circular tree. Very few of those Coffea isoforms have already been assessed and investigated (see the text for details). CO indicates the homologs of CONSTANS proteins regardless of plant species, FLC is for FLOWERING LOCUS C, FLT is for FLOWERING LOCUS T and its diverse structural homologs, SOC is for SUPPRESSOR OF OVEREXPRESSION OF CONSTANS, and VRN/VIN is for VERNALIZATION homologs. Isoforms designated with the same number + letter before the dots are all encoded in the same single locus, resulting from alternative splicing, for instance. Letters C and E as the last character designate paralogs encoded in C. arabica homoeologous chromosomes contributed by ancestral C. canephora (Cc) and C. eugenioides (Ce) to C. arabica (Ca), respectively.
FLC (Figure 1, Figure 2) encodes a flowering repressor that controls vernalization-suppressible late flowering, meaning that flowering under its control will not occur without vernalization [68]. The FLC protein interacts directly with the FT chromatin to repress flowering [69]. In annual A. thaliana, the FLC gene is kept inactive by cold because VIN3 (Figure 1, Figure 2) is induced as long as cold temperatures are perceived, and the VIN3 protein keeps specific lysine residues in histone 3 (H3) at the FLC locus methylated. VIN3 protein multimerization is critical for its function and depends on a somewhat continuous cold period, which explains the stochastic model (expression of two or more genes that collectively explain a high proportion of variation in a single dependent variable) of the progressive cumulative effects of vernalization to induce flower evocation. VRN1 and VRN2 proteins join different “repressive” complexes of DNA-binding factors, which are also involved in maintaining the FLC H3 methylated state. H3 demethylation reinstates the repression of flowering [70].
So, in the annual A. thaliana, vernalization genes act together to extend the histone methylation state at the FLC locus, producing a “memory of the winter” that can persist for approximately 10 cell cycles, and ensures that downstream processes will not be interrupted immediately upon the occurrence of warmer temperatures [71]-[75]. This type of “memory” has never been described for coffee plants and VIN/VRN Coffea orthologs have not been assessed. Nevertheless, despite Coffea being a tropical genus rarely subjected to freezing cold conditions, the vernalization-related proteins assessed in the present review are sufficiently preserved to split into three individual clusters populated by orthologs to A. thaliana VRN1 (cluster 5), VRN2 (cluster 6), and VIN3 (cluster 7), with no admixture or intertwining (Figure 2).
In addition to histone methylation, other mechanisms of FLC inactivation, such as RNA polymerase pausing, noncoding RNA-mediated gene silencing, and transcript destabilization, are under investigation for annual Arabidopsis [76] [77].
In perennial Arabidopsis and other perennial Brassicaceae species, however, FLC paralogs function differently, as reviewed in [13]. Due to discrete polymorphisms in its noncoding regions, FLC expression increases as soon as temperatures increase, ending the flowering season in perennial species. Flower buds form and grow slowly during cold exposure, complying with the physiological competence of the meristems, which is possibly controlled by auxins and related to apical dominance. In axillary meristems where apical dominance is mitigated, FLC can rule as the sole repressor of flower evocation [14]. All these characteristics are present in Coffea: as already mentioned, once the “branch juvenility” determined by node position on the plant orthotropic or plagiotropic axes [11] is overcome, the node becomes competent to respond to flower evocation-related signals. In addition, in Arabidopsis perennials, and also in Coffea, FLC is duplicated and the copies are placed in tandem (Table 1, chromosome 11). It is urgent to verify how much cold and to what extent cold temperatures can repress Coffea FLC variants, whether variant repression is equally stable in different coffee cultivars, and what the roles of conserved VIN isoforms could be, considering that strong stabilization of the histone methylation state at FLC loci would not occur in the absence of freezing cold temperatures.
It is equally important to note that A. thaliana FLC is closely related to FLM genes, and 10 to 12 FLM transcript isoforms are described in the TAIR database [78]. The A. thaliana FLM operates in flower evocation under temperate, cool conditions, where freezing cold, capable of inducing strong vernalization, is rare [79]. FLMβ/AT1G77080.4 and FLMδ/AT1G77080.2 are isoforms resulting from the alternative splicing of FLM pre-mRNA molecules. The FLMβ/AT1G77080.4 isoform is more abundant at 23˚C than at 16˚C, and is the only isoform able to interact with SHORT VEGETATIVE PHASE (SVP) in the nucleus to delay the temperature-dependent flowering of A. thaliana [79] [80]. A Cryptochrome 2, which is sensitive to blue light and favors the transcription of the FLMβ isoform, was reported for A. thaliana [81] but has never been investigated in Coffea. In areas under coffee cultivation in south to southeastern Brazil and in other coffee-producing tropical countries around the world, daily temperatures of 16˚C and 23˚C are common. The frequency of days, or hours, at one or the other of these temperatures could be key to flower evocation in different cultivars.
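The frequency of hours near each of the two temperatures could be summarized from hourly field records along the following lines. This is a minimal sketch under stated assumptions: the function name, the tolerance of 1 ˚C around each reference temperature, and the sample readings are all hypothetical and are not taken from any cited dataset.

```python
from typing import Sequence, Tuple


def temperature_hour_fractions(
    hourly_temps_c: Sequence[float],
    cool_c: float = 16.0,
    warm_c: float = 23.0,
    tolerance_c: float = 1.0,  # assumed window around each reference value
) -> Tuple[float, float]:
    """Fraction of hourly readings near the cool and the warm reference."""
    if not hourly_temps_c:
        raise ValueError("at least one temperature reading is required")
    n = len(hourly_temps_c)
    near_cool = sum(1 for t in hourly_temps_c if abs(t - cool_c) <= tolerance_c)
    near_warm = sum(1 for t in hourly_temps_c if abs(t - warm_c) <= tolerance_c)
    return near_cool / n, near_warm / n


if __name__ == "__main__":
    # Hypothetical hourly readings in degrees Celsius.
    readings = [15.5, 16.0, 18.0, 22.5, 23.0, 24.5, 20.0, 16.9]
    cool_fraction, warm_fraction = temperature_hour_fractions(readings)
    print(cool_fraction, warm_fraction)
```

Comparing such fractions across sites or seasons would only be a first descriptive step; relating them to FLM isoform abundance would require expression data.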
CaFLC3E, CcFLC3, and CeFLC/M3—one paralog protein from each Coffea species assessed—clustered together (sub-cluster 2c, Figure 2), and closer to the six A. thaliana FLC isoforms included in the analysis (AtFLC1.1 to 1.6, sub-cluster 2a1, Figure 2) than any other Coffea FLC. Among those three, some were previously identified by automated annotation as FLM/AGL27 orthologs to A. thaliana proteins encoded in the AT1G77080 locus, instead of orthologs to FLC (AT5G10140). The analyses accomplished for this review suggested that the three proteins in this sub-cluster 2c (Figure 2) could be the representatives of a Coffea FLM instance, possibly more effective than FLC proteins to induce flower evocation under non-freezing cool temperatures.
CO genes are responsive to photoperiod, and CO proteins interact directly with FT [82] [83]. CO proteins are stabilized late in the long-day afternoon, which is the critical state that triggers annual A. thaliana flowering [84] [85]. In a recent report [86], CO was proposed to be essential for FT expression in long- and short-day plants, but particularly important for long-day plants. This could be related to the red/far red ratio in daylight, as perceived by Phytochrome A to stabilize CO, preventing degradation and allowing docking to the FT promoter to produce a bimodal expression pattern, with a pronounced peak in the afternoon. The bimodal pattern would be significantly reduced during short days, which could be interpreted as a decrease in the importance of day-length signaling, suggesting the possible need for additional flower evocation signals to trigger flowering under short days. Accordingly, in the clustering analysis accomplished for the present review, CO homologous proteins (sub-clusters 1a and 1b, Figure 2) displayed the largest distance between sequences from the two plant genera accessed in silico: annual A. thaliana, which is a long-day plant, and Coffea spp., which are here interpreted as short-day plants. One isoform from each Coffea species clustered closer to A. thaliana CO-like 2 protein (AtCOL2, sub-cluster 1a; Figure 2) than to AtCO. AtCO was set aside as an isolated tip (sub-cluster 1b, Figure 2) in the same high-order cluster 1. The differences between AtCO and AtCOL2, and the reasons why Coffea CO homologs clustered with one rather than the other, remain to be investigated.
In turn, transcription of SOC (cluster 4, Figure 2) would be enhanced directly by FT proteins, inhibiting CO ectopic expression through negative feedback and maintaining the conditions necessary for annual A. thaliana flower evocation under long days [87]. The role of SOC in Coffea spp. has also been neglected.
Finally, the FT protein, the florigen (FLT in cluster 3 and sub-clusters 3a-b, Figure 2), is neither a sensor of the environmental signals that trigger flower evocation nor a meristem identity determinant. It is considered an integrator of flowering pathways induced by temperature and by photoperiod [44], and is recurrently mentioned to explain the natural variation in flowering time observed for A. thaliana ecotypes adapted to diverging environmental conditions [37] [63] [64]. It must be present in the meristems [88] to trigger the transition to a reproductive state. In the presence of FT protein, meristem identity genes are expressed, leading to the loss of meristematic characteristics and to the determination of flower part identities. However, FT cannot stay active indefinitely if development is to proceed normally [89].
The A. thaliana genome contains five genes homologous to FT, namely TERMINAL FLOWER 1 (TFL1), TWIN SISTER OF FT (TSF), MOTHER OF FT AND TFL1, BROTHER OF FT AND TFL1, and ARABIDOPSIS THALIANA CENTRORADIALIS HOMOLOGUE [65] [90]. The TSF signal is less mobile than the FT signal, and its silencing exerts more pronounced effects under short days [65]. In turn, the TFL1 protein has repressor functions that are opposite to those of the FT protein, although they are paralogs. Functional conversions like this would be natural [91]. A. thaliana FT isoforms AtFLT1.1, AtFLT1.2, AtFLT1.3 (sub-clusters 3a1-2, and 3b, Figure 2), and the AtTSF protein produced four scattered isolated tips in the same high-order cluster (cluster 3, Figure 2). Coffea FT paralogs, regardless of species, clustered closer to the AtTSF protein (sub-cluster 3b, Figure 2), which, as previously mentioned, would be more effective than AtFT at triggering meristem awareness under short days.
Taken together (Figure 1), vernalization/temperature decrease is perceived through FLC/FLM gene silencing, possibly by the interaction with VIN/VRN proteins, which impairs FLC/FLM protein accumulation and docking onto the FT gene promoter. This releases FT expression, and the FT protein interacts with downstream meristem identity-determination genes, concluding the transition to the reproductive state. In turn, day-length is perceived through changes in CO protein accumulation and docking onto specific cis-elements [92]-[94] in the regulatory region of A. thaliana FT genes [95]. A. thaliana [96], C. canephora proteins [97], and paralogs in the C. eugenioides and C. arabica genomes were identified in the Genome Bank [98], aligned [99] and clustered [100] to demonstrate that the network in Figure 1 could operate in all these species. Therefore, considering the absence of freezing temperatures and long days with which to induce coffee flower evocation under tropical conditions, maximal expression of Coffea FTs could hypothetically depend on the delicate and finely coordinated docking of mild amounts of opposing transcription factors onto their promoters, or on a peculiar representation of different docking site classes (cis-elements) in their promoters (see below). Complete silencing of the FLC/FLM repressors and/or complete undocking of the FLC/FLM proteins could make a few docked CO activators enough to produce FT, which would trigger the expression of meristem identity-determination genes on branch nodes already released from apical dominance.
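The logic of the hypothesis summarized above can be written down as a deliberately simplified Boolean sketch. It is not a model taken from the cited literature: the class, field, and function names are hypothetical, each component is reduced to on/off, and quantitative aspects such as partial silencing or the number of docked CO molecules are ignored.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class NodeSignals:
    """On/off inputs for one branch node (hypothetical simplification)."""
    cooling_perceived: bool   # decreasing temperatures, acting via VIN/VRN
    co_docked: bool           # CO accumulated and docked on the FT promoter
    apical_dominance: bool    # node still inhibited by surrounding tissues


def evocation_state(signals: NodeSignals) -> Dict[str, bool]:
    """Evaluate the simplified FLC/FLM, FT, and commitment states."""
    # Cooling silences the FLC/FLM repressors.
    flc_active = not signals.cooling_perceived
    # FT is expressed only when CO is docked and FLC/FLM is not repressing it.
    ft_expressed = signals.co_docked and not flc_active
    # Commitment also requires release from apical dominance.
    committed = ft_expressed and not signals.apical_dominance
    return {
        "FLC_active": flc_active,
        "FT_expressed": ft_expressed,
        "committed_to_flower": committed,
    }


if __name__ == "__main__":
    autumn_node = NodeSignals(
        cooling_perceived=True, co_docked=True, apical_dominance=False
    )
    print(evocation_state(autumn_node))
```

A graded version, with repressor and activator amounts instead of Booleans, would be closer to the "mild amounts of opposing transcription factors" described in the text, but would require parameters that are not available for Coffea.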
4. Gene Placement on Coffea Chromosomes
Interestingly, some loci, such as FLC, FT, and VRN2, were prolific regarding protein isoforms in any species, whereas the number of isoforms of CO and SOC was low (Table 1). Because SOC proteins interact directly with and control CO expression, it is plausible that non-concerted, independent diversification of these genes has been constrained during evolution. In general, paralog isoforms in the different Coffea genomes are encoded in homologous and/or homoeologous chromosomes, with three exceptions: proteins CeCO, CeVRN2, and CcFLT3 (Table 1). As the CcFLT3 gene remains unplaced in the most recently available C. canephora chromosome assembly, it may yet be assigned to chromosome 9.
Complexity increased from C. canephora to C. eugenioides, and then to C. arabica, which had a total of 14, 15, and 22 paralogs, respectively (Table 1), corresponding to the seven A. thaliana genes in Figure 1. This is consistent with the origin of the genomes: the C. canephora genome is from a doubled haploid plant, which is likely highly homozygous, and C. arabica plants are natural allotetraploids generated by interspecific hybridization between ancestral C. canephora and C. eugenioides millions of years ago [101]. Tandem duplications and allopolyploidization add to point mutations, indels, and alternative splicing to produce, for example, 19 C. arabica var. Caturra VRN2 protein isoforms, encoded by the single VRN2 gene located on chromosome 2c, which was contributed to C. arabica by its ancestor C. canephora.
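The totals of 14, 15, and 22 can be reproduced by tallying the chromosome entries of Table 1, as sketched below. The tally assumes that the totals count distinct chromosome/locus entries per species rather than individual protein isoforms, and that the CHR 08 (a) and CHR 08 (b) entries of C. eugenioides are separate loci; both points are interpretations of the table, and the dictionary layout is only illustrative.

```python
from typing import Dict, List

# Chromosome/locus labels transcribed from Table 1, grouped by gene family.
LOCI: Dict[str, Dict[str, List[str]]] = {
    "C. canephora": {
        "CO": ["CHR 07"],
        "FLC": ["CHR 11 (a)", "CHR 11 (b)", "CHR 11 (c)", "CHR 11 (d)"],
        "FLT": ["CHR 10", "CHR 08", "CHR 00"],
        "SOC": ["CHR 02", "CHR 08"],
        "VRN1": ["CHR 07"],
        "VRN2": ["CHR 02"],
        "VIN3": ["CHR 06", "CHR 10"],
    },
    "C. eugenioides": {
        "CO": ["CHR 11"],
        "FLC": ["CHR 11 (a)", "CHR 11 (b)", "CHR 11 (c)", "CHR 11 (d)"],
        "FLT": ["CHR 10", "CHR 08 (a)", "CHR 08 (b)", "CHR 09"],
        "SOC": ["CHR 02", "CHR 08"],
        "VRN1": ["CHR 07"],
        "VRN2": ["CHR 01"],
        "VIN3": ["CHR 06", "CHR 10"],
    },
    "C. arabica": {
        "CO": ["CHR 07E"],
        "FLC": ["CHR 11C (a)", "CHR 11C (b)", "CHR 11E (a)",
                "CHR 11C (c)", "CHR 11E (b)"],
        "FLT": ["CHR 10C", "CHR 10E", "CHR 08C", "CHR 08E", "CHR 09C"],
        "SOC": ["CHR 02C", "CHR 02E", "CHR 08C", "CHR 08E"],
        "VRN1": ["CHR 07C", "CHR 07E"],
        "VRN2": ["CHR 02C"],
        "VIN3": ["CHR 06C", "CHR 06E", "CHR 10C", "CHR 10E"],
    },
}


def loci_per_species(loci: Dict[str, Dict[str, List[str]]]) -> Dict[str, int]:
    """Sum the number of locus entries over all gene families per species."""
    return {
        species: sum(len(entries) for entries in families.values())
        for species, families in loci.items()
    }


if __name__ == "__main__":
    for species, total in loci_per_species(LOCI).items():
        print(species, total)
```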
Coffea isoforms FLC1, 2, 3 and 4, designated as such in this review, are encoded by different genes/loci (Table 1). Likewise, according to their structural particularities, FLC isoforms split into four divergent sub-clusters (sub-clusters 2a-d) in the same high-order FLC cluster 2 (Figure 2). For instance, C. canephora FLC3 (CcFLC3) is placed on chromosome 11 at the coordinates 33.146.554...33.132.049, and the locus is here designated as CHR11 (c) (Table 1). The placement of paralog coding sequences in C. canephora and C. eugenioides corresponds: the CeFLC3 locus is the third one coding for FLC isoforms on chromosome 11 of C. eugenioides. Surprisingly, in C. arabica var. Caturra, a single CaFLC3 paralog was identified on chromosome 11e, contributed by C. eugenioides, while no paralog was identified on chromosome 11c, contributed by C. canephora.
Similarly incomplete sets were observed for the CO, FLC1 and FLC2, FLT3, and VRN2 loci: based on the genome assembly currently available, A. thaliana orthologs were not present in the C. arabica chromosomes contributed by both ancestors. These absences most probably result from particularities of the Caturra genome.
Table 1. Genes implicated in A. thaliana flower evocation in response to environmental signals and their orthologs in Coffea spp. A. thaliana genes are identified by their symbols and locus designations. Annotation refers to the description of the best Coffea spp. orthologs of A. thaliana genes. The Coffea spp. proteins assessed are identified by the designations used for clustering, and the chromosomes (CHR) where they are encoded are indicated. Letters inside parentheses following chromosome numbers designate Coffea paralogous loci replicated in tandem.
| A. thaliana | Annotation | C. canephora | C. eugenioides | C. arabica |
|---|---|---|---|---|
| CO (AT5G15840): AtCO, AtCOL2, AtCO un | Protein CONSTANS like | CHR 07: CcCO | CHR 11: CeCO | CHR 07E: CaCOE |
| FLC (AT5G10140): AtFLC1.1, AtFLC1.2, AtFLC1.3, AtFLC1.4, AtFLC1.5 | AGAMOUS like MADS box protein AGL27 FLM | CHR 11 (a): CcFLC1 | CHR 11 (a): CeFLC1 | CHR 11C (a): CaFLC1C |
| | | CHR 11 (b): CcFLC2 | CHR 11 (b): CeFLC2.1, CeFLC2.2, CeFLC2.3 | CHR 11C (b): CaFLC2C.1, CaFLC2C.2, CaFLC2C.3 |
| | | CHR 11 (c): CcFLC3 | CHR 11 (c): CeFLC/M3 | CHR 11E (a): CaFLC3E |
| | | CHR 11 (d): CcFLC4 | CHR 11 (d): CeFLC4.1, CeFLC4.2 | CHR 11C (c): CaFLC4.1C, CaFLC4.2C, CaFLC4.3C; CHR 11E (b): CaFLC4.1E, CaFLC4.2E, CaFLC4.3E, CaFLC4.4E |
| FLT, TSF (AT1G65480, AT4G20370): AtFLT1.1, AtFLT1.2, AtFLT1.3, AtTSF | Protein heading date 3A (FLT1) | CHR 10: CcFLT1 | CHR 10: CeFLT1 | CHR 10C: CaFLT1C; CHR 10E: CaFLT1E |
| | Protein heading date 3A (FLT2) | CHR 08: CcFLT2 | CHR 08 (a), CHR 08 (b): CeFLT2.1, CeFLT2.2, CeFLT2.3 | CHR 08C: CaFLT2C; CHR 08E: CaFLT2E |
| | Protein heading date 3A (FLT3) | CHR 00: CcFLT3 | CHR 09: CeFLT3.1, CeFLT3.2, CeFLT3.3 | CHR 09C: CaFLT3C |
| SOC1 (AT2G45660): AtSOC1 | MADS box SOC1 | CHR 02: CcSOC1 | CHR 02: CeSOC1.1, CeSOC1.2 | CHR 02C: CaSOC1C; CHR 02E: CaSOC1E |
| | AGL19-like | CHR 08: CcSOC2 | CHR 08: CeSOC2 | CHR 08C: CaSOC2C; CHR 08E: CaSOC2E |
| VRN1 (AT3G18990): AtVRN1 | VRN1 | CHR 07: CcVRN1 | CHR 07: CeVRN1 | CHR 07C: CaVRN1C; CHR 07E: CaVRN1E |
| VRN2 (AT4G16845): AtVRN2.1, AtVRN2.2, AtVRN2.3, AtVRN2.4, AtVRN2.5 | Polycomb embryogenic flower2 | CHR 02: CcVRN2 | CHR 01: CeVRN2.1, CeVRN2.2, CeVRN2.3, CeVRN2.4, CeVRN2.5, CeVRN2.6, CeVRN2.7 | CHR 02C: CaVRN2A.1C, CaVRN2A.3C, CaVRN2A.4C, CaVRN2A.5C, CaVRN2A.6C, CaVRN2A.7C, CaVRN2A.8C, CaVRN2A.9C, CaVRN2A.10C, CaVRN2A.12C, CaVRN2B.1C, CaVRN2B.2C, CaVRN2B.3C, CaVRN2B.4C, CaVRN2B.5C, CaVRN2B.6C, CaVRN2B.7C, CaVRN2B.8C, CaVRN2B.9C |
| VIN3 (AT5G57380): AtVIN3 | VIN3.1 | CHR 06: CcVIN3.1 | CHR 06: CeVIN3.1 | CHR 06C: CaVIN3.1C; CHR 06E: CaVIN3.1E |
| | VIN3.2 | CHR 10: CcVIN3.2 | CHR 10: CeVIN3.2 | CHR 10C: CaVIN3.2C, CaVIN3.3C; CHR 10E: CaVIN3.2E, CaVIN3.3E |
5. Conserved Domains and Regulatory cis-els: More Learning in View
Clustering (Figure 2) agreed with the results produced by BLASTX screening (Table 1) of Genome databases, both supporting the selection of Coffea orthologs to A. thaliana flower evocation-related proteins as the potential agents of coffee flower evocation in response to environmental signals (Figure 1) and as the starting point for this review. Additionally, regardless of genus, the peptides in the same family/cluster displayed the same functional conserved domains [98] in Arabidopsis and Coffea. Protein designations according to Table 1 and Figure 2, and a graphical view of the respective conserved domains identified by the CD routine (CDD search) [98], are available as Supplementary Files 1a and 1b. Proteins lacking conserved domains were observed; however, Coffea proteins from one family/cluster displaying conserved domains characteristic of another family were not observed. The FLC and SOC protein isoforms displayed the same conserved domains in all four assessed plant species/two genera. Refer to Supplementary File 1b, items 4 - 9, 14, 22 - 23, 31 - 34, 36 - 39, 65 - 69, 72, and 80 - 82 for the graphical views. Clusters populated with proteins from those two families were also intertwined (sub-clusters 2a1-2, 2c, and cluster 4; Figure 2).
The last set of analyses, which focused on sequence homology and functional similarity between A. thaliana and Coffea, included the identification [102], counting, and contrasting (Sigma Plot, v. 11.2) of cis-elements (cis-els) in the regulatory regions of the genes coding for all 112 proteins (Table 1) used for clustering. Regulatory regions were examined up to 1000 base pairs upstream of the translation initiation codons. The identified cis-els were divided into sets/classes of metabolic processes/types of responses, designated as DEV, FLW NTW, LIGHT, PGR, and TEMP. The frequencies of different cis-els in the regulatory regions of genes in the seven families from Figure 1 and Table 1, and a list of the cis-els included in each of the DEV, FLW NTW, LIGHT, PGR, and TEMP sets, are available as Supplementary Files 2a and 2b, respectively. The DEV (development) cis-els are involved with transcription factors producing secondary branches/sprouts, maintaining circadian rhythms [103], and organizing the cell cycle via chromatin and histone modification. These processes are closely related and essential for flowering [72] [104], but also for primary physiological processes such as photosynthesis and shoot/root elongation.
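The counting step described above can be illustrated with a minimal Python sketch. It is not the routine used for this review [102]; the function names, the motif labels, and the toy sequence are hypothetical. The two CArG-type consensus strings (CC(A/T)6GG and C(A/T)8G) and the ACGGAT motif are used only as examples of IUPAC-coded motifs, and the sketch assumes that each upstream sequence is written 5' to 3' and ends immediately before the translation initiation codon.

```python
import re

# IUPAC nucleotide codes translated to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def motif_regex(motif):
    """Compile an IUPAC motif into a lookahead regex so overlapping hits are found."""
    return re.compile("(?=" + "".join(IUPAC[base] for base in motif) + ")")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_cis_elements(upstream, motifs, window=1000):
    """Count motif sites on both strands within `window` bp of the start codon.

    Assumption: `upstream` is written 5'->3' and ends immediately before the
    translation initiation codon, so the last `window` bases are the region
    of interest.
    """
    region = upstream.upper()[-window:]
    rc = reverse_complement(region)
    counts = {}
    for name, motif in motifs.items():
        pattern = motif_regex(motif)
        forward = {m.start() for m in pattern.finditer(region)}
        # Map reverse-strand hits back to forward coordinates so that a
        # palindromic site is counted only once.
        reverse = {
            len(region) - m.start() - len(motif) for m in pattern.finditer(rc)
        }
        counts[name] = len(forward | reverse)
    return counts


# Hypothetical motif set and toy sequence, for illustration only.
MOTIFS = {
    "CArG_CC_W6_GG": "CCWWWWWWGG",
    "CArG_C_W8_G": "CWWWWWWWWG",
    "ACGGAT": "ACGGAT",
}
toy_upstream = "ACGGAT" + "TTTT" + "ATCCGT" + "CCATATATGG"
print(count_cis_elements(toy_upstream, MOTIFS))
```

Scanning both strands matters because a cis-el can sit in either orientation; merging the hit positions keeps palindromic sites from inflating the counts.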
The FLW NTW (flower networking) set comprised cis-els that serve as docking sites for flower evocation-related proteins, improving the interaction necessary to coordinate the perception of, and the responses to, the multiple environmental signals reaching a plant every minute. The most frequent cis-els in the FLW NTW class were CARGATCONSENSUS and CARGCW8GAT, identified in the gene promoters of all families from Table 1. These cis-els are the docking points for the FLC and the AGL15 proteins/transcription factors, respectively. AGL15 is another MADS-box transcription factor involved in embryogenesis and flowering [105].
The PGR (plant growth regulators) set was populated with cis-els recruiting plant growth regulators, particularly gibberellins and abscisic acid, which are antagonists in diverse physiological processes, including flowering. PGRs are involved in responses of all kinds, triggered by most environmental/abiotic signals and stresses, and also in the autonomous flowering pathway.
The LIGHT class included elements important for keeping flowering in concert with light-dependent mechanisms, such as day-length and the expression of chlorophyll a/b-binding genes that participate in photosynthesis. Cis-els for the docking of temperature-driven transcription factors, such as vernalization-related factors, were included in the TEMP class. These were the most recurrent in the regulatory regions of the seven gene families reviewed here (Table 1), followed by those meant to interact with light-driven transcription factors. For further verification, please refer to the LIGHT and TEMP data in the graphs provided in Supplementary File 2a. This result is another indication that temperature is crucial for inducing, and probably interrupting, Coffea flowering at the right times. Vernalization can supplant the photoperiod in sensitive plants subjected to both stimuli simultaneously [106], although day-length is important for distinguishing similarly cool seasons with different light regimes [95], such as early spring and late autumn, in the absence of low temperatures.
FLC/FLM Coffea orthologs displayed regulatory cis-els for the docking of light- and temperature-related transcription factors at similar frequencies (Fig. S2A.2 in Supplementary File 2a), indicating that both environmental signals could influence their expression. The FT genes followed the same trend and additionally displayed the highest frequency of cis-els in the FLW NTW class (Fig. S2A.3 in Supplementary File 2a). This abundance and diversity of cis-els is in accordance with the flowering signal integrator function attributed to the FT protein [63] [64].
Contrary to expectations, ACGGAT cis-els were absent from Coffea FT gene regulatory regions but present in the regulatory regions of CO paralogs from all three Coffea species assessed, and also in those of one C. canephora SOC paralog. ACGGAT cis-els can recruit gibberellins or CO proteins [92] [93] and contribute to determining early- or late-flowering Arabidopsis phenotypes [94]. Regarding the absence of these elements from the 1000-bp FT regulatory regions screened for this review, it should be noted that ACGGAT can be found far upstream of the translation initiation codon in A. thaliana, and this could also be the case for Coffea FTs. However, what of its presence in Coffea CO ortholog promoters? Could the Coffea CO protein outcompete gibberellin docking on the promoters of its own coding genes, in a negative feedback mechanism that inhibits flowering under high temperatures and long days?
A manual search for TGTG/CACA repeats, which are also implicated in the direct docking of CO proteins on FT promoters, produced modest results. A canonical distribution reproducing the arrangement reported for A. thaliana [82] was not found in Coffea, although combinations of TGTG/CACA-like motifs were identified side by side in positions -300 to -100 for at least one FT gene per Coffea species.
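The search reported here was manual. As a hedged illustration only, the same kind of positional scan could be sketched in Python as follows. The function name, the coordinate convention (the last base of the upstream sequence is position -1 relative to the translation initiation codon), and the toy sequence are assumptions made for this example.

```python
import re


def find_co_like_repeats(upstream, start=-300, end=-100):
    """Return (position, motif) pairs for TGTG/CACA hits within [start, end].

    Positions are relative to the translation initiation codon; the last
    base of `upstream` is taken as -1 (an assumed convention).
    """
    seq = upstream.upper()
    n = len(seq)
    hits = []
    # The lookahead lets overlapping repeats such as TGTGTG yield two hits.
    for m in re.finditer(r"(?=(TGTG|CACA))", seq):
        position = m.start() - n
        if start <= position <= end:
            hits.append((position, m.group(1)))
    return hits


# Toy 1000-bp sequence with one TGTG at -300 and one CACA at -200.
toy = "A" * 700 + "TGTG" + "A" * 96 + "CACA" + "A" * 196
print(find_co_like_repeats(toy))
```

Listing the hits with their coordinates makes it easier to judge whether neighboring motifs form a side-by-side arrangement or are merely scattered across the window.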
6. Phenotypic Variability Observed in the Field
To bring readers closer to the everyday aspects of flowering in the field, the analyses described in the following paragraphs record the diversity of flower bud emission in C. arabica genotypes designated as early, intermediary, and late by coffee breeders on the basis of the time taken to complete the reproductive cycle, from blossoming to ripe fruits. As mentioned above, flower evocation is not visible to the naked eye and is rarely considered part of the reproductive cycle. Nevertheless, in addition to diverging in flower bud emission timing and fruit maturation, the three plant types are expected to diverge in flower evocation, driven by differences in the expression patterns of the genes included in Figure 1, Figure 2 and Table 1 in response to environmental signals, specifically temperature and day-length.
Two to four plants per type were evaluated in two or three random blocks. A minimum of 7 and a maximum of 12 experimental units per genotype were examined each week, from late August 2022 to January 2023. Despite being short, this period is representative of the day-length variation at this latitude (Instituto de Desenvolvimento Rural do Parana, Londrina Experimental Station, Brazil; coordinates 23˚35'S and 51˚16'W). Sunlight was available for 11.5 to 13.5 hours per day during the period, a two-hour difference, whereas the maximum annual difference is 3 h and 10 min between the shortest (June, winter solstice) and the longest (December, summer solstice) days. Regarding temperatures, it was a meteorologically rich winter-spring-early summer transition period, atypically rainy and cold, with short spells of warmer temperatures. Earlier in the year, uncommon cold spells in May 2022 (autumn) brought the daily minimum down to 6˚C, which was 4˚C lower than the minima observed during July 2022 (winter). The three plant types responded differently.
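For readers who wish to estimate the day-length range at other sites, the standard geometric sunrise equation gives a rough approximation. The sketch below is not the source of the figures quoted above: it ignores atmospheric refraction and the solar disc radius, and the function name and the solstice declination of 23.44 degrees are assumptions introduced for this example.

```python
import math


def day_length_hours(latitude_deg, declination_deg):
    """Geometric day length (hours) from the sunrise equation.

    Simplification: atmospheric refraction and the solar disc radius are
    ignored, so results can differ from almanac values by several minutes.
    """
    phi = math.radians(latitude_deg)
    delta = math.radians(declination_deg)
    cos_hour_angle = -math.tan(phi) * math.tan(delta)
    # Clamp to [-1, 1] so polar day and polar night do not raise an error.
    cos_hour_angle = max(-1.0, min(1.0, cos_hour_angle))
    return 24.0 / math.pi * math.acos(cos_hour_angle)


LATITUDE = -(23 + 35 / 60)  # 23 deg 35 min S, as given in the text
OBLIQUITY = 23.44           # assumed solar declination at the solstices, degrees

june = day_length_hours(LATITUDE, +OBLIQUITY)      # southern winter solstice
december = day_length_hours(LATITUDE, -OBLIQUITY)  # southern summer solstice
print(f"June solstice: {june:.2f} h; December solstice: {december:.2f} h")
```

Because the formula is purely geometric, it should be read as an approximation and is not expected to reproduce locally reported values to the minute.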
The experimental units in the field were the uppermost sections of coffee orthotropic branches (main branches or trunks). These branches grow continuously during the year, albeit slowly or very slowly in the winter, producing new nodes and internodes and two plagiotropic branches per node, but no additional orthotropic branch unless the first one is injured. Regardless of the month, phenological grades were defined according to the developmental state of the reproductive structures observed on the plagiotropic branch nodes growing on similar uppermost sections for all plants. The experimental units received a single grade when a single phenological stage clearly prevailed, or a few different grades when different phenological stages were observed simultaneously on a similar number of nodes. Grade 0 was for nodes with no morphological sign of induction to produce vegetative or reproductive structures; grades 1.3 and 1.6 were used for flower buds following emission, which can be observed in the field by the naked eye, meaning that the transition to the reproductive state had already occurred (Fig. S3.1, Supplementary File 3); grades 2, 3, and 4 were for flower buds about to open, open flowers (blossoms), and senescent flowers, respectively; grade 5 was for incipient fruits/ovaries turning into fruits; and grade 6, which was the maximal grade observed in the field for the period reported here, was used for green fruits in any developmental state after grade 5 and before full seed endosperm hardening (images are available in Fig. S3.2, Supplementary File 3). These phenological grades are represented by different colors in Figure 3(A). At least one entry per experimental unit per week was recorded (exactly one if the unit was uniform regarding phenological grades), resulting in roughly 180 phenological observations per genotype at the end of five months.
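The grading scheme can be represented as a small lookup table plus a weekly tally. The Python sketch below is illustrative only: the unit identifiers and the example entries are invented, and the function name is hypothetical. The grade descriptions paraphrase the definitions given above.

```python
from collections import Counter

# Phenological grades as defined in the text (descriptions paraphrased).
GRADES = {
    0: "no morphological sign of induction",
    1.3: "flower bud after emission, visible to the naked eye",
    1.6: "flower bud after emission, visible to the naked eye",
    2: "flower bud about to open",
    3: "open flower (blossom)",
    4: "senescent flower",
    5: "incipient fruit / ovary turning into fruit",
    6: "green fruit before full endosperm hardening",
}


def weekly_grade_frequencies(entries):
    """Return the relative frequency of each grade recorded in one week.

    `entries` is a list of (unit_id, grades) pairs; a unit carries one grade
    when a single stage prevails, or several when stages co-occur.
    """
    counts = Counter()
    for unit_id, grades in entries:
        for grade in grades:
            if grade not in GRADES:
                raise ValueError(f"unknown grade {grade!r} for unit {unit_id}")
            counts[grade] += 1
    total = sum(counts.values())
    return {grade: counts[grade] / total for grade in sorted(counts)}


# Invented example week: three experimental units, one with two co-occurring stages.
week = [("unit-1", [0]), ("unit-2", [1.3, 2]), ("unit-3", [0])]
print(weekly_grade_frequencies(week))
```

Keeping every recorded grade, rather than a single summary value per unit, preserves the asynchrony that the grading scheme was designed to capture.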
The interactions among these phenological observations, the collection date (around 20 dates of phenological grade determination in the field), and the minimum daily temperature on the same date (another 20 points) were analyzed and displayed as three-dimensional graphs (Sigma Plot, v. 11.2). The minimum and maximum absolute daily temperatures for the same period, obtained from an automated meteorological station installed at the Institute, are shown in Figure 3(B).
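Before plotting, each phenological observation has to be paired with the minimum temperature recorded on its collection date. The graphs in this review were produced with Sigma Plot; the Python sketch below only illustrates that pairing step, under the assumption that observations and temperatures are keyed by calendar date. The function name and all values are invented for the example.

```python
from datetime import date


def build_xyz_points(observations, tmin_by_date):
    """Pair each (date, grade) observation with that date's minimum temperature.

    Returns a date-sorted list of (date, tmin, grade) triples, i.e. the
    x, y, and z coordinates of a three-dimensional graph. Observations
    without a temperature record are skipped.
    """
    points = []
    for obs_date, grade in observations:
        if obs_date in tmin_by_date:
            points.append((obs_date, tmin_by_date[obs_date], grade))
    return sorted(points)


# Invented values, for illustration only.
observations = [
    (date(2022, 9, 8), 2),
    (date(2022, 9, 1), 0),
    (date(2022, 9, 1), 1.3),
    (date(2022, 9, 15), 3),  # no temperature record in this toy data set
]
tmin_by_date = {date(2022, 9, 1): 12.0, date(2022, 9, 8): 15.5}

for point in build_xyz_points(observations, tmin_by_date):
    print(point)
```

The resulting triples can be passed to any surface or contour plotting tool.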
In the uppermost section of the early plants (Figure 3(A), EARLY), flower induction and the development of reproductive structures (grade progression) tended to occur ‘vertically’, indicating that the influence of temperature (y axis) was strong and responses were rapid. Nodes with grade 0 in September/October (red area in the lower left corner of the graph) were induced by low temperatures in August (Figure 3(B)) and appeared in the first weeks of November as flower buds and small young fruits (dark green area in the center). In the intermediary plants (Figure 3(A), INTERMEDIARY), flower bud emission and reproductive structure development followed a ‘transversal’ trend, meaning that both light and temperature could be strong determinants of reproduction. Three fruit-producing flowering events are represented in the graph. The transitions from grades 1 - 2 to grades 5 - 6 are represented as two sequences of yellow/light-green followed by dark-green spots along the diagonal of the graph, from the bottom left to the upper right. The phenological evolution of the late cultivar (Figure 3(A), LATE) followed a ‘horizontal’ trend, meaning that signal perception, transduction, or the triggering of responses to environmental signals possibly took longer in the uppermost section of these plants. Late plants displayed grade 2 nodes by November, and a few nodes evolving from grade 4 to 5 in December 2022.
Flower evocation conditions, including the uncommon cold spells registered in May 2022, were perceived by all the plants. However, the cultivars responded at different speeds and intensities. Under the low temperatures in May 2022, the early cultivar was induced to produce the major fruit cohort of the year (the big dark blue area in the right half of the base of the graph). The vertical dimension at the extreme right of the graph is half red and half deep blue. By late October 2022, flower bud emission had been concluded in the uppermost sections of these early plants, and no additional transition from grade 0 to 1, which would necessarily include flower evocation, was observed. Under the influence of the same cold spells in May, the intermediary cultivar produced a cohort (dark blue spot at the base of the graph) of similar magnitude to the two that came after (dark spots in the center and upper-right corner). The reaction of the late cultivar was of low intensity, and subsequent flower bud emission events were not observed. Could the commitment to flowering have been incomplete? A small number of nodes in the uppermost plant sections underwent flower evocation in May, mostly attaining grade 6, while temperatures were around 10˚C.
When warmer temperatures became recurrent, the frequency of undifferentiated meristems/meristematic buds increased (red spots in the upper-right corner of the graphs in Figure 3). These shared the branches with developing fruits and with buds producing vegetative organs, which would once more dominate the uppermost section of the plants in the following months, starting with the early genotype, announcing the end of the reproductive cycle.
Figure 3. Three different C. arabica phenological patterns observed in the same experimental station. (A) Three-dimensional representation of the interactions between minimum daily temperatures × time (September 2022 to January 2023) × phenological grades for the uppermost sections of orthotropic (main) branches, and the plagiotropic branches produced at these sections, of early, intermediary, and late Coffea arabica plants. The graphs demonstrate the existence of variability in flower bud emission (transitions from grade 0 to grade 1.3), which would be a consequence of different reactions to the same environmental signals, specifically temperature and day-length, inducing flower evocation (somewhere between grades 0 and 1). The color scale on the right indicates grades corresponding to branch node phenological states, ranging from 0 (non-induced meristematic buds) to 6 (immature expanding fruits). (B) Daily minimum and maximum temperatures at the IDR-Paraná Research Station at Londrina-PR, Brazil (23˚35'S and 51˚16'W) by date (dd/mm/yyyy).
7. Prognostics and Expectations
This review demonstrated the potential of studying the spatial and temporal expression of the genes shown in Figure 1, Figure 2 and Table 1 to reduce the scarcity of information about flower evocation triggered by environmental signals in Coffea spp. Phenotypic variability is present and can be captured (Figure 3). Molecular data should now be assessed to verify how well they fit and explain this variability.
Basic questions require answers. What are the main environmental signals and how do they trigger coffee flower evocation? Do early cultivars need fewer cold hours or fewer short days to respond with the meristem transition from the vegetative to the reproductive state? Are intermediary plants more responsive to mild variations in environmental signals, are they more sensitive, or both? Would late cultivars require more intense cold or more continuous periods of cold hours to respond with abundant flower bud emission? Will vernalization-related (VRN/VIN) genes repeat the expression patterns reported for A. thaliana?
It is possible to search for answers by assessing different gene sets, such as that described in [107]. Orthologs of TFL, which, despite being paralogous to FT, would induce antagonistic effects on flowering [90] [108] [109], would also be a good choice. Nevertheless, Figure 1 is a network of genes and their relationships. Temperature and day-length signals were considered possible triggers of Coffea flower evocation implicated in phenotypic variability, which can resemble that of perennial Arabidopsis species [11] [39] [110], and are not yet sufficiently understood. The Coffea orthologs assessed in silico were consistently similar to the Arabidopsis models in their primary sequences and conserved domains. Furthermore, the cis-els in the regulatory regions are intriguing and may reveal novelties, while isoforms can be investigated individually [35], beginning with the examination of dissociation curves. By analyzing temporal expression patterns, genes functioning in tune can be distinguished from those functioning in opposition to each other throughout the year. In addition, by analyzing spatial expression patterns, signal perception can be distinguished from the responses. All of this will support genotype selection for adaptability to different environments.
Acknowledgements
Thanks are due to Prof. Dr. Maria Helena de Souza Goldman for the encouragement to write; to the Brazilian Coffee Research and Development Consortium for granting access to coffee plants and laboratory facilities at the IDR-Parana (Londrina, PR); to Drs. Gustavo H. Sera and Luciana H. Shigueoka, researchers at the IDR-Parana (Londrina, PR), for indicating the C. arabica field trials visited to collect phenological data; and to Dr. Heverly de Morais for recovering and providing data from the IDR meteorological station.
Data Availability
The coffee sequences analyzed in this study are available for C. arabica var. Caturra Red (tetraploid, PRJNA 497895, accession CCC135-36, “autogenous population”) and C. eugenioides (diploid, PRJNA 497891, accession CCC68, “autogenous population”) at the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/). C. canephora (doubled haploid, DH200-94, France) data are available at the Coffee Hub website (Dereeper et al. 2015. http://www.coffee-genome.org). A. thaliana data are available at The Arabidopsis Information Resource (TAIR, http://www.arabidopsis.org). The CD Search routine available at the NCBI was used to identify the conserved domains (https://www.ncbi.nlm.nih.gov/Structure/cdd/; data bank version 3.20).
Supplementary Files
https://data.mendeley.com/datasets/7wvhh7v965/1.