Bioinformatic Analysis of Potential Pathogenicity Effectors of Candidatus Liberibacter asiaticus, Causal Agent of Citrus Huanglongbing

Huanglongbing (HLB), or citrus greening, is currently the most important citrus disease and is caused by the bacterium Candidatus Liberibacter asiaticus (CLas). Because the bacterium cannot be isolated, understanding its pathogenic mechanisms is a complicated task. Recent studies identified 16 proteins with the signal peptide needed to be secreted in the plant and cause the disease. The present study aims to perform a bioinformatic analysis of these proteins using function prediction by gene ontology (GO) and the detection of conserved domains. Not all of the 16 proteins analyzed were found in the different infective strains reported in the literature. The GO analysis allowed us to relate different proteins to the biological processes of energy and pathogenic activity, especially CLIBASIA_03315 and CLIBASIA_05115, respectively. The domain analysis revealed a β-CA domain, tentatively related to the damage caused to the chloroplast, and a PAAR domain associated with the T6SS secretory system. Our results provide information on the possible function of potential pathogenicity effectors in CLas.


Introduction
Currently, the world citrus industry faces one of the greatest challenges in its history, HLB [1]. In Mexico, this disease is caused by the bacterium Candidatus Liberibacter asiaticus [...]
How to cite this paper: Flores-de la Rosa, F.R., Rodríguez-Quibrera, C.G., Matilde-Hernández, C. and Santillán-Mendoza, R.
[...] However, it has been observed that some pathogenicity effectors, such as Las5315, induce an extreme accumulation of starch in the plant, increase the intracellular H2O2 content and suppress the expression of antioxidant genes [12] [13]. Likewise, the presence of structures of secretory systems in the bacteria has also been observed [12], which indicates that the use of virulence proteins is highly related to the activation of symptoms in the plant [13].
Virulence proteins or pathogenicity effectors play a crucial role in the plant-pathogen interaction according to the zigzag model [14], as they are responsible for breaking the resistance acquired by the host and are also targets for the activation of plant resistance [15]. In phytopathogenic bacteria, effectors are released through the Type III Secretion System [16] [17]. In CLas, however, the structures that make up this system are absent, while the elements of the general secretion pathway (GSP/Sec translocon) are complete [8]. Numerous potential proteins with the signal peptide required for their secretion have also been detected in the CLas genome [18], and some of them have been evaluated by transient expression in model plants, in which the characteristic symptoms of the disease were observed [19]. However, the function of most of these proteins in the interaction between the plant and the pathogen remains unknown.
Predicting the function of unknown proteins is one of the main current objectives of bioinformatics. One of the most commonly used approaches for the prediction of protein function is the Gene Ontology (GO), which is organized into three ontologies: 1) the biological process to which the protein is related, 2) the cellular component where the protein is located and 3) the molecular function of the protein [20] [21]. Therefore, the objective of this work was to analyze, using bioinformatic tools, the gene ontology of CLas proteins that are potential pathogenicity effectors.

Potential Pathogenicity Effectors
The genes reported by Pitino et al. [19] [8] were selected based on the following criteria: 1) presence of a signal peptide in the protein structure, 2) length of less than 250 amino acids and 3) no existing functional characterization in NCBI. The presence of copies of these genes was determined in the genomes of the Gxpsy strain [22], the ishi-1 strain [23], the A4 strain [24] and the FL17 strain [9] through BLAST on the NCBI platform.
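The three selection criteria can be expressed as a simple filter. The following is a minimal sketch, not the procedure actually used in the study: the `Candidate` record, its field names and the toy entries are hypothetical, and the signal peptide and characterization flags are assumed to come from external tools or database lookups.

```python
from dataclasses import dataclass

MAX_LENGTH = 250  # criterion 2: length of less than 250 amino acids


@dataclass
class Candidate:
    """Hypothetical record for one predicted protein (illustration only)."""
    locus_tag: str            # placeholder identifier, not a real CLas locus
    sequence: str             # amino acid sequence
    has_signal_peptide: bool  # assumed output of a signal peptide predictor
    has_known_function: bool  # assumed result of a database lookup


def is_potential_effector(c: Candidate) -> bool:
    """Apply the three criteria: signal peptide, short length, no known function."""
    return (
        c.has_signal_peptide
        and len(c.sequence) < MAX_LENGTH
        and not c.has_known_function
    )


def select_effectors(candidates):
    """Return the locus tags of the candidates that satisfy all criteria."""
    return [c.locus_tag for c in candidates if is_potential_effector(c)]


# Toy data, invented for illustration.
candidates = [
    Candidate("toy_0001", "M" * 120, True, False),
    Candidate("toy_0002", "M" * 300, True, False),  # too long
    Candidate("toy_0003", "M" * 90, False, False),  # no signal peptide
    Candidate("toy_0004", "M" * 200, True, True),   # already characterized
]
selected = select_effectors(candidates)
print(selected)
```

Only the first toy record passes all three criteria; each of the others fails exactly one of them.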

Determination of Gene Ontology
In order to predict the possible function of the proteins encoded by the aforementioned genes, the gene ontology (GO) approach was used [20], and the ontologies related to Biological Process (BP), Cellular Component (CC), and Molecular Function (MF) were determined. The search relied on an approach based on affinity propagation and the domain architecture of the amino acid sequences, as implemented in the PANDA online software (http://dna.cs.miami.edu/PANDA/) [25], with the options offered by default.
The results were checked against the QuickGO database (https://www.ebi.ac.uk/QuickGO/term/GO:0016020) of the European Bioinformatics Institute (EMBL-EBI).
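As an illustration of how predictions of this kind can be organized by ontology, the sketch below groups GO terms per protein into BP, CC and MF and keeps only those above a score cutoff. The tuple layout, the `min_score` threshold, the protein name and the scores are assumptions made for the example; they do not reproduce the PANDA output format or any result of this study.

```python
from collections import defaultdict

NAMESPACES = ("BP", "CC", "MF")  # the three GO ontologies


def summarize_go(predictions, min_score=0.5):
    """Group (protein, namespace, term, score) tuples by protein and ontology.

    Terms scoring below min_score are dropped; the rest are sorted
    from highest to lowest score within each ontology.
    """
    summary = defaultdict(lambda: {ns: [] for ns in NAMESPACES})
    for protein, namespace, term, score in predictions:
        if namespace not in NAMESPACES:
            raise ValueError(f"unknown ontology: {namespace}")
        per_protein = summary[protein]  # ensures the protein is listed
        if score >= min_score:
            per_protein[namespace].append((term, score))
    for per_protein in summary.values():
        for terms in per_protein.values():
            terms.sort(key=lambda item: item[1], reverse=True)
    return dict(summary)


# Toy predictions with invented scores (illustration only).
predictions = [
    ("toy_protein", "CC", "membrane (GO:0016020)", 0.81),
    ("toy_protein", "BP", "pathogenesis", 0.62),
    ("toy_protein", "MF", "nucleotide binding", 0.40),
]
summary = summarize_go(predictions)
for namespace in NAMESPACES:
    print(namespace, summary["toy_protein"][namespace])
```

With the assumed cutoff of 0.5, the molecular function term is discarded, which mirrors the situation in which a protein receives a localization prediction but no reliable functional term.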

Search for Conserved Domains and Protein Structure
The presence of functional domains was determined in the different sequences of the genes under study. For this, the online MOTIF search software (https://www.genome.jp/tools/motif/) was used to search the Conserved Domain Database (CDD) of the NCBI, using the options offered. When more than one domain was identified, those not related to prokaryotes were discarded and special attention was given to those that could be related to virulence in other organisms. The possible structure of the proteins with potential pathogenic activity was predicted with the software Phyre2 (Protein Homology/analogY Recognition Engine V 2.0) [26], and the results were visualized in the software Ez-Mol 1.22 (http://www.sbg.bio.ic.ac.uk/~ezmol/). Given the experimental importance of the CLIBASIA_05315 protein [19], a BLAST search of the detected domain was performed and a multiple alignment of the sequences obtained was generated using the Muscle algorithm; a UPGMA dendrogram based on genetic distance was also built in the MEGA X software [27]. The alignment was analyzed for site-specific conservation using the Jalview software [28].
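The UPGMA clustering step can be sketched in a few lines. This is an illustrative implementation only, not the MEGA X procedure: the text does not state which genetic distance was used, so the p-distance (proportion of differing sites) is assumed here, and the four aligned sequences are invented toy data.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are ignored.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, seqs):
    """Build a UPGMA tree as nested tuples (left, right, node_height)."""
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, size)
    dist = {
        (i, j): p_distance(seqs[i], seqs[j])
        for i in range(n)
        for j in range(i + 1, n)
    }
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (i, j), d = min(dist.items(), key=lambda kv: kv[1])
        (left, size_i), (right, size_j) = clusters.pop(i), clusters.pop(j)
        # Distance to the merged cluster: size-weighted average.
        new = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        dist.update(new)
        clusters[next_id] = ((left, right, d / 2), size_i + size_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Toy aligned sequences (invented for illustration).
names = ["A", "B", "C", "D"]
seqs = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDQFGHLKV", "MCDQWGHLRV"]
tree = upgma(names, seqs)
print(tree)
```

In the toy data, A and B differ at a single site and are joined first, C is added next, and D, the most divergent sequence, joins last. A dendrogram in which the query domain joins last in the same way would correspond to the kind of differentiation discussed for the CLIBASIA_05315 domain.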

Results and Discussion
The sixteen proteins identified as potential pathogenicity effectors by Pitino et al. [19] were analyzed to determine their possible role in the interaction that causes HLB. From the cellular component analysis, it was observed that nine of the sixteen proteins are tentatively located in the membrane. The prediction for some proteins was very uncertain. For example, CLIBASIA_02470 was only predicted to localize in the membrane, and no data on biological process or relevant molecular function were obtained.
Some of the proteins were related to biological processes associated with cellular energy (CLIBASIA_05315, CLIBASIA_00460 and CLIBASIA_03695), which is of great interest because it is considered that energy parasitism is essential in the HLB process [11]. The results of gene ontology are summarized in Table 1.
The CLIBASIA_00460 protein is related to electron transfer capacity (Table 1) and is present in all the CLas genomes analyzed (Table 2). It is therefore notable that the domain analysis located a conserved PAAR (proline-alanine-alanine-arginine) domain, which has been observed to bind to the VgrG spike as a conical extension, forming a contractile mechanism similar to that present in bacteriophages [29]. This structure belongs to the type VI secretion system (T6SS) and gives it the ability to release different types of toxic molecules to the plant [30] with a high degree of efficiency and specificity [31]. The domain structure observed in our results (Figure 1(a)) is very similar to those presented in other investigations of the PAAR domain in proteins associated with T6SS [29]. These results suggest that the protein CLIBASIA_00460 may be associated with the T6SS-mediated release of effectors during the development of HLB. T6SS has proved to be an essential part of the development of other diseases in other crops [32], because it confers advantages in the adaptation to and colonization of the hosts [33].
The protein CLIBASIA_05115 is related to the biological process of pathogenesis, and a membrane location and a molecular function related to nucleotide binding are predicted for it (Table 1). This protein is therefore of future interest in the study of the interaction between CLas and the plant, although no conserved domains were detected (Table 2). The structural analysis of the protein showed a high degree of similarity (data not shown) with the invasion protein b of Bartonella bacilliformis [34], which suggests a potential role in the pathogenicity of the bacteria.
A case of special interest in the pathogenic interaction of HLB is the protein CLIBASIA_05315, which was located very close to the chloroplasts when transiently expressed in Nicotiana benthamiana [19]. It is also reported to cause some of the main symptoms of HLB, such as chlorosis and starch accumulation, in the same model [13]. Therefore, the detection of a domain corresponding to a β carbonic anhydrase (β-CA) is relevant to the understanding of CLas action mechanisms (Table 3). The β-CA enzymes take part in various processes in cells, including respiration and photosynthesis, mediating the reversible interconversion of CO2 and HCO3- [35]. However, it has been observed that these enzymes are related to many other physiological processes, such as CO2 fixation, lipid and amino acid biosynthesis, establishment of seedlings and response to stress [36]. The role of these enzymes in the activation of resistance to diseases is not yet well understood; however, experimental evidence suggests that they actively participate as a salicylic acid receptor [37] and activate Systemic Acquired Resistance (SAR) [38]. Therefore, it is considered that the role of β-CA is related to protection against oxidative stress in the plant [39].
Recent studies indicate that the accumulation of ATP and H2O2 in plants infected with HLB is due to a significant increase in the biosynthesis of oxidizing compounds related to the protection of the plant and a decrease in its detoxifying elements. As a result, CLas generates an oxidative stress that damages the cells of the plant [40], damage in which the protein CLIBASIA_05315 seems to be closely involved [13]. The fact that this protein contains a domain related to β-CA suggests that it acts as a pathogenicity effector in the activation of the disease [14], possibly by disrupting the activity of the β-CA present in the chloroplast, which would alter its photosynthetic activity, cause the accumulation of oxidizing compounds and inhibit the role of β-CA in the activation of SAR.
The presence of β-CA has previously been detected in different pathogenic bacteria [41], especially in human pathogens [42] [43]. Additionally, these enzymes have been observed in pests and pathogens of agricultural importance, for which they represent a possible target in the development of chemical control strategies [44]. However, the analyses of conservation (Figure 2) and genetic differentiation (Figure 3) show that the domain found in the protein CLIBASIA_05315 is different from that found in other bacteria. Therefore, experimental evidence of the presence of the β-CA domain in the protein CLIBASIA_05315 is necessary, because this could help to understand more fully the mechanisms by which pathogenicity develops in plants with HLB, as well as to develop possible mechanisms to control the disease.
In conclusion, the analysis of gene ontology (GO) allowed us to observe that, of the sixteen proteins proposed by Pitino et al. [19], some are tentatively associated with cellular energy, membranes and electron transfer. The detected domains suggest that the presence of β-CA in CLIBASIA_05315 is related to its affinity for the chloroplast and to the physiological alteration previously demonstrated. In turn, the PAAR domain in the protein CLIBASIA_00460 suggests the active participation of T6SS during the development of HLB. Our findings allow us to direct future research toward the study of the effectors of Candidatus Liberibacter asiaticus in plants as pathogenicity markers and as molecular targets for the development of disease control strategies.