Host-Pathogen Wars: New Weapons from Biotechnology and Genomics

Pathogens are imminent threats to crop production. Among the management tools available to protect crops from diseases, the use of host-plant resistance has long been hindered by a lack of tools and resources to identify resistance genes (R-genes). Genomic technologies have empowered acquisition of a new level and quality of information on plant-pathogen interactions. Next generation sequencing, differential transcriptome analysis, gene editing, and bioinformatics have greatly expanded the number of R-genes identified, enriched understanding of R-avirulence gene interactions, and improved disease diagnosis. In this review, we highlight the application of genomic technologies to the identification of pathogen machinery for future improvement of host plant resistance.


Introduction
Since the emergence of the first land plants around 700 million years ago [1], molecular interactions have occurred with microorganisms, whether symbiotic, epiphytic or pathogenic. Such interactions have shaped the evolution of the plant immune response, producing a broad primary mechanism that recognizes common features of microbial pathogens and a more specific secondary mechanism, based on resistance proteins (R proteins), that detects pathogen effector proteins [2].
In agroecosystems, natural ecosystems that have been modified to produce food and fiber, cultivated plants are more exposed to sudden alterations imposed by weather and man, and to pest damage and catastrophic outbreaks, owing to a lack of diversity in species of plants and microorganisms. A plant disease epidemic develops when three factors occur simultaneously over a relatively long period of time: favorable environmental conditions, a virulent pathogen, and susceptible host plants [3]. Plant disease management practices modify one or more of these factors, aiming to reduce initial inoculum (delaying disease onset) or to slow disease progress in order to achieve economical control. The environment can be modified through crop management, for example by adopting a no-tillage system, crop rotation, adjusted planting density, or early planting, or by avoiding crops susceptible to pathogens that predominate in specific locations. In some cases, when the pathogen is widespread, the only plausible options to slow disease progression are chemical control and the use of resistant genotypes.
The objectives of most plant breeding programs are to develop varieties that produce greater yields or better quality. However, there are several successful examples of breeding for disease resistance worldwide [4] [5] [6]. Plant genetic resistance can be defined as a set of mechanisms which interfere with and/or reduce the growth and/or development of economically important parasites [7], and disease resistance can be categorized as complete resistance or immunity conditioned by a single gene or incomplete resistance conditioned by multiple genes of partial effects [8], respectively named qualitative disease resistance and quantitative disease resistance. Breeding programs have identified and used R-genes widely, achieving complete resistance mediated by resistance (R) proteins. However, quantitative disease resistance, resistance that is expressed as a reduction in disease, rather than as the absence of disease, would contribute to the design and deployment of durably resistant crop cultivars [9].
Qualitative disease resistance is based on R-genes and occurs when there is an incompatible interaction between host plant and pathogen. In the evolutionary host-pathogen arms race, pathogens impose selective pressure that forces the plant population to evolve post-invasion resistance mechanisms, often controlled by dominant R-genes, whose proteins detect specific pathogen effectors and trigger effective defense responses [10]. Resistance (R) and avirulence (avr) proteins may interact directly, as stated in the gene-for-gene model proposed by Flor (1956) [11], or indirectly, as per the guard or decoy model [12], providing resistance to the plant. However, R-genes can be defeated by the pathogen, allowing it to establish successfully in the host.
To facilitate the study of R-genes and their interactions, it is necessary to know where the gene is located in the genome, when and how its expression is regulated, and the phenotypic reaction resulting from its expression. Genetic mapping to determine the chromosomal location of a specific R-gene, or the distribution of R-gene families across the genome, has been greatly facilitated by improvements in genetic marker technology over the last three decades. The develop-  [13]. Cleavage of specific sequences in the DNA using restriction enzymes produces thousands of markers that can be evaluated by segregation of the corresponding loci along the chromosomes of an organism [14]. The advent of the polymerase chain reaction (PCR) [15], simple-sequence repeat (SSR) markers [16], and single nucleotide polymorphism (SNP) markers enabled detection of subtle changes in DNA between closely related individuals, down to changes in single nucleotides throughout the genome. SNPs can be detected in expressed sequence tags (ESTs) or newly sequenced DNA [17] [18], with their genome-wide distribution potentially reducing marker bias [19]. The whole genome sequences of both crop plants and pathogens have increased the ability to study the regulation of R-genes at the whole genome level. While Sanger-based sequencing has produced most of the high-quality reference genomes to date, next generation sequencing (NGS) methods have dramatically improved the cost-effectiveness of whole genome sequencing and have overtaken Sanger sequencing even for producing reference-quality genome sequences. Evolution and gene expression studies benefit from re-sequencing of ESTs or conserved regions in the genome [20] [21] [22]. As a result of genome sequencing, gene cloning becomes far more straightforward, as does the identification of potentially functional R-genes that contain functional domains known to be diagnostic [23] [24] [25].
Further, gene editing offers the opportunity to reveal the effects of R-genes that cause interactions with the pathogen, perhaps disrupting virulence which drives a mutation to high frequency within a pathogen population [26].
Effector proteins have been a useful tool for the identification of candidate resistance genes, as they act as molecular markers [25] [27]. Avr genes have corresponding R-genes that, when activated in the plant, can trigger immune responses. As such, effectors can be used to track their R-gene complements in many plant species and provide information about their functionality [25] [27]. Although extensive databases of effector genes are available [28] [29], many effectors remain to be discovered because pathogens evolve quickly. Some classes of pathogens have a few distinct characteristics that help to locate avr genes in the genome. For example, oomycete effectors are easily detected throughout the genome by their secretion signal peptide and the RXLR motif (Arg-X-Leu-Arg) required for translocation into the cell [23], while fungi have different signal sequences but no known specific signature sequence motifs by which to identify effectors.
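The motif-based search for oomycete effector candidates described above can be illustrated with a short script. This is only a minimal sketch: the search window (`WINDOW_START`, `WINDOW_END`), the function name `find_rxlr`, and the toy protein sequences are assumptions made for illustration, and signal peptide prediction, which in practice requires a dedicated tool, is not attempted here.

```python
import re

# Assumption: the motif is expected within this window of residues (0-based,
# end-exclusive), i.e. shortly downstream of the signal peptide. The bounds are
# illustrative defaults, not values taken from the review.
WINDOW_START = 20
WINDOW_END = 80

# Arg, any residue, Leu, Arg. The lookahead lets overlapping motifs be reported.
RXLR_PATTERN = re.compile(r"(?=R.LR)")


def find_rxlr(protein, start=WINDOW_START, end=WINDOW_END):
    """Return 0-based start positions of RXLR motifs inside the search window."""
    protein = protein.upper()
    return [m.start() for m in RXLR_PATTERN.finditer(protein)
            if start <= m.start() < end]


# Hypothetical toy proteins, invented for illustration only.
candidate = "M" + "A" * 24 + "RSLR" + "G" * 20   # motif starts at position 25
non_candidate = "MRALR" + "G" * 40               # motif too close to the N-terminus

print(find_rxlr(candidate))       # [25]
print(find_rxlr(non_candidate))   # []
```

A hit from such a scan is only a candidate; as the text notes, the motif must co-occur with a secretion signal, and function still has to be confirmed experimentally.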

Application of Genomic Tools to Improving Host-Plant Resistance
Genes conferring durable resistance to a pathogen are often difficult to find, deploy, and maintain. Crop plants frequently have extensive, although sometimes cryptic, genetic variation available in germplasm collections. Monoculture has arguably increased the rate at which new pathogens emerge [30], with the associated selection pressure imposed on both plant and pathogen potentially contributing to chromosome rearrangement in pathogens [31], horizontal gene transfer [32] [33], and recombination or mutation events. Selection pressure on R-genes is constantly present in the field and forces R-genes to evolve to defeat pathogens, while pathogens in turn experience selection pressure to defeat the plant immune system (an arms race). Since the cloning of the Pto gene in tomato [34], several R-genes have been cloned, and most of them encode intracellular multi-domain proteins [35]. R proteins have a C-terminal leucine-rich repeat (LRR) domain linked to a central nucleotide-binding (NB) domain, forming the NB-LRR receptor proteins that recognize effectors during pathogen infection. The nucleotide-binding fold in NB-LRR proteins is part of a larger domain, the NB-ARC domain, a nucleotide-binding adaptor shared by APAF-1, certain R proteins, and CED-4 [36].
Genetic mapping has become a key element in studying plant disease resistance and identifying DNA markers diagnostic of disease resistance. Markers were first based on isozymes; the development of more abundant markers, progressing from RFLPs to SSRs and eventually to SNPs, has increased the efficiency of identifying diagnostic markers based on differences between genotypes of as little as a single base pair [37]. As a result, markers can be developed to narrow down a genomic region of interest, 'fine mapping' a trait to identify plausible candidate gene(s) [38]. Characteristic features of R-genes, such as the NBS-LRR or other motifs, provide clues by which they might be tentatively identified. Routine whole genome sequencing now accelerates searches for candidate R-genes likely to participate in salient biological pathways, utilizing similarities to known genes to deduce their possible functions in the cell, and providing DNA sequence from which to develop effector-assisted markers [21] [25].

1) Next generation sequencing applications
An important way to identify effector genes is through conserved gene regions, or fingerprints, of pathogens. For phytobacteria, a typical effector sequence has hrp promoters, a signal peptide, a T3SS chaperone site, and skewed GC content in pathogenicity islands [39]. For fungi, a signal peptide, transmembrane domain, and secretome analysis help predict effector location in the genome; however, sequence tags need to be introduced to confirm effector function since there are no universal translocation motifs for fungi as there are for oomycetes [40]. Although the RXLR translocation motif may also be found in fungi, it is the fingerprint for oomycete effectors [21] [25]. Until recent years, plant parasitic nematode genes had no known signature by which to predict effector proteins. The complete genome analysis of the yellow potato cyst nematode (Globodera ros-  [33]. This finding indicates that F. oxysporum and F. verticillioides are most likely to have had a common ancestor more recently than either one did with F. graminearum. Additionally, re-sequencing data from the fungus Leptosphaeria maculans, the etiological agent of blackleg disease of canola, showed that a predicted effector and a previously cloned gene were in fact the same sequence, with just three SNPs differentiating the virulent and avirulent alleles [20]. Sequencing and re-sequencing are important for, but not limited to, finding variations that cause changes in pathogenicity and tracking the evolution of avr genes from one species to another. Building phylogenetic trees is a useful way to depict DNA sequence information in the study of pathogens and to find microorganisms related to disease control. DNA sequences of closely related strains of Bacillus amyloliquefaciens, a biocontrol agent, were aligned and searched for associations with bacterial activity against pathogens. Strains of B. amyloliquefaciens contained many beneficial genes related to plant health, such as genes that elicit plant basal defenses (flgK, fliD, and Hag) [42]. Genomic comparisons of pathogenic and non-pathogenic strains may provide clues toward new methods of disease control.
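One of the bacterial fingerprints mentioned above, skewed GC content in pathogenicity islands, lends itself to a simple sliding-window scan. The sketch below is illustrative only: the window size, step, and `tolerance` threshold are arbitrary assumptions, the function names are hypothetical, and the toy sequence is invented rather than drawn from any genome.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window=1000, step=500):
    """Return (start, gc_fraction) for every full window along the sequence."""
    return [(i, gc_content(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


def flag_skewed(seq, window=1000, step=500, tolerance=0.10):
    """Return windows whose GC fraction differs from the sequence-wide value
    by more than `tolerance` (an assumed, illustrative threshold)."""
    overall = gc_content(seq)
    return [(start, gc) for start, gc in gc_windows(seq, window, step)
            if abs(gc - overall) > tolerance]


# Hypothetical toy sequence: an AT-rich background with a GC-rich insert.
toy = "AT" * 50 + "GC" * 10 + "AT" * 50
print(flag_skewed(toy, window=20, step=20))   # [(100, 1.0)]
```

Flagged windows are only starting points for inspection; the other features listed in the text (hrp promoters, signal peptides, T3SS chaperone sites) would still need to be checked.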
NGS is also routinely being used for RNA sequencing to provide a means to gain insight into the gene repertoire and expression patterns of an organism.  [44]. These findings suggested that both pathogens are genetically closely related, and their effector genes may share common evolutionary pathways that lead to host specificity. Similarly, a comparison between non-pathogenic and pathogenic fungi revealed ESTs unique to pathogenicity [45]. The predicted protein functions of such transcripts might help in identifying the actual R-genes.
These genetic differences between pathogenic and non-pathogenic organisms can further be screened for domains that are characteristic of R-genes, such as NBS and ARC-2 domains [24]. Simple sequence repeats within ESTs typically have 2-6 nucleotides in the repeating unit(s). They are useful for designing markers because the number of repeats can evolve quickly, creating a great number of polymorphic loci [18]. The repeats can often be found close to a gene of interest and can be used efficiently to track R-genes. However, EST databases are not a complete repertoire of the expressed genes in a genome, because the expression of R-genes can depend on environmental conditions, the plant part to which the elicitor is applied, and/or the timing of pathogen infection, making their detection difficult. In partial summary, ESTs remain of high value in the study of plant disease, as will be further elaborated in this article.
From another perspective, quarantine of pathogens is highly important to prevent their introduction into areas that are pathogen free. While seed lots and nurseries are often screened for quarantined pathogens, small quantities of a pathogen in a large seed lot or nursery may pass undetected owing to the limited sensitivity of diagnostic methods. A survey completed in Italy using 454 pyrosequencing together with meta-barcoding revealed Phytophthora ramorum, a quarantined pathogen of ornamental plants, in the soil of one of the nurseries [22]. This early detection prevented the spread of P. ramorum to nearby nurseries, exemplifying how genomic techniques applied in plant pathology may prevent disastrous consequences. Although the identification of P. ramorum was possible, many pathogens have poor or no database coverage on which detection through sequencing can rely. Oxford Nanopore technology is the most recent third generation DNA sequencing technology, able to produce long reads, which facilitate genome assembly and analysis of structural variants [46]. It can generate 100 kilobase reads at a low cost. However, the use of this technology has not been fully explored in plant pathology. The MinION is a mobile sequencer that can quickly sequence a whole genome in about 24 hours. The MinION contains one flow cell, and it was able to reconstruct Escherichia coli gene order without a reference sequence or other platforms [47]. The ability to sequence genomes at a fast pace will open doors to gathering more pathogen information and detecting diseases on site.
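As a rough illustration of how repeats with 2-6 nucleotide units might be located in an EST, the following sketch uses a regular expression with a backreference. The minimum repeat count (`min_repeats`), the function name `find_ssrs`, and the toy sequences are assumptions for illustration; dedicated SSR-mining tools apply more refined, unit-length-specific thresholds.

```python
import re


def find_ssrs(seq, min_repeats=4):
    """Return (start, unit, repeat_count) for tandem repeats of 2-6 nt units.

    `min_repeats` is an assumed threshold, not a value from the review.
    """
    # Group 2 captures the shortest unit; \2 then demands further tandem copies.
    pattern = re.compile(r"(([ACGT]{2,6}?)\2{%d,})" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(2)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA read as (AA)n
        hits.append((match.start(), unit, len(match.group(1)) // len(unit)))
    return hits


# Hypothetical toy sequences, invented for illustration only.
print(find_ssrs("GGT" + "CA" * 6 + "TTG"))   # [(3, 'CA', 6)]
print(find_ssrs("TT" + "GAA" * 5 + "C"))     # [(2, 'GAA', 5)]
```

Primers designed in the flanking sequence of each hit would then be tested for length polymorphism between genotypes.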

2) Differential gene expression
RNA sequencing and microarray transcriptome analysis are often used to were up-regulated when cysts were germinating on soybean leaves. However, three hours after infection RXLR protein genes were down-regulated, as well as in zoospore stages [48]. Importantly, this information highlighted steps at which the pathogen may be prompted to infect, invade and reproduce in planta.
High throughput microarray methods have been developed to profile transcriptomes [49]. Although microarrays are an older technology than RNA sequencing (RNA-Seq), they are still widely used in plant science owing to their low cost. For instance, the life cycle of a pathogen may be related to its gene expression at different steps of infection. Phytophthora infestans gene expression profiling, based on more than 18,000 unigenes gathered from previously identified ESTs, showed differentially expressed genes throughout the pathogen life stages, with almost 90% of the selected sequences expressed at least once during the P. infestans life cycle. Groups of genes expressed during different stages of the pathogen life cycle may indicate genes involved in infection or disease development. Examples of such genes include catalase/peroxidase and RXLR proteins, which are related to plant pathogenesis, and flagellar components, which improve pathogen mobility and dispersion [21].
Although microarrays are useful for detecting and quantifying gene expression, they can miss important genes that are absent from EST databases. Microarrays are unable to identify alternative splicing, may or may not distinguish gene isoforms, and provide only relative quantification. In comparison, RNA-Seq provides a broader view of the transcriptome, covering coding and non-coding genes, and can also analyze just mRNA or poly-A RNA with elimination of rRNA. For example, comparative transcriptome profiling of two races of Xanthomonas oryzae pv. oryzae (Xoo) revealed some genes to be more highly expressed in the virulent strain HB1009 (K3 race) than in the avirulent strain KACC10331 (K1 race); these genes were predicted to be involved in virulence or host-specificity [50].
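To make the idea of comparing expression between a virulent and an avirulent strain concrete, the sketch below normalizes raw read counts to counts per million (CPM) and computes a log2 fold change per gene. The gene names and counts are invented, the function names and the `pseudocount` parameter are assumptions, and the calculation omits the replicates and statistical modelling that a real differential expression analysis would require.

```python
import math


def cpm(counts):
    """Scale raw read counts to counts per million mapped reads."""
    total = sum(counts.values())
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def log2_fold_change(treatment, control, pseudocount=1.0):
    """log2(treatment / control) on CPM values for genes present in both samples.

    The pseudocount (an assumed default) avoids division by zero for
    genes with no reads in one sample.
    """
    t, c = cpm(treatment), cpm(control)
    return {gene: math.log2((t[gene] + pseudocount) / (c[gene] + pseudocount))
            for gene in t if gene in c}


# Hypothetical read counts, invented for illustration only.
virulent = {"geneA": 300, "geneB": 100, "geneC": 600}
avirulent = {"geneA": 150, "geneB": 200, "geneC": 1650}

for gene, lfc in sorted(log2_fold_change(virulent, avirulent).items()):
    print(gene, round(lfc, 2))
```

In this toy data, geneA comes out roughly four-fold higher in the virulent sample after normalization, even though the raw counts differ only two-fold, which shows why library size has to be accounted for before comparing samples.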
The ability of RNA-Seq to recover small RNA is facilitating studies of gene silencing. Zheng et al. (2015) [51] were able to predict genes essential to Heterodera avenae survival using an siRNA system. Soaking H. avenae in a solution containing gene specific siRNA from Caenorhabditis elegans proved to be lethal to H. avenae larvae. In another study, RNA-Seq was utilized to examine changes in gene expression in Globodera pallida, a potato cyst nematode. High numbers of expressed genes were observed in the juvenile stage 2 (J2) and male stages. Importantly, most of the predicted effectors in G. pallida had almost no similarity with those of root-knot nematodes, suggesting different mechanisms of infection [31]. Although RNA-Seq has gained popularity for determining transcriptome profiles and quantifying gene expression levels, it is still a challenge to apply this method to organisms for which a reference genome is not yet available.
However, as the cost of NGS continues to decrease, a reference quality genome sequence is expected to become available for most plant, animal, and microbial species within the next decade. The field of RNA-Seq based transcription profiling is progressing rapidly, and each new improvement in sequencing and data processing strategy brings us closer to a more complete understanding of the complex mechanisms involving host-plant defense responses and how they are overcome by pathogens.

3) Gene editing
The ability to inactivate or insert a gene sequence facilitates the work of plant pathologists in studies of devastating pathogens worldwide such as Magnaporthe oryzae, which causes rice blast and accounts for more than 30% of world rice loss [52]. there is an improved software connected to it to analyze the sequence data more accurately [57]. Once a genome is sequenced, the genome assembly is annotated and protein-based gene models are predicted. For example, gene predictions can be made with software such as FGENESH+ [17], and gene ontology annotation can be performed with the BLAST2GO program after RNA sequencing [51]. Different bioinformatics programs were used to assemble the whole Globodera genome, predict effectors and their locations and splicing sites, and identify candidate horizontally transferred genes [41].
Homology with known genes and proteins facilitates manual inspection for functional genes and introns, using gene ontology classifications to collect and analyze genetic data [56]. The many DNA sequences in international databases can be employed to search for the best match to new R or effector gene sequences and to study pathogen deployment and speciation, as well as genome and gene evolution. Even small changes in DNA sequence can distinguish one strain from another, making SNP markers important for distinguishing populations with conserved regions that might be essential to plant resistance. By aligning DNA sequences from resistant and susceptible wheat plants, a SNP marker was found and assessed for putative linkage to the leaf rust resistance gene Lr67, which is not found in susceptible plants [58]. Moreover, RFLP sequences of necrotrophic pathogens from the Cochliobolus genus were anchored and many scaffolds were created to search for SNPs and pinpoint virulence genes [59]. In another example, by analyzing the DNA sequences of different isolates of Fusarium oxysporum f. sp. vasinfectum, pathogenic and non-pathogenic to cotton, Wang et al. (2010) [60] were able to determine that the virulent strain of F. oxysporum f. sp. vasinfectum found in Australia evolved from a wild ancestor indigenous to Australia that is mildly pathogenic to cotton. Bioinformatics has many uses and is a key tool in the study of plant-pathogen interactions and disease control.
M. B. da Silva et al. American Journal of Plant Sciences
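The comparison of aligned sequences from resistant and susceptible plants can be sketched as a column-by-column scan for single-base differences. This is a simplified illustration that assumes the two sequences have already been aligned to equal length by an external tool; the function name `find_snps` and the toy sequences are hypothetical and do not represent the Lr67 locus or any published data.

```python
def find_snps(resistant, susceptible):
    """Return (position, resistant_base, susceptible_base) for each SNP.

    Positions are 1-based alignment columns. Columns containing a gap or an
    ambiguous base in either sequence are skipped, since they are not
    single nucleotide substitutions.
    """
    if len(resistant) != len(susceptible):
        raise ValueError("sequences must be pre-aligned to the same length")
    snps = []
    pairs = zip(resistant.upper(), susceptible.upper())
    for position, (r_base, s_base) in enumerate(pairs, start=1):
        if r_base != s_base and r_base in "ACGT" and s_base in "ACGT":
            snps.append((position, r_base, s_base))
    return snps


# Hypothetical aligned fragments, invented for illustration only.
print(find_snps("ACGTTGCA", "ACGTCGCA"))   # [(5, 'T', 'C')]
print(find_snps("AC-TA", "ACGTG"))         # [(5, 'A', 'G')]
```

A difference found this way becomes a useful marker only after it is shown to co-segregate with the resistance phenotype across many individuals.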

Final Considerations
Advances in biotechnology and genomics have improved the potential for research discoveries in host-pathogen interactions. The emergence of new genomic resources, such as draft genome sequences of crop plants and their pathogens, and of new gene editing tools, such as TALENs and CRISPR-Cas9, offers additional avenues for researchers to explore new approaches to discovering and incorporating host-plant resistance for plant improvement. The rapid decline in the cost of new technology, together with continually expanding capabilities, promises a rich yield of new discoveries salient to the mitigation of plant disease. The evolution of DNA technology has greatly benefitted studies of new and alternative methods for identifying pathogens. Faster pathogen diagnosis methods can help both farmers and researchers and will lead to improved disease management strategies. Better support for research and development is needed to further understand pathogen biology.