Relationship of Secreted Salivary Protein Variants to Virulence in Hessian Fly ( Mayetiola destructor (Say))

Salivary proteins are the initial contact between sedentary insect pests and their host plants. It is expected that one or more salivary proteins mediate the interaction between Hessian fly and wheat, in which a feeding site is estab-lished to the benefit of the fly. A survey of 52 loci annotated as insect secreted salivary proteins was conducted in 384 individuals evenly distributed among eight biotypes of Hessian fly (B, C, D, E, GP, L, O, and vH9). Amplicons were sequenced with Illumina, and sequence reads were aligned to the reference sequences from which primers had been designed. Positions of consistent base variation (998 in all) were identified and tabulated by biotype. No vary-ing position was associated with biotype-wide virulence to any one of wheat resistance genes H3, H5, H6, H7/H8, H9, H11, H13, and H26. The multiplate pooling strategy utilized in this study is an effective, affordable way to reveal the genotype of hundreds of individuals at tens of genetic loci.


Introduction
As the third largest producer of wheat in the world, the United States exported 27.2 million tonnes of wheat during the 2018-19 growing season [1]. To fight wheat yield loss due to Hessian fly infestation, it is necessary to be able to identify the molecular mechanisms that operate during wheat-Hessian fly interac-tions. Secreted salivary proteins are the major point of contact between sucking insects and their plant hosts. There is considerable evidence that some secreted salivary proteins trigger defense responses, while others suppress ongoing defense responses [2]. Virulence requires that the host response is sufficiently suppressed to allow the insect to survive and reproduce. In Hessian fly (Mayetiola destructor (Say)), eight biotypes have been defined by their pattern of virulence against a battery of wheat host resistance genes [3] [4] [5]. Here we report an investigation of the distribution of polymorphic nucleotides for 52 secreted salivary proteins in samples of eight biotypes that had been maintained for many years in laboratory culture, since these sites would be potential biomarkers for virulence. We looked for correlation between specific polymorphic sites and biotype, i.e., virulence to a particular set of wheat R genes.

Source of Sequence
Translated dipteran nucleotide sequence in GenBank nt was screened for signal peptides with SignalP [6], and 52 such sequences (Supplementary Table 1) were chosen for primer construction and amplification. Each 20 µl PCR was performed by using 10 µl MyFi Mix from Bioline (Taunton, MA USA), 1 µl of sample DNA in water, and 1 µl of each primer for a final primer concentration of 0.5 µM. Initial denaturation was at 95˚C for 2 min followed by 35 cycles of denaturation at 95˚C for 30 sec, annealing at 55˚C for 30 sec, and extension at 72˚C for 1 min. The PCR finished with extension at 72˚C for 5 min and holding at 4˚C. A multi-plate multiplexing strategy was applied to minimize sequencing cost (Figure 1). The four 96-well plates were sequenced as 2x250 base-pair paired-end Illumina reads in samples of 48 individuals from each of Hessian fly biotypes B, C, D, E, GP, L, O and vH9 [7].

SNP or Indel Discovery
The PCR multiplexing strategy appears as the list of barcodes in Figure 1. Reads  were grouped by primer pair, and each group was aligned and displayed in the Integrative Genomics Viewer [8]. Single nucleotide variants (SNPs) were detected by visual inspection. Coordinates of each combination of locus (primer pair) and variable site were saved to a file for each biotype from the Integrative Genomics Viewer.

Tabulation and Display of Variable Sites
The number of variable sites shared by each possible combination of biotypes was calculated from the coordinates files using the Bioinformatics and Evolutionary Genomics website of the Vib-Ugent Center for Plant Systems Biology of Ghent University [9]. Combinations were enumerated three ways, for all eight biotypes B through vH9 listed above, for the seven biotypes without vH9, and for the six biotypes without vH9 and L, to investigate the sharing of variable sites among biotypes to the exclusion of other biotypes ( Table 1). The combination counts were presented as partial or complete Venn diagrams of greatly differing appearance, because the problem of drawing such high-order Venn diagrams has not yet been solved completely. The eight-biotype diagram only showed the numbers of variable sites within each biotype and shared in 12 of 28 possible pairs of biotypes ( Figure 2); it was produced using the Venn Diagram Maker Online [10], and the size of each overlap in the diagram reflected the number of shared variable sites. The complete seven-biotype diagram ( Figure 3) was drawn in the Adelaide format using R package venn [11]. The complete six-biotype di-  Figure 4) was drawn with tools in the same Vib-Ugent website [9] that counted the combinations.

Results and Discussion
Illumina sequencing produced 27,247,176 merged read pairs that passed quality control and were greater than 150 bases in length. The counts of read pairs ranged from 79,684 for S42 to 1,846,094 for S31 and from 1,148,582 for biotype C to 2,808,266 for biotype D. Counts of read pairs ranged from 261 to 123,110 for individual flies over all loci. The biotypes display distinct combinations of virulence and avirulence to alleles at eight wheat R genes as shown in Table 1 [12]. Thus biotypes B, D, E, L, O, and vH9 all survive on H3 wheat, but they share no polymorphic sites in our set of 52 secreted salivary proteins to the exclusion of C and GP (Table 2). Likewise biotypes C, D, L, O, and vH9 overcome H6 but share no polymorphic sites to the exclusion of B, E, and GP; B, C, D, L, and vH9 overcome H7/H8 and share no polymorphic sites to the exclusion of E, GP, and O; and C and vH9 overcome H9 but share no polymorphic sites to the exclusion of B, D, E, GP, L, and O ( Table 2). The relative sizes of circles and overlaps in Figure 2 show the sharing or lack thereof of 105 positions for single biotypes and 256 positions for pairs of biotypes, but it does not show the 637 polymorphic positions shared among three or more biotypes. The pairs C-GP (148 exclusively shared sites, Table 2), E-L (39 sites), E-O (20 sites), and GP-L (16 sites) have the most sharing to the exclusion of other relationships.
Among the tested biotypes, only E overcomes H13 (Table 2). It uniquely harbors 22 variable positions, but 13 sites of substitution and one site of deletion reside in one acetylcholinesterase locus, which matches GenBank accession ATY49611.1. However, the vH13 gene has previously been mapped and characterized (GenBank accession AEG42082.1) [13], so the distinct variant positions in the acetylcholinesterase of biotype E might be merely coincidental. Biotypes L and O overcome H5, but the response of vH9 to H5 is not known with certainty. Biotypes L and O uniquely share two variable sites, and the trio of L, O, and vH9 shares no variable sites ( Table 2). The variants in L and O occur at base 40 in locus 78 (a peroxidase; Figure 5(a)) and base 236 in locus 144 (lysosomal acid phosphatase; Figure 5(b)), but both occur in only 2% -5% of individuals and therefore cannot account for the overall virulence of L and O toward H5. Biotypes B, C, D, E, and O overcome H26. These biotypes share polymorphism at base 120 in S34, a sequence similar to Genbank accession XP_020706750.1 (endothelin-converting enzyme; Figure 5(c)), again at too low frequency to account for the virulence of these biotypes at large to H26. Only biotype O overcomes H11, but the response of L and vH9 to H11 is not known with certainty; O has 16 apparently unique polymorphic sites. Figure 3 and Figure 4 show the distributions of uniquely shared polymorphic sites when vH9 or vH9 and L are not considered. The counts differ somewhat from those in Table 2, where all eight biotypes are represented. All three depictions show a large core of 189 to 212 locus-positions that vary in all the represented biotypes and therefore do not appear to be directly involved in virulence. Also, biotype GP uniquely shares more variable positions than any other Figure 5. Within-biotype genotype frequencies in particular loci that are polymorphic in biotypes virulent to H5 and H26. (a) Genotype frequencies by biotype at nucleotide 40 in locus S78, which matches GenBank ETN66032.1, peroxidase from Anopheles darlingi; (b) Genotype frequencies by biotype at nucleotide 236 in locus S144, which matches GenBank KYN33573.1, lysosomal acid phosphatase 1, partial, from Drosophila nasuta; (c) Genotype frequencies at nucleotide 120 in locus S34, which matches GenBank XP_020706750.1, endothelin-converting enzyme homolog isoform X1 from Athalia rosae. biotype; L is a distant second. Biotype GP is not virulent against any of the listed resistance genes [14] and thus might be the least modified from ancestral populations existing before the deployment of wheat R genes. A multiplate pooling strategy was cost-effective and yet allowed reads to be traced to individual fly. This strategy was successfully applied to a panel of 52 genes that encode signal peptides and whose protein products are expectedly secreted. Nevertheless, no frequent nucleotide variant at any particular polymorphic site was limited to biotypes that shared a common virulence toward any of eight wheat resistance genes, and thus none of the 52 queried genes seems to cause virulence.
A more sensitive method to detect gene involvement would be to look at alleles as whole-amplicon variants, but even this is not free of errors. One attempt was to 1) map all the reads to one of the 52 genes, 2) assemble those reads that mapped to any one locus with SPAdes [15], 3) determine individual fly genotype by blastn [16] of the reads against the assembly, and 4) look for high-frequency variant alleles shared by biotypes virulent to a particular resistance gene. This attempt was complicated by the large number of spurious "alleles" arising from assembled sequencing errors, such that the actual alleles comprised less than 5% of the total assembly and accounted for less than 70% of the read counts. Nevertheless, once optimized this method might (or might not) reveal a relationship of a particular multiSNP allele to virulence to a particular resistance gene. A more robust method is linkage mapping with markers traceable to regions in the Hessian fly genome assembly, as performed by Zhao et al. [17]