Assessment of Genetic Relationship and Application of Computational Algorithm to Assess Functionality of Non-Synonymous Substitutions in DQA2 Gene of Cattle, Sheep and Goats
Received 27 March 2015; accepted 8 December 2015; published 11 December 2015
1. Introduction
Genetic variation in parasites and hosts, and its distribution across space and time, is of great interest because it serves as a basis for adaptive change. Spatial population structure can strongly influence the process of co-adaptation between parasite and host and the evolution of virulence [1]. The major histocompatibility complex (MHC) is a large genomic region or gene family found in most vertebrates that encodes MHC molecules. MHC molecules play an important role in the immune system and autoimmunity. They are cellular glycoproteins involved in antigen presentation to CD4+ T cells. The genes encoding these molecules are polymorphic [2] [3]. DQ genes of the MHC class II region encode the α (DQA) chain of the molecule [4]. The second exon has been shown to be highly polymorphic and under positive selection, and the class II DQA gene has recently attracted more attention [5]. In cattle, there are two or possibly three DQA genes [6]. The ovine DQ region encompasses 130 kb, with the DQ1 and DQ2 subregions located 22 kb apart. According to McKenzie et al. [7], DQA2 of sheep is found on chromosome 20.
Because MHC genes must defend against a great diversity of microbes in the environment, the MHC molecules (coded for by the MHC genes) must be able to present a wide range of peptides. MHC genes achieve this through several mechanisms: 1) the MHC locus is polygenic, 2) MHC genes are highly polymorphic and numerous alleles have been described, and 3) several MHC genes are codominantly expressed [8] [9]. DQ genes of the MHC class II region encode the α (DQA) and β (DQB) chains of the molecule and are highly polymorphic [4].
Recent advances in high-throughput technologies have generated massive amounts of genome sequence and genotype data for a number of species. Identifying functional SNPs within a pool that contains both functional and neutral SNPs is challenging with experimental protocols alone [10]. Therefore, computational predictions have become indispensable for evaluating the disease-related impact of nonsynonymous single-nucleotide variants discovered in exome sequencing [11]. A number of computational methods have been developed to predict the functional effect of a non-synonymous single-nucleotide polymorphism (nsSNP), a single-nucleotide change in a protein-coding region of a gene that causes an amino acid substitution (AAS) in the resulting protein [12]. Many such methods have their roots in molecular evolution, as they use information derived from multiple sequence alignments. Most computational prediction tools for amino acid variants rely on the assumption that protein sequences observed among living organisms have survived natural selection. Therefore, evolutionarily conserved amino acid positions across multiple species are likely to be functionally important, and amino acid substitutions observed at conserved positions will potentially lead to deleterious effects on gene functions [13].
The increasing information on the genetics of hosts and parasites and their interaction at the molecular level can lead to insights into disease emergence and control. Plasticity in the host genome, especially in the genes responsible for disease resistance, gives the host a protective advantage against pathogens. Therefore, studying the variability of disease resistance genes such as DQA2 in the host population is of utmost importance for the practical application of the genetics of disease resistance.
The general objective of the study was to investigate computationally the molecular genetic variation of the DQA2 gene in selected mammalian species (cattle, sheep and goats), with emphasis on its evolution and differentiation within and among species and on the attendant effects of the polymorphism on the function of the DQA2 gene.
2. Materials and Methods
2.1. Sequences of Species
A total of thirty-three (33) DQA2 nucleotide sequences comprising cattle (10), sheep (12) and goats (11) were retrieved from GenBank (www.ncbi.nlm.nih.gov). The GenBank accession numbers of the sequences were AY829359, AY829358.1, AY829357, AY829356.1, AY829355.1, AY829354.1, AY829353.1, AY829352, AY829351, AY829350 and AY829349.1 (caprine); HG798789.1, HG798790.1, HG798791.1, HG798794.1, JX484834.1, FJ179558.1, FJ179557.1, FJ179551.1, HG798796.1, HG798795.1, HG798793.1 and U65906.1 (ovine); D50045.1, D50046.1, D50049.1, D50048.1, D50047.1, AB098906.1, NM_001012681.1, JN225517.1, AY442305.1 and AY442304.1 (bovine).
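The accession list above can be kept as a small data structure so that the per-species counts are easy to check. The following is only a bookkeeping sketch: the variable and function names are illustrative, and nothing here contacts GenBank.

```python
# Accession numbers copied from the text above, grouped by species.
ACCESSIONS = {
    "caprine": [
        "AY829359", "AY829358.1", "AY829357", "AY829356.1", "AY829355.1",
        "AY829354.1", "AY829353.1", "AY829352", "AY829351", "AY829350",
        "AY829349.1",
    ],
    "ovine": [
        "HG798789.1", "HG798790.1", "HG798791.1", "HG798794.1", "JX484834.1",
        "FJ179558.1", "FJ179557.1", "FJ179551.1", "HG798796.1", "HG798795.1",
        "HG798793.1", "U65906.1",
    ],
    "bovine": [
        "D50045.1", "D50046.1", "D50049.1", "D50048.1", "D50047.1",
        "AB098906.1", "NM_001012681.1", "JN225517.1", "AY442305.1",
        "AY442304.1",
    ],
}


def count_by_species(accessions):
    """Return the number of accession numbers listed for each species."""
    return {species: len(ids) for species, ids in accessions.items()}


counts = count_by_species(ACCESSIONS)
total = sum(counts.values())
# Flatten the lists to confirm that no accession number is listed twice.
all_ids = [acc for ids in ACCESSIONS.values() for acc in ids]
has_duplicates = len(all_ids) != len(set(all_ids))
print(counts, total, has_duplicates)
```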
2.2. Sequence Alignment and Translation
Sequence alignment, translation and comparison were done with ClustalW as described by Larkin et al. [14], using the IUB substitution matrix, a gap open penalty of 15 and a gap extension penalty of 6.66.
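The translation step can be illustrated with a minimal sketch. It assumes the standard genetic code (NCBI translation table 1) and a known reading frame; the function name `translate` and the use of `?` for codons that cannot be resolved are choices made for this illustration, not details of the software used in the study.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides, frame=0):
    """Translate a DNA string codon by codon, starting at offset `frame`.

    Codons containing gaps or ambiguous bases are reported as '?'.
    A trailing partial codon is ignored.
    """
    seq = nucleotides.upper()
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "?"))
    return "".join(protein)


print(translate("ATGGCC"))  # methionine followed by alanine
```

The reading frame matters for partial exon sequences such as those analysed here, so in practice the `frame` argument would have to be set per sequence.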
2.3. Functional Analysis
In silico functional analysis of missense mutations was performed using SNAP, a neural-network-based method that identifies functionally disruptive single amino acid substitutions from sequence [15]. The inputs to SNAP include secondary structure and solvent accessibility predictions, evolutionary and family information, biophysical differences between the wild type and mutant amino acids, statistical likelihoods of observing residue triplets around the mutation site, and SIFT [16] and SwissProt [17] annotation if available. For each mutant, SNAP returns three values: the binary prediction (neutral/non-neutral), the reliability index (RI, range 0 - 9) and the expected accuracy, which estimates accuracy [Equation (1)] on a large dataset at the given RI (i.e. the accuracy of test set predictions calculated for each neutral and non-neutral RI) [18].
(1)
2.4. Phylogenetic Trees Analysis
Neighbor-Joining (NJ) trees were each constructed using the p-distance model and pairwise deletion as the gap/missing data treatment. The construction was done on the basis of genetic distances, depicting phylogenetic relationships among the DQA2 nucleotide sequences of the investigated species. For the nucleotide sequences, the evolutionary distances were computed using the Maximum Composite Likelihood method. For the amino acid sequences, however, the evolutionary distances were computed using the Poisson correction method. The reliability of the trees was assessed by bootstrap confidence values [19], with 1000 bootstrap iterations, using MEGA 5.1 software [20]. Similarly, UPGMA trees for the DQA2 gene were constructed with consensus nucleotide and amino acid sequences. All the nucleotide sequences were trimmed to an equal length of 236 bp corresponding to the same region before generating the trees.
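Two of the distance measures named above have simple closed forms, sketched below under stated assumptions. The p-distance is the proportion of compared sites that differ, and the Poisson correction converts an amino acid p-distance p into d = -ln(1 - p). The function names and the choice of `-` and `?` as gap/missing symbols are illustrative; this is not MEGA's code, and the Maximum Composite Likelihood method is not sketched here.

```python
import math


def p_distance(seq_a, seq_b, missing=("-", "?")):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or missing symbol are skipped
    for this pair only (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in missing or b in missing:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def poisson_distance(p):
    """Poisson-corrected distance d = -ln(1 - p) for an amino acid p-distance."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


# Toy aligned fragments (hypothetical, not taken from the study).
p = p_distance("ACGT", "ACGA")
print(p, poisson_distance(p))
```

The missing-data symbols are passed as a parameter because letters such as N mean an ambiguous base in nucleotide data but asparagine in amino acid data.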
3. Results
The predicted amino acid sequences of caprine, ovine and bovine DQA2 orthologous alleles are shown in Figure 1.
Forty-seven amino acid substitutions of the wild type alleles located in the putative peptide coding region of caprine DQA2 alleles were obtained from the alignment of deduced amino acid sequences of goats. Out of these, eleven amino acid substitutions (H14L, H14R, L34M, E35L, G56S, G56R, I61V, A62E, D69Q, T72N and T72G) were predicted to be neutral (Table 1), an indication that they do not impair protein function. The Expected Accuracy (EA) ranged from 53% to 87%.
For sheep, sixteen amino acid substitutions (A11P, A11T, A11G, A11M, L14S, L14T, V27L, V27S, G35S, S46T, D55E, L57T, L57A, L57G, K65Q and V68I) appeared beneficial, while the remaining forty-seven appeared harmful (Table 2). In this case, the EA ranged from 53% to 93%.
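Substitution labels of the form H14L (wild type residue, position, variant residue) can be derived from a pair of aligned amino acid sequences. The sketch below shows one way to do this, assuming positions are counted from 1 along the alignment, that `.` marks identity with the reference and that `-` and `?` mark gaps or missing residues. The function name and the toy sequences are hypothetical and the position numbering may differ from the one used in the tables.

```python
from typing import List

MISSING = {"-", "?"}


def list_substitutions(reference: str, variant: str) -> List[str]:
    """List substitutions between two aligned sequences, e.g. 'H14L'.

    A '.' in the variant means 'same residue as the reference'.
    Sites with a gap or missing residue in either sequence are skipped.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    pairs = zip(reference.upper(), variant.upper())
    for position, (ref, var) in enumerate(pairs, start=1):
        if var == "." or ref in MISSING or var in MISSING:
            continue
        if ref != var:
            substitutions.append(f"{ref}{position}{var}")
    return substitutions


# Toy example (hypothetical sequences): histidine at position 2 becomes arginine.
print(list_substitutions("AHLG", ".R.?"))
```

Each label produced this way corresponds to one mutant that could be submitted to a predictor such as SNAP together with the wild type sequence.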
Twenty-four amino acid substitutions did not impair protein function, while seventy-seven substitutions appeared to have a negative effect on protein function in cattle (Table 3). The EA ranged from 53% to 94%.
Table 1. Functional analysis of coding nsSNPs of the DQA2 gene of goat using SNAP.
I = isoleucine, L = leucine, V = valine, C = cysteine, A = alanine, G = glycine, P = proline, T = threonine, S = serine, Q = glutamine, N = asparagine, H = histidine, E = glutamic acid, D = aspartic acid, K = lysine, R = arginine, Y = tyrosine, M = methionine, F = phenylalanine. Only predictions with RI ≥ 0 and Expected Accuracy (EA) ≥ 50% are included.
Table 2. Functional analysis of coding nsSNPs of the DQA2 gene of sheep using SNAP.
I = isoleucine, L = leucine, V = valine, C = cysteine, A = alanine, G = glycine, P = proline, T = threonine, S = serine, Q = glutamine, N = asparagine, H = histidine, E = glutamic acid, D = aspartic acid, K = lysine, R = arginine, Y = tyrosine, M = methionine, W = tryptophan. Only predictions with RI ≥ 0 and Expected Accuracy (EA) ≥ 50% are included.
Figure 1. Comparison of the predicted amino acid sequences of DQA2 alleles of goat, sheep and cattle. Dots indicate amino acid identity; ? = missing. Amino acid positions included in the peptide binding region follow Reche and Reinherz [31].
The phylogeny based on nucleotide and amino acid sequences of the DQA2 gene revealed the close relatedness of the caprine, ovine and bovine species, although there was some intermingling among the sequences of the three species investigated (Figure 2 and Figure 3).
The UPGMA phylogenetic trees of DQA2 for the Bovidae subfamily members (goats, sheep and cattle) (Figure 4 and Figure 5) revealed that goats and sheep were closer to each other at this locus than either was to cattle.
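The way UPGMA groups taxa from a distance matrix can be shown with a minimal sketch. UPGMA repeatedly merges the two closest clusters and averages their distances to every other cluster, weighted by cluster size. The implementation below returns only the nested grouping (no branch lengths), and the three distances are made-up illustrative values, not the distances estimated in this study.

```python
from itertools import combinations


def upgma(names, distances):
    """Cluster taxa by UPGMA and return the grouping as nested tuples.

    `distances` maps frozenset({a, b}) to the distance between a and b.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa
    d = dict(distances)
    while len(sizes) > 1:
        # Pick the pair of current clusters with the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance from the new cluster to each remaining cluster is the
        # size-weighted average of the two merged clusters' distances.
        for other in sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical pairwise distances chosen only to illustrate the procedure.
toy_distances = {
    frozenset(("goat", "sheep")): 0.05,
    frozenset(("goat", "cattle")): 0.12,
    frozenset(("sheep", "cattle")): 0.14,
}
tree = upgma(["goat", "sheep", "cattle"], toy_distances)
print(tree)
```

With these toy values goat and sheep are joined first and cattle is attached last, which is the same topology described for Figure 4 and Figure 5.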
4. Discussion
MHC genes are the most polymorphic genes described in vertebrates, with polymorphisms occurring predominantly at residues involved in peptide binding (antigen binding sites) [21]. Variation at these sites may affect the antigen binding groove and antigenic-peptide binding ability, and hence peptide specificity [2]. The present findings indicate that the caprine DQA2 gene is highly polymorphic. Similar patterns were observed in the ovine and bovine species. Current concerns about food security highlight the importance of maintaining productive and disease-resistant livestock populations. Susceptibility to monogenic and complex diseases has a strong impact on the economic output of livestock farms. Disentangling the genetic factors that modulate disease progression might be useful for implementing selection schemes aimed at eradicating or decreasing the incidence of pathological conditions [22] [23]. The genetic analysis of production and disease-related traits in goats has rarely been done at a genome-wide scale. In this regard, the lack of well-established microsatellite panels covering the whole genome has hindered, to a significant extent, the implementation of genome scans aimed at detecting QTL [22]. Identifying single-gene markers associated with resistance to gastrointestinal parasites is difficult, because such resistance is considered to be polygenic, with hundreds to thousands of mutations responsible for the resistant phenotype [24] [25]. Nevertheless, research on genetic markers continues because, unlike phenotypic markers, they can be measured prior to birth, meaning that producers can make productivity decisions early [26]. A high level of diversity in MHC genes allows populations to survive despite exposure to rapidly evolving pathogens [27]. The MHC also plays a major role in determining whether transplanted tissue will be accepted as self or rejected as foreign. In addition, the study of the MHC can aid in the development and design of vaccines based on synthetic peptides comprising one or more T-cell epitopes of the pathogen. With footrot marker screening of the DQA2 gene in sheep, a highly resistant flock can potentially be selected within three to five breeding seasons [28].
Table 3. Functional analysis of coding nsSNPs of the DQA2 gene of cattle using SNAP.
I = isoleucine, L = leucine, V = valine, C = cysteine, A = alanine, G = glycine, P = proline, T = threonine, S = serine, Q = glutamine, N = asparagine, H = histidine, E = glutamic acid, D = aspartic acid, K = lysine, R = arginine, Y = tyrosine, M = methionine, F = phenylalanine, W = tryptophan. Only predictions with RI ≥ 0 and Expected Accuracy (EA) ≥ 50% are included.
Figure 2. Phylogenetic relationships of caprine, ovine, and bovine DQA2 nucleotide sequences.
Figure 3. Phylogenetic relationships of caprine, ovine, and bovine DQA2 amino acid sequences.
Figure 4. Phylogenetic relationships of caprine, ovine, and bovine DQA2 consensus nucleotide sequences using UPGMA.
Figure 5. Phylogenetic relationships of caprine, ovine, and bovine DQA2 consensus amino acid sequences using UPGMA.
The presence of numerous alleles at a particular MHC locus is evidence of the long-term evolutionary persistence of the locus. This is suggested by the frequency with which alleles in one species are more closely related to the alleles in a closely related species than to the other alleles in the same species [29]. This could be exploited in the development and design of vaccines as well as in drug production. The close similarity of a gene among ruminants may be ascribed to recent separation in the evolutionary process and/or to similar selection pressure experienced by the ruminants during evolution [30]. The genetic relationships among goats, sheep and cattle at DQA2 shown in the UPGMA phylogenetic trees were in accordance with the well-known evolutionary history of Bovidae subfamily speciation, although this was more pronounced with the amino acid sequences than with the nucleotide sequences. Here, goats are more closely related to sheep than to cattle, which is consistent with the findings of Zhou et al. [2].
In developing countries such as Nigeria, some quantitative and qualitative measurements have been used for selection and breeding against disease infestation, with little or no meaningful improvement in the stocks. This has necessitated a paradigm shift to computational genomics to facilitate the analysis and interpretation of the vast array of molecular data. Therefore, in future studies the beneficial SNPs identified here could be exploited in the genetic improvement of Nigerian native livestock for increased disease resistance, using information from both wet and dry laboratories.
5. Conclusion
The study revealed high genetic diversity at the DQA2 locus of goats, sheep and cattle. Some beneficial non-synonymous amino acid substitutions at putative peptide binding sites were also found at this locus. This knowledge would be relevant for further genotype-phenotype research as well as pharmacogenetic studies aimed at showing associations between caprine, ovine and bovine DQA2 allelic variation and the clinical progression of infectious diseases, especially in a developing country such as Nigeria. This becomes imperative considering the suggestion that pathogen-mediated selection (PMS) is the driving force maintaining diversity at MHC loci.