On the Different Ways to Handle the Trend of Disease Risk in Genetic Association Tests

Genetic association studies usually apply the simple chi-square (χ²) test to assess association between a single-nucleotide polymorphism (SNP) and a particular phenotype, treating genotype and phenotype as independent under the null hypothesis. The conventional χ²-test therefore does not account for the increased risk of an individual who carries more copies of the disease-responsible allele (that is, a particular genotype). Association tests should instead take this disease risk into account according to the mode of inheritance (additive, dominant, recessive). The purpose of this paper is to give a practical demonstration of the two possible methods for handling such order or trend in contingency tables of genetic association studies using SNP genotype data. One method pools the genotypes, and the other assigns scores to the individual genotypes, based on the disease risk implied by the inheritance pattern. The results show that the p-values obtained from the two methods are similar for the dominant and recessive models. Other important features of the methods were also extracted using the SNP genotype data for different inheritance patterns.


Introduction
Genome-wide association studies (GWAS) have greatly improved the understanding of disease aetiology. GWAS detect associations between genetic variants and disease traits using samples from a given population. Findings from GWAS data have opened new clinical insights and have led to novel bioinformatic advances in processing and interpreting GWAS summary data, which in turn enabled the detection of novel disease variants and gene loci [1]-[8].
Testing association is a crucial part of GWAS, because the positive genes are reported from this inferential procedure in a case-control study design. Contingency table tests are carried out for each individual single-nucleotide polymorphism (SNP), where the individual genotype counts are cross-classified with the phenotype (case-control) [3] [5] [6] [9] [10] [11] [12]. Generally, the simple chi-square (χ²) test with two degrees of freedom (d.f.) is applied to the contingency table for testing association [13].
However, this conventional χ²-test does not take into account the risk of developing the disease that comes with carrying a particular genotype. That is, rather than modelling the conditional probability of being affected given a specific genotype, the test assumes independence between genotype and phenotype. Because of this independence assumption, the simple χ²-test has no notion of an ordering or trend of genotypes based on disease risk. Yet people with different genotypes face different risks of developing a particular disease, because the number of copies of the risk allele differs between genotypes.
Defining a penetrance function is the way to model the relationship between SNPs and disease risk while considering such order [14] [15] [16]. This function gives the probability that a particular phenotype occurs for a given genotype [11] [12] [17] [18]. For each inheritance pattern (recessive, dominant, additive), the penetrance can be defined by a mathematical model [10] [11] [17].
There are two ways to include this trend or order of genotypes in the contingency table. One is to rearrange or pool the genotype counts of the table according to the alternative penetrance model [12] [19], and the other is to specify a score vector for each of the models.
The main objective of this paper is to demonstrate a practical application of the two different ways of considering the order or the trend of the genotypes in GWAS association tests for SNP genotype deoxyribonucleic acid (DNA) sequencing data.
The organization of this paper is as follows. Section 2 presents how the SNP genotype data can be organized in a contingency table. Different ways of testing association for both the ordered and unordered genotype data are outlined in Section 3. The description of the genotype data used in this analysis is provided in Section 4. Section 5 presents the results obtained from the analysis of SNP genotype data using the tests described in Section 3. Finally, some concluding remarks are given in Section 6.

Tabular Presentation of SNP Genotype Data: Contingency Table
The statistic in Equation (1) follows a χ²-distribution with 2 d.f., where i = 1, 2, 3, 4, 5, 6 indexes the cells of the table (Table 1). The test statistic (Equation (1)) is a summation over all six cells of the table, where O_i represents the observed cell counts for the six cells: n_1, n_2, n_3, n_4, n_5, n_6. Under the null hypothesis of no association, the test statistic compares, for example, the observed number of M/M genotypes in cases with the corresponding expected number, assuming that the relative allele (or genotype) frequencies are the same in the case and control groups [4] [20].
GWAS usually focus on associations between a single SNP and a trait, viz. major human diseases. The association study uses a 2 × 3 contingency table (Table 1).
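The 2 d.f. test on a 2 × 3 genotype table can be sketched as follows. This is a minimal illustration, assuming the usual Pearson form of the statistic (squared difference between observed and expected counts, divided by the expected count, summed over the six cells). The counts are hypothetical and are not the paper's data; the function names are illustrative only.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic and d.f. for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of genotype and phenotype
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi2_pvalue(stat, df):
    """Upper-tail p-value, using the closed forms for 1 and 2 d.f."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only supports 1 or 2 d.f.")

# Hypothetical counts: rows = (cases, controls), columns = (M/M, M/m, m/m)
counts = [[700, 600, 200], [800, 550, 150]]
stat, df = pearson_chi2(counts)
p_value = chi2_pvalue(stat, df)
print(f"chi-square = {stat:.3f}, d.f. = {df}, p = {p_value:.3g}")
```

The test treats the three genotype columns as unordered categories, so permuting the columns leaves the statistic unchanged.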

Testing for the Ordered Genotypes
Pooling the genotype counts: for dominant and recessive models
The concept of ordering genotypes by disease risk is not considered in the above χ²-test (Section 3.1). The disease risk of an individual is determined by the genotype or allele at a specific marker. The above χ²-test assumes independence between the binary phenotype and the individual genotypes. In practice, however, the risk of developing a particular disease is not the same for every person, because different people carry different genotypes, and the number of copies of the disease-responsible allele differs between those genotypes.
This order or trend of genotypes can be included in the association tests of contingency tables by specifying the disease penetrance with respect to a penetrance model. Rearrangement or pooling of the genotype counts is one way to consider this order in association studies [19].
The full genotype table for a general genetic model provides the unordered genotype counts for a single SNP (Table 1). We now demonstrate how to include the concept of penetrance by rearranging the counts of Table 1 under a genetic model specified a priori, where "m" is the disease-responsible allele.
If the hypothesis is that carrying any number of copies of allele "m" increases the disease risk, then the assumed model is dominant. This implies that a single copy of the disease-responsible allele is enough to increase the risk of an individual. Hence, the counts for the M/m and m/m genotypes are pooled in Table 1, producing a 2 × 2 table of genotype counts for the dominant model (Table 2). On the other hand, if the hypothesis is that only carrying two copies of the disease-responsible allele "m" increases the disease risk of an individual, then the assumed model is recessive. In this case the counts for the M/M and M/m genotypes are pooled in Table 1, producing a 2 × 2 table of genotype counts for the recessive model (Table 3). A χ²-test with one d.f. is used for these 2 × 2 tables (Table 2 and Table 3) of pooled case-control counts, which have the same form as the table used in the widely used 1 d.f. allelic association test.
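The pooling step described above can be sketched as follows, assuming the columns of the 2 × 3 table are ordered (M/M, M/m, m/m) and the rows are (cases, controls). The counts are hypothetical and the helper names are illustrative only. The 1 d.f. statistic is computed without a continuity correction, which is an assumption of this sketch; some software applies a correction to 2 × 2 tables by default.

```python
import math

def pool_dominant(table):
    """Dominant model: M/M versus (M/m + m/m)."""
    return [[hom_ref, het + hom_risk] for hom_ref, het, hom_risk in table]

def pool_recessive(table):
    """Recessive model: (M/M + M/m) versus m/m."""
    return [[hom_ref + het, hom_risk] for hom_ref, het, hom_risk in table]

def chi2_1df(table_2x2):
    """Pearson chi-square (no continuity correction) and p-value for a 2 x 2 table."""
    (a, b), (c, d) = table_2x2
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square upper tail, 1 d.f.
    return stat, p_value

# Hypothetical counts: rows = (cases, controls), columns = (M/M, M/m, m/m)
counts = [[700, 600, 200], [800, 550, 150]]

dominant_table = pool_dominant(counts)
recessive_table = pool_recessive(counts)
dominant_stat, dominant_p = chi2_1df(dominant_table)
recessive_stat, recessive_p = chi2_1df(recessive_table)

print("dominant :", dominant_table, f"p = {dominant_p:.3g}")
print("recessive:", recessive_table, f"p = {recessive_p:.3g}")
```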
The statistic in Equation (2)
The three common choices for the scoring system, stated with reference to the model definitions given above in terms of the penetrance parameter (γ), are:
Additive score: 0 0

Genotype Data Preparation
First, SNP genotype data for a single SNP were generated by computer simulation in the R programming language for 3,000 individuals. Individuals were then assigned at random to cases and controls with equal probabilities (0.5, 0.5) (Data 1). Then, a second data set was generated for each of the genes by simulation in R, for the same number of individuals and with the same equal-probability random assignment to cases and controls as for Data 1 (Data 2).
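A simulation of this kind can be sketched as follows. The paper's simulation was done in R and its settings are not given here, so this Python sketch rests on assumptions of its own: genotypes are drawn under Hardy-Weinberg proportions with a hypothetical risk-allele frequency of 0.3, and the seed and function names are illustrative. Only the sample size (3,000) and the equal case-control probabilities (0.5, 0.5) come from the text.

```python
import random

GENOTYPES = ("M/M", "M/m", "m/m")

def simulate_snp(n_individuals=3000, risk_allele_freq=0.3, seed=1):
    """Draw genotypes and case-control labels independently (null scenario)."""
    rng = random.Random(seed)
    q = risk_allele_freq
    # Assumed Hardy-Weinberg genotype probabilities for M/M, M/m, m/m
    weights = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]
    genotypes = rng.choices(GENOTYPES, weights=weights, k=n_individuals)
    # Equal probabilities for cases and controls, as described in the text
    status = rng.choices(("case", "control"), weights=[0.5, 0.5], k=n_individuals)
    return genotypes, status

def tabulate(genotypes, status):
    """Build the 2 x 3 table: rows = (cases, controls), columns = (M/M, M/m, m/m)."""
    table = {"case": [0, 0, 0], "control": [0, 0, 0]}
    for genotype, label in zip(genotypes, status):
        table[label][GENOTYPES.index(genotype)] += 1
    return [table["case"], table["control"]]

genotypes, status = simulate_snp()
table = tabulate(genotypes, status)
print(table)
```

Because phenotype labels are assigned independently of genotype, data generated this way reflect the null hypothesis of no association.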

For Unordered Genotypes
The 2 × 3 contingency table (Genotype Table 1) presents the genotype counts for a randomly selected gene with a single SNP, constructed from Data 1 (Section 4).

For Ordered Genotypes
Using pooling for dominant and recessive models
In order to include the ordering of the genotypes in association tests for the dominant and recessive models, the counts of Genotype Table 1 are rearranged (Genotype Table 2 and Genotype Table 3) according to the definition given in Section 3.2.
The 1 d.f. χ²-tests are applied for testing association in the above two tables (Genotype Table 2 and Genotype Table 3), and the resulting p-values are summarized in Table 5.
Setting score vectors of penetrance models
Assigning score vectors according to the mode of inheritance is the alternative way to include the trend in association testing (Genotype Table 4). The Cochran-Armitage trend test is applied here for testing genetic association (Section 3.2).
The p-values obtained from these tests are summarized in Table 5, along with the p-values from the above tests based on pooling genotypes.
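The scoring approach can be sketched with the Cochran-Armitage trend statistic, written here in its standard 1 d.f. form. The score vectors (0, 1, 2), (0, 1, 1) and (0, 0, 1) are the conventional choices for the additive, dominant and recessive models and are an assumption of this sketch, since the paper's own score definitions are not reproduced in this section. The counts are hypothetical and the function name is illustrative.

```python
import math

# Conventional score vectors for genotypes (M/M, M/m, m/m); assumed, not from the paper
SCORES = {
    "additive": (0, 1, 2),
    "dominant": (0, 1, 1),
    "recessive": (0, 0, 1),
}

def cochran_armitage(table, scores):
    """Cochran-Armitage trend statistic (1 d.f.) and p-value for a 2 x 3 table."""
    cases, controls = table
    col_totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    sum_xr = sum(x * r for x, r in zip(scores, cases))           # scored cases
    sum_xn = sum(x * t for x, t in zip(scores, col_totals))      # scored totals
    sum_x2n = sum(x * x * t for x, t in zip(scores, col_totals))
    numerator = n * (n * sum_xr - n_cases * sum_xn) ** 2
    denominator = n_cases * n_controls * (n * sum_x2n - sum_xn ** 2)
    stat = numerator / denominator
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square upper tail, 1 d.f.
    return stat, p_value

# Hypothetical counts: rows = (cases, controls), columns = (M/M, M/m, m/m)
counts = [[700, 600, 200], [800, 550, 150]]
results = {model: cochran_armitage(counts, scores) for model, scores in SCORES.items()}

for model, (stat, p_value) in results.items():
    print(f"{model:9s} statistic = {stat:.3f}, p = {p_value:.3g}")
```

Unlike pooling, the score vector keeps all three genotype columns in the table, which is what allows an additive model with three distinct risk levels to be tested.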

Features for the Genes with Multiple SNPs
To extract the gene-wise features of the two methods (pooling and scoring), the dominant and recessive models were applied to the genotype data (Data 2). Three genes with multiple SNPs were selected at random from Data 2 (Section 4).
The three selected genes, GENE1, GENE2 and GENE3, contain 3, 5 and 1,000 SNPs, respectively. Individual SNP tests were performed for each of the three genes using 1 d.f. tests under both of the above-mentioned methods, that is, pooling the genotypes and assigning score vectors to the genotypes, according to the definitions of the dominant and recessive models. The p-values for GENE1 and GENE2 are shown in Table 6. Gene-wise features of the methods were also investigated for GENE3, which has a large number of SNPs (1,000). The p-values from this investigation are presented in Figure 1 on the negative log-transformed scale (−log10(p)). The p-values obtained from the two methods are plotted in the two panels of Figure 1: Figure 1(a) for the dominant model and Figure 1(b) for the recessive model. The p-values from the two methods are similar for the dominant and recessive models, both for the genes with multiple SNPs (Table 6) and for the single SNP cases (Table 5).
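The per-SNP comparison on the −log10(p) scale can be sketched as follows for a handful of hypothetical SNP tables (not the paper's GENE1, GENE2 or GENE3 data). The sketch assumes the score vectors (0, 1, 1) and (0, 0, 1) for the dominant and recessive models and no continuity correction in the pooled test. Under those assumptions the pooled and scored statistics coincide algebraically, because a binary score vector is the same as collapsing the corresponding columns; small differences can appear in practice when software applies a continuity correction by default.

```python
import math

def pooled_stat(table, model):
    """1 d.f. Pearson statistic after pooling genotype columns for the model."""
    (r0, r1, r2), (s0, s1, s2) = table
    if model == "dominant":       # M/M versus (M/m + m/m)
        a, b, c, d = r0, r1 + r2, s0, s1 + s2
    else:                         # recessive: (M/M + M/m) versus m/m
        a, b, c, d = r0 + r1, r2, s0 + s1, s2
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def scored_stat(table, model):
    """Cochran-Armitage statistic with an assumed binary score vector."""
    scores = (0, 1, 1) if model == "dominant" else (0, 0, 1)
    cases, controls = table
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases = sum(cases)
    n = n_cases + sum(controls)
    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * t for x, t in zip(scores, totals))
    sum_x2n = sum(x * x * t for x, t in zip(scores, totals))
    return n * (n * sum_xr - n_cases * sum_xn) ** 2 / (
        n_cases * (n - n_cases) * (n * sum_x2n - sum_xn ** 2))

def neg_log10_p(stat):
    """-log10 of the 1 d.f. chi-square upper-tail p-value."""
    return -math.log10(math.erfc(math.sqrt(stat / 2.0)))

# Hypothetical per-SNP tables: rows = (cases, controls), columns = (M/M, M/m, m/m)
snp_tables = {
    "snp1": [[700, 600, 200], [800, 550, 150]],
    "snp2": [[760, 590, 150], [750, 595, 155]],
    "snp3": [[640, 640, 220], [720, 610, 170]],
}

rows = []
for snp, table in snp_tables.items():
    for model in ("dominant", "recessive"):
        pooled = neg_log10_p(pooled_stat(table, model))
        scored = neg_log10_p(scored_stat(table, model))
        rows.append((snp, model, pooled, scored))
        print(f"{snp} {model:9s} pooled = {pooled:.4f}  scored = {scored:.4f}")
```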

Conclusion
This paper presents the possible ways of considering genotype ordering in contingency table tests of genetic association by applying a trend test. Although this research used simulated genotype data, the methods can also be applied to real genotype data. Because the basic structure of the simulated and the real data is the same, the direction or pattern of the results would be expected to be the same in both cases. Both mathematical and practical demonstrations are provided here. Pooling the genotype counts and assigning scores to the genotypes of a contingency table are two possible ways to consider the trend or order of the genotypes.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.