Background: Although many disease-associated common variants have been discovered through genome-wide association studies, much of the genetic effect underlying complex diseases remains unexplained. Population-based association studies are vulnerable to population stratification. A possible solution is to use family-based tests. However, tests that estimate the genetic effect only from within-family variation, in order to avoid population stratification, ignore the useful genetic information in between-family variation and may lose power.

Methods: We have developed an adaptive weighted sum test for family-based association studies. The new test uses data-driven weights to combine two test statistics, and the weights measure the strength of population stratification. When population stratification is strong, the proposed test automatically puts more weight on the statistic derived from within-family variation, which maintains robustness against spurious positives. When the effect of population stratification is relatively weak, the proposed test automatically puts more weight on the statistic derived from both within-family and between-family variation, making use of both sources of genetic variation; at the same time, the degrees of freedom of the test are reduced and its power is increased.

Results: In our simulation studies, the proposed method achieves higher power in most scenarios, both under simulated linkage disequilibrium structures and with HapMap data from different genes under different population structures, while remaining robust to population stratification.

In past decades, many disease-associated common variants have been discovered through genome-wide association studies (GWASs). However, the majority of the genetic effects of complex diseases still cannot be explained. Recent advances in next-generation sequencing technologies provide new opportunities to study the genetic effects of low-frequency variants and rare variants. Many of those complex-trait rare-variant association studies are population based [

The test is proposed for family-based association studies of quantitative traits in either a candidate region study or a genome-wide scan. The data-driven weights are based on a measure of population stratification. Since population stratification and linkage disequilibrium (LD) bias the estimate, a permutation procedure is employed to find the p-value. Extensive simulation studies are carried out under various LD structures as well as with HapMap data from different genes under different population structures. In these simulation studies, we examine the Type I error rate and compare the power of the proposed method with that of other FBAT tests. Simulation results show that the proposed method has a correct Type I error rate and consistently achieves higher or similar power in all scenarios. In summary, we believe the adaptive weighted sum based FBAT is a potentially powerful method for family-based genetic studies of multiple markers, and it can also be used as an alternative tool for the detection of underlying causal genetic variants.

In family-based association studies, FBAT, a general unified approach, has been proposed to permit any type of genetic models, a general family design, different phenotypes and multiple markers [_{MM} [_{LC} [_{LC} are the estimates of genetic effect considering between-family variation. It is a biased estimator and is sensitive to population structure. We investigate the data-driven weights used in FBAT_{LC} and provide a new methodology to analyze the multiple correlated markers for family-based association studies.

We use FBAT_{WS} to denote the new test. It is based on a weighted sum of two association tests: one estimates the genetic effect from both within-family and between-family variation, and the other estimates it from within-family variation only. The weights are computed automatically from a measure of the strength of population stratification in the family data. If population stratification is strong, including between-family variation will produce false positives. In that case we decrease the weight of the test that uses both within-family and between-family variation and increase the weight of the other test to reduce the false positive rate. If population stratification is weak, including between-family variation increases the power of the test without producing many false positives, so we increase the weight of the test that uses both within-family and between-family variation. The proposed method can therefore capture more information from multiple loci in the family data while maintaining robustness to population stratification. Since population stratification and linkage disequilibrium bias the estimate, a permutation procedure is employed conditional on the traits, parental genotypes, and haplotypes.

The general idea of FBAT [

Following the standardized FBAT [

With a large number of families, FBAT statistic for the kth marker:

is approximately N(0,1).
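As an illustration of the standardized single-marker statistic above, the following is a minimal sketch for parent-offspring trios with an additive genotype coding. It uses the usual FBAT construction (a score built from the offspring genotype minus its expectation given the parents, divided by its standard deviation). The function name, the argument names and the restriction to one offspring per family with both parents genotyped are assumptions made for this sketch, not details taken from the text.

```python
import numpy as np

def fbat_single_marker(child, father, mother, trait):
    """Standardized FBAT-type statistic for one biallelic marker in trios.

    child, father, mother: minor-allele counts (0, 1 or 2), one entry per trio.
    trait: offspring trait values, assumed already centered by the caller.
    """
    child = np.asarray(child, dtype=float)
    father = np.asarray(father, dtype=float)
    mother = np.asarray(mother, dtype=float)
    trait = np.asarray(trait, dtype=float)

    # Under Mendelian transmission each parent passes one of its two alleles
    # at random, so the expected offspring count is half the parental total.
    expected = (father + mother) / 2.0
    # Only heterozygous parents add variance, 1/4 each.
    variance = ((father == 1).astype(float) + (mother == 1).astype(float)) / 4.0

    u = np.sum(trait * (child - expected))   # score
    v = np.sum(trait ** 2 * variance)        # null variance of the score
    if v == 0:
        return float("nan")                  # no informative family
    return float(u / np.sqrt(v))

# Four hypothetical trios, for illustration only.
z = fbat_single_marker(
    child=[1, 0, 2, 1],
    father=[1, 1, 1, 2],
    mother=[0, 0, 1, 0],
    trait=[0.5, -0.5, 1.0, -1.0],
)
```

Families in which neither parent is heterozygous contribute nothing to either the score or its variance, which is why such families are called non-informative.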

Another approach to the multi-marker family-based association testing is to linearly combine single-marker test statistics using data-driven weights (FBAT_{LC}) [

where

for the others (including offspring in the non-informative families and all parents).

Let _{LC} test statistic:

is approximately N(0,1), where

Although the data-driven weights are independent of Z under _{LC} will be highly dependent on the estimate of the optimal weights. In the conditional mean model, the weights are estimates of genetic effects using population data, which can be regarded as estimates of the genetic effects using between-family variation. It has been shown that this estimator is biased unless there is no population stratification. Intuitively, the more accurate the estimate is, the closer the weights are to the optimal weights, and the more power the test can gain. However, the test loses power if the effect of population stratification is substantial. Thus, we propose a new multi-marker test, FBAT_{WS}, which uses adaptive weights to combine two test statistics based on an estimate of the existing population stratification.
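The contrast between a multi-degree-of-freedom multi-marker test and a one-degree-of-freedom linear combination can be made concrete with a short sketch. It shows only the generic algebraic forms (a quadratic form in the score vector, and a weighted sum of scores divided by its standard deviation). The names `u`, `v` and `weights` are assumptions of this sketch, and the weights are taken as given rather than estimated as in the paper.

```python
import numpy as np

def multi_marker_chi2(u, v):
    """Quadratic-form statistic u' V^- u; its degrees of freedom equal rank(V)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    stat = float(u @ np.linalg.pinv(v) @ u)   # generalized inverse handles singular V
    df = int(np.linalg.matrix_rank(v))
    return stat, df

def linear_combination_z(u, v, weights):
    """One-degree-of-freedom statistic w'u / sqrt(w' V w).

    The weights must not depend on the within-family scores u under the null;
    here they are simply passed in by the caller.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(w @ u / np.sqrt(w @ v @ w))

# Two markers with hypothetical scores and an identity covariance matrix.
stat, df = multi_marker_chi2([1.0, 2.0], np.eye(2))
z_lc = linear_combination_z([1.0, 2.0], np.eye(2), weights=[1.0, 1.0])
```

The quadratic form spends one degree of freedom per independent marker, whereas the linear combination always has one, which is the source of its power advantage when the weights point roughly in the direction of the true effects.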

The strength of population stratification will be measured by

where

Under the null hypothesis: no genetic effect and no population stratification, _{WS} will automatically put more weight on the second term to maintain robustness against spurious positives. On the other hand, when the effect of population stratification is relatively weak, FBAT_{WS} will automatically put more weight on the first term to make use of both sources of genetic variation, between-family and within-family. In the latter case, the degrees of freedom of the test are reduced and its power is increased. Because the permutation procedure preserves the LD structure, FBAT_{WS} does not model the LD structure explicitly, which improves computational efficiency.
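The weighting behaviour described above can be pictured with a toy example. The sketch below is not the weight formula of FBAT_{WS}, which is not reproduced in this text; it only illustrates the qualitative idea that a larger stratification measure shifts weight from the first term (within-family plus between-family) to the second term (within-family only). The mapping from the stratification measure to a weight, and all names, are hypothetical.

```python
def adaptive_weighted_sum(t_both, t_within, strat_strength):
    """Toy convex combination of two test statistics.

    t_both:         statistic using within- and between-family variation (first term)
    t_within:       statistic using within-family variation only (second term)
    strat_strength: non-negative measure of population stratification
    """
    if strat_strength < 0:
        raise ValueError("strat_strength must be non-negative")
    # Hypothetical mapping to [0, 1): 0 when there is no stratification,
    # approaching 1 as stratification grows.
    w_within = strat_strength / (1.0 + strat_strength)
    return (1.0 - w_within) * t_both + w_within * t_within

no_strat = adaptive_weighted_sum(4.0, 1.0, strat_strength=0.0)       # all weight on t_both
strong_strat = adaptive_weighted_sum(4.0, 1.0, strat_strength=99.0)  # almost all on t_within
```

Whatever mapping is used, the null distribution of such a combined statistic is not standard, which is one reason a permutation procedure is used to obtain the p-value.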

The second term

covariance matrix

_{WS}. For each child with a fixed trait in any family, each parental haplotype is transmitted to the child with equal probability, so that, for any given set of parental haplotypes, there are four different permutations of the data. When the parental haplotypes are unknown, they must be inferred. Several methods are available for inferring haplotypes. For example, Thunder [
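A permutation of this kind can be sketched as follows, assuming phased parental haplotypes, one offspring per family and no recombination within the region. Each replicate redraws which of the two haplotypes each parent transmits, while traits and parental haplotypes stay fixed; because whole haplotypes are transmitted, the LD between markers is preserved. The function names, the two-sided comparison and the add-one p-value correction are choices made for this sketch.

```python
import numpy as np

def permute_offspring(father_haps, mother_haps, rng):
    """Draw one null offspring genotype matrix.

    father_haps, mother_haps: arrays of shape (n_families, 2, n_markers), 0/1 alleles.
    Returns allele counts of shape (n_families, n_markers).
    """
    father_haps = np.asarray(father_haps)
    mother_haps = np.asarray(mother_haps)
    n = father_haps.shape[0]
    pick_f = rng.integers(0, 2, size=n)   # which paternal haplotype is transmitted
    pick_m = rng.integers(0, 2, size=n)   # which maternal haplotype is transmitted
    rows = np.arange(n)
    return father_haps[rows, pick_f] + mother_haps[rows, pick_m]

def permutation_p_value(observed, statistic, father_haps, mother_haps,
                        n_perm=1000, seed=0):
    """Two-sided permutation p-value for a statistic of the offspring genotypes."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        null_stat = statistic(permute_offspring(father_haps, mother_haps, rng))
        if abs(null_stat) >= abs(observed):
            hits += 1
    return (hits + 1) / (n_perm + 1)      # add-one correction avoids p = 0
```

Here `statistic` stands for any function of the offspring genotype matrix, with the traits held fixed inside it (for example through a closure).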

In the simulation study, we apply the proposed test FBAT_{WS} to two sets of data. One is simulated under six scenarios of LD structure. The other is haplotype data downloaded for 170 unrelated samples of JPT + CHB (Japanese in Tokyo, Japan + Han Chinese in Beijing, China) from the HapMap3 Phased Haplotypes. We compare the power of the proposed test FBAT_{WS} with the following three FBAT tests: 1) the single-marker test with Bonferroni multiple testing adjustment FBAT_{B} the Bonferroni adjusted p-value _{MM} [_{LC} [

One goal of the simulation study is to examine whether the proposed multi-marker test is robust to the underlying LD structure. We consider six different LD structures and assume additive genetic effect. A target region with eight observed SNPs and an unobserved causative SNP in the middle is simulated. For each nuclear family, both parental haplotypes for nine correlated SNP markers are simulated on the basis of a multivariate normal distribution with LD structure

The quantitative phenotype of each individual is determined by:

where

k | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|

0.4 | Unif (0.3, 0.7) | 0.4t | Unif (0.3, 0.7) t |
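One common way to simulate correlated SNP haplotypes from a multivariate normal distribution, as in the setup described above, is to draw a latent Gaussian vector per haplotype and threshold it at the quantile matching the desired allele frequency. The sketch below takes that route. The correlation pattern (decaying with marker distance), the single allele frequency shared by all markers and every name are assumptions of this sketch and may differ from the six LD structures used in the paper.

```python
import numpy as np
from statistics import NormalDist

def decaying_corr(n_markers, rho):
    """Latent correlation matrix with entries rho ** |i - j| (assumed LD pattern)."""
    idx = np.arange(n_markers)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate_haplotypes(n_haps, corr, maf, seed=0):
    """Simulate 0/1 haplotypes by thresholding a latent multivariate normal.

    corr: latent correlation matrix between markers.
    maf:  minor-allele frequency, the same for every marker in this sketch.
    """
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    latent = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n_haps)
    threshold = NormalDist().inv_cdf(maf)   # P(latent < threshold) = maf
    # Note: correlation between the resulting 0/1 alleles is weaker than
    # the latent correlation.
    return (latent < threshold).astype(int)

# Nine markers: eight observed SNPs plus one unobserved SNP in the middle.
haps = simulate_haplotypes(n_haps=2000, corr=decaying_corr(9, 0.4), maf=0.3)
```

A parental genotype is then the sum of two simulated haplotypes, and the middle column can be dropped from the analysis data to play the role of the unobserved causal SNP.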

Next, our simulation study will be based on real LD structure. We download haplotype data from 170 unrelated samples of JPT + CHB (Japanese in Tokyo, Japan + Han Chinese in Beijing, China) in the HapMap3 Phased Haplotypes. We consider three genes CHI3L2 (in the region of 15.78 kb), CTLA4 (in the region of 10 kb) and IL21R (in the region of 47.69 kb), which have also been analyzed in other simulation studies [

Type I error rate for the case of six mimicked LD structures is shown in

Four FBAT tests are considered for power comparisons with six different LD structures. The unobserved causal SNP has an equal chance of being positively or negatively correlated with the observed SNPs in all scenarios. FBAT_{B} (B), FBAT_{MM} (MM), FBAT_{LC} (LC), and FBAT_{WS} (WS) are indicated by the blue dot-dashed line, the green dotted line, the red dashed line, and the black solid line, respectively.

In the first simulation study, the goal is to compare the performance of the proposed method with that of other FBAT methods. We fix the window size for each scenario and assume the samples come from the same population. The results show that FBAT_{WS} has consistently higher power in all cases, followed by FBAT_{LC}, FBAT_{MM} and FBAT_{B}. FBAT_{B} is the most conservative test in this study because the independence assumption is violated. The power of FBAT_{MM} is better because it takes the variance-covariance matrix into account. On the other hand, it suffers from relatively high degrees of freedom, especially when the region under consideration is large. The power of FBAT_{LC} is better still: it has only one degree of freedom, it uses the optimal weights to combine single-marker tests, and it thereby overcomes the degrees-of-freedom problem of FBAT_{MM}. In a genetic region with strong LD, we have no information about how the underlying causal marker is related to the observed SNPs. The optimal weights in FBAT_{LC} are biased estimates of genetic effects [_{LC} will lose some power. The power of FBAT_{WS} is improved because it not only uses the optimal weights to combine single-marker tests, like FBAT_{LC}, but also automatically adjusts the weights based on the estimates of the genetic effect from between-family and within-family variation.
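The conservativeness of FBAT_{B} noted above follows from how a Bonferroni adjustment works: the smallest single-marker p-value is multiplied by the number of markers, which is the right penalty only if the tests are independent. A minimal sketch, with a hypothetical function name:

```python
def bonferroni_region_p(p_values):
    """Region-level p-value: smallest single-marker p-value times the
    number of markers, capped at 1."""
    p_values = list(p_values)
    if not p_values:
        raise ValueError("at least one p-value is required")
    return min(1.0, len(p_values) * min(p_values))

# Four hypothetical single-marker p-values from one region.
p_region = bonferroni_region_p([0.01, 0.2, 0.5, 0.04])
```

When markers are in strong LD, the effective number of independent tests is smaller than the number of markers, so this penalty is too large and both the Type I error rate and the power fall below their nominal levels.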

Type I error rates for the simulated HapMap data on CHI3L2, IL21R, and CTLA4 are given in

Test | LD = L1 | LD = L2 | LD = L3 | LD = L4 | LD = L5 | LD = L6 |
---|---|---|---|---|---|---|
B | 0.047 | 0.036 | 0.051 | 0.042 | 0.052 | 0.039 |
MM | 0.047 | 0.045 | 0.068 | 0.054 | 0.057 | 0.050 |
LC | 0.050 | 0.057 | 0.058 | 0.045 | 0.055 | 0.047 |
WS | 0.052 | 0.052 | 0.059 | 0.038 | 0.052 | 0.048 |

B, MM, LC, WS indicate FBAT_{B}, FBAT_{MM}, FBAT_{LC}, FBAT_{WS}, respectively. L1, L2, L3, L4, L5, L6 indicate six scenarios of LD structure given in

significance. We also found that FBAT_{B} has a lower Type I error rate than the other tests because of the strong LD present in all three regions. The results of the power comparison in one population and two populations are shown in

Test | CHI3L2 | CTLA4 | IL21R | CHI3L2* | CTLA4* | IL21R* |
---|---|---|---|---|---|---|
B | 0.023 | 0.024 | 0.027 | 0.029 | 0.026 | 0.034 |
MM | 0.049 | 0.036 | 0.041 | 0.051 | 0.040 | 0.042 |
LC | 0.044 | 0.035 | 0.042 | 0.045 | 0.050 | 0.039 |
WS | 0.040 | 0.037 | 0.037 | 0.037 | 0.041 | 0.054 |

B, MM, LC, WS indicate FBAT_{B}, FBAT_{MM}, FBAT_{LC}, FBAT_{WS}, respectively.
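When reading empirical Type I error rates such as those tabulated above, it helps to know how far an estimate can drift from the nominal level by chance alone. That depends on the number of simulation replicates, which is not stated in this section. The sketch below computes an approximate 95% band around the nominal level from the binomial standard error; the replicate count of 1000 in the example is purely hypothetical.

```python
import math

def type1_error_band(n_replicates, alpha=0.05, z=1.96):
    """Approximate range in which an empirical Type I error rate should fall
    when the true rate equals alpha (normal approximation to the binomial)."""
    se = math.sqrt(alpha * (1.0 - alpha) / n_replicates)
    return max(0.0, alpha - z * se), alpha + z * se

# Hypothetical replicate count, for illustration only.
lower, upper = type1_error_band(n_replicates=1000)
```

Rates inside the band are compatible with a correctly sized test; rates well below it indicate a conservative test.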

Four FBAT tests are considered for power comparisons under different LD structures of three genes: CHI3L2 (in the region of 15.78 kb), CTLA4 (in the region of 10 kb) and IL21R (in the region of 47.69 kb). The unobserved causal SNP is randomly selected in all scenarios. FBAT_{B} (B), FBAT_{MM} (MM), FBAT_{LC} (LC), and FBAT_{WS} (WS) are denoted by the blue dot-dashed line, the green dotted line, the red dashed line, and the black solid line, respectively.

We first consider all samples from one population. The power of FBAT_{WS} is relatively high in most scenarios. For the gene CHI3L2, where SNPs are dense and highly correlated with each other, FBAT_{WS} is the most powerful test, followed by FBAT_{LC}, FBAT_{MM} and FBAT_{B}, when the heritability is relatively low. As heritability increases, the power of FBAT_{MM} becomes the highest, and FBAT_{WS} is second among all tests. This implies that FBAT_{WS} is more sensitive to genetic effects with low heritability, whereas FBAT_{MM} is well suited to genetic regions with strong LD and high heritability. For the gene CTLA4, where the number of markers is relatively small and the LD pattern is relatively weak, FBAT_{WS} is again the most powerful test, followed by FBAT_{LC}, FBAT_{B} and FBAT_{MM}. For the gene IL21R, where SNPs are sparse and the LD pattern is relatively weak, FBAT_{WS} is the most powerful test, followed by FBAT_{B}, FBAT_{LC}, and FBAT_{MM}. For genetic regions with weak LD, such as CTLA4 and IL21R, FBAT_{MM} loses power because of its degrees of freedom.

In all two-population scenarios, the results are similar: FBAT_{WS} is the most powerful test except for simulated data based on the gene CTLA4 with high heritability. In practice, most undiscovered genetic variants have low heritability. The power of the tests depends on the LD pattern. In general, FBAT_{WS} automatically adjusts the weights to combine the estimates of genetic effect from various sources of genetic variation and is therefore a powerful test for family-based association studies. It is robust to population stratification and to the underlying LD structure. Our simulation results demonstrate that FBAT_{WS} is a potentially powerful test among multi-marker tests.

We propose a novel multi-marker family-based association test that uses data-driven weights to automatically combine two statistics based on different sources of genetic variation. One statistic comes from estimating the genetic effects from both within-family and between-family variation and behaves more like a population-based statistic. The other comes from estimating the genetic effect from within-family variation only and is a family-based statistic. The data-driven weights are computed automatically and measure the strength of the population stratification present in the family data. The advantage of family-based studies is their ability to avoid spurious positives caused by population stratification. For the FBAT test, we regard the offspring genotypes as random variables given the trait and the parental genotypes or haplotypes. On the other hand, FBAT tests do not use the genetic information from between-family variation, since it can introduce population stratification. Using an adaptive weighted sum to combine this information efficiently into the test statistic can improve the power of the test.

The proposed method tries to use as much of the available information on genetic variation as possible for family-based association studies. Data-driven weights make the test robust to population stratification and to linkage disequilibrium between multiple markers. Since population stratification and linkage disequilibrium bias the estimate, a permutation procedure is employed and described for this situation. The new test is a potentially powerful method for family-based genetic studies of multiple markers because it considers genetic variation from different sources, and it can also provide an alternative tool for detecting underlying causal genetic variants. In our simulation studies using mimicked LD patterns and three genes from HapMap data, the results show that the proposed test achieves higher power in most scenarios than the single-marker test with Bonferroni correction, the multi-marker test similar to the Hotelling

Jiang, R.F., Dong, J.P. and Dai, Y.L. (2016) An Adaptive Weighted Sum Test for Family-Based Multi-Marker Association Studies. Open Journal of Genetics, 6, 61-73. http://dx.doi.org/10.4236/ojgen.2016.64007

LD: Linkage disequilibrium,

GWASs: Genome-wide association studies,

FBAT: Family-based association test,

GEE: Generalized estimating equation,

FBAT dosage: Imputing allele dosages in FBAT,

FBAT_{MM}: Multi-marker family-based association test,

FBAT_{LC}: Linearly combined single-marker test statistics,

FBAT_{WS}: Proposed test in this article,

FBAT_{B}: Single-marker test with Bonferroni multiple testing adjustment,

SNP: Single-nucleotide polymorphism.
