Implicit Hypotheses Are Hidden Power Droppers in Family-Based Association Studies of Secondary Outcomes

Family-based tests of association between a genetic marker and a disease are a common design for dissecting the genetic architecture of complex traits. The FBAT software is one of the most popular tools for such studies. However, researchers are often also interested in the genetic contribution to a more specific manifestation of the phenotype (e.g. severe vs. non-severe form), known as a secondary outcome. Here, we demonstrate the limited power of the classical formulation of the FBAT statistic to detect genetic variants that influence a secondary outcome, in particular when these variants also affect the onset of the disease, the primary outcome. We prove that this loss of power is driven by an implicit hypothesis, and we propose a statistic derived from the original FBAT statistic that is free from this implicit hypothesis. Finally, we demonstrate analytically that our new statistic is robust and more powerful than FBAT for detecting association between a genetic variant and a secondary outcome.


Introduction
The aim of genetic epidemiological studies is to identify the genetic factors influencing the development of common diseases. Genetic epidemiology combines classical epidemiological data (assessment of risk factors known to affect the expression of the phenotype studied) and genetic information (familial relationships, typing of genetic markers) and offers a large range of tools to address this question, the choice among them depending on the nature of the sample and the available budget. Over the past ten years, however, our understanding of the pattern of genetic variation at the genome scale, coupled with an unprecedented decrease in the cost of measuring this variation, has put (genome-wide) association studies at the forefront. Although the vast majority of genetic association study designs are derived from the usual retrospective case-control epidemiological studies (i.e. studies that compare the distribution of allelic/genotypic frequencies between a group of cases and a group of controls), one design is quite specific to the field of genetic epidemiology and relies on the collection and analysis of families. Such family-based tests of association between a genetic item (allele, genotype...) and the disease under study offer interesting features as compared to case-control designs (Laird and Lange [1]; Chen and Abecasis [2]). They are robust against population stratification, allow the inference of both haplotype phase and missing genotypes (Chen and Abecasis [2]; Burdick et al. [3]), and can identify peculiar allelic segregation, for example due to imprinting effects (Vincent et al. [4]).
The Transmission Disequilibrium Test (TDT) emerged as the first popular family-based test of association (Spielman et al. [5]). It tests whether the transmission of a given allele from a heterozygote parent to an affected child differs from what is expected in the absence of any association between the genetic marker and the disease under study. The null hypothesis is written as p = 0.5, where p is the probability that a heterozygote parent transmits the given allele to an affected child. Whereas the TDT could only analyze binary traits in samples of pure trios (i.e. two parents and a single affected child), Laird et al. [6] proposed a more comprehensive approach designed to handle binary, quantitative or censored traits, multiple genetic models (e.g. additive, dominant or recessive) and more complex family structures (e.g. families with multiple children). This approach uses a natural measure of association between two variables, i.e. the covariance between phenotypes and genotypes, and relies on a score test. It has been implemented in the popular Family Based Association Test software (FBAT, Laird et al. [6]; Rabinowitz and Laird [7]; Lange and Laird [8]). In this context of familial samples, FBAT has proved very efficient in identifying alleles associated with many phenotypes, whether binary or quantitative (e.g. Mira et al. [9]; Cobat et al. [10]).
Although FBAT was developed to handle a large variety of tests according to the nature of both the traits and their genetic determinants, it is intrinsically designed to test primary outcomes (e.g. affected vs. unaffected), as its null hypothesis is based on the same underlying principle as the TDT (i.e. p = 0.5). However, in many cases researchers are interested in the genetic contribution to a more specific phenotype (e.g. severe vs. non-severe form), here denoted as a secondary outcome. Here, we demonstrate the limited power of the classical formulation of the FBAT statistic to detect genetic variants that influence a secondary outcome, in particular when these variants also affect the onset of the disease, the primary outcome. We prove that this loss of power is driven by an implicit hypothesis, and we propose a statistic derived from the original FBAT statistic that is free from this implicit hypothesis. Finally, we demonstrate analytically that our new statistic is robust and more powerful than FBAT for detecting association between a genetic variant and a secondary outcome.

Original FBAT Statistic
For the sake of simplicity and without major loss of generality, we consider the analysis of a diallelic marker in a sample of trios with no missing parental data under an additive genetic model. We use the same notation as in the original FBAT paper (Laird et al. [6]), in which $X_i$ represents the genotype at the locus being tested and $T_i$ the phenotype of the child of family $i$. The expectation of $X_i$ is calculated conditional on the parental genotypes under the null hypothesis of no association.

$$\mathrm{FBAT} = \frac{\left(S - E(S)\right)^2}{V(S)}$$

$X_i$ is the number of copies of the allele under study (0, 1 or 2). The most common way to code the phenotype is $T_i = 1$ for affected individuals and $T_i = 0$ for unaffected ones. In a sample with no missing parental data, unaffected individuals do not contribute to the statistic; however, in the presence of missing parental data, such unaffected individuals indirectly affect the statistic, as they can be used to infer missing parental genotypes under some conditions (Knapp [11]). $S$ is generally written as:

$$S = \sum_i T_i X_i, \qquad E(S) = \sum_i T_i\, E(X_i \mid P_i), \qquad V(S) = \sum_i T_i^2\, \mathrm{Var}(X_i \mid P_i),$$

where $P_i$ denotes the parental genotypes of family $i$. The null hypothesis of no association between the phenotype and a given allele is the random transmission of this allele from heterozygote parents to (affected) children. Denoting by $p$ the transmission probability of this allele, the null ($H_0$) and alternative ($H_1$) hypotheses can be written as:

$$H_0: p = 0.5 \qquad \text{vs.} \qquad H_1: p \neq 0.5.$$

The tested allele will be considered "at risk" for the disease if $p > 0.5$ and "protective" if $p < 0.5$.
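As an illustration, the statistic for complete trios under an additive model can be sketched in a few lines of Python. This is a minimal sketch and not the FBAT software: the function names and the tuple layout (father, mother and child genotypes coded as the number of copies of the tested allele, followed by the phenotype code of the child) are assumptions made for this example, and the trios are invented. It relies only on Mendelian transmission: a parent with genotype g transmits the tested allele with probability g/2, so the expected child genotype is the mean of the parental genotypes and each heterozygote parent adds 1/4 to the variance.

```python
import math

def fbat_trios(trios):
    """Return (S - E(S), V(S), chi-square) for complete trios, additive model.

    Each trio is (father, mother, child, t): genotypes are counts (0, 1, 2)
    of the tested allele and t is the phenotype code of the child.
    The layout is an illustrative assumption, not FBAT's input format.
    """
    u = 0.0  # S - E(S)
    v = 0.0  # V(S) under the null, conditional on parental genotypes
    for father, mother, child, t in trios:
        expected = (father + mother) / 2                # E(X | parents)
        variance = ((father == 1) + (mother == 1)) / 4  # 1/4 per heterozygote parent
        u += t * (child - expected)
        v += t ** 2 * variance
    if v == 0:
        raise ValueError("no informative trio (no heterozygote parent with t != 0)")
    return u, v, u * u / v

def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

# Hypothetical data: five affected children (t = 1) and one unaffected child (t = 0).
trios = [(1, 0, 1, 1)] * 3 + [(1, 1, 2, 1), (1, 2, 1, 1), (1, 1, 0, 0)]
u, v, stat = fbat_trios(trios)
print(f"S - E = {u}, V = {v}, FBAT = {stat:.3f}, p = {chi2_1df_pvalue(stat):.3f}")
```

The unaffected child in the last trio has t = 0 and therefore adds nothing to either the numerator or the variance, which matches the remark that unaffected individuals do not contribute when parental data are complete.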

FBAT Statistic to Test Secondary Outcomes
It is common practice to study a "primary" phenotype (e.g. disease yes/no) but, as stated in the introduction, researchers are often interested in the genetic contribution to a "secondary" phenotype (e.g. severe vs. non-severe form of the disease). At first glance, FBAT could be used to test this hypothesis by computing the original statistic independently in the two modalities of the secondary outcome (e.g. severe and non-severe). Denoting by $D_1$ and $D_2$ the two modalities of the secondary outcome, and by $p_1$ and $p_2$ the transmission probabilities of the tested allele to $D_1$ and $D_2$ children, respectively, we have, for $k = 1, 2$ (see note 1):

$$H_0: p_k = 0.5 \qquad \text{vs.} \qquad H_1: p_k \neq 0.5.$$

However, because of the bivariate nature of the phenotype under study (i.e. disease AND severe form, or disease AND non-severe form), rejection of the null hypothesis cannot distinguish between alleles associated with the disease per se (i.e. independently of its severity) and alleles specifically associated with the severity of the disease. FBAT offers no immediate solution to study such secondary outcomes, i.e. to distinguish between alleles affecting the primary outcome (e.g. disease per se) and those affecting the secondary outcome (e.g. severe vs. non-severe). Below we propose two new tests, denoted FBAT het and FBAT het free, that can be used to directly assess the association between a marker allele and a secondary outcome.

Note 1: More precisely, in the general case, the null hypothesis of FBAT is "no association OR no linkage" and therefore the alternative hypothesis is "association AND linkage". $H_0$ can be written as a composite hypothesis: "no association AND no linkage" ∪ "no association AND linkage" ∪ "association AND no linkage". In the particular case of a sample limited to trios, there is no linkage information, and the hypotheses reduce to $H_0$: "no association" and $H_1$: "association".
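The limitation of the stratified analysis can be seen on a small numerical sketch. The counts below are hypothetical and the function name is an assumption of this example; for affected children coded T = 1 in complete trios, the FBAT statistic of one group reduces to a comparison of the number of transmissions of the tested allele with half the number of heterozygote parents.

```python
def transmission_chi2(n_a, n):
    """FBAT-type statistic for one group of affected children coded T = 1.

    n   : number of heterozygote parents in the group
    n_a : number of these parents who transmitted the tested allele
    Under H0 (p = 0.5), S - E = n_a - n/2 and V = n/4.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return (n_a - n / 2) ** 2 / (n / 4)

CRITICAL_5PCT = 3.841  # 5% critical value of a chi-square with 1 degree of freedom

# Hypothetical counts (transmissions of the tested allele, heterozygote parents).
groups = {"D1 (e.g. severe)": (70, 100), "D2 (e.g. non-severe)": (90, 150)}
for label, (n_a, n) in groups.items():
    stat = transmission_chi2(n_a, n)
    print(f"{label}: p_hat = {n_a / n:.2f}, chi2 = {stat:.1f}, "
          f"reject p = 0.5: {stat > CRITICAL_5PCT}")
```

With these invented counts both groups reject p = 0.5, but neither test says whether the two transmission rates differ from each other, which is the question raised by a secondary outcome.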

The FBAT het Test
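A first straightforward idea is to perform a homogeneity test of the allelic transmission rate between the two subgroups $D_1$ and $D_2$.
Let FBAT het denote this homogeneity test: FBAT het is the FBAT statistic computed with the phenotypes of $D_1$ and $D_2$ children recoded with weights of opposite sign, so that $S - E(S)$ contrasts the transmissions observed in the two groups (the explicit coding is derived in Appendix A). The two hypotheses can then be written as:

$$H_0: p_1 = p_2 = 0.5 \qquad \text{vs.} \qquad H_1: p_1 \neq p_2.$$

Note that under an additive genetic model and in a sample of trios with no missing parental data, the coding is expressed in terms of $n_1$ and $n_2$, the numbers of heterozygote parents of children with phenotype $D_1$ and $D_2$, respectively (see Appendix A and note 2).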
Note 2: FBAT het can be implemented in FBAT by using the offset option "-o" while coding $T_1 = 1$ and $T_2 = 0$: the software then calculates, for each allele, the offset $\mu$ that minimizes the variance of the statistic and uses it to transform the phenotypic values into $T_1 = 1 - \mu$ and $T_2 = -\mu$. We show in Appendix B that using the offset option is equivalent to the coding used to define FBAT het. Here, unaffected individuals should be coded as missing rather than as 0, so that controls do not interfere in the calculation of the statistic. The FBAT software can be downloaded from: http://www.biostat.harvard.edu/fbat/fbat.htm.
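The effect of the offset can be checked numerically in the simplified setting of complete trios. The sketch below is not a reimplementation of the "-o" option: it only assumes, as the note states, that the offset minimizes the variance of the statistic, and it compares the result with an inverse-count coding ($T_1 = 1/n_1$, $T_2 = -1/n_2$). That coding is an assumption of this example, chosen to match the inverse-variance weighting mentioned in the Discussion; the function names and counts are hypothetical.

```python
def fbat_het(n_a1, n1, n_a2, n2, t1, t2):
    """FBAT-type statistic with phenotype codes t1 (group D1) and t2 (group D2).

    n1, n2     : heterozygote parents of D1 and D2 children
    n_a1, n_a2 : transmissions of the tested allele in each group
    The null expectation is 1/2 per transmission, the variance 1/4.
    """
    u = t1 * (n_a1 - n1 / 2) + t2 * (n_a2 - n2 / 2)
    v = (t1 ** 2 * n1 + t2 ** 2 * n2) / 4
    return u * u / v

def variance_minimizing_offset(n1, n2):
    """Offset mu minimizing V(mu) = ((1 - mu)^2 * n1 + mu^2 * n2) / 4."""
    return n1 / (n1 + n2)

# Hypothetical counts.
n_a1, n1, n_a2, n2 = 70, 100, 90, 150
mu = variance_minimizing_offset(n1, n2)
with_offset = fbat_het(n_a1, n1, n_a2, n2, 1 - mu, -mu)
inverse_count = fbat_het(n_a1, n1, n_a2, n2, 1 / n1, -1 / n2)
print(f"mu = {mu:.2f}, offset coding: {with_offset:.4f}, "
      f"inverse-count coding: {inverse_count:.4f}")
```

Under these assumptions the two codings are proportional to each other, so they yield the same statistic, which equals the squared difference of the two observed transmission rates divided by 0.25 (1/n1 + 1/n2).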

The FBAT het free Test
A somewhat hidden and underappreciated constraint of FBAT het is that the null hypothesis forces the transmission probabilities in both groups to be 0.5. Although valid and likely efficient in many practical situations, this constraint can dramatically reduce the power of the test in the study of a secondary outcome. A simple example is an allele for which carrying one copy is sufficient to develop the disease per se, whereas carrying two copies is associated with developing a severe form of the disease.
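We propose a new statistic, denoted FBAT het free, that relaxes this 0.5 constraint. Consider a diallelic locus (alleles A and a) and denote by $N$ the total number of alleles transmitted from heterozygote parents in the whole sample and by $n_A$ the number of alleles A among them, so that $n_A/N$ is the mean transmission of allele A. Under the null hypothesis, the transmission probability of allele A from a heterozygote parent is $1/2$ in FBAT and FBAT het, whereas it is $n_A/N$ in FBAT het free (Figure 1). Note that for all three statistics, the expectation and variance of a trio including two heterozygote parents are twice those of a trio with only one heterozygote parent. Up to the phenotype code, each Aa heterozygote parent transmitting allele A contributes $1 - 1/2$ to $S - E$ in FBAT or FBAT het and $1 - n_A/N$ in FBAT het free. Symmetrically, each Aa heterozygote parent transmitting allele a contributes $-1/2$ and $-n_A/N$ to $S - E$, and $1/4$ and $\frac{n_A}{N}\left(1 - \frac{n_A}{N}\right)$ to $V$, in FBAT or FBAT het and in FBAT het free, respectively. The statistic is then computed as $(S - E)^2 / V$ with these contributions. It is shown in Appendix C that FBAT het free is a Pearson's chi-squared test. In summary, the hypotheses of the FBAT het free test can be written as:

$$H_0: p_1 = p_2 \qquad \text{vs.} \qquad H_1: p_1 \neq p_2.$$

As opposed to FBAT and FBAT het, the implicit 0.5 constraint has disappeared.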
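The stated equivalence with a Pearson's chi-squared test can be checked numerically on a 2 x 2 table of transmissions. The sketch below assumes complete trios, the inverse-count phenotype coding ($1/n_1$ for $D_1$, $-1/n_2$ for $D_2$) and a per-transmission variance equal to the pooled rate times its complement; these choices, the function names and the counts are assumptions of this example rather than formulas taken from the appendices.

```python
def fbat_het_free(n_a1, n1, n_a2, n2):
    """Homogeneity statistic using the pooled transmission rate n_A / N as null value."""
    p_bar = (n_a1 + n_a2) / (n1 + n2)  # mean transmission of allele A
    if not 0 < p_bar < 1:
        raise ValueError("pooled transmission rate must lie strictly between 0 and 1")
    diff = n_a1 / n1 - n_a2 / n2       # S - E with codes 1/n1 and -1/n2
    v = p_bar * (1 - p_bar) * (1 / n1 + 1 / n2)
    return diff ** 2 / v

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: rows are D1 and D2, columns are allele A / allele a transmitted.
n_a1, n1, n_a2, n2 = 70, 100, 90, 150
free = fbat_het_free(n_a1, n1, n_a2, n2)
pearson = pearson_chi2_2x2(n_a1, n1 - n_a1, n_a2, n2 - n_a2)
print(f"FBAT het free = {free:.4f}, Pearson chi-square = {pearson:.4f}")
```

In this sketch the two numbers coincide, and the only difference from the 0.5-constrained version is the variance term: 0.25 is replaced by the pooled rate times its complement, which is never larger than 0.25.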

Comparison of FBAT het and FBAT het free
To illustrate the magnitude of the difference in power between FBAT het and FBAT het free, we could have run large simulation studies. However, we show analytically in Appendix D that FBAT het = ρ FBAT het free, where ρ depends only on $n_A/N$, the mean transmission of allele A among all affected individuals (Figure 2).
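To give a feel for the size of the effect, the ratio ρ between the two statistics can be tabulated. The closed form used below, ρ = 4 (n_A/N)(1 − n_A/N), is an assumption of this sketch: it follows from replacing the null variance 1/4 by (n_A/N)(1 − n_A/N) in complete trios, and it is not copied from Appendix D. The function name and the grid of values are illustrative.

```python
def rho(mean_transmission):
    """Assumed ratio FBAT het / FBAT het free for a pooled transmission rate n_A / N."""
    if not 0 < mean_transmission < 1:
        raise ValueError("mean transmission must lie strictly between 0 and 1")
    return 4 * mean_transmission * (1 - mean_transmission)

CRITICAL_5PCT = 3.841  # 5% critical value of a chi-square with 1 degree of freedom

for p_bar in (0.50, 0.55, 0.60, 0.70, 0.80):
    r = rho(p_bar)
    # Value FBAT het free must reach before FBAT het itself crosses the 5% threshold.
    needed = CRITICAL_5PCT / r
    print(f"n_A/N = {p_bar:.2f}  rho = {r:.3f}  FBAT het free needed = {needed:.2f}")
```

Under this assumption ρ equals 1 only when the mean transmission is exactly 0.5 and decreases as the allele becomes more strongly associated with the primary outcome, so FBAT het requires a larger heterogeneity signal than FBAT het free to reach the same significance threshold.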

Discussion
Family-based association studies have gained popularity for dissecting the genetic architecture of complex traits, and FBAT is likely the most popular tool to perform such studies. We have shown that, at first glance, it can be conveniently used to test for secondary outcomes, e.g. genetic heterogeneity between severe and non-severe forms of a disease. As an example, in a sample of trios, one can weight each "sub-phenotype" (severe and non-severe) by the inverse of the variance of each statistic. We called this test FBAT het, for which the null and alternative hypotheses are $H_0: p_1 = p_2 = 0.5$ and $H_1: p_1 \neq p_2$. However, in this test, the transmission probabilities under the null hypothesis are fixed to 0.5 in both groups. This may not be optimal in the context of secondary outcomes when the transmission of the tested allele has already been found to differ significantly from 0.5 with respect to the primary outcome. We show that it is possible to relax this constraint by modifying the expectation in the FBAT het statistic so that the test is defined by $H_0: p_1 = p_2$ and $H_1: p_1 \neq p_2$, which are the classical hypotheses in the vast majority of homogeneity tests. This new test, FBAT het free, is proven to be equivalent to a classical test for homogeneity. FBAT het free is the more powerful of the two tests when the mean transmission to affected children ($D_1 + D_2$, primary outcome) is not 0.5. Stated differently, whenever an allele is found associated with the disease per se, FBAT het free will be the more powerful test to detect heterogeneity between the transmission rates of this allele across the modalities of the secondary outcome.
For the sake of simplicity, we have derived our main statistic, FBAT het free, in the context of the analysis of a diallelic marker under an additive genetic model in a sample of trios with no missing parental data. However, generalization to other genetic models and more complex family structures should be possible by using, for a given marker, the estimated mean transmission of the allele under study among affected individuals in place of the current value of 0.5, which prevents testing $p_1 = p_2$. By doing so, one will be able to take advantage of all the features of FBAT, ranging from the analysis of all kinds of phenotypes to the simultaneous testing of several alleles, either in a classic multivariate way or taking the phase into account through haplotypic analysis.

With the notations used in the main text, for FBAT het, ..., thus testing for the secondary outcome.

Figure 1. Contribution of a trio to FBAT, FBAT het and FBAT het free according to the number of heterozygote parents. In a trio with one (left panel) or two (right panel) heterozygote parents, the expected genotypes aa, Aa and AA of the child vary according to the statistic used. In FBAT and FBAT het, the transmission probability of an allele A from a heterozygote parent is $1/2$, whereas it is $n_A/N$ for FBAT het free (with $N$ denoting the total number of alleles transmitted from heterozygote parents in the whole sample, $n_A$ the number of alleles A transmitted, and $n_A/N$ the mean transmission of allele A).

Figure 2. Distribution of ρ according to $n_A/N$.

Figure 3. Power of FBAT het vs. FBAT het free according to the mean transmission rate of the tested allele among the affected children.

... and constructive criticism. JG is funded by the Fondation pour la Recherche Médicale, and QV by the Institut Imagine. This work was supported by the Programme Blanc de l'Agence Nationale de la Recherche.
That FBAT het = ρ FBAT het free
Whereas in the above-mentioned FBAT and FBAT het tests the expected transmission of the allele of interest under the null hypothesis of no association is 0.5, in FBAT het free it is $n_A/N$. We can calculate $S$, $E$ and $V$ for FBAT, FBAT het and FBAT het free. The contribution to $S - E$ of each transmission of an allele A from any