Multivariate Asthma Phenotypes in Adults: The Quebec City Case-Control Asthma Cohort

Background and Objectives: Asthma is a heterogeneous disease in which patient severity can be classified according to various models based on numerous variables. Large collections of well-phenotyped subjects are needed to find distinct clusters of patients for personalized medicine and future genetic studies. The objective of this study is to describe the collection of the Quebec City Case-Control Asthma Cohort and to identify homogeneous subgroups of asthma patients based on clinical characteristics. Methods: This cohort is part of an ongoing project initiated in 2007 to elucidate the genetic basis of asthma. All subjects are randomly recruited at the same site following advertisements. Subjects are unrelated French Canadian white adults 18 years of age or older. Each participant underwent spirometry, a methacholine challenge, and allergy skin-prick tests. Blood was collected for DNA, cell counts and total serum IgE measurements. So far, 982 subjects have been recruited and classified as cases (n = 566) or controls (n = 416). We performed factor and cluster analyses on collected phenotypes from this set to identify subgroups of phenotypically similar asthmatic patients. Results: Factor analysis with 13 variables led to the selection of five factors: lung function, number of allergens, blood eosinophil percentage, smoking status and age. K-means cluster analysis on the reduced dataset produced four significantly different groups with the following characteristics: smoking history, low atopy and low lung function, high atopy, and young non-smoking with average atopy. Conclusions: The Quebec City Case-Control Asthma Cohort is a new resource for local and collaborative clinical and genetic research on asthma. This new collection reveals distinct multivariate phenotypes of adult asthma that are likely to be important for future genetic studies and targeted therapies.


Introduction
Large collections of participants with and without asthma are the essence of clinical and genetic research in asthma. Increasingly large-scale collaborative studies are performed to identify the genetic architecture of asthma [1,2]. Despite these major efforts, only a fraction of the heritable components of asthma has been identified and much remains to be discovered. Pooling resources, as accomplished by the European-based GABRIEL [1] and the US-based EVE [2] consortia, is essential to elucidate the genetics of asthma. However, asthma is a heterogeneous disease [3]. Pooling resources to increase sample size is required, but subgrouping phenotypes is likely to be equally important for understanding the molecular basis of asthma.
Accurate description of asthma phenotypes is an ongoing challenge [3-6]. Cluster analysis, which aggregates phenotypes without a priori assumptions, has become a preferred approach to define asthma [7]. Cluster analysis can offer clinicians and researchers better insight into the multiple phenotypes present in asthma patients, for targeted therapies and for future genetic studies in more homogeneous subgroups of patients. So far, heterogeneity of the disease has led to various subgroupings of asthma phenotypes in adults [5,6,8,9] and children [10,11]. Distinct adult airway limitation phenotypes were described [12], but adult asthma phenotypes still lack consensus for both severe and mild to moderate asthma.
Here we describe an ongoing effort to build a new case-control cohort to study the genetics of asthma in Quebec City. The main goal of the Quebec City Case-Control Asthma Cohort is to drive our local genetic research program on asthma, but also to serve a wide range of purposes, from a replication cohort for others' hypothesis-driven research to participation in large-scale, genome-wide collaborative efforts, nationally and internationally, that aim to improve our molecular understanding of asthma. In this report, we focus on the phenotypes of participants enrolled in the Quebec City Case-Control Asthma Cohort. We aim to describe the collection and clinical characteristics of participants and to perform cluster analyses with this cohort of mild to moderate adult asthma. In addition, we want to establish subgrouping phenotypes of asthma that will be used in future clinical and genetic studies using the Quebec City Case-Control Asthma Cohort.

Study Subjects and Clinical Data
Asthmatic and non-asthmatic unrelated French Canadian (white) subjects 18 years and older are recruited through local media advertisements or following a visit to the respiratory clinic for participation in other research projects. Patients with chronic obstructive pulmonary disease, systemic inflammatory diseases, or a body mass index (BMI) above 40 kg/m² are excluded. Personal health information, medication use, and clinical test measurements are collected in a local electronic database. A blood sample is collected and stored at the Institut universitaire de cardiologie et de pneumologie de Québec (IUCPQ) site of the Respiratory Health Network Tissue Bank of the Fonds de recherche du Québec-Santé (www.tissuebank.ca). The study protocol was approved by the Research Ethics Board (REB) of the IUCPQ. All participating subjects signed an informed consent form approved by the REB. Subjects are de-identified using a code number for confidentiality. Access to data is protected using the data management structure approved by the REB.

Spirometry
All patients undergo pulmonary function testing using a spirometer (Medi-Soft Exp'air impression, Roxon meditech ltd, Montreal). Spirometry is carried out according to the American Thoracic Society guidelines [13,14]. Instructions to perform the forced expiratory maneuver are given to patients by an experienced research nurse, and three reproducible measurements of forced expiratory volume in 1 s (FEV1) and forced vital capacity (FVC) are obtained. Predicted lung function values are obtained from Knudson et al. [15].

Methacholine Challenge
Airway responsiveness to methacholine is measured using the 2-min tidal breathing method [13]. Briefly, after baseline measurements of FEV1 and FVC, each subject inhales saline (0.9%) followed by consecutive doubling concentrations of methacholine from 0.03 to 64 mg/mL. Methacholine aerosols are inhaled for 2 min at 5-min intervals. FEV1 is measured 30 and 90 seconds after each inhalation, or every two minutes until it improves compared with the preceding measurement. The test is stopped when FEV1 falls by more than 20% compared to baseline, or at a concentration of 64 mg/mL for negative responders. An acceptable-quality FEV1 needs to be obtained at each time point; otherwise, a second attempt is performed. We use the lowest FEV1 value after saline inhalation and the lowest FEV1 value after each dose to calculate the fall in FEV1. The reproducibility of the technique was previously reported [16]. The test result is expressed as the provocative concentration of methacholine inducing a 20% fall in FEV1 (PC20), and airway hyperresponsiveness (AHR) is defined as PC20 < 8 mg/mL.
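As an illustration of how a PC20 value can be derived from the dose-response readings, the sketch below applies the log-linear interpolation between the last two concentrations that is commonly used with the tidal breathing method. The text does not state which interpolation was applied, so the formula, the function names and the example readings are assumptions; only the 8 mg/mL threshold for AHR comes from the protocol described above.

```python
import math

def pc20(c1, r1, c2, r2):
    """Interpolate the concentration (mg/mL) producing a 20% fall in FEV1.

    c1, r1: second-to-last concentration and its % fall in FEV1 (below 20)
    c2, r2: last concentration and its % fall in FEV1 (20 or more)
    """
    if not (r1 < 20 <= r2):
        raise ValueError("the 20% fall must lie between the two readings")
    # Interpolate on the log10 concentration scale (assumed method).
    log_pc20 = math.log10(c1) + (math.log10(c2) - math.log10(c1)) * (20 - r1) / (r2 - r1)
    return 10 ** log_pc20

def has_ahr(pc20_value, threshold=8.0):
    """AHR as defined in the text: PC20 below 8 mg/mL."""
    return pc20_value < threshold

# Hypothetical readings: 12% fall at 2 mg/mL, 28% fall at 4 mg/mL.
example_pc20 = pc20(2.0, 12.0, 4.0, 28.0)
example_ahr = has_ahr(example_pc20)
```

With these made-up readings the 20% fall sits halfway between the two doses on the log scale, so the interpolated PC20 is about 2.83 mg/mL and the subject would be classified as hyperresponsive.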

Asthma Status
Asthma diagnosis was confirmed by a physician (L.-P. Boulet and M. Laviolette) based on clinical symptoms, lung function and airway responsiveness. Asthma severity was determined according to the Canadian Asthma Guidelines [17].

Skin-Prick Tests
Skin-prick tests (SPT) with allergens were performed to measure allergic status. A total of 25 inhalant allergens were tested, including animal hair and danders (cat, dog, horse, and cattle), feather mix, house dust mix, house dust mites (Dermatophagoides pteronyssinus and Dermatophagoides farinae), trees (white ash, cottonwood, birch, American elm, boxelder maple, and oak), grass (grass mix, timothy, perennial ryegrass, ragweed, sagebrush-mugwort, English plantain, and cocklebur), and molds (Alternaria tenuis, Hormodendrum and Aspergillus fumigatus). Subjects are considered atopic if at least one allergen caused a wheal diameter of at least 3 mm at 10 min in the presence of a negative saline control and a positive histamine response [18].
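The atopy rule above can be written as a short decision function. This is a minimal sketch: the function names, the boolean encoding of the two controls and the example wheal diameters are assumptions for illustration, and the text does not say how an invalid control was handled, so the sketch simply returns no result in that case.

```python
def positive_allergens(wheals_mm, cutoff_mm=3.0):
    """Names of allergens whose wheal diameter at 10 min is at least the cutoff."""
    return sorted(name for name, diameter in wheals_mm.items() if diameter >= cutoff_mm)

def atopy_status(wheals_mm, saline_negative, histamine_positive):
    """True/False for atopy, or None when the controls make the test uninterpretable."""
    if not (saline_negative and histamine_positive):
        return None  # assumed handling; not specified in the text
    return len(positive_allergens(wheals_mm)) >= 1

# Hypothetical wheal diameters in mm, not cohort data.
example_wheals = {"cat": 5.0, "birch": 2.0, "timothy": 3.0}
example_positives = positive_allergens(example_wheals)
example_atopic = atopy_status(example_wheals, saline_negative=True, histamine_positive=True)
```

The length of the list returned by `positive_allergens` corresponds to the total count of positive allergens, the allergy summary used later in the factor and cluster analyses.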

Blood Cell Counts
Complete blood counts were determined at the hospital's clinical laboratory by automated procedures. Measured parameters include total white blood cell count and counts for neutrophils, lymphocytes, monocytes, eosinophils, and basophils.

Blood Buffy-Coat
A blood sample is collected and the buffy coat is obtained after centrifugation. The buffy coat is kept at −80 °C prior to DNA extraction. The biological materials (i.e., buffy coat, DNA and serum) are kept frozen in our local biobank for future studies.

Statistical Analysis
Variables that were not normally distributed were log10-transformed before analysis. Differences between cases and controls were assessed with a two-tailed unpaired Student's t-test for quantitative variables and with chi-square tests for dichotomous or categorical variables. All statistical analyses were performed using the R statistical software, version 2.15.3.
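The comparison of a skewed quantitative variable between cases and controls can be sketched as follows. The analyses in this study were run in R; the Python below is only an illustration of the log10 transformation followed by a pooled-variance (Student's) t statistic, with hypothetical function names and made-up values. The two-tailed P-value would be read from the t distribution with the returned degrees of freedom, which the sketch does not compute.

```python
import math
import statistics

def log10_transform(values):
    """Log10-transform strictly positive measurements (e.g., total serum IgE)."""
    return [math.log10(v) for v in values]

def student_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for an unpaired, pooled-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / standard_error
    return t, df

# Hypothetical IgE values, not cohort data.
cases_ige = [100.0, 1000.0, 10000.0]
controls_ige = [10.0, 100.0, 1000.0]
t_stat, dof = student_t(log10_transform(cases_ige), log10_transform(controls_ige))
```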

Variable Reduction, Data Transformation, and Factor Analysis
A total of 181 variables were available for analysis. Initial reduction of the dataset led to 9 continuous and 29 categorical variables. Skin-prick test results were grouped by type of allergen, by total allergen count and by atopy status in order to reduce the size of the final dataset. IgE levels and percent eosinophils were log10-transformed in order to normalize their distributions. PC20 scores were transformed into a dichotomous AHR variable as described above. In order to eliminate the correlation between variables, factor analysis (maximum likelihood) with orthogonal varimax rotation was applied. This standard method identifies independent groups of variables. Only asthmatic patients with complete information on the remaining 13 variables were included in the factor analysis. Factor analysis was repeated for different transformations of the allergy count, and the transformation that explained the greatest proportion of the model's variance was selected. Factors with eigenvalues greater than 1 were analysed, and only variables with absolute loadings greater than 0.4 were considered significantly correlated with the factors.
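The two selection rules at the end of this paragraph (eigenvalue greater than 1, absolute loading greater than 0.4) can be sketched as simple filters on the output of a factor analysis. The sketch does not perform the maximum-likelihood fit or the varimax rotation; the eigenvalues, loadings, variable names and function names are all made up for illustration and are not the values reported in Table 3.

```python
def retained_factors(eigenvalues, threshold=1.0):
    """Indices of factors whose eigenvalue exceeds the threshold."""
    return [i for i, value in enumerate(eigenvalues) if value > threshold]

def salient_loadings(loadings, factor, cutoff=0.4):
    """Variables whose absolute loading on the given factor exceeds the cutoff."""
    return {var: row[factor] for var, row in loadings.items() if abs(row[factor]) > cutoff}

def top_variable(loadings, factor):
    """Variable with the largest absolute loading on the given factor."""
    return max(loadings, key=lambda var: abs(loadings[var][factor]))

# Hypothetical rotated loadings (rows: variables, columns: factors).
eigenvalues = [2.4, 1.3, 0.8]
loadings = {
    "pct_pred_fev1": [0.91, 0.05, 0.10],
    "fev1_fvc_ratio": [0.78, -0.12, 0.02],
    "n_positive_allergens": [0.08, 0.83, 0.20],
    "log10_ige": [-0.03, 0.45, 0.31],
    "bmi": [-0.22, 0.10, 0.38],
}
kept = retained_factors(eigenvalues)
representatives = {factor: top_variable(loadings, factor) for factor in kept}
```

In this toy example two factors are retained, and each is represented by its most highly loaded variable, which mirrors the idea of carrying one variable per factor into the clustering step.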

Cluster Analysis
Cluster analyses were restricted to asthma cases with complete information for the variables selected from the factor analysis. These variables were standardized prior to cluster analyses as recommended [19], and participants with missing data for these variables were excluded. The optimal number of clusters k was evaluated graphically by Ward's agglomerative hierarchical clustering [6,9]. K-means clustering (Cluster package) separated the dataset into k clusters. Differences between clusters were evaluated by the Kruskal-Wallis test for continuous variables and by the chi-squared test (or Fisher's exact test, as appropriate) for categorical variables. Differences were deemed significant if the P-value was smaller than 0.05.
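The standardization and K-means steps can be sketched as below. This is not the implementation used in the study, which relied on R: it is a plain Lloyd-style K-means with a naive initialization (the first k rows), the number of clusters is passed in rather than read from a Ward dendrogram, and the six subjects with two variables are hypothetical.

```python
import statistics

def standardize(rows):
    """Convert each column to z-scores (mean 0, sample standard deviation 1)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def nearest(row, centers):
    """Index of the center with the smallest squared Euclidean distance to row."""
    distances = [sum((x - c) ** 2 for x, c in zip(row, center)) for center in centers]
    return distances.index(min(distances))

def kmeans(rows, k, max_iter=100):
    """Plain K-means; returns (labels, centers). Initial centers are the first k rows."""
    centers = [list(row) for row in rows[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(row, centers) for row in rows]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [row for row, label in zip(rows, labels) if label == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels, centers

# Hypothetical subjects: (age in years, % predicted FEV1).
subjects = [[25, 95], [27, 98], [30, 92], [58, 70], [60, 72], [55, 75]]
labels, centers = kmeans(standardize(subjects), k=2)
```

On this toy input the three younger subjects with higher lung function end up in one cluster and the three older subjects in the other. Standardizing first matters because age and percent predicted FEV1 are on different scales, and unscaled Euclidean distances would be dominated by the variable with the larger variance.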

Results
In this initial report, we included data on 982 subjects (566 asthmatic subjects and 416 controls) recruited from October 2007 to March 2013. The recruitment rate for the five full years (i.e., 2008 to 2012) averages 190 subjects per year. The rate of missing values for the variables of interest is low: no missing values for anthropometric variables, 9 (0.9%) for spirometry, 68 (6.9%) for methacholine challenge, 43 (4.4%) for IgE levels, 18 (1.8%) for blood cell counts, and 4 (0.4%) for atopy. So far, more women (n = 623) than men (n = 359) have been enrolled in the cohort.
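The missing-value percentages quoted above follow directly from the counts and the cohort size of 982. The short check below recomputes them; the variable names are illustrative.

```python
n_total = 982
missing_counts = {
    "spirometry": 9,
    "methacholine challenge": 68,
    "IgE levels": 43,
    "blood cell counts": 18,
    "atopy": 4,
}
# Percentage of the 982 subjects with a missing value, to one decimal place.
missing_pct = {name: round(100 * count / n_total, 1) for name, count in missing_counts.items()}
```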
Table 1 presents the clinical characteristics of cases and controls. No difference in age or in the proportion of men and women is observed between these two groups. The mean body mass index (BMI) is higher in cases compared to controls, but this difference is not clinically relevant. Table 2 presents asthma medication for cases. Figure 1 shows age distributions according to asthma status. As observed in this figure, a greater proportion of young adults (20 to 30 years) was recruited.
As expected, predicted lung function is lower in asthmatic subjects (Table 1). The proportion of AHR is also increased in cases compared to controls (Figure 2(a)): airway hyperresponsiveness was present in 81.9% of asthmatic subjects and 6.9% of controls at the time of testing. Allergies are also more frequent in cases compared to controls (Figure 2(b)); 88.1% and 59.7% of asthmatic subjects and controls were atopic, respectively. In addition, subjects with atopy and asthma have a greater number of positive responses to allergens compared to control subjects with atopy (Figure 3). The greater percentage of allergic response among subjects with asthma is also observed across all allergens evaluated (Figure 4). Blood eosinophil and total IgE levels are higher in asthmatics (Table 1). There is also a significant correlation between these two variables (r = 0.33, P ≤ 0.0001). The proportion of ex-smokers is higher in patients with asthma compared to controls, but this difference is not statistically significant (Table 1).

Cluster Analysis
A total of 377 asthmatics had complete information for the 13 variables selected for factor analysis. Factor analysis (Table 3) with the total count of positive allergens as the allergy transformation led to the selection of 5 factors with eigenvalues >1, which explained 53.1% of the total variance in the dataset. Factors were representative of smoking status, lung function, white blood cell counts, allergy, and age (at evaluation and at onset).

Table 2. Asthma medication for cases (number of subjects).
Add-on medication only: 5
Rescue and ICS: 124
Rescue and Add-on: 6
ICS and Add-on: 59
Rescue, ICS and Add-on: 77
No medication: 115

For each factor, the variable most highly correlated with the factor (or containing fewer missing data) was selected for further analysis: smoking status, % predicted FEV1, log10 % eosinophils, number of positive allergens, and age. All highly correlated variables are in bold in Table 3. We added 145 asthmatics with complete information for these five variables to the dataset for subsequent K-means analysis (n = 522). According to the dendrogram of the agglomerative hierarchical clustering, four clusters were required to separate the 522 subjects using the standardized variables described above.
K-means clustering (Table 4) was applied to these participants. Clusters 1 to 4 contain 105, 104, 138, and 175 subjects, respectively, and are further described below. As expected, clusters differ significantly for at least one of the five variables selected by factor analysis.
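The cluster sizes can be cross-checked against the size of the clustered dataset and converted to the rounded shares quoted in the cluster descriptions. The snippet below is only an arithmetic check with illustrative variable names.

```python
cluster_sizes = {1: 105, 2: 104, 3: 138, 4: 175}
n_clustered = sum(cluster_sizes.values())  # should equal 377 + 145
# Share of each cluster among clustered cases, rounded to the nearest percent.
cluster_pct = {cluster: round(100 * n / n_clustered) for cluster, n in cluster_sizes.items()}
```

The four sizes add up to the 522 clustered cases, and the rounded shares are 20%, 20%, 26% and 34% for clusters 1 to 4.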

Smoking History
This smoking cluster (cluster 1) included 20% of participants. Of these, 78% were ex-smokers, while the others were current smokers. Mean age was 35 years, and the majority of patients in this cluster were allergic to animal hair and danders. Blood eosinophil percentage (2.77%) was lower than the cohort average (3.71%). Most members of cluster 1 were diagnosed with mild asthma (71%).

Low Atopy & Poor Lung Function
This cluster of older patients (mean age of 57 years) included 20% of participants. Of these, 52% were ex-smokers, while the others were never-smokers. The percentage of positive allergens was lower than the asthma cohort average for all types of allergens, but % eosinophils (4.13%) was slightly higher than average (3.71%). Mean percent predicted FEV1 was the lowest of all groups (73.9%). Of the 23 patients with severe asthma included in the analysis, 21 were in this cluster.

High Atopy
The third cluster included 26% of all cases and was composed mostly of never-smokers, with a mean age of 31 years. This cluster had the highest percentage of positive allergens compared to other clusters, and most patients in cluster 3 were positive to animal allergens. Mean % predicted FEV1 was lower than average and % eosinophils was the highest among clusters. In cluster 3, 57% of participants were diagnosed with mild asthma.

Young Non-Smoking & Average Atopy
This non-smoking cluster, with a mean age of 27 years, included 34% of asthmatic participants. Average % eosinophils was the lowest among the four groups. In cluster 4, 78% of patients were diagnosed with mild asthma.

Discussion
We describe for the first time the Quebec City Case-Control Asthma Cohort. Currently, this cohort consists of 982 subjects, including 566 asthmatics and 416 controls, well characterized for asthma and related phenotypes. All participants are white adults, and more women than men have been enrolled so far. One important objective of this cohort is to power our own clinical and genomic research program on asthma. The cohort is also planned for genetic replication purposes and participation in large-scale collaborative efforts to elucidate the genetic architecture of asthma. Pooling resources will be needed to achieve the latter objective, but subgrouping asthma phenotypes will be as important. In this report, we defined four clusters of asthma patients based on demographic and clinical characteristics. Individuals within subgroups are more likely to share the same underlying molecular basis. Optimal description and classification of participants based on clinical features are necessary to identify the genetic components of asthma. Accordingly, cluster analysis is an appropriate approach to clearly define our cohort and orient further studies. Clustering techniques are based on similarities (or differences) between observations, where individuals with high similarities (or small differences) for selected variables will have a greater probability of joining the same cluster. As performed previously [8,10,20], highly correlated variables were not included in the clustering, ensuring that distinct asthma factors are included in the analyses.
This was achieved through factor analysis, and the factors obtained reflected known aspects of asthma diagnosis. Although our method was statistically based, it did not limit the clinical significance of the selected variables. Atopy [6], sputum eosinophilic inflammation [6,21,22] and immunoglobulin E levels [21] have been described as factors of asthma, while age and smoking are both known to affect asthma diagnosis and treatment [23].
Selecting the appropriate number of clusters is a challenging aspect of cluster analysis. Although numerical and objective indexes exist, they are limited to particular types of variables, datasets or cluster shapes. Visual techniques through dendrogram analysis [6] are subjective, but may be the best approach when little is known about the dataset's structure. In our study, numerical indexes were difficult to apply to our dataset since cluster shape was unknown and clustering was based on both continuous and categorical variables. In the past, hierarchical clustering has been used in a conclusive fashion to describe asthma phenotypes [10-12,24], which supports the use of hierarchical clustering to determine the optimal number of clusters in asthma cohorts, as described in this study.
This study has limitations. It is possible that other variables not measured in this study may better explain asthma, even though factor analysis has provided us with five variables correlated to five independent factors. It would be insightful to compare different variables to determine which would be optimal to both represent factors of asthma and cluster phenotypes. Through our cluster analysis, we have shown that adult asthma can be characterized by 4 clusters based on 5 variables, but our results can only present a snapshot of a disease which evolves over time (e.g., immunoglobulin E levels and percent predicted FEV1 can vary within a day). Such variations imply that asthmatic participants may cluster differently on different visits. Longitudinal studies are needed to confirm the stability of clusters over time. It should be noted that mild to moderate asthmatics compose most of our cases. At this point, we cannot evaluate the impact of severe asthma on clustering. We are currently recruiting patients with severe asthma via the hospital's asthma clinic.
It is important to note that asthma patients in this cohort are relatively young (with an average age of 36), have a mild to moderate diagnosis of asthma, and are mostly atopic (86%). These factors must be considered when our results are compared to previous clustering analyses in asthma [5,6,12]. The current study may help define asthma in young allergic adults. Difficulties arise when comparing the findings of our cohort with other studies, due to low resemblance in variable selections. Common trends in clustering are observed with the Severe Asthma Research Program (SARP) [10] (cohort age average of 37 years). The youngest group in SARP and our study is characterized by relatively high atopy, while the older groups are characterized by lower baseline lung function.

Note to Table 4: Blood eosinophil values were log10-transformed and variables (age, smoking status, FEV1, blood eosinophil, number of positive allergens) were standardized prior to cluster analysis. Statistical tests for significance are either Kruskal-Wallis for continuous data or Fisher's exact test for count data. Total positive allergens were included in the analysis, but groups of allergens are presented here for information and include: animal hair and danders allergens (cat, dog, horse, cattle), tree mix allergens (white ash, cottonwood, birch, American elm, boxelder maple, and oak), grass mix allergens (grass mix, timothy, perennial ryegrass, ragweed, sagebrush-mugwort, English plantain, cocklebur), mold allergens (Alternaria tenuis, Hormodendrum and Aspergillus fumigatus) and dust mite allergens (Dermatophagoides pteronyssinus and Dermatophagoides farinae).

A recent study of two independent Korean cohorts [5] presented clusters that differ from those of our study, even though the variables in both analyses were similar (age, smoking status, atopy). Consensus between studies is needed to identify the best combination of variables to classify asthma patients into homogeneous and clinically relevant subgroups. Heterogeneity being a key component of asthma, phenotype alone may well be insufficient to properly describe the disease. Recent progress in the genetics of asthma, as discussed below, offers the possibility of including genetic and molecular data in multivariate phenotypes of asthma and will need to be considered in future studies.
Understanding the genetic component of asthma is challenging [25]. Recent genome-wide association (GWA) studies on asthma have discovered susceptibility genes robustly associated with the disease [23,26-40]. However, this approach is powered to detect mostly common genetic variants, underestimating rare ones [41]. More comprehensive genetic studies are needed to identify the missing heritability of asthma, including rare variants [42] and structural variations [43]. In addition, targeted, exome, and whole-genome sequencing are increasingly applied to refine GWAS-nominated loci or to discover new genetic variants involved in complex diseases. Again, significant progress in this area is needed in the field of asthma. Technological advances and new genomic approaches offer great promise for improving health through personalized medicine [44,45]. However, none of the genomic approaches described above is feasible without a large dataset of patients highly characterized for the disease under investigation. The most promising and upcoming genomic applications require increasingly large sample sizes and sufficient quantities of DNA. Existing cohorts to study the genetics of asthma have a finite amount of DNA, and more efforts are needed to collect new subjects and to expand existing cohorts for continuing progress in the field. The ongoing collection of the Quebec City Case-Control Asthma Cohort aims at providing such a resource in the field of asthma. The establishment of this resource is a major part of our genetic research program and will be the essence of the studies that we are planning on the genetics of asthma. We hope that this cohort will contribute to multi-center efforts intended to understand the molecular basis of asthma, and we welcome collaborations for replicating genetic findings.
With the recent advances in genomic technologies, the bottleneck of genetic research has shifted from our capacity to measure DNA to the availability of a large number of patients well characterized for the disease under study. This need was one of the primary rationales to initiate the Quebec City Case-Control Asthma Cohort. DNA samples from this cohort will be studied using contemporary and upcoming genomic approaches to elucidate the genetic architecture of asthma. These include the study of rare and structural genetic variants and more comprehensive genomic analyses such as exome and whole-genome sequencing. Application of these molecular approaches to subgroups of phenotypically similar asthmatics described in this manuscript is required to fully understand the genetic susceptibility to asthma and to ensure point-of-care implementation of personalized medicine.

Figure 1. Age distribution according to asthma status.

Figure 2. Atopy and airway hyperresponsiveness. (a) Proportions of subjects with airway hyperresponsiveness (AHR) in cases and controls; (b) Proportions of subjects with allergies in cases and controls.

Figure 3. Proportions of subjects with x number of allergies for asthmatic (top panel) and non-asthmatic (bottom panel) subjects. The labels surrounding the pies indicate the number of allergies.

Figure 4. Percentage of positive response for tested allergens (y-axis) among subjects with (black) and without (white) asthma.

Table 1. Clinical characteristics of the Quebec City Case-Control Asthma Cohort.
AHR, airway hyperresponsiveness; BMI, body mass index; FEV1, forced expiratory volume in one second; FVC, forced vital capacity; SPT, skin-prick test.

Table 3. Factor analysis with varimax rotation on cases.
Factors had eigenvalues greater than 1 and explained 53.1% of the total variance. Values in bold have loadings greater than 0.4. BMI, body mass index; FEV1, forced expiratory volume in one second; FVC, forced vital capacity; AHR, airway hyperresponsiveness (PC20 < 8 mg/mL).