Alcohol and Type 2 Diabetes: Results from Canadian Cross-Sectional Data

Cross-sectional data from the Canadian Community Health Surveys are used to examine the relationship between moderate alcohol use and type 2 diabetes. Results from these data are compared with those obtained from prospective longitudinal studies. The major result is that both types of data yield similar conclusions about this relationship. This occurs because Canadian drinking behavior becomes established once a respondent reaches adulthood and remains relatively stable thereafter. The only difference between the two types of survey is the time at which information on drinking behavior is obtained. Since this timing does not matter if drinking behavior is stable over large age ranges, results from the two types of survey will be similar. Neither type of data can be used to support the proposition that the relationship between drinking behavior and the risk of diabetes is causal. Some advantages that sample survey data have over longitudinal data are also noted.


Introduction
Much has been written on the effects of moderate alcohol consumption as a prophylactic for type 2 diabetes. The studies regarded as the most influential, and referred to most often, are prospective or longitudinal. They collect baseline information on a sample at a fixed point in time and then follow the respondents for a number of years until each respondent is diagnosed with diabetes, dies, or the study is terminated for administrative reasons. The measure used to represent diabetes is the duration of diabetes-free life. One of the distinguishing features of the methodology is that only respondents who do not have diabetes at baseline are retained for follow-up. Some studies collect information at regular intervals as the study progresses, but most do not.
There is a sufficiently large number of these studies to have generated three meta-studies reviewing their results [1]-[3]. The conclusion of these meta-studies is that moderate alcohol use is associated with a lower risk of type 2 diabetes relative to non-drinkers, occasional drinkers, or heavy drinkers. New studies [4]-[7] add to this already extensive literature and confirm the earlier results.
The appeal of prospective studies is that the observed statistical relationship between alcohol use and the risk of type 2 diabetes is based on the respondent's self-reported drinking behavior at baseline, which by construction precedes the onset of diabetes. Because of this, the hypothesis that the relationship is causal has some appeal. On the other hand, results based on sample surveys involving cross-section data are seen as much less convincing, since for those who suffer from the disease, drinking behavior reported at the time of the survey refers to a time after the onset of diabetes. This would have current behavior explaining events that happened in the past, or so it would seem.
It is argued here that because Canadian drinking habits are relatively stable over time and over a large range of ages, the data requirements for cross-sectional and prospective longitudinal surveys to be informative about this issue are similar. This is good news for diabetes research for two reasons. First, countries like Canada, which have not produced many longitudinal medical or health surveys, can use the sample surveys regularly carried out by Statistics Canada [8] and [9] to investigate what effect drinking behavior has on the probability of having diabetes. Second, these sample surveys can be used to answer an important question that arises when longitudinal data are used: deleting respondents with diabetes at baseline generates a selection problem. For example, [10] examined subjects aged 65 or older using the American Cardiovascular Health Study. In Canada in 2010, 20.7% of males over 65 had diabetes, so a sample of diabetes-free respondents may not be representative of the 65+ population as a whole¹. Presumably a similar result holds for their data as well. Does this matter? Sample data from the 2010 Canadian Community Health Survey suggest that there are no sample selection problems associated with looking at older populations. What appeared in principle to be a problem does not arise.
Finally, unlike longitudinal studies, cross-section surveys have no attrition problems. Respondents are contacted only once; there is no need to keep track of them, and inference problems due to non-response or death do not arise even if mortality is related to alcohol use².
The paper is organized as follows. The argument that cross-sectional and longitudinal surveys are similar with respect to when drinking behavior is determined is developed in detail in the next section. The statistical model is outlined in Section 3 and the results are presented in Section 4. These are discussed in Section 5, and the paper then ends with a summary of the results and some conclusions.

Longitudinal vs. Cross-Sectional Data
When researchers use baseline data to examine outcomes that occur later in a project, the data have to represent the respondent's characteristics not just at the time of collection but for a considerable period before the collection date, as well as for the follow-up period. For alcohol consumption and diabetes risk in longitudinal surveys, the authors of [6] note that this requires "alcohol intake be fairly stable over time". However, it is not clear exactly what the appropriate time frame is. Different studies report different results, but there is not much information about the age at first moderate alcohol use that minimizes the probability of getting diabetes.
Stability of behavior is also required for cross-section data to be informative about this relationship, but the requirements are somewhat more stringent. Alcohol intake has to be stable for a period of unknown length prior to the onset of diabetes. For the samples used here, many of the respondents became diabetic 10 or 15 years before the data were collected, so reported drinking behavior, if it is to be informative about the risk of having diabetes, has to be the same as it was long before the survey obtained information on the respondent's current alcohol consumption.
In Canada, drinking behavior is formed when respondents are in their twenties and remains remarkably stable up to ages 50-59 for men. There is considerably more variation for women. Table 1 and Table 2 display the proportions of the population who claim to be regular drinkers³. These tables contain data from the 2000-01 and 2011-12 Canadian Community Health Surveys and give proportions by ten-year age group for male and female respondents. In 2000-01 the proportion of regular drinkers among males was around 50% for age groups 20-59, with little or no significant variation across age groups. For the 2011-12 sample there is hardly any variation at all over the first four age categories, but the proportions of male regular drinkers in the age groups 20-59 were about 10% higher than they had been eleven years earlier. For women, changes in behavior across the two surveys are much larger: in 2011-12 women in all age categories drank considerably more than they did eleven years earlier. The survey design is the same for both years, so although the respondents are not the same in the two surveys, they come from the same distribution. For men, the conclusion is that alcohol consumption behavior does not change very much as respondents or cohorts get older, and thus the age at which this information is collected will not have a major impact on the results concerning the importance of alcohol intake for the risk of diabetes. Of course, the upward trend in regular drinking behavior could have an impact on the results. This issue is examined later using simulations.
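A minimal sketch of how proportions like those in Table 1 and Table 2 could be tabulated from respondent-level records is given below. It assumes each record carries a sex, an age, a drinker category and a survey weight; the field names and the toy records are hypothetical and are not actual Canadian Community Health Survey variable names or data.

```python
from collections import defaultdict

def age_group(age):
    """Ten-year age group label, e.g. 43 -> '40-49'."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"

def regular_drinker_proportions(records):
    """Weighted share of regular drinkers by (sex, ten-year age group).

    Each record is a dict with hypothetical keys 'sex', 'age',
    'drinker' ('regular', 'occasional' or 'non') and 'weight'.
    """
    total = defaultdict(float)
    regular = defaultdict(float)
    for r in records:
        key = (r["sex"], age_group(r["age"]))
        total[key] += r["weight"]
        if r["drinker"] == "regular":
            regular[key] += r["weight"]
    return {key: regular[key] / total[key] for key in total}

# Toy records for illustration only (not survey data).
records = [
    {"sex": "M", "age": 24, "drinker": "regular", "weight": 150.0},
    {"sex": "M", "age": 27, "drinker": "non", "weight": 50.0},
    {"sex": "F", "age": 45, "drinker": "occasional", "weight": 120.0},
    {"sex": "F", "age": 48, "drinker": "regular", "weight": 80.0},
]
props = regular_drinker_proportions(records)
print(props[("M", "20-29")])  # 0.75
print(props[("F", "40-49")])  # 0.4
```

Weighting each respondent by the survey weight, rather than counting respondents, is what makes such proportions estimates for the population rather than for the sample.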

Statistical Models
In the Canadian Community Health Surveys respondents are asked whether they have type 2 diabetes. They are also asked what type of medication has been prescribed for them. Most respondents with diabetes (86% in 2011-12) were taking some form of oral medication or insulin, or sometimes both. Although the information is self-reported, the fact that most of these respondents had some involvement with a medical practitioner suggests that it is quite reliable. The measure of diabetes used here is the answer to this question for type 2 diabetes. Call this d_i for respondent i; it is a binary variable that takes the value 1 if respondent i has type 2 diabetes at the time of the survey and 0 if not. The type 2 diabetes indicator is explained by a normal probability (probit) model. Define

$$d_i^* = X_i\beta + u_i \qquad (1)$$

where X_i is a vector of personal characteristics of respondent i, including whether he or she is a regular or occasional drinker, and u_i is a normally distributed error term. d_i^* can be interpreted as a latent variable measuring the respondent's propensity to become diabetic. The outcome probabilities can then be defined as

$$\Pr(d_i = 1) = \Pr(d_i^* > 0) = \Phi(X_i\beta) \quad\text{and}\quad \Pr(d_i = 0) = 1 - \Phi(X_i\beta),$$

where Φ is the standard normal distribution function. The parameters in the model can be estimated by maximizing the sample likelihood function, whose natural logarithm is

$$\ln L(\beta) = \sum_{i=1}^{n}\Big[d_i \ln\Phi(X_i\beta) + (1 - d_i)\ln\big(1 - \Phi(X_i\beta)\big)\Big].$$

The respondent characteristics include six categorical smoking variables ranging from never smoked to daily smoker, and four educational categories ranging from less than a high school diploma to a university degree. There is also information on the respondent's age, body mass index, income decile and level of physical activity, but the last two were not used as regressors. The non-drinker category includes former drinkers. This is seen as problematic by some authors, [12] for example, but the problems associated with combining never and former drinkers are shown to be small in [13]. Table 3 and Table 4 show parameter estimates of the probability model for four age groups, for both males and females, for the two alcohol use dummies, the natural logarithm of the respondent's body mass index, and age.
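The estimation step can be sketched as a probit fitted by Newton-Raphson on the log-likelihood above. This is an illustrative sketch, not the code used to produce Tables 3 and 4: the data are simulated, the parameter values in `true_beta` are hypothetical, and the single standardized covariate merely stands in for the fuller set of regressors (smoking, education, ln BMI, age). Standard errors are taken from the inverse of the observed information matrix.

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF; erfc keeps the lower tail accurate."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_likelihood(beta, X, d):
    """ln L = sum_i [d_i ln Phi(X_i b) + (1 - d_i) ln(1 - Phi(X_i b))]."""
    # With q_i = 2 d_i - 1 both cases reduce to ln Phi(q_i X_i b).
    return sum(math.log(norm_cdf((2 * di - 1) * dot(x, beta)))
               for x, di in zip(X, d))

def score_and_information(beta, X, d):
    """Gradient of ln L and minus its Hessian (observed information)."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, di in zip(X, d):
        q = 2 * di - 1
        xb = dot(x, beta)
        lam = q * norm_pdf(q * xb) / norm_cdf(q * xb)
        w = lam * (lam + xb)  # always positive for the probit model
        for j in range(k):
            grad[j] += lam * x[j]
            for m in range(k):
                info[j][m] += w * x[j] * x[m]
    return grad, info

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_probit(X, d, max_iter=50, tol=1e-9):
    """Maximize ln L by Newton-Raphson with step halving."""
    beta = [0.0] * len(X[0])
    ll = log_likelihood(beta, X, d)
    for _ in range(max_iter):
        grad, info = score_and_information(beta, X, d)
        step = solve(info, grad)  # Newton step: (-H)^{-1} g
        t = 1.0
        while True:
            cand = [b + t * s for b, s in zip(beta, step)]
            ll_cand = log_likelihood(cand, X, d)
            if ll_cand >= ll or t < 1e-8:
                break
            t /= 2.0
        beta, ll = cand, ll_cand
        if max(abs(t * s) for s in step) < tol:
            break
    _, info = score_and_information(beta, X, d)
    k = len(beta)
    se = [math.sqrt(solve(info, [1.0 if i == j else 0.0 for i in range(k)])[j])
          for j in range(k)]
    return beta, se

# Simulated data with hypothetical parameters (not survey estimates):
# a constant, a regular-drinker dummy and a standardized covariate.
rng = random.Random(2024)
true_beta = [-1.5, -0.4, 0.8]
X, d = [], []
for _ in range(20000):
    x = [1.0, 1.0 if rng.random() < 0.5 else 0.0, rng.gauss(0.0, 1.0)]
    d_star = dot(x, true_beta) + rng.gauss(0.0, 1.0)  # latent propensity, Equation (1)
    X.append(x)
    d.append(1 if d_star > 0 else 0)

beta_hat, se = fit_probit(X, d)
for name, b, s in zip(["const", "regular", "covariate"], beta_hat, se):
    print(f"{name:>9}: {b:7.3f} ({s:.3f})")
```

The output is printed in the same "coefficient (standard error)" form used in the text. In practice a statistical package would be used; the sketch only makes the likelihood and its maximization concrete.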

Results
Discussion of the results begins with the analysis of the 40-49 age group for males. Diabetes prevalence rates for the age groups 20-39 are quite low at 1.2%. They rise to 4.8%, four times higher, for the age group 40-49. It would therefore appear that most of the respondents in this age group who suffer from diabetes became diabetic in their late thirties or forties. As argued earlier, the drinking behavior of this age group is similar to what it was at younger ages, before the onset of diabetes. Thus the large and significant regression coefficient for the categorical variable "regular drinker" for males, −0.468 (0.098) in the first row of Table 3, leads to the same conclusions as those claimed in studies based on longitudinal data. Current drinking behavior, measured at the age when the respondent was surveyed, is a good representation of lifetime drinking behavior for this age group, so having been a regular drinker is associated with a lower probability of having type 2 diabetes. Lifetime behavior is exogenous or predetermined, so diabetes cannot cause someone to have been a regular drinker. Being diabetic could induce lower current alcohol consumption, but there is no evidence that it does, and medical advice concerning alcohol use for diabetics does not usually recommend lower consumption. Whether moderate alcohol use is a causal factor in reducing the risk of diabetes is another matter, since being exogenous or predetermined is not sufficient for causality. Causality is a complex issue and is examined in more detail in the next section.

Other variables were included as regressors in the normal probability models. The coefficient of the natural logarithm of the respondent's body mass index (BMI) was always the largest and most significant, followed by age and then some of the higher educational categories. Being a lifetime non-smoker also reduced the risk of diabetes. Income and physical activity were not included as explanatory variables because of the possibility of reverse causation. Being a heavy drinker was included but was never significant, a result similar to that found by [7] and others. For this cohort, male respondents who have a history of regular moderate drinking, are not overweight, do not smoke, are well educated and are younger are much less likely to have diabetes.
The parameter estimates for the older age groups are very similar to those for the age group 40-49. This result is somewhat surprising, since the distribution of drinking behavior shifts towards occasional drinkers and non-drinkers as the cohorts get older. Apparently these changes are not large enough to alter the conclusions based on the youngest cohort.
The parameter estimates for women are similar to those for men except that being a regular drinker is more important for women and these coefficients increase with age.This is an unusual result since most studies find that the prophylactic effects of regular moderate alcohol use are less pronounced for women.
Table 5 and Table 6 show predicted probabilities of having diabetes by age group and type of drinker. Although the regression coefficients associated with the regular drinking category do not change very much across age groups, predicted probabilities of having diabetes increase dramatically with age and differ markedly by drinking category. For the age group 70-79, for example, respondents in the occasional or non-drinker categories are more than two and a half times as likely to have diabetes as regular drinkers.
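How nearly constant coefficients can coexist with widening differences in predicted probabilities follows from the shape of the normal distribution function Φ. The sketch below uses hypothetical index values (not estimates from Tables 3 to 6) for a younger and an older age group, with the same hypothetical regular and occasional drinker coefficients in both, and compares the absolute gap between non-drinkers and regular drinkers.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function Phi."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def predicted_probabilities(base_index, coefs):
    """Pr(d_i = 1) = Phi(X_i b) for non-drinkers and each drinker dummy.

    base_index is X_i b for a non-drinker at chosen covariate values;
    coefs maps a drinking category to its dummy coefficient.
    """
    probs = {"non": norm_cdf(base_index)}
    for category, b in coefs.items():
        probs[category] = norm_cdf(base_index + b)
    return probs

# Hypothetical values, for illustration only.
coefs = {"regular": -0.45, "occasional": -0.05}
young = predicted_probabilities(-1.6, coefs)  # e.g. a younger age group
old = predicted_probabilities(-0.8, coefs)    # e.g. an older age group

gaps = {}
for label, p in (("younger", young), ("older", old)):
    gaps[label] = p["non"] - p["regular"]
    print(f"{label}: non={p['non']:.3f} regular={p['regular']:.3f} "
          f"gap={gaps[label]:.3f}")
```

Because the baseline index rises with age, the same coefficient moves the index over a steeper part of Φ, so the absolute gap in predicted probabilities is larger for the older group.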

Discussion
The slight upward trend in the proportion of regular drinkers means that the measured proportions for the age group 40-49 overstate the proportions of regular drinkers ten or twenty years earlier. If the measured proportions overstate the true proportions at the relevant earlier ages, the estimated effect of being a regular drinker on the probability of having diabetes will also be inflated. How large is the error associated with the use of the inflated data? To answer this question, a simulation exercise was carried out in which 10% of the regular male drinkers and 37% of the regular female drinkers in the age group 40-49 were randomly reallocated, in equal numbers, to the two other categories. For both genders this leaves the same proportion of regular drinkers as in the age group 20-29 in the 2000-01 sample. For these simulated samples the regression coefficient associated with the regular drinker dummy increases from −0.401 (0.076) to −0.346 (0.077) for men and from −0.372 (0.081) to −0.318 (0.090) for women. Neither change is significant, and the new coefficients are still several times their standard errors. Although the size of the response changes, the effect associated with being a regular drinker is still present and highly significant. Thus even if current drinking behavior does not represent exactly what respondents did twenty years earlier, it is still a good enough measure of their behavior.

One of the reasons why researchers pay so much more attention to longitudinal data sources is the belief that such data will bring them closer to discovering a causal relation between drinking behavior and the risk of diabetes. The issue of causality is an important one, and not being able to claim that results are causal is often seen as detracting from their credibility. The important question here is whether results based on either longitudinal or sample survey data can be used to support the hypothesis that there is a causal relation between drinking behavior and diabetes.
From a purely statistical point of view the answer to this question is most probably no, and this conclusion does not depend on which type of data is used in the analysis. In the linear regression framework, [14] (p. 31) showed that in a relation like Equation (1) the variables in X_i cause d_i if X_i and u_i are independent and the regression coefficients are significant. Here this independence condition fails because the drinking dummies are not an adequate description of respondent drinking behavior. There is no information on the type of beverage consumed or on whether it is consumed with meals. In addition, the number of drinks consumed per day is an important characteristic of drinking behavior and should also be included as a regressor. This information is available in the surveys, but it is not regarded as reliable and for that reason was not included here: respondents were asked to recall what they drank on each day, and their reported consumption does not agree with drinks per day based on per capita alcohol sales data. The result is unobservable variation within the category "regular drinker", which generates a measurement error problem and leads to correlation between X_i and u_i. This does not mean that causality is rejected; it just cannot be confirmed using this type of data.
What is of slightly more concern, however, is that even if an accurate picture of drinking behavior could be observed, it still might not be possible to confirm that the relation is causal. Suppose, for example, that instead of Equation (1) the true model is

$$d_i^* = Z_i\delta + v_i$$

where the Z_i are highly correlated with the X_i variables but cannot be observed by the researcher, and v_i is independent of both X_i and Z_i. Ideally the model to be estimated should be based on the equation

$$d_i^* = X_i\beta + Z_i\delta + v_i .$$

The estimated β coefficients will not be significant but the δ coefficients will be. But when Equation (1) is the basis of the model, then

$$u_i = Z_i\delta + v_i ,$$

and u_i and X_i are not independent. The model based on Equation (1) is not causal, not because of measurement error but because of omitted regressors that are correlated with the observable regressors. This is not just a hypothetical situation. Information on the respondent's history of diet and physical activity, as well as detailed information on the timing and degree of being overweight or obese, is very important in the analysis of diabetes. This information is absent from almost all surveys, whatever their type. In this respect both types of survey are similar, and neither will be very informative about the issue of causality.

However, there is other evidence suggesting a causal relationship. [6] found that, in addition to moderate levels of alcohol consumption being associated with lower risks of diabetes, a small increase in alcohol consumption from light to moderate drinking also reduced the risk of diabetes. This is extremely compelling evidence in favor of the relation being causal. The study period in question was four years, suggesting that the periods under consideration here are sufficiently long to capture the effect of drinking behavior on the risk of type 2 diabetes. Additionally, there is some medical evidence suggesting that alcohol use increases insulin sensitivity, which lessens the probability of getting diabetes [15] and [16]. Moreover, Table 5 and Table 6 show very large and significant differentials across drinking categories that increase dramatically with age. This is consistent with the hypothesis that moderate drinking actually reduces the probability of getting diabetes. In the absence of plausible competing hypotheses, one might be inclined to believe that the relationship is causal.
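The omitted-regressor argument in this section can be illustrated with a small simulation. For simplicity the sketch uses a linear model for the latent propensity rather than a probit, and all variables and parameter values are hypothetical: `z` plays the role of an unobserved Z_i (for example, a dietary or weight history), `x` is an observed regressor correlated with it, and the outcome depends on `z` alone.

```python
import random

def ols_slopes(y, regressors):
    """OLS slopes (intercept included) for one or two regressors,
    computed from the normal equations on demeaned data."""
    n = len(y)

    def demean(v):
        m = sum(v) / n
        return [a - m for a in v]

    yc = demean(y)
    xs = [demean(col) for col in regressors]
    S = [[sum(a * b for a, b in zip(u, v)) for v in xs] for u in xs]
    s = [sum(a * b for a, b in zip(u, yc)) for u in xs]
    if len(xs) == 1:
        return [s[0] / S[0][0]]
    # Two regressors: solve the 2x2 system by Cramer's rule.
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [(S[1][1] * s[0] - S[0][1] * s[1]) / det,
            (S[0][0] * s[1] - S[1][0] * s[0]) / det]

rng = random.Random(7)
n = 20000
z = [rng.gauss(0.0, 1.0) for _ in range(n)]             # unobserved Z_i
x = [0.9 * zi + 0.3 * rng.gauss(0.0, 1.0) for zi in z]  # observed X_i, correlated with Z_i
y = [1.0 * zi + rng.gauss(0.0, 1.0) for zi in z]        # true model depends on Z_i only

short_fit = ols_slopes(y, [x])    # analogue of Equation (1): Z_i omitted
long_fit = ols_slopes(y, [x, z])  # both regressors included
print("X only:  beta =", round(short_fit[0], 3))
print("X and Z: beta =", round(long_fit[0], 3), " delta =", round(long_fit[1], 3))
```

With Z_i omitted, the coefficient on `x` comes out close to 1 even though `x` plays no role in the true model; once `z` is included it falls to roughly zero, while the coefficient on `z` is close to its true value of 1.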
The sample survey results presented here should be of considerable interest to diabetes researchers because they confirm what others have found using prospective data, namely that there is a "U" shaped relation between alcohol consumption and the risk of diabetes. Evidence of this result was obtained here when age groups were aggregated into the age group 40-79; sample sizes were too small to use the individual age groups. Within the drinker category there are seven sub-categories, ranging from less than once a month to every day. The category with the largest regression coefficient was three to four days per week for men and five to six days per week for women. Optimal drinking behavior was never characterized by drinking every day. This confirms the "U" shaped relation between alcohol use and the risk of type 2 diabetes mentioned above.
In the introduction the study of older cohorts by [10] was mentioned. The high average age of the respondents raised the possibility that sample selection could contaminate the results. However, when a comparable sample of respondents aged 65 or older was examined here, similar results were obtained for the regression coefficients: −0.345 (0.032) for males and −0.528 (0.034) for females. These coefficients are similar to those for the younger age groups; both are highly significant, and the gender differential is preserved. Selecting respondents older than 65 has no apparent effect on the prophylactic effects of moderate drinking.
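The random reallocation used in the simulation exercise earlier in this section can be sketched as follows. The function name, category labels and toy sample are hypothetical; in the exercise described above the probit model would then be re-estimated on each reallocated sample, a step that is not shown here.

```python
import random

def reallocate_regular(categories, share, rng):
    """Move a random `share` of 'regular' drinkers, split equally,
    to 'occasional' and 'non'. Returns a new list; input unchanged."""
    out = list(categories)
    regular_idx = [i for i, c in enumerate(out) if c == "regular"]
    k = round(share * len(regular_idx))
    chosen = rng.sample(regular_idx, k)  # draw without replacement
    for j, i in enumerate(chosen):
        # Alternate destinations so the moved respondents split equally.
        out[i] = "occasional" if j % 2 == 0 else "non"
    return out

rng = random.Random(1)
sample = ["regular"] * 600 + ["occasional"] * 250 + ["non"] * 150
sim = reallocate_regular(sample, 0.10, rng)  # e.g. the 10% used for men
print(sim.count("regular"), sim.count("occasional"), sim.count("non"))
# prints: 540 280 180
```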

Summary and Conclusions
The results in this paper show that moderate alcohol use acts as a prophylactic in reducing the risk of type 2 diabetes. The data used here are cross-sectional and represent behavior at one point in time. However, Canadian drinking habits are fairly stable over time and across cohorts, so information about them will be similar for longitudinal and cross-section surveys. Thus it should not be surprising that the protective effects of moderate alcohol use found in so many longitudinal studies also apply to respondents in the 2010-11 Canadian Community Health Survey. This is useful information, since this issue has not been examined using Canadian longitudinal data. There are advantages to using cross-section surveys in terms of cost and the avoidance of the attrition and selection problems that arise in longitudinal surveys. It was also argued that neither type of survey can be used to justify a causal relation between alcohol use and type 2 diabetes. For longitudinal surveys, the fact that the information on alcohol use was collected prior to the onset of the disease is not sufficient to support the claim that the relation is causal.
Sample surveys like the Canadian Community Health Survey are a new source of data that can and should be used to explore how health issues are related to respondent behavior.However, some of the problems noted about these surveys could be circumvented by including more retrospective content like the history of the respondent's weight and exercise habits as well as more accurate information on how much and how often they drink alcohol.

Table 1. Proportion of regular drinkers in the Canadian Community Health Surveys, 2000-01 and 2011-12, for males.

Table 2. Proportion of regular drinkers in the Canadian Community Health Surveys, 2000-01 and 2011-12, for females.

Table 5. Predicted probabilities of diabetes by age group and drinking category for males, 2011-12.

Table 6. Predicted probabilities of diabetes by age group and drinking category for females, 2011-12.