To Explore the Causal Link between Genetic Predisposition to Alcohol Consumption and Type 2 Diabetes Mellitus
1. Introduction
Diabetes has become a serious global public health issue that significantly affects human health [1]. It is a group of chronic metabolic diseases caused by insufficient insulin secretion or defects in insulin action [2]. According to the 10th edition of the International Diabetes Federation’s IDF Diabetes Atlas, published in 2021, 537 million adults (aged 20-79 years) worldwide had diabetes in 2021, roughly one in ten. This number is projected to rise to 643 million by 2030 and 783 million by 2045 [3]. Increasing evidence suggests a complex association between alcohol consumption and type 2 diabetes. While some studies propose that harmful drinking may be associated with an increased risk of type 2 diabetes, this association is not linear and may be influenced by individual genetic background, lifestyle, and other environmental factors [4]. This study aims to clarify the relationship between alcohol consumption and type 2 diabetes using Mendelian Randomization (MR).
1.1. Alcohol Consumption and Type 2 Diabetes
The relationship between alcohol consumption and type 2 diabetes has received considerable attention. In their prospective study “Alcohol Consumption and Risk of Type 2 Diabetes Among Adult Males in Deqing County, Zhejiang Province,” Wang Yanhuan et al. found that individuals who drink frequently and consume large amounts of alcohol in a single session have an increased risk of developing type 2 diabetes, whereas those who consume smaller amounts across multiple sessions may have a lower risk [5]. Xu Li et al. reached consistent conclusions in their review “Research Progress and Prospects on Moderate Alcohol Consumption and Diabetes Prevention,” reporting that moderate alcohol consumption can reduce the incidence of diabetes, while excessive drinking increases the risk [6]. In their cross-sectional study “Association Between Alcohol Consumption and Pre-diabetes and Diabetes in Bao’an District, Yan’an City,” Zhu Yali et al. reported that light drinking (0.1 g/day < ethanol intake < 19.9 g/day) is a protective factor against pre-diabetes and diabetes, whereas heavy drinking (ethanol intake ≥ 20 g/day) is a risk factor [7]. These findings suggest that alcohol consumption is an important factor in the development of type 2 diabetes.
1.2. Genetic Predisposition and Drinking Behavior
Genetic studies have shown that an individual’s genetic makeup can influence their drinking behavior, including frequency and quantity. International research has indicated that alcohol-related traits, such as the initial response to alcohol and the maximum amount consumed in a day, are genetically influenced. For example, rs1229984 in ADH1B (Alcohol Dehydrogenase 1B) and rs671 in ALDH2 (Aldehyde Dehydrogenase 2) are protective against heavy drinking (Kimura and Higuchi 2011) [8]. SNPs such as rs1229984 in ADH1B, rs13130794 in KLB, rs144198753 in BTF3P13, rs1260326 in GCKR, rs13107325 in SLC39A8, and rs11214609 in DRD2 are significantly associated with drinking behavior [9]. Genetic predisposition can therefore influence drinking behavior.
1.3. Mendelian Randomization Studies
To better understand the causal relationship between alcohol consumption and type 2 diabetes, we adopted the statistical method of Mendelian Randomization (MR). MR uses genetic variants as instrumental variables to evaluate the causal relationship between exposure factors (such as alcohol consumption) and health outcomes (such as type 2 diabetes). The core idea of MR is to utilize the random allocation characteristics of genetic variants, similar to randomization in randomized controlled trials, to overcome potential confounding and reverse causality [10].
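As a compact illustration of the instrumental-variable idea described above, the causal effect implied by a single genetic variant j is commonly written as a ratio (Wald) estimate. The symbols below are notation assumed for this illustration and are not taken from the study: \hat{\beta}_{X,j} is the SNP-exposure association and \hat{\beta}_{Y,j} is the SNP-outcome association.

```latex
\hat{\beta}_{\mathrm{causal},j} = \frac{\hat{\beta}_{Y,j}}{\hat{\beta}_{X,j}}
```

The ratio is interpretable as a causal effect only if the variant satisfies the instrumental variable assumptions.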
1.4. Research Objectives and Significance
Given the above background, this study aims to further explore the causal link between genetic predisposition to alcohol consumption and type 2 diabetes using Mendelian Randomization. Specifically, we assess how genetic predisposition to alcohol consumption affects the risk of developing type 2 diabetes and explore potential biological mechanisms to provide new insights into the prevention and management of type 2 diabetes. Understanding this causal relationship not only helps reveal disease mechanisms but also provides scientific evidence for targeted prevention strategies. Additionally, the findings may inform clinical practice, particularly the design of personalized medical plans, with the goal of reducing the incidence of type 2 diabetes and improving patients’ quality of life.
2. Methods
2.1. Data Sources and Basic Information
Genome-wide association study (GWAS) summary data related to alcohol consumption were obtained from the IEU OpenGWAS project database for European populations. The IEU OpenGWAS project, developed by the Medical Research Council Integrative Epidemiology Unit (MRC IEU) at the University of Bristol, is a comprehensive resource of manually curated GWAS summary datasets available for download or query. By searching for alcohol-related traits, we obtained a dataset from 2017 with 112,117 participants and 12,935,395 SNPs. GWAS summary data related to type 2 diabetes were also obtained from the IEU OpenGWAS project database for European populations. By searching for type 2 diabetes traits, we obtained a dataset from 2021 with 490,089 participants and 24,167,560 SNPs. Since all the data were extracted from publicly available summary statistics, ethical approval was not required.
2.2. Data Extraction and Preprocessing
To assess causality, this study employed the Two-Sample Mendelian Randomization (TSMR) method with standardized GWAS summary statistics. The GWAS datasets for the exposure and the outcome contain the following key elements: the SNP identifier, the chromosome on which the SNP resides (chr), its position on the chromosome (pos), the effect size (β), which denotes the per-allele effect on the trait, and the standard error of the effect size (SE). Where effect sizes are reported as Odds Ratios (OR), they were converted to their natural logarithm (log(OR)) for subsequent analyses, following conventional practice. The datasets also include the effect allele (EA) of each SNP and the P-value of its association with the trait. Collectively, these data serve as the foundation for causal inference in this study.
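The log(OR) conversion mentioned above can be sketched in a few lines. This is an illustrative Python sketch, not the R code used in the study; the function name and the input values are hypothetical, and recovering the standard error from a 95% confidence interval is an assumption that applies only when the SE is not already supplied.

```python
import math

def log_odds_from_or(odds_ratio, ci_lower, ci_upper, z_crit=1.959964):
    """Convert an odds ratio and its 95% CI to (beta, se) on the log-odds scale.

    Assumes the CI was built as exp(beta +/- z_crit * se), i.e. it is
    symmetric on the log scale.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z_crit)
    return beta, se

# Hypothetical per-SNP summary statistic, made up for illustration only.
beta, se = log_odds_from_or(1.20, 1.10, 1.31)
print(f"beta = {beta:.4f}, se = {se:.4f}")
```

Working on the log scale keeps the effect estimates approximately normally distributed, which is what the weighting schemes used later rely on.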
2.3. Selection of Instrumental Variables
To ensure the validity and applicability of the selected instrumental variables in this study, single nucleotide polymorphisms (SNPs) that meet the fundamental criteria for instrumental variables were initially screened from the dataset of the exposure factor. The screening process followed the steps outlined below:
1) Initial Screening of Significant SNPs: Candidate SNPs were preliminarily identified from the exposure factor based on a threshold of P < 5 × 10⁻⁸. Considering the relatively smaller sample size for alcohol consumption, which limits the number of available instrumental variables, an additional criterion of P < 5 × 10⁻⁵ was adopted to supplement the pool of candidate SNPs, thereby increasing the number of potential instruments.
2) Removal of Linkage Disequilibrium (LD) SNPs: Linkage disequilibrium (LD) refers to the non-random association of alleles at different loci on a chromosome, meaning that these alleles are inherited together more often than would be expected by chance. To avoid biases introduced by LD, the PLINK clumping algorithm was applied to the GWAS summary data of the exposure factor. Clumping used an LD threshold of r² = 0.01 and a window of 5000 kilobases (kb), refining the list of SNPs so that they met the independence requirement for instrumental variables.
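The two screening steps above can be sketched as a P-value filter followed by a greedy, clumping-style pruning. This is a simplified Python illustration of the idea and not the PLINK implementation: the function `select_instruments`, the record layout, the SNP identifiers, and the `r2_lookup` callback (standing in for LD computed from a reference panel) are all hypothetical.

```python
def select_instruments(snps, r2_lookup, p_threshold=5e-5,
                       r2_threshold=0.01, window_kb=5000):
    """Keep SNPs passing the P-value threshold, then prune correlated ones.

    SNPs are visited from most to least significant. A SNP is dropped if it
    lies within `window_kb` of an already kept SNP on the same chromosome and
    its r^2 with that SNP is at or above `r2_threshold`.
    """
    candidates = sorted((s for s in snps if s["p"] < p_threshold),
                        key=lambda s: s["p"])
    kept = []
    for snp in candidates:
        in_ld = any(
            snp["chr"] == k["chr"]
            and abs(snp["pos"] - k["pos"]) <= window_kb * 1000
            and r2_lookup(snp["rsid"], k["rsid"]) >= r2_threshold
            for k in kept
        )
        if not in_ld:
            kept.append(snp)
    return kept

# Toy data, made up for illustration only.
toy_snps = [
    {"rsid": "snp_a", "chr": 4, "pos": 1_000_000, "p": 1e-9},
    {"rsid": "snp_b", "chr": 4, "pos": 1_200_000, "p": 1e-6},  # in LD with snp_a
    {"rsid": "snp_c", "chr": 4, "pos": 9_000_000, "p": 2e-6},  # outside the window
    {"rsid": "snp_d", "chr": 11, "pos": 500_000, "p": 1e-3},   # fails the P filter
]
toy_r2 = {frozenset(("snp_a", "snp_b")): 0.8}

def toy_r2_lookup(a, b):
    # Pairs missing from the toy table are treated as uncorrelated.
    return toy_r2.get(frozenset((a, b)), 0.0)

instruments = select_instruments(toy_snps, toy_r2_lookup)
print([s["rsid"] for s in instruments])
```

Visiting SNPs in order of significance means the strongest signal in each correlated group is the one retained.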
2.4. Validation of Instrumental Variables
In Mendelian Randomization (MR) studies, the reliability of causal inference depends heavily on the appropriate selection of instrumental variables. To ensure this, the present study adheres to the first of the three core assumptions of MR analysis, the relevance assumption. This assumption stipulates that the instrumental variables must maintain a stable and strong association with the exposure factor. Specifically, SNPs serving as instrumental variables should exhibit a significant association with the exposure under investigation, which in this study is alcohol consumption. The three main assumptions of Mendelian randomization studies are shown in Figure 1.
To further ensure that the selected instrumental variables comply with this assumption, the study implemented additional measures, including the use of F-statistics to evaluate potential weak instrument bias. In this context, a weak instrument refers to genetic variants that poorly explain variations in the exposure factor, characterized by a weak link with the exposure. It is crucial to note that weak instruments are not synonymous with invalid instruments; there is a fundamental distinction between the two. In practice, the risk of weak instrument bias is particularly pronounced when the sample size is insufficient to provide adequate statistical power. By employing the aforementioned methods, the aim of this study is to minimize such biases to the greatest extent possible, thereby enhancing the accuracy and credibility of the causal inference outcomes.
Figure 1. Three main assumptions of MR studies.
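The weak-instrument check described in this section can be illustrated with a short Python sketch. The source does not state which formula was used, so both functions below are assumptions: the first is the general F statistic based on the proportion of exposure variance explained (R²), and the second is a common per-SNP approximation from summary statistics. The R² and effect values in the example are hypothetical; only the sample size of 112,117 is taken from the exposure dataset described in Section 2.1.

```python
def f_statistic(r2, n, k=1):
    """F statistic from variance explained r2, sample size n, and k instruments."""
    return (r2 / (1.0 - r2)) * ((n - k - 1) / k)

def f_statistic_approx(beta, se):
    """Per-SNP approximation: squared ratio of the effect to its standard error."""
    return (beta / se) ** 2

# Hypothetical SNP explaining 0.1% of the variance in the exposure.
f_value = f_statistic(r2=0.001, n=112_117)
print(f"F = {f_value:.1f}, weak instrument: {f_value < 10}")
```

An F statistic above 10 is the conventional rule of thumb for treating an instrument as sufficiently strong.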
2.5. Estimation of Causal Effects
To assess the causal effect of the exposure factor on the outcome variable, this study employed multiple statistical methods, with the Inverse Variance Weighted (IVW) method as the primary approach.
1) Inverse Variance Weighted (IVW) Method: The IVW method is the most commonly used approach for estimating causal effects [11]. It assumes that all selected genetic variants are valid instrumental variables (IVs) and that these IVs affect the outcome variable only through the exposure factor. Under this assumption, the effect of each IV on the outcome variable is expected to be proportional to its effect on the exposure factor. The causal effect of the exposure factor on the outcome variable can therefore be estimated as the weighted average of the per-IV ratios of the outcome effect to the exposure effect, with weights typically being the inverse variances of the ratio estimates. When these assumptions hold, the method provides an efficient estimate of the causal effect, equivalent to the slope of a weighted linear regression of the SNP-outcome associations on the SNP-exposure associations.
However, a critical limitation of the IVW method lies in its assumption that the intercept of the regression model is fixed at zero. This implies that if the effect of an IV on the exposure factor is zero, then its effect on the outcome variable should also be zero. This assumption can be challenged in practice due to pleiotropy, where some IVs may directly or indirectly influence the outcome variable, not solely through the exposure factor. In such cases, the causal effect estimate derived from the IVW method may be biased.
2) Causal Effect Estimation: In this study, the IVW method was used to calculate the causal effect of the exposure factor on the outcome variable. Specifically, by implementing the aforementioned statistical procedures, we obtained the overall effect size and its standard error, which were subsequently transformed into the final Odds Ratio (OR) and its 95% Confidence Interval (CI), together with the significance test value. A two-tailed test was used by default, and a causal effect was considered statistically significant when the P-value was less than 0.05.
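The estimation steps above can be sketched as follows. This is an illustrative fixed-effect IVW calculation in Python, not the TwoSampleMR routine used in the study, which may treat the standard error differently (for example through a random-effects model). The function names and toy inputs are hypothetical.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate: weighted regression through the origin.

    Equivalent to averaging the per-SNP ratios beta_outcome / beta_exposure
    with weights beta_exposure**2 / se_outcome**2.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    num = sum(w * bx * by
              for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    beta = num / den
    se = math.sqrt(1.0 / den)
    # Two-tailed P-value from a normal approximation.
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p_value

def to_odds_ratio(beta, se, z_crit=1.959964):
    """Transform a log-odds estimate into an OR with its 95% CI."""
    return (math.exp(beta),
            math.exp(beta - z_crit * se),
            math.exp(beta + z_crit * se))

# Toy summary statistics, made up for illustration only.
bx = [0.10, 0.20, 0.30]
by = [0.05, 0.10, 0.15]
sy = [0.01, 0.02, 0.03]
beta, se, p_value = ivw_estimate(bx, by, sy)
odds_ratio, ci_low, ci_high = to_odds_ratio(beta, se)
print(f"OR = {odds_ratio:.4f}, 95% CI = [{ci_low:.4f}, {ci_high:.4f}]")
```

Because the regression has no intercept term, any directional pleiotropy is absorbed into the slope, which is the limitation noted above.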
2.6. Statistical Analysis Software and Workflow
The statistical analyses for this study were conducted using R software (version 4.4.1). For the Two-Sample Mendelian Randomization (TSMR) analysis, the primary tool used was the TwoSampleMR package, developed by Gibran Hemani et al. This R package is publicly available on GitHub.
For visualization purposes, the study utilized the forest and TwoSampleMR packages within R to generate graphs relevant to the Mendelian Randomization analysis. These included Forest Plots, Scatter Plots, and Funnel Plots, which are essential for visualizing the results and assessing the robustness of the causal inferences made.
3. Results
3.1. Selection of SNPs Related to Alcohol Consumption and Instrumental Variables
By screening the GWAS summary data for alcohol consumption, we obtained SNPs with P < 5 × 10⁻⁵, LD parameter r² < 0.01, and a clumping window of 5000 kb. Calculation of the R² and F statistics for each SNP showed that the F statistics of all 247 candidate instrumental variables were greater than 10, indicating that weak instrument bias is unlikely and supporting the reliability of the MR analysis. The detailed information of each SNP in the drinking-related GWAS data is shown in Table 1.
3.2. Validation of Instrumental Variables
The 247 selected instrumental variables all met the criterion of P < 5 × 10⁻⁵, indicating that these SNPs are strongly associated with alcohol consumption and satisfy the relevance assumption for instrumental variables. After integrating the exposure (alcohol consumption) and outcome (type 2 diabetes) data, the corresponding P values, beta values, and standard errors (SE) for these SNPs in both the alcohol consumption and type 2 diabetes databases are summarized in Table 2.
Table 1. Detailed information of SNPs in the GWAS summary data for alcohol consumption.
Table 2. P-values, beta values, and SE values for SNPs in alcohol consumption and type 2 diabetes datasets.
3.3. Estimation of Causal Effects
The causal effect estimates obtained with the different methods are shown in Table 3. Overall, the analysis provides no evidence of a causal relationship between genetically predicted alcohol consumption and the risk of type 2 diabetes (T2D). The IVW results indicate no statistically significant association between alcohol consumption and an increased risk of T2D (OR = 1.0046, 95% CI: [0.8722, 1.1571], P = 0.9495). Similar results were obtained using the Weighted Median (WM) method (OR = 0.9013, 95% CI: [0.7777, 1.0445], P = 0.1671), Simple Mode (OR = 0.8635, 95% CI: [0.6201, 1.2023], P = 0.8635), and Weighted Mode (OR = 0.8306, 95% CI: [0.6258, 1.2023], P = 0.2003) (all Ps > 0.05), which supports the stability of the IVW results. The scatter plot, funnel plot, forest plot, and leave-one-out sensitivity analysis of the MR analysis are shown in Figures 2-5.
Table 3. MR estimates of the causal effect of alcohol consumption on type 2 diabetes using different methods.
Figure 5. Leave-one-out sensitivity analysis results.
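The logic behind a leave-one-out sensitivity analysis can be sketched in Python as follows. This is an illustration of the general technique under a fixed-effect IVW estimator, not the routine that produced Figure 5; the function names and the toy data are hypothetical.

```python
def ivw_beta(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW point estimate (weighted regression through the origin)."""
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den

def leave_one_out(beta_exposure, beta_outcome, se_outcome):
    """Re-estimate the IVW effect with each SNP removed in turn.

    Returns a list of (dropped_index, estimate) pairs.
    """
    n = len(beta_exposure)
    results = []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        estimate = ivw_beta([beta_exposure[j] for j in keep],
                            [beta_outcome[j] for j in keep],
                            [se_outcome[j] for j in keep])
        results.append((i, estimate))
    return results

# Toy data, made up for illustration: the last SNP is a deliberate outlier.
bx = [0.10, 0.12, 0.08, 0.15]
by = [0.020, 0.024, 0.016, 0.090]
sy = [0.01, 0.01, 0.01, 0.01]
for index, estimate in leave_one_out(bx, by, sy):
    print(f"without SNP {index}: beta = {estimate:.4f}")
```

If dropping a single SNP shifts the estimate substantially, that SNP is driving the result and deserves closer inspection for pleiotropy.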
4. Discussion
4.1. Summary
This paper reviews the research progress on the complex associations between alcohol consumption and T2D and explores their causal relationship using Mendelian Randomization (MR) methods. The relationship between alcohol consumption and T2D is not linear; moderate alcohol consumption may improve insulin sensitivity and reduce inflammation, thereby providing some protective effect against T2D. However, excessive alcohol consumption increases the risk of the disease, particularly through weight gain and exacerbation of insulin resistance. An individual’s genetic background may influence drinking habits, with certain genetic variants being associated with higher drinking tendencies, and these variants may also affect the risk of T2D.
The aim of this study was to further investigate the causal link between genetic predisposition to alcohol consumption and the risk of T2D using MR methods. The study used European population samples from the IEU OpenGWAS project database, including GWAS summary data related to alcohol consumption and T2D. The methods comprised the selection and validation of instrumental variables and the estimation of causal effects. During selection, SNPs significantly associated with alcohol consumption were chosen, and SNPs in linkage disequilibrium were removed. The causal effect was then estimated using the Inverse Variance Weighted (IVW) method.
The results show that, based on the selected 247 instrumental variables, there is no significant causal relationship between genetically predicted alcohol consumption and the risk of T2D. The IVW method results indicate that there is no statistically significant association between alcohol consumption and an increased risk of T2D (OR = 1.0046, 95% CI: [0.8722, 1.1571], P = 0.9495).
4.2. Limitations of This Study
The conclusions drawn from this study contradict those of previous studies, possibly due to the weak influence of available genetic variants on exposure factors. For example, while genes associated with alcohol consumption can influence drinking behavior to some extent, they are also affected by postnatal factors such as lifestyle, social environment, family, social interactions, and psychological factors.
Regarding the impact of drinking behavior on T2D, the focus is primarily on the dose of alcohol consumed per occasion. Although genes associated with alcohol consumption can affect the quantity and activity of alcohol-metabolizing enzymes in the body, they only relate to the upper limit of alcohol consumption per occasion and do not have a direct relationship with the actual amount of alcohol consumed.