Analysis of Gestational Diabetes Mellitus (GDM) and Its Impact on Maternal and Fetal Health: A Comprehensive Dataset Study Using Data Analytic Tool Power BI

Abstract

Gestational Diabetes Mellitus (GDM) is a significant health concern affecting pregnant women worldwide. It is characterized by elevated blood glucose levels during pregnancy and poses risks to both maternal and fetal health. Maternal complications of GDM include an increased risk of developing type 2 diabetes later in life, as well as hypertension and preeclampsia during pregnancy. Fetal complications may include macrosomia (high birth weight), birth injuries, and an increased risk of developing metabolic disorders later in life. Understanding the demographics, risk factors, and biomarkers associated with GDM is crucial for effective management and prevention strategies. This research addresses these aspects through the analysis of a dataset comprising 600 pregnant women. By exploring the demographics of the dataset and applying data modeling techniques, the study seeks to identify key risk factors associated with GDM. By analyzing various biomarkers, it also aims to gain insights into the physiological mechanisms underlying GDM and its implications for maternal and fetal health. The significance of this research lies in its potential to inform clinical practice and public health policies related to GDM. By identifying demographic patterns and risk factors, healthcare providers can better tailor screening and intervention strategies for pregnant women at risk of GDM. Additionally, insights into biomarkers associated with GDM may contribute to the development of novel diagnostic tools and therapeutic approaches. Ultimately, by enhancing our understanding of GDM, this research aims to improve maternal and fetal outcomes and reduce the burden of this condition on healthcare systems and society. However, it is important to acknowledge the limitations of the dataset used in this study. Further research using larger and more diverse datasets, analyzed with tools such as Power BI and other advanced data analysis techniques, is warranted to corroborate and expand upon these findings. This underscores the ongoing need for continued investigation into GDM to refine our understanding and improve clinical management strategies.

Citation:

Hashim, S. and McAdams, A. (2024) Analysis of Gestational Diabetes Mellitus (GDM) and Its Impact on Maternal and Fetal Health: A Comprehensive Dataset Study Using Data Analytic Tool Power BI. Journal of Data Analysis and Information Processing, 12, 232-247. doi: 10.4236/jdaip.2024.122013.

1. Introduction

Gestational Diabetes Mellitus (GDM) is a significant concern during pregnancy because of its potential health implications for both the mother and the fetus. It is characterized by abnormally elevated blood glucose levels during pregnancy and is a prevalent complication, affecting 3% to 10% of pregnancies. Typically diagnosed between the 22nd and 26th weeks of gestation, GDM carries serious risks for both expectant mothers and infants. Potential complications include respiratory issues, metabolic disorders, premature delivery, and excessive fetal weight gain, which may complicate delivery. Although GDM usually resolves after childbirth, affected women remain at increased risk of developing type 2 diabetes, with a cumulative incidence of 30% to 50% within 5 to 10 years of the index pregnancy. Numerous studies suggest that early medical intervention in the first or second trimester can prevent the high-risk complications associated with GDM [1] .

Data cleaning is an essential prerequisite in the data analysis process to ensure the reliability and accuracy of results. This paper presents a detailed account of the data cleaning steps undertaken on the dataset, highlighting the techniques used to address various data quality issues. The dataset, consisting of medical and fetal information, was subjected to rigorous cleaning, restructuring, and transformation. The methodology and strategies implemented are outlined, with each step explained and exemplified, and the paper emphasizes the importance of data preprocessing in improving the overall quality of the dataset for subsequent analysis. The dataset consists of 600 pregnant women, of whom 74 were diagnosed with GDM, 7 were prediabetic, and the remainder were non-diabetic. Key demographic variables include the age, race, and BMI distribution of the participants. All biomarkers in the dataset, with their descriptions and ranges, are listed in Table 1. In this investigation of GDM, we use Microsoft Power BI to extract insights from the dataset. Power BI connects to a variety of data sources, allowing disparate sets of information to be combined in a single analytical model.

Table 1. Data definitions of the columns used in the dataset.

2. Data Collection

The data for this study on gestational diabetes were obtained from https://physionet.com/, which provides a dataset of 600 pregnant women. The dataset includes details such as age, race, BMI, diabetes status (gestational diabetes, prediabetic, non-diabetic), and multiple visits for each woman during pregnancy and after delivery. To ensure the accuracy of the sample data, the following measures were taken:

Data Source Validation: PhysioNet is a reputable source of physiological data, which increases the likelihood that the data are accurate and reliable. Nevertheless, we conducted preliminary assessments of the dataset's integrity and validity.

Data Cleaning: Prior to analysis, we performed data cleaning processes to identify and rectify any errors, inconsistencies, or missing values in the dataset. This step helped improve data quality and accuracy.

Sample Representativeness: The sample of 600 pregnant women may have been selected to represent a diverse population, which would support the generalizability of the findings. This involves considering factors such as age, race, BMI, and diabetes status so that the sample reflects the broader population of pregnant women.

Statistical Analysis: Statistical techniques were used to detect outliers, assess distributions, and examine patterns within the data. These analyses helped identify anomalies and ensure the accuracy and consistency of the sample data.
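The outlier screening mentioned under Statistical Analysis can be illustrated with Tukey's interquartile-range (IQR) fences. This is a minimal sketch, not the authors' procedure: the paper does not say which outlier method was applied, and the helper name `iqr_outliers`, the conventional 1.5 multiplier, and the sample BMI values are assumptions made for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper; k = 1.5 is a common convention, not a
    parameter reported in the paper.
    """
    # quantiles(n=4) returns the three quartile cut points [Q1, Q2, Q3]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Illustrative BMI values; 295.0 mimics a misplaced decimal point (29.5).
bmi = [22.1, 24.8, 27.3, 29.5, 31.2, 26.0, 295.0, 23.4]
print(iqr_outliers(bmi))  # -> [295.0]
```

Fences based on quartiles are preferred here over mean plus or minus a few standard deviations because a single gross entry error, such as a shifted decimal point, inflates the standard deviation and can hide itself.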

3. Literature Review

Medical informatics has its roots in the 1950s; it emerged first in the United States and later extended its influence to Europe and to developing countries in the East. The first scholarly work outlining the concept of using computer technologies in medicine was published by Robert S. Ledley and Lee Browning Lusted in 1959 [2] . As the authors describe, technology and health have developed together, and the two domains cannot be separated. In recent years, applying data analytics to the large amounts of healthcare data collected daily has opened numerous new opportunities and challenges in the field of medical informatics. By definition, healthcare informatics refers to the process of leveraging information technologies to improve the quality of healthcare [3] . The shift to digitizing healthcare data is a direct outcome of the evolution of big data. The substantial increase in data volume in recent years led to the identification of a distinct domain known as big data; in information technology, the term “big data” commonly refers to datasets whose size and complexity exceed what traditional databases can manage. Healthcare activities generate large amounts of such data, and analytical procedures have been combined with data management technologies to derive actionable judgments from it [4] . Big data analytics promises early detection, prediction, and prevention of disease, and can help improve quality of life [5] . Numerous machine learning models have been developed to predict GDM and related pregnancy complications. In addition to these models, an analytical approach could prove to be an efficient and cost-effective method. Therefore, this paper investigates the use of analytics through the Power BI analytical tool on patients with gestational diabetes.

4. Data Modeling

To facilitate effective data analysis, the dataset is divided into five distinct tables, as shown in Figure 1: Visit 1, Visit 2, Visit 3, Visit 4, and the difference between Visit 1 and Visit 3. This allows each pregnancy visit to be examined on its own and specific visits to be compared.
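The visit-based structure of the data model can be sketched in code. The example assumes a hypothetical long-format input in which each row holds a participant ID, a visit number, a field name, and a value; it groups the rows into one table per visit and derives a difference table. The field names (`participant_id`, `weight_kg`), the sample rows, and the direction of the difference (Visit 3 minus Visit 1) are assumptions, and the paper builds these tables in Power BI rather than in Python.

```python
from collections import defaultdict

# Hypothetical long-format rows: (participant_id, visit, field, value).
rows = [
    ("P001", 1, "weight_kg", 68.0), ("P001", 3, "weight_kg", 74.5),
    ("P002", 1, "weight_kg", 81.2), ("P002", 2, "weight_kg", 83.0),
    ("P002", 3, "weight_kg", 86.9),
]


def split_by_visit(rows):
    """Group rows into {visit: {participant_id: {field: value}}}."""
    tables = defaultdict(lambda: defaultdict(dict))
    for pid, visit, field, value in rows:
        tables[visit][pid][field] = value
    return tables


def visit_difference(tables, first=1, second=3):
    """Per-participant change (second minus first) for fields present at both visits."""
    diff = {}
    for pid, before in tables[first].items():
        after = tables[second].get(pid)
        if after is None:
            continue  # participant missed one of the two visits
        diff[pid] = {f: round(after[f] - before[f], 2) for f in before if f in after}
    return diff


tables = split_by_visit(rows)
print(sorted(tables))                    # -> [1, 2, 3]
print(visit_difference(tables)["P002"])  # -> {'weight_kg': 5.7}
```

Keeping one table per visit keyed by participant mirrors the relationship-based model in Figure 1: any pair of visits can be joined on the participant key to produce a comparison table.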

5. Methodology

Data cleaning plays a pivotal role in ensuring the integrity and usability of a dataset for analytical purposes. In this research, we applied a systematic approach to data cleaning and preprocessing to a complex dataset comprising medical and fetal data. The steps below describe the specific techniques used to address data quality issues and enhance the dataset's utility.

Figure 1. Data modeling.

Firstly, the dataset was partitioned into five sub-tables: Visit 1, Visit 2, Visit 3, Difference in Visit 1 and 3, and Fetal Details. The steps are:

1) Removal of Null and Redundant Columns: Null and redundant columns were removed to simplify the dataset, improve clarity, and reduce computational overhead.

2) Decimal Point Rounding: Columns with excessive decimal places were rounded to a consistent precision.

3) Uniform Case Conversion: Inconsistent letter case (e.g., “Yes” and “yes”) was standardized to ensure uniformity.

4) Data Type Transformation: Column data types were adjusted to match the content of each column, ensuring accurate representation.

5) Column Name Modification: Column names were modified for clarity and ease of interpretation.

6) Outlier Detection and Handling: Outliers in column values were identified and addressed through methods such as correcting misplaced decimal points and recalculating erroneous values.

7) Unwanted Column Removal: Columns containing repetitive or unnecessary information were removed to streamline the dataset.

8) Text Value Replacement: Conditional columns were created to replace inconsistent text values with standardized ones, addressing inconsistencies in data entry.

9) Delimiter Standardization: Columns with multiple delimiters were processed to achieve a uniform data pattern.

10) Symbol Replacement: Symbols such as “<” and “>” were replaced with values slightly below or above the stated limit to facilitate numeric analysis.

11) Column Splitting: Columns were split using delimiters to consolidate data into two categories (e.g., “Yes” and “No”).

12) Generalization of Values: Similar values were generalized or replaced to ensure consistency and reduce complexity. For instance, ethnicities were categorized into five major groups.
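Several of the cleaning steps above (3, 9, 10, and 12) can be expressed as small transformation functions. This is a minimal Python sketch under stated assumptions; the authors performed these steps in Power BI. The helper names, the 0.01 offset used for censored values, and the ethnicity mapping are hypothetical. In particular, the paper does not list its five ethnic groups.

```python
import re


def normalize_case(value):
    """Step 3: collapse variants such as 'yes', ' YES' and 'Yes' into one spelling."""
    return value.strip().capitalize()


def standardize_delimiters(value, target=";"):
    """Step 9: unify mixed separators (',', '/', '|', ';') into a single delimiter."""
    parts = [part.strip() for part in re.split(r"[,/|;]", value) if part.strip()]
    return target.join(parts)


def replace_censored(value, offset=0.01):
    """Step 10: convert censored lab results such as '<0.5' or '>100' to numbers.

    '<x' becomes slightly below x and '>x' slightly above x. The offset is a
    hypothetical choice; the paper does not report the value it used.
    """
    text = value.strip()
    if text.startswith("<"):
        return float(text[1:]) - offset
    if text.startswith(">"):
        return float(text[1:]) + offset
    return float(text)


# Step 12: hypothetical mapping of free-text entries to five broad groups.
ETHNICITY_GROUPS = {
    "caucasian": "White", "white": "White",
    "african american": "Black", "black": "Black",
    "hispanic": "Hispanic", "latina": "Hispanic",
    "asian": "Asian",
}


def generalize_ethnicity(value):
    """Map an entry to a broad group; anything unmapped falls into 'Other'."""
    return ETHNICITY_GROUPS.get(value.strip().lower(), "Other")


print(normalize_case(" YES"))                             # -> Yes
print(standardize_delimiters("insulin, metformin/diet"))  # -> insulin;metformin;diet
print(replace_censored(">100"))
print(generalize_ethnicity("Caucasian"))                  # -> White
```

Writing each step as a pure function of one cell value keeps the steps independent, so they can be applied column by column in any order, much like applied steps in a Power Query editor.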

6. Demographics

The demographics of the dataset are shown in Figure 2. The dataset comprises 600 pregnant women, of whom 50 were diagnosed with GDM, 7 with prediabetes, and the remainder were non-diabetic. Many of the patients were aged 30 or above, and most were White. The dataset covers four key visits during pregnancy and includes a substantial proportion of women with a high BMI.
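A demographic summary like the one in Figure 2 reduces to a few counts. The sketch below tallies diabetes status, groups BMI into the WHO adult classes (underweight below 18.5, normal below 25, overweight below 30, obese at 30 or above, in kg/m²), and computes the share of women aged 30 or over. The use of WHO classes, the field names, and the toy records are assumptions; the paper does not specify how BMI was grouped.

```python
from collections import Counter


def bmi_category(bmi):
    """WHO adult BMI classes (kg/m^2); applying them here is an assumption."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal"
    if bmi < 30:
        return "Overweight"
    return "Obese"


def summarize(participants):
    """Count diabetes status and BMI class, and the share of women aged 30 or above."""
    status = Counter(p["status"] for p in participants)
    bmi = Counter(bmi_category(p["bmi"]) for p in participants)
    aged_30_plus = sum(p["age"] >= 30 for p in participants) / len(participants)
    return status, bmi, aged_30_plus


# Toy records with hypothetical field names, not rows from the study dataset.
participants = [
    {"age": 34, "bmi": 31.5, "status": "GDM"},
    {"age": 28, "bmi": 23.0, "status": "Non-diabetic"},
    {"age": 31, "bmi": 27.4, "status": "Non-diabetic"},
    {"age": 25, "bmi": 33.1, "status": "Prediabetic"},
]
status, bmi, share = summarize(participants)
print(status["Non-diabetic"], bmi["Obese"], share)  # -> 2 2 0.5
```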

7. Identification and Risk Factor Analysis

As Figure 3 shows, the analysis identified 50 GDM cases. Risk factors among these cases included obesity (33%), a family history of type 2 diabetes, advanced maternal age (29 women above 30), and a small number of cases with prior GDM. The effect of ethnicity was inconclusive because of the skewed ethnic distribution of the sample [6] .

Figure 2. Demographics of dataset.

Figure 3. Risk analysis.
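Risk-factor percentages such as the obesity figure above come from the prevalence of each factor within a group. The sketch below compares prevalence in GDM and non-GDM participants. The field names and toy records are hypothetical, and the paper reports its percentages from Power BI visuals rather than from code like this.

```python
def prevalence(records, factor, gdm):
    """Percentage of participants in one group (gdm=True or False) who have `factor`."""
    group = [r for r in records if r["gdm"] == gdm]
    if not group:
        return 0.0
    return 100 * sum(r[factor] for r in group) / len(group)


# Toy records with hypothetical field names; not data from the study.
records = [
    {"gdm": True, "obese": True, "family_history_t2d": True, "age_over_30": True},
    {"gdm": True, "obese": False, "family_history_t2d": True, "age_over_30": True},
    {"gdm": True, "obese": True, "family_history_t2d": False, "age_over_30": False},
    {"gdm": True, "obese": False, "family_history_t2d": False, "age_over_30": True},
    {"gdm": False, "obese": False, "family_history_t2d": False, "age_over_30": True},
    {"gdm": False, "obese": True, "family_history_t2d": False, "age_over_30": False},
]

for factor in ("obese", "family_history_t2d", "age_over_30"):
    print(f"{factor}: GDM {prevalence(records, factor, True):.0f}%, "
          f"non-GDM {prevalence(records, factor, False):.0f}%")
```

Reporting the non-GDM prevalence alongside the GDM figure matters: a factor seen in a third of GDM cases is only informative relative to how common it is among women without GDM.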

The 50 GDM cases were identified from oral glucose tolerance test (OGTT) and glucose challenge test (GCT) results recorded during the second trimester (V2), and were corroborated by the “Dx with GDM” column in the V4 table, as shown in Figure 4. Risk factors for GDM, including obesity, family history, previous GDM, age, and ethnicity, were analyzed, providing insights into potential predispositions [7] .
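The paper does not state which diagnostic thresholds were applied to the OGTT and GCT results. For illustration only, the sketch below flags GDM using the IADPSG criteria for a 75 g OGTT (fasting ≥ 92 mg/dL, 1 hour ≥ 180 mg/dL, or 2 hours ≥ 153 mg/dL, with any single value sufficing). It then measures how often that flag agrees with a recorded diagnosis, analogous to checking against the “Dx with GDM” column. The choice of criteria, the field names, and the records are assumptions. GCT screening is not modeled.

```python
# IADPSG 75 g OGTT thresholds in mg/dL; an assumed choice of criteria.
IADPSG_THRESHOLDS = {"fasting": 92, "one_hour": 180, "two_hour": 153}


def ogtt_positive(result):
    """GDM if any available glucose value meets or exceeds its threshold."""
    return any(
        result.get(key) is not None and result[key] >= limit
        for key, limit in IADPSG_THRESHOLDS.items()
    )


def agreement(records):
    """Share of records where the OGTT-based flag matches the recorded diagnosis."""
    matches = sum(ogtt_positive(r["ogtt"]) == r["dx_with_gdm"] for r in records)
    return matches / len(records)


# Hypothetical records; None marks a missing measurement.
records = [
    {"ogtt": {"fasting": 95, "one_hour": 170, "two_hour": 140}, "dx_with_gdm": True},
    {"ogtt": {"fasting": 85, "one_hour": 150, "two_hour": 120}, "dx_with_gdm": False},
    {"ogtt": {"fasting": 88, "one_hour": 185, "two_hour": None}, "dx_with_gdm": True},
    {"ogtt": {"fasting": 80, "one_hour": 140, "two_hour": 160}, "dx_with_gdm": False},
]
print([ogtt_positive(r["ogtt"]) for r in records])  # -> [True, False, True, True]
print(agreement(records))                           # -> 0.75
```

Disagreements, such as the last record here, are the cases worth reviewing by hand, because they point either to a data-entry error or to a different diagnostic criterion being used in practice.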

Various biomarkers, such as the liver enzyme alanine aminotransferase (ALT), kidney markers (albumin and creatinine), platelets, and vitamin D, were examined to understand their associations with GDM (Figure 5). Notable findings include elevated ALT levels in GDM patients, potential kidney-related complications, reduced platelet counts, and a correlation between vitamin D deficiency and GDM.
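A biomarker association of the kind described above can be quantified in two simple ways: by comparing group means and by computing the point-biserial correlation, which is the Pearson correlation between a continuous value and a 0/1 label. The sketch below shows both for ALT. The values are toy numbers, not study results, and the paper does not state which statistic underlies its reported correlations.

```python
from statistics import mean, pstdev


def group_means(values, labels):
    """Mean biomarker value for GDM (label 1) and non-GDM (label 0) participants."""
    gdm = [v for v, y in zip(values, labels) if y == 1]
    non = [v for v, y in zip(values, labels) if y == 0]
    return mean(gdm), mean(non)


def point_biserial(values, labels):
    """Pearson correlation between a continuous biomarker and a 0/1 GDM label."""
    mx, my = mean(values), mean(labels)
    # Population covariance divided by population standard deviations gives Pearson r.
    cov = mean((x - mx) * (y - my) for x, y in zip(values, labels))
    return cov / (pstdev(values) * pstdev(labels))


# Toy ALT values (U/L) and GDM labels; illustrative numbers, not study results.
alt = [18, 22, 35, 40, 15, 20]
gdm = [0, 0, 1, 1, 0, 0]
print(group_means(alt, gdm))  # -> (37.5, 18.75)
r = point_biserial(alt, gdm)
print(round(r, 2))
```

A positive r means higher ALT tends to accompany a GDM diagnosis. With only 50 GDM cases, a significance test or confidence interval would be needed before treating such a correlation as a finding.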

The analysis revealed several maternal complications associated with GDM, as shown in Figure 6, including cesarean sections (C-sections), gestational hypertension, and the use of interventions during delivery. However, there were no instances of preeclampsia or eclampsia in the dataset, so the association between GDM and these conditions could not be assessed and requires further exploration [8] .
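One way to express how strongly a complication such as C-section is associated with GDM is the relative risk. The relative risk is the proportion of GDM pregnancies with the outcome divided by the proportion of non-GDM pregnancies with it. The sketch below uses the Katz log method for a 95% confidence interval. It returns `None` when a group has no events, which is the situation the paper reports for preeclampsia and eclampsia. The counts are hypothetical, and the paper does not report relative risks.

```python
import math


def relative_risk(a, n1, c, n0, z=1.96):
    """Relative risk for GDM (a events of n1) versus non-GDM (c events of n0).

    Returns (rr, lower, upper) with a Katz log-method confidence interval,
    or None when either group has zero events and the ratio is undefined.
    """
    if a == 0 or c == 0:
        return None
    rr = (a / n1) / (c / n0)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)  # standard error of ln(rr)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


# Hypothetical counts, not from the study: C-sections by GDM status.
print(relative_risk(20, 50, 100, 500))
print(relative_risk(0, 50, 4, 500))  # -> None
```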

Figure 4. OGTT/GCT visualization based on other biomarkers.

Figure 5. Visualization of liver enzymes.

Figure 6. Visualization of maternal and fetal complication.

Fetal Complications: The complications observed in babies born to mothers with GDM included macrosomia, low Apgar scores, antenatal steroid use, and abnormal spinal test results. These findings underscore the importance of managing GDM to mitigate fetal risks [9] .
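Fetal complications like these are usually derived from newborn records by applying fixed cut-offs. The sketch below flags macrosomia as a birth weight of 4000 g or more and a low Apgar score as a 5-minute score below 7, and records antenatal steroid use. It then computes the percentage of newborns carrying each flag. Both cut-offs are common conventions that the paper does not state, and the field names and records are hypothetical.

```python
# Assumed cut-offs: the paper does not define them.
MACROSOMIA_GRAMS = 4000  # birth weight at or above 4000 g
LOW_APGAR_5_MIN = 7      # 5-minute Apgar score below 7


def fetal_flags(baby):
    """Return the complication flags for one newborn record (hypothetical fields)."""
    flags = []
    if baby["birth_weight_g"] >= MACROSOMIA_GRAMS:
        flags.append("macrosomia")
    if baby["apgar_5min"] < LOW_APGAR_5_MIN:
        flags.append("low_apgar")
    if baby["antenatal_steroids"]:
        flags.append("antenatal_steroids")
    return flags


def complication_rate(babies, flag):
    """Percentage of newborns carrying a given flag."""
    return 100 * sum(flag in fetal_flags(b) for b in babies) / len(babies)


babies = [
    {"birth_weight_g": 4210, "apgar_5min": 9, "antenatal_steroids": False},
    {"birth_weight_g": 3350, "apgar_5min": 6, "antenatal_steroids": True},
    {"birth_weight_g": 3900, "apgar_5min": 8, "antenatal_steroids": False},
    {"birth_weight_g": 4050, "apgar_5min": 9, "antenatal_steroids": False},
]
print(fetal_flags(babies[1]))                   # -> ['low_apgar', 'antenatal_steroids']
print(complication_rate(babies, "macrosomia"))  # -> 50.0
```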

8. Conclusion

This analysis offers insights into the demographics, risk factors, biomarkers, and complications associated with GDM in a healthcare dataset of 600 pregnant women. Despite its contributions, this study acknowledges the limitations of the dataset and highlights the need for further research with larger samples to gain a more nuanced understanding of GDM and its consequences. This research serves as a foundation for improved management and care of GDM during pregnancy, ultimately benefiting both mothers and their infants.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] El-Rashidy, N., ElSayed, N.E., El-Ghamry, A. and Talaat, F.M. (2022) Prediction of Gestational Diabetes Based on Explainable Deep Learning and Fog Computing. Soft Computing, 26, 11435-11450.
https://doi.org/10.1007/s00500-022-07420-1
[2] Masic, I. (2014) Five Periods in Development of Medical Informatics. Acta Informatica Medica, 22, 44-48.
[3] Wang, F. and Stiglic, G. (2015) Data Analytics in Healthcare Informatics. 2015 International Conference on Healthcare Informatics, Dallas, TX, 21-23 October 2015, 444.
https://doi.org/10.1109/ICHI.2015.62
[4] Awrahman, B.J., Aziz Fatah, C. and Hamaamin, M.Y. (2022) A Review of the Role and Challenges of Big Data in Healthcare Informatics and Analytics. Computational Intelligence and Neuroscience, 2022, Article ID 5317760.
https://doi.org/10.1155/2022/5317760
[5] Rehman, A., Naz, S. and Razzak, I. (2022) Leveraging Big Data Analytics in Healthcare Enhancement: Trends, Challenges and Opportunities. Multimedia Systems, 28, 1339-1371.
https://doi.org/10.1007/s00530-020-00736-8
[6] Razo-Azamar, M., et al. (2023) An Early Prediction Model for Gestational Diabetes Mellitus Based on Metabolomic Biomarkers. Diabetology & Metabolic Syndrome, 15, Article Number: 116.
https://doi.org/10.1186/s13098-023-01098-7
[7] Belsti, Y., et al. (2023) Comparison of Machine Learning and Conventional Logistic Regression-Based Prediction Models for Gestational Diabetes in an Ethnically Diverse Population; the Monash GDM Machine Learning Model. International Journal of Medical Informatics, 179, 105228.
https://doi.org/10.1016/j.ijmedinf.2023.105228
[8] Snyder, B.M., et al. (2020) Early Pregnancy Prediction of Gestational Diabetes Mellitus Risk Using Prenatal Screening Biomarkers in Nulliparous Women. Diabetes Research and Clinical Practice, 163, 108139.
https://doi.org/10.1016/j.diabres.2020.108139
[9] Cooray, S.D., Boyle, J.A., Soldatos, G., Wijeyaratne, L.A. and Teede, H.J. (2019) Prognostic Prediction Models for Pregnancy Complications in Women with Gestational Diabetes: A Protocol for Systematic Review, Critical Appraisal and Meta-Analysis. Systematic Reviews, 8, Article Number: 270.
https://doi.org/10.1186/s13643-019-1151-0

Copyright © 2024 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.