A Study on Near-Infrared Non-Invasive Blood Glucose Concentration Regression Prediction Based on PSO-MKL-SVR

Abstract

To improve the accuracy of non-invasive blood glucose concentration prediction from near-infrared spectra, we used the Particle Swarm Optimization (PSO) algorithm to optimize the hyperparameters of Multiple Kernel Learning Support Vector Regression (MKL-SVR). With the optimized hyperparameters, we established a non-invasive blood glucose regression model, referred to as the PSO-MKL-SVR model, and compared it with the PSO-SVR model. On a dataset comprising ten volunteers, the PSO-MKL-SVR model improved precision, with a 16.03% reduction in Mean Squared Error and a 0.29% increase in the Squared Correlation Coefficient. The proportion of predictions falling within Zone A of Clarke's Error Grid Analysis was also 0.14% higher. In addition, the PSO-MKL-SVR model ran faster than the PSO-SVR model.

Share and Cite:

Yang, X. and Zhou, L. (2024) A Study on Near-Infrared Non-Invasive Blood Glucose Concentration Regression Prediction Based on PSO-MKL-SVR. Journal of Applied Mathematics and Physics, 12, 1-11. doi: 10.4236/jamp.2024.121001.

1. Introduction

Diabetes has emerged as a significant challenge in the global health sector. According to data from the International Diabetes Federation, approximately 537 million adults worldwide were living with diabetes in 2021, and the disease was responsible for 6.7 million deaths. The number of cases is expected to continue rising in the coming years. Diabetes has a profound impact on patients' lives, posing risks of serious complications such as cardiovascular diseases, vision loss, and kidney disorders. Although traditional blood glucose monitoring methods, especially those reliant on blood sampling, have played a crucial role in diabetes management, they have several limitations. Frequent blood sampling not only causes discomfort to patients but also carries the risk of infections and other complications. Furthermore, these conventional methods cannot provide continuous monitoring, whereas diabetes management often demands more detailed and real-time data.

To overcome the limitations of traditional methods, non-invasive blood glucose detection technology has emerged as a promising alternative. This approach aims to monitor the blood glucose levels of diabetes patients through non-intrusive methods such as laser spectroscopy and infrared spectroscopy, providing convenience, reducing infection risks, and offering more frequent blood glucose data for improved diabetes management. Recent years have seen significant research advances in non-invasive technologies for blood glucose detection [1] [2].

Machine learning-based near-infrared non-invasive blood glucose detection focuses on integrating human near-infrared spectral information with machine learning. This integration establishes regression models that reveal the complex nonlinear relationship between near-infrared spectral information and blood glucose concentration values. The goal is to alleviate the discomfort of invasive testing for diabetes patients while also reducing overall testing costs. Ongoing research in intelligent bio-optoelectronic information analysis shows sustained interest and development in applying machine learning methods such as neural networks and support vector machines.

Among these approaches, Wang et al. [3] introduced a deep belief network to extract high-dimensional features from non-invasive blood glucose spectra, using support vector machines to predict blood glucose concentration values. Nie et al. [4] employed imaging photoplethysmography to capture changes in arterial blood volume, establishing PCR, PLS, SVR, and RFR models. Aloraynan et al. [5] used ensemble machine learning methods to develop a single-wavelength mid-infrared spectral blood glucose classification model, achieving an accuracy rate of 90.4%. Han et al. [6] combined a non-linear stacked autoencoder deep neural network with linear partial least squares regression; the resulting hybrid model predicted significantly better than support vector machines or partial least squares regression alone.

Support Vector Machine (SVM), a prominent algorithm in machine learning, uses a kernel function to map data into a high-dimensional feature space. The Support Vector Machine model based on Multiple Kernel Learning (MKL) uses distinct parameters to create multiple kernels [7]. It then trains the weights of each kernel, selecting the optimal combination of kernel functions for regression. In MKL-SVM, there are typically multiple kernel function weights and hyperparameters that require adjustment. Grid search is commonly employed for hyperparameter optimization, but it often involves lengthy iterations. Population-based intelligent optimization algorithms are well suited to complex, high-dimensional, or nonlinear optimization problems. Particle Swarm Optimization (PSO) is one such algorithm, drawing inspiration from the collective behavior of social animals such as birds or fish. PSO guides and optimizes the search process through cooperation and competition among the particles in the swarm [8].

Zhang et al. [9] proposed a method that combines Particle Swarm Optimization with the Generalized Regression Neural Network (GRNN), reporting a root mean square error of 0.26 in non-invasive blood glucose detection. Ren et al. [10] introduced an enhanced PSO combined with the Wavelet Neural Network, improving the accuracy of blood glucose concentration prediction. Additionally, Li et al. [11] incorporated fused deep features into a Particle Swarm Optimization and Multiple Kernel Learning Support Vector Machine model, achieving better results than a PSO Support Vector Machine. These results suggest that methods combining PSO, MKL, and SVM can considerably improve the predictive accuracy of models.

This paper employs the PSO-MKL-SVR method, where SVR denotes Support Vector Regression, to construct a non-invasive near-infrared blood glucose prediction model. The basic idea is to linearly combine Gaussian kernel functions with different kernel parameters and to integrate this composite kernel function with SVR to form the regression model. The PSO method is then used to search for the optimal parameters, including the MKL weight parameters and the SVR penalty factor. These optimized parameters define the final non-invasive near-infrared blood glucose prediction model.
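The composite kernel described above can be illustrated with a minimal sketch. It assumes NumPy; the function names, the kernel widths in `gammas`, and the weights are hypothetical values chosen for illustration, not the settings used in this study.

```python
import numpy as np

def gaussian_kernel(X, Z, gamma):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||x - z||^2)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def composite_kernel(X, Z, gammas, weights):
    """Weighted linear combination of Gaussian kernels with different widths."""
    return sum(w * gaussian_kernel(X, Z, g) for w, g in zip(weights, gammas))

# Hypothetical example: 5 random "spectra" with 149 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 149))
gammas = [0.001, 0.01, 0.1]    # assumed kernel parameters
weights = [0.5, 0.3, 0.2]      # assumed MKL weights (non-negative)
K = composite_kernel(X, X, gammas, weights)
```

With non-negative weights the combination remains a valid (positive semi-definite) kernel, so `K` could be passed to any SVR solver that accepts a precomputed kernel matrix.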

2. The PSO-MKL-SVR Model

MKL merges multiple kernel functions through multi-layer linear combinations, forming a kernel structure that resembles a neural network, as illustrated in Figure 1. In the first layer ($l = 1$), the data are input in the form of data pairs $(X_i, X_j)$, and the weight parameters

$$
\left(w_{1,q}^{1}, w_{2,q}^{1}, \ldots, w_{h_1,q}^{1}\right)
= \left(w_{1,q}^{1}, w_{2,q}^{1}, \ldots, w_{M,q}^{1}\right)
= \begin{pmatrix}
w_{1,1}^{1} & w_{2,1}^{1} & \cdots & w_{M,1}^{1} \\
w_{1,2}^{1} & w_{2,2}^{1} & \cdots & w_{M,2}^{1} \\
\vdots & \vdots & \ddots & \vdots \\
w_{1,q_1}^{1} & w_{2,q_1}^{1} & \cdots & w_{M,q_1}^{1}
\end{pmatrix}
$$

are used to linearly combine the $M$ base kernels $K_h$, resulting in

$$
K_{q}^{1}(X_i, X_j) = \sum_{h=1}^{M} w_{h,q}^{1} K_h(X_i, X_j).
$$

Figure 1. MKL network architecture.

In this setting, $q = 1, 2, \ldots, q_1$, where $q_1$ represents the number of kernel functions in the first hidden layer. The output $K_{q}^{1}(X_i, X_j)$ not only holds the results of the $q_1$ kernels in the first hidden layer of multiple kernel learning but also serves as the input to the subsequent layers. After this multi-layer combination, the final composite kernel function for the data pair $(X_i, X_j)$ in the output layer of multiple kernel learning is:

$$
K_{CK}(X_i, X_j) = \sum_{h=1}^{h_{L+1}} w_{h,1}^{L+1} K_{h}^{L}(X_i, X_j),
$$

where for the $l$th layer, $h = 1, 2, \ldots, h_l$ and $q = 1, 2, \ldots, h_{l+1}$.
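The layer-by-layer combination can be sketched as repeated weighted sums over stacks of kernel (Gram) matrices. This is a minimal illustration assuming NumPy; the function name `multilayer_kernel`, the toy base kernels, and the weight values are hypothetical.

```python
import numpy as np

def multilayer_kernel(base_kernels, layer_weights):
    """Propagate base kernel matrices through successive linear layers.

    base_kernels: array of shape (M, n, n), one Gram matrix per base kernel.
    layer_weights: list of arrays; the l-th has shape (h_l, h_{l+1}) and the
        last one has a single output column (the output layer).
    """
    kernels = np.asarray(base_kernels, dtype=float)
    for W in layer_weights:
        # new_kernels[q] = sum over h of W[h, q] * kernels[h]
        kernels = np.einsum("hq,hij->qij", W, kernels)
    return kernels[0]

# Toy example: three base kernels on two samples (scaled identity matrices).
base = np.stack([np.eye(2), 2 * np.eye(2), 3 * np.eye(2)])
W1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])      # first hidden layer: 3 inputs -> 2 kernels
W2 = np.array([[0.5], [0.5]])    # output layer: 2 inputs -> 1 composite kernel
K_ck = multilayer_kernel(base, [W1, W2])
```

In this toy case the hidden layer produces the kernels 4I and 5I, and the output layer averages them. Because every layer is linear, the whole stack is equivalent to a single linear combination of the base kernels.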

In the SVR model, input samples are non-linearly mapped to a high-dimensional feature space. The regression model is subsequently established within this high-dimensional feature space as follows:

$$
f(x) = \omega^{T} \phi(x) + b,
$$

where $\omega$ is the normal vector to the hyperplane, and $b$ is the intercept. For a given dataset $X = \{X_i\}_{i=1}^{n}$, $Y = \{y_i\}_{i=1}^{n}$, based on the literature [12], the SVR constrained optimization problem is given by:

$$
\begin{aligned}
\min_{w_{h,q}^{l},\, \omega,\, b} \quad & \frac{1}{2}\|\omega\|^{2} + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right) \\
\text{s.t.} \quad & f(X_i) - y_i \le \varepsilon + \xi_i, \\
& y_i - f(X_i) \le \varepsilon + \xi_i^{*}, \\
& \xi_i \ge 0,\ \xi_i^{*} \ge 0,\ i = 1, \ldots, n, \\
& K_{CK}(X_i, X_j) = \sum_{h=1}^{h_{L+1}} w_{h,1}^{L+1} K_{h}^{L}(X_i, X_j).
\end{aligned}
\tag{1}
$$

The Lagrange multiplier method transforms the optimization problem in Equation (1) into its dual. Solving the dual problem then yields the solution to Equation (1):

$$
\tilde{b} = y_{SV} - \sum_{i=1}^{n} \left(\tilde{a}_i^{*} - \tilde{a}_i\right) K_{CK}(X_i, X_{SV}), \qquad
f(\tilde{X}) = \sum_{i=1}^{n} \left(\tilde{a}_i^{*} - \tilde{a}_i\right) K_{CK}(X_i, \tilde{X}) + \tilde{b},
$$

where $\tilde{a}_i^{*}$ and $\tilde{a}_i$ are the optimal Lagrange multipliers for the dual problem of Equation (1), and $(X_{SV}, y_{SV})$ represents any support vector in the dataset.
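For reference, the dual problem mentioned above can be written out explicitly. The block below is the standard textbook form of the ε-SVR dual for fixed kernel weights, expressed in the notation of Equation (1); it is a reconstruction under that assumption rather than a formula quoted from this study.

```latex
\begin{aligned}
\max_{a,\,a^{*}} \quad
  & -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
      (a_i^{*}-a_i)(a_j^{*}-a_j)\,K_{CK}(X_i,X_j)
    - \varepsilon\sum_{i=1}^{n}(a_i^{*}+a_i)
    + \sum_{i=1}^{n} y_i\,(a_i^{*}-a_i) \\
\text{s.t.} \quad
  & \sum_{i=1}^{n}(a_i^{*}-a_i)=0, \qquad
    0 \le a_i,\ a_i^{*} \le C, \quad i=1,\ldots,n.
\end{aligned}
```

Stationarity of the Lagrangian with respect to $\omega$ gives $\omega = \sum_{i=1}^{n}(a_i^{*}-a_i)\,\phi(X_i)$, which is why the prediction depends on the data only through the composite kernel $K_{CK}$.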

In the MKL-SVR model, it is necessary to find the optimal combination of the MKL weights $w_{h,q}^{l}$ and the SVR penalty factor $C$. Therefore, this study employs the PSO algorithm to iteratively search for the optimal combination $(w_{h,q}^{l}, C)$ for MKL-SVR, using information sharing based on the current best solution obtained in the search process.

The PSO algorithm is a swarm intelligence optimization algorithm that simulates the foraging behavior of bird flocks. Each bird is abstracted as a particle without mass or volume. The position of the $i$th particle is represented as

$$
x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,D}),
$$

where $x_{i,j} \in \{w_{h,q}^{l}, C\}$, $D = (L+1)HQ + 1$, $H = \sum_{i=1}^{L} h_i$, $Q = \sum_{i=1}^{L+1} q_i$, and $j$ indexes the elements of the parameter set $(w_{h,q}^{l}, C)$. The "flight" velocity of the $i$th particle is represented as $v_i = (v_{i,1}, v_{i,2}, \ldots, v_{i,D})$. The best position found by the $i$th particle up to the current iteration is denoted as the individual best $p_{best} = (p_{i,1}, p_{i,2}, \ldots, p_{i,D})$, and the best position found by the entire particle swarm is denoted as the global best $g_{best} = (g_1, g_2, \ldots, g_D)$. By tracking the individual best and global best positions, the velocity and position of each particle are updated (Equation (2)), and the iteration continues in search of the global optimum.

$$
\begin{aligned}
v_{i,j}(t+1) &= w\, v_{i,j}(t) + c_1 r_1(t)\left[p_{i,j}(t) - x_{i,j}(t)\right] + c_2 r_2(t)\left[g_j(t) - x_{i,j}(t)\right], \\
x_{i,j}(t+1) &= x_{i,j}(t) + v_{i,j}(t+1),
\end{aligned}
\tag{2}
$$

where $i = 1, 2, \ldots, m$ is the particle index, $j = 1, 2, \ldots, D$ is the parameter index, $w$ is the inertia weight, $c_1$ and $c_2$ are the acceleration coefficients, $v_{i,j} \in [v_{\min}, v_{\max}]$, $x_{i,j} \in [x_{\min}, x_{\max}]$ with $v_{\min}$, $v_{\max}$, $x_{\min}$, $x_{\max}$ constants, $r_1, r_2 \in [0, 1]$ are random numbers, and $t$ is the iteration number.
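A minimal NumPy sketch of the update in Equation (2) is given below. The default values of `w`, `c1`, and `c2` are common illustrative choices and are assumptions, as is drawing a separate random number for every coordinate; the study does not state these details.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """Apply one velocity and position update to all particles.

    x, v, pbest: arrays of shape (m, D); gbest: array of shape (D,).
    """
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.random(x.shape)   # random factors in [0, 1)
    r2 = rng.random(x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x_new = x + v_new
    return x_new, v_new

# A particle already sitting on both bests, with zero velocity, does not move.
x0 = np.array([[0.2, 0.3, 0.5, 27.0]])
x1, v1 = pso_step(x0, np.zeros_like(x0), pbest=x0, gbest=x0[0],
                  rng=np.random.default_rng(0))
```

The inertia term keeps a particle moving in its previous direction, while the two random terms pull it toward its own best position and the swarm's best position.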

After the velocity and position are updated in each iteration, boundary checking is performed, which can be expressed as:

$$
v_{i,j}(t+1) =
\begin{cases}
v_{\min}, & v_{i,j}(t+1) < v_{\min} \\
v_{\max}, & v_{i,j}(t+1) > v_{\max}
\end{cases},
\qquad
x_{i,j}(t+1) =
\begin{cases}
x_{i,j}(t+1), & x_{i,j}(t+1) \in [x_{\min}, x_{\max}] \\
x_{\min} + rand\,(x_{\max} - x_{\min}), & x_{i,j}(t+1) \notin [x_{\min}, x_{\max}]
\end{cases},
\tag{3}
$$

where $rand$ is a random number in [0, 1]. To increase randomness, if $rand > 0.8$, adaptive particle mutation is introduced, resulting in

$$
\begin{cases}
x_j(t+1),\ j = 1, 2, \ldots, D, & D_{rand} = 0, \\
x_1(t+1) = x_{\min} + rand\,(x_{\max} - x_{\min}), & D_{rand} = 1, \\
x_2(t+1) = x_{\min} + rand\,(x_{\max} - x_{\min}), & D_{rand} = 2, \\
\quad\vdots & \\
x_D(t+1) = x_{\min} + rand\,(x_{\max} - x_{\min}), & D_{rand} = D.
\end{cases}
\tag{4}
$$
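The boundary check and the mutation step might be implemented along the following lines. This is a sketch assuming NumPy, and it interprets the undefined quantity D_rand as an integer drawn uniformly from {0, 1, ..., D}, where 0 leaves the particle unchanged; that interpretation and all function names are assumptions.

```python
import numpy as np

def clamp_velocity(v, v_min, v_max):
    """Clip velocities that leave [v_min, v_max] back to the nearest limit."""
    return np.clip(v, v_min, v_max)

def repair_position(x, x_min, x_max, rng):
    """Resample every coordinate that left [x_min, x_max] uniformly in range."""
    out_of_range = (x < x_min) | (x > x_max)
    resampled = x_min + rng.random(x.shape) * (x_max - x_min)
    return np.where(out_of_range, resampled, x)

def mutate(x, x_min, x_max, rng, threshold=0.8):
    """With probability 1 - threshold, re-draw at most one coordinate of x."""
    x = x.copy()
    if rng.random() > threshold:
        # Assumed meaning of D_rand: 0 keeps x, k > 0 re-draws coordinate k.
        d_rand = rng.integers(0, x.size + 1)
        if d_rand > 0:
            x[d_rand - 1] = x_min + rng.random() * (x_max - x_min)
    return x

rng = np.random.default_rng(0)
v_checked = clamp_velocity(np.array([-5.0, 0.5, 5.0]), -1.0, 1.0)
x_checked = repair_position(np.array([0.2, 1.5, -0.3]), 0.0, 1.0, rng)
x_mutated = mutate(x_checked, 0.0, 1.0, rng)
```

Resampling an out-of-range coordinate, rather than clipping it to the boundary, avoids particles piling up on the edges of the search range.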

In the PSO-MKL-SVR model, the PSO particle position encodes the parameters $x_{i,j} \in \{w_{h,q}^{l}, C\}$ required by MKL-SVR. We initialize $x_{i,j}$ and set the maximum population size and the maximum number of evolutions, which give the number of particles in the search space and the number of iterations, respectively.

In each iteration, an MKL-SVR model is trained with the parameters corresponding to each particle, and predictions are generated on the test set. The individual best ($p_{best}$) for the current iteration is then determined by selecting the parameter combination with the minimum Mean Squared Error (MSE). The particle positions $x_{i,j}$ are updated through Equation (2), and the procedure iterates to obtain the individual best for each cycle.

Among the individual best values, the global best ($g_{best}$) is identified as the parameter combination with the minimum MSE. The corresponding model is then output as the PSO-MKL-SVR model with the optimal parameters. Figure 2 illustrates this process.
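The overall search loop can be sketched as follows. To keep the example self-contained, the fitness function is a hypothetical quadratic stand-in; in the procedure described above it would be the MSE of an MKL-SVR model trained with the particle's parameters. The swarm size, iteration count, coefficients, and velocity limit are assumptions, and the mutation step is omitted. The search ranges follow those reported in the Conclusion: three weights in [0, 1] and C in [26, 28].

```python
import numpy as np

def pso_minimize(fitness, x_min, x_max, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize fitness over the box [x_min, x_max] with a basic PSO."""
    rng = np.random.default_rng(seed)
    x_min = np.asarray(x_min, dtype=float)
    x_max = np.asarray(x_max, dtype=float)
    span = x_max - x_min
    v_max = 0.2 * span                       # assumed velocity limit
    x = x_min + rng.random((n_particles, x_min.size)) * span
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_cost = np.array([fitness(p) for p in x])
    best = int(np.argmin(pbest_cost))
    gbest, gbest_cost = pbest[best].copy(), pbest_cost[best]
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)
        x = x + v
        # Resample coordinates that left the search range.
        outside = (x < x_min) | (x > x_max)
        x = np.where(outside, x_min + rng.random(x.shape) * span, x)
        cost = np.array([fitness(p) for p in x])
        improved = cost < pbest_cost         # update individual bests
        pbest[improved] = x[improved]
        pbest_cost[improved] = cost[improved]
        best = int(np.argmin(pbest_cost))
        if pbest_cost[best] < gbest_cost:    # update global best
            gbest, gbest_cost = pbest[best].copy(), pbest_cost[best]
    return gbest, gbest_cost

lower = np.array([0.0, 0.0, 0.0, 26.0])
upper = np.array([1.0, 1.0, 1.0, 28.0])
target = np.array([0.2, 0.3, 0.5, 27.0])     # hypothetical optimum

def stand_in_fitness(p):
    """Placeholder for the MKL-SVR validation MSE."""
    return float(np.sum((p - target) ** 2))

best_x, best_cost = pso_minimize(stand_in_fitness, lower, upper)
```

Replacing `stand_in_fitness` with a function that trains MKL-SVR and returns its MSE would turn this sketch into the parameter search described in this section.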

3. Results and Discussion

3.1. Dataset and Evaluation Metrics

This section evaluates the performance of the PSO-MKL-SVR model in predicting non-invasive near-infrared blood glucose concentrations. The data cover the near-infrared spectral range from 900 to 1700 nm and were obtained from Oral Glucose Tolerance Test (OGTT) experiments, in which the near-infrared reflectance spectral intensity was measured on participants' fingertips together with the corresponding actual blood glucose concentration values.

Ten volunteers contributed data, and each dataset consists of 149-dimensional near-infrared spectral features and their corresponding blood glucose concentration values. For model training, 90% of the data was assigned to the training set and the remaining 10% to the test set. The PSO-SVR model was chosen as the baseline for comparison.

A model's precision and reliability are assessed with the MSE, which is the mean of the squared differences between predicted and actual values, and the Squared Correlation Coefficient (SCC), which measures the model's capacity to explain the variability in the observed data. In addition, Clarke's Error Grid Analysis provides a visual method for evaluating the accuracy of non-invasive blood glucose models.

Figure 2. Flowchart of PSO-MKL-SVR.

This analysis employs a coordinate system in which actual blood glucose levels are plotted on the horizontal axis and the model's predicted blood glucose levels on the vertical axis. The coordinate system is divided into five regions: A, B, C, D, and E. Region A signifies errors with minimal impact on patient treatment decisions. Region B denotes errors with minor clinical consequences, typically avoiding overtreatment. Errors in Region C may cause mild or moderate clinical consequences, prompting treatment adjustments. Region D highlights errors that could result in moderate or severe clinical consequences, requiring immediate intervention. Lastly, Region E indicates errors where the difference between predicted and actual values may lead to dangerous clinical decisions, demanding urgent attention.
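As an illustration of how the share of points in Zone A might be computed, the sketch below uses the commonly cited Zone A criterion: the prediction lies within 20% of the reference value, or both values are below 70 mg/dL. The function name, the unit handling, and the approximate factor of 18 mg/dL per mmol/L are assumptions; the remaining zones are not covered.

```python
import numpy as np

MGDL_PER_MMOLL = 18.0  # approximate conversion factor for glucose

def zone_a_fraction(reference, predicted, unit="mmol/L"):
    """Fraction of points that fall in Zone A of the Clarke error grid."""
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    if unit == "mmol/L":
        ref, pred = ref * MGDL_PER_MMOLL, pred * MGDL_PER_MMOLL
    within_20_percent = np.abs(pred - ref) <= 0.2 * ref
    both_hypoglycemic = (ref < 70.0) & (pred < 70.0)
    return float(np.mean(within_20_percent | both_hypoglycemic))

# Hypothetical readings in mg/dL, for illustration only.
reference = [100.0, 100.0, 60.0, 200.0]
predicted = [110.0, 130.0, 50.0, 150.0]
fraction = zone_a_fraction(reference, predicted, unit="mg/dL")
```

In this made-up example the first and third pairs satisfy the Zone A criterion and the other two do not, so half of the points fall in Zone A.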

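$$
\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - y_i^{*}\right)^{2}, \qquad
\mathrm{SCC} = R^{2} = 1 - \frac{\sum_{i=1}^{m}\left(y_i - y_i^{*}\right)^{2}}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^{2}} = 1 - \frac{\mathrm{MSE}(y, y^{*})}{\mathrm{Var}(y)},
$$

where $y_i$ is the actual value, $y_i^{*}$ is the predicted value, $\bar{y}$ is the mean of the actual values, and $m$ is the number of samples.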

As a result, this study employs MSE, SCC, and Clark’s Error Grid Analysis to assess the model.
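The two numerical metrics can be computed directly from their definitions. The following sketch assumes NumPy; the sample values are made up for illustration and are not data from this study.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def scc(y_true, y_pred):
    """Squared correlation coefficient, computed as 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical glucose values in mmol/L.
y_actual = [5.0, 6.0, 7.0, 8.0]
y_predicted = [5.1, 5.9, 7.2, 7.8]
example_mse = mse(y_actual, y_predicted)
example_scc = scc(y_actual, y_predicted)
```

For these made-up values the squared residuals sum to 0.10 and the total sum of squares is 5, giving an MSE of 0.025 and an SCC of 0.98.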

3.2. Comparative Experimental Results Analysis

First, the PSO-SVR model was pre-trained with different kernels, and the Gaussian kernel function gave higher accuracy than the Sigmoid and polynomial alternatives. Consequently, the PSO-MKL-SVR model combines multiple Gaussian kernel functions and is compared with a PSO-SVR model that uses a single Gaussian kernel.

During training, five-fold cross-validation was used to train three models, PSO-MKL-SVR, PSO-2MKL-SVR (the PSO-MKL-SVR model with two MKL layers), and PSO-SVR, and to select the optimal parameter set for each. The testing set was then used to compute the MSE, the SCC, and the proportion of points in each region of Clarke's Error Grid. Five-fold cross-validation is intended to ensure that the models generalize rather than merely fit the training data.
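Five-fold cross-validation splits the training data into five parts and uses each part once for validation. A minimal index-splitting sketch, assuming NumPy, is shown below; the function name, the shuffling seed, and the sample count of 23 are hypothetical.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)       # shuffle before splitting
    folds = np.array_split(order, k)         # fold sizes differ by at most 1
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

splits = list(k_fold_indices(23, k=5))
```

A candidate parameter set would then be scored by training on each `train_idx`, computing the MSE on the matching `val_idx`, and averaging the five values.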

Table 1 shows that, across the 10 volunteers, the individually trained PSO-MKL-SVR models achieve on the testing set an average Mean Squared Error (MSE) of 0.0272 and an average Squared Correlation Coefficient (SCC) of 0.9862. Compared with the average MSE and SCC of the PSO-SVR model, these values correspond to accuracy improvements of 16.03% and 0.29%, respectively. However, increasing the number of MKL layers improved accuracy only for the model of the 2nd volunteer. Overall, the average MSE accuracy of the 2-layer PSO-MKL-SVR model decreased marginally, by 0.63%, while the average SCC remained virtually unchanged.

Table 1. Results of the model validation for the ten volunteers.

Figure 3 and Table 2 present the Clarke's Error Grid Analysis results for the ten volunteers graphically and numerically. This analysis assesses the accuracy of blood glucose prediction models by examining the relationship between reference blood glucose concentration values and their predicted counterparts. Points within Zones A and B are considered theoretically acceptable, while those in Zones C, D, and E pose potential risks and may lead to clinical misdiagnosis. Table 2 shows a 0.14% increase in the percentage of testing-set points falling within Zone A for the PSO-MKL-SVR method. This suggests that the MKL approach improves the predictive accuracy of the model and reduces the impact of interfering factors on predictions. Figure 3 further shows that, at the edges of Zone B, the MKL algorithm strengthens the correlation between predicted and reference blood glucose concentration values, moving those points into the more desirable Zone A.

Figure 3. The Clarke’s Error Grid Analysis graph for PSO-MKL-SVR and PSO-SVR.

Table 2. The Clarke Error Grid Analysis results for ten volunteers.

4. Conclusion

PSO-MKL-SVR reduces the influence of interfering factors on the predictive model, strengthens the correlation between predicted and reference blood glucose concentration values, and moves points at the edge of Zone B in the Clarke's Error Grid Analysis into the more desirable Zone A. Compared with PSO-SVR, PSO-MKL-SVR improves average MSE and SCC accuracy by 16.03% and 0.29%, respectively. Although PSO-MKL-SVR is configured with four parameters compared with two for PSO-SVR, its training time is considerably shorter: 46.3 seconds versus 90.254 seconds. This difference arises from the parameter ranges: in PSO-MKL-SVR, three parameters fall within [0, 1] and the fourth within [26, 28], whereas in PSO-SVR both parameters fall within [26, 28]. However, the model's predictions are not stable across individuals. For PSO-MKL-SVR, the MSE values of the third and fifth volunteers differ by a factor of approximately 10.6, which indicates a notable impact of inter-individual differences on the model's accuracy and highlights the need for better generalization.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Tang, L., Chang, S.J., Chen, C.-J. and Liu, J.-T. (2020) Non-Invasive Blood Glucose Monitoring Technology: A Review. Sensors, 20, Article 6925.
https://doi.org/10.3390/s20236925
[2] Bolla, A.S. and Priefer, R. (2020) Blood Glucose Monitoring—An Overview of Current and Future Non-Invasive Devices. Diabetes & Metabolic Syndrome: Clinical Research & Reviews, 14, 739-751.
https://doi.org/10.1016/j.dsx.2020.05.016
[3] Wang, Z.Y., Zhou, L.H., Liu, T.Q., Huan, K.W. and Jia, X.N. (2022) Development of Non-Invasive Blood Glucose Regression Based on Near-Infrared Spectroscopy Combined with a Deep-Learning Method. Journal of Physics D: Applied Physics, 55, Article ID: 215401.
https://doi.org/10.1088/1361-6463/ac4723
[4] Nie, Z.H., Rong, M. and Li, K.Y. (2023) Blood Glucose Prediction Based on Imaging Photoplethysmography in Combination with Machine Learning. Biomedical Signal Processing and Control, 79, Article ID: 104179.
https://doi.org/10.1016/j.bspc.2022.104179
[5] Aloraynan, A., Rassel, S., Xu, C. and Ban, D.Y. (2022) A Single Wavelength Mid-Infrared Photoacoustic Spectroscopy for Noninvasive Glucose Detection Using Machine Learning. Biosensors, 12, Article 166.
https://doi.org/10.3390/bios12030166
[6] Han, G., Chen, S.Q., Wang, X.Y., et al. (2021) Noninvasive Blood Glucose Sensing by Near-Infrared Spectroscopy Based on PLSR Combines SAE Deep Neural Network Approach. Infrared Physics & Technology, 113, Article ID: 103620.
https://doi.org/10.1016/j.infrared.2020.103620
[7] Bach, F.R., Lanckriet, G.R.G. and Jordan, M.I. (2004) Multiple Kernel Learning, Conic Duality, and the SMO Algorithm. Proceedings of the Twenty-First International Conference on Machine Learning, Banff, 4-8 July 2004, 8 p.
https://doi.org/10.1145/1015330.1015424
[8] Marini, F. and Walczak, B. (2015) Particle Swarm Optimization (PSO). A Tutorial. Chemometrics and Intelligent Laboratory Systems, 149, 153-165.
https://doi.org/10.1016/j.chemolab.2015.08.020
[9] Zhang, J.X., Wu, Y.N., Sheng, M.Q., et al. (2022) Research on Non-Invasive Blood Glucose Detection Method Based on PSO-GRNN. 2022 International Conference on Intelligent Manufacturing and Industrial Big Data (ICIMIBD), Changsha, 9-11 December 2022, 137-143.
https://doi.org/10.1109/ICIMIBD58123.2022.00035
[10] Ren, Z., Liu, T., Xiong, C.X., et al. (2023) Quantitative Measurement of Blood Glucose Influenced by Multiple Factors via Photoacoustic Technique Combined with Optimized Wavelet Neural Networks. Journal of Biophotonics, 16, e202200304.
https://doi.org/10.1002/jbio.202200304
[11] Li, Y., Zheng, H.W., Huang, X.Y., et al. (2022) Research on Lung Nodule Recognition Algorithm Based on Deep Feature Fusion and MKL-SVM-IPSO. Scientific Reports, 12, Article No. 17403.
https://doi.org/10.1038/s41598-022-22442-3
[12] Cortes, C. and Vapnik, V. (1995) Support-Vector Networks. Machine Learning, 20, 273-297.
https://doi.org/10.1007/BF00994018

Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.