Machine Learning Models for Blood Glucose Level Prediction in Patients With Diabetes Mellitus: Systematic Review and Network Meta-Analysis


Introduction

Diabetes mellitus (DM) has become one of the most serious health problems worldwide [], with more than 463 million (9.3%) patients in 2019; this number is predicted to reach 700 million (10.9%) by 2045 [], raising growing concerns about the negative impact on patients’ lives and the increasing burden on the health care system []. Furthermore, previous studies have shown that without appropriate medical care, DM can lead to multiple long-term complications in blood vessels, eyes, kidneys, feet (ulcers), and nerves [-]. Adverse blood glucose (BG) events are among the most common short-term complications, including hypoglycemia (BG <70 mg/dL) and hyperglycemia (BG >180 mg/dL). Hyperglycemia in patients with DM may lead to lower limb occlusions and extremity nerve damage, which can progress to decay, necrosis, and local or whole-foot gangrene, even requiring amputation [,]. Hypoglycemia can cause serious symptoms, ranging from anxiety, palpitation, and confusion in mild cases to seizures, coma, and even death in severe cases [,]. Thus, there is an urgent need to prevent adverse BG events.
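For orientation, the hypoglycemia thresholds in the study tables below are reported in mmol/L, whereas the definitions above use mg/dL; the two scales convert through the molar mass of glucose (approximately 180.2 g/mol), a standard relation rather than one stated by the included studies:

\[
1~\text{mmol/L} \approx 18.02~\text{mg/dL}, \qquad 3.9~\text{mmol/L} \approx 70~\text{mg/dL}, \qquad 180~\text{mg/dL} \approx 10.0~\text{mmol/L}
\]

Thus, the 3.9 mmol/L threshold that recurs in the baseline tables corresponds to the BG <70 mg/dL definition of hypoglycemia used here.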

Machine learning (ML) models use statistical techniques to give computers the ability to complete tasks by learning from data without being explicitly programmed []. However, ML models for managing BG require huge amounts of BG data, a demand that cannot be met by the sparse data points generated by traditional finger-stick glucose meters []. With the introduction of the continuous glucose monitoring (CGM) device, which typically produces a BG reading every 5 minutes throughout the day (288 readings per day), BG data sets have become large enough to be used in ML models [].

Recently, there has been an immense surge in the use of ML technologies for predicting DM complications. Regarding BG management, previous studies have developed different types of ML models, including random forest (RF) models, support vector machines (SVMs), neural network models (NNMs), and autoregression models (ARMs), using CGM data, electronic health records (EHRs), electrocardiograph (ECG) signals, electroencephalograph (EEG) signals, and other information (ie, biochemical indicators, insulin intake, exercise, and meals) [,-]. However, the performance of these models across studies has been inconsistent. For instance, in terms of BG level prediction, Prendin et al [] showed that the SVM achieved a lower root mean square error (RMSE) than the ARM, whereas Zhu et al [] reported the opposite result.
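Because the RMSE is the primary outcome compared across models in this review, it is worth restating its standard definition (individual studies may report variants, such as the glucose-specific gRMSE):

\[
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}
\]

where \(y_i\) is the reference BG value (eg, a CGM reading), \(\hat{y}_i\) is the model prediction, and \(N\) is the number of test predictions; lower values indicate better accuracy.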

Therefore, this meta-analysis aimed to comprehensively assess the performance of ML models in BG management in patients with DM.


Methods

Search Strategy and Study Selection

The study protocol was registered in the International Prospective Register of Systematic Reviews (PROSPERO; registration ID: CRD42022375250). Studies on the prediction of BG levels or the prediction or detection of adverse BG events using ML models were eligible, with no restrictions on language, study design, or publication status. PubMed, Embase, Web of Science, and Institute of Electrical and Electronics Engineers (IEEE) Xplore databases were systematically searched from inception to November 2022. Keywords used for the searches were (“machine learning” OR “artificial intelligence” OR “logistic model” OR “support vector machine” OR “decision tree” OR “cluster analysis” OR “deep learning” OR “random forest”) AND (“hypoglycemia” OR “hyperglycemia” OR “adverse glycemic events”) AND (“prediction” OR “detection”). Details regarding the search strategies are summarized in . Manual searches of the reference lists of relevant studies were also conducted.

Selection Criteria

Inclusion criteria were as follows: (1) participants in the studies were diagnosed with DM; (2) study endpoints were hypoglycemia, hyperglycemia, or BG levels; (3) the studies established at least 2 types of ML models for the prediction of BG levels and at least 1 type of ML model for the prediction or detection of adverse BG events; (4) the studies reported the performance of ML models with statistical or clinical metrics; (5) the studies contained the development and validation of ML models; and (6) study outcomes were means (SDs) of performance metrics of test data for the prediction of BG levels and sensitivity and specificity of test data for the prediction or detection of adverse BG events.

Exclusion criteria were as follows: (1) studies did not report on the derivation of ML models; (2) studies were based only on physiological or control-oriented ML models; (3) studies did not allow true positives, true negatives, false negatives, and false positives to be reproduced for the prediction or detection of adverse BG events; (4) studies were reviews, systematic reviews, animal studies, or irretrievable and repetitive papers; and (5) studies had unavailable full text or outcome metrics.

Authors KL and LYL independently screened and selected studies based on the aforementioned criteria. Authors KL and YM extracted and recorded the data from the selected studies. Conflicts were resolved by consensus. The study strictly followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement [-].

Data Extraction and Management

Two reviewers independently carried out data extraction and quality assessment. If a single study included more than 1 extractable test result for the same ML model, the best result was extracted. If a single study included 2 or more models, the performance metrics of each model were extracted. For studies predicting BG levels, RMSEs for different prediction horizons (PHs) were extracted. For studies predicting or detecting adverse BG events, the sensitivity, specificity, and precision were extracted to reproduce the 2×2 contingency table.
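As a minimal sketch of how such a 2×2 table can be rebuilt from these reported metrics (the function name and example numbers below are hypothetical illustrations, not values from any included study):

```python
def reconstruct_confusion(sens: float, prec: float, n_events: int, n_total: int):
    """Rebuild TP/FN/FP/TN counts from reported sensitivity, precision,
    the number of adverse BG events, and the total number of data points."""
    tp = sens * n_events          # sensitivity = TP / (TP + FN)
    fn = n_events - tp
    fp = tp * (1.0 / prec - 1.0)  # precision = TP / (TP + FP)
    tn = (n_total - n_events) - fp
    return round(tp), round(fn), round(fp), round(tn)

# Hypothetical example: 100 hypoglycemic events among 1000 data points,
# with a reported sensitivity of 0.90 and precision of 0.75.
tp, fn, fp, tn = reconstruct_confusion(0.90, 0.75, 100, 1000)
specificity = tn / (tn + fp)  # implied specificity, useful as a cross-check
print(tp, fn, fp, tn, round(specificity, 3))  # 90 10 30 870 0.967
```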

Specifically, the following information was extracted:

- General characteristics: first author, publication year, country, data source, and study purpose (ie, predicting or detecting hypoglycemia)
- Experimental information: participants (type of DM, type 1 or 2), sample size (patients, data points, and hypoglycemic events), demographic information, models, study place and time, model parameters (ie, inputs and PHs), model performance metrics, threshold of BG levels for hypoglycemia, and reference method (ie, finger-stick)

Methodological Quality Assessment of Included Studies

The Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool was applied to assess the quality of the included studies in terms of patient selection (5 items), index test (3 items), reference standard (4 items), and flow and timing (4 items). All 4 domains were used to assess the risk of bias, and the first 3 domains were used to assess concerns about applicability. Each domain includes 1 query in relation to the risk of bias or applicability, comprising 7 questions in total [].

Data Synthesis and Statistical Analysis

The performance metrics of ML models used to predict BG levels, predict adverse BG events, and detect adverse BG events were assessed independently. The performance metrics were the RMSE of ML models in predicting BG levels and the sensitivity and specificity of ML models in predicting or detecting adverse BG events. For BG level–based studies, a network meta-analysis was conducted to assess global and local inconsistency between studies, and the surface under the cumulative ranking (SUCRA) curve of every model was plotted to calculate relative ranks. For event-based studies, pooled sensitivity, specificity, the positive likelihood ratio (PLR), and the negative likelihood ratio (NLR) with 95% CIs were calculated. Study heterogeneity was assessed by calculating I² values based on multivariate random-effects meta-regression that considered within- and between-study correlation and classifying them into quartiles (0% to <25% for low, 25% to <50% for low-to-moderate, 50% to <75% for moderate-to-high, and ≥75% for high heterogeneity) [,]. Furthermore, meta-regression was used to evaluate the source of heterogeneity for both BG level–based and adverse event–based studies. The summary receiver operating characteristic (SROC) curve of every model was also used to evaluate overall sensitivity and specificity. Publication bias was assessed using the Deeks funnel plot asymmetry test.
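For reference, two of the summary quantities above have standard definitions; the notation here follows common meta-analytic usage rather than any formula given in the included studies. For a network of \(a\) competing models:

\[
\mathrm{SUCRA}_j = \frac{1}{a-1}\sum_{b=1}^{a-1}\mathrm{cum}_{jb},
\qquad
I^2 = \max\left(0,\; \frac{Q-\mathrm{df}}{Q}\right)\times 100\%
\]

where \(\mathrm{cum}_{jb}\) is the cumulative probability that model \(j\) ranks among the best \(b\) models, \(Q\) is the Cochran heterogeneity statistic, and \(\mathrm{df}\) is its degrees of freedom; SUCRA approaches 100 for a model that is certainly best and 0 for one that is certainly worst.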

Furthermore, BG level–based studies were divided into 4 subgroups based on different PHs (15, 30, 45, and 60 minutes), and adverse event–based studies were analyzed by type of model (ie, NNM, RF, and SVM). A 2-sided P value of <.05 was considered statistically significant. All statistical analyses were performed using Stata 17 (StataCorp) and Review Manager (RevMan, version 5.3; Cochrane).
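To make the event-based metrics concrete, the following sketch computes per-study likelihood ratios from a reconstructed 2×2 table. It illustrates only the definitions of the PLR and NLR; it does not reproduce the bivariate random-effects pooling performed in Stata, and it reuses the hypothetical counts from the earlier example:

```python
def likelihood_ratios(tp: int, fn: int, fp: int, tn: int):
    """Compute the positive and negative likelihood ratios for one study."""
    sens = tp / (tp + fn)        # sensitivity
    spec = tn / (tn + fp)        # specificity
    plr = sens / (1.0 - spec)    # PLR: how much a positive result raises the odds
    nlr = (1.0 - sens) / spec    # NLR: how much a negative result lowers the odds
    return plr, nlr

plr, nlr = likelihood_ratios(tp=90, fn=10, fp=30, tn=870)
print(round(plr, 2), round(nlr, 3))  # 27.0 0.103
```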


Results

Search Results

A total of 20,837 studies were identified through systematic searches of the predefined electronic databases, including 21 studies found through reference tracking [,-]. Of the 20,837 studies, 9807 (47.06%) were retained after removing duplicates. After titles and abstracts were screened, 9400 (95.85%) studies were excluded for reporting irrelevant topics or no predefined outcomes. The remaining 407 (4.15%) studies were retrieved for full-text evaluation. Of these, 361 (88.7%) studies were excluded for various reasons, and therefore 46 (11.3%) studies were included in the final meta-analysis (Figure 1).

Figure 1. Flow diagram of identifying and including studies. IEEE: Institute of Electrical and Electronics Engineers.

Description of Included Studies

As studies on hyperglycemia were insufficient for analysis, we selected studies on hypoglycemia to assess the ability of ML models to predict adverse BG events. In total, the 46 studies included 28,775 participants: n=428 (1.49%) for predicting BG levels, n=28,138 (97.79%) for predicting adverse BG events, and n=209 (0.72%) for detecting adverse BG events. Of the 46 studies, 10 (21.7%) [-,-] predicted BG levels (Table 1), 19 (41.3%) [,-,,,-] predicted adverse BG events (Table 2), and the remaining 17 (37%) [,,-,-] detected adverse BG events (Table 3).

Table 1. Baseline characteristics of BGa level-based studies (N=10).

First author (year), country | Data source | Sample size (patients, n; data points, n) | Demographic information | Object; setting | Model; PHb (minutes); input | Performance metrics
Pérez-Gandía (2010), Spain [] | CGMc device | 15; 728 | —d | T1DMe; out | NNMf, ARMg; PH: 15, 30; Input: CGM data | RMSEh, delay
Prendin (2021), United States [] | CGM device | Real (n=141); 350,000 | Age | T1DM; out | ARM, autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), SVMi, RFj, feed-forward neural network (fNN), long short-term memory (LSTM); PH: 30; Input: CGM data | RMSE, coefficient of determination (COD), sensibility, delay, precision, F1 score, time gain
Zhu (2020), England [] | Ohio T1DM, UVA/Padova T1D | Real (n=6), simulated (n=10); 1,036,800 | — | T1DM; out | DRNNk, NNM, SVM, ARM; PH: 30; Input: BG level, meals, exercise, meal times | RMSE, mean absolute relative difference (MARD), time gain
D'Antoni (2020), Italy [] | Ohio T1DM | 6; — | Age, sex ratio | T1DM; out | ARJNNl, RF, SVM, autoregression (AR), one symbolic model (SAX), recurrent neural network (RNN), one neural network model (NARX), jump neural network (JNN), delayed feed-forward neural network model (DFFNN); PH: 15, 30; Input: CGM data | RMSE
Amar (2020), Israel [] | CGM device, insulin pump | 141; 1,592,506 | Age, sex ratio, weight, BMI, duration of DM | T1DM; in | ARM, gradually connected neural network (GCN), fully connected (FC) neural network, light gradient boosting machine (LGBM), RF; PH: 30, 60; Input: CGM data | RMSE, Clarke error grid (CEG)
Li (2020), England [] | UVA/Padova T1D | Simulated (n=10); 51,840 | — | T1DM; out | GluNet, NNM, SVM, latent variable with exogenous input (LVX), ARM; PH: 30, 60; Input: BG level, meals, exercise | RMSE, MARD, time lag
Zecchin (2012), Italy [] | UVA/Padova T1D, CGM device | Simulated (n=20), real (n=15); — | — | T1DM; out | Neural network–linear prediction algorithm (NN-LPA), NN, ARM; PH: 30; Input: meals, insulin | RMSE, energy of second-order differences (ESOD), time gain, J index
Mohebbi (2020), Denmark [] | Cornerstones4Care platform | Real (n=50); — | — | T1DM; in | LSTM, ARIMA; PH: 15, 30, 45, 60, 90 | RMSE, MAE
Daniels (2022), England [] | CGM device | Real (n=12); — | Sex ratio | T1DM; out | Convolutional recurrent neural network (CRNN), SVM; PH: 30, 45, 60, 90, 120; Input: BG level, insulin, meals, exercise | RMSE, MAE, CEG, time gain
Alfian (2020), Korea [] | CGM device | Real (n=12); 26,723 | — | — | SVM, k-nearest neighbor (kNN), DTm, RF, AdaBoost, XGBoostn, NNM; PH: 15, 30; Input: CGM data | RMSE, glucose-specific root mean square error (gRMSE), R² score, mean absolute percentage error (MAPE)

aBG: blood glucose.

bPH: prediction horizon.

cCGM: continuous glucose monitoring.

dNot applicable.

eT1DM: type 1 diabetes mellitus.

fNNM: neural network model.

gARM: autoregression model.

hRMSE: root mean square error.

iSVM: support vector machine.

jRF: random forest.

kDRNN: dilated recurrent neural network.

lARJNN: ARTiDe jump neural network.

mDT: decision tree.

nXGBoost: Extreme Gradient Boosting.

Table 2. Baseline characteristics of studies predicting adverse BGa events (N=19).

First author (year), country | Data source | Sample size (patients, n; data points, n; hypoglycemic events, n) | Object; setting | Model | Time | Age (years), mean (SD)/range | Threshold (mmol/L)
Pils (2014), United States [] | CGMb device | 22518152 | T1DMc; out | SVMd | All | —e | 3.9
Seo (2019), Korea [] | CGM device | 1047052412 | DMf; out | RFg, SVM, k-nearest neighbor (kNN), logistic regression (LR) | Postprandial | 52 | 3.9
Parcerisas (2022), Spain [] | CGM device | 106722 | T1DM; out | SVM | Nocturnal | 31.8 (SD 16.8) | 3.9
Stuart (2017), Greece [] | EHRsh | 9584; —; 1327 | DM; in | Multivariable logistic regression (MLR) | All | — | 4
Bertachi (2020), Spain [] | CGM device | 1012439 | T1DM; out | SVM | Nocturnal | 31.8 (SD 16.8) | 3.9
Elhadd (2020), Qatar [] | — | 133918172 | T2DM; out | XGBoosti | All | 35-63 | —
Mosquera-Lopez (2020), United States [] | CGM device | 1011717 | T1DM; out | SVM | Nocturnal | 33.7 (SD 5.8) | 3.9
Mosquera-Lopez (2020), United States [] | CGM device | 202706258 | T1DM; out | SVM | Nocturnal | — | 3.9
Ruan (2020), England [] | EHRs | 17,6583276703 | T1DM; in | XGBoost, LR, stochastic gradient descent (SGD), kNN, DTj, SVM, quadratic discriminant analysis (QDA), RF, extra tree (ET), linear discriminant analysis (LDA), AdaBoost, bagging | All | 66 (SD 18) | 4
Güemes (2020), United States [] | CGM device | 6556 | T1DM; out | SVM | Nocturnal | 40-60 | 3.9
Jensen (2020), Denmark [] | CGM device | 46392179 | T1DM; out | LDA | Nocturnal | 43 (SD 15) | 3
Oviedo (2019), Spain [] | CGM device | 101447420 | T1DM; out | SVM | Postprandial | 41 (SD 10) | 3.9
Toffanin (2019), Italy [] | CGM device | 20709636 | T1DM; out | Individual model-based | All | 46 | 3.9
Bertachi (2018), United States [] | CGM device | 6516 | T1DM; out | NNMk | Nocturnal | 40-60 | 3.9
Eljil (2014), United Arab Emirates [] | CGM device | 10667100 | T1DM; out | Bagging | All | 25 | 3.3
Dave (2021), United States [] | CGM device | 112; 546,640; 12,572 | T1DM; out | RF | All | 12.67 (SD 4.84) | 3.9
Marcus (2020), Israel [] | CGM device | 11; 43,533; 5264 | T1DM; out | Kernel ridge regression (KRR) | All | 18-39 | 3.9
Reddy (2019), United States [] | — | 559029 | T1DM; out | RF | — | 33 (SD 6) | 3.9
Sampath (2016), Australia [] | — | 3415040 | T1DM; out | Ranking aggregation (RA) | Nocturnal | — | —
Sudharsan (2015), United States [] | — | —; 839428 | T2DM; out | RF | All | — | 3.9

aBG: blood glucose.

bCGM: continuous glucose monitoring.

cT1DM: type 1 diabetes mellitus.

dSVM: support vector machine.

eNot applicable.

fDM: diabetes mellitus.

gRF: random forest.

hEHR: electronic health record.

iXGBoost: Extreme Gradient Boosting.

jDT: decision tree.

kNNM: neural network model.

Table 3. Baseline characteristics of studies detecting adverse BGa events (N=17).

First author (year), country | Data source | Sample size (patients, n; data points, n; hypoglycemic events, n) | Object; setting | Model | Time | Age (years), mean (SD)/range | Threshold (mmol/L)
Jin (2019), United States [] | EHRsb | —c; 4104132 | T1DMd; in | Linear discriminant analysis (LDA) | All | — | —
Nguyen (2013), Australia [] | EEGe | 514476 | T1DM; in | Levenberg-Marquardt (LM), genetic algorithm (GA) | All | 12-18 | 3.3
Chan (2011), Australia [] | CGMf device | 1610052 | T1DM; experimental | Feed-forward neural network (fNN) | Nocturnal | 14.6 (SD 1.5) | 3.3
Nguyen (2010), Australia [] | EEG | 67927 | T1DM; experimental | Block-based neural network (BRNN) | Nocturnal | 12-18 | 3.3
Rubega (2020), Italy [] | EEG | 3425161258 | T1DM; experimental | NNMg | All | 55 (SD 3) | 3.9
Chen (2019), United States [] | EEG | —; 30011 | DMh; in | Logistic regression (LR) | All | — | —
Jensen (2013), Denmark [] | CGM device | 101267160 | T1DM; experimental | SVMi | All | 44 (SD 15) | 3.9
Skladnev (2010), Australia [] | CGM device | 525211 | T1DM; in | fNN | Nocturnal | 16.1 (SD 2.1) | 3.9
Iaione (2005), Brazil [] | EEG | 81990995 | T1DM; experimental | NNM | Morning | 35 (SD 13.5) | 3.3
Nuryani (2012), Australia [] | ECG | 5575133 | DM; in | SVM, linear multiple regression (LMR) | All | 16 (SD 0.7) | 3.0
San (2013), Australia [] | ECG | 1544039 | T1DM; in | Block-based neural network (BBNN), wavelet neural network (WNN), fNN, SVM | All | 14.6 (SD 1.5) | 3.3
Ling (2012), Australia [] | ECG | 1626954 | T1DM; in | Fuzzy reasoning model (FRM), fNN, multiple regression–fuzzy inference system (MR-FIS) | Nocturnal | 14.6 (SD 1.5) | 3.3
Ling (2016), Australia [] | ECG | 1626954 | T1DM; in | Extreme learning machine–based neural network (ELM-NN), particle swarm optimization–based neural network (PSO-NN), MR-FIS, LMR, fuzzy inference system (FIS) | Nocturnal | 14.6 (SD 1.5) | 3.3
Nguyen (2012), Australia [] | EEG | 54420 | T1DM; in | NNM | — | 12-18 | 3.3
Ngo (2020), Australia [] | EEG | 813553 | T1DM; in | BRNN | Nocturnal | 12-18 | 3.9
Ngo (2018), Australia [] | EEG | 85426 | T1DM; in | BRNN | Nocturnal | 12-18 | 3.9
Nuryani (2010), Australia [] | ECG | 5278 | T1DM; experimental | Fuzzy support vector machine (FSVM), SVM | Nocturnal | 16 (SD 0.7) | 3.3

aBG: blood glucose.

bEHR: electronic health record.

cNot applicable.

dT1DM: type 1 diabetes mellitus.

eEEG: electroencephalograph.

fCGM: continuous glucose monitoring.

gNNM: neural network model.

hDM: diabetes mellitus.

iSVM: support vector machine.

As shown in Tables 1-3, 40 (87%) studies [,,-,,,-,-,-] included participants with type 1 diabetes mellitus (T1DM), 2 (4.3%) studies [,] included participants with type 2 diabetes mellitus (T2DM), and the remaining 4 (8.7%) studies [,,,] did not specify the type of DM. Regarding the data source of ML models, CGM devices were involved in 22 (47.8%) studies [,,,,,,-,,,,,,,-], EEG signals were used in 8 (17.4%) studies [,-,,-], ECG signals were involved in 5 (10.9%) studies [-,], EHRs were used in 3 (6.5%) studies [,,], data generated by the UVA/Padova T1D simulator were used in 3 (6.5%) studies [,,], the Ohio T1DM data set was used in 2 (4.3%) studies [,], and 4 (8.7%) studies [,-] did not report the source of data. Regarding the setting of data collection, 24 (52.2%) studies [,-,,-,-,-,,,,-] were conducted in an out-of-hospital setting, 13 (28.3%) studies [,,,,,,-] were conducted in an in-hospital setting, 6 (13%) studies [-,,,] were conducted in an experimental setting, and the remaining 1 (2.2%) study [] did not specify the environment. Regarding when adverse BG events occurred in the 36 (78.3%) adverse event–based studies, 15 (41.7%) [,,,,,,,,,,,,-] reported nocturnal hypoglycemia, 16 (44.4%) [,,,,,,,-,,,,-] were not specific about the time of day, 2 (5.6%) [,] reported postprandial hypoglycemia, 1 (2.8%) [] reported morning hypoglycemia, and the remaining 2 (5.6%) [,] did not report the time setting. To carry out the network meta-analysis of BG level–based studies, we chose the RMSE as the outcome to be compared.

Quality Assessment of Included Studies

The quality assessment results using the QUADAS-2 tool showed that more than half of all included studies did not report the patient selection criteria in detail, which led to a low-quality rating for patient selection (Figure 2). Furthermore, the diagnosis of hypoglycemia based on blood samples or CGM readings was considered a high-quality reference standard in our study.

Figure 2. Quality assessment of included studies. Risk of bias and applicability concerns graph (A) and risk of bias and applicability concerns summary (B).

Statistical Analysis

Machine Learning Models for Predicting Blood Glucose Levels

Network meta-analysis was conducted to evaluate the performance of different ML models. For PH=30 minutes, 10 (21.7%) studies [-,-] with 32 different ML models were included, and the network map is shown in Figure 3A. The mean RMSE was 21.40 (SD 12.56) mg/dL. Statistically significant inconsistency was detected using the inconsistency test (χ²=87.11, P<.001), as shown in the forest plot in . Meta-regression indicated that I² for the RMSE was 60.75%, and the source-of-heterogeneity analysis showed that place and validation type were statistically significant (P<.001). The maximum SUCRA value was 99.1 for the dilated recurrent neural network (DRNN) model, with a mean RMSE of 7.80 (SD 0.60) mg/dL [], whereas the minimum SUCRA value was 0.4 for one symbolic model, with a mean RMSE of 71.4 (SD 21.9) mg/dL []. The relative ranks of the ML models are shown in Table 4, and the SUCRA curves are shown in Figure 4A. Publication bias was tested using the Egger test (P=.503), indicating no significant publication bias.

For PH=60 minutes, 4 (8.7%) studies [,,] with 17 different ML models were included, and the network map is shown in Figure 3B. The mean RMSE was 30.01 (SD 7.23) mg/dL. Statistically significant inconsistency was detected using the inconsistency test (χ²=8.82, P=.012), as shown in the forest plot in . Meta-regression indicated that none of sample size, reference, place, validation type, or model type was a source of heterogeneity. The maximum SUCRA value was 97.8 for the GluNet model, with a mean RMSE of 19.90 (SD 3.17) mg/dL [], whereas the minimum SUCRA value was 4.5 for the decision tree (DT) model, with a mean RMSE of 32.86 (SD 8.81) mg/dL []. The relative ranks of the ML models are shown in Table 5, and the SUCRA curves are shown in Figure 4B. No significant publication bias was detected using the Egger test (P=.626).

For PH=15 minutes, 3 (6.5%) studies [,,] with 14 different ML models were included, and the network map is shown in Figure 3C. The mean RMSE was 18.88 (SD 19.71) mg/dL. Statistically significant inconsistency was detected using the inconsistency test (χ²=28.29, P<.001), as shown in the forest plot in . Meta-regression showed that I² was 41.28% and that model type and sample size were both sources of heterogeneity (P=.002 and P=.037, respectively). The maximum SUCRA value was 99.1 for the ARTiDe jump neural network (ARJNN) model, with a mean RMSE of 9.50 (SD 1.90) mg/dL [], whereas the minimum SUCRA value was 0.3 for the SVM, with a mean RMSE of 13.13 (SD 17.30) mg/dL []. The relative ranks of the ML models are shown in Table 6, and the SUCRA curves are shown in Figure 4C. Statistically significant publication bias was detected using the Egger test (P=.003).

For PH=45 minutes, only 2 (4.3%) studies [,] with 11 different ML models were included, and the network map is shown in Figure 3D. The mean RMSE was 21.27 (SD 5.17) mg/dL. Statistically significant inconsistency was detected using the inconsistency test (χ²=6.92, P=.009), as shown in the forest plot in . Meta-regression indicated significant heterogeneity from the model type (P=.006). The maximum SUCRA value was 99.4 for the NNM, with a mean RMSE of 10.65 (SD 3.87) mg/dL [], whereas the minimum SUCRA value was 26.3 for the DT model, with a mean RMSE of 23.35 (SD 6.36) mg/dL []. The relative ranks of the ML models are shown in , and the SUCRA curves are shown in Figure 4D. Statistically significant publication bias was detected using the Egger test (P<.001).

Figure 3. Network map of ML models for predicting BG levels in different PHs: PH=30 (A), 60 (B), 15 (C), and 45 (D) minutes. ARIMA: autoregressive integrated moving average; ARM: autoregression model; ARMA: autoregressive moving average; ARJNN: ARTiDe jump neural network; BG: blood glucose; CRNN-MTL: convolutional recurrent neural network multitask learning; CRNN-MTL-GV: convolutional recurrent neural network multitask learning glycemic variability; CRNN-STL: convolutional recurrent neural network single-task learning; CRNN-TL: convolutional recurrent neural network transfer learning; DFFNN: delayed feed-forward neural network; DRNN: dilated recurrent neural network; DT: decision tree; FC: fully connected (neural network); fNN: feed-forward neural network; GCN: gradually connected neural network; JNN: jump neural network; kNN: k-nearest neighbor; LGBM: light gradient boosting machine; LSTM: long short-term memory; LVX: latent variable with exogenous input; ML: machine learning; NARX: one neural network model; NN-LPA: neural network–linear prediction algorithm; NNM: neural network model; PH: prediction horizon; RF: random forest; RNN: recurrent neural network; SAX: one symbolic model; SVR: support vector regression.

Table 4. Relative ranks of MLa models for predicting BGb levels in PHc=30 minutes.

ML model | SUCRAd | Relative rank
NNMe | 52.0 | 14.4
ARMf | 39.6 | 17.9
ARJNNg | 79.5 | 6.8
RFh | 6.9 | 27.1
SVMi | 73.3 | 8.5
One symbolic model (SAX) | 0.4 | 28.9
Recurrent neural network (RNN) | 19.0 | 23.7
One neural network model (NARX) | 3.9 | 27.9
Jump neural network (JNN) | 36.0 | 18.9
Delayed feed-forward neural network model (DFFNN) | 15.8 | 24.6
Gradually connected neural network (GCN) | 41.1 | 17.5
Fully connected (FC) neural network | 58.1 | 12.7
Light gradient boosting machine (LGBM) | 69.3 | 9.6
DRNNj | 99.1 | 1.2
Autoregressive moving average (ARMA) | 54.3 | 13.8
Autoregressive integrated moving average (ARIMA) | 46.6 | 16.0
Feed-forward neural network (fNN) | 86.3 | 4.8
Long short-term memory (LSTM) | 69.1 | 9.7
GluNet | 96.4 | 2.0
Latent variable with exogenous input (LVX) | 75.2 | 7.9
Neural network–linear prediction algorithm (NN-LPA) | 60.0 | 12.2
Convolutional recurrent neural network multitask learning (CRNN-MTL) | 77.5 | 7.3
Convolutional recurrent neural network multitask learning glycemic variability (CRNN-MTL-GV) | 77.2 | 7.4
Convolutional recurrent neural network transfer learning (CRNN-TL) | 71.8 | 8.9
Convolutional recurrent neural network single-task learning (CRNN-STL) | 52.0 | 14.4
k-Nearest neighbor (kNN) | 26.0 | 21.7
DTk | 16.2 | 24.5
AdaBoost | 18.0 | 24.0
XGBoostl | 29.2 | 20.8

aML: machine learning.

bBG: blood glucose.

cPH: prediction horizon.

dSUCRA: surface under the cumulative ranking.

eNNM: neural network model.

fARM: autoregression model.

gARJNN: ARTiDe jump neural network.

hRF: random forest.

iSVM: support vector machine.

jDRNN: dilated recurrent neural network.

kDT: decision tree.

lXGBoost: Extreme Gradient Boosting.

Figure 4. SUCRA curves of ML models for predicting BG levels in different PHs: PH=30 (A), 60 (B), 15 (C), and 45 (D) minutes. ARIMA: autoregressive integrated moving average; ARM: autoregression model; ARMA: autoregressive moving average; ARJNN: ARTiDe jump neural network; BG: blood glucose; CRNN-MTL: convolutional recurrent neural network multitask learning; CRNN-MTL-GV: convolutional recurrent neural network multitask learning glycemic variability; CRNN-STL: convolutional recurrent neural network single-task learning; CRNN-TL: convolutional recurrent neural network transfer learning; DFFNN: delayed feed-forward neural network; DRNN: dilated recurrent neural network; DT: decision tree; FC: fully connected (neural network); fNN: feed-forward neural network; GCN: gradually connected neural network; JNN: jump neural network; kNN: k-nearest neighbor; LGBM: light gradient boosting machine; LSTM: long short-term memory; LVX: latent variable with exogenous input; ML: machine learning; NARX: one neural network model; NN-LPA: neural network–linear prediction algorithm; NNM: neural network model; PH: prediction horizon; RF: random forest; RNN: recurrent neural network; SAX: one symbolic model; SVR: support vector regression.

Table 5. Relative ranks of MLa models for predicting BGb levels in PHc=60 minutes.

ML model | SUCRAd | Relative rank
ARMe | 41.0 | 10.4
Gradually connected neural network (GCN) | 14.2 | 14.7
Fully connected (FC) neural network | 55.7 | 8.1
Light gradient boosting machine (LGBM) | 56.0 | 8.0
RFf | 59.7 | 7.5
GluNet | 97.8 | 1.4
NNMg | 59.9 | 7.4
SVMh | 49.5 | 9.1
Latent variable with exogenous input (LVX) | 85.9 | 3.3
Convolutional recurrent neural network multitask learning (CRNN-MTL) | 61.4 | 7.2
Convolutional recurrent neural network multitask learning glycemic variability (CRNN-MTL-GV) | 54.2 | 8.3
Convolutional recurrent neural network transfer learning (CRNN-TL) | 44.5 | 9.9
Convolutional recurrent neural network single-task learning (CRNN-STL) | 32.5 | 11.8
k-Nearest neighbor (kNN) | 42.5 | 10.2
DTi | 4.5 | 16.3
AdaBoost | 24.1 | 13.1
XGBoostj | 66.5 | 6.4

aML: machine learning.

bBG: blood glucose.

cPH: prediction horizon.

dSUCRA: surface under the cumulative ranking.

eARM: autoregression model.

fRF: random forest.

gNNM: neural network model.

hSVM: support vector machine.

iDT: decision tree.

jXGBoost: Extreme Gradient Boosting.

Table 6. Relative ranks of MLa models for predicting BGb levels in PHc=15 minutes.

ML model | SUCRAd | Relative rank
NNMe | 84.4 | 3.0
ARMf | 86.8 | 2.7
ARJNNg | 99.1 | 1.1
RFh | 64.6 | 5.6
SVMi | 20.9 | 11.3
One symbolic model (SAX) | 0.3 | 14.0
Recurrent neural network (RNN) | 45.9 | 8.0
One neural network model (NARX) | 11.8 | 12.5
Jump neural network (JNN) | 62.2
