Figure 1. Schematic representation of the methodology adopted for building the predictive models. The model is trained on 80% of the available data and tested on the remaining held-out data (unseen by the model during training). When new patient data become available, the final model in the hospital decision support system can be used to predict sepsis, MDRO, and mortality for efficient patient management.
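The 80/20 hold-out split described in the Figure 1 caption can be sketched as follows. This is a minimal illustration only: the caption does not say whether the split was stratified, so stratification by outcome is an assumption here, and the function name `stratified_split` and the seed are hypothetical. The toy labels reuse the sepsis counts reported in Table 1 (229 events among 1166 episodes).

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved in each part."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Hold out the same fraction of every class.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels only: 229 events and 937 non-events, as in the sepsis counts.
labels = [1] * 229 + [0] * 937
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

The held-out indices are never touched during model selection, which is what makes the test score an estimate of performance on unseen patients.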
Figure 2. Comparison of various classifiers: support vector machine (SVC), logistic regression (LR), Gaussian naive Bayes (GNB), gradient boosting (GB), XGBoost (XGB), and ridge classifier (RID). Only the training set (80% of the data) is used for data augmentation and model selection.
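The point that augmentation is applied to the training set only can be illustrated with a small sketch. The captions do not name the augmentation method, so plain random oversampling of the minority class is used here purely as a stand-in; `oversample_minority` and the toy rows are hypothetical.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size.

    Assumes binary labels (0/1) with at least one row in each class.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Hypothetical training rows: 4 events and 16 non-events.
train_rows = [[float(i)] for i in range(20)]
train_labels = [1] * 4 + [0] * 16
aug_rows, aug_labels = oversample_minority(train_rows, train_labels)
print(len(aug_labels), sum(aug_labels))
```

Only the training rows pass through the function. The held-out test rows keep their original class balance, so the reported test metrics are not inflated by duplicated events.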
Figure 3. Recall versus learning rate for all three outcomes without data augmentation on the training set, plotted using XGBoost as the base model. Without data augmentation, the event prediction capability of the models is 10–35%, 50–57%, and 35–50% for sepsis, MDRO, and mortality, respectively.
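Reading the "event prediction capability" in this caption as recall (the quantity on the plotted axis), the metric is the fraction of true events that the model flags. The sketch below uses made-up labels, not study data, to show how a model can look accurate on an imbalanced outcome while missing most events.

```python
def recall(y_true, y_pred):
    """Fraction of actual events (label 1) that were predicted as events."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Illustrative labels: 4 events, 6 non-events; the model catches only 1 event.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(recall(y_true, y_pred), accuracy(y_true, y_pred))  # 0.25 0.7
```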
Figure 4. Recall versus learning rate for all three outcomes with data augmentation on the training set, plotted using XGBoost as the base model. With data augmentation, the event prediction capability of the models is 82–85%, 95–99%, and 88–93% for sepsis, MDRO, and mortality, respectively.
Figure 5. Recall versus number of estimators for all three outcomes without data augmentation on the training set, plotted using XGBoost as the base model.
Figure 6. Recall versus number of estimators for all three outcomes with data augmentation on the training set, plotted using XGBoost as the base model.
Figure 7. Performance of (1) the default XGBoost model before hyperparameter tuning and the best models resulting from (2) grid search, (3) random search, and (4) Bayesian optimization (hyperopt) for each endpoint, obtained using 5-fold stratified cross-validation with model scoring set to “recall”. The parameter n_iter=100 was set for random search.
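The random-search setting named in this caption (5-fold stratified cross-validation, scoring set to "recall", `n_iter=100`) might look roughly like the sketch below in scikit-learn. Several things here are assumptions rather than the study's configuration: scikit-learn's `GradientBoostingClassifier` stands in for XGBoost to keep the example dependency-light, the search ranges are illustrative, the data are synthetic, and `random_search_recall` is a hypothetical helper name.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold


def random_search_recall(X, y, n_iter=100, seed=0):
    """Random search scored by recall under 5-fold stratified cross-validation."""
    # Hypothetical search space; the ranges are illustrative, not the study's.
    param_distributions = {
        "learning_rate": uniform(0.01, 0.29),  # samples from [0.01, 0.30]
        "n_estimators": randint(10, 60),       # integers 10..59
        "max_depth": randint(2, 5),            # integers 2..4
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = RandomizedSearchCV(
        GradientBoostingClassifier(random_state=seed),
        param_distributions=param_distributions,
        n_iter=n_iter,
        scoring="recall",
        cv=cv,
        random_state=seed,
    )
    search.fit(X, y)
    return search


# Synthetic, imbalanced stand-in data (about 20% positives); not the study's dataset.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0
)
# A small n_iter keeps the demo fast; n_iter=100 would match the caption.
search = random_search_recall(X, y, n_iter=5)
print(search.best_params_, round(search.best_score_, 3))
```

Grid search would swap `RandomizedSearchCV` for `GridSearchCV` with an explicit parameter grid, keeping the same `cv` and `scoring` arguments so that the methods are compared on equal terms.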
Figure 8. Comparison of the performance of the default, grid search, random search, and Bayesian optimization (hyperopt) models when the predictive variable MDRO is removed from the feature set used to predict sepsis.
Figure 9. Comparison of the performance of the default, grid search, random search, and Bayesian optimization (hyperopt) models when the predictive variable sepsis is removed from the feature set used to predict mortality.
Table 1. Characteristics of potential features in the dataset. F: fungi, GP: gram-positive, GN: gram-negative, MENA: Middle East and North Africa.
| Features (n = 1166) | Values |
|---|---|
| Age (mean ± std. dev) | 40.30 ± 14.51 |
| Infection | |
| Bloodstream infection, n (%) | 427 (36.62%) |
| Chest infection, n (%) | 260 (22.22%) |
| Sinus infection, n (%) | 11 (0.94%) |
| Skin infection, n (%) | 78 (6.68%) |
| Colitis, n (%) | 86 (7.37%) |
| Urinary tract infection, n (%) | 79 (6.8%) |
| Gender | |
| Male, n (%) | 925 (79.33%) |
| Female, n (%) | 241 (20.66%) |
| Diagnostics category | |
| ALL, n (%) | 283 (24.27%) |
| AML, n (%) | 640 (54.88%) |
| LYM, n (%) | 213 (18.26%) |
| MDS, n (%) | 27 (2.31%) |
| Type of microorganism in BSI | |
| F/GN, F/GP, GP, F, n (%) | 17 (3.98%) |
| GN, n (%) | 337 (78.92%) |
| GN, GP, n (%) | 19 (4.45%) |
| GP, n (%) | 54 (12.65%) |
| Region | |
| R1- South Asia, n (%) | 497 (42.62%) |
| R2- MENA, n (%) | 424 (36.36%) |
| R3- East Pacific, n (%) | 166 (14.23%) |
| R4- Sub-Sahara Africa, n (%) | 55 (4.71%) |
| R5- Others, n (%) | 24 (2.05%) |
| Treatment phase | |
| Pretreatment, n (%) | 166 (14.23%) |
| Induction for remission, n (%) | 323 (27.70%) |
| Post induction, n (%) | 507 (43.48%) |
| Salvage therapy, n (%) | 51 (4.37%) |
| Palliative, n (%) | 119 (10.21%) |
| Disease status | |
| Complete/partial response, n (%) | 548 (47.00%) |
| Refractory/Relapse, n (%) | 194 (16.64%) |
| Others, n (%) | 424 (36.36%) |
| Outcome | |
| Sepsis, n (%) | 229 (19.64%) |
| MDRO, n (%) | 215 (18.43%) |
| Mortality, n (%) | 66 (12.86%) |
Table 2. Distribution of features in the sepsis and non-sepsis groups, number of FNEs, n = 1166. Along with the features listed below, other multi-category features, such as region of origin, diagnostic category, treatment phase, disease status, and type of microorganism in the bloodstream, were used to predict sepsis. A total of 12 variables were used in the model to predict sepsis.
| Features | Sepsis Group (n = 229) | Non-Sepsis Group (n = 937) |
|---|---|---|
| Age | 42.16 ± 15.60 | 39.8 ± 14.19 |
| Sex (male) | 174 (75.98%) | 751 (80.14%) |
| Line-related | 86 (37.55%) | 112 (11.95%) |
| BSI-polymicrobial | 31 (13.54%) | 32 (3.41%) |
| Chest infection | 98 (42.79%) | 162 (17.29%) |
| UTI | 31 (13.54%) | 48 (5.1%) |
| MDRO | 96 (41.92%) | 119 (12.70%) |
Table 3. Distribution of features in the MDRO and non-MDRO groups, number of FNEs, n = 1166. Along with the features listed below, other multi-category features, such as region of origin, diagnostic category, treatment phase, disease status, and type of microorganism in the bloodstream, were used to predict MDRO. A total of 13 variables were used in the model to predict MDRO.
| Features | MDRO Group (n = 215) | Non-MDRO Group (n = 951) |
|---|---|---|
| Age | 40.83 ± 14.68 | 40.17 ± 14.46 |
| Sex (male) | 163 (75.81%) | 762 (80.12%) |
| Line-related | 113 (52.56%) | 85 (8.93%) |
| BSI-polymicrobial | 43 (20%) | 20 (2.1%) |
| Chest infection | 65 (30.23%) | 195 (20.50%) |
| Colitis | 32 (14.88%) | 54 (5.67%) |
| UTI | 23 (10.69%) | 56 (5.89%) |
| Skin infection | 30 (13.95%) | 48 (5.04%) |
Table 4. Distribution of features in the mortal and non-mortal groups, number of patients, n = 513. Along with the features listed below, other multi-category features, such as diagnostic category, treatment phase, disease status, and type of microorganism in the bloodstream, were used to predict mortality. A total of 8 variables were used in the model to predict mortality.
| Features | Mortal Group (n = 66) | Non-Mortal Group (n = 447) |
|---|---|---|
| Age | 42.28 ± 16.60 | 40.08 ± 14.59 |
| Sex (male) | 53 (80.30%) | 343 (76.73%) |
| Chest infection | 40 (60.60%) | 100 (22.37%) |
| Sepsis | 55 (83.33%) | 51 (11.41%) |