Using artificial intelligence algorithms to predict the overall survival of hemodialysis patients during the COVID-19 pandemic: A prospective cohort study

1. INTRODUCTION

The COVID-19 pandemic, which emerged in late 2019, had a profound and far-reaching impact on the global economy and healthcare systems. Among the numerous challenges posed by this ongoing crisis, individuals undergoing chronic hemodialysis (HD) face unique and important difficulties. Patients receiving in-center HD are particularly susceptible to infection clusters due to the inherent challenges of maintaining social distancing and the frequent hospital visits required for their treatment.1 Furthermore, individuals with end-stage renal disease and accompanying comorbidities such as diabetes and protein-energy wasting are at an increased risk of severe COVID-19 and mortality.2 To mitigate the adverse health outcomes associated with COVID-19, timely vaccination has emerged as the most efficient and effective strategy for this specific patient population.

Extensive research has conclusively demonstrated the effectiveness of COVID-19 vaccination in reducing infection rates, disease severity, hospitalization, and mortality in the general population.3,4 However, immunocompromised HD patients still face high mortality rates during follow-up, presenting a unique challenge. Studies consistently indicate a diminished humoral and cellular immune response to vaccination in this vulnerable population. Specifically, HD patients exhibited lower levels of plasma anti-severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike (S) receptor-binding domain (RBD) IgG antibodies, reduced neutralization capacity, and impaired T-cell response after receiving the second dose of the Pfizer-BioNTech BNT162b2 vaccine compared with nondialysis individuals.

Many researchers have discovered a rapid decline in antibody titers 16 weeks after vaccination.5 However, a separate study demonstrated that booster doses significantly enhance anti-RBD IgG titers, improve the T-cell response, and reduce the incidence of COVID-19 infection in both HD patients and nondialysis controls.6 These findings underscore the importance of administering repeated vaccinations to elicit a robust and long-lasting immune response against COVID-19, particularly in HD patients. Based on the current published evidence, the Taiwan Centers for Disease Control recommends an additional dose and a booster dose following the completion of the primary COVID-19 vaccine series for immunocompromised individuals, including HD patients.

Previous large population cohort studies have identified multiple factors, such as hypertension,7,8 proteinuria,9 demographic variables, and underlying comorbidities,10 that are associated with the rapid decline in estimated glomerular filtration rate. The emergency model developed for predicting the prognosis of COVID-19 achieved a mean area under the receiver operating characteristic curve (AUC) of 0.85.11 However, the complexity and interrelation of clinical characteristics and extensive electronic health record data pose challenges for traditional predictive models based on classification point systems.12,13

In recent years, the rapid advancement of artificial intelligence has brought about groundbreaking achievements in machine learning and big data analytics, leading to transformative innovations across various domains, including the development of predictive models.12–15 These state-of-the-art techniques empower researchers to delve into vast and diverse data sources, enabling comprehensive exploration and integration that result in remarkably accurate predictions and enhanced risk assessment. By harnessing the extraordinary capabilities of artificial intelligence algorithms, vast volumes of data can be efficiently analyzed, unveiling intricate patterns and valuable insights that were once inaccessible. This revolutionizes the ability to create highly precise and personalized predictions for individuals who may be at heightened risk of experiencing complications associated with COVID-19 vaccination. The use of artificial intelligence in predictive modeling holds immense potential for optimizing patient care and enabling proactive interventions in the context of vaccination-related challenges.

In our study, we aimed to use artificial intelligence algorithms to predict the survival impact of partial COVID-19 vaccination in HD patients. The model incorporated a comprehensive set of clinical characteristics, including demographic information, comorbidities, laboratory data, and concomitant medications obtained from outpatient visits, emergency room visits, and hospital admissions.

To improve the interpretability and reliability of our predictive model, we used SHapley Additive exPlanation (SHAP) values. These values helped us gain insights into the importance and contribution of each significant feature in our machine-learning model. By analyzing the SHAP values, we gained a deeper understanding of the impact of each feature on the prediction of overall survival in COVID-19 cases among HD patients. This approach enhanced the interpretability of our model and increased its reliability in supporting clinical decision-making.

2. METHODS

2.1. Data sources and study population

This study was conducted at Taipei Veterans General Hospital, a tertiary medical center in Taipei. We enrolled chronic hemodialysis participants from July 2021 to April 2022, excluding those with a previous diagnosis of COVID-19 and acute kidney injury. The study followed up until October 2022, during which the omicron variant was the predominant circulating strain of SARS-CoV-2 in Taiwan. The study was approved by the review board of Taipei Veterans General Hospital (VGH-2021-07-001AC).

2.2. Feature selection

We extracted 29 clinically relevant features from the patient population for in-depth analysis. These characteristics covered a range of important aspects, including demographic characteristics, underlying comorbidities, laboratory data, concomitant drug use, and whether the patient was infected with COVID-19.

The demographic characteristics consisted of age, gender, and lifestyle factors such as smoking and alcohol consumption. We also considered underlying comorbidities that might substantially affect patient outcomes, including hypertension, diabetes mellitus, coronary artery disease, congestive heart failure, peripheral arterial occlusive disease, cerebrovascular accident, and cancer history.

To gain a deeper understanding of each patient’s medical profile, we integrated critical laboratory data into our analysis. This included baseline measurements of vital parameters such as blood urea nitrogen (BUN), serum albumin, calcium, cholesterol, chloride, creatinine, glucose, hematocrit, hemoglobin, potassium, sodium, triglycerides, uric acid, and urea reduction ratio.

2.3. Class definition

In our study, the classification of patients was based on the presence or absence of SARS-CoV-2 infection during the designated follow-up periods. A class label of 1 was assigned if a patient experienced SARS-CoV-2 infection within the specified time frame. The diagnosis of SARS-CoV-2 infection was determined through the detection of SARS-CoV-2 RNA using the polymerase chain reaction method or the detection of viral protein using an antigen test. The study patients were closely monitored and followed up until either death or completion of the study, whichever event occurred first.

2.4. Data cleaning and machine-learning model development

In our study, categorical variables were reported as frequencies (number of occurrences) and proportions (percentages). This provides information about the distribution of different categories within the dataset. For continuous parametric variables, we presented the median value along with the interquartile range (IQR), capturing the spread or variability of the data. To handle missing values in the clinical characteristics, appropriate imputation methods were employed to ensure a complete dataset suitable for analysis. This helped to maintain the integrity and representativeness of the data.
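The descriptive summary and imputation step described above can be sketched with the Python standard library. The source does not state which imputation method was used, so median imputation here is an assumption, and the albumin readings are hypothetical values chosen only for illustration.

```python
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3) computed from the non-missing entries."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return statistics.median(observed), q1, q3


def impute_median(values):
    """Replace each missing entry (None) with the median of the observed entries.

    Median imputation is an assumed choice; the paper only says that
    "appropriate imputation methods" were used.
    """
    med = statistics.median(v for v in values if v is not None)
    return [med if v is None else v for v in values]


# Hypothetical serum albumin readings in g/dL; None marks a missing value.
albumin = [4.1, 3.8, None, 4.4, 3.4, None, 4.0]

print(median_iqr(albumin))
print(impute_median(albumin))
```

In a train/test workflow the imputation statistic would normally be computed on the training set only and then applied to the testing set, so that no information leaks from the held-out data.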

For model development, the study cohort was randomly divided into an 80% training set and a 20% testing set, ensuring a balanced representation of the data in both subsets. Several machine-learning models, including CatBoost,16 light gradient boosting machines (LightGBM),17 RandomForest,18 and extreme gradient boosting models (XGBoost),19 were used. These models were selected based on their proven efficacy in similar healthcare studies.

To optimize model performance and address the issue of dimensionality, forward feature selection was implemented. This process enabled the systematic identification of the most informative subset of features from the available variables.20,21 The final subset of features was determined by iteratively evaluating different feature combinations, enhancing the model’s interpretability and efficiency in a medical context.
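Forward feature selection can be illustrated with a short greedy loop. This is a minimal sketch, not the authors' implementation: `forward_select`, `toy_score`, and the `GAIN` table are hypothetical names, and the toy scoring function stands in for what would in practice be a cross-validated metric such as the F1 score of a fitted model.

```python
def forward_select(features, score_fn, min_gain=1e-9):
    """Greedy forward selection.

    Starts from the empty set and repeatedly adds the single feature that
    raises score_fn(selected) the most. Stops when no remaining candidate
    improves the score by more than min_gain.
    """
    selected = []
    best_score = score_fn(selected)
    remaining = list(features)
    while remaining:
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored)
        if top_score - best_score <= min_gain:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical per-feature gains, used only to make the example deterministic.
GAIN = {"vaccine_doses": 0.20, "albumin": 0.12, "age": 0.06, "chloride": 0.005}


def toy_score(subset):
    """Stand-in for a cross-validated score, with a small penalty per feature."""
    return 0.5 + sum(GAIN[f] for f in subset) - 0.01 * len(subset)


selected, score = forward_select(GAIN, toy_score)
print(selected, round(score, 3))
```

Because each round refits and rescores every remaining candidate, the cost grows roughly quadratically with the number of features, which is manageable for a few dozen clinical variables.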

To evaluate the model’s performance and assess its stability, a 5-fold cross-validation strategy was applied to the training set. This involved dividing the training set into five equal-sized subsets, using four subsets for training the model and the remaining subset for validation. This process was repeated five times, ensuring that each subset served as a validation set once.22,23 This approach allowed for reliable performance estimation and ensured the generalizability of the applied machine-learning model.
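The fold construction can be sketched in plain Python. The version below is stratified, in the spirit of the `StratifiedKFold` class named in Section 2.7, but it is an illustrative reimplementation with a hypothetical function name, not the scikit-learn code. The label vector is synthetic and only reuses the class counts reported for the training set (68 infected of 355).

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs, keeping class proportions similar per fold."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal the shuffled indices of each class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)

    for j in range(k):
        val = sorted(folds[j])
        train = sorted(i for m in range(k) if m != j for i in folds[m])
        yield train, val


# Synthetic labels: 68 positives and 287 negatives, matching the training-set counts.
labels = [1] * 68 + [0] * 287

for fold, (train, val) in enumerate(stratified_kfold(labels), start=1):
    positives = sum(labels[i] for i in val)
    print(f"fold {fold}: train={len(train)} val={len(val)} positives in val={positives}")
```

Each sample appears in exactly one validation fold, so averaging the five validation scores uses every training-set patient once for evaluation.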

2.5. Hyperparameter optimization

The CatBoost, LightGBM, RandomForest, and XGBoost models underwent a hyperparameter optimization process using a 5-fold cross-validation procedure to maximize the F1 score.24–27 Grid searches were conducted to systematically evaluate the models’ performance by exploring various combinations of hyperparameter values.

By exhaustively searching the hyperparameter space and selecting the optimal combination, we ensured that each ensemble model was fine-tuned to achieve the highest possible performance in the healthcare context. This rigorous process was designed to optimize the models for accurate risk assessment and prediction, ultimately reducing uncertainty and improving patient care outcomes.
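An exhaustive grid search reduces to scoring every combination of candidate values and keeping the best one. The sketch below is an assumption-laden illustration: the grid values are hypothetical (the paper does not list its search space), and `toy_cv_f1` is a stand-in for the mean 5-fold cross-validated F1 score of a fitted model.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Score every combination in param_grid and return the best one.

    param_grid maps a hyperparameter name to a list of candidate values.
    evaluate(params) must return a score where higher is better.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    results = []
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        results.append((params, score))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, results


def toy_cv_f1(params):
    """Stand-in objective; a real one would train and cross-validate a model."""
    return (0.9
            - abs(params["learning_rate"] - 0.1)
            - abs(params["num_leaves"] - 31) / 1000)


# Hypothetical search space, not the grid used in the study.
grid = {"learning_rate": [0.01, 0.1, 0.3], "num_leaves": [15, 31, 63]}

best_params, best_score, results = grid_search(grid, toy_cv_f1)
print(best_params, best_score, len(results))
```

The number of evaluations is the product of the list lengths, so each added hyperparameter multiplies the cost; with 5-fold cross-validation every grid point also requires five model fits.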

2.6. Model evaluation

The discriminative power of various machine-learning models was evaluated using the AUC. Additionally, several performance metrics, such as the F1 score, accuracy, precision, recall, average precision, and log loss, were computed for each model using the test dataset.
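Most of the metrics listed above follow directly from the confusion matrix and the predicted probabilities. The following sketch computes them from scratch on a small made-up example (the labels and probabilities are hypothetical); the 0.5 decision threshold is an assumption, and average precision is omitted for brevity.

```python
import math


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, F1, and log loss for binary labels."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    eps = 1e-15  # clip probabilities so log() stays finite
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                    for t, p in zip(y_true, clipped)) / len(y_true)

    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1, "log_loss": log_loss}


def roc_auc(y_true, y_prob):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]

print(classification_metrics(y_true, y_prob))
print(roc_auc(y_true, y_prob))
```

Unlike the threshold-based metrics, the AUC and log loss use the probabilities directly, which is why they can rank two models differently from accuracy or F1.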

Furthermore, SHAP values were employed to assess the risk of COVID-19 development in HD patients and provide explanations for the attributed values of clinical characteristics. SHAP provides a unified framework for interpreting model predictions by quantifying the contribution of each feature to the outcome. By leveraging SHAP, we gained valuable insights into the relative importance and impact of different clinical characteristics on the risk of developing COVID-19 in the HD population.
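The idea behind SHAP can be shown by computing exact Shapley values for a tiny model. This is a conceptual sketch only: `toy_risk`, its coefficients, and the patient and baseline vectors are hypothetical, and features outside a coalition are simply set to a single baseline value. The SHAP package instead uses model-specific algorithms and a background dataset, so its numbers would differ.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition take their baseline value. The cost is
    exponential in the number of features, so this only suits small inputs.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical risk score with one interaction term (not a fitted model)."""
    doses, albumin, age = z
    return 0.5 - 0.1 * doses - 0.05 * albumin + 0.004 * age + 0.01 * doses * albumin


x = [3, 4.4, 60]         # hypothetical patient: doses, albumin (g/dL), age (y)
baseline = [2, 3.4, 66]  # hypothetical reference patient

phi = shapley_values(toy_risk, x, baseline)
print(phi)
print(sum(phi), toy_risk(x) - toy_risk(baseline))
```

The last line illustrates the additivity property that gives SHAP its name: the per-feature contributions sum to the difference between the prediction for the patient and the prediction for the baseline.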

2.7. Software and package application for modeling

We employed Python (version 3.9) and the open-source Scikit-learn library to develop the machine-learning models. Statistical analysis was conducted using SAS version 9.4 (SAS Institute, Cary, NC).28 In Python, we used several Scikit-learn modules at different stages of model development. The sklearn.model_selection.train_test_split function was used to randomly split the data into training and testing sets. The CatBoost model was implemented using CatBoostClassifier. For the random forest model, we used sklearn.ensemble.RandomForestClassifier. The GBDT model was established using sklearn.ensemble.GradientBoostingClassifier. The XGBoost model was implemented using the XGBoost Python package. The LightGBM model was developed using lightgbm.LGBMClassifier. Cross-validation was performed using sklearn.model_selection.StratifiedKFold to ensure the robustness and reliability of the model evaluation. For individual-level explanations, we used local interpretable model-agnostic explanations (LIME) and SHAP force plots to illustrate the impact of key features.15 In brief, LIME explains a classifier's prediction by approximating the classifier around that prediction with a locally linear model.16 In our analysis, p < 0.05 was considered statistically significant.
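The locally linear approximation behind LIME can be sketched without any third-party package. The code below is a simplified illustration, not the LIME library: it perturbs one instance, weights each perturbed sample by an exponential kernel on its distance from the instance, and fits a weighted least-squares line. All function names, the `scale` vector, and the toy black-box model are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def local_linear_explanation(model, x, scale, n_samples=500, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around x.

    Returns [intercept, coef_1, ..., coef_d]; each coefficient is the local
    change in the model output per one `scale` unit of that feature.
    """
    rng = random.Random(seed)
    d = len(x)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        step = [rng.gauss(0.0, 1.0) for _ in range(d)]  # standardized perturbation
        z = [x[j] + step[j] * scale[j] for j in range(d)]
        rows.append([1.0] + step)
        targets.append(model(z))
        weights.append(math.exp(-sum(s * s for s in step) / kernel_width ** 2))

    # Weighted normal equations: (X^T W X) beta = X^T W y
    p = d + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(w * r[i] * t for r, w, t in zip(rows, weights, targets))
            for i in range(p)]
    return solve(xtwx, xtwy)


def toy_model(z):
    """Hypothetical black box: mortality probability from albumin and age."""
    albumin, age = z
    return 1.0 / (1.0 + math.exp(-(-1.0 - 1.2 * (albumin - 4.0) + 0.05 * (age - 65.0))))


beta = local_linear_explanation(toy_model, x=[3.4, 70.0], scale=[0.5, 10.0])
print(beta)
```

The signs and sizes of the fitted coefficients play the role of the bars in a LIME plot. The real library additionally discretizes tabular features and selects a sparse subset of them, which this sketch leaves out.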

3. RESULTS

3.1. Characteristics and distribution of patients

According to Table 1, a total of 443 hemodialysis patients were included in this study, with 355 patients in the training set and 88 patients in the testing set. The median age was 66 years in the training set and 63.5 years in the testing set. Among the patients, 56.6% in the training set and 46.6% in the testing set were male. Within the training set, the incidence of SARS-CoV-2 infection was 68 (19.2%), and the overall mortality among patients with and without SARS-CoV-2 infection was 13 (3.7%) and 47 (13.2%), respectively. In the testing set, the incidence of SARS-CoV-2 infection was 15 (17%), with the overall mortality among patients with and without SARS-CoV-2 infection being 5 (5.7%) and 12 (13.6%), respectively. Comorbidities such as hypertension (85.9%), diabetes mellitus (51.3%), and coronary artery disease (33.2%) were common. Laboratory data collection included measurements of albumin, BUN, calcium, total cholesterol, chloride, hematocrit, glucose, hemoglobin, potassium, sodium, triglyceride, uric acid, and urea reduction rate. Anti-S RBD antibody levels were assessed at three time points.

Table 1 - Demographics and clinical features of hemodialysis patients

Characteristics                                               Training set (n = 355)    Testing set (n = 88)
Demographic
  Age, y                                                      66 (20, 100)              63.5 (27, 92)
  Male                                                        201 (56.6)                41 (46.6)
  Smokers                                                     54 (15.2)                 16 (18.2)
COVID-19 vaccination status
  One vaccination                                             24 (6.8%)                 1 (1.1%)
  Two vaccinations                                            106 (29.9%)               22 (25%)
  Three vaccinations                                          225 (63.4%)               65 (73.9%)
Comorbidities
  Hypertension                                                305 (85.9)                81 (92)
  Diabetes mellitus                                           182 (51.3)                39 (44.3)
  Coronary artery disease                                     118 (33.2)                30 (34.1)
  Heart failure                                               110 (31)                  21 (23.9)
  Peripheral artery disease                                   23 (6.5)                  3 (3.4)
  Stroke                                                      60 (16.9)                 8 (9.1)
  Malignancy                                                  61 (17.2)                 20 (22.7)
Laboratory data
  Albumin, g/dL                                               4.1 (2.5, 4.9)            4.1 (3.1, 4.8)
  Blood urea nitrogen, mg/dL                                  36 (15, 198)              33.5 (14, 64)
  Calcium, mg/dL                                              9.1 (7.7, 10.7)           9.2 (7.5, 10.5)
  Total cholesterol, mg/dL                                    148 (43, 322)             148 (33, 246)
  Chloride, mEq/L                                             95 (88, 107)              95 (90, 105)
  Hematocrit, %                                               29.7 (20.2, 54.5)         29.4 (22, 37.5)
  Glucose, mg/dL                                              146 (62, 377)             142.5 (67, 273)
  Hemoglobin, g/dL                                            9.5 (6.4, 17.9)           9.7 (9.1, 10.3)
  Potassium, mEq/L                                            4.6 (3, 5.9)              4.6 (3.4, 5.7)
  Sodium, mEq/L                                               137 (129, 142)            137 (131, 142)
  Triglyceride, mg/dL                                         141 (22, 795)             146 (36, 401)
  Uric acid, mg/dL                                            5.9 (0.9, 12.4)           6.1 (1.4, 8.3)
  Urea reduction rate, %                                      72.9 (0, 91.5)            75 (4.6, 86.4)
Anti-S RBD antibody levels
  T1, units                                                   13 (0.4, 2500)            38.7 (0.4, 2500)
  T2, units                                                   250 (0.4, 2500)           2500 (0.4, 2500)
  T3, units                                                   1116 (0.4, 2500)          2500 (0.4, 2500)
Outcome
  Severe acute respiratory syndrome coronavirus 2 infection   68 (19.2%)                15 (17%)
  All-cause mortality                                         60 (16.9%)                17 (19.3%)

Values for categorical variables are given as numbers (percentages); values for continuous variables are given as medians and interquartile ranges.

a Blood samples for anti-S RBD antibodies were collected from the dialysis patients at T1 (2 wk after the first dose), T2 (2 wk after the second dose), and T3 (2 wk after the third dose). Blood samples were also obtained from patients who did not receive a vaccine or who only received the first or second dose at the same time.

Anti-S RBD antibody = anti-spike protein receptor-binding domain antibody.


3.2. Model prediction ability

Four machine-learning models, CatBoost,16 LightGBM,29 RandomForest,30 and XGBoost,31 were evaluated for their ability to predict the survival impact of partial COVID-19 vaccination in hemodialysis patients. Fig. 1A shows the receiver operating characteristic curves of these models. Fig. 1B compares their discriminative abilities using the F1 score; LightGBM achieved an F1 score of 0.95 in 5-fold cross-validation. The legend of Fig. 1B lists the AUC, accuracy, specificity, and precision of each model on the testing dataset. LightGBM, denoted by an asterisk (*), emerged as the champion model with the highest performance.

Fig. 1:

A, Receiver operating characteristic curves: graphical representation of machine-learning models’ performance in predicting the survival impact of partial COVID-19 vaccination on hemodialysis patients. B, Model performance: comparison of models’ discriminative abilities using the F1 score. The legend includes the area under the receiver operating characteristic curve (AUC), accuracy, specificity, and precision values for each model on the testing dataset. The asterisk (*) indicates that light gradient boosting machines (LightGBM) achieved the highest performance as the champion model. XGBoost = extreme gradient boosting models; CatBoost = categorical boosting.

3.3. Ranks of feature importance and SHAP values in the LightGBM model

In the LightGBM model, feature importance was analyzed using SHAP values,32 and a feature importance map was generated (Fig. 2A). The top five features, ranked in descending order of impact, were the number of COVID-19 vaccinations, vaccination group (completed three doses), anti-S RBD antibody level 2 weeks after the second dose, albumin, and age. The SHAP summary plot (Fig. 2B) shows how each feature affects the model's output. Features with larger absolute SHAP values have a greater influence on the predictions of the LightGBM model. Yellow represents forecast increases, while blue indicates detailed forecast influencers. When the features were grouped by clinical domain, the most influential domain according to SHAP values was laboratory data, followed by COVID-19 vaccination, anti-S RBD antibody levels, and demographic data.

Fig. 2:

A, Feature importance plot: top clinical features. B, SHapley Additive exPlanation (SHAP) summary plot: relative importance of top clinical features for predicting risks. Albumin = serum albumin level; vaccine = number of COVID-19 vaccinations; group = vaccination group (complete three doses); subgroup = specific vaccine subgroup (e.g., AZ group, AZ mixed with RNA group, RNA group); age = patient age; T1 = 2 weeks after the first dose; T2 = 2 weeks after the second dose; T3 = 2 weeks after the third dose; SARS-CoV-2 = severe acute respiratory syndrome coronavirus 2; anti-S RBD antibody = anti-spike protein receptor-binding domain antibody.

In the LightGBM model, we conducted a comprehensive analysis of feature importance and SHAP values to provide insights at different levels. At the domain level, we categorized the top 17 features based on the main clinical domains in the COVID-19 mortality prediction model, aligning with the clinical workflow of HD patient management (Fig. 3). For the overall predicted probability of mortality, the LIME plot was used to visualize the incremental effects of variables.

Fig. 3:

LIME force plots illustrate the impact of important features on the prediction of the survival impact of partial COVID-19 vaccination on hemodialysis patients. The plots depict incremental effects of variables on predicted mortality probabilities, providing individual-level explanations. Albumin = serum albumin level; vaccine = number of COVID-19 vaccinations; group = vaccination group (complete three doses); age = patient age; T1 = 2 weeks after the first dose; T2 = 2 weeks after the second dose; T3 = 2 weeks after the third dose; SARS-CoV-2 = severe acute respiratory syndrome coronavirus 2; LIME = local interpretable model-agnostic explanations.

If-1, the predicted probability of 1-year mortality was relatively low (0.03), driven by several important factors. Variables such as COVID-19 vaccination number 3, anti-S RBD antibody level at time point 2 (2500 units), group (receiving three doses of the vaccine), albumin IQR q2 (4.4 g/dL), uric acid IQR q2 (6.6 mg/dL), non-SARS-CoV-2 infection, and hematocrit IQR q2 (29.2%) exhibited a positive correlation (blue) with the predicted probability.

If-2, the predicted probability of 1-year mortality was relatively high (0.81), with distinct contributing factors. Variables including COVID-19 vaccination number 2, anti-S RBD antibody level at time point 2 (99.9 units), group (not receiving three doses of the vaccine), albumin IQR q2 (3.4 g/dL), uric acid IQR q2 (3.6 mg/dL), SARS-CoV-2 infection, and hematocrit IQR q2 (26.4%) displayed a positive correlation (blue) with the predicted probability.

4. DISCUSSION

In this study, we employed interpretable machine-learning methods to develop a survival prediction model specifically for HD patients in the context of COVID-19. We collected blood samples from each patient at multiple time points and analyzed the humoral response by detecting anti-S RBD antibodies. Our primary approach involved using the LightGBM algorithm to construct a highly accurate survival prediction model. We also aimed to provide explanations at different levels to enhance interpretability.

The use of machine learning in the context of COVID-19 has shown several benefits, particularly in early detection and diagnosis. Recent studies have demonstrated that artificial intelligence models trained on large clinical datasets can generate more accurate diagnoses. For example, in one study, it was reported that machine learning (ML) models could identify COVID-19 early based on clinical symptoms without the need for CT imaging.33 Scholars in another study constructed an ML random forest model for classifying COVID-19 clinical types with over 90% prediction accuracy.34 In addition to diagnosis, machine learning has also been used for screening and improving the accuracy of clinical assessments. ML models trained on data from thousands of individuals have shown high accuracy in identifying COVID-19 using a limited number of binary features.35 The use of artificial intelligence-based screening to enhance clinical diagnosis by considering multiple diagnostic indicators has been reported in other studies. Machine-learning algorithms have also been employed for predicting outcomes related to COVID-19. For instance, ML models developed using data from the Korean National Health Insurance Service achieved high sensitivity, specificity, and AUC in mortality prediction.36

The LightGBM model emerged as the champion model, demonstrating the highest F1 score on the testing dataset. This indicates its superior predictive capabilities in determining the survival impact of partial COVID-19 vaccination in HD patients. However, importantly, all models achieved relatively high AUC values, highlighting their potential for accurate predictions.

The feature importance analysis using SHAP values provided valuable insights into the key factors influencing the predictions. The number of COVID-19 vaccinations, vaccination group (complete three doses), anti-S RBD antibody level 2 weeks after the second dose, albumin, and age ranked as the top contributors to the model’s predictions. These findings align with clinical reasoning, indicating that variables related to vaccination status, biochemical markers (e.g., albumin), and age play important roles in assessing the survival impact of partial COVID-19 vaccination in HD patients.

Furthermore, the LIME plot provided additional interpretability by illustrating the incremental effects of variables on the predicted probabilities of mortality. This helped identify specific factors positively correlated with the predicted probabilities in different scenarios. Understanding these associations can assist clinicians in identifying high-risk patients and implementing appropriate interventions.

The use of interpretable machine-learning methods in this study contributes to the growing body of research that aims to uncover the factors influencing mortality outcomes in HD patients with COVID-19. By identifying specific features associated with mortality, healthcare professionals can gain insights into disease progression and make informed decisions regarding patient care and management strategies.

The results of our study provide valuable insights. However, some limitations are acknowledged. The analysis focused specifically on HD patients, and the generalizability of the findings to other populations may be limited. Additionally, the retrospective nature of the study introduces the potential for bias and confounding factors.

In conclusion, our study demonstrated the utility of interpretable machine-learning methods in predicting the survival impact of partial COVID-19 vaccination in HD patients. The identified features and explanations provide valuable clinical insights, aiding in risk assessment and decision-making. Future studies with larger and more diverse populations are warranted to validate and refine these predictive models for improvements in patient management and outcomes.

ACKNOWLEDGMENTS

We thank the Big Data Centre, Taipei Veterans General Hospital for the data arrangement and Shang-Liang Wu, PhD, for the statistical analysis. This study was supported by grants from the National Science and Technology Council (NSTC110-2320-B-075-004-MY3) and Taipei Veterans General Hospital (V112C-067, V112E-001-2).

REFERENCES

1. Li SY, Tang YS, Chan YJ, Tarng DC. Impact of the COVID-19 pandemic on the management of patients with end-stage renal disease. J Chin Med Assoc. 2020;83:628–33.
2. Salerno S, Messana JM, Gremel GW, Dahlerus C, Hirth R, Han P, et al. COVID-19 risk factors and mortality outcomes among medicare patients receiving long-term dialysis. JAMA Netw Open. 2021;4:e2135379.
3. Moreira ED, Kitchin N, Xu X, Dychter SS, Lockhart S, Gurtman A, et al.; C4591031 Clinical Trial Group. Safety and efficacy of a third dose of BNT162b2 COVID-19 vaccine. N Engl J Med. 2022;386:1910–21.
4. Magen O, Waxman JG, Makov-Assif M, Vered R, Dicker D, Hernán MA, et al. Fourth dose of BNT162b2 mRNA COVID-19 vaccine in a nationwide setting. N Engl J Med. 2022;386:1603–14.
5. Dulovic A, Strengert M, Ramos GM, Becker M, Griesbaum J, Junker D, et al. Diminishing immune responses against variants of concern in dialysis patients 4 months after SARS-CoV-2 mRNA vaccination. Emerg Infect Dis. 2022;28:743–50.
6. Borchers A, Pieler T. Programming pluripotent precursor cells derived from xenopus embryos to generate specific tissues and organs. Genes (Basel). 2010;1:413–26.
7. Polonia J, Azevedo A, Monte M, Silva JA, Bertoquini S. Annual deterioration of renal function in hypertensive patients with and without diabetes. Vasc Health Risk Manag. 2017;13:231–7.
8. Hobeika L, Hunt KJ, Neely BA, Arthur JM. Comparison of the rate of renal function decline in nonproteinuric patients with and without diabetes. Am J Med Sci. 2015;350:447–52.
9. Lim CTS, Nordin NZ, Fadhlina N, Anim MS, Kalaiselvam T, Haikal WZ, et al. Rapid decline of renal function in patients with type 2 diabetes with heavy proteinuria: a report of three cases. BMC Nephrol. 2019;20:1–6.
10. Żyłka A, Dumnicka P, Kuśnierz-Cabala B, Gala-Błądzińska A, Rybak K, Drożdż R. Role of new biomarkers for the diagnosis of nephropathy associated with diabetes type 2. Folia Med Cracov. 2015;55(4):21–33.
11. Batista AFM, Miraglia JL, Donato THR, Chiavegatto Filho ADP. COVID-19 diagnosis prediction in emergency care patients: a machine learning approach. medRxiv. 2020. doi:10.1101/2020.04.04.20052092. Available at medRxiv. Accessed September 15, 2023.
12. Ou SM, Tsai MT, Lee KH, Tseng WC, Yang CY, Chen TH, et al. Prediction of the risk of developing end-stage renal diseases in newly diagnosed type 2 diabetes mellitus using artificial intelligence algorithms. BioData Min. 2023;16:8.
13. Liao WH, Cheng YF, Chen YC, Lai YH, Lai F, Chu YC. Physician decision support system for idiopathic sudden sensorineural hearing loss patients. J Chin Med Assoc. 2021;84:101–7.
14. Chu YC, Kuo WT, Cheng YR, Lee CY, Shiau CY, Tarng DC, et al. A survival metadata analysis responsive tool (SMART) for web-based analysis of patient survival and risk. Sci Rep. 2018;8:12880.
15. Chen Y, Ouyang L, Bao FS, Li Q, Han L, Zhu B, et al. A multimodality machine learning approach to differentiate severe and nonsevere COVID-19: model development and validation. J Med Internet Res. 2021;23(4):e23948.
16. Dorogush AV, Ershov V, Gulin A. CatBoost: gradient boosting with categorical features support. arXiv preprint. 2018. doi:10.48550/arXiv.1810.11363. Available at arXiv. Accessed September 15, 2023.
17. Alzamzami F, Hoda M, Saddik AE. Light gradient boosting machine for general sentiment classification on short texts: a comparative evaluation. IEEE Access. 2020;8:101840–58.
18. Rodriguez-Galiano VF, Ghimire B, Rogan J, Chica-Olmo M, Rigol-Sanchez JP. An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J Photogramm Remote Sens. 2012;67:93–104.
19. Osman AIA, Ahmed AN, Chow MF, Huang YF, El-Shafie A. Extreme gradient boosting (XGBoost) model to predict the groundwater levels in Selangor Malaysia. Ain Shams Eng J. 2021;12:1545–56.
20. Rahman MM, Usman OL, Muniyandi RC, Sahran S, Mohamed S, Razak RA. A review of machine learning methods of feature selection and classification for autism spectrum disorder. Brain Sci. 2020;10:949.
21. Kulan H, Dag T, Mumtaz W. In silico identification of critical proteins associated with learning process and immune system for Down syndrome. PLoS One. 2019;14:e0210954.
22. Jung Y, Hu J. A K-fold averaging cross-validation procedure. J Nonparametr Stat. 2015;27:167–79.
23. Little MA, Varoquaux G, Saeb S, Lonini L, Jayaraman A, Mohr DC, et al. Using and understanding cross-validation strategies. Perspectives on Saeb et al. GigaScience. 2017;6:1–6.
24. Chadha A, Kaushik B. A hybrid deep learning model using grid search and cross-validation for effective classification and prediction of suicidal ideation from social network data. New Gener Comput. 2022;40:889–914.
25. Diao X, Huo Y, Zhao S, Yuan J, Cui M, Wang Y, et al. Automated ICD coding for primary diagnosis via clinically interpretable machine learning. Int J Med Inform. 2021;153:104543.
26. Jiang X, Xu C. Deep learning and machine learning with grid search to predict later occurrence of breast cancer metastasis using clinical data. J Clin Med. 2022;11:5772.
27. Chen YC, Chu YC, Huang CY, Lee YT, Lee WY, Hsu CY, et al. Smartphone-based artificial intelligence using a transfer learning algorithm for the detection and diagnosis of middle ear diseases: a retrospective deep learning study. EClinicalMedicine. 2022;51:101543.
28. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
29. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst. 2017;30:3146–56.
30. Biau G, Scornet E. A random forest guided tour. Test. 2016;25:197–227.
31. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016:785–94.
32. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30:4765–74.
33. Li WT, Ma J, Shende N, Castaneda G, Chakladar J, Tsai JC, et al. Using machine learning of clinical data to diagnose COVID-19: a systematic review and meta-analysis. BMC Med Inform Decis Mak. 2020;20:1–13.
34. Peng M, Yang J, Shi Q, Ying L, Zhu H, Zhu G, et al. Artificial intelligence application in COVID-19 diagnosis and prediction. SSRN Electronic J. 2020. doi:10.2139/ssrn.3541119. Available at SSRN. Accessed September 15, 2023.
35. Chen Y, Ouyang L, Bao FS, Li Q, Han L, Zhu B, et al. A multimodality machine learning approach to differentiate severe and nonsevere COVID-19: model development and validation. J Med Internet Res. 2021;23(4):e23948.
36. World Health Organization. Coronavirus disease 2019 (COVID-19): situation report, 73. 2020.
