A Machine Learning Algorithm using Clinical and Demographic Data for All-Cause Preterm Birth Prediction

Abstract

Objective Preterm birth remains the predominant cause of perinatal mortality throughout the United States and the world, with well-documented racial and socioeconomic disparities. This study aimed to develop and validate a machine learning algorithm that predicts all-cause preterm birth from clinical, demographic, and laboratory data.

Study Design We performed a cohort study of pregnant individuals delivering at a single institution using prospectively collected information on clinical conditions, patient demographics, laboratory data, and health care utilization. Our primary outcome was all-cause preterm birth before 37 weeks. The dataset was randomly divided into a derivation cohort (70%) and a separate validation cohort (30%). Predictor variables were selected from among 33 that had been previously identified in the literature (directed machine learning). In the derivation cohort, both a statistical model (logistic regression) and a machine learning model (XG-Boost) were fit and compared by C-statistic to identify the best fit, which was then validated in the validation cohort. We measured model discrimination with the C-statistic and assessed model performance and calibration to determine whether the model provided clinical decision-making benefits.
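The two mechanical steps described in the study design, the random 70/30 split and the C-statistic, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `split_70_30` and `c_statistic`, the seed, and the toy scores are assumptions made for illustration, and no model fitting (logistic regression or XG-Boost) is shown.

```python
import random


def split_70_30(records, seed=0):
    """Randomly split records into a derivation (70%) and validation (30%) list.

    The seed is an illustrative assumption so the split is reproducible.
    The cut point uses integer arithmetic and rounds down.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10
    return shuffled[:cut], shuffled[cut:]


def c_statistic(labels, scores):
    """C-statistic (area under the ROC curve) by direct pair counting.

    It is the probability that a randomly chosen positive case (label 1)
    receives a higher score than a randomly chosen negative case (label 0),
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical usage: split 12,440 record indices, then score a toy example.
derivation_ids, validation_ids = split_70_30(range(12440))
toy_auc = c_statistic([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

A C-statistic of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The pair-counting form is quadratic in cohort size, so it is meant for clarity rather than speed.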

Results The cohort included a total of 12,440 deliveries among 12,071 individuals. Preterm birth occurred in 2,037 births (16.4%). The derivation cohort consisted of 8,708 deliveries (70%) and the validation cohort of 3,732 deliveries (30%). XG-Boost was chosen for its robustness and its ability to handle missing data and collinearity between predictor variables. The top five predictor variables identified as drivers of preterm birth, ranked by feature importance, were multiple gestation, number of emergency department visits in the year prior to the index pregnancy, unknown initial body mass index, gravidity, and prior preterm delivery. Test performance characteristics were similar between the two populations (derivation cohort area under the curve [AUC] = 0.70 vs. validation cohort AUC = 0.63).
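The reported counts and percentages can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated in the results; the variable names are illustrative.

```python
# Figures as reported in the results.
deliveries = 12_440
individuals = 12_071
preterm = 2_037
derivation = 8_708
validation = 3_732

# The two cohorts should partition all deliveries.
cohorts_sum_to_total = (derivation + validation == deliveries)

preterm_rate = preterm / deliveries          # reported as 16.4%
derivation_share = derivation / deliveries   # reported as 70%
validation_share = validation / deliveries   # reported as 30%

# More deliveries than individuals means some individuals
# contributed more than one delivery to the cohort.
extra_deliveries = deliveries - individuals

# Difference between derivation and validation AUC.
auc_gap = round(0.70 - 0.63, 2)

print(f"preterm rate: {preterm_rate:.1%}")
print(f"derivation share: {derivation_share:.1%}")
print(f"validation share: {validation_share:.1%}")
print(f"deliveries beyond one per individual: {extra_deliveries}")
print(f"AUC gap (derivation minus validation): {auc_gap}")
```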

Conclusion Clinical, demographic, and laboratory information can be useful to predict all-cause preterm birth with moderate precision.

Key Points

Machine learning can be used to create models to predict preterm birth.

In our model, all-cause preterm birth can be predicted with moderate precision.

Clinical, demographic, and laboratory information can be useful to predict all-cause preterm birth.

Keywords preterm birth - machine learning - social determinants of health - XG-Boost - predictive algorithm

Note

This study was presented at the Society for Maternal-Fetal Medicine 42nd Annual Meeting, Virtual Poster, February 2022.

Publication History

Received: 12 December 2022

Accepted: 18 October 2023

Article published online: 04 December 2023

© 2023. Thieme. All rights reserved.

Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA
