The development of prognostic and diagnostic tools is a key component of research efforts in clinical epidemiology to improve patient management based on individual characteristics [1]. Clinical prediction models, in particular, can be used to estimate the risk of a clinical endpoint from a set of relevant individual-level predictors. Historically, the development and validation of prediction models have largely relied on regression approaches, with stepwise algorithms used for variable selection. In recent years, however, the limitations of regression-based approaches [2], [3], together with the potential advantages of machine learning (ML) for variable selection, have received increasing recognition [4], [5], [6].
While the application of machine learning (ML) and deep learning (DL) to non-tabular clinical data such as images, speech, or text is widely accepted [7], [8], their potential advantages in other common epidemiological contexts such as prediction modeling are still debated [9], [10]. Despite successful applications in several clinical fields [11], [12], [13], [14], concerns about the advantages of ML for prediction modeling remain pervasive [15], [16]. Poor performance of ML and DL has been highlighted especially when the training sample is small, biased, or lacks diversity [17], and more research is needed to identify the settings where ML and DL can offer advantages in clinical prediction modeling [18], [19], [20]. Within this context, a setting of potential appeal that has received limited attention is the ability of ML to integrate complex predictors with non-linear and non-additive effects. It is often of interest, for example, to assess how novel candidate predictors such as protein biomarkers, imaging features, or genetic variants improve the prognostic performance of an existing clinical model [21]. Predictors such as protein biomarkers, however, often present potentially non-additive and non-linear effects that should be considered and explored while building the model [22]. This can be achieved within a regression-based framework through the inclusion of product terms and spline transformations [23], [24]. Nevertheless, this approach is known to increase the risk of model overfitting and misspecification [25], [26], making alternative approaches that offer robust overfitting control through cross-validation particularly appealing in this context. These broadly include ML methods such as random forests and ensemble approaches, which require human intervention in the specification of predictors and tuning parameters, but also DL methods such as neural networks, where the machine is also responsible for the initial processing of data and parameters [27].
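To make the regression-based option concrete, the sketch below (our own illustration, not code from the study) expands a design matrix with a truncated-power cubic spline basis for one predictor and a product (interaction) term, then compares the in-sample fit of the linear-additive and expanded models on simulated data. The variable names, knot locations, and simulated effects are arbitrary assumptions chosen only to show the mechanics.

```python
import numpy as np

def spline_basis(x, knots):
    """Truncated-power cubic spline basis: x, x^2, x^3, plus (x - k)^3_+ per knot."""
    cols = [x, x**2, x**3] + [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(42)
n = 300
x1 = rng.uniform(-2, 2, n)
x2 = rng.uniform(-2, 2, n)
# Outcome with a non-linear effect of x1 and a non-additive x1*x2 effect
y = np.sin(x1) + 0.5 * x1 * x2 + rng.normal(0, 0.2, n)

# Linear-additive design vs. design expanded with splines and a product term
X_lin = np.column_stack([np.ones(n), x1, x2])
X_flex = np.column_stack([np.ones(n),
                          spline_basis(x1, knots=[-1.0, 0.0, 1.0]),
                          x2,
                          x1 * x2])

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

print(rss(X_lin, y), rss(X_flex, y))  # the expanded model fits markedly better
```

Because the expanded design nests the linear-additive one, its in-sample fit can only improve; the catch, as noted above, is that each added basis column and product term raises the risk of overfitting, which is why such expansions should be validated (e.g., by cross-validation) rather than judged on training fit alone.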
Moreover, prediction models are often designed to estimate event risks over time, relying on methods for time-to-event outcomes such as Cox regression. However, most existing comparisons between regression-based and ML approaches focus on techniques for binary rather than time-to-event outcomes [15], [16]. ML approaches for survival data based on gradient boosting are available, and extensions of neural network methods to censored outcomes have recently been developed [28], [29]. To the best of our knowledge, however, ML approaches have only been applied and compared to standard Cox regression in the context of high-dimensional screening for clinical prediction modeling [30], [31], and applications of DL methods to survival outcomes in clinical epidemiology have not been discussed.
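As a minimal illustration of the time-to-event framework referenced here, the following sketch fits a one-covariate Cox proportional hazards model by gradient ascent on the Breslow partial log-likelihood, using simulated exponentially distributed survival times with random censoring. This is a didactic toy under stated assumptions (the function name, step size, and simulation design are ours), not the software or analysis used in the study.

```python
import numpy as np

def fit_cox(X, time, event, lr=0.5, n_iter=300):
    """Fit a Cox model by gradient ascent on the Breslow partial log-likelihood.

    Sorting by descending time lets cumulative sums represent risk-set totals.
    """
    order = np.argsort(-time)
    Xs, es = X[order], event[order]
    beta = np.zeros(X.shape[1])
    n_events = es.sum()
    for _ in range(n_iter):
        eta = Xs @ beta
        w = np.exp(eta)
        S0 = np.cumsum(w)                          # risk-set sums of exp(eta)
        S1 = np.cumsum(w[:, None] * Xs, axis=0)    # risk-set sums of x * exp(eta)
        # Score: sum over events of (x_i - risk-set-weighted mean of x)
        grad = np.sum(es[:, None] * (Xs - S1 / S0[:, None]), axis=0)
        beta = beta + lr * grad / n_events         # scaled step, crude Newton proxy
    return beta

# Simulated data: hazard proportional to exp(1.0 * x), with random censoring
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=(n, 1))
t = rng.exponential(np.exp(-1.0 * x[:, 0]))   # true log-hazard ratio = 1.0
c = rng.exponential(2.0, size=n)
time = np.minimum(t, c)
event = (t <= c).astype(float)

beta_hat = fit_cox(x, time, event)
print(beta_hat)  # close to the true log-hazard ratio of 1.0
```

In practice one would use an established implementation (e.g., a dedicated survival-analysis library with tie handling, standard errors, and diagnostics); the point of the sketch is only that the partial likelihood makes the Cox model a natural baseline against which survival adaptations of gradient boosting and neural networks can be compared.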
In this study, we evaluated selected ML and DL approaches for time-to-event data and compared their performance with that of classical regression-based approaches (backward selection and penalized regression) for clinical prediction, with a specific focus on settings where data complexity is driven not by high dimensionality but by the presence of non-additive effects. ML methods were selected based on previous applications and considerations in clinical prediction modeling, as well as software availability and documentation. We used data from a recently developed clinical score designed to predict atherothrombotic risk in patients with type 2 diabetes mellitus (T2DM), further including complex predictors such as protein biomarkers and lipids along with the previously evaluated clinical factors [32]. We provide several considerations for improving the interpretability and generalizability of the results, and discuss tools to incorporate complex relationships while maintaining a parsimonious and interpretable clinical prediction model.