POSTCARDS from a SIESTA: Crossing the Translational and Generalizability Gap for Predictive Models of Acute Respiratory Distress Syndrome-Related Mortality*

In this issue of Critical Care Medicine, the study by Villar et al (1) tests a new mortality prediction score in data from the Spanish Initiative for Epidemiology, Stratification and Therapies for Acute Respiratory Distress Syndrome (SIESTA) (ALIEN, STANDARDS, and STANDARDS-2) and externally validates it in data from the prevalence and outcome of acute hypoxemic respiratory failure (PANDORA) study. The SIESTA dataset is novel and contains three trials with 1,000 patients with moderate-to-severe acute respiratory distress syndrome (ARDS). Although the prediction of mortality in ARDS using either machine learning or scoring methods is not new, it has now been confirmed in a new dataset. The generation of this dataset is itself novel, as oxygenation support was standardized where possible using a fixed positive end-expiratory pressure (PEEP) of 10 cm H2O and an Fio2 of 50%. This makes P/F ratios more comparable across patients (2).

It is commendable that the authors have used rigorous methods to train and validate these models. By using 100 bootstrapping folds, along with five-fold cross-validation (using 80% train/20% test splits), the authors have created a relatively robust training pipeline. The pipeline is limited by potential information leakage, because hyperparameter tuning was conducted on the full dataset rather than on a left-out set. The benefit is that the tuned hyperparameters continue to be effective in another prospective observational cohort of patients with moderate-to-severe ARDS.
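One common way to avoid the kind of leakage described above is nested cross-validation, in which the hyperparameter is chosen using only the training portion of each outer fold. The sketch below is illustrative and is not the authors' pipeline: the synthetic one-dimensional data, the k-nearest-neighbor classifier, the candidate values of k, and all function names are assumptions made for the example.

```python
import random


def knn_predict(train, k, x):
    """Majority vote among the k nearest 1-D points; train is a list of (value, label)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    """Fraction of test points classified correctly by k-NN fitted on train."""
    return sum(knn_predict(train, k, x) == t for x, t in test) / len(test)


def k_folds(items, n_folds):
    """Yield (train, test) pairs; with n_folds=5 each split is 80% train / 20% test."""
    for i in range(n_folds):
        test = items[i::n_folds]
        train = [p for j, p in enumerate(items) if j % n_folds != i]
        yield train, test


def nested_cv(data, candidate_ks=(1, 5, 15), n_folds=5):
    """Return one (chosen k, outer-test accuracy) pair per outer fold."""
    results = []
    for outer_train, outer_test in k_folds(data, n_folds):
        # Inner loop: tune k on the outer-training data only, so the
        # outer test fold never influences the choice of hyperparameter.
        def inner_score(k):
            scores = [accuracy(tr, va, k) for tr, va in k_folds(outer_train, n_folds)]
            return sum(scores) / len(scores)

        best_k = max(candidate_ks, key=inner_score)
        results.append((best_k, accuracy(outer_train, outer_test, best_k)))
    return results


# Hypothetical data: one noisy feature whose mean shifts with the outcome.
rng = random.Random(0)
data = []
for _ in range(150):
    label = rng.randint(0, 1)
    data.append((rng.gauss(float(label), 1.0), label))

results = nested_cv(data)
for fold, (k, acc) in enumerate(results, start=1):
    print(f"outer fold {fold}: chosen k={k}, held-out accuracy={acc:.2f}")
```

Tuning on the full dataset would instead let every outer test fold contribute to the choice of k, which is the source of the optimistic bias noted above.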

The strongest predictors for this model included plateau pressures at T0 and T24 and the number of organ failures. As high plateau pressures are associated with decreased compliance, often with more severe ARDS, this associates higher mortality with more severe ARDS. This is a reasonable finding (3). Furthermore, as more organ failures in the Sequential Organ Failure Assessment are associated with higher mortality, it is logical that increased organ failure plays a strong role in this model (4). However, these are not new findings.

To address the classic problem of high dimensionality relative to sample size, the authors use a feature selection method called the genetic algorithm (GA). GAs are a class of search algorithms inspired by evolutionary biology and natural selection; they outperform random search algorithms because they use historical data to optimize their search space (5,6). Within the context of feature selection, GAs use the concept of “genes” representing the input feature domain, and an “organism” representing a potential set of optimal features. They start by randomly initializing the population, which consists of candidate solutions to a given objective. In the second stage, a “fitness score” is assigned to each individual candidate in the population, such that the higher the fitness score, the more likely the candidate is to be chosen for reproduction. Selection then occurs, in which pairs of candidates are chosen for reproduction and variation operators are applied to perform either “crossover” or “mutation”; in the former, randomly selected information from the parents is used to generate a child of equal length, and in the latter, new information is generated in the child. This is followed by replacement, wherein the new “child” population replaces the parents. The cycle is repeated, with an improving overall fitness score indicating a more optimal solution, until a stopping criterion is met, usually when a threshold fitness score has been reached.
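The stages described above (initialization, fitness scoring, selection, crossover, mutation, and replacement) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used by Villar et al: the synthetic data, the nearest-centroid fitness function with a small penalty per selected feature, tournament selection, single-point crossover, and the fixed number of generations as the stopping criterion are all choices made for the example.

```python
import random


def make_data(n_rows=200, n_features=12, informative=(0, 3, 7), seed=0):
    """Hypothetical binary-outcome data; only the `informative` columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_rows):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 1.5 * label  # shift informative features when the outcome is 1
        X.append(row)
        y.append(label)
    return X, y


def fitness(mask, X, y, penalty=0.01):
    """In-sample nearest-centroid accuracy on the selected features, minus a size penalty."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [r for r, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for r, t in zip(X, y):
        dist = {lab: sum((r[j] - c) ** 2 for j, c in zip(cols, cen))
                for lab, cen in centroids.items()}
        correct += min(dist, key=dist.get) == t
    return correct / len(y) - penalty * len(cols)


def genetic_feature_selection(X, y, pop_size=30, generations=20,
                              mutation_rate=0.05, seed=1):
    rng = random.Random(seed)
    n = len(X[0])
    # Initialization: each organism is a 0/1 mask with one gene per feature.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):  # stopping criterion: fixed generation count (assumption)
        # Fitness: score every organism and remember the best one seen so far.
        scores = [fitness(m, X, y) for m in population]
        for m, s in zip(population, scores):
            if s > best_score:
                best, best_score = m, s

        def select():
            # Tournament selection: the fitter of two random organisms reproduces.
            a, b = rng.sample(range(pop_size), 2)
            return population[a] if scores[a] >= scores[b] else population[b]

        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randint(1, n - 1)      # single-point crossover
            child = p1[:cut] + p2[cut:]      # child has the same length as its parents
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children                # replacement: children replace parents
    return best, best_score


X, y = make_data()
best_mask, best_score = genetic_feature_selection(X, y)
print("selected features:", [j for j, bit in enumerate(best_mask) if bit])
print("fitness:", round(best_score, 3))
```

Because fitness here is computed on the same rows used to fit the centroids, the score is optimistic; a held-out or cross-validated fitness would be the more defensible choice in practice.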

This heuristic-based adaptation of the random search algorithm has been used across multiple domains and has resulted in a robust selection of highly enriched features (7,8). Yet, there are important disadvantages of the algorithm that are pertinent to Critical Care Medicine. Importantly, the initialization parameters often drive convergence; thus, the subset of patients chosen for initialization may influence subsequent candidate populations. In this particular approach, the authors sought to overcome this limitation by performing 100 iterations of bootstrapped search, thus increasing the likelihood that the proposed set represents generalizable features. Yet, the conventional GA is prone to premature convergence because of a lack of global search ability caused by the loss of population diversity during evolution (9,10). Sometimes, this loss of diversity occurs because of the presence of highly associated features, that is, those that are dominant in terms of their correlation with the outcome of interest, in this case mortality. Indeed, the final set of variables proposed by the authors, including plateau pressure, is highly associated with changing lung compliance. However, the use of such a dimensional reduction strategy may have obscured other potentially salient features that may have proven to be superior predictive candidates. The lack of an alternative feature selection algorithm leaves the reader with open-ended questions about what else may have emerged from this unique dataset.

Among the several active challenges in translating novel clinical decision support tools to the bedside is poor generalizability. Indeed, such challenges have been reported across multiple studies, including those derived from clinical trials. A variety of contributors exist; however, a key one is the missingness of measures that are often programmatically captured during the trial. Additionally, participants are often monitored at predefined intervals, which may not be practical to implement in the real-world setting. Collectively, these factors pose limitations when interpreting machine-learning–based results that arise from trial data. However, several methods exist to reliably extract meaning, including the use of robust stochastic control to approximate a probability distribution around occurrences and thereby estimate variable-level uncertainty. These methods, along with other approaches to maximizing generalization, may be best explored when such data are made available to the wider community for rigorous and reproducible experimentation.

As noted earlier, the use of a dataset with a PEEP of 10 cm H2O and an Fio2 of 50% to standardize the meaning and significance of P/F ratios is a novel exercise. Oxygenation is notoriously hard to model when two independent factors vary at once. Enforcing fixed settings supports more robust comparisons.

The extension of this work using SIESTA data has generated a new mortality score for moderate-to-severe ARDS that is, although not novel, interesting for a first-use dataset. It standardizes metrics for P/F measurement to improve generalizability. However, the new ARDS definition, which includes heated high-flow nasal cannula (HFNC; e.g., Optiflow, Airvo), may present complications, as HFNC has no set pressure or PEEP to adjust. Consequently, plateau pressures and PEEP cannot be standardized (11).

Overall, the Predicting Outcome and STratifiCation of severity in ARDS (POSTCARDS) study, although confirming the more recent stratification for identification of prognostic categories in acute respiratory distress syndrome (SPIRES) score, remains an interesting and useful study. Confirmation of SPIRES is still a good exercise and effort by a talented team and network. This dataset opens the opportunity for further research, with additional robust insights to be drawn. We look forward to future developments from SIESTA and PANDORA data.

1. Villar J, González-Martín JM, Hernández-González J, et al.; Predicting Outcome and STratifiCation of severity in ARDS (POSTCARDS) Network: Predicting ICU Mortality in Acute Respiratory Distress Syndrome Patients Using Machine Learning: The Predicting Outcome and STratifiCation of severity in ARDS (POSTCARDS) Study. Crit Care Med. 2023; 51:1638–1649
2. Huang B, Liang D, Zou R, et al.: Mortality prediction for patients with acute respiratory distress syndrome based on machine learning: A population-based study. Ann Transl Med. 2021; 9:794
3. Jardin F, Vieillard-Baron A: Is there a safe plateau pressure in ARDS? The right heart only knows. Intensive Care Med. 2007; 33:444–447
4. Minne L, Abu-Hanna A, de Jonge E: Evaluation of SOFA-based models for predicting mortality in the ICU: A systematic review. Crit Care. 2008; 12:R161
5. Holland JH: Genetic algorithms. Sci Am. 1992; 267:66–72
6. Kramer O: Genetic algorithms. In: Genetic Algorithm Essentials. Kramer O (Ed). Cham, Springer International Publishing, 2017, pp 11–19
7. Katoch S, Chauhan SS, Kumar V: A review on genetic algorithm: Past, present, and future. Multimed Tools Appl. 2021; 80:8091–8126
8. Sivanandam SN, Deepa SN: Genetic algorithms. In: Introduction to Genetic Algorithms. Sivanandam SN, Deepa SN (Eds). Berlin, Heidelberg, Springer Berlin Heidelberg, 2008, pp 15–37
9. Chen N, Qiu T, Lu Z, et al.: An adaptive competition and its 3D robustness evolution algorithm with self-deployment for internet of things. IEEE/ACM Trans Netw. 2022; 30:368–381
10. Shi K, Huang L, Jiang D, et al.: Path planning optimization of intelligent vehicle based on improved genetic and ant colony hybrid algorithm. Front Bioeng Biotechnol. 2022; 10:905983
11. Matthay MA, Arabi Y, Arroliga AC, et al.: A new global definition of acute respiratory distress syndrome. Am J Respir Crit Care Med. 2023 Jul 24. [online ahead of print]
