Artificial Intelligence and Anorectal Manometry: Automatic Detection and Differentiation of Anorectal Motility Patterns—A Proof-of-Concept Study

INTRODUCTION

Anorectal disorders causing fecal incontinence (FI) or evacuation dysfunction are common, affecting up to 5% of the general population (1). These disturbances are more common in women and place a considerable burden on patients' quality of life (2–4). Most disorders result from an intricate combination of disturbances, including neurologic abnormalities and dysfunction of the pelvic floor musculature (5). Diagnosis is often difficult and subject to significant delay (6,7).

Anorectal manometry (ARM) plays an important role in the investigation of patients with suspected anorectal disorders who present with either FI or chronic constipation (particularly when outlet dysfunction is suspected as the underlying mechanism). This technique, in combination with the balloon expulsion test and the rectal sensory test, allows for the assessment of anorectal sensorimotor function, reflex activity, rectoanal coordination, and, ultimately, the voluntary and involuntary control of anal continence. Recently, a consensus classification of anorectal disorders, the London classification, has been developed, inspired by the significant developments in esophageal manometry (8).

Over the past years, there has been growing interest in the development and application of artificial intelligence (AI) algorithms across medical specialties (9). In gastroenterology, most AI tools have been developed for the automatic identification of lesions in endoscopic images (10). The development and implementation of tools for the automated interpretation of functional studies has scarcely been reported, and the existing studies focus on esophageal manometry (11–13). To the best of our knowledge, no convolutional neural network (CNN) model for ARM has been proposed. Therefore, our aim was to develop and validate an AI model based on a CNN for the automatic differentiation of motility patterns associated with FI and evacuation obstruction, using raw data from ARM examinations.

METHODS

Study design

ARM examinations performed between April 2015 and September 2021 at a referral center (Pelvia—Gastrointestinal Motility and Continence) were retrospectively reviewed. A total of 827 examinations were included. Data from these examinations were retrieved after analysis and consensus by 2 experts in ARM. This study was registered in the clinical studies platform Plataforma Brasil (17932 SGP CPP) and was conducted in accordance with the Declaration of Helsinki. Any information deemed to potentially identify the subjects was omitted. Each patient was assigned a random number to guarantee effective data anonymization. A team with data protection officer certification confirmed the nontraceability of data and conformity with the General Data Protection Regulation. This study is noninterventional, and patients were followed up according to current institutional clinical protocols.

ARM procedure

All procedures were performed using water-perfused probes with 8 radial transducers and a distal latex balloon (Dynamed, São Paulo, Brazil). The 8 channels were recorded into a computerized recorder (Dynamed, São Paulo, Brazil). After clinical examination, including digital rectal examination, the probe was lubricated and inserted up to 6 cm from the anal verge with the patient in the left lateral decubitus position. No bowel preparation was used. Resting pressures were measured, as well as anal sphincter contraction pressures, simulated evacuation, and cough reflex. These measures were taken at each centimeter down to 1 cm from the anal verge. Moreover, 2 voluntary squeezes of 40 seconds were evaluated, as well as the rectoanal inhibitory reflex, which was elicited by inflating the balloon with 20, 40, and 60 mL of air. Finally, we assessed rectal sensitivity by slowly filling the balloon with water at room temperature and registering the fluid volumes at which the patient reported first sensation, the feeling to defecate, and the urge to defecate.

Data from the manometry examinations were retrieved after analysis and consensus by 2 experts in ARM. After reading each examination, the investigators categorized the findings as compatible with either FI or evacuation obstruction. Manometric findings associated with FI included lower anal canal functional length (<2.3 or 2.4 cm in women and men, respectively), decreased resting pressure (<40 mm Hg), anal hypocontractility (total pressure <100 mm Hg or increment over resting pressure <60 mm Hg), decreased sustained squeeze duration and pressure, increased rectal sensitivity (volume <10 mL), decreased rectal compliance (<3 mm Hg/mL), absence of the rectoanal inhibitory reflex, and absence of the cough-induced reflex increase in sphincteric pressures. ARM findings associated with obstructed defecation (OD) included impaired rectoanal coordination during defecation effort, decreased rectal sensitivity (volume >30 mL), and increased rectal compliance (>15 mm Hg/mL). If a consensus could not be reached, the information from that examination was discarded.

Data acquisition and preparation

We retrospectively collected 827 ARM examinations performed between 2015 and 2021. In 493 of these examinations, the patients were diagnosed with OD, whereas the remaining 334 presented FI. We then assigned the outcome (obstruction or incontinence) as the target label for the decrypted pressure signals of each patient. Low-pass filters and feature extraction were applied to the signals. The resulting data set comprises the extracted signal features along with the target value.
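The preparation step described above can be sketched as follows. This is a minimal illustration only: the study does not report which low-pass filter or which features were used, so the moving-average filter, the four summary statistics, and the names `moving_average`, `channel_features`, and `exam_features` are all assumptions, and the pressure values are synthetic.

```python
from statistics import mean, pstdev


def moving_average(signal, window=5):
    """Simple low-pass filter: centered moving average, truncated at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(mean(signal[lo:hi]))
    return smoothed


def channel_features(signal):
    """Summary statistics for one pressure channel (hypothetical feature set)."""
    return {
        "mean": mean(signal),
        "max": max(signal),
        "min": min(signal),
        "std": pstdev(signal),
    }


def exam_features(channels, window=5):
    """Filter each channel, then flatten its features into one row per examination."""
    features = {}
    for idx, raw in enumerate(channels):
        filtered = moving_average(raw, window)
        for name, value in channel_features(filtered).items():
            features[f"ch{idx}_{name}"] = value
    return features


# Synthetic 8-channel recording (mm Hg); not real patient data.
toy_exam = [[40 + ch + (i % 3) for i in range(12)] for ch in range(8)]
row = exam_features(toy_exam)
row["target"] = "FI"  # in the study, the label comes from expert consensus
```

Each examination becomes one row of features plus a target, which is the shape of data set the models in the next section would consume.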

Model selection and tuning

We aimed to build a model that distinguishes patients with manometric findings of OD from those with FI. We trained multiple machine learning (ML) models to determine which learns best from our data set. In particular, we trained k-nearest neighbors (KNN), support vector machines (SVM), random forests (RF), and gradient boosting (xGB) models. We also ran a dummy model for baseline comparisons. A stratified 5-fold strategy was applied in the training of each model and repeated 10 times. To improve performance, we fine-tuned each model's hyperparameters 5 times, using random splits of 90% of the data for training and the remaining 10% for testing. The analyses were performed on a computer equipped with a 2.1-GHz Intel Xeon Gold 6130 processor (Intel, Santa Clara, CA) and dual NVIDIA Quadro RTX 8000 graphics processing units (NVIDIA, Santa Clara, CA).
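The repeated stratified 5-fold scheme and the dummy baseline can be illustrated with a short standard-library sketch. It assumes the dummy model always predicts the majority class of the training folds, which the study does not specify, and the helpers `stratified_folds` and `majority_baseline_accuracy` are hypothetical names. scikit-learn offers `RepeatedStratifiedKFold` and `DummyClassifier` for the same purpose, although their exact fold assignment may differ from this sketch.

```python
import random
from collections import Counter


def stratified_folds(labels, n_splits=5, seed=0):
    """Return n_splits lists of indices that preserve the class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class stopped.
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % n_splits].append(idx)
        offset += len(indices)
    return folds


def majority_baseline_accuracy(labels, test_idx):
    """Accuracy of a dummy model that predicts the majority class of the training part."""
    test_set = set(test_idx)
    train = [lab for i, lab in enumerate(labels) if i not in test_set]
    majority = Counter(train).most_common(1)[0][0]
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


labels = ["OD"] * 493 + ["FI"] * 334  # class counts reported in the study
scores = []
for repeat in range(10):  # 10 repetitions of the 5-fold split
    for fold in stratified_folds(labels, n_splits=5, seed=repeat):
        scores.append(majority_baseline_accuracy(labels, fold))

print(len(scores), round(min(scores), 3), round(max(scores), 3))
```

With 493 OD and 334 FI examinations, every fold holds 165 or 166 examinations, and always predicting OD scores between 98/165 (about 59.4%) and 99/165 (60.0%). This is the floor that any trained model has to beat.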

Model performance and statistical analysis

The primary outcome measures of this study included the precision, recall (sensitivity), F1-score, and overall accuracy in differentiating between manometric evidence of FI and evacuation obstruction. Moreover, the discriminating performance of each model was assessed by analyzing the area under the receiver operating characteristic (AUROC) curve. The classification provided by each model was compared with the definitive diagnosis, which was considered the gold standard. Given the 8-channel pressure data of an examination, the trained models distinguish OD from FI. All performance parameters are presented as mean ± SD. Statistical analysis was performed using scikit-learn version 0.22.2 (14).
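For readers less familiar with the AUROC, the sketch below computes it as the probability that a randomly chosen OD examination receives a higher model score than a randomly chosen FI examination, and it also shows the mean ± SD aggregation over repeated runs. The scores are invented for illustration, the sample standard deviation is an assumption (the study does not state which form of SD was used), and `auroc` and `mean_sd` are hypothetical helper names.

```python
from statistics import mean, stdev


def auroc(y_true, scores, positive="OD"):
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_sd(values):
    """Mean and sample standard deviation over repeated runs."""
    return mean(values), stdev(values)


# Invented scores (model confidence that the examination shows OD).
y_true = ["OD", "OD", "OD", "FI", "FI"]
y_score = [0.9, 0.8, 0.4, 0.5, 0.2]
print(auroc(y_true, y_score))            # 5 of 6 pairs are ranked correctly
print(mean_sd([0.80, 0.85, 0.90]))       # accuracy over three hypothetical runs
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.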

RESULTS

Study population

Patients who underwent ARM between April 2015 and September 2021 at a Brazilian gastrointestinal motility referral center were included. A total of 827 manometry examinations were ultimately reviewed: 493 examinations received a diagnosis of defecation obstruction, whereas the remaining 334 were diagnosed with FI. Four ML models were assessed, and their accuracies were initially compared using a 5-fold stratified training strategy. Subsequently, the models' hyperparameters were tuned to achieve maximal performance using random splits of the data in a ratio of 90% for the training data set (n = 744) and 10% for the testing data set (n = 83).

Models' accuracies during 5-fold stratified training

The performance of the 4 ML models (xGB, RF, SVM, and KNN) is summarized in Table 1. The gradient boosting model achieved an overall accuracy of 83.0% ± 2.1%. This value was similar to those obtained by the RF (81.8% ± 2.3%) and SVM models (80.2% ± 3.4%). The KNN model had a lower overall accuracy (73.5% ± 3.4%) than xGB, RF, and SVM. All tested ML algorithms outperformed the baseline comparison model (dummy model, Figure 1).

Table 1. Accuracy of the studied machine learning models

Model    Accuracy (%) (mean ± SD)    Range
xGB      83.0 ± 2.1                  78.8–87.9
RF       81.8 ± 2.3                  76.4–86.8
SVM      80.2 ± 3.4                  70.3–87.9
KNN      73.5 ± 3.4                  69.1–80.0
Dummy    59.6 ± 0.2                  59.4–60.0

Models were trained 10 times using 5-fold splits.

KNN, k-nearest neighbors; RF, random forests; SVM, support vector machines; xGB, gradient boost.


Figure 1. Accuracy (mean ± SD) for each tested model. KNN, k-nearest neighbors; RF, random forests; SVM, support vector machines; xGB, gradient boost.

Hyperparameter tuning and models' performances

To improve performance, the hyperparameters of each model were tuned 5 times. For each run, the data set was randomly split into 90% for training and 10% for validation. The confusion matrices for the validation data set of each ML model are shown in Table 2. The accuracies of the tuned models are displayed in Figure 2. Overall, the xGB model presented the highest accuracy (84.6% ± 2.8%). This value is comparable with those obtained by RF (82.7% ± 4.8%) and SVM (81.0% ± 8.0%). The accuracy of these tuned models was significantly higher than that achieved by the KNN model (74.4% ± 3.8%).

Table 2. Confusion matrices for the testing data set for the tuned models

Model    Predicted label    True label: OD    True label: FI
xGB      OD                 45                5
xGB      FI                 4                 29
RF       OD                 48                10
RF       FI                 1                 24
SVM      OD                 45                4
SVM      FI                 4                 30
KNN      OD                 41                8
KNN      FI                 8                 26

FI, fecal incontinence; KNN, k-nearest neighbors; OD, obstructed defecation; RF, random forests; SVM, support vector machines; xGB, gradient boost.
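As a worked example of how precision, sensitivity, F1-score, and accuracy follow from a confusion matrix, the sketch below uses the xGB cells of Table 2. It assumes that the columns of the table are the true labels and the rows the predicted labels; this reading gives 49 OD and 34 FI examinations for every model, as expected for a 10% test split of 83 examinations. The helpers `class_metrics` and `summarize` are hypothetical names.

```python
def class_metrics(tp, fp, fn):
    """Precision, sensitivity (recall), and F1-score for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def summarize(matrix, labels=("OD", "FI")):
    """Accuracy plus per-class metrics for a matrix indexed as matrix[predicted][true]."""
    total = sum(matrix[p][t] for p in labels for t in labels)
    correct = sum(matrix[c][c] for c in labels)
    report = {"accuracy": correct / total}
    for c in labels:
        tp = matrix[c][c]
        fp = sum(matrix[c][t] for t in labels if t != c)  # predicted c, truly other
        fn = sum(matrix[p][c] for p in labels if p != c)  # truly c, predicted other
        report[c] = class_metrics(tp, fp, fn)
    return report


# xGB cells from Table 2, read as matrix[predicted][true] (assumed orientation).
xgb = {"OD": {"OD": 45, "FI": 5}, "FI": {"OD": 4, "FI": 29}}
report = summarize(xgb)
print(round(100 * report["accuracy"], 1))
for label in ("OD", "FI"):
    print(label, [round(100 * v, 1) for v in report[label]])
```

The accuracy for this single split is 74/83, about 89.2%, which agrees with the best accuracy listed for xGB in Table 3. The per-class values differ from the means in Table 3 because those are averaged over 5 tuning runs.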


Figure 2. Accuracy (mean ± SD) for each tuned model. KNN, k-nearest neighbors; RF, random forests; SVM, support vector machines; xGB, gradient boost.

The performance metrics of the tuned models, including precision, sensitivity, and F1-score, are summarized in Table 3. The xGB model showed the highest precision for the detection of FI (87.4% ± 4.8%), and its precision for the detection of OD was 83.6% ± 4.3%. The sensitivity for the detection of OD was highest for the xGB model (92.4% ± 3.6%), similar to that of the RF model (90.0% ± 7.3%). The sensitivity for the detection of FI was similar across the different ML models, with the highest value recorded for SVM (79.4% ± 8.0%). The harmonic mean of precision and sensitivity (F1-score) was highest for the xGB model.

Table 3. Summary of the performance of the machine learning models (precision, sensitivity, F1-score, and average accuracy are mean ± SD)

Model    Class    Precision (%)    Sensitivity (%)    F1-score (%)    Accuracy (%), average    Accuracy (%), best
KNN      OD       79.6 ± 2.7       76.6 ± 5.4         78.0 ± 3.7      74.4 ± 3.8               80.7
KNN      FI       68.0 ± 5.1       72.0 ± 3.7         69.8 ± 3.6
SVM      OD       85.0 ± 6.1       82.2 ± 8.8         83.6 ± 7.4      81.0 ± 8.0               90.4
SVM      FI       75.6 ± 10.2      79.4 ± 8.0         77.6 ± 8.9
RF       OD       81.8 ± 1.6       90.0 ± 7.3         85.6 ± 4.4      82.1 ± 4.7               86.7
RF       FI       84.0 ± 10.5      71.6 ± 1.3         76.8 ± 4.7
xGB      OD       83.6 ± 4.3       92.4 ± 3.6         87.6 ± 2.2      84.6 ± 2.8               89.2
xGB      FI       87.4 ± 4.8       73.6 ± 8.2         79.6 ± 4.6

Models were tuned by training 5 times, each time randomly selecting 90% of the data for training and 10% for testing.

FI, fecal incontinence; KNN, k-nearest neighbors; OD, obstructed defecation; RF, random forests; SVM, support vector machines; xGB, gradient boost.

Overall, the performance for discriminating between OD and FI was highest for the xGB model, with an AUROC of 0.939 (Figure 3).

Figure 3. Receiver operating characteristic analyses for discriminating between manometric patterns of obstructed defecation and fecal incontinence for each of the tuned models. AUROC, area under the receiver operating characteristic; KNN, k-nearest neighbors; RF, random forests; SVM, support vector machines; xGB, gradient boost.

DISCUSSION

The development and application of AI algorithms to increase the diagnostic yield of existing diagnostic modalities has been the focus of intense research. Such work on digestive diseases is growing rapidly, particularly for endoscopic techniques (15). Nevertheless, the development of these systems for the automatic interpretation of motility studies is in its early stages. To date, studies evaluating the impact of AI in the field of neurogastroenterology and motility have focused on esophageal manometry (11–14,16). This pilot study is the first to evaluate the feasibility and performance of the application of an ML algorithm for ARM. In this study, a CNN algorithm differentiated the FI and evacuation dysfunction motility patterns with high sensitivity and specificity.

FI and chronic constipation due to functional defecation disorders are common health problems that have a significant impact on patients' quality of life and are associated with a high economic burden from healthcare-related costs (17–19). Moreover, the diagnosis of these conditions is particularly difficult because patient-reported symptoms are known to be a poor predictor of the underlying pathophysiologic mechanism (20,21). In addition, significant overlap in clinical presentation may occur because patients with chronic constipation may present with overflow FI or with FI due to neuropathy or a structural lesion (22).

ARM is an essential element for the characterization of physiologic modifications, which is pivotal for selecting advanced therapeutic strategies, including anorectal biofeedback and sacral nerve stimulation, and for predicting the response to them (8,23,24). Nevertheless, significant variability exists between centers regarding indications and procedure protocol. Indeed, a study by Carrington et al found significant discrepancy in the methods for describing the results of motility studies, which seems to be even more pronounced with the introduction of high-resolution ARM (25,26). In 2020, the London classification was introduced to standardize study protocol, terminology, interpretation, and reporting of manometric findings. Nonetheless, significant practice variability persists because the recommendations are limited by the variability of the existing evidence (8). Accessibility to motility studies, particularly ARM, is also limited. Although functional disorders are believed to account for half of referrals to specialized gastroenterology practice, training is variable, and most gastroenterologists do not have adequate training and experience to perform and interpret motility diagnostic modalities (27,28). The development of an AI tool for ARM may help to overcome these limitations by promoting the accurate detection of motility patterns, which may substantially improve diagnostic capacity and help to select patients for specific therapies. The introduction of these technologies may prove particularly helpful in low-volume centers with lower levels of expertise in ARM. Moreover, the introduction of ML tools to manometry examinations may help to standardize and streamline study reporting, thus contributing to a more efficient diagnostic process.

This pilot study aimed to evaluate the potential of AI for the automatic differentiation of distinct manometric patterns in ARM examinations. This study has several highlights. First, to the best of our knowledge, this is the first study assessing the performance of an ML algorithm for application to ARM. In this study, we assessed the performance of 4 ML algorithms (xGB, RF, SVM, and KNN). These algorithms were built solely from the extraction of signal features. The use of raw data for the construction of these models makes them more readily applicable to real-time clinical practice, bearing in mind that these models are valid only when the same ARM protocol is used. Second, all tuned ML models achieved good overall performance, with mean accuracies ranging between 74% and 85%. The xGB model demonstrated the most robust results, with a mean accuracy of 85% and an AUROC of 0.939. The higher performance of this model may be due to its ability to find nonlinear relationships and to deal with missing or outlier values. Third, this study was developed using a large number of ARM examinations performed at a large Brazilian center.

This study has several limitations. First, this is a single-center retrospective study assessing the impact of an AI algorithm on a single ARM system. Therefore, the results of this study may not be replicable with other ARM systems. Subsequent multicenter studies should evaluate the performance of these algorithms using different ARM systems and different types of probes (e.g., solid-state probes in addition to water-perfusion catheters). Second, the London classification defines the gold standard rules for the standardization of ARM examinations (8). Our ML algorithm was based on 827 examinations performed over a long time span (between April 2015 and September 2021). During this period, examinations were performed following a local protocol. Therefore, the results of this study may not be replicable using other ARM protocols, namely that used for the development of the London classification (8). This pilot study should be replicated using widely accepted classifications.

AI is shaping the landscape of medical practice. Gastroenterology, because of its procedure-intense nature, is leading the development of ML for clinical practice. The study of gastrointestinal motility is expected to benefit greatly from the implementation of AI algorithms. Anorectal disorders causing FI or chronic constipation are common health problems with a significant impact on the quality of life of patients. This proof-of-concept study is the first to assess the performance of an AI model for application to ARM and may represent a significant landmark toward improving its diagnostic accuracy and, ultimately, the management of patients with anorectal disorders.

CONFLICTS OF INTEREST

Guarantor of the article: Miguel Mascarenhas Saraiva, MD, MSc.

Specific author contributions: M.M.S. and M.V.P.: equal contribution in study design, data preparation and cleaning, feature extraction, model selection, and drafting and critical revision of the manuscript. T.R. and J.A.: bibliographic review, drafting of the manuscript, and critical revision of the manuscript. J.F.: study design, data preparation, model evaluation, and critical revision of the manuscript. P.S.: data preparation and decryption; critical revision of the manuscript. I.F.J.: performance of the anorectal manometry examinations, feature extraction, and critical revision of the manuscript. H.C. and G.M.: study design and critical revision of the manuscript. All authors approved the final version of the manuscript.

Financial support: None to report.

Potential competing interests: None to report.

Study Highlights

WHAT IS KNOWN
✓ Functional anorectal disorders are common health problems with significant impact on quality of life.
✓ Anorectal manometry (ARM) is the gold standard for the evaluation of suspected motility disorders.
✓ Artificial intelligence is expected to significantly impact the practice of ARM.

WHAT IS NEW HERE
✓ We developed machine learning algorithms for automatic differentiation between obstructed defecation and fecal incontinence manometric patterns.
✓ These algorithms, particularly the gradient boost model, reached high levels of accuracy.
✓ These algorithms may be helpful to increase accessibility to ARM and streamline the diagnostic process by this method.

ACKNOWLEDGEMENTS

The authors acknowledge NVIDIA (Santa Clara, CA) for providing the graphical processing units used for the performance of this study.

REFERENCES

1. Whitehead WE, Wald A, Diamant NE, et al. Functional disorders of the anus and rectum. Gut 1999;45(Suppl 2):II55–9.
2. Bedard K, Heymen S, Palsson OS, et al. Relationship between symptoms and quality of life in fecal incontinence. Neurogastroenterol Motil 2018;30(3):e13241.
3. Maeda Y, Vaizey CJ, Hollington P, et al. Physiological, psychological and behavioural characteristics of men and women with faecal incontinence. Colorectal Dis 2009;11(9):927–32.
4. Belsey J, Greenfield S, Candy D, et al. Systematic review: Impact of constipation on quality of life in adults and children. Aliment Pharmacol Ther 2010;31(9):938–49.
5. Remes-Troche JM, Rao SSC. Defecation disorders: Neuromuscular aspects and treatment. Curr Gastroenterol Rep 2006;8(4):291–9.
6. Goldstein ET. Outcomes of anorectal disease in a health maintenance organization setting. Dis Colon Rectum 1996;39(11):1193–8.
7. Brown HW, Guan W, Schmuhl NB, et al. If we don't ask, they won't tell: Screening for urinary and fecal incontinence by primary care providers. J Am Board Fam Med 2018;31(5):774–82.
8. Carrington EV, Heinrich H, Knowles CH, et al. The International Anorectal Physiology Working Group (IAPWG) recommendations: Standardized testing protocol and the London classification for disorders of anorectal function. Neurogastroenterol Motil 2020;32(1):e13679.
9. Le Berre C, Sandborn WJ, Aridhi S, et al. Application of artificial intelligence to gastroenterology and hepatology. Gastroenterology 2020;158(1):76–94.e2.
10. Repici A, Badalamenti M, Maselli R, et al. Efficacy of real-time computer-aided detection of colorectal neoplasia in a randomized trial. Gastroenterology 2020;159(2):512–20.e7.
11. Kou W, Carlson DA, Baumann AJ, et al. A multi-stage machine learning model for diagnosis of esophageal manometry. Artif Intell Med 2022;124:102233.
12. Kou W, Carlson DA, Baumann AJ, et al. A deep-learning-based unsupervised model on esophageal manometry using variational autoencoder. Artif Intell Med 2021;112:102006.
13. Kou W, Galal GO, Klug MW, et al. Deep learning-based artificial intelligence model for identifying swallow types in esophageal high-resolution manometry. Neurogastroenterol Motil 2022;34(7):e14290.
14. Wang Z, Hou M, Yan L, et al. Deep learning for tracing esophageal motility function over time. Comput Methods Programs Biomed 2021;207:106212.
15. Ahuja A, Kefalakes H. Clinical applications of artificial intelligence in gastroenterology: Excitement and evidence. Gastroenterology 2022;163(2):341–4.
16. Czako Z, Surdea-Blaga T, Sebestyen G, et al. Integrated relaxation pressure classification and probe positioning failure detection in high-resolution esophageal manometry using machine learning. Sensors (Basel) 2021;22(1):253.
17. Barberio B, Judge C, Savarino EV, et al. Global prevalence of functional constipation according to the Rome criteria: A systematic review and meta-analysis. Lancet Gastroenterol Hepatol 2021;6(8):638–48.
18. The Lancet Gastroenterology & Hepatology. The cost of constipation. Lancet Gastroenterol Hepatol 2019;4(11):811.
19. Deutekom M, Dobben AC, Dijkgraaf MGW, et al. Costs of outpatients with fecal incontinence. Scand J Gastroenterol 2005;40(5):552–8.
20. Knowles CH, Eccersley JA, Scott MS, et al. Linear discriminant analysis of symptoms in patients with chronic constipation: Validation of a new scoring system (KESS). Dis Colon Rectum 2000;43(10):1419–26.
21. Knowles CH, Scott SM, Legg PE, et al. Level of classification performance of KESS (symptom scoring system for constipation) validated in a prospective series of 105 patients. Dis Colon Rectum 2002;45(6):842–3.
22. Jiang AC, Panara A, Yan Y, et al. Assessing anorectal function in constipation and fecal incontinence. Gastroenterol Clin North Am 2020;49(3):589–606.
23. Aziz I, Whitehead WE, Palsson OS, et al. An approach to the diagnosis and management of Rome IV functional disorders of chronic constipation. Expert Rev Gastroenterol Hepatol 2020;14(1):39–46.
24. Wald A. Diagnosis and management of fecal incontinence. Curr Gastroenterol Rep 2018;20(3):9.
25. Carrington EV, Heinrich H, Knowles CH, et al. Methods of anorectal manometry vary widely in clinical practice: Results from an international survey. Neurogastroenterol Motil 2017;29(8):e13016.
26. Lee TH, Bharucha AE. How to perform and interpret a high-resolution anorectal manometry test. J Neurogastroenterol Motil 2016;22(1):46–59.
27. Peery AF, Dellon ES, Lund J, et al. Burden of gastrointestinal disease in the United States: 2012 update. Gastroenterology 2012;143(5):1179–87.e3.
28. Rao SS, Parkman HP. Advanced training in neurogastroenterology and gastrointestinal motility. Gastroenterology 2015;148:881–5.
