Novel artificial intelligence algorithms for diabetic retinopathy and diabetic macular edema

Methodology

To assess the current landscape of AI models related to DR and DME, we conducted a comprehensive literature review through Google Scholar and PubMed, considering studies published up to August 5, 2023. Our search strategy incorporated a range of keywords, including "diabetic retinopathy", "diabetic macular edema", "fundus photograph", "optical coherence tomography", "artificial intelligence", "machine learning", and "deep learning". We limited our selection to articles published in English. When encountering multiple publications from the same study, we considered them as a single entry. We excluded articles that were published before 2019, that did not present original research utilizing ML or DL techniques, or that did not focus on DR or DME.

AI in screening and diagnosis of DR

With the growing diabetic population worldwide, the demand for more efficient DR screening methods is increasing, a demand underscored by the cost-effectiveness of screening and the widespread recommendation for regular examinations. Numerous studies on DR screening models using fundus photos have led to the transition of some from experimental development to clinical practice over the years, marking a significant advancement in ophthalmology [4, 5]. These AI systems have progressed from solely detecting DR to identifying other eye diseases such as age-related macular degeneration and glaucoma. Primarily aimed at detecting referable diabetic retinopathy (RDR) with high accuracy, AI-driven models provide scalable, efficient, and precise screening solutions. This technological evolution significantly surpasses traditional methods and human graders in efficiency, ushering in an era of automated, large-scale DR screening programs globally.

While these AI systems have shown promise in clinical trials and received regulatory approvals, their performance in real-world settings has yet to be fully convincing. Real-world evaluations of several state-of-the-art AI DR screening systems have revealed inconsistent performance, with sensitivities ranging from 50.98% to 85.90%, and specificities from 60.42% to 83.69% [6]. Similarly, a real-world test of Google's AI system in Thailand revealed a lower specificity compared to the human graders [7]. A high false negative rate risks overlooking diseased individuals, while a high false positive rate may result in unnecessary referrals, raising significant concern regarding the cost-effectiveness of these AI implementations. This performance gap may stem from various factors such as limited diversity and representativeness in the training datasets regarding patient demographics, image acquisition methods, variations in image quality outside controlled environments, and potential overfitting to specific data characteristics. It emphasizes the importance of continuously refining and adapting AI algorithms to enhance their applicability and effectiveness across different populations and clinical settings.
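
The real-world performance figures above reduce to simple confusion-matrix arithmetic. As an illustration only (the counts below are hypothetical, not drawn from any cited study), sensitivity and specificity can be computed as:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute sensitivity and specificity from confusion-matrix counts.

    tp: diseased eyes correctly referred    fn: diseased eyes missed
    tn: healthy eyes correctly passed       fp: healthy eyes wrongly referred
    """
    sensitivity = tp / (tp + fn)  # fraction of diseased eyes detected
    specificity = tn / (tn + fp)  # fraction of healthy eyes not referred
    return sensitivity, specificity

# Hypothetical screening run: 100 diseased eyes, 900 healthy eyes.
sens, spec = sensitivity_specificity(tp=80, fn=20, tn=750, fp=150)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
# A large fp count drags specificity down and inflates unnecessary referrals;
# a large fn count drags sensitivity down and overlooks diseased individuals.
```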

As the field of ophthalmology continues to evolve, the transition from traditional imaging techniques to more advanced imaging modalities has been evident. Ultra-widefield (UWF) imaging captures a greater extent of the peripheral retina, which may provide more prognostic information and allow for more accurate DR diagnosis and grading. In 2021, Tang et al. trained a DL system using 2,861 UWF fundus photos, aiming to assess image gradability and detect both vision-threatening DR and RDR. This model demonstrated a high level of diagnostic performance, consistently surpassing an area under the receiver operating characteristic curve (AUC) of 0.90 across external validation datasets from three countries [8]. Similarly, IDx-DR (IDx Technologies, USA), which was primarily designed for traditional fundus photos, was found to surpass human graders in identifying asymptomatic RDR on UWF images [9]. Although their relative expense and limited availability make them less convenient tools, UWF images hold potential for further AI research in comprehensive DR assessment.

Another innovative approach involves integrating multiple imaging modalities. While single-modality analysis offers valuable insights, it often provides a limited perspective of complex conditions. Multi-modal analysis, on the other hand, integrates data from various sources, offering a more comprehensive and potentially more nuanced understanding. Hua et al. proposed an architecture to coordinate hybrid information from fundus photos and widefield optical coherence tomography angiography (OCTA). The model achieved robust performance on both domestic and public datasets, with a quadratic weighted kappa rate of 0.902 on the small-sized internal dataset in DR grading, and an AUC of 0.994 in the Messidor dataset in detecting RDR [10]. Nagasawa et al. evaluated the precision of DR staging using a deep convolutional neural network (CNN) with 491 pairs of UWF fundus photos and OCTA images as input. While their results indicated effective DR detection using the combined inputs, they observed that this multi-modal approach did not significantly outperform the single-modality model in terms of diagnostic accuracy [11]. This observation suggests that multi-modal AI techniques may not always offer significantly more information across all applications. Further research is needed to assess their real-world utility and to establish optimal implementation strategies.
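
The quadratic weighted kappa used to report DR grading agreement penalizes misgrades by the squared distance between ordinal grades, so grading a severe eye as mild costs far more than an off-by-one error. A minimal pure-Python sketch (the five-grade example labels are hypothetical):

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Quadratic weighted kappa for ordinal labels 0..n_classes-1."""
    n = len(y_true)
    # Observed confusion matrix
    O = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        O[t][p] += 1
    # Expected matrix under chance agreement (outer product of marginals)
    hist_t = [sum(row) for row in O]
    hist_p = [sum(O[i][j] for i in range(n_classes)) for j in range(n_classes)]
    E = [[hist_t[i] * hist_p[j] / n for j in range(n_classes)]
         for i in range(n_classes)]
    # Quadratic disagreement weights: zero on the diagonal, growing with distance
    w = [[(i - j) ** 2 / (n_classes - 1) ** 2 for j in range(n_classes)]
         for i in range(n_classes)]
    num = sum(w[i][j] * O[i][j] for i in range(n_classes) for j in range(n_classes))
    den = sum(w[i][j] * E[i][j] for i in range(n_classes) for j in range(n_classes))
    return 1.0 - num / den

# Perfect agreement on the 5 DR grades yields kappa = 1.0
print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 5))  # -> 1.0
```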

AI models for DR lesion segmentation

While DR screening AI systems provide an overall diagnostic assessment of DR, there is also a growing need for a more in-depth, granular understanding of the specific anomalies. Different AI approaches have been deployed for DR lesion segmentation over the past 20 years, from image processing to traditional ML and DL [12]. Most recent segmentation models are based on CNNs, which, compared with traditional ML, generalize better, extract features automatically, are more robust to variations in image quality, and are more efficient and capable of multitasking. In recent years, generative adversarial networks (GANs) have also been used in segmentation tasks for retinal lesions [13, 14]. GANs possess the unique capability to generate synthetic images that mimic normal fundus photos. By comparing these generated normal images with diseased ones, GAN-based models can effectively identify and differentiate lesions [13].
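
The GAN-based lesion-identification idea, comparing a synthesized normal-appearing image against the diseased input [13], can be sketched as a per-pixel difference map. This is purely illustrative: the 1-D "images" and the 0.2 threshold below are invented, and a real system would apply a trained generator to 2-D fundus images:

```python
def anomaly_map(diseased, reconstructed_normal, threshold=0.2):
    """Flag pixels where the diseased image deviates from the GAN-reconstructed
    'normal' image by more than a threshold; these are candidate lesion pixels."""
    return [1 if abs(d - r) > threshold else 0
            for d, r in zip(diseased, reconstructed_normal)]

# Toy 1-D "images" with intensities in [0, 1]; a hypothetical generator has
# reconstructed what a healthy version of the same retina would look like.
diseased      = [0.10, 0.15, 0.90, 0.85, 0.12]
reconstructed = [0.10, 0.14, 0.20, 0.25, 0.11]
print(anomaly_map(diseased, reconstructed))  # -> [0, 0, 1, 1, 0]
```

The two bright pixels that the generator cannot reproduce as "normal" are the ones flagged, which is the essence of the comparison-based lesion detection described above.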

Visualization and annotation of these lesions can be useful in clinical adoption. DR lesion segmentation systems facilitate meticulous analysis by elucidating the precise location, morphology, and extent of each lesion, and can help to quantify lesion counts in an efficient and automated manner. For example, it has been shown that quantitative analysis of DR lesions can help to better predict the risk of progression to proliferative DR (PDR) [15]. Such precision paves the way for better disease staging, personalized treatment, and objective progression tracking.

Integration of lesion detection into DR screening systems has been observed to enhance diagnostic performance [16]. Dai et al. introduced a ResNet-based lesion-aware sub-network, outperforming architectures like VGG and Inception, and employed transfer learning from a pre-trained DR base network. As part of the DeepDR screening system, this architecture achieved AUCs of 0.901–0.967 for lesion detection, covering microaneurysms, hard exudates, cotton-wool spots, and hemorrhages, while the overall DR grading achieved an average AUC of 0.955 [17]. Anderson et al. achieved improved performance by incorporating a segmentation model into a DR classification framework: they manually segmented 34,075 DR lesions to develop a segmentation model and then constructed a 5-step classification model [18]. Together with DR screening systems, these AI models can collectively deliver a more detailed evaluation, with screening systems acting as a broad initial assessment and lesion segmentation models providing more in-depth analysis of disease status.

AI models for prediction of DR progression

Beyond disease detection and classification, newer AI algorithms have been developed to predict the development and progression of DR. Individual risk factors such as patient demographics and clinical variables are well known to influence the risk of DR progression [19,20,21]. Structural and functional changes in ocular investigative modalities have also shown association with DR progression, such as wider retinal arteriolar caliber and localized delays on multifocal electroretinogram that corresponded to specific retinal locations where DR structural changes developed [22,23,24].

Al-Sari et al. used ML models to predict the development of DR within five years, achieving an AUC of 0.75 using clinical variables (e.g., albuminuria, glomerular filtration rate, DR status) and an AUC of 0.79 using blood-derived molecular data (e.g., fatty acids, amino acids, sphingolipids) [25]. Several DL algorithms based on fundus photos have also been developed to predict DR progression. Arcadu et al. predicted a 2-step worsening on the Early Treatment of Diabetic Retinopathy Study (ETDRS) scale within one year with an AUC of 0.79 [26]. Bora et al. achieved similar performance in predicting the risk of incident DR within two years; furthermore, incorporating clinical variables such as diabetes duration and control alongside fundus photos further enhanced predictive performance [27]. More recently, Rom et al. developed a DL model that outperformed prior fundus photo-based algorithms in predicting the risk of incident DR up to 3–5 years out (AUC = 0.82) and RDR within two years (AUC = 0.81) [28]. Moreover, heatmap analyses using explainable AI techniques revealed that high-attention areas of the DL algorithms corresponded to regions on baseline fundus photos that eventually developed DR structural changes during follow-up visits. These findings indicate the potential of DL algorithms to uncover subtle associations in feature-rich fundus photos for predicting DR progression.
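
The AUC values reported throughout this section have a direct probabilistic reading via the Mann-Whitney U statistic: the probability that the model scores a randomly chosen progressor above a randomly chosen non-progressor. A minimal sketch with hypothetical risk scores:

```python
def auc(labels, scores):
    """AUC of the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for cases that progressed, 0 for those that did not.
    scores: the model's predicted risk for each case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count positive-vs-negative pairs the model ranks correctly (ties = 0.5)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical progression-risk scores: 1 = developed DR, 0 = did not
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(f"AUC = {auc(labels, scores):.2f}")
```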

Such AI models can potentially have a significant clinical impact in personalizing DR screening intervals. For example, patients with DM identified as low-risk for progression may be screened less frequently beyond the existing 1–2 yearly intervals, thereby freeing up much-needed resource capacity for patients at higher risk of disease progression, and who may need more intensive, shorter surveillance intervals. This is particularly important in the face of increasing prevalence of diabetes in increasingly aged populations. For instance, the RetinaRisk algorithm (RetinaRisk, Iceland) generates a recommended screening interval based on the individual predicted risk of developing sight-threatening DR. Its implementation in a Norwegian eye clinic over five years demonstrated safe and effective recommendation of variable screening intervals up to 23 months, compared to fixed 14-monthly screening intervals [29]. This type of individualized DR screening approach is very promising, and if properly validated, can be a major tool for resource optimization in the face of increasing DR disease burden.

AI models for diagnosis of DME

DME is the most common cause of visual impairment related to DR [30]. Timely diagnosis and accurate classification of DME are crucial to ensure appropriate treatment and prevent further deterioration of vision. The current gold-standard diagnostic technique for DME is optical coherence tomography (OCT). Recent works have leveraged DL methods for DME diagnosis from OCT scans [31, 32]. Tang et al. developed a multitask DL system using 73,746 OCT images from three commercially available OCT devices. This system demonstrated AUC values of 0.937–0.965 for DME detection, and 0.951–0.975 for differentiation of center-involved DME (CI-DME), across these devices [33].

Although OCT is the definitive imaging standard for diagnosing DME, its widespread use as a screening tool is hampered by factors such as high cost and limited accessibility. In contrast, fundus photography is a more feasible option in primary care settings, owing to its widespread availability and cost-effectiveness, providing a more accessible avenue for DME screening. Varadarajan et al. trained a DL model using 6039 fundus images from 4035 patients to predict CI-DME. This model delivered an AUC of 0.89, achieving 85% sensitivity at 80% specificity, surpassing even retinal specialist performance [34]. Additionally, DL models taking fundus photos as input have quantified OCT-derived metrics with high accuracy at specific macular thickness cut-offs [35], indicating the potential for quantifying DME severity as well, which may be useful for triage in teleophthalmology contexts.

Automated identification and quantification of DME biomarkers

Previous clinical research has identified myriad useful DME biomarkers, including subretinal fluid (SRF), intraretinal fluid (IRF), pigment epithelial detachment (PED), hyperreflective dot (HRD), disorganization of inner retinal layers (DRIL), and disruptions in the external limiting membrane (ELM), ellipsoid zone (EZ), and cone outer segment tip (COST) [36,37,38,39]. Though the clinical utility of these biomarkers has been demonstrated in many studies, the detection and quantification of these biomarkers can be tedious or impractical in daily practice. Therefore, there is great interest in leveraging AI technologies for automated analysis of these biomarkers to inform disease prognosis and treatment decisions.

As an important feature for evaluating DME and determining treatment strategy, retinal fluid has become the primary area of research interest. In contrast to earlier approaches, recent studies have adopted DL architectures for the segmentation and quantification of these fluid areas [40]. Furthermore, automated quantification of retinal fluid can also be used to predict visual acuity (VA) [41, 42]. These models offer significant improvements by reducing the need for subjective and labor-intensive manual annotations and providing more accurate segmentations. Unlike traditional methods that depend on low-level features sensitive to image quality, DL models learn to identify features at multiple levels automatically, marking a shift towards more efficient and unbiased research in DME.

Other biomarkers have also attracted research interest [43,44,45]. Singh et al. implemented a DL method to detect DRIL, with an accuracy of 88.3% [45]. Orlando et al. developed a DL model to quantify photoreceptor disruption on B-scans [43]. Multiple algorithms have been developed using various techniques for automated detection of HRD in B-scans. However, their performance exhibits significant variability, with Dice similarity coefficient values ranging from 0.46 to 0.91 [46,47,48], which could be due to the relatively small size of the lesions. Further studies are required to enhance the precision of subtle lesion detection.
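
The Dice similarity coefficient cited above also illustrates why small lesions such as HRD produce volatile scores: with only a few lesion pixels, each missed pixel shifts the coefficient substantially. A minimal sketch with a hypothetical 3-pixel lesion:

```python
def dice(pred_mask, true_mask):
    """Dice similarity coefficient between two binary masks (flattened lists):
    Dice = 2|A intersect B| / (|A| + |B|)."""
    inter = sum(p and t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    return 2.0 * inter / total if total else 1.0

# A tiny 3-pixel lesion: missing a single pixel already drops Dice to 0.8
true = [0, 1, 1, 1, 0, 0]
pred = [0, 1, 1, 0, 0, 0]
print(dice(pred, true))  # -> 0.8
```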

AI models for prediction of DME treatment response

The therapeutic approach to DME has evolved significantly over the past two decades. Intravitreal anti-vascular endothelial growth factor (anti-VEGF) agents have been established as the first-line treatment option for CI-DME with vision loss [49, 50]. However, significant heterogeneity remains in individual patients' responses to anti-VEGF agents, driven by factors including morphological subtype, baseline VA, and concomitant treatments [51, 52]. This variability in treatment outcomes can pose a risk to adherence and, ultimately, patient satisfaction.

Recent research has attempted to use AI for longitudinal predictions, focusing on treatment needs and analyzing both structural and functional outcomes. Cao et al. developed a model to predict the treatment response to three consecutive anti-VEGF injections in DME. The model utilized DL to autonomously extract OCT biomarkers, followed by the application of multiple classifiers for response prediction. The random forest model showcased superior results with a sensitivity of 0.900, a specificity of 0.851, and an AUC of 0.923 in predicting good responders and poor responders, even surpassing the predictive ability of ophthalmologists [53]. Alryalat et al. built a model composed of a modified U-net and an EfficientNet-B3 for a similar task, which achieved an accuracy of 75% for classification of treatment responders [54]. Moosavi et al. developed an automated software for analyzing vascular features in UWF fluorescein angiography (UWFFA) images to predict treatment outcomes in DME. They reported AUCs of 0.82 and 0.85 for morphological and tortuosity-based features, respectively, in discerning between treatment "rebounders" and "non-rebounders" [55]. Xie et al. used multiple datasets with OCT data combined with demographic and clinical data to predict 6-month post-treatment response and generate a recommendation to continue injection treatment. The algorithm achieved near-perfect structural prediction, and a mean absolute error (MAE) and mean squared error (MSE) of 0.3 to 0.4 logarithm of the minimum angle of resolution (logMAR) for visual outcome prediction. The accuracy of injection recommendations reached 70% [56]. Recently, Xu et al. used GANs to create post-treatment OCT images based on baseline images. The MAE of the central macular thickness comparing the synthetic and actual images was 24.51 ± 18.56 μm [57]. While the observed difference was not negligible, this study underscores the potential of GANs for structural predictions in ophthalmology. 
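
The MAE and MSE used by Xie et al. to report visual-outcome error are straightforward to compute; the logMAR values below are hypothetical and for illustration only:

```python
def mae_mse(pred, actual):
    """Mean absolute error and mean squared error between predicted and
    actual values (e.g., post-treatment visual acuity in logMAR)."""
    errs = [p - a for p, a in zip(pred, actual)]
    mae = sum(abs(e) for e in errs) / len(errs)
    mse = sum(e * e for e in errs) / len(errs)
    return mae, mse

# Hypothetical 6-month logMAR predictions vs. measured outcomes
pred   = [0.30, 0.50, 0.10, 0.70]
actual = [0.20, 0.80, 0.10, 0.40]
mae, mse = mae_mse(pred, actual)
print(f"MAE={mae:.3f} logMAR, MSE={mse:.3f}")
```

Note that MSE squares each error, so a single large miss dominates it, whereas MAE weights all errors linearly; reporting both gives a fuller picture of outlier behavior.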
Such AI models, when integrated into clinical practice, could enable ophthalmologists to design more personalized treatment regimens tailored to each patient's unique retinal characteristics. This potentially improves not only the safety of therapy but also enhances the quality of life and creates potential cost savings for patients.

Prediction of visual function in DR and DME

Recent studies have explored the use of fundus images to assess visual function [58,59,60]. Kim et al. developed an ML-based VA measurement model using fundus photos from 79,798 patients with various retinal diseases, including DR. Images were divided into four VA categories, and the model demonstrated an average accuracy of 82.4% in estimating these four VA levels [58]. Recently, Paul et al. employed various AI architectures for predicting best-corrected visual acuity (BCVA) from fundus photos in CI-DME patients. The ResNet50 architecture estimated BCVA with an MAE of 9.66 letters, within two lines on the ETDRS chart. Additionally, the study observed that incorporating additional clinical visit data could potentially improve predictive accuracy, especially in the subset of patients with lower BCVA [59]. This approach could offer VA estimation for patients unable to participate in chart-based assessments. Moreover, estimating visual function from fundus photos in DR and DME appears promising, given the critical role of BCVA in guiding treatment decisions. However, the field is currently limited by sparse research, raising questions about the generalizability of findings. Consequently, further investigation is essential to validate and expand our understanding in this area.
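
To put the 9.66-letter MAE in context, ETDRS letter scores relate to logMAR by the standard approximation letters = 85 − 50 × logMAR, with 5 letters per chart line (0.1 logMAR per line). A small sketch of these conversions:

```python
def letters_to_logmar(letters):
    """Approximate ETDRS letter score -> logMAR conversion,
    using the standard relation letters = 85 - 50 * logMAR."""
    return (85 - letters) / 50.0

def letters_to_lines(delta_letters):
    """Express a letter-score error in ETDRS chart lines (5 letters per line)."""
    return delta_letters / 5.0

print(letters_to_lines(9.66))   # an MAE of 9.66 letters is just under 2 lines
print(letters_to_logmar(85))    # 85 letters corresponds to logMAR 0.0 (~20/20)
```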

Telemedicine and remote monitoring

Telemedicine utilizes digital technology to provide healthcare services from a distance, allowing patients to access medical consultations and treatments without visiting a healthcare facility. This method greatly improves medical care accessibility, particularly in underserved and rural regions, by overcoming geographic barriers between patients and providers [61]. The integration of AI-based DR screening systems further amplifies telemedicine's potential, offering more effective and efficient diagnostic capabilities. Current applications of screening for DR and DME in the primary care setting are based largely on the analysis of large data throughput collected via conventional retinal fundus or UWF cameras [62]. These approaches are potentially limited in scalability by resource constraints such as high cost or a shortage of trained personnel, either to acquire images or to grade them, especially in remote areas.

The development of hand-held cameras and smartphone-based cameras opened up greater means of accessibility for fundus screening, by overcoming geographical barriers and providing patients in remote or underserved areas access to specialized care [63]. However, the potential for decreased image resolution and quality can be a concern [64].

Efforts have been undertaken to employ hand-held camera-based and smartphone-based fundus images in telemedicine. In a comparative analysis, Ruan et al. evaluated the proficiency of a DL algorithm in identifying RDR from 50° fundus images taken with hand-held cameras against those obtained from traditional desktop cameras. While the hand-held devices produced images of superior clarity that were effectively interpreted by human graders, the AI analysis revealed a need for further refinement [65]. Similarly, Rogers et al. reported reduced accuracy in DR detection using hand-held camera images with the Pegasus AI system (Visulytix Ltd., UK) [66]. In contrast, the SMART India Study Group described a DL model for detecting RDR using 2-field fundus photos acquired by nonmydriatic hand-held cameras from 16,247 eyes. The system achieved high performance in detecting RDR, with AUCs of 0.98 and 0.99 for one-field and two-field inputs, respectively [67]. Significantly, they found that variations in dilation states and image gradability across studies could influence the results [65, 67, 68]. In a recent study, the SELENA+ algorithm (EyRIS Pte Ltd, Singapore), which was developed using traditional fundus photos, was integrated into a hand-held fundus camera and paralleled the results of conventional retina specialist evaluations, reinforcing its precision in DR detection in a different use setting [69]. As for smartphone-based images, a DL algorithm previously trained on 92,364 traditional fundus images was run on 103 smartphone-captured images of varying quality at 1080p resolution. The algorithm achieved 89.0% sensitivity (95% CI: 81%–100%) and 83% specificity (95% CI: 77%–89%) for the detection of RDR, despite multiple glare, smudge, and blockage artifacts that at times required cropping of the retinal images prior to analysis [70]. Sosale et al. evaluated an offline DL-based DR screening software, Medios (Medios Technology, Singapore), in a clinical setting in India, with promising results, highlighting its potential for deployment in areas with limited internet access [71]. These results are encouraging, but more studies, especially in real-world settings, are required to further test this proof of concept. If robustly validated, such tools could become important for increasing access to DR screening in under-resourced healthcare settings, offering both convenience and cost-effectiveness.

In addition, home-based imaging devices serve as powerful tools for remote monitoring. These devices offer increased accessibility, especially for patients in remote areas, and provide the convenience of conducting regular scans from home, which is particularly beneficial for elderly or mobility-impaired patients. Home-based monitoring facilitates early detection and intervention, potentially preventing disease progression. Furthermore, it contributes to reduced healthcare costs by minimizing frequent clinic visits and integrates seamlessly with telemedicine, enhancing patient engagement and compliance. A home OCT system, Notal Vision Home OCT (NVHO, Notal Vision Inc, Manassas, VA, USA), was validated for daily self-imaging in patients with neovascular age-related macular degeneration [72]. Fifteen participants who were under anti-VEGF treatment performed daily self-imaging at home using NVHO for three months. Images were uploaded to the cloud and analyzed by its DL-based analyzer. Results indicated good agreement between the analyzer and human experts on fluid status in 83% of scans, and 96% agreement between the home-based OCT and in-clinic OCT scans. While similar results have been reported in clinical settings for DME [73], further studies are needed to validate the system's efficiency in a home-based setting.

Technical AI advancements and innovations for DR and DME

In the realm of ophthalmology, technical advancements and innovations in AI are paving the way for groundbreaking improvements. These technological breakthroughs are enhancing accuracy and efficiency in the detection and clinical evaluation of these diabetes-related eye conditions, and providing a more explicit understanding of how AI works for DR and DME management.

A notable development in this area is the application of generative AI. Generative modeling historically encompassed techniques like naïve Bayes and linear discriminant analysis, but has seen renewed interest, particularly through applications like ChatGPT and, in the medical imaging domain, through diffusion models and GANs, for tasks including classification, segmentation, and image synthesis [74].

For DR and DME, a common methodology has been to train a GAN on existing retinal fundus images. The trained GAN can then generate synthetic retinal fundus images from the learned distribution. These synthetic images can be treated as additional unique image data with various possible uses. First, they serve as data augmentation, particularly valuable for addressing imbalances in dataset distributions across DR severity classes. Figure 2 displays a variety of synthesized fundus photographs, each representing a different class of DR. Zhou et al. introduced a GAN with a multi-scale spatial and channel attention module to allow arbitrary grading and lesion label manipulation in the GAN latent space [75]. The synthetic images were then employed to successfully improve the performance of pixel-level segmentation models. Lim et al. formulated a MixGAN model that attempts to automatically adjust the synthetic data distribution to optimize classifier performance [
