Comparing code-free and bespoke deep learning approaches in ophthalmology

For clinicians without DL expertise, and without easy access to experts in this area, CFDL allows the prototyping of novel clinical AI systems. For AI experts, CFDL can make model training easier by accelerating the model development pipeline. From the studies in our review, CFDL shows promise across multiple ophthalmological tasks, including DR screening, multi-retinal-disease differentiation, surgical video classification, oculomics research and resource management.

Most of the studies we reviewed were hopeful about future integration of CFDL into different practice areas [14, 18, 20, 22, 24]. However, we note that positive conclusions about CFDL’s benefits were based largely on system-derived performance results [14, 18, 20, 22, 24]. Not all CFDL algorithms had undergone further comparison against bespoke DL to demonstrate their unique value and benefits. Furthermore, discussions of CFDL were largely one-dimensional, seldom addressing other demands of AI implementation, such as acceptance and applicability.

The need for publicly available datasets for external validation

External validation is an important practice in ensuring the broad applicability of AI systems [26] and a vital step in the development of viable AI-powered medical decision-support systems [27]. Internal validation alone cannot establish that a model will maintain its performance in contexts different from those in which the training data were obtained [28, 29]. Often, the testing contexts in internal validation are not sufficiently different from the training contexts (e.g. data attributes), so an internally validated model may fail to generalize to settings with distinct data contexts (i.e. data shifts) [26]. Many variables in the deployment setting, including imaging equipment, ethnic distribution and disease manifestations, may cause model performance to drop upon deployment [26, 30]. Thus, assertions about the applicability of CFDL models may be overstated.

To ensure model robustness, it is generally recommended that external validation datasets have limited overlap with the training set [26, 31, 32]. The availability of free, open-access big datasets will therefore be important for externally validating AI models in general and CFDL models in particular [32, 33]. Such open-access datasets can save researchers the cost, time and effort of manually combining and cleaning local data from various distinct sources [32, 33]. Moreover, datasets spanning diverse populations, settings and case-mix variations add rigour to the validation approach [32, 33].
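To make the data-shift argument concrete, the following is a minimal, hypothetical sketch of internal versus external validation for a fixed-threshold classifier. The scores, labels and threshold are invented for illustration only; the point is that the same frozen model can score well on an internal test set yet degrade on a dataset drawn from a shifted distribution (e.g. a different camera or population).

```python
# Hypothetical sketch: the same model (a fixed decision threshold over
# model output scores) evaluated on an internal vs an external test set.

def accuracy(scores, labels, threshold=0.5):
    """Fraction of cases classified correctly at the given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

# Internal test set: drawn from the same context as the training data.
internal_scores = [0.9, 0.8, 0.7, 0.2, 0.3, 0.1]
internal_labels = [1,   1,   1,   0,   0,   0]

# External set: a data shift (e.g. different imaging equipment) pushes
# scores toward the threshold, and some cases are misclassified.
external_scores = [0.6, 0.4, 0.55, 0.45, 0.3, 0.52]
external_labels = [1,   1,   1,    0,    0,   0]

print(accuracy(internal_scores, internal_labels))  # 1.0
print(accuracy(external_scores, external_labels))  # ~0.67
```

Internal validation alone would report the flattering first number; only the external set reveals the performance drop the text warns about.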

Systematic approach to model selection

When opting for an AI model for a given task, the chosen algorithm should clearly be the best candidate for that task. In other words, it should prove its value by demonstrating superior task-specific advantages over other AI counterparts. Conclusions regarding the beneficial use of CFDL can therefore only be drawn once it has been holistically compared with traditional DL for the task of interest. Ophthalmologists comparing the suitability of CFDL and traditional DL for a task should take into account both model performance and implementation considerations. Implementation considerations include the developer’s intentions, user acceptance and cost-effectiveness. However, since trade-offs tend to exist between these considerations [34], ophthalmologists must weigh their relative importance and identify the model that best balances these factors for the specific context. Future investigations of AI should be conducted multidimensionally to better display a model’s context-aligned benefits.

Developer intention

Uncovering the clinician’s ultimate goal is a crucial first step in assessing the suitability of CFDL for a specific task. In DR screening, developers’ objective was evidently a low-cost tool to substitute for ophthalmologists in community screenings [14, 17, 35]. Cost is an important consideration in this screening context, especially since the developers had identified limited public funding for screening projects [35]. For multi-retinal-disease classification, the authors aimed to use automated systems for clinical diagnostic decisions [18, 19] and emphasised the models’ precision [18, 19]. Precision, in this context, is a model’s reliability in producing clinically correct diagnoses, given plausible concerns about patient harm from inaccurate decisions [36].

Model interpretability is key to fostering the trustworthiness and reliability of an AI system, as it allows ophthalmologists to reason about the algorithm’s operational logic and ascertain its clinical justifiability [37, 38]. Hence, model interpretability is considered a significant model quality in the clinical diagnostic context. Knowledge of the developer’s intentions supports a better understanding of the model qualities needed for successful AI integration into clinical practice with minimal clinician rejection. Such awareness can be used to screen out CFDL as a candidate for incompatible ophthalmological tasks. For example, a poorly interpretable CFDL model can be ruled out as a beneficial candidate for the multi-retinal-disease diagnostic task.

Patient acceptance

The next step in evaluating the suitability of CFDL involves considering patient acceptance. Knowledge of patients’ concerns and attitudes towards AI’s participation in their management pathway helps ensure smooth implementation of the model and avoids deploying CFDL in scenarios where patients oppose certain of its qualities. Because patient-attitude information was absent from the CFDL studies [14, 18, 20, 22, 24], additional questionnaire studies on patient attitudes were surveyed. Uncertain model reliability associated with poor model interpretability (i.e. the ‘black box’ problem) was found to be one of patients’ greatest concerns about clinical AI [39]. Interestingly, reluctance towards AI use was expressed only when inadequately interpretable AI models were to take a proactive part in high-stakes decision-making [39]. By contrast, a welcoming attitude was discerned when AI was to be used in low-risk settings [40]. Patients deemed unsatisfactory model interpretability less worrisome so long as the ambiguous model actions play no part in direct patient management and cannot inflict harm on patient well-being [39].

In addition to patient acceptance, regulatory clearance of any AI model, whether CFDL or bespoke, remains a significant challenge. Realistically, CFDL models are best suited for non-clinical, potential use cases that do not require approval as a medical device. For example, CFDL can be used for post hoc analysis of clinical trial data, prototyping of AI system development, and development of AI systems for clinical trial feasibility planning and pre-screening.

Ensuring data privacy is also important, particularly for CFDL models, since clinicians typically upload datasets for training and testing to company-hosted websites to build models. Clinicians might not always be fully aware of how their data are stored, processed or potentially shared within these platforms. Therefore, relying on legal safeguards may help CFDL users protect their data from a legal standpoint (e.g. confidentiality agreements with AI firms on privacy issues).

Cost factor

Cost is an integral element in assessing CFDL’s compatibility with the nature of a task. Operational cost and cost-effectiveness are two important concepts in this domain. Operational cost is a useful surface-level indicator for tasks with clear ‘low-cost’ objectives, such as DR screening, to locate potentially cost-beneficial tools. With previous evidence showing that CFDL can process up to 35,000 images for under US$100 [13], CFDL is positioned to offer low-cost options that support the full ML workflow [41]. Yet, in reality, model cost extends beyond operational cost [42]. Cost-effectiveness is therefore a more accurate representation of an AI tool’s cost-beneficial attributes. By calculating cost-effectiveness with the proposed formula of willingness to pay (WTP) × change in quality-adjusted life years (QALYs) − change in cost [43], an AI tool can be better certified to provide long-term cost savings, and the authenticity of CFDL’s cost-friendly qualities can be validated.
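The cited formula, WTP × ΔQALY − Δcost, is the net monetary benefit of an intervention, and can be sketched as a small calculation. All figures below are hypothetical, chosen purely to illustrate how the formula is applied; they are not drawn from any of the reviewed studies.

```python
# Net monetary benefit (NMB) sketch, following the formula cited in
# the text: NMB = WTP x (change in QALYs) - (change in cost).
# A positive NMB suggests the tool is cost-effective at that threshold.

def net_monetary_benefit(wtp_per_qaly: float,
                         delta_qalys: float,
                         delta_cost: float) -> float:
    """NMB = willingness-to-pay per QALY x QALYs gained - extra cost."""
    return wtp_per_qaly * delta_qalys - delta_cost

# Hypothetical example: an AI screening tool gaining 0.02 QALYs per
# patient at an extra cost of US$150, evaluated at a willingness-to-pay
# threshold of US$50,000 per QALY.
nmb = net_monetary_benefit(wtp_per_qaly=50_000,
                           delta_qalys=0.02,
                           delta_cost=150)
print(nmb)  # 850.0 -> positive, so cost-effective at this threshold
```

Note that the conclusion flips with the threshold: at a WTP of, say, US$5,000 per QALY the same tool would yield a negative NMB, which is why cost-effectiveness, not operational cost alone, certifies long-term value.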

Redefining opportunities with CFDL

In light of the limited information available, a multidimensional analysis of how CFDL fits the tasks of ophthalmological training, oculomics research and resource management is not possible. Future model studies on these tasks, as well as the screening and diagnostic tasks discussed above, should incorporate investigations of task intention, patient opinion and cost expectation. It can only be concluded that CFDL opens new doors of opportunity for ophthalmological training, oculomics research and resource management. CFDL may enable the creation of surgical video libraries for trainees’ self-learning, given its ability to process vast amounts of data in a computationally less expensive manner than traditional DL [20]. In oculomics research, CFDL may offer benefits in the early stages, when there is minimal guarantee of results: it could provide ophthalmologists with a cost-friendly platform to boldly test hypotheses without bearing the heavy financial burden of model development. As demonstrated by Yeh et al. [21] and Munk et al. [23], interpretability tools such as saliency maps and the What-If Tool can help monitor the clinical relevance and plausibility of CFDL-identified novel ocular biomarkers. As evidence accumulates from CFDL analyses of potential biomarkers, it becomes more attractive to perform bespoke DL studies to verify the legitimacy of the CFDL-discovered biomarkers, since investment in traditional DL models for mass data analysis tends to be financially dissuasive when there is little proof of success [44, 45]. In terms of resource management, CFDL made accurate patient-admission forecasts at an ophthalmology department and was believed to favour hospital resource management [24]. Given that future admission predictions are liable to high levels of fluctuation in an ever-changing clinical environment [24], such as the onset of a pandemic, readily accessible CFDL can be used to guide resource planning in advance (e.g. staffing and operating theatres) with its rough estimates of patient volume [24].
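The idea behind the saliency maps mentioned above can be illustrated with a toy sketch: rank input features by how much the model output changes when each is perturbed. The "model" here is a hypothetical linear risk scorer, not a real DL model or the Yeh et al. / Munk et al. pipelines; a real saliency map applies the same gradient idea per pixel of a retinal image.

```python
# Hypothetical sketch of gradient-based saliency via finite differences:
# features whose perturbation changes the output most are the ones the
# model "attends to", and can be checked for clinical plausibility.

def toy_model(features):
    # Hypothetical risk score over three biomarker inputs.
    w = [0.8, 0.1, -0.4]
    return sum(wi * xi for wi, xi in zip(w, features))

def saliency(model, x, eps=1e-4):
    """Finite-difference estimate of |d model / d x_i| per feature."""
    base = model(x)
    grads = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += eps
        grads.append(abs((model(xp) - base) / eps))
    return grads

sal = saliency(toy_model, [1.0, 2.0, 3.0])
# For a linear model the saliencies recover |w| = [0.8, 0.1, 0.4]:
# feature 0 dominates, so a clinician would scrutinise that biomarker.
```

If the highest-saliency feature is clinically implausible (e.g. an imaging artefact rather than a biomarker), the CFDL-identified association should be treated with suspicion, which is exactly the monitoring role the text assigns to such tools.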

To summarise, CFDL should be evaluated multidimensionally, on a case-by-case basis, before conclusions are drawn regarding its impact. We did not emphasise performance considerations in our evaluation because comparisons between CFDL and bespoke DL models are prone to bias, especially when different datasets are used to build models for the same task. Furthermore, it is more practical to compare a model’s diagnostic performance against current clinically established gold standards of diagnosis in order to provide evidence supporting the use of AI in clinical practice, especially since the majority of CFDL and bespoke DL models have achieved high accuracy (80–90%). An exact value-to-value comparison of model performance measures therefore has limited implications for model selection, given models’ sensitivity to dataset and training dynamics [46, 47].
