Artificial intelligence in medicine and the negative outcome penalty paradox

Artificial intelligence (AI), a ‘collection’ of ‘related technologies’ through which computers simulate human cognitive processes, has recently generated considerable interest among both clinicians and the broader public for its hypothetical applications to medicine and its ‘potential to transform many aspects of patient care’.1 In particular, machine learning, described in basic terms as computers ‘learning from data without being explicitly programmed’, is believed to hold considerable promise.2 Among the touted benefits of this impending ‘AI revolution’ is its impact on ‘medical diagnostics’, where it is predicted to improve ‘accuracy, speed and efficiency’ in the analysis of images, including ‘X-rays, MRIs, ultrasounds, CT scans and DXAs’.3 In fact, these technologies are currently used in multiple areas of clinical practice. Their adoption has been quickest in the field of radiology, where a 2020 survey by the American College of Radiology reported that one-third of all radiologists relied on AI technologies in their clinical practice.4 While far from infallible, AI has already demonstrated ‘impressive accuracy and sensitivity in the identification of imaging abnormalities’.5 For instance, AI has been shown to be more than 98% accurate in diagnosing COVID-19 from chest radiographs.6 In addition, AI can detect various arrhythmias on EKGs with a confidence of 98%–99%.7 Despite this potential, concerns have been raised about both the clinical efficacy and the ethical implications of AI in medicine, factors that may make physicians and healthcare institutions more apprehensive about adopting AI-based practices.8 9 Focus groups on the potential for AI in medicine have found that laypeople believe that physicians ‘must be at the center of medical decision-making to preserve patient safety’.10 A few scholars, particularly Mezrich and Segal, and the team of Gerke et al, have addressed legal issues related to the use of AI in medicine.11–13 However, no scholarship appears to have considered the interface between these legal issues and these public attitudes, especially as they relate to jury determinations of liability. Yet that interface raises a set of distinct concerns that will likely have to be resolved by policymakers before many physicians and institutions feel truly comfortable embracing diagnostic AI.

Liability for AI

In the absence of statutes, common law countries, such as the USA, Canada and Great Britain, rely on judicial application and interpretation to determine liability standards. No common law country has yet adopted a statute that specifically establishes guidelines for determining the liability of physicians in cases involving the use of AI tools, so these cases are governed by the same rules that apply to all other malpractice claims. In the USA, for instance, ‘[c]linicians must treat patients with due expertise and care’ and ‘they need to provide a standard of care that is expected of relevant members of the profession’.13 As a result, as Gerke and colleagues have noted, since AI is not yet the ‘standard of care’, a physician who defers to an incorrect diagnostic assessment by an AI tool will likely be held liable for a negative outcome.13 Alternatively, they argue that if AI were adopted as the ‘standard of care’, then ‘the choice not to use such technology would subject the clinician to liability’.13 By the same reasoning, if AI were adopted as the ‘standard of care’, a physician who overruled a correct AI result would also be open to liability.

Solving this dilemma appears deceptively simple. Once a firm rule is established that either the human provider or the AI tool has ultimate authority as the standard of care, physicians will be on notice to follow that approach, and presumably jurors will hold parties liable only for deviations from it. The former approach is analogous to treating the AI tool as one might a human consultant from another medical specialty, a clinician whose opinion the patient’s physician is free to accept or reject at will. However, the distinctive nature of AI, as discussed below, suggests that juries are unlikely to treat AI tools as they might the opinions of human consultants. In addition, this possible solution does not resolve the challenge of who will be liable for AI errors in situations in which AI is the standard of care, an issue raised by Mezrich, who concludes that in such cases, ‘the facility hosting the AI likely would bear liability’, and the ‘malpractice principles’ would give way ‘to a form of enterprise liability’.12 Nevertheless, on the surface, a rule that places ultimate authority either on the human provider or on the AI tool appears to provide the level of certainty that can guide physicians and lead to the increased use of AI technologies. While the choice between human provider and AI tool clearly has ethical and clinical implications, from the point of view of legal certainty, choosing a clear rule in this area seems far more important than the specific nature of that rule. By analogy, minor safety benefits may arise from driving on either the left side of the road or the right, but ensuring that all drivers follow the same rule is clearly far more important for safety.14 Unfortunately, as discussed below, what is known about juror behaviour generally and about public attitudes toward AI suggests that this solution may not prove sufficiently effective.

Jury attitudes toward liability

Common law systems afford juries considerable latitude in determining the reasonableness of conduct in medical liability cases. Some jurors, of course, will engage in nullification to achieve what they believe to be a just outcome, independent of the law or their assessment of the facts before them—a concern in all liability cases that is not particular to AI.15 But even juries acting in good faith are subject to a range of potential biases. Despite widespread perceptions that juries are generally biased toward plaintiffs in medical malpractice litigation, substantial data indicate that this is not in fact the case.16 For example, an ‘expanding body of evidence suggests that jurors begin their deliberations favoring physician-defendants and doubting the motives of plaintiffs’ in such cases.17 However, two forms of bias do likely influence jury verdicts: ‘hindsight bias’, in which ‘bad outcomes seem more predictable in hindsight than they were ex ante’, and ‘outcome bias’, which ‘leads people to assume that individuals who cause accidents have been careless’.17 Studies have shown that the ‘influence that outcome bias exerts on both the public’s and physicians’ attitudes about medical errors is quite noteworthy’, while jurors have been shown to exhibit a ‘dramatic difference’ in punitive damage assessments as a result of hindsight bias.18 19 At best, efforts to mitigate or eliminate the influence of these phenomena on juries have demonstrated mixed results, although bifurcating the determination of responsibility from the awarding of damages has shown some theoretical efficacy.17 20

Negative outcome penalty paradox

The distinctive nature of the relationship between human diagnosticians and AI creates a particularly high risk for both hindsight and outcome biases, as the following scenarios demonstrate. For the sake of analysis, consider a radiologist reading a chest X-ray to screen for lung cancer. In this scenario, which is hypothetical but not implausible, the AI tool has been found to be 99% accurate, while a missed diagnosis poses a high risk of increased mortality. Cases in which the radiologist and the AI tool agree and both are correct will not prove problematic. However, imagine a situation in which the radiologist doubts the accuracy of the AI reading, suspecting that the read falls into the 1% of cases in which the tool is mistaken. If the AI tool has diagnosed the patient with lung cancer correctly, but the physician overrules it incorrectly, then a jury, prone to hindsight bias, is likely to find liability, no matter how much care the radiologist has rendered in his analysis. Similarly, if the AI tool correctly does not diagnose the patient with lung cancer, but the physician overrules the tool incorrectly, resulting in additional, unnecessary interventions, such as a biopsy leading to a serious infection, a jury prone to hindsight bias is also likely to find liability, again regardless of the care the radiologist has taken. This dilemma presents a Hobson’s choice for the clinician in that either approach to similar situations will be penalised if a negative outcome occurs: in other words, a ‘negative outcome penalty paradox’ (NOPP). Certainly, additional diagnostic tests can be ordered to confirm or reject the initial determination, as is often current practice, but the more such tests are used, the less meaningfully helpful AI becomes. At the extreme, if every AI result had to be confirmed by another measure, AI would prove useless.
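To make the structure of this dilemma concrete, the short sketch below (written in Python purely for exposition; the labels and assumed jury responses are illustrative assumptions, not findings from the cited studies) enumerates the four possible combinations of AI correctness and physician response, together with the verdict that hindsight bias would be expected to produce in each.

```python
# A minimal sketch (hypothetical figures and labels only) of the decision
# structure behind the NOPP, under the hindsight-bias premise described above:
# any bad outcome traceable to the radiologist's choice is assumed to be
# treated by a jury as negligent.

# Each row: (AI reading correct?, physician overrules the AI?,
#            clinical result, jury response assumed under hindsight bias)
scenarios = [
    (True,  False, "accurate diagnosis",               "no liability"),
    (True,  True,  "missed cancer or needless biopsy", "liability: overruled a correct AI"),
    (False, False, "missed cancer or needless biopsy", "liability: deferred to a faulty AI"),
    (False, True,  "accurate diagnosis",               "no liability"),
]

header = f"{'AI correct':<12}{'Overruled':<11}{'Clinical result':<35}Assumed jury response"
print(header)
print("-" * len(header))
for ai_correct, overruled, result, jury in scenarios:
    print(f"{str(ai_correct):<12}{str(overruled):<11}{result:<35}{jury}")

# The paradox: ex ante the radiologist cannot know which row applies, yet every
# row with a bad outcome ends in liability regardless of the care taken.
```

The point of the enumeration is simply that, in the disagreement cases, liability attaches to whichever choice turns out badly, which is what distinguishes the NOPP from ordinary negligence determinations.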

While the risk of an NOPP exists in all forms of liability, several specific aspects of AI in medicine elevate this threat significantly. First, unlike in many clinical scenarios, the decision of a physician whether or not to overrule a diagnostic AI tool is binary and clear-cut, allowing jurors to pinpoint one decision as the cause of a particular outcome. Second, since ‘deep learning algorithms are complex forms whose results can be opaque’, it may be impossible for physicians ‘to explain why a particular diagnosis or recommendation’ was generated by the AI tool.21 In malpractice cases, these ‘black box’ algorithms may prove a particular challenge for clinician defendants, who will face the daunting burden of justifying to juries the acceptance or rejection of AI recommendations whose underpinnings they cannot explain.21 Third, a fundamental disconnect exists between patients’ attitudes toward AI in theory and the data on its efficacy. For instance, a recent survey found that patients preferred a human being over an AI tool when diagnosing a rash by a margin of 81% to 15%, even though data show that AI tools are significantly more accurate than dermatologists in diagnosing many skin conditions.22 23 This ambivalence creates added room for hindsight bias, as jurors will likely rely on their affective perceptions in cases in which the physician fails to override inaccurate AI, but on the AI accuracy data in cases in which the clinician overrides a correct AI assessment. Fourth, Madeleine Elish has found that in a range of situations with negative outcomes, a so-called ‘moral crumple zone’ emerges that ‘protects the integrity of the technological system, at the expense of the nearest human operator’, and that observers tend to blame human beings over machines, even when objective evidence suggests the machine to be at fault.24 While anthropomorphising AI tools may increase the public’s willingness to attribute responsibility to them to some degree, doing so with regard to imaging devices may prove logistically and conceptually difficult.25 Until these four factors can be mitigated, NOPPs pose an obstacle to the adoption of AI tools in medicine.

A pathway forward

The impact of the NOPP on medical liability in AI-related malpractice cases defies easy solutions. Three categories of potential remedies might be considered: (1) reforms that alter the way AI can be used in the clinical setting; (2) reforms that remove AI cases from the hands of juries entirely; and (3) reforms that construct distinct liability rules specific to the use of AI in diagnostic medicine. The first set of approaches might include establishing AI systems that cannot be overruled by clinicians. Needless to say, such ‘hard stop’ approaches may face stiff opposition from the medical establishment, as physicians are unlikely to be willing to sacrifice their decisional authority. Moreover, any such approach risks losing one major, promising benefit of AI: while AI tools may be statistically more efficacious than clinicians alone in some cases, combining the assessments of physicians with those of AI tools, and allowing clinicians to overrule the AI, may lead to even more efficacious results. The second set of approaches includes such alternatives to jury trials as bench trials before specially trained judges, tribunals before panels of experts and no-fault insurance systems, such as those currently used in workers’ compensation claims. Even if such alternatives were legally possible, one might expect both advocates for the rights of plaintiffs and professional trial lawyers to thwart their adoption. In the USA, the politically powerful trial lawyers lobby ‘has consistently opposed the statutory preemption of common law rights’ that such a change would entail.26 The third set of approaches, potentially the most promising, would require all stakeholders in this controversy—clinicians, insurers, trial lawyers and patient advocacy organisations—to agree on such rules. For example, a clinician seeking to overrule an AI tool might automatically be required to obtain a second clinician’s independent assessment, conducted blind to the first clinician’s reading. A novel rule might establish that overriding an AI tool without such a second assessment is presumed negligent by default. In contrast, if both clinicians independently agree to overrule the AI tool, doing so might automatically shield them from liability. That would still leave unresolved, for liability purposes, those cases in which the two clinicians disagree about whether to overrule the AI tool, but at least the number of cases subject to the NOPP would be smaller. Alternatively, an entirely different set of specific rules might be developed.
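To illustrate how such a second-reader rule might be operationalised, the sketch below encodes the proposal just described as simple decision logic (the function and category names are hypothetical, and the sketch is illustrative only, not a definitive formulation of the rule): overruling the AI without a blinded second assessment carries a presumption of negligence, a concurring independent second read shields both clinicians, and disagreement between the readers is left to ordinary malpractice rules.

```python
# A hedged sketch of the proposed second-reader rule; names and categories
# are hypothetical and illustrative only.
from enum import Enum


class Presumption(Enum):
    SHIELDED = "shielded from liability"
    PRESUMED_NEGLIGENT = "presumed negligent"
    UNRESOLVED = "unresolved: ordinary malpractice rules apply"


def overrule_presumption(second_read_obtained: bool,
                         second_read_concurs: bool) -> Presumption:
    """Liability presumption when a clinician seeks to overrule a diagnostic AI tool."""
    if not second_read_obtained:
        # Overruling without a blinded, independent second assessment.
        return Presumption.PRESUMED_NEGLIGENT
    if second_read_concurs:
        # Two independent clinicians agree to overrule the AI.
        return Presumption.SHIELDED
    # The two clinicians disagree; the NOPP remains unresolved for these cases.
    return Presumption.UNRESOLVED


# Example: a lone overrule without a second read is presumed negligent.
print(overrule_presumption(second_read_obtained=False, second_read_concurs=False).value)
```

The value of framing the rule this way is that it makes explicit which disagreement cases the proposal resolves and which it deliberately leaves to existing doctrine.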

The goal of this paper is not to provide a solution to the NOPP, but rather to ensure that policymakers acknowledge it and attempt to address its implications for clinical practice. Each of the approaches discussed above has potential advantages and drawbacks, and which of them, if any, might prove efficacious is hard to predict in advance. Fortunately, the federal nature of the American judicial system affords an excellent opportunity to explore a range of possible reforms. States may adopt different standards of care, giving policymakers a chance to test different approaches.27 As Supreme Court Justice Louis Brandeis famously observed, the individual states can serve as ‘laboratories’ of democracy.28 The rapid pace of technological development and the novel nature of AI cry out for such legal experimentation.

Conclusions

Policymakers and philosophers alike increasingly speak of ‘the promise and the peril’ of AI.29 In clinical medicine, much of the focus has been on the promise, and AI does indeed have the potential to prove transformative. Far less discussion has addressed the logistical barriers that may impede its implementation. Yet, as with all novel technologies, especially transformative ones, AI can only realise its potential if it operates within a congenial legal framework. At present, the NOPP poses a significant barrier to the use of AI in clinical practice. Overcoming this challenge is essential if AI is to live up to such high expectations.

Data availability statement

Data sharing not applicable as no datasets were generated and/or analysed for this study.

Ethics statements

Patient consent for publication

Not applicable.

Ethics approval

Not applicable.
