Incorporating variability in economic evaluations requires a structured approach to ensure that differences in patient characteristics and healthcare environments—specifically those that lead to variations in health and cost outcomes for patients receiving the same treatment—are systematically accounted for. The process begins by identifying sources of heterogeneity that are relevant based on potential clinical, economic, and contextual significance, and then assessing these candidates for inclusion using both quantitative and qualitative methods. For analyses relying on modeling, rather than patient-level trial or real-world data (RWD) alone, this step also includes generating the parameters necessary to capture the heterogeneity, such as subgroup-specific treatment effects or cost estimates. The economic analysis is then performed, integrating these factors. Analytical frameworks can also be applied to estimate the expected net benefit of stratified decisions, or, conversely, the expected opportunity cost of not implementing them.
2.1 An Informal Taxonomy of Potentially Relevant Sources of Heterogeneity
The six classes of patient and non-patient heterogeneity described in Sculpher’s seminal informal taxonomy capture a broad spectrum of factors that should be considered when planning economic evaluations [2]. In Table 1, we present a revised informal taxonomy, building on Sculpher’s framework and incorporating several revisions suggested by Kohli-Lynch and Briggs [1]. The first five classes are, at least potentially, knowable at the time of treatment selection, while the sixth is not. The following paragraphs explain these classes.
Table 1 Classes relevant for a comparative economic analysis
The first class, “intervention-related,” includes factors that vary across patients and relate to the treatment [2]. Relative treatment effects, whose variation is often explored in a subgroup analysis of clinical trial data, fall in this class. For example, statin therapy has been shown to produce greater relative risk reductions for cardiovascular events in patients with higher baseline low-density lipoprotein (LDL) cholesterol levels than in those with lower levels. In one study, patients with baseline LDL cholesterol exceeding 160 mg/dL were observed to experience twice the proportional risk reduction in cardiovascular death compared with those with LDL levels below 100 mg/dL [19]. In the US context, addressing treatment effect heterogeneity due to systematic differences in social determinants of health, like health-related behaviors and socioeconomic and environmental factors, is likely to be important, given their strong association with disparities in access to care, health outcomes, and costs [20, 21]. This class also includes treatment uptake rates. Uptake will likely vary when treatment effects are heterogeneous, with greater use in subgroups of patients expected to benefit disproportionately [22]. Costs that relate systematically to interventions may also be contained in this class. For instance, insulin and some chemotherapies are dosed according to patient weight [23, 24].
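The link between weight-based dosing and cost can be illustrated with a short arithmetic sketch. The dose, vial size, and price below are hypothetical values chosen for illustration, and the assumption that unused drug in an opened vial is discarded is ours rather than the cited sources’.

```python
import math

# Hypothetical inputs: none of these values come from the cited sources.
DOSE_MG_PER_KG = 5      # assumed weight-based dose
VIAL_MG = 100           # assumed single-use vial size
PRICE_PER_VIAL = 500    # assumed price per vial, $


def cost_per_administration(weight_kg: float) -> int:
    """Drug cost of one administration for a patient of the given weight.

    Assumes whole vials must be opened and leftover drug is wasted.
    """
    dose_mg = weight_kg * DOSE_MG_PER_KG
    vials = math.ceil(dose_mg / VIAL_MG)
    return vials * PRICE_PER_VIAL


for weight in (60, 70, 100):
    print(f"{weight} kg: ${cost_per_administration(weight):,} per administration")
```

Under these assumptions, drug cost rises in steps with patient weight, so a subgroup with a higher average weight would carry a systematically higher intervention cost.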
The second class, “disease- but not intervention-related,” includes factors that vary across patients and are unrelated to the treatment. This class includes heterogeneity in baseline absolute risks related to patient characteristics. Note that even when relative risk reductions are identical in two subgroups, an intervention will prevent more events in a subgroup with a higher baseline risk. Continuing with the statin example from Class 1, higher baseline LDL levels have also been shown to be associated with a greater absolute risk [19, 25]. The finding that statin therapy prevents more events in patients with higher LDL, thus, is due to both a higher baseline risk and larger relative risk reductions, and appropriately addressing heterogeneity requires considering both. A recent example that included both classes found that the ICER was eight times higher for the subgroup with LDL levels below 3.4 mmol/L compared with the subgroup with LDL levels above this threshold [26]. Another example involving statin therapy also considered both Class 1 factors (age-specific statin initiation rates) and Class 2 factors (age- and sex-specific risks for myocardial infarction) [27]. This study also considered age- and sex-specific utility weights, which are a member of Class 4 (described below). Social determinants of health may also be pertinent members of this class, as they may be important determinants of disease severity. Class 2 also includes factors that relate systematically to costs; for instance, annual costs in individuals with coronary artery disease have been found to vary with age and co-morbidities like heart failure and diabetes mellitus [28].
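The interaction between baseline risk and relative risk reduction noted above can be shown with a small worked example. The baseline risks and the relative risk reduction used here are hypothetical, not estimates from the cited statin studies.

```python
# Illustrative only: baseline risks and the relative risk reduction (RRR)
# are hypothetical, not taken from the cited statin studies.

def events_prevented_per_1000(baseline_risk: float, rrr: float) -> float:
    """Events prevented per 1,000 treated patients.

    Absolute risk reduction = baseline risk x relative risk reduction.
    """
    return 1000 * baseline_risk * rrr


def number_needed_to_treat(baseline_risk: float, rrr: float) -> float:
    """Patients who must be treated to prevent one event (1 / ARR)."""
    return 1 / (baseline_risk * rrr)


RRR = 0.25  # the same proportional effect is assumed in both subgroups
subgroups = {"lower baseline risk": 0.05, "higher baseline risk": 0.20}

for name, risk in subgroups.items():
    prevented = events_prevented_per_1000(risk, RRR)
    nnt = number_needed_to_treat(risk, RRR)
    print(f"{name}: {prevented:.1f} events prevented per 1,000, NNT = {nnt:.0f}")
```

With an identical relative effect, the higher-risk subgroup in this sketch has four times as many events prevented per 1,000 patients treated, which is why baseline risk alone can drive differences in cost effectiveness.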
Class 3, “unrelated to both the disease and the intervention,” includes factors that influence a patient’s capacity to benefit from an intervention, independent of both relative and absolute treatment effects. For example, the elderly and terminally ill have shorter life expectancies than others, limiting the time over which treatment-related health gains can accrue. Note that stratified decisions can raise ethical concerns, such as the potential to discriminate against the elderly or infirm (discussed further in Sect. 3) [2]. Age also impacts health outcomes indirectly through its role in determining Medicare eligibility, which affects access to care and treatment options. Socioeconomic status, proximity to care facilities [29], and individual insurance coverage terms can also impact both health outcomes and costs. Systematic differences in future costs unrelated to the disease and the intervention, such as the lifetime costs for a child surviving an otherwise fatal condition, are included in this class, and the Second Panel on Cost-Effectiveness in Health and Medicine recommends their consideration in a cost-effectiveness analysis when survival is likely to differ across interventions [10].
Class 4, “patient preferences,” includes factors that relate to the value patients assign to their health and healthcare (e.g., treatment options, associated side effects, and health outcomes) [1, 2]. For example, some patients with late-stage cancer may refuse chemotherapy and opt for palliative care, suggesting a higher relative value of quality of life over quantity, while others may choose treatment. Subgroup analyses informed by these preferences can ensure that patients with varying priorities receive treatments aligned with their individual needs. In traditional cost-effectiveness analyses, the focus is on health-related quality of life and preferences for different health states are measured using survey instruments [30, 31]. Variation in how patients value health naturally results in differences in cost effectiveness. The Generalized Risk-Adjusted Cost-Effectiveness (GRACE) approach, a recent major advance over a traditional analysis, explicitly accounts for the diminishing marginal utility of health, and in so doing, variation in the preferences for health improvements [32]. Specifically, unlike the traditional approach, the value of a given health improvement is greater for patients in poorer than in better health with GRACE. This approach additionally allows for variability in preferences for risk, so that treatments offering more certainty in beneficial outcomes are valued more by those patients who are more risk averse (and vice versa for risk takers) [33]. 
As such, the GRACE framework can be used to consider other uncertainty-related concepts (and Class 4 factors) introduced by ISPOR’s Special Task Force on Defining Elements of Value, like value of hope (the preference for treatments with a positively skewed distribution of outcomes); the value of knowing (e.g., the value of accurate predictions as to who will respond to treatment); and insurance value (value to healthy individuals of being protected from both the physical and financial burden of a disease) [34, 35, 36]. In the USA, preferences regarding the financial outlay required to receive a treatment may be particularly pertinent. Accounting for the diversity in the willingness to pay for treatments can inform the design of insurance terms that encourage patients to receive care that is of high value to them [37, 38].
The defining feature of Class 5, “non-patient factors,” is systematic variability unrelated to the patient. In the USA, with a diversity of stakeholders and stakeholder goals, there is an especially high degree of variation in the key non-patient drivers of health and cost outcomes [39]. Studies have shown that outcomes vary with healthcare system features like hospital characteristics and insurance type. For example, the mortality risk for patients admitted for acute myocardial infarction or heart failure was estimated to be lower for those in teaching hospitals compared with non-teaching hospitals [40]. One study estimated that private insurance paid 37% more than traditional Medicare, with Medicare Advantage paying an additional 10% above this, for inpatient care for five common inpatient admissions, even after controlling for enrollee and hospital mix [41]. Chernew et al. found, similarly, that commercial fees were about double those of Medicare in an examination of state-level price variation [42]. Health and economic outcomes may also vary by physician. For example, being assigned to an experienced general surgeon was found to reduce the risk of severe complications and mortality by 5% versus being assigned to a newly graduated surgeon [3]. Non-patient preferences, such as those held by the public and healthcare providers, are contained in this class and may be relevant to consider depending on the decision problem that the economic evaluation is intended to inform. For example, one economic analysis of the value of intensive versus conventional glucose control with oral drugs in different age intervals of older adults living with diabetes found that, when US population-based utility weights for health states and treatments were used, ICERs ranged from $136,000 for those aged 65–70 years to $24 million for those aged 90–95 years [43].
However, when utility weights derived from these patients were applied, the ICERs were dramatically lower, by 51% ($67,000) for those aged 60–65 years and 94% ($134,000) for those 90–95 years, respectively. In addition, unlike in single-payer systems, drug-market intermediaries, like pharmacy benefit managers who negotiate with manufacturers on behalf of insurers, and employers, many of whom sponsor health insurance and manage employee benefits, play a significant role in the US healthcare system [44].
In contrast to the first five classes, which consist of factors observable at the time of the treatment decision, the final class, “factors revealed ex post,” includes factors that are only known after treatment exposure. The most intuitive example is the response to treatment. For instance, the continuation of weight-loss treatments often depends on achieving a minimum weight reduction (e.g., 5%) [45, 46]. Similarly, an economic evaluation that incorporated intensive guideline-directed medical therapy optimization for patients hospitalized with heart failure found that continued assessment of dose and patient adherence was cost effective versus usual care in the USA [47]. The occurrence of adverse events, which could potentially be avoided by adjusting dose or switching treatments, is another factor in this class. In the US setting, this category can also include costs and patient co-payments that remain uncertain until claims are adjudicated, as differences in these can influence the probability of treatment continuation, and hence health outcomes.
2.2 Including Heterogeneity in a Comparative Economic Analysis
There is a large body of methods to address heterogeneity in a comparative economic analysis. Much of this work is summarized in three, mostly non-overlapping, literature reviews that serve as useful references [48,49,50]. The work steps can loosely be divided into design, analysis, and reporting phases. We consolidate key points from the literature reviews and provide additional context below, but we refer the reader to the previous reviews and to the original sources for full details.
2.2.1 Design Phase
The possibility that patient and non-patient heterogeneity may influence health and cost outcomes should be considered in most, if not all, comparative economic analyses. Identification of what sources may be relevant, if any, ideally begins in the planning stage of a study. This process should follow the scientific method, utilizing evidence review, testable hypotheses, and analysis [12]. Given the complexity of heterogeneity in healthcare, a multidisciplinary team is crucial to ensure all perspectives are considered. This could involve experts from fields like sociology, psychology, and health informatics, among others. Patient input is also essential to align the analysis with the needs of impacted populations [51]. Pre-specifying subgroups can enhance study credibility, but it may be infeasible, particularly early in a treatment’s life cycle, because of limited data and knowledge [2, 12].
A number of considerations should be taken into account when designing the study: (1) which sources of heterogeneity to address, (2) evidence availability and opportunities for generating new evidence, (3) suitable statistical methodologies and analytical frameworks, (4) ethical implications of stratified decision making, and (5) practical barriers or challenges.
Each of the six classes of heterogeneity outlined in Sect. 2.1 should be considered for inclusion. While comprehensive, these classes are not mutually exclusive. Multiple factors may be relevant for a given application, and any individual factor may affect value through multiple pathways; for example, baseline LDL levels can influence both relative and absolute treatment effects.
Selection of which sources of heterogeneity to address should be evidence driven. In addition to clinical trial data, surveys, systematic literature reviews, meta-analyses, expert panels, and new analyses of existing data can provide important information. In the USA, RWD play an important role as market access decisions are frequently updated; insurers routinely revise formularies (e.g., annually) and open enrollment periods allow insured populations to switch plans. Where generating new quantitative evidence is impractical, a mixed-methods approach—combining qualitative and quantitative analyses—can be used to synthesize measurable and unmeasurable factors in a cohesive decision-making framework [52]. Shields et al., for example, recommend using logic models to structure theories, evidence, expert opinion, assumptions, and preferences [12], an approach recently adopted by the US Food and Drug Administration in a draft guidance aimed at standardizing risk management programs [53].
The ethical implications of stratified resource allocation decisions should be carefully considered, ideally with input from a multidisciplinary team and patients with direct disease experience [1, 2, 12]. Even when economically efficient, decisions that deny care, particularly when based on characteristics like age, sex, disability, and race, may violate fairness norms. Stratified decisions may be defensible when clear evidence shows differing treatment effects across subgroups [1, 2, 12]. For example, older patients may have a diminished immune response so that the efficacy of immunotherapies is severely limited [54]. Stratified decision making does not always exacerbate disparities; in some cases, it can serve to reduce them [1]. When stratified decision making is deemed unethical, a subgroup analysis can nonetheless help inform decision makers as to the opportunity costs of a population-level decision [1, 2, 48].
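The opportunity cost of a population-level decision can be sketched by comparing the net monetary benefit achieved when one decision is applied to everyone with that achieved when each subgroup receives its own decision. The subgroup shares, incremental outcomes, and willingness-to-pay threshold below are hypothetical, and the calculation ignores parameter uncertainty and any cost of identifying subgroup membership.

```python
# Hypothetical inputs: subgroup shares, incremental QALYs and costs, and the
# willingness-to-pay threshold are illustrative assumptions.

WTP = 100_000  # assumed threshold, $ per QALY

# subgroup -> (population share, incremental QALYs, incremental cost in $)
subgroups = {
    "A": (0.7, 0.30, 20_000),
    "B": (0.3, 0.05, 20_000),
}


def inmb(d_qaly: float, d_cost: float, wtp: float = WTP) -> float:
    """Incremental net monetary benefit of the new treatment vs comparator."""
    return wtp * d_qaly - d_cost


def population_decision_value(groups) -> float:
    """Per-patient value when a single decision is applied to everyone."""
    average = sum(share * inmb(q, c) for share, q, c in groups.values())
    return max(average, 0.0)  # adopt for all only if the average INMB is positive


def stratified_decision_value(groups) -> float:
    """Per-patient value when each subgroup gets its own adopt/reject decision."""
    return sum(share * max(inmb(q, c), 0.0) for share, q, c in groups.values())


uniform = population_decision_value(subgroups)
stratified = stratified_decision_value(subgroups)
print(f"Population-level decision: ${uniform:,.0f} per patient")
print(f"Stratified decisions:      ${stratified:,.0f} per patient")
print(f"Opportunity cost of not stratifying: ${stratified - uniform:,.0f} per patient")
```

In this sketch the treatment is worthwhile on average, so a population-level decision adopts it for everyone, including subgroup B, where it has negative net benefit. The difference between the two values is what a decision maker gives up by not stratifying, which can then be weighed against ethical and practical considerations.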
Finally, potential barriers to implementing stratified decisions in routine practice should be examined. These include challenges in observing relevant factors or identifying appropriate proxies, the costs of gathering necessary data (e.g., biomarker testing), and insufficient data availability (particularly for rare diseases) [1, 2, 48]. Additional concerns may arise if subgroup membership is based on information that could be gamed (e.g., a weight threshold to qualify for weight loss therapy) [2].
2.2.2 Analysis Phase
In the analysis phase, study methods are applied to study data to determine whether and how heterogeneity should be parameterized in the comparative economic analysis. Statistical methods, including an exploratory analysis, hypothesis testing, and a confirmatory analysis, can help to determine whether heterogeneity should be explicitly addressed, which types of heterogeneity should be considered, the optimal subgroup definitions, and the empirical estimates of these effects to be incorporated into the analysis. These methods must be aligned with the data that are available.
Willke et al. reviewed eight statistical methods for addressing heterogeneity of treatment effects, which is a factor in Class 1 [49]. While the intended readership was researchers interested in incorporating heterogeneity of treatment effects into randomized controlled trials, the paper also discusses issues related to non-randomized data. The eight methods covered include: conventional subgroup analysis of a clinical trial, subgroup analysis of meta-analyses and meta-regression, predictive risk modeling, classification and regression tree analysis, latent growth modeling/growth mixture modeling, series of n of 1 trials (i.e., repeated cross-over trials for single patients), quantile treatment effect regression, and non-parametric methods like kernel smoothing methods. The authors provide a description for each method.
The trade-off between increasing subgroup specificity and greater uncertainty is inherent to all eight methods, as dividing the population into smaller groups—whether through predictive modeling, a subgroup analysis, or non-parametric approaches—often leads to higher variability in estimates and reduced precision, especially when sample sizes within subgroups become smaller. Testing multiple subgroups increases the likelihood of detecting spurious but statistically significant associations by chance, but this risk may be mitigated with statistical adjustments to control type I error, such as the Bonferroni correction.
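The multiplicity problem and the Bonferroni adjustment can be illustrated as follows. The subgroup labels and p-values are hypothetical, and the family-wise error calculation assumes independent tests, which will rarely hold exactly for overlapping subgroups.

```python
# Hypothetical subgroup interaction p-values, for illustration only.
p_values = {
    "age >= 65": 0.004,
    "female": 0.030,
    "diabetes": 0.200,
    "prior MI": 0.040,
    "high LDL": 0.600,
}


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent, unadjusted tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_reject(pvals, alpha: float = 0.05):
    """Reject a null hypothesis only if its p-value is at most alpha / number of tests."""
    threshold = alpha / len(pvals)
    return [p <= threshold for p in pvals]


print(f"Unadjusted family-wise error rate, 5 tests: {familywise_error_rate(5):.3f}")

decisions = bonferroni_reject(list(p_values.values()))
for (label, p), reject in zip(p_values.items(), decisions):
    unadjusted = "significant" if p <= 0.05 else "not significant"
    adjusted = "significant" if reject else "not significant"
    print(f"{label}: p = {p:.3f}, unadjusted: {unadjusted}, Bonferroni: {adjusted}")
```

Three of the five hypothetical subgroups pass the unadjusted 0.05 threshold, but only one survives the adjustment. Bonferroni is conservative, so the protection against spurious findings comes at the cost of reduced power to detect real subgroup differences.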
Conventional methods such as a subgroup analysis and meta-analysis typically generate readily interpretable parameters, such as treatment effects, that can be directly incorporated into comparative economic models. In contrast, methods like classification and regression tree, latent growth modeling/growth mixture modeling, and non-parametric approaches may not directly align with the parameters needed for economic modeling and may require additional steps to derive model-ready parameters, such as treatment effects. Non-parametric methods, which avoid assumptions about data distribution, may also require transformation of key outputs, such as distributions of costs or effects, to generate parameters suitable for integration into economic models. Individual methods are mapped to suitable applications based on key data considerations—including the types of evidence available (e.g., theory, trial evidence from various phases, and RWD), the purpose of the analysis (e.g., exploration, testing, or confirmation), the character of the data (e.g., randomized controlled trial vs RWD), and the volume of data. For example, a meta-analysis, which pools smaller studies into a larger sample, can enhance statistical power but is only useful when the studies to be included are sufficiently homogeneous.
Grutters et al. [48] and Shields et al. [50] surveyed methods specifically used to address patient heterogeneity in the economic evaluation literature, with the former covering methods up to 2011 and the latter updating this for the period from 2011 to January 2024. Prior to 2011, this was generally limited to routine subgroup analyses and to regression methods that link baseline risk, treatment effects, utility, and resource utilization to patient characteristics such as demographics, clinical factors, and preferences. For instance, ordinary least squares regression of net monetary benefit on patient characteristics and interaction terms has been used to identify relevant subgroups [55]. Building on this, the same authors introduced the use of a system of seemingly unrelated equations regression, which relaxes the structural constraints of ordinary least squares by allowing for separate equations with different functional forms and covariates, thereby enhancing statistical efficiency [56]. Nixon and Thompson adapted the seemingly unrelated equations framework by applying Bayesian methods, which allowed for the incorporation of informative priors and extended its applicability (e.g., to non-randomized studies) [57]. This could be particularly valuable in settings with small sample sizes, as it provides more robust estimates by incorporating both data-driven information and expert judgment. When aggregate-level data are used to inform regressions (i.e., meta-regression), researchers should consider the risk of ecological fallacy (i.e., a faulty assumption that relationships observed at the group level necessarily apply to individuals within those groups) [12].
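A regression of net monetary benefit on treatment, a patient characteristic, and their interaction can be sketched as below. This is a minimal illustration, not the cited authors’ implementation: the patient records, the `high_risk` characteristic, and the threshold are hypothetical, and the small least-squares solver stands in for a statistics package that would also report standard errors.

```python
# Sketch of net-benefit regression with a treatment-by-subgroup interaction.
# All patient records and the threshold are hypothetical.

WTP = 50_000  # assumed willingness-to-pay threshold, $ per QALY

# (treated, high_risk, QALYs, cost in $): two hypothetical patients per cell
records = [
    (0, 0, 1.00, 10_000), (0, 0, 1.00, 6_000),
    (1, 0, 1.10, 14_000), (1, 0, 1.10, 10_000),
    (0, 1, 0.80, 10_000), (0, 1, 0.80, 6_000),
    (1, 1, 1.06, 14_000), (1, 1, 1.06, 10_000),
]


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Patient-level net monetary benefit: NMB = WTP * QALYs - cost
y = [WTP * q - c for _, _, q, c in records]
# Design matrix columns: intercept, treatment, subgroup, treatment x subgroup
X = [[1, t, h, t * h] for t, h, _, _ in records]

b0, b_treat, b_high, b_inter = ols(X, y)
inmb_low_risk = b_treat              # incremental NMB when high_risk = 0
inmb_high_risk = b_treat + b_inter   # incremental NMB when high_risk = 1

print(f"Incremental NMB, lower-risk patients:  ${inmb_low_risk:,.0f}")
print(f"Incremental NMB, higher-risk patients: ${inmb_high_risk:,.0f}")
print(f"Interaction (difference between subgroups): ${b_inter:,.0f}")
```

The coefficient on the interaction term is the quantity of interest: it estimates how much the incremental net benefit of treatment differs between the two subgroups at the chosen threshold. In practice the regression would be repeated across a range of thresholds and the interaction assessed together with its uncertainty.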
Shields et al. updated this review to cover the period from 2011 to January 2024 [50], identifying six additional methodologies for addressing patient heterogeneity in economic evaluations. Machine learning techniques, such as causal forests, analyze heterogeneity by estimating causal treatment effects across subgroups of patients. Causal forests build a large number of decision trees to identify variation in treatment effects and outcomes, including costs and net monetary benefits, within different patient characteristics, offering a data-driven non-parametric approach to detect these effects in a more automated and scalable manner [58]. Local instrumental variables—variables that are correlated with the treatment but not directly with the outcome—have been used to address confounding and heterogeneity in observational data. Local instrumental variables help reduce biases from unobserved confounding factors, which could otherwise distort estimates of treatment effects [59]. Subpopulation Treatment Effect Pattern Plot techniques provide a graphical exploration of treatment effect variations across different subpopulations [60]. The Subpopulation Treatment Effect Pattern Plot makes no assumptions about the nature of the relationship between outcomes and covariates within each treatment group, allowing for a flexible data-driven approach to identify subgroups where treatment is most effective. Multistate statistical modeling, whereby model-based transition probabilities are derived to account for baseline patient characteristics, is well suited for populating microsimulation models [50]. Use of regression-based approaches to estimate preference heterogeneity depends on the data available [61]. When individual-level utility data are available, as opposed to aggregate health state data, regression can be used to directly model the effects of individual characteristics on outcomes. 
Patient preference heterogeneity has been assessed using discrete choice experiments, which involve presenting individuals with hypothetical treatment options to capture their preferences for various treatment attributes [37].
No studies, to our knowledge, have surveyed methods to address sources of non-patient heterogeneity, such as surgeon experience, care provider setting, and insurance levels, but there are empirical studies that have considered factors in this class. For example, McClellan et al. used differences in distance to a catheterization hospital (an observable factor that influences treatment decisions but does not directly affect patient outcomes) as an instrumental variable to address unobserved confounding related to underlying patient health, enabling unbiased estimates of the short- and long-term mortality effects of intensive treatments for acute myocardial infarctions [29]. Addressing heterogeneity unrelated to patients requires data structured around these non-patient factors, such as individual healthcare providers or facilities. When such non-patient data are available in randomized controlled trials or RWD, they can be reorganized to explore how variations in these factors influence outcomes. By categorizing or stratifying outcomes based on these factors, statistical techniques like hierarchical or multi-level modeling can be used to quantify the impact of non-patient heterogeneity. In other cases, it may be necessary to collect new data, if feasible, and/or use a mixed-methods approach to leverage both qualitative and quantitative data.
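The logic of an instrumental variable can be illustrated with a simple Wald estimator for a binary instrument. This is a stylized sketch rather than the method used in the cited study: the cell counts, the risk equation, and the assumed treatment effect are all hypothetical, and the unobserved health status is deliberately left out of the analyst’s data.

```python
from statistics import mean

TRUE_EFFECT = -0.05  # assumed effect of treatment on mortality risk

# Hypothetical cells: (near_hospital z, sicker u, patients, number treated).
# Sicker patients are treated less often (confounding); living near a
# catheterization hospital raises treatment rates but is unrelated to u.
cells = [
    (1, 0, 100, 80),
    (1, 1, 100, 40),
    (0, 0, 100, 50),
    (0, 1, 100, 10),
]


def build_patients(cell_counts):
    """Expand cell counts into patient records; u is not stored (unobserved)."""
    patients = []
    for z, u, n, n_treated in cell_counts:
        for i in range(n):
            d = 1 if i < n_treated else 0
            y = 0.10 + 0.20 * u + TRUE_EFFECT * d  # assumed mortality risk
            patients.append({"z": z, "d": d, "y": y})
    return patients


def naive_estimate(patients):
    """Treated-minus-untreated difference in outcomes, ignoring confounding."""
    treated = [p["y"] for p in patients if p["d"] == 1]
    untreated = [p["y"] for p in patients if p["d"] == 0]
    return mean(treated) - mean(untreated)


def wald_estimate(patients):
    """Instrument's effect on the outcome divided by its effect on treatment."""
    near = [p for p in patients if p["z"] == 1]
    far = [p for p in patients if p["z"] == 0]
    outcome_diff = mean(p["y"] for p in near) - mean(p["y"] for p in far)
    treatment_diff = mean(p["d"] for p in near) - mean(p["d"] for p in far)
    return outcome_diff / treatment_diff


patients = build_patients(cells)
print(f"Assumed true effect: {TRUE_EFFECT:+.3f}")
print(f"Naive comparison:    {naive_estimate(patients):+.3f}")
print(f"Wald (IV) estimate:  {wald_estimate(patients):+.3f}")
```

Because healthier patients are more likely to be treated in this constructed example, the naive comparison overstates the benefit, while the instrument-based estimate recovers the assumed effect. The result depends on the instrument being unrelated to underlying health and affecting outcomes only through treatment, assumptions that cannot be fully verified in real data.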
For cases where empirical estimates can be generated for subgroups, it is important to examine not only statistical significance, but also clinical and economic meaningfulness. The magnitude of the statistically significant difference should be large enough to justify changes in clinical practice or policy [2]. Replication of subgroup results across multiple datasets may help reduce the risk of spurious findings and increase credibility [62]. Whether pre-specified or not, subgroup selection should be justified, with careful consideration of plausibility.
2.2.3 Implementation in Economic Models
Once the decision is made to address specific sources of heterogeneity, the next step is incorporating these factors into the comparative economic analysis. How to parameterize heterogeneity will depend on whether the analysis will be performed directly using patient-level data, in which case heterogeneity is captured naturally, or whether economic simulation modeling is used, in which case it must be explicitly incorporated. Note that the extent of observable heterogeneity may be more limited with clinical trial data than with RWD, owing to study features like exclusion criteria and treatment protocols.
Grutters et al. identified several methods for incorporating systematic relationships into economic models in use before 2011 [48]. Early modeling studies often adapted cohort models to account for heterogeneity by subdividing populations into subgroups, sometimes using tunnel states to reflect patient history. However, these adaptations can lead to a rapid increase in the number of health states and parameters, making models computationally challenging and harder to manage. Patient-level microsimulation methods, such as semi-parametric Markov modeling and discrete event simulation, are more flexible alternatives. These methods use tracker variables to tailor event risks, costs, and utilities to individual characteristics, providing a more granular approach to modeling heterogeneity [