Collection and Analysis of Adherence Information for Software as a Medical Device Clinical Trials: Systematic Review


Introduction

Background

There are over 350,000 health-related apps on the market, each claiming to improve certain aspects of physical or mental health []. A small fraction of these apps is subject to Food and Drug Administration (FDA) regulations. Regulators, health care providers, and patients need to understand how these apps compare with alternatives (eg, pharmaceuticals) that undergo rigorous evaluation. As with pharmaceuticals, the risks and benefits of apps depend on how well people use them. Incorrect assumptions about adherence in clinical trials can lead to incorrect regulatory and treatment decisions. With pharmaceuticals, these risks are reduced by the gold standard practice of intent-to-treat analysis, which estimates effectiveness based on actual, typically imperfect, use. This standard is not the norm in trials of digital health apps, leading to an unknown risk of bias (ROB) in the estimated effects. Here, we provide a systematic review of current practices in FDA-regulated apps, leading to recommendations for reducing the risks of bias revealed by the review.

The FDA focuses on the regulation of software as a medical device (SaMD) therapeutics intended to prevent, diagnose, or treat diseases []. If a predicate therapeutic exists, applicants may use the FDA’s 510(k) pathway to demonstrate that their therapeutic is substantially equivalent to the predicate therapeutic (ie, with the same intended use, technological characteristics, and benefits and risks as an approved or cleared therapeutic []). In the absence of a predicate therapeutic, SaMD therapeutics follow the FDA’s De Novo pathway, which requires evidence that the therapeutic is safe and effective. The FDA established the Digital Health Center of Excellence to create innovative ways to regulate SaMDs [], which, for example, are easier to update than pharmaceuticals. One such innovation, the FDA’s precertification pilot program, conducted excellence appraisals of software companies; it tested a streamlined approach to approving and updating therapeutics from companies that demonstrated quality practices [,]. Other innovations have been applied across all FDA departments, such as allowing clearance, approval, and marketing claims based on “real-world evidence” []. There are also proposals, created outside the FDA, specifying standard processes (eg, performance reporting standards) for clinical trials of low-risk digital health apps not subject to regulatory oversight []. Given the novelty of SaMDs and the associated regulatory environment, the FDA has the need and opportunity to create guidance and requirements for addressing adherence in future trials. We hope to inform that process.

A systematic review by Milne-Ives et al [] found that approximately three-fourths of digital health app trials collected and reported basic adherence information, such as the number of dropouts. These trials reported a variety of app engagement metrics, with only one-third reporting >60% use. Prior systematic reviews of digital health apps reported similarly simple summary statistics (eg, average adherence and dropout rates), with few details on how adherence data were collected and analyzed [-]. This systematic review extends that work by examining, in detail, how adherence and engagement information is collected, analyzed, and reported. It considers how those practices affect estimates of effectiveness (the app’s effect in the entire sample, regardless of adherence) and efficacy (the app’s effect in the adherent subgroup, reflecting the moderating effect of adherence). This review focuses on digital health apps with a reasonably well-defined evidentiary base, namely, those that followed the FDA’s De Novo or 510(k) pathways.

Criteria for Evaluation

ROB Framework

Imperfect adherence can cause underestimation or overestimation of the safety and efficacy of a SaMD. For example, a therapeutic’s efficacy and side effects may be underestimated if trial participants use it sparingly but consistent use is assumed. Conversely, efficacy may be overestimated if adherence reflects neglected confounding variables (eg, income and lifestyle factors). As a hypothetical example, researchers evaluating an app to reduce the risk of preeclampsia may observe a reduced rate not because of participant adherence but because participants adhering to the app were recipients of commercial health insurance. To evaluate the ROB owing to imperfect adherence, we used the adherence components of the Cochrane ROB Assessment (version 2.0) [], a well-documented tool for systematic reviews and meta-analyses. To determine the ROB from nonadherence, the ROB tool first asks, “Was there nonadherence to the assigned intervention regimen that could have affected participants’ outcomes?” If outcomes could have been affected, the ROB tool then asks, “Was an appropriate analysis used to estimate the effect of adhering to the intervention?” We developed criteria to answer each question based on research regarding adherence metrics and common methods of analyzing efficacy.
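
The overestimation mechanism can be illustrated with a small simulation. The sketch below uses hypothetical numbers, not data from any included trial: a null-effect app is randomly assigned, a confounder such as insurance status drives both adherence and outcomes, and an intent-to-treat contrast is compared with a naive per-protocol contrast.

```python
# Illustrative simulation (hypothetical numbers): when a confounder drives both
# adherence and outcomes, a naive per-protocol contrast overstates efficacy
# even though the app has no true effect here.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical confounder, eg, commercial health insurance (1 = insured).
insured = rng.binomial(1, 0.5, n)

# Randomized assignment to the app (1) or control (0).
assigned = rng.binomial(1, 0.5, n)

# Insured participants adhere more often; the app itself does nothing here.
adherent = assigned * rng.binomial(1, np.where(insured == 1, 0.8, 0.3))

# Outcome (eg, avoiding an adverse event) depends only on the confounder.
outcome = rng.binomial(1, 0.3 + 0.3 * insured)

# Intent-to-treat contrast: correctly near zero.
itt = outcome[assigned == 1].mean() - outcome[assigned == 0].mean()

# Naive per-protocol contrast: adherent treated vs control, biased upward
# because adherent participants are disproportionately insured.
pp = outcome[adherent == 1].mean() - outcome[assigned == 0].mean()

print(f"ITT estimate:       {itt:+.3f}  (true effect is 0)")
print(f"Naive per-protocol: {pp:+.3f}  (inflated by confounding)")
```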

Adherence and Engagement Metrics

Adherence refers to how well participants use an intervention, as defined by a protocol or recommendation for use. Engagement refers to how participants use an intervention, irrespective of the intended use of the app. Engagement data can be used to measure adherence for a digital health app. As both adherence and engagement can affect the outcomes of a trial, we report both. When collecting and reporting adherence and engagement statistics, researchers must consider 3 facets of use []: initiation, when a person starts using an intervention; implementation, how a person uses the intervention between initiation and discontinuation; and persistence, how long a person uses the intervention before discontinuation.
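
As a concrete illustration, the sketch below (with a hypothetical log format and dates) shows how these 3 facets could be computed from time-stamped engagement data for a single participant.

```python
# A minimal sketch (hypothetical log format) of deriving the 3 facets of use
# from time-stamped app engagement events for one participant.
from datetime import date

# Hypothetical event log: days on which the participant opened the app.
use_days = [date(2022, 1, 3), date(2022, 1, 4), date(2022, 1, 7),
            date(2022, 1, 8), date(2022, 1, 15)]
enrolled_on = date(2022, 1, 1)

initiation = len(use_days) > 0                 # did use ever start?
first_use, last_use = min(use_days), max(use_days)

# Implementation: intensity of use between initiation and discontinuation,
# here the proportion of days used within that window.
window_days = (last_use - first_use).days + 1
implementation = len(set(use_days)) / window_days

# Persistence: how long use continued before discontinuation.
persistence_days = (last_use - enrolled_on).days + 1

print(f"initiation={initiation}, "
      f"implementation={implementation:.0%} of days in use window, "
      f"persistence={persistence_days} days")
```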

Which metrics are collected and how they are collected can also affect the ability to conduct efficacy analyses and the analyses’ potential bias. For instance, adherence to recommendations from the therapeutic (eg, using backup contraception when an app detects fertility) could also affect effectiveness estimates. Without collecting this information, researchers would be unable to analyze efficacy in terms of adherence to behavioral recommendations. Therefore, we report adherence and engagement with both the therapeutic and its recommendations. The mechanism of collecting adherence and engagement information can act as a potential confounder if it prompts additional engagement with the therapeutic compared with real-world engagement. Reminders used to increase adherence (eg, email messages) can also be confounders if they are not part of the therapeutic design. To account for these potential confounders, we recorded whether reminders and mechanisms for measuring adherence and engagement were internal to the app or external (ie, an additional component not found in the marketed app). We found few prior studies or analysis plans that determined the level of adherence or engagement required to produce a clinical effect; this level can vary across therapeutics. Without a definition of low adherence in the study or trial analysis plan, or evidence of the level of adherence needed to produce a clinical effect, we lacked the information to conclusively assess whether adherence was low.

Analysis of Efficacy

In evaluating efficacy analyses, we ask how well a trial or study fulfills the assumptions required by its efficacy analysis method. There are 3 commonly used estimates of efficacy: the average treatment effect (ATE), the per-protocol effect, and the dose-response effect. Table 1 describes each estimate, the common analysis methods for calculating it, and the assumptions required for unbiased estimates; definitions of the following assumptions are provided in [-]: consistency, positivity, ignorability, exclusion restriction, strong monotonicity, and the stable unit treatment value assumption (SUTVA). A worked numeric comparison of several of these estimators follows the table. In addition to the requirements in Table 1, researchers should preregister their analyses of effectiveness and efficacy to reduce the risk of capitalization on chance [].

Table 1. Methods of analysis commonly used to account for imperfect adherence and the assumptions required for unbiased estimates.

Estimate of efficacy and common analysis methods | Assumptions for unbiased estimates

ATEa: estimates the average effect of treatment
ATE analysis
Evaluates groups according to their treatment group regardless of adherence.
Estimates efficacy if adherence is modified with regular reminders to participants.
SUTVAb
Consistencyc
Positivity
Ignorability

ITTd analysis
Evaluates groups according to their assigned treatment regardless of adherence.
Estimates efficacy if adherence is modified with regular reminders to participants.
SUTVA
Consistencyc
Randomization (fulfills positivity, exclusion restriction, and ignorability)
Per-protocol effect: estimates the average effect of adhering to the treatment assignment
Complier average causal effect or local average treatment effect
Evaluates the per-protocol effect for the adherent subpopulation.
Evaluates groups based on an adherence threshold. Nonadherent participants in the treatment group are labeled as never-takers. It is assumed that the effect of the never-takers is equal in both groups.
SUTVA
Consistencyc,e
Randomization (fulfills positivity, ignorability, exclusion restriction, and strong monotonicity)

Generalized estimation
Evaluates groups based on an adherence threshold. Groups are evaluated based on adherence over time such as never-takers, early-takers, late-takers, and always-takers.
SUTVA
Consistencyc,e
Positivity
Ignorability (sequential exchangeability)

As-treated analysis
Evaluates groups based on an adherence threshold. Nonadherent participants in the treatment group are considered part of the control group.
SUTVA
Consistencyc,e
Positivity
Ignorability (conditional independence of adherence and outcomes)

Per-protocol analysis
Evaluates groups based on an adherence threshold. Excludes nonadherent participants in the treatment group.
SUTVA
Consistencyc,e
Positivity
Ignorability (conditional independence of adherence and outcomes)
Dose-response effect: estimates the effect of adherence on the treatment
Dose-response analysis (IVf method)
Evaluates adherence as a mediator for all participants using an IV to fulfill the mechanism ignorability assumption.
SUTVA
Consistencyc,e
Randomization (fulfills positivity, ignorability, exclusion restriction, and strong monotonicity)

Dose-response analysis (confounder adjustment)
Evaluates adherence as a mediator for all participants using confounder adjustment to fulfill the mechanism ignorability assumption.
SUTVA
Consistencyc,e
Positivity
Ignorability (conditional independence of adherence and outcomes)

aATE: average treatment effect.

bSUTVA: stable unit treatment value assumption.

cConsistent definition of treatment.

dITT: intent-to-treat.

eConsistent definition of adherence.

fIV: instrumental variable.
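
To make the distinctions in Table 1 concrete, the following sketch simulates a 2-arm randomized trial with one-sided nonadherence and compares 3 of the estimators: the intent-to-treat contrast, a naive per-protocol contrast, and the complier average causal effect obtained with the standard Wald (instrumental variable) estimator, CACE = (intent-to-treat effect on the outcome) / (intent-to-treat effect on adherence). All effect sizes are hypothetical.

```python
# A self-contained sketch comparing 3 of the estimators in Table 1 on
# simulated trial data (hypothetical effect sizes; one-sided nonadherence).
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

z = rng.binomial(1, 0.5, n)                # randomized assignment (instrument)
complier = rng.binomial(1, 0.6, n)         # latent compliance type
a = z * complier                           # only assigned compliers adhere

# Compliers differ at baseline (eg, healthier), confounding adherence.
baseline = 0.2 * complier + rng.normal(0, 1, n)
y = baseline + 1.0 * a                     # true effect of adhering is 1.0

# Intent-to-treat contrast: diluted by nonadherence (~0.6 here).
itt = y[z == 1].mean() - y[z == 0].mean()

# Naive per-protocol analysis: exclude nonadherent participants in the
# treatment group; biased because the remaining treated are all compliers.
pp = y[(z == 1) & (a == 1)].mean() - y[z == 0].mean()

# Wald / IV estimator of the complier average causal effect.
cace = itt / (a[z == 1].mean() - a[z == 0].mean())

print(f"ITT:          {itt:.3f}  (true ITT = 0.6)")
print(f"Per-protocol: {pp:.3f}  (biased; true effect in compliers = 1.0)")
print(f"CACE (Wald):  {cace:.3f}  (~1.0 under the Table 1 assumptions)")
```

The Wald estimator recovers the per-protocol effect for compliers only under the assumptions listed in Table 1 for the complier average causal effect (randomization, exclusion restriction, strong monotonicity, SUTVA, and consistency); the naive per-protocol contrast requires the much stronger conditional independence of adherence and outcomes.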

We applied our framework, which was developed based on the Cochrane ROB, to evaluate how well existing trials and studies meet our standards, with the goal of improving future trials. We examined the completeness of their reporting and the appropriateness of the procedures reported. By focusing on SaMD therapeutics, the most rigorously evaluated digital health apps, we sought to identify improvements for future studies on all digital health apps.


Methods

Screening

A 2-stage search strategy was used to identify all product codes and registrations for patient-facing SaMDs, with intended repeated use for at least 2 weeks, that the FDA had approved or cleared before March 2022. In the first stage, 2 reviewers independently searched the FDA product code database for product codes related to SaMDs. We searched the device name, definition, physical state, and technical method attributes for the keywords “software,” “mobile,” “digital,” and “application.” In the second stage, we searched the FDA registration database for these product codes. We examined each registration’s supporting documents, De Novo decision summaries, and 510(k) decision summaries to determine whether the product met our inclusion criteria.

We then searched ClinicalTrials.gov, product websites, and MEDLINE for peer-reviewed publications corresponding to each included product. For the ClinicalTrials.gov search, we used the product and company names as keywords, individually and in combination, to identify clinical trials. We included all publications that evaluated the effectiveness or efficacy of the included products, including both randomized controlled trials (RCTs) and observational studies. We reviewed all publications listed at the end of the ClinicalTrials.gov registration for potential inclusion. For the MEDLINE search, product and company names were used as keywords. For the product website search, publications listed as clinical evidence on company websites were included. Two reviewers independently screened each publication, examining the title and abstract as well as the full text, where appropriate. Reviewer disagreements were reconciled by discussion. We screened and included only those articles published before March 2022. We did not include pilot or feasibility studies.

For example, the first stage of the search identified the PYT product code when the “device name” field was searched for “software.” All registrations coded as PYT (ie, “Device, Fertility Diagnostic, Contraceptive, Software Application”) were then evaluated for inclusion based on corresponding supporting documents, 510(k) decision summaries, and De Novo decision summaries. One included 510(k) for this product code was for the Clue app (K193330). In the second stage, we searched ClinicalTrials.gov using the keywords “Clue,” “Clue Birth Control,” “Biowink,” “Dynamic Optimal Timing,” and “Cycle Technologies.” We searched MEDLINE using the keywords “Dynamic Optimal Timing,” “Biowink,” and “NCT02833922.” Finally, we searched the product website [] for clinical trial documents.
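
The first-stage keyword screen can be pictured as a simple filter over exported product code records. The sketch below is illustrative only; the field names are hypothetical stand-ins for the FDA database attributes listed above.

```python
# A simplified sketch (hypothetical field names) of the first-stage keyword
# screen over exported FDA product code records.
KEYWORDS = ("software", "mobile", "digital", "application")
FIELDS = ("device_name", "definition", "physical_state", "technical_method")

def matches(record: dict) -> bool:
    """Flag a product code record if any searched field contains a keyword."""
    return any(
        kw in (record.get(field) or "").lower()
        for field in FIELDS
        for kw in KEYWORDS
    )

records = [
    {"device_name": "Device, Fertility Diagnostic, Contraceptive, "
                    "Software Application", "product_code": "PYT"},
    {"device_name": "Catheter, Intravascular", "product_code": "XYZ"},
]
candidates = [r["product_code"] for r in records if matches(r)]
print(candidates)  # ['PYT'] -> carried into the second-stage search
```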

Data Extraction

For each publication, one reviewer extracted data and the other reviewer checked the accuracy of the data. Differences were reconciled by discussion between the reviewers. The Cochrane Data Collection Form for Intervention Reviews [] was completed with clinical trial characteristics, including the design, number of participants, sampling method, interventions, and outcomes.

The remainder of the data extraction form was created using the criteria for reporting adherence metrics described in the Adherence and Engagement Metrics section and the assumptions for the associated efficacy analysis method described in the Analysis of Efficacy section. Given the diversity of the apps and outcomes, we reported each metric that a clinical trial or study reported separately, without averaging across different metrics. When evaluating efficacy analyses, we categorized trials or studies as fulfilling the positivity condition if they had a control group. We categorized trials as fulfilling the consistency condition if they had definitions of treatment and adherence that avoided hidden variations of treatment that might affect participants differently.
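
The sketch below (with hypothetical data) illustrates this summarization rule: each metric is pooled only across the articles that reported that exact metric, and an SD is undefined when only 1 article reported the metric.

```python
# A minimal sketch of the summarization rule described above: each adherence
# metric is summarized only across the articles that reported that same
# metric, never averaged across different metrics. Data are hypothetical.
import pandas as pd

extracted = pd.DataFrame({
    "article": ["A", "B", "C", "D"],
    "metric":  ["completed core modules", "completed core modules",
                "log-in days", "perfect use cycles"],
    "value_pct": [80.0, 94.0, 47.0, 17.0],
})

summary = (extracted.groupby("metric")["value_pct"]
           .agg(n_articles="count", mean="mean", sd="std"))
print(summary)  # SD is NaN where only 1 article reported the metric
```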

Some assumptions referenced in Table 1 could not be fully evaluated. One such assumption is SUTVA, which requires no interaction between units of observation that could affect a result. Although it is impossible to prove that this assumption holds, some trial designs afford greater confidence than others. For example, if a trial has no central clinical team and treatment is administered only through an app, it would be difficult for participants to interact with the clinical research staff. By contrast, if clinical research staff interact with both the control and treatment groups, they might treat participants in the 2 groups in ways that affect their independence. We categorized a trial as fulfilling SUTVA if it had no central clinical team or if it had mechanisms for reducing the risk of interaction between participants or between participants and staff.

Similarly, it is impossible to fully evaluate the assumption that there are no unmeasured confounders. Instead, we asked whether the researchers demonstrated awareness of confounders by listing potential confounders explicitly and reporting their rationale for selecting them.

The results in the Adherence Metrics section and Analysis of Efficacy section below summarize practices for the included trials using means or counts as appropriate. Given the heterogeneity of the therapeutics and outcomes, we did not estimate the overall impact of all biases. The protocols and preregistrations referenced in the included articles were used as supporting documents. The protocol for this review was registered on the Open Science Framework [], which includes the data extraction forms and extracted data. Article screening data, extracted data, and summarized extracted data are also available in [-].


Results

Included Trials

Figure 1 shows the completed PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) diagram. The 2-stage search for SaMD therapeutics identified 5% (15/301) of product codes and 44% (24/54) of registrations as potential SaMDs. These registrations included 18 unique SaMD therapeutics. Our search of ClinicalTrials.gov, company websites, and MEDLINE identified 40, 228, and 148 articles, respectively. After screening and removal of duplicate articles, 24 articles, involving 10 products, met all the inclusion criteria. A total of 8 products were excluded because clinical trials or observational studies evaluating efficacy for at least 2 weeks were not found in our literature search.

Figure 1. The 2-stage strategy used to identify trials and studies of software as a medical device (SaMD) therapeutics. The Food and Drug Administration (FDA) databases were first searched for SaMD therapeutics that would be used by patients for at least 2 weeks. In the second stage, ClinicalTrials.gov, MEDLINE, and company websites were then searched for articles evaluating effectiveness or efficacy for these products when used by patients for at least 2 weeks.

As seen in Tables 2 and 3, the 24 included articles (22 total trials) studied a variety of SaMD therapeutics, including those intended to treat irritable bowel syndrome, insomnia, substance use disorder, and attention-deficit/hyperactivity disorder. All the SaMD therapeutics were mobile apps and will be referred to as apps for the remainder of the article. Table 3 shows an even mix of apps intended for continual use and module-based apps. Most trials (18/22, 82%) specified a recommended dose for their app, such as the frequency of use or the number of modules to complete. Overall, 11 (50%) trials or studies evaluated module-based apps with a recommended dose [,-,-], whereas 7 (32%) evaluated continual use apps with a recommended dose [,,-]. Apps without a recommended dose used only the continual use design (4/22, 18%) [-,].

Table 2. Included articles and associated products.

Product and condition treated | Study, year | Title

Apple Irregular Arrhythmia Notification
Irregular arrhythmia notification | Perez et al [], 2019 | Large-Scale Assessment of a Smartwatch to Identify Atrial Fibrillation

BlueStar
Diabetes management | Quinn et al [], 2011 | Cluster-randomized trial of a mobile phone personalized behavioral intervention for blood glucose control
Diabetes management | Agarwal et al [], 2019 | Mobile App for Improved Self-Management of Type 2 Diabetes: Multicenter Pragmatic Randomized Controlled Trial
Diabetes management | Dugas et al [], 2020 | Engagement and Outcomes Associated with Contextual Annotation Features of a Digital Health Solution

Clue
Contraceptive | Jennings et al [], 2018 | Estimating six-cycle efficacy of the Dot app for pregnancy prevention
Contraceptive | Jennings et al [], 2019 | Perfect- and typical-use effectiveness of the Dot fertility app over 13 cycles: results from a prospective contraceptive effectiveness trial

DexCom G6
Diabetes management | Akturk et al [], 2021 | Real-World Evidence and Glycemic Improvement Using Dexcom G6 Features

EndeavorRx
Videogame treatment for ADHDa | Kollins et al [], 2020 | A novel digital intervention for actively reducing severity of paediatric ADHD (STARS-ADHD): a randomised controlled trial
Videogame treatment for ADHD | Kollins et al [], 2021 | Effectiveness of a digital therapeutic as adjunct to treatment with medication in pediatric ADHD
Videogame treatment for ADHD | Gallen et al [], 2022 | Enhancing neural markers of attention in children with ADHD using a digital therapeutic

Mahana
CBTb for IBSc | Everitt et al []d, 2019 | Assessing telephone-delivered cognitive–behavioural therapy (CBT) and web-delivered CBT versus treatment as usual in irritable bowel syndrome (ACTIB): a multicentre randomised trial
CBT for IBS | Everitt et al []d, 2019 | Therapist telephone-delivered CBT and web-based CBT compared with treatment as usual in refractory irritable bowel syndrome: the ACTIB three-arm RCT
CBT for IBS | Everitt et al [], 2019 | Cognitive behavioural therapy for irritable bowel syndrome: 24-month follow-up of participants in the ACTIB randomised trial

Natural Cycles
Contraceptive | Berglund Scherwitzl et al [], 2016 | Fertility awareness-based mobile application for contraception
Contraceptive | Berglund Scherwitzl et al [], 2017 | Perfect-use and typical-use Pearl Index of a contraceptive mobile app
Contraceptive | Bull et al [], 2019 | Typical use effectiveness of Natural Cycles: postmarket surveillance study investigating the impact of previous contraceptive choice on the risk of unintended pregnancy
Contraceptive | Pearson et al [], 2021 | Natural Cycles app: contraceptive outcomes and demographic analysis of UK users
Contraceptive | Pearson et al [], 2021 | Contraceptive Effectiveness of an FDA-Cleared Birth Control App: Results from the Natural Cycles U.S. Cohort

ReSet
CBT for SUDe | Campbell et al [], 2014 | Internet-delivered treatment for substance abuse: a multisite randomized controlled trial

ReSet-O
CBT for OUDf | Christensen et al []g, 2014 | Adding an Internet-delivered treatment to an efficacious treatment package for opioid dependence
CBT for OUD | Maricich et al [], 2021 | Real-world evidence for a prescription digital therapeutic to treat opioid use disorder
CBT for OUD | Maricich et al [], 2021 | Real-world use and clinical outcomes after 24 weeks of treatment with a prescription digital therapeutic for opioid use disorder
CBT for OUD | Maricich et al []g, 2021 | Safety and efficacy of a prescription digital therapeutic as an adjunct to buprenorphine for treatment of opioid use disorder

Somryst
CBT for Insomnia | Ritterband et al [], 2017 | Effect of a Web-Based Cognitive Behavior Therapy for Insomnia Intervention With 1-Year Follow-up: A Randomized Clinical Trial

aADHD: attention-deficit/hyperactivity disorder.

bCBT: cognitive behavioral therapy.

cIBS: irritable bowel syndrome.

dEveritt et al [] and Everitt et al [] were based on the same trial.

eSUD: substance use disorder.

fOUD: opioid use disorder.

gChristensen et al [] and Maricich et al [] were based on the same trial.

Table 3. Summary of devices and trials included in the study (n=22).

Characteristics | Values

Therapeutic indication for use, n (%)
  Contraceptive | 7 (32)
  Videogame treatment for ADHDa | 3 (14)
  Irregular arrhythmia notification | 1 (5)
  Diabetes management | 4 (18)
  Cognitive behavioral therapy | 7 (32)
    IBSb | 2 (9)
    Insomnia | 1 (5)
    Substance use disorder | 4 (18)

Type of therapeutic, n (%)
  Recommended use with module design | 11 (50)
  Recommended use with continual use design | 7 (32)
  No recommended use with module design | 0 (0)
  No recommended use with continual use design | 4 (18)

Trial design
  RCTc [,,,-,,,], n (%) | 8 (36)
    Participants (in comparison groups), mean (SD) | 290 (120)
    Trial length (d), mean (SD) | 300 (270)
  Observational [,-,,,-,-], n (%) | 14 (64)
    Participants (in comparison groups), mean (SD) | 5100 (7000)
    Trial length (d), mean (SD) | 230 (140)

aADHD: attention-deficit/hyperactivity disorder.

bIBS: irritable bowel syndrome.

cRCT: randomized controlled trial.

Most trials (14/22, 64%) were observational, with the remainder being RCTs (8/22, 36%). On average, the RCTs recruited 290 (SD 120) participants and lasted 300 (SD 270) days. On average, the observational trials recruited 5100 (SD 7000) participants and lasted 230 (SD 140) days.

Adherence Metrics

Table 4 summarizes how the articles measured and reported each of the 3 facets of adherence. As each article could report different adherence metrics for the same trial or study and report separate analyses, duplicate trials and studies were counted twice. Of the 24 articles, 23 (96%) collected information about app engagement. All apps that provided recommendations (8/8, 100%) also collected information about adherence to their recommendations [,,,-]. Of the 23 articles that collected adherence information, 2 (9%) reported that adherence information was collected externally from the marketed app [,]. Three articles reported that researchers attempted to increase adherence by notifying inactive patients [-]: 1 used in-app notifications, and 2 used email notifications.

Table 4. Summary of adherence metrics (N=24)a.

Adherence metrics | Values, n (%) | Each reported metric (%), mean (SD)

Trial collected information about app engagement | 23 (96) | N/Ab
Trial collected information about adherence to recommendations (n=8 articles for apps that gave recommendations) | 8 (100) | N/A
Adherence information collected outside of the marketed app (n=23 articles for apps that collected adherence information) | 2 (9) | N/A
Adherence notification sent outside of app (n=3 articles reported sending adherence notifications) | 2 (67) | N/A

Engagement metrics (metric is not measuring prescribed use)
  Initiation | 2 (8) | N/A
    Initial app use, core completion, or activity use [,] | 2 (8) | 52 (35)
  Implementation | 2 (8) | N/A
    Completed sessions, modules, or activities [,] | 2 (8) | 20 (22)
    Log-in days [] | 1 (4) | 23c
  Persistence | 7 (29) | N/A
    Percentage of participants continuing use at 1 y [,,,,,] | 6 (25) | 52 (12)
    Number of days participants used the app [] | 1 (4) | 153c

Adherence metrics (metric is measuring prescribed use)
  Initiation | 15 (63) | N/A
    Provided at least 20 d of data [-] | 5 (21) | 100 (0)
    Initial app use, core completion, or activity use [,,,-] | 7 (29) | 98 (4)
    Entered at least 2 period start dates [,] | 2 (8) | 100 (0)
    Initiation of video in response to app alert [] | 1 (4) | 44c
  Implementation | 16 (67) | N/A
    Completed sessions, modules, or activities [-,,] | 5 (21) | 88 (16)
    Completed at least 4 sessions and 1 call [-] | 3 (13) | 64 (5)
    Completed half of the modules [,] | 2 (8) | 76 (13)
    Completed ≥8 core modules [,] | 2 (8) | 87 (9)
    Percentage of logged intercourse on red days [,] | 2 (8) | 23 (0)
    Percentage of total days intercourse logged on red days (ie, days where the user did not follow app recommendations) [] | 1 (4) | 2c
    Percentage of perfect use cycles (ie, menstruation cycles where the user followed all trial recommendations) [,] | 2 (8) | 17 (10)
    Log-in days [,,] | 3 (13) | 47 (19)
  Persistence | 4 (17) | N/A
    Participants using the app at week 12 [,] | 2 (8) | 4 (17)
    Completed all core modules [,,,] | 2 (8) | 49 (19)

Study reported all prescribed facets of adherence (n=20 studies that prescribed a recommended use of the app) | 9 (45) | N/A

aThe left-hand columns report what percentage of articles reported adherence or engagement information and what metrics were used by each article. The right-hand columns report the mean and SD for all the articles that reported that metric.

bN/A: not applicable for summary of facets of adherence.

cSD values are not applicable as only 1 article was included.

A total of 4 articles studied a product without prescribing how often to use the app. Engagement was reported in 3 articles on these products. Of the 24 articles, engagement was reported for 2 (8%) in terms of initiation, 2 (8%) in terms of implementation, and 1 (4%) in terms of persistence. Two continual use therapeutics prescribed app use in terms of initiation and implementation but not persistence. As such, 25% (6/24) of the articles studying these apps reported engagement persistence metrics.

Of the 24 articles, 15 (63%) reported initiation in 4 different ways (eg, the number of users who finished the first app module and the number of users who entered 20 data points into the app). Seven articles excluded participants who did not initiate app use, which inflated their adherence metrics. Of the 24 articles, 16 (67%) reported implementation, with 9 different definitions (eg, the proportion of days between starting and stopping use of the app on which users logged their temperature and the number of perfect use cycles reported by women [ie, abstaining or using contraception on all high-risk days]). Of the 24 articles, 4 (17%) reported persistence, with 2 different definitions (participants using the app over the prescribed period and participants completing the prescribed number of modules). Table 4 reports the percentage of studies and the average adherence across trials and studies that used each metric. Of the 20 articles that prescribed use of the app, only 9 (45%) reported all prescribed facets of adherence [,-,,].

ROB: “Nonadherence to the Assigned Intervention Regimen”

Of the 24 articles, 4 (17%) reported only engagement information, as there was no prescribed amount of app use. We found that the outcomes of the remaining articles could have been affected by nonadherence. Of the 20/24 (83%) articles for apps with prescribed use, 5 (25%) reported adherence at or below their definition of low adherence for at least 1 facet of adherence. Of the remaining 15 articles, 12 (80%) reported some nonadherence, either with a prescribed facet of adherence or with the app’s behavioral recommendations, but did not provide a definition of low adherence; these articles provided insufficient information to determine whether adherence was sufficient for each app. The remaining 3 articles did not report enough information about each prescribed facet of adherence to judge adherence.

Analysis of Efficacy

Table 5 summarizes the effectiveness and efficacy estimates from each article. Of the 24 articles, 20 (83%) estimated the app’s effectiveness as the ATE for all participants. Of these 20 articles, 11 (55%) preregistered their analysis of effectiveness. A higher percentage of RCTs preregistered their effectiveness analysis (7/9, 78%) compared with observational studies (4/11, 36%). Of the 24 articles, 15 (63%) estimated efficacy in terms of the ATE, per-protocol effect, or dose-response effect. Of these 15 articles, only 5 (33%) preregistered an efficacy analysis. Preregistration was more common for RCTs (3/6, 50%) than for observational trials (2/9, 22%).

Table 5. Summary of efficacy estimates (N=24).

Efficacy estimates | Values, n (%) | References

Effectiveness estimate | 20 (83) | —a
  None | 4 (17) | [,,,]
  Average treatment effect | 20 (83) | [-,,,,-]
  Preregistered effectiveness analysis (n=20) | 11 (55) | —
    RCTb (n=9) | 7 (78) | [,,,,,,]
    Observational (n=11) | 4 (36) | [,,,]

Efficacy estimate | 15 (63) | —
  None | 9 (38) | [,,,,,,,]
  Average treatment effect | 2 (8) | [,]
  Per-protocol effect | 10 (42) | [,,,-,,,,]
  Dose-response effect | 3 (13) | [,,]
  Preregistered efficacy analysis (n=15) | 5 (33) | —
    RCT (n=6) | 3 (50) | [,,]
    Observational (n=9) | 2 (22) | [,]

aReferences not listed for summary rows.

bRCT: randomized controlled trial.

Table 6 characterizes the articles in terms of how well they meet the assumptions for their method of analysis. Of the 24 articles, 2 (8%) estimated efficacy in terms of the ATE [,]. One used intent-to-treat analysis and met the relevant reporting requirement [], and the other calculated the ATE for an observational trial []. The latter met the criteria for SUTVA and had a clear definition of the treatment condition; however, it did not meet the positivity condition because it lacked a control group. The study adjusted for 1 confounder without stating how it was chosen.

Table 6. Fulfillment of required assumptions for efficacy analyses (n=14). Columns, in order: SUTVAa, n (%); positivity, n (%); consistency (clear treatment definition), n (%); consistency (clear adherence definition), n (%); exclusion restriction, n (%); strong monotonicity, n (%); assignment mechanism ignorability via randomization, n (%); conditional independence of treatment and outcomes (control variables); sequential exchangeability (control variables); conditional independence of adherence and outcomes (control variables).

Average treatment effect (n=2)
  Intent-to-treat analysis (n=1) | 1 (100) | 1 (100) | 1 (100) | NRb | 1 (100) | 1 (100) | 1 (100) | NR | NR | NR
    Kollins et al [] (n=1) | 1 (100) | 1 (100) | 1 (100) | NR | 1 (100) | 1 (100) | 1 (100) | NR | NR | NR
  Average treatment effect analysis (n=1) | 1 (100) | 0 (0) | 1 (100) | NR | NR | NR | NR | NR | NR | NR
    Gallen et al [] (n=1) | 1 (100) | 0 (0) | 1 (100) | NR | NR | NR | NR | Basic response time | NR | NR

Per-protocol effect (n=9)
  Complier average causal effect analysis (n=1) | 1 (100) | 1 (100) | 1 (100) | 1 (100) | 1 (100) | 1 (100) | 1 (100) | NR | NR | NR
    Everitt et al [,] (n=1) | 1 (100) | 1 (100) | 1 (100) | 1 (100) | 1 (100) | 1 (100) | 1 (100) | NR | NR | NR
  Generalized estimation (n=0) | —c | — | — | — | NR | NR | NR | — | NR | —
  As-treated analysis (n=3) | 2 (67) | 1 (33) | 3 (100) | 3 (100) | NR | NR | NR | NR | NR | N/Ad
    Ritterband et al [] (n=1) | 1 (100) | 0 (0) | 1 (100) | 1 (100) | NR | NR | NR | NR | NR | Baseline ISIe
    Dugas et al [] (n=1) | 1 (100) | 1 (100) | 1 (100) | 1 (100) | NR | NR | NR | NR | NR | Time and demographic characteristics
    Akturk et al [] (n=1) | 1 (100) | 0 (0) | 0 (0) | 0 (0) | NR | NR | NR | NR | NR | None
  Per-protocol analysis (n=5) | 5 (100) | 1 (20) | 5 (100) | 5 (100) | NR | NR | NR | NR | NR | N/A
    Everitt [] (n=1) | 1 (100) | 1 (100) | 1 (100) | 1 (100) | NR
