App Characteristics and Accuracy Metrics of Available Digital Biomarkers for Autism: Scoping Review


Introduction

Autism is a common form of neurodivergence estimated to affect 1% of the population worldwide, with prevalence rates reported at 1% in the United Kingdom [] and 1.85% in the United States []. Despite this high prevalence, diagnostic delays are common, with difficulties receiving an initial service referral [] and families in the United Kingdom [] and the United States [] reporting waits of 2 to 3 years from symptom onset to diagnosis. Beyond severely impacting the quality of life of those awaiting assessment and their families [], diagnostic delays may increase the likelihood and severity of comorbidities []. Further, current diagnostic processes for autism rely solely on subjective clinician interpretations derived from standardized assessment tools, leading to potential misdiagnosis [] and accentuation of phenomena such as masking [].

Digital health products, such as mobile apps, have the potential to aid the diagnostic process due to their scalability and ease of access. Apps also offer the possibility of providing additional ecological information collected in users’ home environments to clinicians during the assessment phase. Specifically, digital biomarkers, that is, digital tools that collect information about the behavioral characteristics and physiological processes of individuals affected by a condition, have shown promise in identifying the presence of a disorder in several diagnostic domains (eg, cognitive impairment and dementia [], depression [], and learning disabilities []). Recommendations for developers and the scientific community at large outline the importance of transparent communication of the algorithms underlying digital biomarkers as well as plans for iterative evaluations of such products []. Further, multimodal approaches that prioritize cognitive and behavioral assessments have been recognized as promising in aiding precision diagnostics and personalized therapeutics []. While research on digital biomarkers in autism is still in its infancy and specific recommendations for the development and validation of these products are lacking, there are digital health tools available to researchers, clinicians, and end users.

Nevertheless, few resources synthesizing the characteristics and evaluation outcomes of these products are widely available. The aim of this scoping review is to summarize the evidence on existing digital biomarker tools so that autism researchers, clinicians, and end users have up-to-date information with which to make informed decisions regarding their usefulness and adoption.


Methods

Search Strategy

A structured search, following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guideline [], was conducted on August 21, 2023 (with further manual searches conducted ad hoc), in MEDLINE (through PubMed) and Elsevier’s Scopus. The search terms included “autism digital biomarker”; “autism app”; and related synonyms, truncations, and Medical Subject Headings. Additional searches were conducted through the US Food and Drug Administration (FDA) website [] and Google to find regulatory submissions and additional materials, respectively. If data reported as part of a regulatory submission had also been published in a peer-reviewed journal, the peer-reviewed version was used. A review protocol was not published; however, the full search strategy can be found in Table S1 in .

Inclusion and Exclusion Criteria

Only full-text primary research studies published in English in peer-reviewed journals, as well as regulatory submissions published on the regulatory body’s website, were included (Table S2 in ). Further, studies were included only if they reported accuracy and validity metrics for available digital biomarkers. Apps using digital versions of existing standardized autism assessments or telehealth adaptations of in-person assessments were excluded.

Data Extraction and Analysis

Data were extracted and analyzed by 3 reviewers. Titles and abstracts were screened once by 1 reviewer, who provided a reason for each exclusion; these decisions were then inspected by a second reviewer. Full texts of eligible papers were reviewed independently by at least 2 reviewers, and the inclusion and exclusion of studies were discussed as a team. The full data extraction form can be found in ; Table S3 in provides further details and metrics for the included studies.


Results

Overview

We found 286 studies and regulatory submissions, of which 49 were eligible for full-text screening. A total of 4 studies met our criteria and were included in the review ().

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram of the literature search and selection process.

App Characteristics and Regulatory Aspects

A total of 4 studies involving 4 unique apps were included in the review (), of which 1 (Canvas Dx []) was an FDA-cleared commercial product and 3 (Guess What? [], START [], and SenseToKnow []) were research apps.

The apps from the included studies target children between 17 and 144 months (1.4-12 years) of age who do not exhibit significant sensory or motor impairments (due to the nature of the tasks). Canvas Dx uses questionnaires from caregivers and health practitioners and videos to provide a diagnostic indication. Guess What? acquires structured videos of the interaction between child and parent during a charades-style game and applies face tracking and emotion recognition techniques. START measures social, sensory, and motor skills through games and activities for children and a questionnaire for parents. SenseToKnow displays specifically designed movies aimed at eliciting autism-related attention and motor behaviors while recording the child’s responses through the front camera of the device.

Figure 2. Summary of app characteristics, study details, and accuracy metrics for all included studies. ASD: autism spectrum disorder; FDA: US Food and Drug Administration; NPV: negative predictive value; PPV: positive predictive value.

As for their technical approaches, 2 of the apps were administered through smartphones and 2 through tablets. Most apps (3/4, 75%) classified individuals as either autistic or nonautistic, although Canvas Dx included an “undetermined” category. Further, all the studies used tree-based classification methods, particularly gradient-boosted decision trees, as sketched below. In terms of real-world usability, most studies (3/4, 75%) reported results from usability testing, interviews with clinicians and families, or app quality scores, showing good acceptability and feasibility and high (93.9%) quality scores. All tools emphasize that their intended use is screening or diagnostic aid rather than stand-alone diagnosis. Canvas Dx also warns that results may be unreliable in individuals with specific medical conditions, such as epilepsy or genetic disorders (Table S4 in ).
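None of the included studies publish their full training pipelines, so the following minimal sketch is illustrative only: using entirely synthetic features and labels, it shows how a gradient-boosted decision tree classifier of the kind these studies report might be trained with scikit-learn, and how sensitivity and specificity are read off its confusion matrix. The feature set and sample size below are hypothetical, not those of any reviewed app.

```python
# Illustrative sketch only: a gradient-boosted decision tree classifier of the
# kind reported by the included studies. All features, labels, and sizes below
# are synthetic; the actual apps' data and pipelines are not public.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
n = 400  # hypothetical sample size

# Hypothetical tabular features (eg, gaze, motor, and questionnaire scores).
X = rng.normal(size=(n, 3))
# Hypothetical binary labels: 1 = autism, 0 = nonautism (~25% prevalence).
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Report sensitivity and specificity from the confusion matrix, as the
# reviewed studies do, rather than accuracy alone.
tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test)).ravel()
print(f"sensitivity = {tp / (tp + fn):.3f}, specificity = {tn / (tn + fp):.3f}")
```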

A total of 3 out of the 4 apps from the included studies are available for download in the United States (links are included in Table S3 in ), with 1 (Guess What?) also being available in the United Kingdom. As for START, only the app code has been published. All apps are free to download. Regarding device compatibility, Canvas Dx and Guess What? can be downloaded on either Android or iOS; SenseToKnow is only available for iOS; and the code for START is for Android implementation only.

Study Characteristics

The studies validating these 4 apps involved a total of 1080 individuals. Participants were children aged between 17 and 144 months (1.4-12 years; mean 2.9, SD 1.0 years), with a mean autism prevalence of 22.9% (range 10.3%-57.1%). The pooled sex split was 38.5% (416/1080; range 30.5%-43.4%) female, 61.1% (660/1080; range 56.6%-69.5%) male, and 0.4% (4/1080; 1 study only: 8.2%) unknown. Race was reported in 2 studies: 2.8% (25/900; range 1.5%-4.2%) Asian, 12.2% (110/900; range 11.4%-13.2%) Black, 64% (576/900; range 53.9%-73.2%) White, 19.8% (178/900; range 13.9%-28.7%) multiracial or other, and 1.2% (11/900; range 0.2%-2.4%) unknown; ethnicity was reported independently in only 1 study: 10.5% (50/475) Hispanic or Latino and 89.5% (425/475) non-Hispanic or non-Latino. A total of 2 studies also reported parental education level: 2.6% (23/900; range 2.1%-3.1%) some high school, 8% (72/900; range 5.9%-10.4%) high school diploma, 22.4% (202/900; range 11.2%-35.1%) some college or associate degree, 66.2% (596/900; range 50.4%-80.4%) bachelor’s degree or above, and 0.8% (7/900; range 0.4%-1.2%) unknown. Sample sizes ranged from 49 to 475.

Accuracy and Validity Metrics

When metrics were not reported by the authors but sensitivity, specificity, and prevalence were indicated, they were calculated using the formulas given in footnote c of Table 1 (a worked sketch of these calculations follows the table footnotes). Overall, accuracy fluctuated between 28% and 80.6%. Sensitivity and specificity also varied, ranging from 51.6% to 81.6% and from 18.5% to 80.5%, respectively. Similarly, positive predictive value ranged from 20.3% to 76.6%, and negative predictive value fluctuated between 48.7% and 97.4%.

Table 1. Accuracy and validity metrics of the included digital biomarker apps for autism classification.

| App name | Sample size, n | Autism prevalence (%) | Accuracy (%) | Sensitivity (%) | Specificity (%) | PPV^a (%) | NPV^b (%) | Comparator |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Canvas Dx | 425 | 28.7 | 28^c | 51.6^d | 18.5^d | 20.3^c | 48.7^c | ECD^e |
| Guess What?^f | 49 | 57.1 | 73 | 76 | 69^c | 76.6^c | 68.3^c | Parent-reported diagnosis |
| START | 131 | 36.6 | 61.6 | Unable to calculate | Unable to calculate | Unable to calculate | Unable to calculate | ECD |
| SenseToKnow | 475 | 10.3 | 80.6^c | 81.6 | 80.5 | 32.5^c | 97.4^c | ECD |

^a PPV: positive predictive value.

^b NPV: negative predictive value.

^c Calculated using the following formulas: accuracy = sensitivity × prevalence + specificity × (1 − prevalence); PPV = (sensitivity × prevalence) / [sensitivity × prevalence + (1 − specificity) × (1 − prevalence)]; and NPV = [specificity × (1 − prevalence)] / [specificity × (1 − prevalence) + (1 − sensitivity) × prevalence].

^d For 3-class (autism, uncertain, and nonautism) classification. If participants assigned an uncertain or indeterminate class by the classifier were removed, sensitivity and specificity for the remaining 31.8% (135/425) of participants who received a determinate output (autism or nonautism) increased to 98.4% and 78.9%, respectively.

^e ECD: expert clinician diagnosis.

^f Metrics from the feasibility study. A validation study of the Guess What? app is ongoing (registered trial NCT04739982 []).
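As a worked check on the footnote c formulas, the following minimal sketch (plain Python, no dependencies) derives accuracy, PPV, and NPV from sensitivity, specificity, and prevalence, reproducing the derived values in the SenseToKnow row of Table 1.

```python
# Minimal sketch of the footnote c formulas: derive accuracy, PPV, and NPV
# from sensitivity, specificity, and prevalence (all given as proportions).
def derived_metrics(sensitivity: float, specificity: float, prevalence: float):
    accuracy = sensitivity * prevalence + specificity * (1 - prevalence)
    ppv = (sensitivity * prevalence) / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    )
    npv = (specificity * (1 - prevalence)) / (
        specificity * (1 - prevalence) + (1 - sensitivity) * prevalence
    )
    return accuracy, ppv, npv

# Reproduce the SenseToKnow row of Table 1:
# sensitivity 81.6%, specificity 80.5%, prevalence 10.3%.
acc, ppv, npv = derived_metrics(0.816, 0.805, 0.103)
print(f"accuracy = {acc:.1%}, PPV = {ppv:.1%}, NPV = {npv:.1%}")
# -> accuracy = 80.6%, PPV = 32.5%, NPV = 97.4%
```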


Discussion

Overview

This review investigated the digital biomarker tools available for autism diagnosis. We found 4 products targeted at children, exploring a variety of domains ranging from attention and looking behaviors to analysis of social, sensory, and motor skills. Of the examined studies, 1 evaluated a commercial app with medical device classification, and 3 involved unregulated research apps.

All studies use digital biomarkers of known domains that have been shown to be indicative of autism []. Most products use video analysis (often accompanied by questionnaires) to extract features of interest, whereas 1 of the included tools collects behavioral measures from interactive tasks. Both parent questionnaires and batteries assessing children during interactive tasks are part of well-known diagnostic assessments for autism, for example, the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) [,], the Autism Diagnostic Interview-Revised [], and TELE-ASD-PEDS []. While assessment of different domains ensures higher coverage of potentially relevant behaviors, the variety in the evaluated domains limits comparability between tools.

Nevertheless, each of these digital biomarkers has strengths and weaknesses. Canvas Dx obtained FDA clearance for commercialization and has therefore undergone robust validation processes. Further, its classification algorithm is very sensitive to extreme cases (very high or very low risk of autism); however, the inclusion of an indeterminate class makes it more difficult to identify less severe cases. Another limitation of Canvas Dx is that it is not clinician independent. The strengths of Guess What? include its blended diagnostic and therapeutic approach, which offers an end-to-end solution for users; its wider age range of use; and its availability in multiple geographies. Its dual diagnostic and therapeutic nature is also its biggest limitation, as the tool was not originally designed for diagnosis, and evaluation data are limited. The START app combines multiple assessment domains (social functioning and motor and sensory behaviors) and has been validated with diverse communities in mind. Nevertheless, its validation study does not report information about misdiagnoses, which limits the interpretability of its results. Finally, SenseToKnow offers combined biomarkers for social behavior as well as cognitive, language, and motor abilities within an assessment shorter than 10 minutes. The major limitation of its validation data is the low prevalence of autism-positive cases, which highlights the need for further evaluations of the tool.

Availability of the apps for download was restricted to specific geographies, with unclear documentation about accessibility. Further, 1 of the tools (START) has only been published as a code repository, limiting accessibility for the wider population. Given that one of the main advantages of mobile apps is their scalability and ease of access [], geographic restrictions may negatively impact the adoption of such products in both research and clinical settings.

In terms of study details, only 2 studies reported full demographic information and socioeconomic variables, despite research showing that adoption of health apps correlates strongly with higher education and income []. The fact that most parents reported having a bachelor’s degree or above further limits the generalizability of the results to individuals from different socioeconomic backgrounds and raises questions about overall real-world usability. Sex, race, and ethnicity data were all unbalanced, with most study participants being male and White. Evidence suggests that phenomena such as masking are more common in female individuals [,], highlighting the need for digital biomarkers to be validated in sex-balanced populations. Similarly, findings outline longer diagnostic delays among minorities and underserved communities, with an associated lack of early interventions during pivotal developmental years [,]. Thus, if app developers fail to assess the usability, acceptability, and validity of these tools in underserved communities and diverse populations, the potential of digital health products to address barriers to entry to diagnostic and therapeutic pathways may be significantly reduced. As such, research evaluating digital biomarkers for autism should aim to assess their performance in these demographics to successfully increase health equity and help tackle diagnostic complexities.

Contrary to the Standards for Reporting of Diagnostic Accuracy Studies [], half (2/4, 50%) of the resources provided neither the full array of accuracy and validity metrics nor the confusion matrix (Table S3 in ), with 1 of the studies (START) reporting only accuracy (no sensitivity, specificity, or confusion matrix), which substantially limits the interpretability of its findings. The high variability observed in the presented app metrics may be explained by discrepancies in the evaluation methods. First, differences in the prevalence of autism substantially affect the calculation of classic accuracy metrics, preventing a reliable picture of the classifiers’ true performance and limiting generalizability [] (see the sketch below). Similarly, most studies (3/4, 75%) used 2 classes (autism vs nonautism) for their algorithm validation, whereas 1 added an “indeterminate” class. Additionally, the studies evaluated their apps against different comparators, with most (3/4, 75%) measuring the performance of their algorithms against expert clinician diagnosis based on the Diagnostic and Statistical Manual of Mental Disorders-5 (DSM-5) criteria and 1 using parent-reported diagnosis. Notably, clinicians may reach a diagnostic outcome using different diagnostic criteria (eg, DSM-5 or International Classification of Diseases-11) or supplement their clinical judgment with assessment tools (eg, ADOS-2) [,]. Additionally, geographic differences exist in parent-reported symptomatology when using common assessment tools [], whose accuracy and validity metrics have also been shown to vary by country []. The abovementioned factors significantly hamper any attempt to draw direct comparisons between studies. As such, app developers and researchers should aim to validate their products within standardized evaluation frameworks that advocate for validation in diverse populations and with multiple comparators. Further, findings should be reported uniformly and should include the prevalence of the disorder, full metrics, confusion matrices, binary classification results, and the comparator of choice to facilitate the assessment of the potential for digital biomarkers to serve as effective screening tools for autism.
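To make the prevalence point concrete, a classifier’s PPV can swing widely even while its sensitivity and specificity stay fixed. The short sketch below holds sensitivity and specificity constant, using SenseToKnow’s reported values (81.6% and 80.5%) purely as an example, and recomputes PPV at hypothetical study prevalences.

```python
# Illustration of prevalence dependence: with sensitivity and specificity held
# fixed (SenseToKnow's reported 81.6% and 80.5%, used here only as an example),
# PPV varies widely across hypothetical study prevalences.
sens, spec = 0.816, 0.805

for prev in (0.103, 0.25, 0.50):
    ppv = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
    print(f"prevalence = {prev:.1%} -> PPV = {ppv:.1%}")
# prevalence = 10.3% -> PPV = 32.5%
# prevalence = 25.0% -> PPV = 58.2%
# prevalence = 50.0% -> PPV = 80.7%
```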

This review highlighted important gaps in the literature surrounding the development and testing of digital biomarkers for autism. First, it remains unclear whether and to what extent these tools are typically developed with input from both end users and clinicians []. Frameworks outlining guidelines for the design and development of digital biomarkers would help unify approaches across companies and research entities and ensure standards of quality and safety []. Our work also outlined promising trends toward the development of mobile-first digital technologies. It has been shown that while individuals in underserved communities often lack access to desktop or laptop computers, the availability of mobile phones is typically higher []. Therefore, developing mobile-first tools could help ensure health equity in communities where conditions such as autism suffer from greater diagnostic delays []. Finally, digital biomarker tools typically provide ecologically valid information that may not otherwise be available to clinicians. As such, offering these tools to families on the waiting list for autism assessments could provide clinicians with valuable information ahead of the assessment, which in turn could help prioritize more severe cases.

Conclusions

Digital health products are increasingly gaining popularity, yet systematic syntheses of the current state of the art are lacking. Our work highlighted how diversity in the development and evaluation of digital biomarkers aiding the detection of autism may impact their real-world usability and adoption in the communities where these tools may have the most positive impact. As such, standardized and transparent development and evaluation frameworks that recommend assessing the validity of digital biomarkers in diverse populations are needed to guide researchers, clinicians, and end users in making informed decisions about their adoption within research settings and clinical pathways.

Conflicts of Interest

None declared.

Edited by L Buis, G Eysenbach; submitted 05.09.23; peer-reviewed by W Zhao, Q Qi, M Mamat, T Valentine; comments to author 03.10.23; revised version received 30.10.23; accepted 02.11.23; published 17.11.23

©Sonia Ponzo, Merle May, Miren Tamayo-Elizalde, Kerri Bailey, Alanna J Shand, Ryan Bamford, Jan Multmeier, Ivan Griessel, Benedek Szulyovszky, William Blakey, Sophie Valentine, David Plans. Originally published in JMIR mHealth and uHealth (https://mhealth.jmir.org), 17.11.2023.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mHealth and uHealth, is properly cited. The complete bibliographic information, a link to the original publication on https://mhealth.jmir.org/, as well as this copyright and license information must be included.
