Accuracy of Observer-Rated Measurement Scales for Depression Assessment in Patients with Major Neurocognitive Disorders Residing in Long-Term Care Centers: A Systematic Review

Introduction: Depression is often under-detected in long-term care (LTC) patients with major neurocognitive disorders (MNCD) and is associated with important morbidity, mortality, and costs. Observer-rated outcome measures (ObsROMs) could help address this problem; however, evidence on their accuracy is scattered in the literature. This systematic review aimed to summarize this evidence. Methods: A literature search was conducted in 7 databases using keywords, MeSHs, and bibliographic searches. We included studies published before January 2022 and reporting on the accuracy of a depression ObsROM used in LTC patients with MNCD. Data extraction, analysis, synthesis, and study methodological quality assessments were done by two authors, and discrepancies were resolved by consensus. Results: Among 9,660 articles retrieved, 8 studies reporting on 11 depression measures were included. Scales were classified as patient-reported outcome measures used as ObsROMs or true ObsROMs. In the first category, the Cornell Scale for Depression in Dementia (CSDD) and the Montgomery-Asberg Depression Rating Scale (MADRS) performed best (area under the curve [AUC]: 0.73–0.87), although both had low positive predictive values and high negative predictive values. In the second category, the Nursing Homes Short Depression Inventory (NH-SDI) performed best, with an AUC of 0.93 and ≥85% sensitivity, specificity, and predictive values. Conclusion: The CSDD and MADRS may be useful to rule out depression in LTC patients with MNCD, whereas the NH-SDI may be useful to both rule in and rule out depression within this same population. Before recommending their use, adequately powered studies to further examine their accuracy in different contexts are necessary.

Introduction

Depression, a disorder affecting psychological well-being and functional status [1], occurs at a prevalence ranging between 22.1% and 37.8% in long-term care (LTC) centers [2]. Depression is associated with decreased quality of life [3, 4], functional [5, 6], and cognitive capacities [7, 8], as well as increased falls [9, 10], mortality [11, 12], and costs [5, 13]. Given these consequences, accurately identifying depression in LTC patients is a high priority.

However, up to 80% [2, 14, 15] of LTC patients are diagnosed with a major neurocognitive disorder (MNCD), a comorbidity known to challenge the accurate detection of depression. Indeed, depression and MNCDs share several manifestations (e.g., agitation and aggression [16]), and MNCDs are associated with significant cognitive decline (e.g., in communication and memory) [17], which may limit the validity and clinical utility of existing patient-reported outcome measures (PROMs) [18] (e.g., the Geriatric Depression Scale [GDS] [19] and the Patient Health Questionnaire [PHQ-9] [20]). For these reasons, depression is frequently missed or misidentified among LTC patients with MNCDs.

Two alternative approaches have been proposed to address this issue. First, some have converted existing PROMs into observer-rated outcome measures (ObsROMs) to identify depression based on observations of a given patient (e.g., PHQ-9 converted to PHQ-9 observational version [21, 22]). However, the validity of such an approach is questionable, and studies examining its accuracy have not been summarized recently. Second, other investigators have developed true ObsROMs specifically designed to detect depression in patients with MNCDs, but data on their accuracy remain scattered in the literature. Therefore, given the significant morbidity and mortality associated with depression in LTCs and to assist clinicians in identifying valid measures, the objective of this systematic review was to synthesize existing evidence on the screening accuracy of ObsROMs for the identification of depression among LTC patients with MNCD.

Methods

Design

A systematic review was conducted based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 Statement [23].

Search Strategy and Inclusion Criteria

The literature search was done in four steps. First, a concept plan was developed and optimized with the help of a scientific librarian (Table 1). When applicable, medical subject headings or subject terms (e.g., depression, psychiatric status rating scales, geriatric assessment, psychological tests, clinical assessment tools) were also applied. Second, using this concept plan, the first two authors (É.T. and D.C.) independently searched 7 databases to identify studies reporting on the accuracy of depression ObsROMs used in LTC patients with MNCD. These databases were as follows: Medline, CINAHL, PsycINFO, Ageline, Abstracts in Social Gerontology, Cochrane Library, and SCOPUS. The electronic searches occurred between January and June 2021 and were updated in January 2022. Third, É.T. and D.C. independently read the titles and abstracts of all retrieved articles and, when necessary, the full text to determine whether any given study should be included. Studies were included if they (a) reported on the accuracy of an ObsROM (true ObsROM or PROM used as ObsROM); (b) targeted LTC patients with MNCDs; (c) were published in English or French; and (d) were published prior to January 2022. Discrepancies were resolved through discussions among members of the research team. Last, we searched the reference lists of the included articles to identify any additional eligible studies.

Table 1. Systematic review concept plan

Data Extraction

Data extraction was conducted by É.T. and D.C. and included (1) authors; (2) publication year; (3) study design; (4) sample size and characteristics (e.g., age, sex, Mini-Mental State Examination score, and method of MNCD diagnosis); (5) country; (6) setting (type of LTC); (7) scale characteristics (name, number of items, measurement scale, score range, and administration time); (8) reference standard used; (9) accuracy estimates, including sensitivity, specificity, positive (PPV) and negative predictive values (NPV), and area under the receiver operating characteristics curve (AUC), along with their 95% confidence interval (95% CI); and (10) study limitations.
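The accuracy estimates listed in item (9) all derive from a 2 × 2 cross-tabulation of the index scale result against the reference standard diagnosis. The following is a minimal sketch of those standard definitions; the class name, function name, and counts are hypothetical and are not taken from any reviewed study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2 x 2 cross-tabulation of index scale result vs. reference standard."""
    tp: int  # scale positive, reference standard positive
    fp: int  # scale positive, reference standard negative
    fn: int  # scale negative, reference standard positive
    tn: int  # scale negative, reference standard negative


def accuracy_estimates(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, PPV, and NPV as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts for illustration only (not from any reviewed study).
example = ConfusionCounts(tp=12, fp=20, fn=3, tn=65)
estimates = accuracy_estimates(example)
for name, value in estimates.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity are computed within the reference-standard columns, whereas PPV and NPV are computed within the index-test rows, which is why only the latter two depend on how common depression is in the sample.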

Assessment of Methodological Quality

Risk of bias (ROB) and concern regarding applicability (CRA) were independently assessed by É.T. and D.C. using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) criteria [24]. This tool comprises 4 domains (patient selection, index test, reference standard, and flow and timing), each assessed in terms of ROB; the first 3 domains are also assessed in terms of CRA [24]. Signaling questions are included to help judge ROB. ROB may be judged “low” when all signaling questions are answered “yes” or “high” when at least one is answered “no” [24]. If any signaling question is answered “unclear,” meaning that insufficient data were reported to permit judgment, this may also flag the potential for bias [24]. CRA are similarly rated as “low,” “high,” or “unclear” based on reviewers’ judgments [24].
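The decision rule described above for a single QUADAS-2 domain can be sketched as follows. This is a simplified illustration that assumes the rule is applied mechanically; in practice, reviewers exercise judgment, and the function name and the handling of mixed "yes"/"unclear" answers are assumptions rather than part of the tool.

```python
def judge_risk_of_bias(answers: list[str]) -> str:
    """Map a domain's signaling-question answers to a ROB rating.

    Simplified rule (an assumption for illustration):
      - any "no"            -> "high"
      - all "yes"           -> "low"
      - otherwise (some "unclear", no "no") -> "unclear"
    """
    normalized = [a.strip().lower() for a in answers]
    allowed = {"yes", "no", "unclear"}
    if not normalized or any(a not in allowed for a in normalized):
        raise ValueError("answers must be a non-empty list of yes/no/unclear")
    if any(a == "no" for a in normalized):
        return "high"
    if all(a == "yes" for a in normalized):
        return "low"
    return "unclear"


# Hypothetical answers for one domain of one study.
print(judge_risk_of_bias(["yes", "yes", "yes"]))      # low
print(judge_risk_of_bias(["yes", "no", "unclear"]))   # high
print(judge_risk_of_bias(["yes", "unclear", "yes"]))  # unclear
```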

Data Analysis

We present descriptive statistics of the included studies. Then, we summarize the characteristics of the ObsROMs identified, along with their screening accuracy estimates. We considered p < 0.05 to be statistically significant. Finally, we present ROB and CRA for each included study.

Results

The literature search identified 9,660 records (Fig. 1). After removing duplicates and screening titles and abstracts, eight studies met our inclusion criteria and were included [25-32] (Fig. 1). Sample sizes across studies ranged from 35 to 107 patients (Table 2). Patient and study characteristics are described in Table 2. The retrieved studies reported on the screening accuracy of 11 depression scales. Most scales (n = 8) were validated in only one study, and 6 studies compared the accuracy of ≥2 scales. These scales were grouped into two categories based on their designed administration mode: (1) PROMs used as ObsROMs (n = 9 scales) or (2) true ObsROMs (n = 2 scales). These scales are further described in the following subsections, along with their screening accuracy and the reference standards used to judge their accuracy.

Table 2. Characteristics of reviewed studies

Fig. 1. Study flow diagram, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Reference Standard Diagnosis of Depression

Overall, four diagnostic criteria were used for the reference standard diagnosis of depression among the retrieved studies: (1) significant clinical depression as established by a consultant psychiatrist, psychologist, or other therapist [25], (2) the Diagnostic and Statistical Manual of Mental Disorders (DSM) criteria [1], (3) the International Classification of Diseases (ICD) codes [33], and (4) the Provisional Diagnostic Criteria for Depression in Alzheimer’s Disease (PDC-dAD) [34].

PROMs Used as ObsROMs (n = 9 Scales)

Collateral Source Geriatric Depression Scale

The Collateral Source Geriatric Depression Scale (CS-GDS) is an ObsROM adapted by Nitcher et al. [35] from the GDS, a PROM [19]. Two versions exist (15- and 30-item), and items are measured on a dichotomous scale (absent or present), with a total score ranging from 0 to 15 or 30, depending on the version being used. Li et al. [32] examined the screening accuracy of both CS-GDS versions in a subsample of 35 nursing home (NH) patients with MNCD in Australia (Tables 2, 3). NH staff or family members completed the CS-GDS. Using a medical diagnosis of depression as the reference standard (PDC-dAD), both CS-GDS versions achieved 70% sensitivity (95% CI: 34.8–93.3), whereas specificity was 64% (95% CI: 42.5–82) for the CS-GDS-15 and 56% (95% CI: 34.9–75.6) for the CS-GDS-30 [32] (Table 3). In a ROC analysis, neither CS-GDS version performed significantly above chance in detecting depression; the AUC of the CS-GDS-15 was 0.58 (95% CI: 0.40–0.74), whereas that of the CS-GDS-30 was 0.57 (95% CI: 0.39–0.73) [32].

Table 3. Screening accuracy of identified measurement scales

Cornell Scale for Depression in Dementia

The Cornell Scale for Depression in Dementia (CSDD), a 19-item mixed PROM and ObsROM, was developed by Alexopoulos et al. [36]. Two versions met our inclusion criteria: the CSDD and the CSDD Modified for Use by Long-Term Care Staff (CSDD-M-LTCS), a 19-item ObsROM adapted by Watson et al. [28]. Items on both versions are measured on a 3-point Likert scale (0 = absent; 2 = severe), with the CSDD indicating their severity [36] and the CSDD-M-LTCS indicating frequency [28]. The total score on both versions ranges from 0 to 38. Both scales take 20–30 min to complete [36].

Overall, four studies examined the screening accuracy of the CSDD in samples ranging from 43 to 101 LTC patients [26, 27, 29, 30], and 1 study examined the screening accuracy of the CSDD-M-LTCS in a sample of 107 patients from residential care and assisted living (RC/AL) [28] (Tables 2, 3). Both scales were completed based solely on caregiver observations. Using a medical diagnosis of depression as the reference standard, the CSDD achieved sensitivities ranging from 67% to 94% and specificities ranging from 49% to 57% [26, 27, 30], whereas the CSDD-M-LTCS achieved 47% sensitivity and 65% specificity [28] (Table 3). Both instruments also had high NPV (86–94%) but low PPV (18–31%) (Table 3). In a ROC analysis, the CSDD performed significantly above chance in detecting depression (AUCs ranging from 0.76 to 0.83) [27, 29, 30], whereas the CSDD-M-LTCS did not (AUC: 0.66; 95% CI: 0.49–0.78) [28].

Depression in Dementia Mood Scale

The Depression in Dementia Mood Scale (DDMS), a 17-item mixed PROM and ObsROM, was adapted by Sunderland et al. [37] from the Hamilton Depression Rating Scale, a PROM [38]. Items are measured on a 7-point Likert scale based on specific indications of severity for each (0 = absent; 6 = severe), and the total score ranges from 0 to 102 [37]. Elanchenny et al. [26] examined the screening accuracy of the DDMS based solely on collateral information in a subsample of 43 patients with MNCD from an acute care unit or continuing care geriatric psychiatric wards in England (Tables 2, 3). Trained nurses completed the DDMS. Using a medical diagnosis of depression as the reference standard (significant clinical depression established by a consultant psychiatrist), the DDMS achieved 67% sensitivity (95% CI: 40–94) and 23% specificity (95% CI: 17–43) [26] (Table 3).

Depression Rating Scale

The Depression Rating Scale (DRS), a 7-item mixed PROM and ObsROM, was adapted by Burrows et al. [39] from the mood items in the Minimum Data Set (version 2), a mixed PROM and ObsROM. Items are measured on a 3-point Likert scale indicating their frequency (0 = absent; 1 = present 5 days per week; 2 = present 6–7 days per week), and the total score ranges from 0 to 14 [39]. Watson et al. [28] examined the screening accuracy of the DRS based solely on collateral information in a sample of 107 patients from RC/AL in the USA (Tables 2, 3). A staff member completed the DRS. Using a medical diagnosis of depression as the reference standard (DSM-IV), the DRS achieved 47% sensitivity (95% CI: 21–73) and 85% specificity (95% CI: 76–91) [28] (Table 3). In a ROC analysis, the DRS performed significantly above chance in detecting depression (AUC: 0.70; 95% CI: 0.52–0.82) [28].

Depressive Signs Scale

The Depressive Signs Scale (DSS), a 9-item mixed PROM and ObsROM, was developed by Katona et al. [40]. Apart from 1 item measured on a dichotomous scale (absent or present), items are measured on a 3-point Likert scale (0 = absent; 2 = severe), and the total score ranges from 0 to 17 [40]. The DSS takes 3 min to complete [25]. Overall, 2 studies examined the screening accuracy of the DSS based solely on collateral information in subsamples of 43–68 patients with MNCD [25, 26] (Tables 2, 3). Nurses completed the DSS. Using a medical diagnosis of depression as the reference standard (significant clinical depression established by a consultant psychiatrist), the DSS achieved sensitivities ranging from 62% to 67% and specificities ranging from 59% to 72% [25, 26] (Table 3).

Montgomery-Asberg Depression Rating Scale

The Montgomery-Asberg Depression Rating Scale (MADRS), a 10-item mixed PROM and ObsROM, was developed by Montgomery et al. [41]. Items are measured on a 7-point Likert scale with specific indications of severity for each (0 = absent; 6 = severe), and the total score ranges from 0 to 60 [41]. The MADRS takes 15–20 min to complete [42]. Leontjevas et al. [27, 30] examined the screening accuracy of the MADRS based solely on collateral information in samples ranging from 63 to 101 LTC patients in the Netherlands (Tables 2, 3). Graduate psychologists completed the MADRS based on professional caregiver observations. Using a medical diagnosis of depression as the reference standard, the MADRS achieved sensitivities ranging from 75% to 78% and specificities ranging from 66% to 84% [27, 30] (Table 3). The MADRS also had high NPV (93–94%) but low PPV (33–53%) (Table 3). In a ROC analysis, the MADRS performed significantly above chance in detecting depression (AUCs ranging from 0.73 to 0.87) [27, 30]. The accuracy of the MADRS was also compared to the CSDD’s, but results were inconsistent [27, 30].

Yale Single Question: Do You Often Feel Sad or Depressed?

The Yale Single Question (YSQ), a PROM [43], was adapted to an ObsROM by Watson et al. [28]. It consists of a single question (do you believe the resident is often sad or depressed?) measured on a dichotomous scale (yes or no) [28]. Watson et al. [28] examined the screening accuracy of the YSQ based solely on collateral information in a sample of 107 patients from RC/AL in the USA (Tables 2, 3). A staff member completed the YSQ. Using a medical diagnosis of depression as the reference standard (DSM-IV), the YSQ achieved 47% sensitivity (95% CI: 21–73) and 74% specificity (95% CI: 64–83) [28] (Table 3).

True ObsROMs (n = 2 Scales)

Nijmegen Observer-Rated Depression Scale

The Nijmegen Observer-Rated Depression Scale (NORD), a 5-item ObsROM, was developed by Leontjevas et al. [31]. Items are measured on a dichotomous scale (absent or present), and the total score ranges from 0 to 5 [31]. The NORD takes 2–3 min to complete [31]. Leontjevas et al. [31] examined the screening accuracy of the NORD in a subsample of 103 patients with MNCD from somatic and dementia special care units in the Netherlands (Tables 2, 3). Trained nursing staff completed the NORD. Using a medical diagnosis of depression as the reference standard (PDC-dAD, DSM-IV, and ICD-10), the NORD achieved 79% sensitivity (95% CI: 54–94) and 77% specificity (95% CI: 67–86) [31] (Table 3). In a ROC analysis, the NORD performed significantly above chance in detecting depression among patients with MNCD (AUC: 0.84; 95% CI: 0.75–0.90) [31].

Nursing Homes Short Depression Inventory

The Nursing Homes Short Depression Inventory (NH-SDI), a 16-item ObsROM, was developed by Prado-Jean et al. [29]. Items are measured on a dichotomous scale (absent or present), and the total score ranges from 0 to 16 [29]. Prado-Jean et al. [29] examined the screening accuracy of the NH-SDI in a sample of 99 cognitively impaired NH patients in France (Tables 2, 3). Untrained nurses completed the NH-SDI. Using a medical diagnosis of depression as the reference standard (DSM-IV), the NH-SDI achieved 86% sensitivity (95% CI: 74.2–94.4), 85% specificity (95% CI: 71.7–93.8), and PPV/NPV of ≥85% [29] (Table 3). In a ROC analysis, the NH-SDI performed significantly above chance in detecting depression (AUC: 0.93; 95% CI: 0.88–0.98) and was better at discriminating depressed from nondepressed patients than the CSDD [29].
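One way to read the "significantly above chance" statements in the preceding subsections is to check whether the reported 95% CI of the AUC lies entirely above 0.5, the AUC of a non-informative test. The sketch below applies that reading to the AUCs for which a CI is reported in the text; it is an interpretive aid under that assumption, not a reconstruction of the statistical tests used in the original studies, and the function and variable names are hypothetical.

```python
CHANCE_AUC = 0.5  # AUC of a test that discriminates no better than chance

# (AUC, lower 95% CI bound, upper 95% CI bound) as reported in the text above.
REPORTED_AUC = {
    "CS-GDS-15": (0.58, 0.40, 0.74),
    "CS-GDS-30": (0.57, 0.39, 0.73),
    "CSDD-M-LTCS": (0.66, 0.49, 0.78),
    "DRS": (0.70, 0.52, 0.82),
    "NORD": (0.84, 0.75, 0.90),
    "NH-SDI": (0.93, 0.88, 0.98),
}


def performs_above_chance(auc: float, lower: float, upper: float) -> bool:
    """True when the point estimate and the whole 95% CI lie above 0.5."""
    return auc > CHANCE_AUC and lower > CHANCE_AUC


above_chance = {
    scale: performs_above_chance(*values)
    for scale, values in REPORTED_AUC.items()
}

for scale, flag in above_chance.items():
    auc, lower, upper = REPORTED_AUC[scale]
    verdict = "above chance" if flag else "CI includes 0.5"
    print(f"{scale}: AUC {auc:.2f} ({lower:.2f}-{upper:.2f}) -> {verdict}")
```

Under this reading, the scales flagged as above chance match those described as such in the text (DRS, NORD, and NH-SDI), while the CS-GDS versions and the CSDD-M-LTCS have intervals that include 0.5.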

Methodological Quality Assessment (QUADAS-2)

Overall, the methodological quality of the retrieved studies was good (Table 4). One study had high ROB for its reference standard because the reference standard assessment was not blinded to the index test results for 20 patients [30], and two studies had high ROB for flow and timing because they did not include all patients in their analysis [26, 28]. In addition, one study had high CRA for patient selection because it included patients from an acute care unit [26], and one study had high CRA for the index test because the scale raters included nonprofessional caregivers (e.g., family members) [32].

Table 4. Quality appraisal of reviewed studies

Discussion

The purpose of this review was to synthesize existing evidence on the screening accuracy of ObsROMs for identifying depression in LTC patients with comorbid MNCD. We identified a total of 11 scales, which were classified based on their designed administration mode as either (1) PROMs used as ObsROMs or (2) true ObsROMs.

The first category (PROMs used as ObsROMs) contains 9 scales, of which two (CSDD [36] and MADRS [41]) were found to significantly discriminate depressed from nondepressed patients [27, 29, 30]. Both scales also have low PPV and high NPV [26, 27, 30], which suggests that they might be better for ruling out depression than for ruling it in [30, 44]. In addition, two studies compared the screening accuracy of both scales using the same samples, but results were inconclusive as to which performed better [27, 30]. Based on this evidence and considering estimated completion times (20–30 min for the CSDD [36]; 15–20 min for the MADRS [42]), one could favor the use of the MADRS over the CSDD in a busy or short-staffed setting. However, given the limited data on the comparative accuracy of these scales, we recommend additional research.

As for the second category (true ObsROMs), it contains 2 instruments (NORD [31] and NH-SDI [29]) that significantly discriminated depressed from nondepressed patients [29, 31]. While the NORD has low PPV and high NPV [31], the NH-SDI stands out as having both high PPV and high NPV [29], which suggests it could be useful for both ruling in and ruling out depression [44]. The NH-SDI also discriminated depressed from nondepressed patients better than the CSDD [29]. Although promising, the NH-SDI has been validated in only one study, in which the prevalence of depression was high [29]. We therefore recommend further research to better document its performance under a variety of circumstances (e.g., MNCD types, comorbidities, and range of depression prevalence values). Overall, existing evidence does not allow us to conclude which approach is best for identifying depression in LTC patients with MNCD (i.e., PROMs used as ObsROMs vs. true ObsROMs). Best performance appears to be instrument specific.
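The caution about the high depression prevalence in the NH-SDI validation study matters because, for fixed sensitivity and specificity, predictive values change with prevalence: PPV = (Se × P) / (Se × P + (1 − Sp) × (1 − P)) and NPV = (Sp × (1 − P)) / (Sp × (1 − P) + (1 − Se) × P), where P is prevalence. The sketch below applies these standard formulas to the sensitivity (86%) and specificity (85%) reported for the NH-SDI. The prevalence values of 22.1% and 37.8% are the bounds of the LTC range cited in the Introduction; the 50% value is an arbitrary assumption for comparison and is not the prevalence observed in the validation study. The output is a projection under the assumption that sensitivity and specificity stay constant, not a result reported by any reviewed study.

```python
def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must be strictly between 0 and 1")
    tp = sensitivity * prevalence              # true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


NH_SDI_SENSITIVITY = 0.86  # reported in the text
NH_SDI_SPECIFICITY = 0.85  # reported in the text

# 0.221 and 0.378: LTC prevalence range cited in the Introduction.
# 0.50: arbitrary high-prevalence scenario (assumption for illustration).
for prevalence in (0.221, 0.378, 0.50):
    ppv, npv = predictive_values(
        NH_SDI_SENSITIVITY, NH_SDI_SPECIFICITY, prevalence
    )
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

In this projection, PPV falls and NPV rises as prevalence decreases, which is one reason accuracy estimates obtained in a high-prevalence sample may not transfer directly to settings where depression is less common.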

Notably, existing evidence on the screening accuracy of most identified measurement scales originates from single studies with small sample sizes (n = 35–107) [26, 28, 29, 31, 32]. This resulted in imprecise screening accuracy estimates with wide CIs [26, 28, 29, 31, 32, 45]. To generate precise estimates, adequately powered studies are required [45]. Moreover, given the small sample sizes, no study explored how MNCD characteristics (e.g., type, etiology, and severity) and patient factors (e.g., sex and ethnicity) impact scale accuracy. These represent interesting avenues for further research. Furthermore, multiple other factors might influence the accuracy of measurement scales. For instance, we found important heterogeneity across the reviewed studies regarding the reference standard employed (n = 4) [1, 25, 33, 34], the extent to which the collateral sources were familiar with the patients assessed [28], and the collateral sources’ qualifications (e.g., nurses [25, 26, 29], nursing assistants [28, 31], and psychology students [27, 30]). Further research is needed to determine how such variations influence the accuracy of depression ObsROMs.
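The link between small samples and wide CIs can be illustrated with a confidence interval for a proportion such as sensitivity. The sketch below uses the Wilson score interval, chosen here for simplicity; it is not necessarily the method used in the reviewed studies, and the counts are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (default ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: the same 70% sensitivity observed among
# 10 versus 100 patients with depression.
for detected, depressed in ((7, 10), (70, 100)):
    lower, upper = wilson_interval(detected, depressed)
    print(
        f"{detected}/{depressed}: 95% CI {lower:.1%} to {upper:.1%} "
        f"(width {upper - lower:.1%})"
    )
```

With the same point estimate, the interval based on 10 patients is far wider than the one based on 100, which is why sensitivity estimates from subsamples containing few depressed patients are difficult to interpret.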

Strengths and Limitations

An important strength of this systematic review is that it builds on and expands the prior work of Mele et al. [18] in 2 clinically relevant ways: (1) by focusing on a more targeted population of LTC patients with MNCD, in which depression is often under-detected, and (2) by including a larger number of ObsROMs (n = 11). However, certain limitations of this review must be acknowledged. First, although our literature search was exhaustive, as with any review, it is possible that we missed eligible studies. Therefore, we recommend periodic updates of this review. Second, limitations also include those of the reviewed studies, including small sample sizes, imperfect reference standards, reliance on observable manifestations in the absence of guidelines on evidence-based indicators [46], and heterogeneity of the collateral sources, all of which may have influenced our results.

To conclude, depression is difficult to assess in LTC patients with MNCD, and best detection performance appears to be instrument specific rather than based on designed administration modes (i.e., true ObsROMs vs. PROMs used as ObsROMs). We identified two scales (CSDD and MADRS) that appear useful to rule out depression in LTC patients with MNCD and one scale (NH-SDI) that appears useful to both rule in and rule out depression within this same patient population. Before recommending their use in LTCs, adequately powered studies to further examine and compare their accuracy in a variety of contexts are necessary.

Statement of Ethics

An ethics statement is not applicable because this study is based exclusively on published literature.

Conflict of Interest Statement

The authors have no conflicts of interest to declare.

Funding Sources

Funding for this systematic review was provided by Pr. Christian Rochefort’s research laboratory, which receives funding from the Canadian Institutes of Health Research. É.T. holds a master’s degree scholarship from the Quebec Ministry of Education (Ministère de l’Éducation et de l’Enseignement Supérieur du Québec). D.C. holds master’s degree scholarships from the Canadian Institutes of Health Research (Frederick-Banting and Charles-Best Canada Graduate Scholarship), the Research Center on Aging (Centre de recherche sur le vieillissement de l’Université de Sherbrooke), and the Quebec Ministry of Education (Ministère de l’Éducation et de l’Enseignement Supérieur du Québec). These funding sources were not involved in study conception and design, data acquisition, analyses and interpretation, or the final decision to submit this manuscript for publication.

Author Contributions

All listed authors (É.T., D.C., M.P.V., J.R.D., and C.R.) (1) made substantial contributions to the conception or design of the work or the acquisition, analysis, or interpretation of data for the work; (2) drafted the work or revised it critically for important intellectual content; (3) gave final approval of the version to be published; and (4) agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Data Availability Statement

All data generated or analyzed during this review are included in this article. Further inquiries can be directed to the corresponding author.


DEMENTIA AND GERIATRIC COGNITIVE DISORDERS
