Psychometric validation of the Hospital Anxiety and Depression Scale (HADS) in community-dwelling older adults

According to the European Commission’s Green Paper on mental health [53], depression is one of the most prevalent mental health problems facing European citizens today. The incidence of depression has been reported to increase with age [15]; simultaneously, the number of adults over 70 years is expected to increase globally in the coming decades [54]. Hence, a valid and reliable scale for assessing anxiety and depression among older community-dwelling adults is much needed. The present study therefore aimed to evaluate the psychometric properties of the HADS among community-dwelling older Norwegians aged ≥ 70 years, testing five hypotheses. The sample included 1190 older adults with a mean age of 76.5 years. To the authors’ knowledge, no previous study has used CFA to examine the psychometric properties of the HADS among community-dwelling older adults in a Norwegian population.

The CFA approach eliminates the need for summated scales because SEM programs such as Stata compute latent construct scores for each respondent. Relationships in the tested model are thereby automatically corrected for error variance, a fundamental strength of CFA in construct validation; the resulting estimates are adjusted for measurement error [36, 47]. In this study, the original HADS (Model-1) showed only a partially good fit. In particular, the chi-square statistic was extremely high, indicating misspecification. However, the chi-square has limitations as a model fit index. As already stated, it is sensitive to sample size: even a trivial misfit yields a larger chi-square and a smaller p-value as the sample grows [52]. In practice, therefore, the chi-square test is “not always the final word in assessing fit” [55]. The present sample is large (N = 1190), which produced extraordinarily high chi-square values. When the file was split into two halves (N = 595 each), the chi-square improved substantially, and the RMSEA remained acceptable. Given the influence of the large sample size on the chi-square statistic, a wide variety of other indices were included to assess model adequacy. The SEM literature recommends that, as a minimum, RMSEA, CFI, and SRMR be reported together with the chi-square [48]. Using multiple fit indices provides a more holistic view of goodness of fit, accounting for sample size, model complexity, and other considerations relevant to the study.
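The sample-size dependence of the chi-square can be sketched as follows. This is a minimal illustration, assuming maximum likelihood estimation, where the test statistic is (N − 1) times the minimized fit function F_min (some programs use N instead). The F_min value is hypothetical, and df = 76 is simply the degrees of freedom of a 14-item, two-factor CFA with no cross-loadings, no error covariances, and no mean structure. The p-value uses the Wilson–Hilferty normal approximation rather than the exact chi-square distribution.

```python
import math


def ml_chi_square(f_min: float, n: int) -> float:
    """ML test statistic T = (N - 1) * F_min; some programs use N instead of N - 1."""
    return (n - 1) * f_min


def chi_square_p_value(chi2: float, df: int) -> float:
    """Upper-tail p-value via the Wilson-Hilferty normal approximation."""
    z = ((chi2 / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical values for illustration only; these are not results from this study.
F_MIN = 0.40  # assumed discrepancy, held fixed so that only N changes
DF = 76       # 14-item, two-factor CFA without cross-loadings or error covariances

for n in (1190, 595):
    chi2 = ml_chi_square(F_MIN, n)
    print(f"N = {n}: chi2 = {chi2:.1f}, df = {DF}, p = {chi_square_p_value(chi2, DF):.1e}")
```

For the same degree of misfit, halving N roughly halves the chi-square, which is consistent with the much smaller statistic obtained in the split-file analysis.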

Conversely, RMSEA estimates have been shown to be lower with large sample sizes [56, 57]. For an acceptable fit, RMSEA should be ≤ 0.080 [36, 47, 48] or ≤ 0.10 [43], while estimates ≤ 0.050 suggest a good fit. For Model-1, the RMSEA and SRMR were acceptable and close to good (0.059 and 0.052, respectively), whereas the CFI and TLI were too low. Allowing item 6D to cross-load and adding a correlated error term between items 2D and 12D improved the CFI and TLI as well as the overall model fit. This suggests that low reliability and content validity of some items contributed to the low CFI and TLI values.
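To make these indices concrete, the sketch below computes RMSEA, CFI, and TLI from the chi-square statistics of a target model and of the independence (baseline) model, using the standard textbook formulas. The chi-square values are hypothetical. The baseline df of 91 follows from 14 items (14 × 13 / 2 covariances), and the target df of 76 and 74 assume one extra free parameter each for the cross-loading and the error covariance. Software packages may differ slightly, for example by using N rather than N − 1 in the RMSEA.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2: float, df: int, chi2_base: float, df_base: int) -> float:
    """Comparative fit index relative to the independence (baseline) model."""
    model_nc = max(chi2 - df, 0.0)
    denom = max(chi2 - df, chi2_base - df_base, 0.0)
    return 1.0 if denom == 0 else 1.0 - model_nc / denom


def tli(chi2: float, df: int, chi2_base: float, df_base: int) -> float:
    """Tucker-Lewis index (non-normed fit index)."""
    base_ratio = chi2_base / df_base
    return (base_ratio - chi2 / df) / (base_ratio - 1.0)


# Hypothetical inputs for illustration only.
N = 1190
CHI2_BASE, DF_BASE = 5000.0, 91  # independence model for 14 items: 14 * 13 / 2 = 91 df
models = {
    "two-factor, simple structure": (400.0, 76),
    "plus cross-loading and error covariance": (300.0, 74),
}
for name, (chi2, df) in models.items():
    print(
        f"{name}: RMSEA = {rmsea(chi2, df, N):.3f}, "
        f"CFI = {cfi(chi2, df, CHI2_BASE, DF_BASE):.3f}, "
        f"TLI = {tli(chi2, df, CHI2_BASE, DF_BASE):.3f}"
    )
```

Because CFI and TLI compare the target model with the baseline model, freeing a few well-motivated parameters can raise them noticeably even when RMSEA changes only modestly.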

Theory guided the inclusion of the cross-loading and the correlated error term. It is reasonable to assume that feeling cheerful (item 6) while simultaneously feeling anxious is contradictory; feeling both cheerful and anxious at the same time is unrealistic. In contrast, people may say, “I still enjoy the things I used to enjoy” (item 2) despite occasionally feeling anxious. The same logic applies to items 4 (“I can laugh and see the funny side of things”) and 12 (“I look forward with enjoyment to things”). Feeling cheerful is an emotion experienced here and now, whereas being able to ‘enjoy the things I used to enjoy’ or to ‘laugh and see the funny side of things’ is not necessarily something a person feels in the moment. These items reflect more general, future-oriented aspects, such as possibilities or attitudes, and can therefore coexist with occasional anxiety. Therefore, we did not allow these items to cross-load on the anxiety construct.
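The difference between the original and the respecified model can be written down compactly. The sketch below assumes lavaan-style model syntax (the `=~` and `~~` operators used by R's lavaan and Python's semopy) and hypothetical item labels (`a1`, `a3`, …, `a13` for anxiety; `d2`, `d4`, …, `d14` for depression). It builds a specification with item 6D cross-loading on anxiety and a residual covariance between 2D and 12D, and counts degrees of freedom under marker-variable identification without a mean structure.

```python
# Hypothetical item labels: HADS odd items load on anxiety, even items on depression.
ANXIETY_ITEMS = [f"a{i}" for i in range(1, 15, 2)]     # a1, a3, ..., a13
DEPRESSION_ITEMS = [f"d{i}" for i in range(2, 15, 2)]  # d2, d4, ..., d14


def hads_cfa_syntax(anxiety_cross_loadings=(), error_covariances=()):
    """Build lavaan-style syntax for a two-factor HADS CFA."""
    lines = [
        "anxiety =~ " + " + ".join(ANXIETY_ITEMS + list(anxiety_cross_loadings)),
        "depression =~ " + " + ".join(DEPRESSION_ITEMS),
        "anxiety ~~ depression",
    ]
    lines += [f"{x} ~~ {y}" for x, y in error_covariances]
    return "\n".join(lines)


def cfa_degrees_of_freedom(n_cross_loadings=0, n_error_covariances=0, n_items=14):
    """df = unique (co)variances minus free parameters, one loading per factor fixed to 1."""
    moments = n_items * (n_items + 1) // 2
    free = (
        (n_items - 2)   # loadings, minus one marker item per factor
        + n_items       # residual variances
        + 3             # two factor variances and one factor covariance
        + n_cross_loadings
        + n_error_covariances
    )
    return moments - free


model_1 = hads_cfa_syntax()
model_2 = hads_cfa_syntax(anxiety_cross_loadings=("d6",), error_covariances=[("d2", "d12")])
print(model_2)
print("df:", cfa_degrees_of_freedom(), cfa_degrees_of_freedom(1, 1))
```

Each theory-driven modification costs one degree of freedom, so the respecified model remains heavily overidentified.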

Dimensionality (H1)

Regarding the dimensionality of the HADS, the two-factor model clearly showed the best fit to the present data, supporting the two-dimensional structure proposed in H1. The two factors were correlated as expected. However, the original two-factor solution did not show a good fit. Thus, H1 was only partly supported.
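One common way to compare a one-factor with a two-factor solution is a chi-square difference test between the nested models; this section does not state which comparison was used, so the sketch is illustrative only. It assumes ML estimation with approximately normal data (robust estimators require a scaled difference test) and hypothetical chi-square values. The one-factor model has one more degree of freedom (77 versus 76). Because this constraint places the factor correlation on the boundary of its range, the resulting p-value should be read as approximate.

```python
import math


def chi_square_difference_test(chi2_restricted, df_restricted, chi2_full, df_full):
    """Chi-square difference test for two nested ML models.

    Only the one-df case is handled, where the tail probability is
    P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    delta_chi2 = chi2_restricted - chi2_full
    delta_df = df_restricted - df_full
    if delta_df != 1:
        raise ValueError("this sketch only handles a one-df difference")
    p_value = math.erfc(math.sqrt(max(delta_chi2, 0.0) / 2))
    return delta_chi2, delta_df, p_value


# Hypothetical fit statistics, for illustration only.
one_factor = (900.0, 77)  # all 14 items on one factor: 105 - 28 = 77 df
two_factor = (475.6, 76)  # anxiety and depression as correlated factors: 76 df
print(chi_square_difference_test(*one_factor, *two_factor))
```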

Reliability (H2)

The second hypothesis (H2) concerned the reliability of the HADS. All item loadings were statistically significant. Most items showed good loadings (shown in Fig. 1) accompanied by good squared multiple correlations (R²), indicating good reliability. Nevertheless, three items belonging to the depression construct (8D, 10D, 14D) showed low factor loadings and thus poor reliability, explaining very little of the variance in the construct. These three items lowered the reliability coefficient for depression, whereas anxiety displayed good reliability. Hence, H2 was not fully supported.
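The link between item loadings, R², and a construct-level reliability coefficient can be sketched with the usual composite reliability (CR) and average variance extracted (AVE) formulas. The loadings below are invented for illustration: three weak depression items mimic the pattern described, not the reported estimates. The formulas assume standardized loadings, one factor per item, and uncorrelated errors, so they ignore refinements such as the 6D cross-loading and the 2D–12D error covariance. The 0.40 cut-off for a weak loading is an assumption.

```python
# Hypothetical standardized loadings, for illustration only (not this study's estimates).
depression_loadings = {
    "d2": 0.70, "d4": 0.72, "d6": 0.65, "d8": 0.30,
    "d10": 0.25, "d12": 0.68, "d14": 0.35,
}


def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).

    Assumes standardized loadings, one factor per item, and uncorrelated errors,
    so each error variance equals 1 - loading^2.
    """
    total = sum(loadings)
    error = sum(1 - lam ** 2 for lam in loadings)
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings):
    """AVE = mean squared standardized loading (each squared loading is the item's R^2)."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


WEAK_LOADING = 0.40  # assumed cut-off; conventions vary roughly between 0.30 and 0.50

values = list(depression_loadings.values())
weak_items = [item for item, lam in depression_loadings.items() if lam < WEAK_LOADING]
print(f"CR = {composite_reliability(values):.3f}, AVE = {average_variance_extracted(values):.3f}")
print("R^2 per item:", {item: round(lam ** 2, 3) for item, lam in depression_loadings.items()})
print("weak items:", weak_items)
```

Items with loadings around 0.3 have R² below 0.1, so they add little true-score variance but a large error variance to the denominator of CR.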

Construct validity (H3)

H3 tested construct validity, which concerns whether the set of measured items reflects the theoretical latent construct those items are designed to measure. Hence, it deals with the accuracy of measurement and involves psychometric evidence of convergent and discriminant validity [58]. A measure is said to possess convergent validity if independent measures of the same construct converge or are highly correlated [49]. Researchers often lack data from two different depression scales completed by the same sample, a frequent problem when assessing convergent validity. However, measures that are theoretically predicted to correlate significantly with depression can be used instead. The present study included measures of overall global QOL to test convergent validity, which was supported by a significant correlation in the expected direction.

To test discriminant validity, H4 stated that the HADS correlates significantly and negatively with QOL, while H5 expected anxiety and depression to behave as two distinct concepts. Discriminant validity concerns whether constructs that theoretically should not be related to each other are, in fact, unrelated. In psychometrics, discriminant validity, also termed divergent validity, indicates that the results obtained by a scale (here, the HADS) do not correlate too strongly with measurements of a similar but distinct trait; two tests reflecting different constructs should not be strongly related to each other. If they are, we cannot be sure that they are not measuring the same construct. Accordingly, discriminant validity indicates the extent of difference between two constructs. The complementary concept to divergent validity is convergent validity; both are forms of construct validity. Hence, a high correlation (higher than 0.40) [59] between the HADS and QOL would indicate that the measures substantially overlap and do not behave as clearly distinct constructs [49]. Moreover, a high correlation between anxiety and depression would indicate that the two factors measure much of the same trait: this would yield good internal consistency (Cronbach’s alpha and composite reliability) but blur the dimensionality. In this study, the anxiety and depression factors behaved as distinct concepts, supporting discriminant validity (H5). At the same time, the factor correlation between anxiety and depression was highly significant, supporting convergent validity [49]. Convergent and discriminant validity were further supported by significant correlations in the predicted direction between QOL and both anxiety and depression, supporting H4.
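The correlation-based criteria above can be checked with a Fisher z confidence interval and a two-sided significance test for an observed correlation, combined with a comparison against the 0.40 level cited in the text [59]. The correlation value is hypothetical, and the sketch treats it as an observed (manifest) Pearson correlation; a latent factor correlation from CFA would instead come with a model-based standard error.

```python
import math
from statistics import NormalDist


def correlation_ci(r: float, n: int, level: float = 0.95):
    """Confidence interval for a Pearson correlation via Fisher's z-transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0, using the Fisher z statistic."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # erfc keeps tiny p-values from rounding to 0


def exceeds_overlap_threshold(r: float, threshold: float = 0.40) -> bool:
    """True if |r| is above the level taken in the text to signal substantial overlap."""
    return abs(r) > threshold


# Hypothetical correlation between a HADS factor and QOL, for illustration only.
r, n = -0.35, 1190
low, high = correlation_ci(r, n)
print(f"r = {r}, 95% CI = ({low:.3f}, {high:.3f}), p = {correlation_p_value(r, n):.1e}")
print("exceeds 0.40 threshold:", exceeds_overlap_threshold(r))
```

With N = 1190, even moderate correlations are highly significant, so the confidence interval is more informative than the p-value when judging whether a correlation approaches the 0.40 level.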

Content validity – a vital aspect of construct validity (H3)

Content validity is a central aspect of construct validity. Reliability and content validity are interrelated measurement properties. In fact, content validity may be poor despite good reliability. Conversely, validity cannot be good if reliability is low [49]. Item 8D, “I feel as if I’m slowed down,” demonstrated low reliability and thereby poor validity. In the present sample, with a mean age of 76.5 years, most individuals have left active working life and have had ample time to adjust to a slower pace of life. Possibly, ‘feeling slowed down’ does not correspond well to older home-dwelling adults’ daily experiences of depression. This item did not perform as a valid or reliable indicator of depression in this population. Moreover, about 50% of the participants reported a physical or mental long-term illness, injury, or loss of function in daily life. For them, a slower pace of life might seem natural and not necessarily an indicator of depression [7].

Likewise, item 10D, “I have lost interest in my appearance,” did not resonate well with these older adults, indicating low reliability and content validity. Losing interest in one’s appearance did not act as a valid indicator of depression in this population; it may instead reflect inevitable age-related changes rather than a symptom of depression. Moreover, item 14D, “I can enjoy a good book or TV program,” also stood out as an unreliable indicator of depression. Plausibly, in old age, enjoying a good book or watching TV is not related to depression. For people in their seventies, eighties, and nineties, passive leisure activities are everyday activities that serve as restoration time after active leisure activities and are related to QOL [60]. Reading books might also be more demanding due to declining eyesight and fatigue. Consequently, item 14D explained very little of the variance in the depression construct and performed poorly as an indicator of it.

These findings are consistent with previous studies among nursing home residents without cognitive impairment [37] and hospitalized older adults [33], in which the same three items were problematic among older adults in Norwegian care settings. In old age, retired adults may, for the first time in their lives, be able to slow down. In addition, owing to an age-related decline in reserve capacity and to fear of falling, the most common fear in older adults [61], many older adults may be forced into a slower pace. Passive activities such as watching TV or reading may also be a consequence of chronic medical conditions and multimorbidity, which are associated with anxiety and depression [62]. Hence, the wording of items 8D, 10D, and 14D should be carefully reconsidered to improve reliability.

Furthermore, the previous validation study among older adults in nursing homes [37] also included a cross-loading of item 6D on anxiety, as well as highly significant error covariances between items 2D and 12D. Surprisingly, community-dwelling older adults (the present study), nursing home residents (two different samples giving an approximate N of 500; mean ages 84.5 and 86 years) [37], and hospitalized older adults (N = 484; mean age 80.7 years) [33] responded similarly to the HADS-D items.

In summary, the construct validity and reliability of anxiety were good. Conversely, the depression construct showed low validity and reliability, which are interrelated measurement properties. Specifically, content validity refers to the extent to which the elements of a measurement scale are appropriate for and characteristic of the specific construct for a given assessment purpose [49]. In this study, content validity concerns whether the 14 HADS items and the two-factor structure accurately represent anxiety and depression in this population. In addition, evidence of face validity can be considered one aspect of content validity [49]. High face validity increases an instrument’s usefulness in practice through ease of use, an appropriate reading level, clarity, and appropriate response formats. Thus, to improve content validity, and thereby also the reliability of the depression factor, qualitative studies could be used to get closer to the actual content of depression, investigating which indicators of depression are most essential among community-dwelling older adults. Based on such new evidence, the three problematic items could be reformulated in a more valid format.

Strengths and limitations

A notable strength of this research is the empirical examination of the HADS, which has not previously been tested using CFA in a community-dwelling older population aged 70+ in Norway. The large sample size is also a strength, as it allowed the sample to be randomly split into two subsamples of 595 community-dwelling older adults each.
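A random split of this kind can be made reproducible by fixing the random seed. The sketch below is a minimal illustration using Python's standard `random` module, hypothetical participant IDs, and an arbitrary seed; how the split was actually performed in this study is not described here.

```python
import random


def random_split_half(ids, seed):
    """Shuffle participant IDs with a fixed seed and cut them into two equal halves."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2  # with an odd N, the second half gets the extra participant
    return shuffled[:half], shuffled[half:]


participant_ids = range(1, 1191)  # hypothetical IDs for the 1190 participants
sample_a, sample_b = random_split_half(participant_ids, seed=20240101)  # arbitrary seed
print(len(sample_a), len(sample_b))
```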

Although the older adults were randomly selected into two subsamples, we cannot claim that the sample represents the community-dwelling older adults in the city studied, since 3181 of 4667 declined to participate. In addition, those excluded from the present study were older and had less education. Hence, with respect to representativeness, the present sample may be biased and may not represent all community-dwelling older adults.
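The scale of non-participation can be made explicit with the figures given above. This small calculation assumes that the 4667 individuals were all those invited and that the 1190 analysed participants come from those who did not decline; the exact flow of exclusions is not described in this section.

```python
# Figures from the text; treating 4667 as the number invited is an assumption.
INVITED = 4667
DECLINED = 3181
ANALYSED = 1190

did_not_decline = INVITED - DECLINED
participation_rate = did_not_decline / INVITED
analysed_fraction = ANALYSED / INVITED

print(f"did not decline: {did_not_decline} ({participation_rate:.1%} of those invited)")
print(f"analysed sample: {ANALYSED} ({analysed_fraction:.1%} of those invited)")
```

Roughly one in four invited individuals ended up in the analysed sample, which underlines the caution about representativeness.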
