A 13-item Health of the Nation Outcome Scale (HoNOS-13): validation by item response theory (IRT) in patients with substance use disorder

The data for this study were collected by experienced data extractors from the hospital electronic medical record system between February 2015 and September 2019. They concerned patients with SUD admitted to a specialized addiction unit of a large university hospital. The population was mainly male (70.7%), with a mean age of 43.3 (SD 11.5) years. During the reported period, the number of hospitalizations per patient ranged from 1 to 13, with a median length of stay of 15 days (range 2–690). The median HoNOS score was 16 (range 1–44) at admission and 11 (range 0–37) at discharge. The questionnaire was administered by the psychiatrists working in the hospital unit, who had received a training session on the use of this tool. The Geneva ethics committee approved this study (ClinicalTrials.gov Identifier: NCT03551301). Six hundred and nine (609) valid HoNOS questionnaires were analyzed.

Statistical analysis

HoNOS is a polytomous-ordered categorical scale with its items ranked on a 5-point Likert scale from 0 (no problem) to 4 (severe to very severe problem), with higher scores indicating more problems. To handle this type of data, Samejima [19] proposed the following function for the probability that a person’s response falls at or above a particular category, given the latent trait [23]:

$$P_{jk}^{*}\left(\theta \right)=\frac{\exp \left[a_{j}\left(\theta -b_{jk}\right)\right]}{1+\exp \left[a_{j}\left(\theta -b_{jk}\right)\right]}.$$

This equation is known as the boundary characteristic function of item j for category k, given the latent trait θ. The parameter \(a_{j}\) is the slope of the function, or item discrimination, and reflects an item’s ability to discriminate between individuals scoring high and low on θ. The \(b_{jk}\) parameter, also called the threshold parameter, refers to the latent trait level at which an individual has a 50% probability of endorsing category k or higher.

Conceptually, GRM would treat each item as a series of \(K-1\) dichotomous items, which translates into \(K-1\) thresholds where \(K\) is the number of Likert-type ordered categories [24].
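To make the mechanics concrete, the sketch below evaluates the boundary function above for one hypothetical 5-category item; the function name `grm_boundary` and the parameter values are illustrative only and not part of the analysis code.

```r
# Boundary characteristic function of the GRM for a single item:
# P*_k(theta) = exp[a(theta - b_k)] / (1 + exp[a(theta - b_k)]) = plogis(a * (theta - b_k))
grm_boundary <- function(theta, a, b) {
  plogis(a * (theta - b))          # probability of responding in category k or higher
}

a     <- 1.5                       # hypothetical item discrimination
b     <- c(-1.0, 0.0, 0.8, 1.6)    # K - 1 = 4 thresholds for a 5-category item
theta <- 0.5                       # a person's latent trait level

p_star <- grm_boundary(theta, a, b)     # boundary probabilities P*_1, ..., P*_4
p_cat  <- c(1, p_star) - c(p_star, 0)   # category probabilities P(X = 0), ..., P(X = 4)
round(p_cat, 3)
```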

In IRT, persons and items are located on the same continuum. Good differentiation among individuals, i.e., the ability of an item to discriminate between respondents below and above the mean, is a desired characteristic of a good item [25].

The central concept in IRT is the item characteristic curve (ICC) produced by the model given in the equation above. ICCs describe the relation between a person’s ability or trait level and the probability of a particular item response.

A traditional IRT model contains a single continuous latent variable representing the construct of interest. Fitting such a model requires three fundamental assumptions to hold: unidimensionality (the minimal assumption), monotonicity and local independence.

Unidimensionality means that item correlations are explained by a single dimension. This assumption was tested with Loevinger’s H coefficients [26], which indicate the degree of homogeneity of an item set. H values between 0.3 and 0.4 weakly support unidimensionality, values between 0.4 and 0.5 indicate a moderately unidimensional scale, and values above 0.5 strongly satisfy the assumption of unidimensionality [27, 28]. The mokken package of the R program [29] was used to calculate the H values.
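As a minimal sketch, assuming the 13 item scores are stored in a data frame with the hypothetical name `honos13` (one column per item, scored 0–4), Loevinger’s H coefficients can be obtained as follows:

```r
library(mokken)

H <- coefH(honos13)   # Loevinger's H for item pairs (Hij), items (Hi) and the scale (H)
H$H                   # 0.3-0.4 weak, 0.4-0.5 moderate, > 0.5 strong support for unidimensionality
```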

Monotonicity presumes a non-decreasing probability of endorsing higher item response categories as the level of the latent trait increases. This assumption was examined through rest-score graphs, where the rest score for each item is defined as the difference between the raw scale score and the item score. These graphs plot the rest scores on the X-axis and the proportion of respondents in each rest-score group endorsing the item on the Y-axis [30]. The mokken package of the R program [29] was used to plot these graphs.
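A minimal sketch of the monotonicity check, again assuming the hypothetical `honos13` data frame:

```r
library(mokken)

mono <- check.monotonicity(honos13)
summary(mono)   # number of (significant) monotonicity violations per item
plot(mono)      # rest-score graphs: rest score (X-axis) vs. proportion endorsing (Y-axis)
```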

As for local independence, it assumes that the responses to an item are independent of those to the other items, conditional on the person’s location on the latent trait [31,32,33]. This assumption is tested through the item residual correlation matrix; residual correlations > 0.1 are an indication of local dependence [34, 35].
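A minimal sketch of this check with the mirt package, assuming a fitted model object named `fit_uni` (an illustrative name; see the model fitting sketches further below):

```r
library(mirt)

q3 <- residuals(fit_uni, type = "Q3")                 # item residual correlation (Q3) matrix
which(abs(q3) > 0.1 & upper.tri(q3), arr.ind = TRUE)  # flag potentially locally dependent item pairs
```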

As psychological constructs became more complex, it also became obvious that the ability of a single construct to approximate complex data had reached its limits. Accordingly, psychometric research has led to the development of more sophisticated models, of which MIRT is a novel statistical technique [36].

The 2-PL form of MIRT can be written as [37]:

$$P_{ijk}^{*}\left(\boldsymbol{\theta}_{i}\right)=\frac{\exp \left[D\mathbf{a}_{j}^{\prime}\left(\boldsymbol{\theta}_{i}-b_{jk}\right)\right]}{1+\exp \left[D\mathbf{a}_{j}^{\prime}\left(\boldsymbol{\theta}_{i}-b_{jk}\right)\right]}$$

where \(P_{ijk}^{*}\left(\boldsymbol{\theta}_{i}\right)\) is the probability that respondent i obtains a score greater than or equal to category k on item j, given the ability/trait vector \(\boldsymbol{\theta}_{i}\); \(\mathbf{a}_{j}\) is the vector of item discrimination parameters for item j on each latent trait m; \(b_{jk}\) is the item severity parameter for category k within item j; \(\boldsymbol{\theta}_{i}\) is the vector of the latent traits of respondent i on the m dimensions; and D = 1 or 1.7 is a scaling constant (D = 1.7 scales the logistic to the normal ogive metric, D = 1 preserves the logistic metric).
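For illustration, the sketch below is a direct transcription of the equation above for a two-dimensional case with hypothetical parameter values; `mgrm_boundary` is an illustrative name only, not part of the analysis code.

```r
# Multidimensional boundary function: P* = exp[D a'(theta - b_k)] / (1 + exp[D a'(theta - b_k)])
mgrm_boundary <- function(theta, a, b_k, D = 1) {
  plogis(D * sum(a * (theta - b_k)))   # probability of scoring in category k or above
}

a     <- c(1.2, 0.4)    # hypothetical discriminations on two latent dimensions
theta <- c(0.5, -0.3)   # a person's location on the two latent traits
b_k   <- 0.2            # hypothetical severity parameter for category k
mgrm_boundary(theta, a, b_k)
```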

Assumptions for using MIRT:

MIRT models differ from UIRT models in that they involve a linear combination of a vector of abilities (θ) rather than a single dimension. Apart from that, the monotonicity and independence assumptions remain in force in MIRT models. The monotonicity assumption requires that as any element in the θ-vector increases, the probability of endorsing a certain item response category also increases. As for the independence assumption, it states that the response of any person to any test item depends solely on the person’s θ-vector and the item’s vector of parameters [38].

The model parameters were estimated using the mirt package [39] of the free R program [29].

Note that the mirt package also allows for the estimation of unidimensional models when given the appropriate instructions.

Full information maximum likelihood estimation is implemented in this package for both unidimensional and multidimensional models.
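As a minimal sketch (using the hypothetical `honos13` data frame introduced earlier), a unidimensional graded response model can be estimated with mirt’s full-information maximum likelihood (EM) routine:

```r
library(mirt)

fit_uni <- mirt(honos13, model = 1, itemtype = "graded", method = "EM")
fit_uni   # prints the estimation summary and log-likelihood
```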

A high discrimination parameter, which results in a steep ICC, suggests that the item has a high ability to differentiate between subjects with high and low levels of the construct [40]. A high discrimination also means that the item provides a lot of information on the latent trait. Nevertheless, items with low discrimination parameters, even though less informative, may contribute information over a wider range of the latent trait. Descriptive rule-of-thumb guidelines for discrimination [41] suggest: 0 = no discriminative power; 0.01–0.34 = very low; 0.35–0.64 = low; 0.65–1.34 = moderate; 1.35–1.69 = high; > 1.70 = very high; + infinity = perfect.
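A minimal sketch for inspecting these parameters in the classical IRT parameterization (slope a and thresholds b1–b4), assuming the `fit_uni` object sketched above:

```r
coef(fit_uni, IRTpars = TRUE, simplify = TRUE)$items   # one row per item: a, b1, b2, b3, b4
```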

Concerning the thresholds, since each item has five response options, there are four thresholds per item. Table 1 shows the distribution of HoNOS-13 item responses in our sample.

Table 1 Distribution of HoNOS-13

Using the data at admission, we first fitted a one-factor model for HoNOS-13 for the sake of parsimony and to limit model complexity. Due to lack of fit, a two-factor model identified by two of the authors, both psychiatrists (expert consensus), was envisaged: Factor 1 would capture psychiatric/impairment-related issues (items 1 to 8 and 13) and Factor 2 would reflect social-related issues (items 9 to 12).
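A minimal sketch of this two-factor specification in mirt’s model syntax, assuming the items appear as columns 1–13 of the hypothetical `honos13` data frame:

```r
library(mirt)

# F1: psychiatric/impairment-related items; F2: social-related items; factors allowed to correlate
two_factor <- mirt.model("
  F1 = 1-8, 13
  F2 = 9-12
  COV = F1*F2
")

fit_two <- mirt(honos13, model = two_factor, itemtype = "graded")
```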

Goodness of fit of the models was assessed by the root mean square error of approximation (RMSEA), with values < 0.08 and < 0.06 indicating acceptable and good fit, respectively, and by the comparative fit index (CFI), with values > 0.90 and > 0.95 indicating acceptable and good fit, respectively [42, 43]. Other information criteria, namely the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the sample-size-adjusted BIC (SABIC), were also used, knowing that AIC and BIC are specifically designed to penalize model complexity.
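A minimal sketch for obtaining these fit measures from the assumed model objects:

```r
library(mirt)

M2(fit_uni)   # limited-information fit statistics, including RMSEA and CFI
M2(fit_two)

c(AIC   = extract.mirt(fit_two, "AIC"),
  BIC   = extract.mirt(fit_two, "BIC"),
  SABIC = extract.mirt(fit_two, "SABIC"))
```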

Nested models were compared via the likelihood ratio statistic or by a reduction in goodness-of-fit indices such as AIC, BIC and SABIC. Finally, the relative performance of the UIRT and MIRT models was addressed through a likelihood ratio (ANOVA) test, which evaluates whether the more complex model captures the data better than the simpler model. A significant p-value (p < 0.05) speaks in favor of the more complex model.
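A minimal sketch of this comparison, using the two assumed model objects:

```r
anova(fit_uni, fit_two)   # reports AIC, BIC, SABIC, log-likelihood and the likelihood ratio test
```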

All analyses, tests and plots were obtained using appropriate packages of the R program.

Sample size requirements

Forero and Maydeu-Olivares [44], cited by Depaoli et al. [45], found that sample sizes as small as 200 were sufficient for the parameter estimation of a graded response model. On the other hand, Jiang et al., also cited by Depaoli et al. [45], showed that a sample size of 500 provided accurate parameter estimates in the case of a three-dimensional GRM composed of 30 to 90 items, each with four response categories [46]. Thus, we are confident that the sample size at hand (609) fulfilled the necessary requirements for the analysis of a two-dimensional scale of 13 items with 5 response categories.
