Nondecomposable Item Response Theory models: Fundamental measurement in psychometrics

Additive conjoint measurement theory (ACMT; Luce & Tukey, 1964) is an abstract theory of measurement showing that observed data with an additive structure yield interval measures. More specifically, if two components (i.e., predictor variables) have an additive effect on a response (i.e., criterion variable), it is possible to derive an interval scale for all three variables. Formally (Krantz, 1968), an additive structure exists for data with two components (or attributes, or variables; the terms are used interchangeably in the literature, depending on the context and on the tradition of the field) whenever it is possible to represent the data using a measurement scale $f$ of two components $A$ and $X$, with elements $a$ and $x$, respectively, where $f(a, x) = f(a) + f(x)$, a property derived from the necessary conditions of extensive measurement (Hölder, 1901).

The ACMT has been extensively used in psychophysics (e.g., Steingrimsson, 2009, 2011), one of the predecessors of the field of psychometrics. Despite the prior influence of psychophysics on psychometrics, the study of measurement theory and its implications for psychometric modeling is still underdeveloped (Cliff, 1992; Heene, 2013; Markus, 2021; Michell, 2008a). In a notable exception, Perline et al. (1979) argued that the Rasch (1960) model can be thought of as a stochastic version of the ACMT. To interpret the Rasch model as an instance of the ACMT, the difficulties of the items and the respondents' parameters are interpreted as the two underlying components. A representation of the Rasch model where the form of the item response function is not assumed to be known, $b_i$ is the difficulty of item $i$, $\theta_j$ is the true score of respondent $j$, and $p_{ij}$ is the probability of respondent $j$ giving the right answer to question $i$, can be formally presented as $p_{ij} = \phi(\theta_j - b_i)$, where $\phi$ is any strictly monotone increasing function with upper limit equal to 1 and lower limit equal to 0. This is considered an additive model because the response probability depends on the components only through their additive combination $\theta_j - b_i$.

Based on the relation between the Rasch model and the ACMT, several authors in the field of psychometrics have argued that the Rasch model is the only appropriate Item Response Theory (IRT) model for developing fundamental (i.e., interval or ratio) measures in the social sciences (e.g., Andrich, 1988; Bond et al., 2020; Borsboom, 2005; Scheiblechner, 2009; Wright, 1999). On the side of measurement theory, because of the ACMT's popularity (Michell, 2008a, 2008b), it is commonly assumed that an additive structure is a necessary condition for developing fundamental measures. However, Luce et al. (1990, chapter 19) present empirical conditions that can be used to derive models that do not require an additive structure of the data to derive fundamental measures. It is worth noting that the family of logistic IRT models also does not necessarily assume an additive structure of the data (Cliff, 1992; Michell, 2008a). However, the estimates of these models for the true score do not necessarily result in fundamental measures either (Heene, 2013; Scheiblechner, 2009).

Therefore, one relevant implication of measurement theory that is underdeveloped in psychometric modeling is the development of IRT models that do not assume an additive structure, but are proven to result in interval or ratio estimates for the true scores (as long as the model holds for the data). Heene (2013) revisits Rasch (1979) to remind readers that the Rasch model was not developed to answer any specific empirical observation, but rather as an exercise on "Rasch's own mathematical playground". In the spirit of Rasch's intentions, the main aim of the present study is to develop IRT models derived from measurement theories other than the ACMT. More specifically, as there are several measurement theories that, in principle, could be used to this end, we aim to: (i) present the theoretical basis of the Rasch model and its relations to psychophysical models of utility; (ii) give a brief exposition of the measurement theories presented in Fishburn (1974, 1975), some of which do not require an additive structure; and (iii) derive IRT models from these measurement theories, as well as Bayesian implementations of these models. We also present two empirical examples to compare how well these IRT models fit real data.

In addition to deriving new IRT models, we also discuss theoretical interpretations regarding the models' capability of generating fundamental measures of the true scores of the respondents. The rest of this paper is structured as follows. In the next section, we present some measurement theories of probabilistic utility models and how these theories are related to the development of the first IRT models. The third section is dedicated to a brief discussion of additive conjoint measurement and its relations to the Rasch model, and to the presentation of the five measurement theories of Fishburn (1974, 1975) and of how values can be set to develop identifiable IRT models. Next, we derive the IRT models from these measurement theories and provide non-informative priors to fit these models with Bayesian methods. The fifth and sixth sections are dedicated to empirical examples with real data. The paper ends with a discussion and some concluding remarks on the main implications of the present study.

Probabilistic utility models are used to evaluate and to scale decision systems regarding preferences between objects contained in a specific set of stimuli (Luce, 1958). In this framework, preferences are deemed probabilistically consistent and, therefore, defined by the proportion $p$ of times that subjects say they prefer the stimulus $a$ over $b$, for $a$, $b$ contained in the set $A$. The system $(A, p)$ is a measurement model for utility and is called a pair comparison system. Several types of representations are presented in the literature (Luce, 1958), but we focus only on the Fechnerian utility model and on the strict utility model, as these models have a direct relation to the Rasch model. First, the Fechnerian utility model is represented by $p_{ab} = \phi(f(a) - f(b))$, where $f$ is a cardinal utility function (i.e., an interval measure of the elements in $A$), and $\phi$ is any strictly monotone increasing function with upper limit equal to 1 and lower limit equal to 0.

Thurstone proposed the normal cumulative distribution function as a form for $\phi$ (Thurstone, 1927), and his utility model was one of the inspirations for early IRT models (Bock, 1997; Thissen & Steinberg, 1986). Later, Luce (1959) proposed the following form to define the Fechnerian utility model, which resembles the logistic implementation of the Rasch model: $p_{ab} = \frac{1}{1 + \exp(-(f(a) - f(b)))}$. One known (and interesting) fact from the literature is that the logistic Fechnerian utility model is mathematically equivalent to the strict utility model (Roberts, 1979), also known as the Bradley-Terry-Luce model, represented by $p_{ab} = \frac{f(a)}{f(a) + f(b)}$.
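As a minimal sketch of the Fechnerian utility model under the two choices of $\phi$ mentioned above, the following Python snippet evaluates the preference probability with the normal cumulative distribution function and with the logistic function. The function names and the utility values are hypothetical and chosen only for illustration.

```python
import math

def phi_normal(x):
    """Standard normal CDF, the form of phi attributed to Thurstone."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_logistic(x):
    """Logistic function, the form of phi attributed to Luce."""
    return 1.0 / (1.0 + math.exp(-x))

def fechnerian(f_a, f_b, phi):
    """Probability of preferring a over b, given utilities f(a) and f(b)."""
    return phi(f_a - f_b)

# Hypothetical utilities on an interval scale (illustrative values only).
f_a, f_b = 1.2, 0.4

p_normal = fechnerian(f_a, f_b, phi_normal)
p_logistic = fechnerian(f_a, f_b, phi_logistic)

# Only the difference f(a) - f(b) matters, so adding the same constant
# to both utilities leaves the preference probability unchanged.
p_shifted = fechnerian(f_a + 5.0, f_b + 5.0, phi_logistic)
```

Because only the difference between the utilities enters the model, the origin of $f$ is arbitrary, which is in line with its interpretation as an interval measure.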

Two characteristics of the strict utility model are valuable for adapting it into an IRT model. First, $f$ defines a ratio scale in the strict utility model (Roberts, 1979), while in the Fechnerian utility model, $f$ defines an interval scale. Second, because the strict utility model implies the logistic form of the Fechnerian utility model, it is easy to convert between both models: if $f(a) > 0$, $f(b) > 0$, $f' = \ln f$, and $\phi(\lambda) = 1/(1 + e^{-\lambda})$, then $p_{ab} = \frac{f(a)}{f(a) + f(b)} = \frac{1}{1 + f(b)/f(a)} = \frac{1}{1 + \exp(-(f'(a) - f'(b)))} = \phi(f'(a) - f'(b))$.
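The conversion between the strict utility model and the logistic Fechnerian utility model can be checked numerically. The sketch below uses hypothetical positive utilities and compares both expressions after taking logarithms; the function names are illustrative.

```python
import math

def strict_utility(f_a, f_b):
    """Strict utility model: requires f(a) > 0 and f(b) > 0."""
    return f_a / (f_a + f_b)

def logistic_fechnerian(g_a, g_b):
    """Logistic Fechnerian utility model on the log-utilities g = ln f."""
    return 1.0 / (1.0 + math.exp(-(g_a - g_b)))

# Hypothetical positive utilities (illustrative values only).
pairs = [(0.5, 2.0), (3.0, 3.0), (10.0, 0.1)]

# Largest absolute difference between the two parameterizations.
max_gap = max(
    abs(strict_utility(f_a, f_b)
        - logistic_fechnerian(math.log(f_a), math.log(f_b)))
    for f_a, f_b in pairs
)
```

Up to floating-point error, `max_gap` should be zero, since both expressions describe the same probability on different scales.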

This implication and the fact that the Fechnerian utility model is mathematically equivalent to the Rasch model mean that, if one accepts the assumption that the Rasch model can be used to represent the true score on an interval scale, one should also accept that the Rasch model can be used to represent the true score on a ratio scale. From this fact, as the Rasch model can be seen as a form of the Fechnerian utility model, we derive an IRT model based on the strict utility model, henceforth called the Strict Rasch Model (SIRM), which is expressed by $p_{ij} = \frac{\theta_j}{\theta_j + b_i}$. Note that, in this case, $b$ and $\theta$ are both required to be larger than 0, because if $b = \theta = 0$ the result is undefined. However, the SIRM allows for the same theoretical interpretation as the Rasch model: with fixed $b_i$, as $\theta$ increases, the probability of a correct answer also increases. On the other hand, with fixed $\theta_j$, as $b$ increases, the probability of a correct answer decreases. It should also be noted that the SIRM is not quite a "new" model, but just a different form of the Rasch model that results in a different conclusion about the measurement level of its attributes.
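The behavior of the SIRM described above can be illustrated with a short sketch. The parameter values are hypothetical, and the check that the probability is unchanged when $\theta$ and $b$ are multiplied by the same positive constant is added here only to illustrate the ratio-scale reading of the parameters.

```python
import math

def sirm(theta, b):
    """SIRM response probability; theta and b must be strictly positive."""
    if theta <= 0 or b <= 0:
        raise ValueError("theta and b must be strictly positive")
    return theta / (theta + b)

def rasch(theta, b):
    """Logistic Rasch model with b interpreted as item difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical parameter values (illustrative only).
theta, b = 2.0, 0.5

p = sirm(theta, b)
# Multiplying both parameters by the same positive constant changes nothing.
p_rescaled = sirm(3.0 * theta, 3.0 * b)
# The same probability is obtained from the Rasch model on the log scale.
p_rasch = rasch(math.log(theta), math.log(b))

# Monotonicity: increasing in theta for fixed b, decreasing in b for fixed theta.
increasing_in_theta = sirm(1.0, b) < sirm(2.0, b) < sirm(4.0, b)
decreasing_in_b = sirm(theta, 0.5) > sirm(theta, 1.0) > sirm(theta, 2.0)
```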

Complementary to this discussion, it is also important to note that the ratio scale interpretation of the Rasch model is not something new (Fischer & Molenaar, 1995). Indeed, Rasch himself noticed that the logistic implementation of his model, with $b$ interpreted as the "easiness" of the item, could be framed in a multiplicative form: $p_{ij} = \frac{\exp(\theta_j + b_i)}{1 + \exp(\theta_j + b_i)} = \frac{\exp(\theta_j)\exp(b_i)}{1 + \exp(\theta_j)\exp(b_i)} = \frac{\vartheta_j \beta_i}{1 + \vartheta_j \beta_i}$, with $\vartheta_j = \exp(\theta_j)$ and $\beta_i = \exp(b_i)$. The multiplicative form of the Rasch model has been interpreted to result in a ratio measure for the attributes. However, it should be noted that this form of the Rasch model does not result in a ratio scale, but rather in a log-interval scale (Roberts & Rosenbaum, 1986), as the scale of the attributes is arbitrary and as the additive form of the Rasch model is invariant upon the addition of an arbitrary constant.

Finally, in the traditional models of utility, both $a$ and $b$ are elements of the same set. In the case of Eq. (7), $\theta_j$ and $b_i$ are elements of two distinct sets. Therefore, measurement models such as the one derived in Eq. (7) are of a different type than the ones represented in Eqs. (3) to (6). Whenever the elements being compared come from different sets, the proposed measurement model forms a biorder measurement system formally represented as $(A \times X, p)$. This type of system can be used to derive additive composite (or conjoint) measurement (Ducamp & Falmagne, 1969). Furthermore, conjoint measurement can be considered a factorial extension of probabilistic utility models (Fishburn, 1974) where the stimuli, or response, can be decomposed into its basic attributes or components. Because we limit this study to nondecomposable models that have the same number of parameters (or variables, or components, or attributes) as the Rasch model, the next section focuses only on the models proposed by Fishburn (1974, 1975).

In the formal analysis of biorders, a single object is represented by the pair $(a, x)$, which is composed of the elements $a$ and $x$ with $a \in A$ and $x \in X$, where $A$ is the first attribute and $X$ is the second attribute. These attributes can be anything, such as the luminance of stimuli presented to each eye for the measurement of brightness (Steingrimsson, 2009), or even mass and velocity for the measurement of momentum (Krantz et al., 1971, p. 267). In the context of psychometrics, the attributes are usually assumed (Pfanzagl, 1973, chapter 11) to represent the respondents' dispositions (i.e., true scores; $A$) and the items' difficulty or easiness ($X$).

Krantz et al. (1971, chapter 6) presented conditions that must hold for the ACMT to be a valid measurement model for the analyzed data. The same authors (Luce et al., 1990, chapter 19) discussed the case of nondecomposable conjoint measures. This latter type of model occurs when there is more than one utility, or latent, function relating any attribute to the final evaluation of the biorder system. An implication of nondecomposable models is that their structures will be nonadditive, but will still imply fundamental measures. This may be helpful in situations where the studied behavior is nonmonotonically related to the studied construct, or when the form of the relation between behavior and construct changes depending on the attributes.

The first formal evaluation of all simple cases of nondecomposable conjoint measures was presented in Fishburn (1974), where two decomposable and three nondecomposable models were provided. For each model, different measurement levels can be attained for each attribute (for proofs, see Theorems k.2 in Fishburn, 1974). The first model presented in Fishburn (1974) is an additive model, formally defined as $u(a, x) = u_1(a) + u_2(x)$. This model is equivalent to the ACMT, to the Rasch model, and to the Fechnerian utility model as defined in Eq. (3). The utility $u_1$ of the attribute $a$ can be substituted by $\theta_j$, where the subscript $j$ represents a specific respondent (i.e., $j$ is the element $a$). The utility $u_2$ of the attribute $x$ can be substituted by $b_i$, where the subscript $i$ represents a specific item (i.e., $i$ is the element $x$). Evidently, $\theta_j$ is interpreted as respondent $j$'s true score and $b_i$ is interpreted as item $i$'s easiness.

The second model from Fishburn (1974) assumes a multiplicative structure: $u(a, x) = f_1(a) f_2(x)$. For this model, attributes $a$ and $x$ are measured on ratio scales ($f_1$ and $f_2$). To interpret the measures of utility similarly in Eqs. (9) and (10), three basic transformations can be used. First, instead of $\theta_j$, we use $\vartheta_j$. Instead of $b_i$, we use $\beta_i$. Note that $\vartheta_j$ is also interpreted as the true score of respondent $j$ and that $\beta_i$ is also interpreted as the easiness of item $i$, but both are now measured on a ratio scale. Evidently, in this case, both models are equivalent if $\theta_j = \log(\vartheta_j)$ and $b_i = \log(\beta_i)$. But, also because of that, $f_1$ and $f_2$ are better interpreted as log-interval scales rather than ratio scales (Krantz et al., 1971, chapter 6).

The next three models from Fishburn (1974) represent cases where the effect of an attribute is accounted for more than once; i.e., they are nondecomposable. Models three and four of Fishburn (1974) are named "utility independent" in Keeney (1971). This means that while the effect of the first attribute is additive, the effect of the second attribute is multiplicative. Model three of Fishburn (1974) is represented as $u(a, x) = u_1(a) + f_1(a) f_2(x)$, assuming $u_1(a)$ to be a measure of $\theta_j$, $f_1(a)$ another measure of $\theta_j$, and $f_2(x)$ a measure of $b_i$.

Additionally, to make sure that this model is identifiable (Fishburn, 1974, Fishburn, 1975, Keeney, 1971) we need to assume that elements a0, x0, and x1 represent arbitrary values of the attributes and that both ua0,x0=0 and ua0,x1=1. Eq. (11) can then be rewritten as: ua,x=ua,x0+ua0,x1−ua,x0ua,x1−ua,x0ua0,x,where u1a=ua,x0, f1a=ua,x1, and f2x=ua0,x. Following Keeney (1971) one can equate the values of two points ua0,x2 and ua2,x0 so the unit of measurement is fixed empirically as: ua0,x1=ua2,x0ua0,x2=1.In an IRT modeling context, the simplest way of guaranteeing that Eqs. (12), (13) are true is to set f1a=γ(u1a), where γ:R→R+, R is the set of the real numbers, and R+ is the set of the positive real numbers.

The fourth Fishburn (1974) model is similar to Eq. (11) and is defined as $u(a, x) = u_2(x) + f_1(a) f_2(x)$, assuming $u_2(x)$ to be a measure of $b_i$, $f_1(a)$ a measure of $\theta_j$, and $f_2(x)$ another measure of $b_i$. To guarantee that Eq. (14) holds (given similar restrictions due to Eqs. (12) and (13)), one can set $f_2(x) = \gamma(u_2(x))$. It should be noted that, in this model, there is only a multiplicative effect of the second component, which, as discussed in more detail in the next section, when applied in an IRT context, will result in a model similar to a restricted two-parameter logistic model.

Finally, the fifth model by Fishburn (1974) is a combination of the first and second models, resulting in a measurement theory with both additive and multiplicative effects: $u(a, x) = u_1(a) + u_2(x) + f_1(a) f_2(x)$. To scale the attributes of this model (obeying the conditions presented in Theorem 19 from Luce et al., 1990, chapter 19), Fishburn (1974) proposes to set the origins and the units of the utilities according to: $u(a_0, x_0) = 0$; $u_1(a_0) = u_2(x_0) = f_1(a_0) = f_2(x_0) = 0$; $u(a, x_0) = u_1(a)$ and $u(a_0, x) = u_2(x)$; $f_1(b_0) f_2(y_0) = 1$ and $f_1(b_0) = f_2(y_0) = 1$; $u(a_0, x_0) + u(b_0, y_0) - u(a_0, y_0) - u(b_0, x_0) = 1$. These conditions imply that $f_1(a) = u(a, y_0) - u(a, x_0) - u(a_0, y_0)$ and $f_2(x) = u(b_0, x) - u(a_0, x) - u(b_0, x_0)$. One of the simplest ways of guaranteeing that these conditions hold is to set $u_1(a) = f_1(a)$ and $u_2(x) = f_2(x)$. This choice for the value of the parameters will result in a model similar to the latent space IRT model with multiplicative interaction map proposed by Jeon et al. (2021).
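To make the difference between the decomposable and the nondecomposable structures more concrete, the following sketch writes the five utility structures as Python functions under the simplifying choices mentioned in the text ($f_1 = \gamma(u_1)$ for model three, $f_2 = \gamma(u_2)$ for model four, and $u_1 = f_1$, $u_2 = f_2$ for model five). The use of the exponential function for $\gamma$, the parameter values, and the helper `interaction_contrast` are assumptions made only for illustration; they are not taken from Fishburn (1974).

```python
import math

def gamma(v):
    # Assumed choice for gamma: R -> R+; the text only requires positivity.
    return math.exp(v)

def u_additive(theta, b):
    return theta + b                      # model one

def u_multiplicative(vartheta, beta):
    return vartheta * beta                # model two (positive arguments)

def u_model3(theta, b):
    return theta + gamma(theta) * b       # u1(a) + f1(a) f2(x), f1 = gamma(u1)

def u_model4(theta, b):
    return b + theta * gamma(b)           # u2(x) + f1(a) f2(x), f2 = gamma(u2)

def u_model5(theta, b):
    return theta + b + theta * b          # u1 = f1 and u2 = f2

def interaction_contrast(u, a1, a2, x1, x2):
    """Zero whenever u(a, x) is a sum of one term in a and one term in x."""
    return u(a1, x1) - u(a1, x2) - u(a2, x1) + u(a2, x2)

models = {
    "additive": u_additive,
    "model3": u_model3,
    "model4": u_model4,
    "model5": u_model5,
}
# Hypothetical attribute values (illustrative only).
contrasts = {
    name: interaction_contrast(u, 1.0, 0.0, 1.0, 0.0)
    for name, u in models.items()
}
# Model two becomes additive after taking logarithms.
log_additive_gap = abs(
    math.log(u_multiplicative(2.0, 3.0)) - (math.log(2.0) + math.log(3.0))
)
```

In this sketch the contrast is zero only for the additive structure; for models three to five the effect of one attribute depends on the level of the other, which is what makes these structures nonadditive.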

The five measurement theories proposed by Fishburn (1974) have had only a few applications in the literature (e.g., Eliashberg & Hauser, 1985) and none in psychometrics (to the best of our knowledge). The first two models inspired by Fishburn (1974), as well as the SIRM, are equivalent to the Rasch model. However, the other three models inspired by Fishburn (1974) have no equivalent in the literature. In terms of practical implementation of the proposed models, both maximum likelihood estimation and Bayesian methods can be used to fit the models. We will focus only on Bayesian methods of joint likelihood estimation, as they can simultaneously estimate the parameters of respondents and of items and provide uncertainty distributions for the parameters (Fox, 2010; Gelman et al., 2013). Using a logistic function as the Item Response Function (IRF), we propose an IRT model, named the additive Fishburn model (AFM), as the implementation of the model in Eq. (8): $p_{ij} = \frac{1}{1 + \exp(-(\theta_j + b_i))}$.

The AFM is exactly the same model as the Rasch model, with $b_i$ interpreted as the item "easiness" instead of the item "difficulty". Changing the value of $b_i$ simply shifts the IRFs horizontally, as seen in Fig. 1. More difficult items are shifted to the right and are represented by darker dotted lines. Easier items are shifted to the left and are represented by lighter dotted lines. For the full specification of a Bayesian model, it is necessary to set priors for the parameters. Therefore, as non-informative priors, both $\theta_j$ and $b_i$ can be set to have normal distributions with means equal to 0, while the standard deviation of $b_i$ is set to 1 and the standard deviation of $\theta_j$ receives a hyperprior set to a gamma distribution with shape and rate equal to .001.
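As a rough sketch of how the AFM and the priors described above could be combined, the snippet below computes an unnormalized log-posterior for a small, made-up response matrix. It assumes that the gamma hyperprior is placed directly on the standard deviation of $\theta_j$, which is one reading of the description; the function names and the data are hypothetical, and in practice a dedicated Bayesian sampler would be used instead of this hand-written function.

```python
import math

def afm_prob(theta, b):
    """AFM item response function; b is the item easiness."""
    return 1.0 / (1.0 + math.exp(-(theta + b)))

def log_normal_pdf(x, mean, sd):
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sd)
            - 0.5 * ((x - mean) / sd) ** 2)

def log_gamma_pdf(x, shape, rate):
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)

def log_posterior(y, theta, b, sigma_theta):
    """Unnormalized log-posterior; y[j][i] is 1 if respondent j solved item i."""
    # Assumed hyperprior: sigma_theta ~ Gamma(shape=.001, rate=.001).
    lp = log_gamma_pdf(sigma_theta, 0.001, 0.001)
    # Priors: theta_j ~ Normal(0, sigma_theta), b_i ~ Normal(0, 1).
    lp += sum(log_normal_pdf(t, 0.0, sigma_theta) for t in theta)
    lp += sum(log_normal_pdf(v, 0.0, 1.0) for v in b)
    # Bernoulli likelihood of the observed responses.
    for j, t in enumerate(theta):
        for i, v in enumerate(b):
            p = afm_prob(t, v)
            lp += math.log(p) if y[j][i] == 1 else math.log(1.0 - p)
    return lp

# Made-up responses of two respondents to three items (illustrative only).
y = [[1, 1, 0], [1, 0, 0]]
theta = [0.5, -0.5]

# Easiness values that agree with the data versus values that contradict it.
lp_good = log_posterior(y, theta, [1.0, 0.0, -1.0], 1.0)
lp_bad = log_posterior(y, theta, [-1.0, 0.0, 1.0], 1.0)
```

With these made-up data, `lp_good` is larger than `lp_bad`, because the first item is answered correctly by both respondents and the last item by none, so assigning the largest easiness to the first item is more plausible.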

Because the parameters of the second model are measured on a ratio (or log-interval) scale, negative values for the attributes are not possible and only (power-)multiplicative transformations of these scores are meaningful (Roberts & Franke, 1976). One could, of course, use the multiplicative form of the Rasch model to implement Eq. (10) as an IRT model. Another possible strategy is to rely on the fact that the logistic function is an offset and scaled version of the hyperbolic tangent function and to use the latter as a proper IRF for the model represented by Eq. (10). Therefore, we derive an IRT model from Eq. (10), which we name the multiplicative Fishburn model (MFM), using the following IRF: pij=211+exp−2ϑj
