Parametric and non-parametric Poisson regression for modelling of the arterial input function in positron emission tomography

Terminology

During each PET measurement, there are a series of blood samples taken over time. For simplicity, examination will be henceforth used to describe the whole series of samples measured while a participant is in the PET system, while sample will be used to describe each individual blood sample recorded over the course of the examination.

Model definitionLikelihood function

Poisson regression is commonly used for modelling variation in count data (i.e. 0, 1, 2,...) [12, 22]. It is a form of generalised linear and nonlinear models [23] which canonically uses the Poisson distribution as its data distribution. To build a generalised linear or nonlinear model, a link function is required. The log link is conventional for count data as it restricts the expected value (i.e. the rate parameter) to positive values, which is also important for the mean-variance relationships of these models as the variance cannot be negative. We have made use of the log link function and the following general model definition for all models presented below. The model is hence constructed as follows:

$$\beginy_i \sim }(\lambda _) \quad \text \end$$

(1)

$$\begin\log )} = f(t_i, \theta ) + \log )} \quad \text \end$$

(2)

From the data distribution, i represents each recorded sample during the PET examination, and \(\lambda _i\) represents the corresponding Poisson rate, i.e. the expected value of the outcome \(y_i\), which is an estimate of the number of counts. The log link is applied through the use of the log-transformation of the rate, \(\log )}\), in the model definition. \(f(t_i, \theta )\) represents the model function used to describe the kinetics of the parent compound in the arterial plasma depending on unknown parameters, \(\theta\), at time, \(t_i\). Lastly, because \(\lambda _i\) refers to the rate of events, it must take into consideration differences in exposure (\(\tau _\)), e.g. if two blood samples are drawn at the same time, and sample A consists of twice as much blood as sample B, then we would expect approximately twice as many counts recorded for A—yet the rate of counts per unit volume is the same. For this reason, we incorporate these differences in exposure into the model through the use of the logarithm of the exposure, \(\log )}\), which is called the model offset in the terminology of generalised linear and nonlinear models [12]. The model offsets are calculated a priori for each sample.

It should be noted that, in contrast to the Gaussian distribution which has mean \(\mu\) and variance \(\sigma ^2\) parameters, the Poisson distribution has only one parameter, the rate \(\lambda\). This is because, for a Poisson random variable, the variance is exactly equal to the mean. However, this relationship is not always seen in practice for count data. In cases for which there is less or more relative variance in the number of measured counts, the data are said to be under—or over-dispersed, and the Poisson distribution is not able to accommodate this. This can be assessed from visual inspection of Q–Q plots [41], here generated using R [32] and the scam [30] and mgcv [44] packages.

For this reason, we also considered the negative-binomial (Eq. (3)) likelihood function, which is a useful generalisation of the Poisson distribution that can allow for over- or under-dispersion, i.e. when the variance is greater than or less than the Poisson model would indicate. When the negative binomial distribution was used, then we exchanged Eq. (3) for Eq. (1). Indeed, some texts claim that it is even usually appropriate in Poisson regression to include an additional parameter to capture overdispersion [12]. This use of the term dispersion is not to be confused with the external or internal dispersion of the measured radioactivity signal from its passage through tubing or vasculature respectively. To differentiate the two, we have always referred to statistical dispersion with over- or under-dispersion; and dispersion of the measured radioactivity signal as either internal or external dispersion. While the canonical parameters of the negative binomial distribution are the probability (p) and number of successes (k), these can be reparameterised to a dispersion parameterisation as an extension of the Poisson family as follows:

$$\begin y_i \sim }(\mu _, \psi ) \end$$

(3)

where \(\mu\) represents the rate, analogous to \(\lambda\) for the Poisson distribution, and \(\psi\) is an additional parameter representing the dispersion such that the variance, equal to \(\mu + \psi \mu ^2\), where \(\psi =0\) corresponds to a Poisson distribution [2].

Nonlinear model

We made use of both parametric and non-parametric models. By parametric models, we mean that each of the parameters has a specific meaning in the context of the model, and that the predicted values can be generated from the model definition and the parameters themselves. By non-parametric models, we mean that parameters do not have any specific meaning in and of themselves, but only as a function of the basis functions generated [7, 14]. Hence, the generation of model predictions from nonparametric models requires the choice of basis functions as well as the corresponding estimated coefficients.

For the parametric model, we used a sum of three decreasing exponentials model as follows:

$$\begin f(t_i, \theta ) = \sum _^} \end$$

(4)

in which parameters \(A_\) and \(\gamma _\) must be estimated.

For non-parametric models, we made use of GAMs (generalised additive models) considering both general and shape-constrained basis functions. For shape-constrained functions, we used monotonically decreasing B-spline basis functions [31], and for basis functions without shape constraints, we used thin-plate basis functions [43]. Smoothing penalties were estimated using restricted maximum likelihood (REML), which has been shown to be less susceptible to undersmoothing compared to generalised cross-validation [33, 44, 45].

The choice of the basis dimension for smoothing terms is a modelling choice made as part of the model-building process when using penalised regression spline models [45]. In theory, it is not possible to select the basis dimension which is optimal, as this would require that we know the “true” smoothness of what is being estimated. In practice, setting the basis dimension sets an upper limit on the flexibility of a smoothing term, and the actual flexibility of that term is determined by the smoothing parameters. It is hence most important that the basis dimension is not set too low as to be excessively restrictive, but selection of a basis dimension which is not too high can also help to guard against overfitting without relying too heavily on suitable estimation of the smoothing parameters. A fuller discussion of these considerations, and the various diagnostics that can be used to guide appropriate selection, can be found in [45].

Offsets

In conventional estimation of the AIF, measured counts must first be converted to radioactivity concentrations, and then corrected in a series of steps finally to derive the arterial plasma parent radioactivity curve which constitutes the AIF, which can then be modelled. For the Poisson model, these correction factors are not applied directly to the measured data prior to modelling, but rather incorporated into the model itself as a series of offsets.

Importantly, the particular corrections and adjustments applied differ between research groups and data sets based on which data and parameters are recorded, the equipment used, and which are deemed necessary. As such, the particular corrections implemented here as offsets are not meant to represent an exhaustive list, and can be expanded or contracted depending on the particular dataset and application. For instance, we have performed external dispersion correction of ABSS samples (to correct for dispersion through tubing), but many groups consider this step unnecessary; and we have not performed internal dispersion correction (to correct for dispersion through vasculature) [17] or correction for extraction efficiency of the HPLC (e.g. [10]). The goal of this paper is to demonstrate the application of Poisson regression to AIF data as a proof of concept, and so we have made the decision to make use of the same corrections as the research group from which the data originated.

For all fitted models, we made use of the same combination of variables, described below in the rest of this section, to account for differences in exposure between samples. Offset variables were either directly measured, indirectly calculated, or estimated (see below). The total offset for each sample, \(\log )}\), was defined as the sum of natural logarithms of all of the M offset variables, i.e.

$$\begin \log )} = \log )} + \log )} + \cdots + \log )}. \end$$

Directly measured offset variables include measurement duration in the gamma counter and sample volume. Indirectly calculated offset variables include radioactive decay relative to injection time, a volume calibration, and an external dispersion correction factor. Radioactive decay was calculated using the half-life of the radioisotope, in this case carbon-11. Volume calibration has historically been performed for the manual discrete data originating from this PET centre using a previously measured calibration function for the gamma counter to account for a very small deviation from linearity in its measured counts: \(e^}\), where \(\rho\) represents the calibration factor. External dispersion correction is applied for blood samples collected using the ABSS to correct for the dispersion over time of the measured signal owing to the sticking effect as the blood travels through the tubing. Here, dispersion correction was applied a priori to ABSS blood radioactivity values to estimate an external-dispersion-corrected blood radioactivity value for each ABSS sample using equations (5) and (6) from [25] implemented in kinfitr [21, 36]:

$$\begin \int _0^t C_ }(T) \text T =\kappa C_ }(t)+\int _0^t C_ }(T) \text T \end$$

(5)

where \(\kappa\) represents the previously estimated dispersion constant of the system, and \(C_ }\) and \(C_ }\) represent the true and measured blood radioactivity concentrations respectively. This equation is solved for \(C_ }(t)\) by interpolating (5) over a uniform grid with spacing \(\Delta t\) and estimating \(C_ }(t)\) as follows:

$$\begin C_ }(t)&=\frac C_ }(T) \text T-\int _0^ C_ }(T) \text T} \end$$

(6)

For the purpose of incorporating external dispersion correction into the model as a multiplicative offset, we calculated a dispersion-correction factor by dividing the dispersion-corrected radioactivity values by the uncorrected values. For all manual samples collected after 10 min, the dispersion factor was set to 1, i.e. no correction, as these samples did not travel through the tubing.

Lastly, modelled offset variables included the BPR and the parent fraction. BPR is used to estimate the plasma radioactivity for samples in which only the whole-blood radioactivity was measured (such as the ABSS samples). For the BPR, using all manual samples for which both whole-blood and plasma radioactivity were measured, we fit a thin-plate regression spline with a basis dimension of 10 to the ratio of whole-blood to plasma radioactivity for each examination using mgcv [44]. For the offset, model estimates were estimated for all time points in which only whole-blood radioactivity was measured, and the BPR was set to 1 for all plasma samples. Similarly, for the plasma parent fraction, we fit a sigmoid function as described in [15], which has previously been validated for the modelling of [11C]PBR28 parent fraction data [26], to all parent fraction samples for each PET examination, and estimated parent fraction values for each time point.

Subjects and data

As a case of study, we considered here the [11C]PBR28 data first reported by [5], which describes the full measurement and acquisition protocols. We used a subset of 10 individuals for whom the original whole blood and plasma count data were available. For each participant, arterial whole blood was sampled each second for 10 min using an ABSS (Allogg, Mariefred, Sweden). Background radioactivity, measured as the mean number of counts in the whole blood radioactivity measurements prior to the rapid ascent for each participant, was subtracted from these measurements by calculating the mean number of counts recorded per second in the samples recorded prior to injection, and subtracting this number from the counts recorded each second after injection. In parallel, manual arterial blood samples of 1–3 mL were drawn at 1, 3, 5, 7, 9, 10.5, 20, 30, 40, 50, 60, 70, 80 and 90 min post radiotracer injection. Manual samples collected before 10 min were drawn from the end of the tubing of the ABSS system, while samples recorded after 10 min were drawn directly from the arterial cannula without travelling through the tubing. Radioactivity was measured in each sample, followed by centrifugation to obtain between 0.8 and 1.5 mL of arterial plasma, in which radioactivity was also measured. Plasma parent fraction was measured at 1, 3, 5, 10.5, 20, 40, 60 and 90 min after injection.

We fit all Poisson regression models described below to the samples following the peak, i.e. the descent of the curve. This constitutes the majority of the measured curve, and the parametric tri-exponential AIF model is often applied to only this section of the curve with either linear interpolation (e.g. [11, 19]) or a linear model (e.g. [24, 28]) applied to the short ascending phase. It should be mentioned that there are are other parametric AIF models which can fit the whole curve though [37, 40, 42]. By focusing on the descending part of the AIF, we also avoid having to accommodate the discontinuity in our smooth spline nonparametric models, as the focus of this manuscript is on demonstrating the use of Poisson regression for AIF modelling.

Software

All modelling was performed using R [32], using the gnm [38] package for parametric modelling, the scam [30] package for shape-constrained GAMs, and mgcv [44] for GAMs without shape constraints. All data and R code are provided in an open repository: https://github.com/77liner/ModelAIF.

留言 (0)

沒有登入
gif