Optimized segmented regression models for the transition period of intervention effects

Classic segmented regression (CSR)

$$Y_{t} = \beta_{0} + \beta_{1} \times time + \beta_{2} \times intervention + \beta_{3} \times \text{post-time} + \varepsilon_{t} .$$

(1)

\(Y_{t}\) is the value of the outcome series at time point \(t\). \(time\) indexes the time points (\(time = 1,2,3, \ldots ,T_{n}\)) and spans the first to the last observation. \(T_{0}\) is the time point at which the intervention is implemented (the nominal intervention time), and \(T_{n}\) is the length of the entire time series. The dummy variable \(intervention\) represents the implementation of the intervention: it takes the value 0 before the intervention and 1 after it. The variable \(\text{post-time}\) tracks the time elapsed since the nominal implementation of the intervention: it is 0 before the intervention, is set to 1 at the first post-implementation time point, and then increases by one per time point (\(\text{post-time} = 1, 2, 3, \ldots , T_{n} - T_{0}\), with \(T_{n} - T_{0} < T_{n}\)). \(\varepsilon_{t}\) is the random error term at time point \(t\). \(\beta_{0}\) is the intercept, and \(\beta_{1}\) is the baseline trend of the outcome series before the implementation. \(\beta_{2}\) reflects the instant effect of the intervention on \(Y_{t}\). The long-term impact of the intervention is the change in the trend (slope) of the outcome series, represented by \(\beta_{3}\). The matrix expression of Eq. (1) is:

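$$\boldsymbol{Y}={\boldsymbol{X}}_{CSR}\boldsymbol{\beta}+\boldsymbol{\varepsilon};\quad \boldsymbol{\beta}={\left[\beta_{0},\beta_{1},\beta_{2},\beta_{3}\right]}^{\top},$$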

(2)

where

$${\boldsymbol{X}}_{CSR}=\left[\begin{array}{cccc}1& 1& 0& 0\\ \vdots & \vdots & \vdots & \vdots \\ 1& T_{0}& 0& 0\\ 1& T_{0}+1& 1& 1\\ \vdots & \vdots & \vdots & \vdots \\ 1& T_{n}& 1& T_{n}-T_{0}\end{array}\right].$$
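The CSR design matrix can be assembled and fitted in a few lines. The following is a minimal sketch rather than the authors' implementation: it assumes ordinary least squares via NumPy, and the function names (`csr_design_matrix`, `fit_ols`) and the synthetic coefficients are hypothetical choices made only for illustration.

```python
import numpy as np

def csr_design_matrix(n_obs, t0):
    """Columns: intercept, time, intervention dummy, post-time (as in Eq. 2)."""
    time = np.arange(1, n_obs + 1)
    intervention = (time > t0).astype(float)          # 0 up to T0, 1 afterwards
    post_time = np.where(time > t0, time - t0, 0.0)   # 1, 2, ..., Tn - T0 after T0
    return np.column_stack([np.ones(n_obs), time, intervention, post_time])

def fit_ols(X, y):
    """Least-squares estimate of [beta0, beta1, beta2, beta3]."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic, noise-free series (illustrative values only)
true_beta = np.array([10.0, 0.5, -3.0, -0.2])
X = csr_design_matrix(n_obs=24, t0=12)
y = X @ true_beta
beta_hat = fit_ols(X, y)
```

On noise-free data the fit recovers the generating coefficients up to floating-point error; with real data, autocorrelated errors would typically call for a more careful estimator than plain OLS.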

Optimized segmented regression (OSR)

In the optimized model (Eq. 3), we model the transition period using different forms of cumulative distribution functions (CDFs) as follows:

$$Y_{t} = \beta_{0} + \beta_{1} \times time + \beta_{2} \times F\left( t \right) \times intervention + \beta_{3} \times F\left( t \right) \times \text{post-time} + \varepsilon_{t} .$$

(3)

The piecewise function \(F\left(t\right)\) is:

$$F\left( t \right) = \left\{ \begin{array}{ll} CDF\left( t - T_{0} \right), & T_{0} < t \le T_{1} ; \\ 1, & T_{1} < t. \end{array} \right.$$

(4)

where \(T_{0}\) is the nominal intervention time, defined as in the CSR model, and \(T_{1}\) is the end time of the transition period, \(T_{1}=T_{0}+L\), where \(L\) is the transition length. The effect of the intervention is assumed to build up from \(T_{0}\) (first implementation) to \(T_{1}\) (fully effective); this interval is the transition period \(\left[T_{0},T_{1}\right]\). \(CDF\left(t\right)\) represents the CDF of the assumed distribution pattern of the intervention effect during the transition period.

The variable assignments (\(time\), \(intervention\), and \(\text{post-time}\)) of the optimized model and the meanings of the corresponding coefficients are the same as in the CSR model. The matrix expression of Eq. (3) is:

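$$\boldsymbol{Y}={\boldsymbol{X}}_{OSR}\boldsymbol{\beta}+\boldsymbol{\varepsilon};\quad \boldsymbol{\beta}={\left[\beta_{0},\beta_{1},\beta_{2},\beta_{3}\right]}^{\top},$$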

(5)

where

$${\boldsymbol{X}}_{OSR}=\left[\begin{array}{cccc}1& 1& 0& 0\\ \vdots & \vdots & \vdots & \vdots \\ 1& T_{0}& 0& 0\\ 1& T_{0}+1& 1\times CDF\left(1\right)& 1\times CDF\left(1\right)\\ \vdots & \vdots & \vdots & \vdots \\ 1& T_{0}+L& 1\times CDF\left(L\right)& L\times CDF\left(L\right)\\ 1& T_{0}+L+1& 1& L+1\\ \vdots & \vdots & \vdots & \vdots \\ 1& T_{n}& 1& T_{n}-T_{0}\end{array}\right].$$
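To make the weighting by \(F(t)\) concrete, the sketch below builds the OSR design matrix from a vector of CDF values \(CDF(1),\dots ,CDF(L)\). It is an illustration under stated assumptions, not the authors' code: the function name `osr_design_matrix` is hypothetical, and an empty CDF vector is taken to mean \(L=0\), which reduces the matrix to the CSR one.

```python
import numpy as np

def osr_design_matrix(n_obs, t0, cdf):
    """Columns: intercept, time, F(t)*intervention, F(t)*post-time (as in Eq. 5).

    cdf[k - 1] holds CDF(k) for k = 1..L; F(t) is 1 once the transition is over.
    """
    cdf = np.asarray(cdf, dtype=float)
    time = np.arange(1, n_obs + 1)
    post_time = np.where(time > t0, time - t0, 0)     # 0 before T0, then 1, 2, ...
    f = np.ones(n_obs)
    in_transition = (post_time >= 1) & (post_time <= len(cdf))
    f[in_transition] = cdf[post_time[in_transition] - 1]
    intervention = (time > t0).astype(float)
    return np.column_stack([np.ones(n_obs), time, f * intervention, f * post_time])

# Example: 10 observations, T0 = 4, uniform pattern with L = 4
X_osr = osr_design_matrix(10, 4, [0.25, 0.5, 0.75, 1.0])
X_csr = osr_design_matrix(10, 4, [])                  # L = 0: no transition period
```

Here row \(T_{0}+2\) of `X_osr` is `[1, 6, 0.5, 1.0]`: the intervention dummy is scaled by \(CDF(2)=0.5\) and the post-time value 2 is scaled by the same factor.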

Distribution patterns of intervention effects—CDFs

\(CDF\left(t\right)\) denotes the CDF of the probability density function (PDF) that describes the distribution pattern of the intervention effect during the transition period. The PDF represents how the effect of the intervention is distributed over the transition period \([T_{0},T_{1}]\), and the values of the corresponding CDF at the integer points, \(CDF\left(1\right),\dots ,CDF\left(L\right)\), are used for modeling. In this study, we mainly discuss four common distributions: (1) uniform, (2) normal, (3) log-normal (right-skewed), and (4) log-normal flip (left-skewed). The CDFs and the corresponding PDFs are shown in Fig. 1.

Fig. 1

Schematic diagram of CDFs and corresponding PDFs for different distribution patterns of intervention effects

The PDFs of the normal and log-normal distributions are defined on \((- \infty , + \infty )\) and \((0, + \infty )\), respectively. We truncated the PDFs so that they describe the effect of the intervention on a fixed interval \([T_{0},T_{1}]\). The probability mass on a fixed interval is obtained by integrating the PDF over it. For the normal and log-normal distributions, we truncated at \((\mu -3\sigma ,\mu +3\sigma )\) and \((e^{\mu -3\sigma },e^{\mu +3\sigma })\), respectively, so that the interval covers about 99.73% of the probability. Matching the truncated interval to our assumed time range \([T_{0},T_{1}]\), we have \(\left\{\begin{array}{l}T_{0}=\mu -3\sigma \\ T_{1}=\mu +3\sigma \end{array}\right.\) for the normal distribution and \(\left\{\begin{array}{l}T_{0}=e^{\mu -3\sigma }\\ T_{1}=e^{\mu +3\sigma }\end{array}\right.\) for the log-normal distribution. The intervention effect is thus essentially fully released within \([T_{0},T_{1}]\). The truncated intervals of the normal and log-normal distributions are shown in Fig. 2. For the log-normal flip distribution, we only need to apply an axisymmetric flip transformation to the truncated log-normal distribution. The log-normal and log-normal flip distributions represent right-skewed and left-skewed patterns, respectively, indicating that the intervention effect is concentrated in the early or the late part of the transition period \([T_{0},T_{1}]\).

Fig. 2

Schematic diagram of the truncated probability distributions for the normal and log-normal distributions

Uniform distribution pattern (UD)

For a uniform distribution on the interval \([T_{0},T_{1}]\), the PDF and CDF are:

$$\left\{ \begin{array}{l} PDF_{UD} \left( t \right) = \frac{1}{L}; \\ CDF_{UD} \left( t \right) = \frac{t}{L}; \end{array} \right. \quad 1 \le t \le L.$$

Then \(\boldsymbol{CDF}_{UD}=\left[CDF_{UD}\left(1\right),\dots ,CDF_{UD}\left(L\right)\right]=\left[\frac{1}{L},\dots ,1\right]\).
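As a small worked instance of the uniform pattern, the sketch below evaluates \(CDF_{UD}(t)=t/L\) at \(t=1,\dots ,L\). The function name `cdf_uniform` is a hypothetical label used only here.

```python
def cdf_uniform(L):
    """Return [CDF_UD(1), ..., CDF_UD(L)] with CDF_UD(t) = t / L."""
    return [t / L for t in range(1, L + 1)]

cdf_ud = cdf_uniform(4)   # a quarter of the effect is released per time point
```

For \(L=4\) the effect is released in equal steps of one quarter per time point, reaching 1 at the end of the transition period.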

Normal distribution pattern (ND)

For a normal distribution, its PDF and CDF are:

$$\left\{ \begin{array}{l} PDF_{ND} \left( x \right) = \frac{1}{\sqrt{2\pi}\,\sigma }\exp \left( - \frac{\left( x - \mu \right)^{2}}{2\sigma^{2}} \right); \\ CDF_{ND} \left( x \right) = \int\nolimits_{ - \infty }^{x} PDF_{ND} \left( y \right)dy; \end{array} \right.\quad - \infty < x < + \infty .$$

The CDF of the normal distribution is an integral over an infinite range; we therefore truncated the PDF to the transition period and set its mean \(\mu\) and standard deviation \(\sigma\) as \(\left\{\begin{array}{l}\sigma =\frac{L}{6}; \\ \mu =T_{0}+3\sigma .\end{array}\right.\) At a specific time point \(t\),

$$CDF_{ND} \left( t \right) = \int\nolimits_{T_{0}}^{T_{0}+t} PDF_{ND} \left( x \right)dx;\quad 1 \le t \le L.$$

Then \(\boldsymbol{CDF}_{ND}=\left[CDF_{ND}\left(1\right),\dots ,CDF_{ND}\left(L\right)\right]=\left[\int_{T_{0}}^{T_{0}+1}PDF_{ND}\left(x\right)dx,\dots ,\int_{T_{0}}^{T_{1}}PDF_{ND}\left(x\right)dx\right]\).
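The truncated normal CDF values can be computed from the error function in the standard library. This is a sketch under the assumptions \(\sigma =L/6\) and \(\mu =T_{0}+3\sigma\) (the \(\pm 3\sigma\) truncation matched to the transition period); the function names are hypothetical, and no renormalization is applied, so the last entry is slightly below 1.

```python
import math

def normal_cdf(x, mu, sigma):
    """Standard closed form of the normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cdf_normal_truncated(L, t0=0.0):
    """Return [CDF_ND(1), ..., CDF_ND(L)], each the integral of the PDF over [T0, T0 + t]."""
    sigma = L / 6.0            # the transition period spans mu - 3*sigma to mu + 3*sigma
    mu = t0 + 3.0 * sigma
    lower = normal_cdf(t0, mu, sigma)
    return [normal_cdf(t0 + t, mu, sigma) - lower for t in range(1, L + 1)]

cdf_nd = cdf_normal_truncated(6)
```

Because \(\mu\) moves with \(T_{0}\), the resulting vector depends only on \(L\); the `t0` argument is kept only to mirror the integration limits in the text.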

Log-normal distribution pattern (LND)

For a log-normal distribution, defined on \((0,+\infty )\), the PDF and CDF are:

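$$\left\{ \begin{array}{l} PDF_{LND} \left( x \right) = \frac{1}{x\sigma \sqrt{2\pi }}\exp \left( - \frac{\left( \ln x - \mu \right)^{2}}{2\sigma^{2}} \right); \\ CDF_{LND} \left( x \right) = \int\nolimits_{0}^{x} PDF_{LND}(y)\,dy = \int\nolimits_{0}^{x} \frac{1}{y\sigma \sqrt{2\pi }}e^{ - \frac{(\ln y - \mu )^{2}}{2\sigma^{2}}}\,dy = \int\nolimits_{0}^{x} \frac{1}{\sqrt{\pi }}e^{ - \frac{(\ln y - \mu )^{2}}{2\sigma^{2}}}\,d\left[ \frac{\ln y - \mu }{\sqrt{2}\sigma } \right]. \end{array} \right.$$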

For a finite upper limit, this integral has no closed form in elementary functions; it is usually expressed in terms of the error function as follows.

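$$CDF_{LND} \left( x \right) = \frac{1}{2}\left\{ 1 + \operatorname{erf}\left[ \frac{\ln x - \mu }{\sqrt{2}\sigma } \right] \right\};\quad \operatorname{erf}\left( x \right) = \frac{2}{\sqrt{\pi }}\int\nolimits_{0}^{x} e^{ - y^{2}} dy.$$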

Setting \(CDF_{LND}\left(x\right)=\frac{1}{2}\left\{1+\operatorname{erf}\left[\frac{\ln x-\mu }{\sqrt{2}\sigma }\right]\right\} = 0.5\) and using the symmetry of the integral, the median of the log-normal distribution is \(x=e^{\mu }\). The interval within three standard deviations of the median on the log scale is \((e^{\mu -3\sigma },e^{\mu +3\sigma })\). Here, we used the same truncation strategy as for the normal distribution. However, the log-normal distribution is skewed; thus, we additionally set its skewness ratio, defined as the share of the total transition period that has elapsed when half of the intervention effect has been released, i.e., \(Ratio=\frac{e^{\mu }-e^{\mu -3\sigma }}{e^{\mu +3\sigma } - e^{\mu -3\sigma }}\). For instance, for a 12-session training course spanning three months, \(Ratio=\frac{1}{3}\) implies that half of the training sessions (six sessions) were completed within the first month. The appropriate \(Ratio\) depends on the skewness of the actual intervention effect during the transition period \([T_{0},T_{1}]\). Correspondingly, we truncated the PDF and calculated its mean \(\mu\) and standard deviation \(\sigma\) from \(\left\{\begin{array}{l}e^{\mu }-e^{\mu -3\sigma }=Ratio\times L;\\ e^{\mu +3\sigma }-e^{\mu -3\sigma }=L.\end{array}\right.\) At a specific time point \(t\),

$$CDF_{LND}\left(t\right)=\frac{1}{2}\left\{1+\operatorname{erf}\left[\frac{\ln (T_{0}+t)-\mu }{\sqrt{2}\sigma }\right]\right\},\quad 1\le t\le L.$$

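Then \(\begin{aligned}\boldsymbol{CDF}_{LND}&=\left[CDF_{LND}\left(1\right),\dots ,CDF_{LND}\left(L\right)\right]\\ &=\left[\frac{1}{2}\left\{1+\operatorname{erf}\left[\frac{\ln (T_{0}+1)-\mu }{\sqrt{2}\sigma }\right]\right\},\dots ,\frac{1}{2}\left\{1+\operatorname{erf}\left[\frac{\ln T_{1}-\mu }{\sqrt{2}\sigma }\right]\right\}\right] \end{aligned}\).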
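One way to turn \(Ratio\) and \(L\) into \(\mu\) and \(\sigma\) is sketched below. It is an illustration under explicit assumptions, not the authors' procedure: the distribution's own axis is shifted so that its lower truncation point \(a=e^{\mu -3\sigma }\) plays the role of \(T_{0}\), and the two conditions \(e^{\mu }-e^{\mu -3\sigma }=Ratio\times L\) and \(e^{\mu +3\sigma }-e^{\mu -3\sigma }=L\) are solved using \((e^{\mu })^{2}=e^{\mu -3\sigma }\,e^{\mu +3\sigma }\), which gives \(a=Ratio^{2}L/(1-2\,Ratio)\). The names `lognormal_params`, `cdf_lognormal`, and `a` are hypothetical.

```python
import math

def lognormal_params(L, ratio):
    """Solve the two truncation conditions for (a, mu, sigma), with a = e^(mu - 3*sigma)."""
    if not 0.0 < ratio < 0.5:
        raise ValueError("a right-skewed pattern needs 0 < ratio < 0.5")
    a = ratio ** 2 * L / (1.0 - 2.0 * ratio)          # from (a + ratio*L)^2 = a * (a + L)
    mu = math.log(a + ratio * L)                      # median e^mu sits ratio*L after a
    sigma = (math.log(a + L) - math.log(a)) / 6.0     # the interval spans 6 sigma on the log scale
    return a, mu, sigma

def cdf_lognormal(L, ratio):
    """Return [CDF_LND(1), ..., CDF_LND(L)] using the error-function form."""
    a, mu, sigma = lognormal_params(L, ratio)
    return [
        0.5 * (1.0 + math.erf((math.log(a + t) - mu) / (sigma * math.sqrt(2.0))))
        for t in range(1, L + 1)
    ]

# The 12-session example: half of the effect is released after one third of the period
cdf_lnd = cdf_lognormal(L=12, ratio=1 / 3)
```

With \(L=12\) and \(Ratio=\frac{1}{3}\), the offset is \(a=4\), so the median falls at \(t=4\) and the CDF value there is 0.5; the last entry is about 0.9987 because the upper tail beyond \(3\sigma\) is cut off.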

Log-normal flip distribution pattern (LNFD)

For the log-normal flip distribution, we only applied an axisymmetric flip transformation to the truncated log-normal distribution. We chose the midpoint \(x=T_{0}+\frac{L}{2}\) of the transition period as the axis of symmetry, flipped the log-normal PDF about it to obtain the log-normal flip PDF, and integrated the result to obtain its CDF. The schematic diagram of the axisymmetric flip transformation is shown in Fig. 3.

Fig. 3

PDFs and CDFs of log-normal distribution and log-normal flip distribution

By the symmetry of the axisymmetric flip transformation, we have

$$\begin{aligned} \boldsymbol{CDF}_{LNFD} & = \left[ CDF_{LNFD} \left( 1 \right), \ldots ,CDF_{LNFD} \left( L - 1 \right),CDF_{LNFD} \left( L \right) \right] \\ & = \left[ CDF_{LND} \left( L \right) - CDF_{LND} \left( L - 1 \right), \ldots ,CDF_{LND} \left( L \right) - CDF_{LND} \left( 1 \right),CDF_{LND} \left( L \right) \right]. \\ \end{aligned}$$
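The flip relation \(CDF_{LNFD}(t)=CDF_{LND}(L)-CDF_{LND}(L-t)\) for \(t<L\), with \(CDF_{LNFD}(L)=CDF_{LND}(L)\), can be applied to any vector of CDF values. The sketch below uses a hypothetical helper `flip_cdf` and made-up input values chosen only to show the reversal.

```python
def flip_cdf(cdf):
    """Mirror a CDF vector [CDF(1), ..., CDF(L)] about the midpoint of the transition period."""
    if not cdf:
        return []
    L = len(cdf)
    total = cdf[-1]                                   # CDF(L), kept as the last entry
    flipped = [total - cdf[L - t - 1] for t in range(1, L)]   # CDF(L) - CDF(L - t)
    flipped.append(total)
    return flipped

# Illustrative right-skewed values: most of the effect arrives early
right_skewed = [0.5, 0.8, 0.95, 1.0]
left_skewed = flip_cdf(right_skewed)                  # most of the effect arrives late
```

The flipped vector is approximately `[0.05, 0.2, 0.5, 1.0]`, so the increments appear in reverse order and the bulk of the effect moves to the end of the transition period.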

By modeling the four above-mentioned distribution patterns of the intervention effect, we developed four OSR branching models: OSR-UD, OSR-ND, OSR-LND, and OSR-LNFD.

Length of the transition period

In most cases, the length \(L\) of the transition period and the distribution pattern of the intervention effect are determined by the implementation process. When no information about the implementation process was available, we used a data-driven approach to select \(L\) for each of the four distribution patterns of the intervention effect; the application process of the optimized model is described below.

First, we set the maximum candidate value for \(L\), denoted \(L_{max}\) (\(L_{max}=\max L\)). We then fitted the OSR model under every candidate length (\(L=0, 1, 2,\dots ,L_{max}\)), giving \(L_{max}+1\) scenarios for each OSR branching model and \(4\times (L_{max}+1)\) scenarios in total. \(L=0\) corresponds to the CSR model; that is, there is no transition period. For each distribution pattern of the intervention effect, we selected the value of \(L\) with the minimum mean squared error (MSE) across its scenarios.
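The selection step can be sketched as a grid search over \(L\). This is a minimal illustration, not the authors' code: it assumes ordinary least squares, takes the MSE to be the in-sample mean of squared residuals, and shows only the uniform pattern; all function names and the synthetic series are hypothetical.

```python
import numpy as np

def cdf_uniform(L):
    """CDF_UD(t) = t / L for t = 1..L (empty when L = 0, i.e. the CSR model)."""
    return np.arange(1, L + 1) / L if L > 0 else np.empty(0)

def osr_design_matrix(n_obs, t0, cdf):
    """Columns: intercept, time, F(t)*intervention, F(t)*post-time."""
    cdf = np.asarray(cdf, dtype=float)
    time = np.arange(1, n_obs + 1)
    post_time = np.where(time > t0, time - t0, 0)
    f = np.ones(n_obs)
    in_transition = (post_time >= 1) & (post_time <= len(cdf))
    f[in_transition] = cdf[post_time[in_transition] - 1]
    return np.column_stack([np.ones(n_obs), time, f * (time > t0), f * post_time])

def mse_for(y, t0, cdf):
    """In-sample MSE of the least-squares fit for one candidate transition."""
    X = osr_design_matrix(len(y), t0, cdf)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((y - X @ beta) ** 2))

def select_L(y, t0, l_max, cdf_fn):
    """Fit L = 0..l_max for one distribution pattern and return the minimum-MSE L."""
    scores = {L: mse_for(y, t0, cdf_fn(L)) for L in range(l_max + 1)}
    return min(scores, key=scores.get), scores

# Synthetic, noise-free series generated with a uniform transition of length 5
true_beta = np.array([10.0, 0.5, -4.0, -0.3])
y = osr_design_matrix(30, 10, cdf_uniform(5)) @ true_beta
best_L, scores = select_L(y, t0=10, l_max=8, cdf_fn=cdf_uniform)
```

Running the same search with each of the four CDF functions would cover all \(4\times (L_{max}+1)\) scenarios; on this noise-free series the search returns the generating length, whereas on noisy data neighbouring values of \(L\) can yield very similar MSEs.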
