Epidemiological characteristics of tuberculosis incidence and its macro-influence factors in Chinese mainland during 2014–2021

Data collection

The surveillance data for TB incidence in Chinese mainland from 2014 to 2021 were obtained from the NNDRS, an internet-based real-time disease-reporting system. The reported cases encompass suspected case, clinically diagnosed cases and etiologically confirmed cases, aligning with the diagnosis criteria for TB stipulated and disseminated by the National Health Commission of the People's Republic of China [15]. Suspected cases were excluded from the analysis, focusing on clinically diagnosed and etiologically confirmed cases. Anonymized data included demographic details (residential ID number, sex, age, and occupation) and clinical particulars (dates of symptom onset, diagnosis date, and diagnosis category).

Demographic data by age and sex for 31 provincial-level administrative divisions (PLADs) and the Chinese mainland were collected from the National Bureau of Statistics of China (http://www.stats.gov.cn/english/Statisticaldata/AnnualData, accessed on April 20, 2023). Daily meteorological monitoring data during 2017–2019, including daily average temperature (°C, Atemp), daily average relative humidity (%, ARH), daily average wind speed (m/s, AWS), daily sunshine duration (h, SD), and daily precipitation (mm, PRE), were collected from the China Meteorological Data Sharing Service System (http://data.cma.cn/, accessed on April 22, 2023). Yearly province-level economic [gross domestic product (GDP) per capita], demographic (population, population density, sex ratio, natural population growth rate and urbanization rate), and medical and health resource data (number of medical and health institutions, number of health technicians per 10,000 population, number of beds in medical and health institutions per 10,000 population, total health expenses) during 2014–2019 were collected from the National Bureau of Statistics of China (http://www.stats.gov.cn/english/Statisticaldata/AnnualData, accessed on March 20, 2023). The definition of each indicators were in the Supplement Table 1.

The administrative regions were categorized into province-level, prefecture-level, county-level and township level administrative regions. This study focused on the 31 PLADs in Chinese mainland, stratified into seven regions. The period from disease onset to diagnosis was calculated as the time of diseases onset minus the time of diagnosis by medical and health institutions, and classified into nine groups: 0–6 days, 1–2 weeks, 2–3 weeks, 3 weeks–1 month, 1–2 months, 2–6 months, 6 months–1 year, 1–2 years and more than 2 years.

Joinpoint regression analysis

Temporal trends were analyzed using Joinpoint regression software (version 4.9.0.0, National Cancer Institute, Rockville, MD, USA). The default modeling method was the grid search method, and Monte Carlo permutation testing was the default model optimization strategy. The Bayesian information criterion (BIC) was employed as a metric for gauging good fit [16]. The average annual percent changes (AAPCs) with their 95% confidence interval (CI) were calculated for incidence rates during 2014–2021, which was subsequently computerized as a geometrically weighted average of the generated annual percent changes (APCs). The APC serves as an indicator of the average annual percentage alteration in incidence rates and is represented by the slope of the fitted line of each interval. The APC is used to evaluate the internal trend of each independent interval of a segmented function, or a global trend with a number of connected points of zero. An AAPC/APC > 0 (P < 0.05) denotes an increasing trend in incidence rates, whereas an AAPC/APC < 0 (P < 0.05) signifies a decreasing trend. Conversely, P > 0.05 indicates the trends stable. The APC can be expressed as Eq. (1), where y represents the incidence rate, x represents year, β1 represents regression coefficient.

$$\ln (y) = \beta_ + \beta_ x$$

$$APC = \left[ }_ - y_ }} }}} \right] \times 100 = (e^ }} - 1) \times 100$$

(1)

Spatiotemporal analysis

Spatiotemporal analysis was conducted using SaTScan software (version 10.1, Kulldorff and Information Management Services, Inc., Boston, MA, USA). A Poisson probability model identified clusters of TB with a temporal and spatial window of 30% [17]. By juxtaposing observed and predicted events within each location window, assuming a random distribution, probable clusters were pinpointed. The cluster exhibiting the highest log-likelihood ratio (LLR) was deemed the most likely cluster, while others were ranked as secondary clusters in a specific sequence [17]. Relative risk (RR) indicated the elevated infection risk within the cluster compared to outside it [17]. Spatiotemporal analysis was performed at both province-level and prefecture-level during 2014–2021.

Distributed lag non-linear model (DLNM)

A two-stage DLNM was used to analyzed relationship between meteorological factors and TB incidence based on the daily data during 2017–2019. The first stage involved constructing DLNM at each prefecture-level site to assess the lag and non-linearity of factors on TB risk. The DLNM based on a quasi-Poisson distribution served as the basic model for detecting possible delayed effects and nonlinear associations between exposures and TB incidence for each city in the first stage of analysis. The DLNM can be expressed as Eq. (2), where Yt represents the outcome variable, which conforms to a normal distribution, Gamma distribution, or Poisson distribution; E (Yt) represents the expectation of the dependent variable Y at time t; g represents the connection function; sj represents the nonlinear function between xj and E (Yt); uk represents other variables that have a linear relationship with E(Yt); β, γ represents the parameter vectors of xj and uk respectively [18, 19].

$$}(u_ ) = \alpha + \sum\nolimits_^ (} x_ ;\beta_ ) + \sum\nolimits_^ u_ }$$

(2)

In the second stage, a multivariate meta-regression model was constructed to capture the overall pooled exposure–response relationship in Chinese mainland [18, 20]. The cumulative effects of each independent variable on TB incidence were calculated, then lag-specific effects were calculated in different levels of variable. The crossbasis functions of Atemp, ARH, SD, PRE and AWS were built to analyze the lag-exposure–response relationship of meteorological factors. When one factor was included in the function, the other variables were set as covariates. The variance inflation factor (VIF) of meteorological factors were calculated to judge the multicollinearity between variables in different models.

The degree of freedom (df) of spline function of meteorological factors was set to three. Some studies have reported that the average incubation period of TB ranges from four to eight weeks [14], we set the maximum lag as 60 days. The sensitivity analysis was conducted by adjusting three aspects of parameters to test the robustness of our results, including the df of crossbasis (df = 3–5), the df of time variable (df = 6–8) and the lag days (lag = 55, 60, 65). The different quantiles of each independent variable (P5, P25, P75, and P95) were defined as extreme low, low, high, and extreme high levels. On the basis of the above model, taking the median of each factor as the reference value, the influence of meteorological factors on TB was discussed. RR is a measure of association which represents the change in TB incidence risk at any given Atemp compared with a reference Atemp (median value) [21], as well as for ARH, AWS, SD and PRE. The attributable fraction (AF) is a measure that quantifies the public health impact of an exposure on a factor. The AF of low and high values were calculated for each prefecture-level site and then the overall AF was estimated. Low value refers to value below the median (P50), dividing into mild low value (P5–P50) and extreme low value (< P5). High value refers to value above the median (P50), dividing into mild high value (P50–P95) and extreme high value (> P95).

The analysis mentioned above was conducted by R (version 4.0.3, R Development Core Team, USA), with package “dlnm” [22] and “splines” (R Core Team, 2021) to fit all DLNMs and the package “mvmeta” to conduct all multivariate meta-regression models.

Spatial panel data model

A spatial panel yearly data model at each province-level site were constructed to evaluate the impacts of factors from demographic, medical and health resource, and economic aspects on TB incidence. A spatial autocorrelation analysis using Moran's I and scatter plot was performed to test if there is a spatial correlation between regions, followed by spatial panel estimations with suitable models. We adopted the data during 2014 to 2019 for spatial panel model analysis. All variables were taken as logarithmic values.

Spatial autocorrelation analysis

Spatial dependence is a geographical phenomenon. The regional TB incidence has the characteristics of spatial spillover and spatial diffusion, with a great impact on the incidence of neighboring areas. The Moran's I was selected to measure the spatial autocorrelation between the incidence rate of TB in different regions. The value of Moran's I range is from -1 to 1, with Moran's I being < 0, = 0 and > 0 indicating the presence of spatial negative, no and positive autocorrelation, respectively. The Moran's I is defined as Eq. (3), where N is the number of spatial units indexed by locations (PLADs in this study) i and j, Wij is a spatial weight matrix, yi and yj refer to the observations of i and j, respectively, y refers to the incidence rate of TB, \(\overline y\) refers to the mean of y.

$$I = \frac} = 1}}^ ^ } \left( - \overline} \right)\left( - \overline} \right)} }} \left( ^ ^ } } } \right)}} = \frac^ ^ \left( - \overline} \right)\left( - \overline} \right)} } }}^ ^ } } } \right)\sum\limits_^ - \overline} \right)^ } }}$$

(3)

The spatial panel data model defines the correlation mode and degree between research units by introducing a spatial weight matrix. A spatial weight matrix is necessary for providing spatial-structure information between adjacent areas and how they interact with each other. Here, the Rook weight matrix was adopted. The spatial weight matrix is defined as W, with elements Wij indicating whether or not observations i and j are spatially close. If units i and j (≠ i) are neighbors, the spatial weight is 1; otherwise, it is 0. Wij can be written as Eq. (4).

$$W_=\left\1\\0\end\right.if\;i\;is\;contiguous\;to\;j,\;W_=1;\;otherwise\;W_=0$$

(4)

GeoDa (version 1.18.0, Luc Anselin, Urbana) was used for calculating the Moran's I and drawing the Moran scatter plots, as well as constructing Rook spatial weight matrix based on the contiguity for 31 PLADs in Chinese mainland.

Spatial panel models

The spatial panel models can effectively solve the spatial dependence of TB incidence rate. Three types of spatial panel models were considered, including the spatial lag model (SLM), the spatial error model (SEM) and spatial Durbin model (SDM). The SLM can be interpreted the spatial dependency between the dependent variables, and can be written as Eq. (5) [23]. The δ is the spatial autoregressive coefficient and Wij′ is the row standardized spatial weight matrix (Wij).

$$y_ = \delta \sum\limits_^ } ^y_ + \beta x_ + u_ + \varepsilon_$$

(5)

The SEM considers spatial lag error term, and can be written as Eq. (6) [23]. The λ refers to the spatial autocorrelation coefficient. ϕit reflects the spatially autocorrelated error term.

$$\begin y_ = \beta x_ + u_ + \phi_ \hfill \\ \phi_ = \lambda \sum\limits_^ ^\phi_ + \varepsilon_ } \hfill \\ \end$$

(6)

The SDM can be used to investigate not only the influence of local variables on dependent variables but also the influence of adjacent regional dependent variables and their independent variables, and can be expressed as Eq. (7) [24].

$$y_ = \delta \sum\limits_} = 1}}^ ^\,y_ + } \beta x_ + \gamma \sum\limits_^ ^\,x_ + } u_ + \varepsilon_$$

(7)

Model selection

The SDM was selected as the base model, and conducted LR and Wald test to determine whether SDM can degenerate into SLM or SEM. If P < 0.05, the SDM was selected; otherwise SLM or SEM were selected. The lagrange multiplier test (LM test) was applied for testing if there is a spatial error effect and a spatial lag effect, including four tests (LM-lag, LM-error, robust LM-lag, and robust LM-error tests). The selection of fixed-effect and random-effect models was determined by the objective of this study and Hausman test. We focused on the analysis in 31 PLADs and did not extrapolate the results, so we chose the fixed effects model [13]. The Akaike information criterion (AIC), BIC, and R2 were compared between time fixed, individual fixed and two-way fixed SDM to select suitable model. The Stata (version 17.0, Stata Corporation, College Station, Texas) was used for panel spatial regression analyses.

Comments (0)

No login
gif