A multivariate generalized logistic approach with spatially varying nonlinear components for modeling epidemic data

Modeling epidemic counts is an important component of epidemiological studies. These counts can be defined by several events; developing the disease, recovering from it, and dying from it are among the most common. As mentioned earlier, although epidemics vary in their characteristics, they share many common features, especially in terms of these counts. This paper concentrates on counts of infected individuals, but the methodology presented could be adapted to other types of epidemic counts and to other epidemics. Before going into details, it is useful to set the scene by reviewing some previous efforts in the area.

Undoubtedly, the collection of epidemic data and the analysis of epidemic counts have increased dramatically because of the COVID-19 pandemic. Growth models, compartmental models, and time series models are among the most widespread approaches that use statistical frameworks to model and predict epidemic evolution. Growth models depend heavily on the evolution of the observed epidemic counts and only loosely on the underlying epidemiological mechanism; they are usually referred to as the data-driven approach (see Gamerman et al., 2021). Compartmental models are based on a mathematical model describing the underlying epidemiological mechanism and are usually referred to as the susceptible–infectious–removed (SIR)-type approach (for example, Kermack et al., 1927, Kermack et al., 1932, Lee et al., 2021, among many others). SIR models provide a detailed description of an epidemic and allow useful inference about its components. This feature has been prominent in many studies that proposed additions, adaptations, and generalizations of the SIR structure, which have become very useful for understanding specific features of an epidemic.

Nevertheless, the predictions they provide have been criticized for their inaccurate performance and for failing to capture some essential characteristics of an epidemic, such as individual variability, the social dimension of the epidemic, unexpected responses to interventions, and side effects of interventions (Tolles and Luong, 2020, Iranzo and Pérez-González, 2021). One possible explanation is the time-varying nature of epidemics: the parameters of the contemplated models are typically static, which renders the predictions obtained from them ineffective. The predictive performance of more elaborate models with more compartments deteriorated even more dramatically. In addition, the rich parametrization of compartmental models sometimes makes it difficult to identify the components that share neighboring similarities and to introduce the components required for a joint analysis. Richer and more detailed SIR models also lead to mean counts that are not available analytically. As a result, statistical analyses are performed on numerically approximated means, which makes it more difficult to identify and specify spatial components. Thus, SIR models do not form a suitable basis for integrated inference involving both count prediction and parameter estimation.

For the above reasons, we focus our proposal on data-driven models. Nevertheless, the approach presented can also be adapted to compartmental models; all it requires is the identification of components that share spatial commonalities.

In line with previous work, this paper also considers a data-driven mean specification based on the generalized logistic model of Gamerman et al. (2021):

$$
Y_t \mid \theta \sim \varphi\big(M(t), \zeta\big), \quad t = 1, \ldots, T, \qquad M(t) = \frac{a}{\left[b + \exp(-ct)\right]^{f}}, \tag{1}
$$

where $Y_t$ and $M(t)$ denote the cumulative count and its mean at time $t$, $\varphi$ denotes a sampling distribution, $\zeta$ denotes the other parameter(s) of $\varphi$ (e.g., a dispersion parameter), $c$ controls the infection rate, $a$ controls the magnitude of the disease, $b$ controls the asymptote, and $f$ controls the asymmetry of the infection process. Hereafter, $\theta$ denotes the collection of all unknown quantities in the model, including $\zeta$.
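As a quick illustration of how the parameters shape the mean curve, the Python sketch below evaluates $M(t) = a/[b + \exp(-ct)]^f$ on a grid, together with the implied mean increments $M(t) - M(t-1)$ (the expected new counts per time unit, by linearity of expectation). The parameter values are hypothetical, and no particular sampling distribution $\varphi$ is assumed; the sketch covers only the mean structure.

```python
import math


def generalized_logistic_mean(t, a, b, c, f):
    """Mean cumulative count M(t) = a / [b + exp(-c t)]^f."""
    return a / (b + math.exp(-c * t)) ** f


def new_count_means(T, a, b, c, f):
    """Means of the new counts per time unit, M(t) - M(t - 1), for t = 1..T.

    By linearity of expectation, these are the expected increments of the
    cumulative counts Y_t.
    """
    return [
        generalized_logistic_mean(t, a, b, c, f)
        - generalized_logistic_mean(t - 1, a, b, c, f)
        for t in range(1, T + 1)
    ]


# Hypothetical parameter values, chosen only for illustration.
a, b, c, f = 1000.0, 0.5, 0.1, 1.5

# As t grows, exp(-c t) -> 0, so M(t) approaches a / b**f.
asymptote = a / b ** f

curve = [generalized_logistic_mean(t, a, b, c, f) for t in range(0, 101, 20)]
print([round(m, 1) for m in curve], round(asymptote, 1))
print([round(d, 2) for d in new_count_means(5, a, b, c, f)])
```

The limit $a/b^f$ makes explicit why $b$ is read as governing the asymptote, jointly with $a$ and $f$.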

The above specification represents a compromise between a parametric form grounded in epidemiological reasoning and a parsimonious specification. The epidemiological reasoning assumed is among the simplest: a single differential equation guides the evolution and spread of the disease. More elaborate models that consider other aspects of the disease mechanism have been proposed. Our modeling strategy can easily be adapted to any of these parametric specifications, provided that (functions of) their parameters can be interpreted as meaningful disease characteristics.
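To make the single differential equation concrete, one can differentiate the mean curve $M(t) = a/[b + \exp(-ct)]^f$ directly. The derivation below is our own rather than the paper's; writing $K = a/b^f$ for the limit of $M(t)$ as $t \to \infty$, it shows that $M$ satisfies a Richards-type growth equation:

$$
\frac{dM}{dt} = \frac{a\,c\,f\,e^{-ct}}{\left[b + e^{-ct}\right]^{f+1}}
= c\,f\,M(t)\left[1 - \left(\frac{M(t)}{K}\right)^{1/f}\right],
\qquad K = \frac{a}{b^{f}},
$$

where the second equality uses $\left(M(t)/K\right)^{1/f} = b/\left[b + e^{-ct}\right]$. For $f = 1$ this reduces to the logistic equation $dM/dt = c\,M\,(1 - M/K)$, whose curve is symmetric around its inflection point; other values of $f$ make the epidemic curve asymmetric, in line with the role attributed to $f$.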

Temporal dependence is a crucial component of prediction. Again, many other specifications are available for addressing temporal dependence and could be adapted to this setting; however, they would inevitably introduce computational complications that might compromise the release of frequently (e.g., daily) updated predictions. This constant updating of predictions as new data become available is crucial for health planners, especially in the more severe stages of an epidemic.

Moreover, current epidemiological studies rarely consider a single region. Multivariate counts associated with a collection of (contiguous) regions are frequently available and call for a single joint analysis of all the data, so data from different regions must be integrated. The seminal work of Besag et al. (1991) introduced a model for this integrated task. They assumed that the count means consisted of excess mean mortality/morbidity rates after standardizing the mean counts by the total population at risk, and they used conditional autoregressive (CAR) distributions (Besag, 1974) to impose similarity of these excess means across neighboring regions. Additional information on spatial disease mapping can be found in MacNab (2022). This idea is not directly applicable here because of the mean specification adopted. Instead, it must be adapted by imposing similarity only on the mean components expected to be subject to neighboring constraints. Infection rates c are expected to be more similar in closer regions. The same could apply to the asymptotic behavior of the regions, most clearly identified through parameter b, but not to the other mean characteristics, associated with the magnitude a and the asymmetry f.
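The mechanism by which a CAR prior imposes similarity across neighbors is easiest to see through its full conditional distributions. The sketch below assumes a common parametrization with binary adjacency weights and precision matrix $\tau(D - \rho W)$, where $W$ is the adjacency matrix and $D$ is diagonal with the number of neighbors of each region; the function names, the toy four-region map and the values of `phi` are hypothetical. Setting $\rho = 1$ gives the intrinsic CAR of Besag et al. (1991), whose joint density is improper, while $|\rho| < 1$ gives a proper distribution. The sketch also checks the conditional moments against the generic Gaussian identity based on the precision matrix.

```python
def car_precision(neighbors, rho=1.0, tau=1.0):
    """Precision matrix Q = tau * (D - rho * W) of a CAR prior with binary
    adjacency weights W and D = diag(number of neighbors)."""
    n = len(neighbors)
    Q = [[0.0] * n for _ in range(n)]
    for i, nbrs in neighbors.items():
        Q[i][i] = tau * len(nbrs)
        for j in nbrs:
            Q[i][j] = -tau * rho
    return Q


def car_full_conditional(i, phi, neighbors, rho=1.0, tau=1.0):
    """Mean and variance of phi_i given all other phi_j.

    The conditional mean is rho times the average of the neighbors' values,
    which is how the CAR prior pulls neighboring regions toward each other.
    """
    nbrs = neighbors[i]
    d_i = len(nbrs)
    mean = rho * sum(phi[j] for j in nbrs) / d_i
    var = 1.0 / (tau * d_i)
    return mean, var


def conditional_from_precision(i, phi, Q):
    """Same quantities from the generic Gaussian identity
    E[phi_i | rest] = -sum_{j != i} Q_ij phi_j / Q_ii, Var = 1 / Q_ii."""
    s = sum(Q[i][j] * phi[j] for j in range(len(phi)) if j != i)
    return -s / Q[i][i], 1.0 / Q[i][i]


# Hypothetical toy map: four regions on a line, 0 - 1 - 2 - 3.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
phi = [0.2, 0.4, 0.5, 0.9]  # e.g. region-level effects on log(c)

print(car_full_conditional(1, phi, neighbors, rho=0.9, tau=2.0))
print(conditional_from_precision(1, phi, car_precision(neighbors, rho=0.9, tau=2.0)))
```

Regions with more neighbors get smaller conditional variances, so they are pulled more tightly toward the average of their neighbors.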

This paper proposes a methodology that incorporates the above issues into a unified, multivariate spatio-temporal framework combining generalized logistic evolution with different joint specifications for the model components, some of which are spatially dependent and others not. Spatial dependence is explicitly introduced into the model via CAR distributions (Besag et al., 1991, Clayton et al., 1993) imposed on the relevant components rather than directly on the means. This hierarchical modeling strategy is common in many areas, including spatial regression (Gelfand et al., 2003) and spatial extremes (Cooley et al., 2007).
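The distinction between spatially structured and unstructured components can be sketched as follows. We assume, purely for illustration, a log link for the positive parameters, $\log c_i = \beta_c + \phi^c_i$ and $\log b_i = \beta_b + \phi^b_i$, where $\phi^c$ and $\phi^b$ are region-level effects that would receive CAR priors, while $a_i$ and $f_i$ are region-specific without spatial structure. The names `beta_c`, `beta_b`, `phi_c`, `phi_b` and all numeric values are hypothetical, and the paper's exact link functions and priors may differ.

```python
import math


def regional_means(t, beta_c, beta_b, phi_c, phi_b, a, f):
    """Mean cumulative counts M_i(t) for every region i.

    Spatial effects enter only through the components expected to be similar
    across neighbors (log c_i and log b_i); the magnitude a_i and asymmetry
    f_i are region-specific but carry no spatial structure.
    """
    means = []
    for i in range(len(a)):
        c_i = math.exp(beta_c + phi_c[i])  # infection rate, spatially structured
        b_i = math.exp(beta_b + phi_b[i])  # asymptote component, spatially structured
        means.append(a[i] / (b_i + math.exp(-c_i * t)) ** f[i])
    return means


# Hypothetical values for three regions; in the model, phi_c and phi_b would
# receive CAR priors and all quantities would be estimated.
beta_c, beta_b = math.log(0.1), math.log(0.5)
phi_c = [0.05, 0.10, 0.12]  # smooth across neighboring regions
phi_b = [-0.02, 0.00, 0.03]
a = [800.0, 1500.0, 400.0]  # non-spatial magnitudes
f = [1.2, 0.8, 1.5]         # non-spatial asymmetries

for t in (0, 30, 60):
    print(t, [round(m, 1) for m in regional_means(t, beta_c, beta_b, phi_c, phi_b, a, f)])
```

Placing the CAR structure on these components, rather than on $M_i(t)$ itself, keeps the closed-form generalized logistic mean for every region.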

In this study, all regions are analyzed jointly. An important feature of this setting is that it allows information to be borrowed from neighboring regions, enabling more precise predictions. We also consider predictive performance criteria for comparing the predictions of different models. These issues are addressed in Section 2. The Bayesian model is completed with prior specifications for the remaining model parameters and hyperparameters. The analysis was implemented via MCMC (Gamerman and Lopes, 2006) using Stan (Carpenter et al., 2017).

Data on COVID-19 pandemic counts in the 27 Brazilian states are used as an illustration. The data, covering January 23, 2020, to August 25, 2021, were collected from the GitHub repository at https://github.com/covid19br/site_antigo/tree/master/dados/EstadosCov19.csv. This repository is maintained by the Observatório COVID-19 BR, a volunteer group of scientists dedicated to bringing scientifically accurate information about the COVID-19 pandemic in Brazil into public debate. The repository relies on the official updates published by the Brazilian Ministry of Health through the Open Data SUS portal at https://opendatasus.saude.gov.br/dataset/srag-2021-a-2023. All the data were collected and updated in the repository automatically.

Analyzing the first wave of COVID-19 case counts (the first 36 weeks of data) from the 27 Brazilian states helps to understand the scenario in which this paper is set. Independent analyses of the states under model (1) were conducted to assess the possibility of spatial dependence across regions; details of the analysis are provided in Section 2.1. Spatial similarity can be assessed by inspecting the regional variation in the estimates of the key model parameters b and c.

Fig. 1 shows some results of these independent analyses. Following the reasoning of the previous subsection, it displays the estimates of parameters b and c across the states, allowing a visual assessment of their spatial similarity.

For both sets of parameters, there is an indication of spatial similarity across regions, with neighboring values exhibiting mostly smooth transitions. These results support the formal incorporation of spatial similarities into the modeling structure. Section 2.2 presents an approach to formally address this consideration. More importantly, this route leads to better-fitted models and better predictions.
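The visual assessment of Fig. 1 can be complemented by a numeric summary of spatial association. The sketch below computes Moran's I, a standard spatial autocorrelation index that the paper does not use, for region-level estimates such as those of b or c under binary, symmetric adjacency. The function name, toy map and values are hypothetical and do not come from the Brazilian data.

```python
def morans_i(values, neighbors):
    """Moran's I with binary, symmetric adjacency weights.

    Positive values indicate that neighboring regions have similar values,
    negative values indicate alternation, and the expectation under no
    spatial association is -1 / (n - 1).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(nbrs) for nbrs in neighbors.values())  # total weight
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical toy map: five regions on a line, 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
smooth_estimates = [0.08, 0.09, 0.11, 0.12, 0.14]  # e.g. estimates of c
print(round(morans_i(smooth_estimates, path), 3))
```

A value well above $-1/(n-1)$ would point in the same direction as the smooth transitions seen in the maps, although with few regions the index is noisy and a permutation test would be needed to judge significance.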

The paper is organized as follows. Section 2 describes the models used in this work. Section 3 discusses the guidelines adopted for Bayesian inference, including details regarding prediction. Section 4 briefly discusses the implementation used to generate the results from the considered class of models. Section 5 presents the results of simulation studies and illustrates the results using real data analysis. Section 6 presents conclusions and directions for future work.
