After applying the exclusion criteria to the 502,368 UKB participants, approximately 221,000, 210,000 and 90,700 participants were available for analysis for PD, T2D and dementia, respectively, for both “HES only” and “HES + GP” populations (detailed flow charts in Supplementary Figs. 2–4).
Table 1 shows that for all three diseases, including GP data at least doubled the number of incident cases compared with those diagnosed when only using HES/Death data (“HES only” population). For example, the number of incident cases for dementia increased from 619 in the “HES only” population to 1390 in the “HES + GP” population. Note that in the “HES + GP” population, cases diagnosed in the GP data prior to baseline were excluded, and therefore the number of cases diagnosed in the HES/Death data will be lower than that in the “HES only” population.
Table 1 Incident cases among the “HES only” and “HES + GP” study populations for each disease. The rows show the “number of incident cases / number of participants” and follow-up period. IQR: interquartile range. We note that the “HES only” and “HES + GP” population sizes are slightly different; this is because some incident cases in the “HES only” population are prevalent cases in the GP data and were therefore excluded from the “HES + GP” population, as shown in Fig. 1.
Figure 2 shows that of the 786 dementia cases (before the GP censoring date) in the “HES + GP” population that were initially recorded only in the GP data, 421 appeared later in the HES/Death data, after the GP censoring date. Similar phenomena were observed for PD and T2D in Table 1 (detailed Venn diagrams in Supplementary Figs. 5, 7, and 9).
Combining the numbers in Table 1 and Fig. 2, we can examine how many of the prevalent cases captured by the GP data, and therefore excluded from the “HES + GP” population, subsequently appeared in the HES/Death data. Table 1 shows that 32 (= 90,700 − 90,668) individuals from the “HES only” population for dementia were excluded from the “HES + GP” population. Figure 2 shows that 604 dementia cases in the “HES + GP” population were captured in the HES/Death data before the GP censoring date, compared with 619 dementia cases in the “HES only” population (Table 1). We can therefore conclude that of the 32 individuals identified as prevalent dementia cases in the GP data, only 15 (= 619 − 604) were subsequently captured in the HES/Death data; the remaining 17 were incorrectly regarded as non-cases in the “HES only” population. Similar considerations apply to the Supplementary Figures for PD and T2D.
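The bookkeeping in the preceding paragraphs can be checked with a few lines of arithmetic. The sketch below only restates the dementia counts quoted from Table 1 and Fig. 2; the variable names are ours and purely illustrative.

```python
# Dementia counts quoted in the text (Table 1 and Fig. 2).
n_hes_only = 90_700          # participants, "HES only" population
n_hes_gp = 90_668            # participants, "HES + GP" population
cases_hes_only = 619         # incident cases, "HES only" population
cases_hes_in_hes_gp = 604    # HES/Death cases before GP censoring, "HES + GP"

# Prevalent GP cases removed when moving to the "HES + GP" population.
excluded_prevalent = n_hes_only - n_hes_gp

# Of those, the ones that later surfaced as "incident" cases in HES/Death.
later_in_hes = cases_hes_only - cases_hes_in_hes_gp

# The rest were treated as non-cases in the "HES only" population.
treated_as_non_cases = excluded_prevalent - later_in_hes

# Cases recorded only in GP data before the GP censoring date (Fig. 2).
gp_only_before_censoring = 786
gp_only_later_in_hes = 421
gp_unique = gp_only_before_censoring - gp_only_later_in_hes

print(excluded_prevalent, later_in_hes, treated_as_non_cases, gp_unique)
```

Under these counts, roughly half of the prevalent GP cases never reached the HES/Death record within the follow-up considered here.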
Fig. 2 Venn diagram comparing incident cases of dementia from HES/Death and those from GP records in the “HES + GP” population (n = 90,668). Among the 786 cases in GP (but not in HES/Death) data prior to the GP censoring date, 421 appeared in HES/Death later; i.e. 365 (= 786 − 421) cases were unique to the GP data even after allowing for the extended follow-up in the HES/Death data. Please see Sect. 3.4 and Supplementary Table 6 for details on different censoring approaches. Using the HES/Death data beyond the GP censoring date, 2218 (= 2639 − 421) further cases were recorded in the HES/Death data, but we do not know how many appeared in the subsequent GP records due to the lack of these records after 2016–2017. Dth: Death
For incident cases present in both HES/Death and GP data during the full follow-up period (i.e. until the HES censoring date), we plotted histograms (Supplementary Figs. 5, 7, 9) showing the distribution of the time difference (i.e. lag) in diagnosis dates between the two data sources. The median (interquartile range, IQR) time differences in years were 2.31 (0.83, 4.60) for PD, 2.82 (1.07, 5.30) for T2D, and 2.25 (0.76, 4.20) for dementia.
We note that the above represents the latency (i.e. the time between a diagnosis being recorded in GP data and in HES/Death data) among those who had records in both GP and HES/Death data. For participants with an incident GP diagnosis, only 65.6% of dementia cases, 69.0% of PD cases, and 58.5% of T2D cases had their initial GP diagnosis recorded in HES/Death within 7 years of the GP diagnosis (Supplementary Figs. 6, 8, and 10).
We note that these numbers reflect recorded diagnoses made during the available follow-up period (that differs for each participant). For example, if a participant had a GP diagnosis of dementia in 2016 (i.e. close to GP censoring date), and was followed up for a further 5 years until 2021 (i.e. close to the HES censoring date), this might not be long enough for the diagnosis to be captured in the HES record. In contrast, a participant with an earlier GP diagnosis (e.g. 2010) would have had a longer time period in which their diagnosis could be captured in the HES data.
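To make the follow-up caveat concrete, the sketch below summarises lags as a median and IQR and classifies each GP-diagnosed case by whether a HES/Death record appeared within a fixed window. It is an illustration rather than the method used for the supplementary figures: the helper names, the "undetermined" category and the toy dates are all assumptions introduced here.

```python
from datetime import date
from statistics import quantiles
from typing import Dict, List, Optional, Tuple

# One case: (GP diagnosis date, HES/Death diagnosis date or None, HES censoring date)
Case = Tuple[date, Optional[date], date]

def years_between(start: date, end: date) -> float:
    """Elapsed time in years, using an average year length of 365.25 days."""
    return (end - start).days / 365.25

def lag_summary(lags: List[float]) -> Tuple[float, float, float]:
    """Return (median, lower quartile, upper quartile) of the lags."""
    q1, med, q3 = quantiles(lags, n=4, method="inclusive")
    return med, q1, q3

def classify(case: Case, window_years: float = 7.0) -> str:
    """Label a GP-diagnosed case by HES/Death capture within the window."""
    gp, hes, censor = case
    if hes is not None and years_between(gp, hes) <= window_years:
        return "captured"
    if years_between(gp, censor) >= window_years:
        return "not_captured"      # full window observed, no record within it
    return "undetermined"          # follow-up shorter than the window

def tally(cases: List[Case], window_years: float = 7.0) -> Dict[str, int]:
    counts = {"captured": 0, "not_captured": 0, "undetermined": 0}
    for case in cases:
        counts[classify(case, window_years)] += 1
    return counts

# Hypothetical toy cases, not UK Biobank data.
toy_cases: List[Case] = [
    (date(2010, 1, 1), date(2012, 6, 1), date(2021, 1, 1)),  # lag of about 2.4 years
    (date(2016, 1, 1), None, date(2021, 1, 1)),              # only about 5 years observed
    (date(2009, 1, 1), None, date(2021, 1, 1)),              # 12 years, never recorded
    (date(2011, 1, 1), date(2019, 6, 1), date(2021, 1, 1)),  # recorded after about 8.4 years
]
toy_counts = tally(toy_cases)
```

A participant diagnosed in GP data in 2016 falls into the "undetermined" group here, which mirrors the example in the text: five years of follow-up cannot establish whether the diagnosis would have reached HES/Death within seven.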
The baseline characteristics of the “HES only” and “HES + GP” populations are very similar. The overlapping variables of the three diseases for the “HES + GP” population are shown in Table 2. Detailed baseline characteristics of both populations are in Supplementary Tables 4–5.
Table 2 Baseline characteristics of the “HES + GP” population for Parkinson’s disease (PD), type 2 diabetes (T2D), and dementia. Family history represents family history of PD, T2D, and dementia
Cumulative incidence
To illustrate differences in cumulative incidence stratified by a risk factor, we plotted the age-specific cumulative incidence of each disease stratified by family history, a predictor common to all three diseases. Figure 3 shows that for PD and T2D, the additional GP data approximately doubles the number of incident cases across all ages, regardless of family history. This trend is maintained for dementia, but is less pronounced at older ages (75–80 years). These age-specific cumulative incidence plots are overall consistent with the incident cases shown in Table 1.
Fig. 3 Age-specific cumulative incidence plots by family history, for all three diseases. Note that the ranges of the y-axis are different in the three subplots
Results obtained from Cox models
We built Cox proportional hazards models for each of the disease outcomes defined in Methods. The resulting forest plots (Figs. 4, 5, 6 and 7) show the HRs obtained from the “HES only” and “HES + GP” populations, respectively (details in Supplementary Tables 9–12). Similar results were obtained using complete-case analyses (Supplementary Figs. 12–15 and Supplementary Tables 13–16). The HRs are largely in the same direction and of comparable magnitude, indicating overall agreement between the two populations. The confidence intervals (CI) of the HRs obtained from the “HES + GP” population are narrower than those from the “HES only” population, owing to the increased statistical power from the additional incident cases in the GP data.
To provide a statistical comparison between the two HRs, we calculated the corresponding ratio of HRs (RHR) and used bootstrapping to obtain its 95% CI. An RHR < 1 means the “HES only” population yields a smaller HR than the “HES + GP” population, and vice versa. Among the overlapping risk factors, only age had a statistically significant RHR for all three health outcomes, i.e. an HR that differed by source of case ascertainment.
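The bootstrap comparison can be sketched as follows. This is a minimal illustration under several assumptions that are ours, not the paper's: the RHR is taken as HR("HES only") / HR("HES + GP"); participants are resampled with replacement and both estimates are recomputed on each resample; and, to keep the example dependency-free, a crude incidence rate ratio stands in for the Cox HR. In practice `rate_ratio` would be replaced by a fitted Cox model. All names and the toy data generator are hypothetical.

```python
import random
from typing import Iterable, List, NamedTuple, Tuple

class Record(NamedTuple):
    exposed: bool      # hypothetical binary risk factor
    time_hes: float    # follow-up (years) under "HES only" ascertainment
    event_hes: bool
    time_gp: float     # follow-up (years) under "HES + GP" ascertainment
    event_gp: bool

def rate_ratio(rows: Iterable[Tuple[bool, float, bool]]) -> float:
    """Crude incidence rate ratio, exposed vs unexposed (stand-in for a Cox HR).
    A 0.5 continuity correction keeps the ratio finite when a group has no events."""
    events = {True: 0.5, False: 0.5}
    person_time = {True: 0.0, False: 0.0}
    for exposed, time, event in rows:
        events[exposed] += int(event)
        person_time[exposed] += time
    return (events[True] / person_time[True]) / (events[False] / person_time[False])

def rhr(sample: List[Record]) -> float:
    hr_hes = rate_ratio((r.exposed, r.time_hes, r.event_hes) for r in sample)
    hr_gp = rate_ratio((r.exposed, r.time_gp, r.event_gp) for r in sample)
    return hr_hes / hr_gp  # assumed direction: "HES only" over "HES + GP"

def bootstrap_rhr(records: List[Record], n_boot: int = 500,
                  seed: int = 0) -> Tuple[float, float, float]:
    """Point estimate and percentile 95% CI of the RHR."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        sample = rng.choices(records, k=len(records))
        try:
            draws.append(rhr(sample))
        except ZeroDivisionError:  # a resample with an empty exposure group
            continue
    draws.sort()
    lower = draws[int(0.025 * len(draws))]
    upper = draws[int(0.975 * len(draws)) - 1]
    return rhr(records), lower, upper

def make_toy_records(n: int = 2000, seed: int = 1, end: float = 7.0) -> List[Record]:
    """Synthetic data: the GP source adds a second, independent route to diagnosis."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        exposed = rng.random() < 0.5
        rate = 0.02 * (2.0 if exposed else 1.0)
        t_hes = rng.expovariate(rate)
        t_gp = min(t_hes, rng.expovariate(rate))
        out.append(Record(exposed, min(t_hes, end), t_hes < end,
                          min(t_gp, end), t_gp < end))
    return out
```

Calling `bootstrap_rhr(make_toy_records())` returns the point estimate with the lower and upper percentile bounds; an interval that excludes 1 would correspond to a statistically significant RHR in the sense used above.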
Our estimated effect of “hearing loss” on dementia in the “HES only” population (HR = 0.96, 95% CI 0.81, 1.14) is in the opposite direction to the existing literature, and therefore we performed additional analyses to examine this inconsistency. Our results (Supplementary Tables 7–8) showed that this is partly due to the short follow-up period in our analyses, in which we censored both populations at the GP censoring date, which is approximately 5 years earlier than the HES censoring date. In an additional sensitivity analysis using a longer follow-up period (i.e. up to the HES censoring date) (Supplementary Tables 7–8), the HR of “hearing loss” in the “HES only” population returned to the expected direction (HR = 1.04, 95% CI 0.97, 1.12). These results show that, on occasion, a limited follow-up period in primary care data may alter conclusions about a risk factor association.
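The two intervals quoted above also show how much precision the longer follow-up adds. As a rough worked example, and assuming the reported intervals are Wald-type (symmetric on the log scale, which the text does not state), the standard error of log(HR) can be recovered from the CI limits. The function name is ours.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def se_log_hr(lower: float, upper: float) -> float:
    """Standard error of log(HR) implied by a Wald-type 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)

# "Hearing loss" and dementia, "HES only" population (values quoted in the text).
se_short = se_log_hr(0.81, 1.14)  # censored at the GP censoring date
se_long = se_log_hr(0.97, 1.12)   # censored at the HES censoring date

print(f"SE(log HR), short follow-up: {se_short:.3f}")
print(f"SE(log HR), long follow-up:  {se_long:.3f}")
```

Under this assumption the implied standard error is less than half as large with the longer follow-up, although the interval still includes 1 in both analyses.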
High BMI appears to be inversely associated with incident dementia risk in both “HES only” and “HES + GP” populations. This is most likely caused by reverse causation owing to the short follow-up period of this analysis (we censored both populations by the GP censoring date).
Fig. 4 Forest plot showing hazard ratios (HR) obtained from the Cox proportional hazards models for Parkinson’s disease (PD), using the “HES only” and “HES + GP” populations, respectively. The corresponding ratio of HR (RHR) is shown with its 95% CI obtained from bootstrapping
Fig. 5 Forest plot for type 2 diabetes (male only)
Fig. 6 Forest plot for type 2 diabetes (female only)
Fig. 7