Curated incidence of lysosomal storage diseases from the Taiwan Biobank

Here, using genetic data obtained from the TWB, we estimated that the combined incidence of 71 autosomal recessive LSDs is between 13 per 100,000 (pathogenic and likely pathogenic variants) and 94 per 100,000 (pathogenic and likely pathogenic variants plus variants of unknown significance [VUS]). This incidence range is considerably higher than the reported prevalence among clinical cases but similar to that obtained through newborn screening. LSDs are very rare, and diagnoses are often delayed or missed; an accurate estimation of the incidence of these diseases in Taiwan from clinical cases alone is therefore challenging, if not impossible. Estimation from genome-wide sequencing databases, as conducted in this study, or unbiased population screening offers an alternative route to understanding the true incidence of these diseases, and such approaches can assist in the development of policies that address the burden of rare diseases.

The conservative estimation data in the current study were more similar to the incidence rates observed in the clinic than the extended estimation data. For example, regarding MPS I, the conservative incidence estimate (0.03; 95% CI: 0.001–0.17) is similar to the published incidence in Taiwan (0.11; 95% CI: 0.003–0.61)13, confirming that this is an extremely rare disease. However, the extended and newborn screening incidences demonstrated a wider estimation range, implying that a milder or late-onset phenotype may exist that is not easily recognized by clinicians. Further understanding of the pathogenicity of VUS, either with functional or long-term follow-up data obtained through newborn screening, may further elucidate the true incidence of MPS I.

Although we analyzed limited genomic data in this study, the general incidence obtained is similar to that obtained in a previously published large-scale biobank study of the same population16. The carrier rate of Krabbe disease (GALC gene) in the previous study was estimated to be 1.67%, similar to the current study’s estimate (0.2–2.18%). For mucolipidosis type II/III (GNPTAB), the previous estimate was 0.44% and the current estimate is 0.3–1%; the difference between the two estimates is not significant. The comparison for Pompe disease (GAA) is more indirect: the previous group calculated only the allele frequency (0.38%) of GAA variants causing infantile-onset Pompe disease among 103,106 individuals in Taiwan16, whereas we included both late-onset and infantile-onset Pompe disease, yielding a conservative allele frequency of 0.65%. Overall, our data suggest that a small WGS dataset can yield estimates comparable to those from a large dataset such as the biobank. Because TWB 2.0 contained only 179 known disease-relevant regions16, its use may decrease the ability to detect rare variants in rare diseases. However, the current study found no differences between estimates from the larger SNP chip dataset and those from comprehensive whole-genome sequencing (WGS) data from a small population. This may be because only exonic and nearby intronic variants were analyzed. Further validation will be required when more WGS data become available.

In this study, the allele frequency in Taiwanese individuals was too low to calculate the variant incidence for 18 of the 71 genes underlying autosomal recessive LSDs, and no pathogenic variants were recorded for an additional 18 of the remaining 53 genes. For example, for NPC1 and NPC2, which cause Niemann-Pick disease type C, no NPC1 variants were identified in the WGS data from the 1,495 individuals in the TWB, and only one NPC2 variant was identified. The NPC2 variant was excluded because its severity score was 4 out of 13. The published prevalence of Niemann-Pick disease type C is 0.25 per 100,000 in the United Arab Emirates and 2.2 per 100,000 in Portugal2, which converts to a carrier rate of at least 1 in 400. This range indicates that the variants should have been present among the 1,495 individuals studied here. Our current data therefore suggest an even lower incidence of Niemann-Pick disease type C in Taiwan, although clinical cases have been reported7. Whether selection bias exists, that is, whether the disease is prevalent only in specific populations, requires further study. Selection bias is less likely in our study because of Taiwan’s relatively homogeneous Han Chinese population12. We are not aware of any clustering of such LSDs in specific populations in this country.
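The conversion from a published prevalence to a carrier rate, and the expectation that carriers should appear among 1,495 individuals, can be illustrated with a minimal sketch. It assumes Hardy–Weinberg equilibrium, treats the published prevalence as an approximation of the homozygote frequency q², and pools NPC1 and NPC2 as a single locus; the function names are hypothetical and are not taken from the study's pipeline.

```python
import math


def carrier_rate_from_prevalence(prevalence_per_100k: float) -> float:
    """Carrier frequency 2q(1-q) implied by a recessive disease frequency q^2.

    Assumption: the published prevalence approximates q^2 under
    Hardy-Weinberg equilibrium.
    """
    q = math.sqrt(prevalence_per_100k / 100_000)
    return 2 * q * (1 - q)


def expected_carriers(carrier_rate: float, n_individuals: int) -> float:
    """Expected number of carriers in a sample of n_individuals."""
    return carrier_rate * n_individuals


def prob_no_carrier(carrier_rate: float, n_individuals: int) -> float:
    """Probability that a random sample contains no carrier at all."""
    return (1 - carrier_rate) ** n_individuals


if __name__ == "__main__":
    sample_size = 1495  # WGS individuals in the TWB, as stated in the text
    for prevalence in (0.25, 2.2):  # per 100,000, published figures cited above
        rate = carrier_rate_from_prevalence(prevalence)
        print(
            f"prevalence {prevalence}/100,000: "
            f"carrier rate about 1 in {1 / rate:.0f}, "
            f"expected carriers {expected_carriers(rate, sample_size):.1f}, "
            f"P(no carrier) {prob_no_carrier(rate, sample_size):.4f}"
        )
```

Under these assumptions, even the lower prevalence figure implies several expected carriers in a sample of this size, which is the reasoning behind the statement that the variants should have been present.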

Many biobanks, such as the Global Biobank Meta-analysis Initiative (GBMI), UK Biobank, Estonian Biobank and China Biobank, have been established worldwide as a result of improvements in NGS techniques. Many researchers have tried to use data from different biobanks to predict the risk or prevalence of different diseases. Most select likely pathogenic variants16,17 and use the Hardy–Weinberg equation to calculate the disease incidence, as in this study. Currently, most studies rely on biobank data to determine disease incidence and identify genetic and non-genetic factors contributing to various common chronic diseases. However, to date, no additional omics studies have been conducted. In the future, it would be highly valuable to organize further omics studies to delve deeper into the underlying mechanisms and molecular aspects of these diseases. Such studies could provide a more comprehensive understanding of the diseases’ complexities and potentially lead to more targeted and effective interventions. In addition, we estimated both conservative and extended disease incidences because of the uncertainty of VUS curation and to better bound the disease incidence range. Nevertheless, because biobanks are generated using different types of omics data, such as genotype arrays and WGS, additional caution should be taken when applying the resulting datasets to estimate disease prevalence. Furthermore, those biobanks, although population-based, may not represent the general population regarding sociodemographic or health-related characteristics18 and may not be a suitable resource for determining disease prevalence and incidence rates. UK Biobank has released 50,000 exomes19 and will add an additional 200,000 exomes to become the largest open-access resource of WES data linked to health records. A better understanding of rare disease incidence is expected in the future following analyses of a larger WES dataset.
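As a rough illustration of the Hardy–Weinberg calculation mentioned above, the sketch below derives a conservative and an extended per-gene estimate from variant allele counts. The allele counts are hypothetical placeholders, not data from this study, and the function name is an assumption made for illustration. Summing allele counts across variants assumes that each variant lies on a different haplotype, so variants in cis would inflate the estimate.

```python
def hardy_weinberg_estimates(allele_counts, n_individuals):
    """Per-gene estimates for an autosomal recessive disease under Hardy-Weinberg.

    allele_counts: observed allele counts of the qualifying variants in one gene.
    n_individuals: number of sequenced (diploid) individuals.
    """
    total_alleles = 2 * n_individuals
    # Combined frequency of disease alleles in the gene (assumes no variants in cis).
    q = sum(allele_counts) / total_alleles
    return {
        "allele_frequency": q,
        "carrier_rate": 2 * q * (1 - q),       # heterozygotes, 2pq
        "incidence_per_100k": q**2 * 100_000,  # affected homozygotes, q^2
    }


if __name__ == "__main__":
    n = 1495  # WGS individuals in the TWB, as stated in the text

    # Hypothetical allele counts, for illustration only.
    pathogenic_counts = [3, 2]  # pathogenic / likely pathogenic variants
    vus_counts = [4, 1, 1]      # variants of unknown significance

    conservative = hardy_weinberg_estimates(pathogenic_counts, n)
    extended = hardy_weinberg_estimates(pathogenic_counts + vus_counts, n)

    print("conservative:", conservative)
    print("extended:", extended)
```

Because incidence scales with the square of the combined allele frequency, adding VUS to the variant set widens the range quickly, which is why reporting both estimates gives a more honest picture than either alone.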

Our study did not assess X-linked LSDs because the equation for X-linked disorders requires different interpretation methods, especially for those diseases with late-onset phenotypes. For example, the newborn screening for Fabry disease by enzyme assay revealed an incidence rate of 1 in 1250 among Taiwanese males20, most of whom had the GLA IVS4+919G>A variant. The incidence rate of the GLA IVS4+919G>A variant is estimated to be 1 in 600 among newborns21; however, it is unknown whether those individuals participated without bias in the small WGS dataset used in this study; therefore, we did not include X-linked LSDs.

Another limitation of our study is associated with the short-read WGS method. For example, the GBA gene recombines with its pseudogene; thus, it is challenging to accurately determine where the variants are located using short-read WGS. Therefore, we could only roughly estimate the incidence of Gaucher disease. However, the incidence range in the newborn screening data was similar to that estimated from the dataset22. Thus, we consider that our results provide useful information for estimating the burden of autosomal recessive LSDs, although further clarification, such as improved methods or data from biochemical screening, may be warranted for specific conditions.

Finally, although we used the WGS database, mutations in deep introns not regarded as critical for splicing may have been missed, and copy number changes were not reported. However, we could not demonstrate a significant difference in incidence when comparing our data to available NGS data regarding biochemical and protein levels, implying that using such genomic data for estimation has minimal impact. When calculating incidence, the possibility of in-cis variants was not considered; thus, the incidence may have been overestimated. The increasing acceptance of preconception carrier screening could also influence the clinical incidence, since prenatal diagnosis, and abortion if the fetus is found to be affected, are permitted in Taiwanese culture. Such incidence drift has been observed with thalassemia and spinal muscular atrophy carrier screening, both of which are performed widely in Taiwan23.

In conclusion, the current study generated useful incidence data regarding LSDs in Taiwan. Our curated, conservative estimation of incidence could guide public health measures in calculating disease or drug burdens. Our extended estimation could also facilitate newborn and high-risk screening. Incidence estimation from genomic data will improve further as the clinical significance of variants becomes better understood.
