Accurate digital quantification of tau pathology in progressive supranuclear palsy

Donors and brain regions

A total of 240 formalin-fixed, paraffin-embedded slides were obtained from 32 brains (2–10 slides per brain, median = 8.5, IQR = 6) donated to the Cambridge Brain Bank by patients with a clinical and pathological diagnosis of progressive supranuclear palsy (PSP) who also met the Rainwater criteria for PSP (Table 1), under the Neuropathology Research in Dementia (NERD) study with ethical approval from the Wales 6 Research Ethics Committee. The slides comprised 185 cortical slides (29 pre-frontal, 21 premotor, 20 primary motor, 22 primary somatosensory, 23 temporal, 20 parietal, 28 occipital, 22 cingulate), 25 basal ganglia slides and 30 cerebellar (dentate nucleus) slides. Of the 240 slides, 13 slides were used for model development and 6 as a held-out test set. Training and held-out test slides were annotated by a trained expert (TP), and a neuropathologist (AQ) independently annotated the held-out test slides to calculate inter-rater reliability. Following pipeline development, 227 novel slides were used for validation against the PSP staging scheme [2], and all slides were used for further analyses.

Table 1 Clinical and neuropathological data of donor participants with pathological diagnosis of PSP in the study

Tissue processing and immunohistochemistry

Immunostaining for hyperphosphorylated tau using AT8 (MN1020, Thermo Scientific, USA) was performed, followed by 3,3′-diaminobenzidine (DAB) staining to visualise pathological tau as a brown reaction product. Counter-staining was performed using haematoxylin to visualise cell nuclei as blue reaction products. Slide images were acquired by an Aperio AT2 whole slide scanner (Leica) at 40× magnification.

Image pre-processing

All pre-processing steps (see Fig. 1) were carried out in QuPath (version 0.4.3) software [23]. First, colour deconvolution was applied to all scanned bright-field (H-DAB) whole slide images to digitally separate stains into three different channels: the DAB channel for hyperphosphorylated tau, the haematoxylin channel for cell nuclei, and a residual channel. Slides were then manually inspected to remove obvious artefacts such as DAB artefacts, de-focused regions, folded tissue, air bubbles and other confounding objects. Brain tissue was separated from the background and segmented into the respective regions of interest; for cortical regions, a semi-automated grey and white matter segmentation was carried out using the simple tissue detection tool, followed by the wand tool to manually fine-edit the segmentation. For basal ganglia regions, the putamen, globus pallidus (including internal and external parts), and subthalamic nucleus were manually segmented by neuropathologists (AQ, SSK). The dentate nucleus was segmented from the cerebellum slide by a trained expert (TP).

Fig. 1

Tau pipeline overview. a In QuPath, a whole slide image is digitally separated into haematoxylin, DAB and residual channels. Tissue segmentation for the regions of interest follows, where grey matter is segmented from cortical slides; the subthalamic nucleus, globus pallidus and putamen are segmented from basal ganglia slides; and the dentate nucleus is segmented from cerebellum slides. Obvious artefacts are also manually removed. b DAB thresholding is performed to detect tau objects (in green) and features are extracted for each object. c Tau classification (examples presented in yellow and blue boxes for a pre-frontal slide) begins by separating non-tau artefacts from tau objects using a universal screening classifier; tau objects are then classified into different tau types using region-specific tau classifiers (4 in total: for cortical regions; the putamen; the subthalamic nucleus and globus pallidus; and the dentate nucleus). Final slide checking is required to ensure accurate results before subsequent analysis. TA tufted astrocyte, NFT neurofibrillary tangle, CB coiled body, TF tau fragment, DAB 3,3′-diaminobenzidine

DAB thresholding and feature extraction

The thresholder tool in QuPath software [23] was applied to the DAB channel to detect tau objects (resolution = high, pre-filter = Gaussian, smoothing sigma = 0, threshold = 0.25, minimum object size = 5 μm²). Areas with DAB intensity above the threshold were labelled as tau objects. Optimal thresholder parameters were chosen by visual inspection to maximise the detection of tau while minimising the detection of noise and artefacts.
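The thresholding and size filtering are performed inside QuPath, but the logic can be sketched in Python. In the sketch below, the threshold (0.25) and minimum object size (5 μm²) are the values given above, while the pixel size (`um_per_px`) is an assumed, hypothetical value, not taken from the text:

```python
import numpy as np
from scipy import ndimage

def detect_tau_objects(dab, threshold=0.25, min_area_um2=5.0, um_per_px=0.25):
    """Label connected DAB-positive regions above `threshold` and drop objects
    smaller than `min_area_um2`. `um_per_px` is an assumed pixel size used
    only to convert the area criterion into a pixel count."""
    mask = dab > threshold                         # pixels above the DAB threshold
    labels, n = ndimage.label(mask)                # connected-component labelling
    min_px = min_area_um2 / (um_per_px ** 2)       # um^2 -> pixel count
    areas = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    keep = np.flatnonzero(areas >= min_px) + 1     # label ids passing the size filter
    out = np.where(np.isin(labels, keep), labels, 0)
    return out, len(keep)
```

Real whole-slide images would of course be processed in tiles rather than as one array.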

To reduce the creation of artefacts resulting from bleeding of digital stains between the haematoxylin and DAB channels, we applied an initial screening classifier. This is a random forest classifier trained on all extracted features to separate non-tau from tau objects. Non-tau objects include artefacts from slide preparation, and brown biological elements such as iron granules and lipofuscin.

In total, 54 features were calculated using built-in functions in QuPath and extracted from each tau object (see Table 2). These comprised 6 morphological features and 35 intensity features, the latter consisting of 5 statistics (minimum, maximum, mean, median and standard deviation) calculated for each of 7 channels (red, green, blue, DAB, haematoxylin, brightness and saturation). Thirteen Haralick features computed from the DAB channel provided textural information.
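To illustrate how Haralick-type texture features are derived, the sketch below builds a grey-level co-occurrence matrix (GLCM) for a single pixel offset and computes two of the classic Haralick statistics (contrast and angular second moment); this is a simplified stand-in for QuPath's implementation, not a reproduction of it:

```python
import numpy as np

def glcm_horizontal(q, levels):
    """Symmetric, normalised grey-level co-occurrence matrix for the
    horizontal (dx = 1) offset; `q` must already be quantised to integer
    grey levels in [0, levels)."""
    a, b = q[:, :-1].ravel(), q[:, 1:].ravel()
    P = np.zeros((levels, levels))
    np.add.at(P, (a, b), 1.0)   # count co-occurring grey-level pairs
    P += P.T                    # symmetrise
    return P / P.sum()

def haralick_contrast(P):
    """Haralick contrast: co-occurrences weighted by squared level difference."""
    i, j = np.indices(P.shape)
    return float(((i - j) ** 2 * P).sum())

def haralick_asm(P):
    """Angular second moment ('energy'): high for uniform texture."""
    return float((P ** 2).sum())
```

A perfectly flat patch has contrast 0 and ASM 1; alternating stripes maximise contrast for a 2-level image.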

Table 2 Haralick and morphological features extracted from detected objects used in training the machine learning model

Training set

To create an equal sampling area for each training slide, a grid view was used (grid size = 250 × 250 μm). Each tau object detected by DAB thresholding was manually assigned to one of five classes: 'coiled body' (CB), 'neurofibrillary tangle' (NFT), 'tufted astrocyte' (TA), 'tau fragment' (TF) and 'non-tau'. A CB is an oligodendroglial tau inclusion with a coiled structure and is smaller than an NFT, a neuronal tau inclusion with an elongated, flame-like shape. A TA is generally large, with star-like tufts of densely packed fibres in astrocytes, and TFs are threads or fragments of tau not identified as CB, NFT or TA. The screening classifier was trained on 9827 tau and 12,006 non-tau objects annotated from cortical and basal ganglia slides (see Fig. 2).

Fig. 2figure 2

Schematic diagram showing annotated data and hyper-parameter tuning steps for the a screening classifier and b region-specific tau classifiers. c For each loop through the stratified tenfold cross-validation (CV), data normalisation, feature selection using recursive feature elimination with a random forest, and hyper-parameter tuning of the random forest were carried out

For the cortical tau classifier, training objects were sampled from boxes defined over areas of high tau burden, yielding 3954 objects (661 CB, 126 NFT, 254 TA, 2913 TF). For basal ganglia and the dentate nucleus, 4-grid boxes with 1-grid spacing between the boxes were drawn to cover the entire area for sampling. The tau classifier for the putamen was trained on 3699 tau objects (335 CB, 48 NFT, 200 TA, 3116 TF) and the tau classifier for the subthalamic nucleus and globus pallidus was trained on 13,686 tau objects (601 CB, 97 NFT, 12,988 TF). The tau classifier for the dentate nucleus was trained on 2186 tau objects (147 CB, 234 NFT, 1805 TF). The tau classifiers for the subthalamic nucleus and globus pallidus, and dentate nucleus were not trained to detect TA as they are very rare in these regions, unlike in the putamen and cortex.

Held-out test set

Two slides from each of the cortical, basal ganglia and dentate nucleus regions were randomly selected as held-out test slides and annotated by a trained expert (TP) and a neuropathologist (AQ). Cohen's kappa was used to assess inter-rater reliability alongside classification performance against the trained expert. In total, 5754 objects were annotated for cortical slides (296 CB, 78 NFT, 237 TA, 1761 TF, 3382 non-tau). For the basal ganglia, 6528 objects were annotated (153 CB, 21 NFT, 44 TA, 2795 TF, 3515 non-tau), with 2207 objects in the globus pallidus, 2199 in the putamen and 2122 in the subthalamic nucleus. For the dentate nucleus, 2280 objects were annotated (18 CB, 26 NFT, 844 TF, 1392 non-tau).

Model development

Random forests are tree-based ensemble algorithms that re-sample the data to create many bootstrapped (smaller) datasets. A decision tree is then grown on each bootstrap dataset, considering a random subset of variables at each split. Random forest classification pools class-prediction votes from all trees in the forest and outputs the class receiving the majority of votes. There are many extensions of the standard random forest that tackle class imbalance, which can be broadly grouped into two techniques: cost-sensitive learning and re-sampling [24]. The former changes the weight or penalty parameters of the algorithm, while the latter directly changes the class distribution by re-sampling the dataset. Re-sampling techniques have been widely shown to improve classification performance more than cost-sensitive learning [18, 24]. Therefore, in this study, we used a balanced random forest, which randomly under-samples the majority class in each bootstrap so that the data are balanced [24]. Because a random forest classifier makes its final prediction by majority voting, it implicitly assumes that each class has an equal likelihood, or threshold, of occurring. This can be adjusted to address severe class imbalance using a threshold-moving technique [25,26,27]. This is especially relevant for tau burden classification, as the relative proportions of tau types differ between cortical and subcortical structures [2].
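The balanced-bootstrap idea can be sketched in a few lines with scikit-learn decision trees. This toy class is for illustration only; the study used imbalanced-learn's `BalancedRandomForestClassifier`:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class TinyBalancedForest:
    """Minimal balanced random forest: each bootstrap under-samples every
    class down to the size of the rarest class before fitting a tree;
    prediction is by majority vote across trees."""
    def __init__(self, n_trees=100, seed=0):
        self.n_trees = n_trees
        self.rng = np.random.default_rng(seed)
        self.trees, self.classes_ = [], None

    def fit(self, X, y):
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        n_min = min(np.sum(y == c) for c in self.classes_)
        for _ in range(self.n_trees):
            # Balanced bootstrap: draw n_min samples (with replacement) per class.
            idx = np.concatenate([
                self.rng.choice(np.flatnonzero(y == c), n_min, replace=True)
                for c in self.classes_])
            tree = DecisionTreeClassifier(
                max_features='sqrt',
                random_state=int(self.rng.integers(1 << 31)))
            self.trees.append(tree.fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        votes = np.stack([t.predict(X) for t in self.trees])
        return np.array([self.classes_[np.argmax(
            [np.sum(col == c) for c in self.classes_])] for col in votes.T])
```

Because every tree sees a class-balanced sample, rare classes such as NFT and TA are not drowned out by the abundant TF class during training.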

Hyper-parameter tuning

The scikit-learn (version 0.24.1) [28] and imbalanced-learn (version 0.8.1) [29] libraries in Python (version 3.9.7) were used to implement the random forest algorithm for the tau classification pipeline. The data were standardised (mean = 0, SD = 1) and tenfold stratified cross-validation was used to train the classifiers, partitioning the data into 10 folds (see Fig. 2). At each iteration, 9 of the 10 folds were used as training data and the remaining fold was used to validate training performance.

In the balanced random forest classifier, each bootstrap sample was class balanced. During the training phase, feature selection was performed using recursive feature elimination. Hyper-parameters of the balanced random forest were tuned using a random search with the following parameter space: n_features_to_select = [28, 30, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54] (number of features to select), n_estimators = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000] (number of trees in the forest), max_features = [0.2, 0.4, 0.6, 0.8, 1] (number of features to consider for best split), max_depth = [5, 10, 15, 20, None] (maximum depth of the tree), min_samples_split = [2, 5, 10] (minimum samples required to split further), min_samples_leaf = [1, 2, 4] (minimum samples required to be a leaf node), sampling_strategy = [‘auto’, ‘all’, ‘not majority’, ‘majority’] (sampling strategy to sample the dataset), max_samples = [0.25, 0.5, 0.75, None] (number of bootstrap samples to draw to train each base estimator), class_weight = [‘balanced’] (weight or importance associated with the classes). The balanced random forest was optimised based on the mean area under the 4 precision-recall curves (PR-AUC) using a one-vs-rest approach (TA vs. rest, CB vs. rest, NFT vs. rest, TF vs. rest).
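The tuning loop can be sketched as follows. The parameter grid below is a hypothetical subset of the space listed above, and a plain `RandomForestClassifier` stands in for the balanced variant to keep the example self-contained; the scoring (mean one-vs-rest PR-AUC over stratified folds) follows the procedure described in the text:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import average_precision_score
from sklearn.preprocessing import label_binarize

# Hypothetical subset of the parameter space reported in the text.
PARAM_SPACE = {'n_estimators': [100, 200], 'max_depth': [5, 10, None],
               'max_features': [0.2, 0.6, 1.0], 'min_samples_leaf': [1, 2, 4]}

def random_search(X, y, n_iter=5, n_splits=10, seed=0):
    """Random search: sample candidate hyper-parameters, score each by the
    mean one-vs-rest PR-AUC (average precision) across stratified CV folds,
    and keep the best candidate."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    best_score, best_params = -np.inf, None
    for _ in range(n_iter):
        params = {k: v[rng.integers(len(v))] for k, v in PARAM_SPACE.items()}
        fold_scores = []
        cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        for tr, va in cv.split(X, y):
            clf = RandomForestClassifier(random_state=seed, **params).fit(X[tr], y[tr])
            proba = clf.predict_proba(X[va])
            Y = label_binarize(y[va], classes=classes)   # one-vs-rest targets
            fold_scores.append(average_precision_score(Y, proba, average='macro'))
        if np.mean(fold_scores) > best_score:
            best_score, best_params = np.mean(fold_scores), params
    return best_score, best_params
```

In the actual pipeline, recursive feature elimination and normalisation would run inside each fold as well (Fig. 2c).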

Class-specific threshold tuning

Using the hyper-parameters found, optimal class-specific thresholds were tuned to tackle class imbalance. Predicted class scores were used to re-compute precision-recall curves using a one-vs-rest approach as above, and for each class the threshold along its precision-recall curve that maximised the F1-score was selected. After obtaining class-specific thresholds, the class probabilities of each object were thresholded to obtain the predicted class label. Brain regions with similar tau morphology and distribution were grouped together, and class-specific thresholds were tuned separately for each regional grouping. In this study, there were 4 regional groupings: cortex; putamen; globus pallidus and subthalamic nucleus; and dentate nucleus.

If an object's class probability passed the class-specific threshold for exactly one class, the object was labelled as that class. To mirror human classification of tau objects, we also assessed the ambiguity of classification: if more than one class, or no class, passed its class-specific threshold, the object was labelled as 'Ambiguous' and discarded from further analyses.
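The threshold-moving and ambiguity rules can be sketched as follows, assuming a fitted classifier's predicted probabilities are available; class names and probability values in the usage example are illustrative:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def tune_thresholds(proba, y_true, classes):
    """Pick, for each class, the probability threshold that maximises the
    F1-score along its one-vs-rest precision-recall curve."""
    thresholds = {}
    for k, c in enumerate(classes):
        p, r, t = precision_recall_curve((y_true == c).astype(int), proba[:, k])
        f1 = 2 * p * r / np.maximum(p + r, 1e-12)
        thresholds[c] = t[np.argmax(f1[:-1])]   # last p/r point has no threshold
    return thresholds

def apply_thresholds(proba, classes, thresholds):
    """Label each object with the unique class whose threshold it passes;
    objects passing zero or several thresholds become 'Ambiguous'."""
    labels = []
    for row in proba:
        passed = [c for k, c in enumerate(classes) if row[k] >= thresholds[c]]
        labels.append(passed[0] if len(passed) == 1 else 'Ambiguous')
    return labels
```

Tuning the thresholds separately per regional grouping, as described above, amounts to calling `tune_thresholds` once per grouping on that grouping's validation data.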

After classification, the precision, recall, macro F1-score and confusion matrix of the model were collected. The model was then applied to the held-out test set to evaluate its performance generalisability. Finally, the optimised model was applied to the remaining novel slides to perform tau classification and quantification for further analyses.

Tau quantification

The four types of tau quantified were CB, NFT, TA and TF. This enabled the calculation of total tau and of tau hallmarks (all tau excluding TF). From the raw counts, tau density was calculated as the number of tau objects per unit area (μm²) of the region quantified. For cortical regions, tau density was quantified in cortical grey matter, while the entire area of each nucleus was used for the basal ganglia and dentate nucleus.
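The density calculation itself is straightforward; the counts and region area below are hypothetical values for illustration:

```python
def tau_density(counts, region_area_um2):
    """Tau density per type: object count divided by region area (um^2)."""
    return {tau_type: n / region_area_um2 for tau_type, n in counts.items()}

counts = {'CB': 120, 'NFT': 35, 'TA': 18, 'TF': 2400}    # hypothetical counts
area = 2.5e7                                             # hypothetical area (um^2)
density = tau_density(counts, area)
total_tau = sum(counts.values()) / area                   # all four types
hallmark = sum(counts[k] for k in ('CB', 'NFT', 'TA')) / area  # excludes TF
```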

Correlation with PSP staging

Polar plots created with the plotly package in Python [30] were used to show the regional distribution of tau quantified by the pipeline, both for total tau and for tau density by tau type. Spearman's rank correlation coefficient was used to compute the correlation between tau density quantified across regions and PSP stage. Correlations between region-specific tau density and region-specific severity ratings were also computed within the regions of the PSP staging scheme.
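With scipy, the correlation step reduces to a single call; the per-donor stage and density values below are made up for illustration:

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical data: PSP stage and regional tau density for eight donors.
stage = np.array([2, 2, 3, 4, 4, 5, 6, 6])
density = np.array([1.1, 0.9, 1.8, 2.5, 2.2, 3.9, 4.1, 5.0])

rho, p_value = spearmanr(stage, density)  # rank correlation, handles ties
```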

Clinicopathological correlations

For this analysis, we included 28 PSP subjects with available PSPRS scores; because of the skewness of the tau density distribution, a logarithmic (log10) transformation was applied to tau density. To investigate the relationship between postmortem tau and PSPRS score, the brms package in R (version 1.4.1717) [31,32,33] was used to construct Bayesian linear mixed regression models. Bayesian analysis yields posterior probability distributions that express the uncertainty of the regression coefficient estimates [34] and permits the null hypothesis to be either rejected or accepted [34]. The analysis was first carried out with PSP stage as the predictor, PSPRS total score as the outcome variable, and disease duration and PSPRS-to-death interval as covariates, to establish a baseline relationship between the staging scheme and PSPRS score. The same analysis was repeated with tau density quantified from all regions, and separately from only cortical or only subcortical regions, as the predictor. To test whether tau type-specific burden was more informative of PSPRS score than total tau burden, total tau and tau type-specific models were created for model comparison. To estimate the strength of evidence in favour of the tau type-specific models over the total tau model, we used a standard Bayes factor (BF) cut-off of 3 to indicate at least moderate evidence [35]. In the final model, the strength of each regression coefficient was assessed using the region of practical equivalence (ROPE). As the optimal ROPE is not established a priori, we used a standard approach, defining the ROPE as a range of ± 0.1 standard deviations of the standardised outcome (PSPRS score) [36]. If 95% of the credible interval (CrI) of a regression coefficient falls completely within the ROPE, the effect of that parameter is considered practically equivalent to the null value for practical purposes [35, 37].
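The baseline model described above can be written out explicitly (ignoring any random-effect terms); covariate names follow the text, and the priors match those reported in the following paragraph, with SD parameterisation throughout:

```latex
\begin{aligned}
\mathrm{PSPRS}_i &\sim \mathcal{N}(\mu_i,\ \sigma) \\
\mu_i &= \beta_0 + \beta_1\,\mathrm{Stage}_i + \beta_2\,\mathrm{Duration}_i + \beta_3\,\mathrm{Interval}_i \\
\beta_{1},\beta_{2},\beta_{3} &\sim \mathcal{N}(0,\ 100) \\
\beta_0 &\sim \mathrm{Student\text{-}t}(3,\ 53.5,\ 12.6), \qquad
\sigma \sim \mathrm{Student\text{-}t}^{+}(3,\ 0,\ 12.6)
\end{aligned}
```

The tau density models replace the Stage term with log10-transformed tau density (total or type-specific) as the predictor.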

A Gaussian model family was selected based on the distribution of the data. A weakly informative normal prior (mean = 0, SD = 100) was chosen for the regression coefficients, and default priors were used for the intercept (Student-t prior; df = 3, mean = 53.5, SD = 12.6) and sigma (Student-t prior; df = 3, mean = 0, scale = 12.6). The model configuration was the same for all models (warmup = 10,000, iterations = 20,000). All models went through prior and posterior predictive checks to ensure that the configurations were appropriate. All models converged with no divergences or diagnostic warnings, and in all cases the R̂ convergence values were ≈ 1.00 (see Additional file 1). Given the complexity of our analysis, a sensitivity analysis of priors was conducted to assess only the effect of prior choice on neuropathological severity (PSP stage, tau burden) in the final models. We chose two other weakly informative normal priors, one more informative (mean = 0, SD = 50) and the other less informative (mean = 0, SD = 150), to assess the sensitivity of the posterior estimates to the choice of prior.
