This retrospective study was conducted at Elisabeth-TweeSteden Hospital in Tilburg, The Netherlands, a tertiary referral hospital for VS microsurgery and Gamma Knife radiosurgery. Institutional review board approval was obtained and the requirement for informed consent was waived.
Materials
Elisabeth-TweeSteden Hospital has an established extensive database (including follow-up) of patients with unilateral sporadic VS treated with Gamma Knife radiosurgery [20, 21]. All tumors were annotated and volumetrically analyzed using the Gamma Knife treatment planning software (GammaPlan, Version 11, Elekta AB, Stockholm, Sweden).
A total of 100 patients were purposively sampled from the database based on their tumor volume in order to adhere to the natural distribution of VS tumor volumes in both wait-and-scan [7] and radiosurgery cohorts [21] (see Fig. 1). To verify that this annotation dataset provides adequate statistical power, the required sample size was determined with the confidence interval (CI) lower limit procedure, which has been shown to be a valid power analysis method for inter-observer reliability studies [22]. This method yielded a minimum required sample size of 26 (ρ = 0.9, ρ0 = 0.8, β = 0.8, α = 0.05, N = 5). The inclusion of 100 patients in the annotation dataset is therefore sufficient.
Fig. 1 Flowchart showing the inclusion and selection process
Imaging parameters
The patients included in our study underwent contrast-enhanced T1-weighted 3D MRI as part of their follow-up protocol after radiosurgical treatment. These scans were obtained between 2005 and 2020. Imaging was performed in the axial plane with a 1.0 T, 1.5 T, or 3.0 T scanner (Achieva, Intera, and Ingenia; all Philips Healthcare, Best, The Netherlands). Gadoterate meglumine (Dotarem, Guerbet) was administered intravenously (5 to 10 mmol, depending on body weight). Image acquisition parameters varied over the years: median echo time 4.6 ms (range: 3.9 – 6.9), median repetition time 25 ms (range: 8.4 – 26.6), median slice thickness 1.6 mm (range: 0.8 – 2.0), and median voxel spacing 0.78 mm (range: 0.25 – 1.0).
Observers and annotation
Five observers participated in this study: two senior neurosurgeons, both with extensive experience in segmenting VS as part of radiosurgical treatment planning, and three researchers with experience in segmenting VS as part of follow-up analyses. The participating researchers had previously been trained in VS annotation by the involved neurosurgeons. Annotation was performed in GammaPlan. Each observer annotated independently and had no access to prior information (e.g. earlier annotations or measurements). The semi-automated segmentation method included in GammaPlan aided the observers in segmenting the tumors (see Fig. 2). This method enables the observer to select a voxel-value range that corresponds to the voxel values of the tumor, resulting in a coarse initial segmentation. The observer can then manually fine-tune the segmentation.
Fig. 2 Visualization of the semi-automated annotation method used: (a) the original image; (b) the annotator selects a voxel-value range (depicted in blue) and annotates the image within that range (depicted in red); (c) a coarse initial segmentation; (d) final segmentation after manual fine-tuning
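The voxel-value range selection described above can be illustrated with a minimal sketch. This is not the GammaPlan implementation, which is not described here. The function names, the optional `region` argument (a stand-in for the area painted by the observer), and the toy image are assumptions made for illustration only.

```python
import numpy as np

def coarse_segmentation(image, lower, upper, region=None):
    """Return a boolean mask of voxels whose value lies in [lower, upper].

    `region` is a hypothetical optional boolean mask that restricts the
    result to the area the observer painted.
    """
    mask = (image >= lower) & (image <= upper)
    if region is not None:
        mask &= region
    return mask

def mask_volume_mm3(mask, voxel_size_mm):
    """Volume of a mask: number of selected voxels times the voxel volume."""
    return float(mask.sum() * np.prod(voxel_size_mm))

# Toy 3D image (assumption): a bright 2 x 2 x 2 cube on a dark background.
image = np.zeros((4, 4, 4))
image[1:3, 1:3, 1:3] = 100.0

mask = coarse_segmentation(image, lower=80, upper=120)
volume = mask_volume_mm3(mask, voxel_size_mm=(0.78, 0.78, 1.6))
```

In practice the coarse mask would still require the manual fine-tuning step, which this sketch does not model.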
Statistical analyses
The average observed volume was calculated for each subject. The relative volume standard deviation (\(\sigma_{\mathrm{rel}}\)) was calculated by dividing the standard deviation of the observed volumes for each subject by that subject's average observed volume. This metric is high when the observed volumes of a single subject deviate substantially between observers.
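As an illustration, the relative volume standard deviation could be computed as sketched below. The array layout (one row per subject, one column per observer), the example volumes, and the use of the sample standard deviation (`ddof=1`) are assumptions. The text does not state which standard deviation estimator was used.

```python
import numpy as np

def relative_volume_sd(volumes):
    """Per-subject standard deviation divided by per-subject mean volume.

    `volumes` has shape (n_subjects, n_observers). The sample standard
    deviation (ddof=1) is an assumption, not taken from the study.
    """
    volumes = np.asarray(volumes, dtype=float)
    mean_volume = volumes.mean(axis=1)
    sd_volume = volumes.std(axis=1, ddof=1)
    return sd_volume / mean_volume

# Hypothetical volumes (cm^3) for two subjects and five observers.
volumes = np.array([
    [1.0, 1.0, 1.0, 1.0, 1.0],   # perfect agreement
    [2.0, 4.0, 6.0, 8.0, 10.0],  # strong disagreement
])
sigma_rel = relative_volume_sd(volumes)
```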
Observer agreement was assessed using the limits of agreement with the mean (LOAM) method [23]. This procedure is a generalization of the commonly used Bland–Altman plots, which calculate agreement limits for two observers. The LOAM method can be used for multiple observers and reports the agreement limits together with a confidence interval. More specifically, this method calculates how much an individual measurement may plausibly deviate from the mean of the measurements of all observers for a single subject. The method also allows the source of the variation between observers to be identified by estimating both the inter-observer variance (\(\sigma_{B}^{2}\)), which represents the systematic differences between observers, and the residual variance (\(\sigma_{E}^{2}\)), which represents the random measurement error. The intraclass correlation coefficient (ICC) follows from this computation as well.
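A minimal sketch of such a computation is given below. It assumes a balanced two-way random effects model (every observer measures every subject) and uses method-of-moments (ANOVA) estimates of the variance components. The confidence intervals that accompany the LOAM are omitted, and the function name, dictionary keys, and example data are illustrative assumptions rather than the software used in this study.

```python
import numpy as np

def loam_analysis(y):
    """LOAM half-width, variance components and ICC for a balanced design.

    `y` has shape (a subjects, b observers). Estimates come from the sums
    of squares of a two-way ANOVA without replication.
    """
    y = np.asarray(y, dtype=float)
    a, b = y.shape
    grand = y.mean()
    subj = y.mean(axis=1, keepdims=True)  # per-subject means
    obs = y.mean(axis=0, keepdims=True)   # per-observer means

    ssa = b * ((subj - grand) ** 2).sum()        # between subjects
    ssb = a * ((obs - grand) ** 2).sum()         # between observers
    sse = ((y - subj - obs + grand) ** 2).sum()  # residual

    msa = ssa / (a - 1)
    msb = ssb / (b - 1)
    mse = sse / ((a - 1) * (b - 1))

    var_residual = mse
    # Negative moment estimates are truncated at zero (a design choice).
    var_observer = max((msb - mse) / a, 0.0)
    var_subject = max((msa - mse) / b, 0.0)

    # SSB + SSE equals the summed squared deviations from the subject means.
    loam = 1.96 * np.sqrt((ssb + sse) / (a * b))
    icc = var_subject / (var_subject + var_observer + var_residual)
    return {
        "loam": loam,
        "var_observer": var_observer,
        "var_residual": var_residual,
        "var_subject": var_subject,
        "icc": icc,
    }

# Hypothetical data: four subjects, three observers with fixed offsets.
subject_volume = np.array([1.0, 2.0, 4.0, 8.0])
observer_offset = np.array([-0.1, 0.0, 0.1])
result = loam_analysis(subject_volume[:, None] + observer_offset[None, :])
```

In this toy example all disagreement comes from the fixed observer offsets, so the estimated residual variance is numerically zero and the variation is attributed to the inter-observer component.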
Tumors were categorized into four size categories (I to IV) based on the quartiles of their average observed tumor volume. The LOAM and its variance components were calculated for each of the four size categories. Furthermore, to extend this analysis to the entire dataset, a sliding window of width 26 (i.e. our required sample size) was employed. Starting with the smallest 26 tumors, the LOAM and overall average observed volume were calculated for each set of 26 consecutive tumors, ending with the largest 26 tumors. This enabled us to calculate a continuous, volume-dependent LOAM.
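The sliding-window procedure can be sketched as follows. The simulated volumes, the noise level, and the function names are assumptions for illustration. The LOAM half-width is computed here as 1.96 times the root mean squared deviation from the subject means, without its confidence interval.

```python
import numpy as np

def loam_half_width(y):
    """1.96 times the root mean squared deviation from the subject means."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean(axis=1, keepdims=True)
    return 1.96 * np.sqrt((dev ** 2).mean())

def sliding_window_loam(volumes, width=26):
    """Return rows of (average volume, LOAM) for each window of tumors.

    Tumors are sorted by their average observed volume, and the window
    moves one tumor at a time from the smallest to the largest tumors.
    """
    volumes = np.asarray(volumes, dtype=float)
    order = np.argsort(volumes.mean(axis=1))
    ordered = volumes[order]
    results = []
    for start in range(len(ordered) - width + 1):
        window = ordered[start:start + width]
        results.append((window.mean(), loam_half_width(window)))
    return np.array(results)

# Simulated data (assumption): 100 tumors, 5 observers, 5% relative noise.
rng = np.random.default_rng(0)
true_volume = rng.lognormal(mean=0.0, sigma=1.0, size=100)
volumes = true_volume[:, None] * (1 + rng.normal(0.0, 0.05, size=(100, 5)))
curve = sliding_window_loam(volumes)
```

With 100 tumors and a window of 26, this yields 75 overlapping windows, each contributing one point to the volume-dependent curve.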
Because we hypothesized, based on clinical practice, that peritumoral cysts have an important effect on the inter-observer variability, we performed all analyses both including and excluding tumors with peritumoral cysts.
Both univariable and multivariable linear regression analyses were performed for all imaging parameters and tumor characteristics to investigate any significant associations with the relative volume standard deviation (\(\sigma_{\mathrm{rel}}\)).
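For illustration, an ordinary least squares fit of this kind can be sketched with NumPy alone. The predictors, their simulated values, and the effect sizes below are hypothetical and do not reflect the study data. The sketch returns coefficients, standard errors, and t statistics. P values would follow from a t distribution with n minus p degrees of freedom, where p is the number of fitted coefficients including the intercept (not computed here).

```python
import numpy as np

def ols(predictors, outcome):
    """Ordinary least squares with an intercept.

    Returns coefficients, standard errors and t statistics. The first
    entry of each array refers to the intercept.
    """
    x = np.asarray(predictors, dtype=float)
    y = np.asarray(outcome, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    design = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    dof = len(y) - design.shape[1]
    sigma2 = residuals @ residuals / dof
    covariance = sigma2 * np.linalg.inv(design.T @ design)
    se = np.sqrt(np.diag(covariance))
    return beta, se, beta / se

# Hypothetical predictors and outcome for 100 tumors.
rng = np.random.default_rng(0)
slice_thickness = rng.uniform(0.8, 2.0, size=100)
log_volume = rng.normal(0.0, 1.0, size=100)
rel_sd = 0.05 + 0.02 * slice_thickness + rng.normal(0.0, 0.001, size=100)

# Univariable model: one predictor at a time.
beta_uni, se_uni, t_uni = ols(slice_thickness, rel_sd)
# Multivariable model: all predictors together.
beta_multi, se_multi, t_multi = ols(
    np.column_stack([slice_thickness, log_volume]), rel_sd
)
```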
We considered P values smaller than 0.05 to indicate statistical significance. Analyses were performed using SPSS (Version 27.0; SPSS, Chicago, Illinois) and Python (Version 3.8.8).