The adequacy of the algorithm has been independently evaluated by three companies. Two of them took a simulation-based approach: they simulated study data based on real ongoing or completed studies and subsequently introduced under-reporting. The third tested whether the algorithm detects sites for which AE-related protocol deviations had been recorded.
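The simulation-based evaluations injected under-reporting into otherwise complete study data. The Python sketch below shows one way such an injection could work. It is not the procedure used by either company. It assumes per-patient AE counts and interprets Study-UR as the fraction of sites that under-report and Site-UR as the probability that each AE at such a site goes unreported (binomial thinning). Both interpretations and the function name `inject_under_reporting` are hypothetical.

```python
import numpy as np


def inject_under_reporting(ae_counts, site_ids, study_ur, site_ur, rng):
    """Hypothetical helper: apply under-reporting to a random subset of sites.

    ae_counts: per-patient AE counts (integers)
    site_ids:  site label for each patient
    study_ur:  assumed meaning: fraction of sites that under-report
    site_ur:   assumed meaning: probability that each AE at such a site is dropped
    Returns the thinned counts and the set of under-reporting sites.
    """
    ae_counts = np.asarray(ae_counts)
    site_ids = np.asarray(site_ids)
    sites = np.unique(site_ids)
    n_bad = int(round(study_ur * len(sites)))
    bad_sites = rng.choice(sites, size=n_bad, replace=False)
    is_bad = np.isin(site_ids, bad_sites)
    reported = ae_counts.copy()
    # Binomial thinning: each AE at an under-reporting site is kept with prob. 1 - site_ur.
    reported[is_bad] = rng.binomial(ae_counts[is_bad], 1.0 - site_ur)
    return reported, set(bad_sites.tolist())


rng = np.random.default_rng(0)
site_ids = np.repeat(np.arange(20), 10)   # 20 sites with 10 patients each
true_counts = rng.poisson(5, size=200)    # simulated "complete" AE counts
reported, bad = inject_under_reporting(true_counts, site_ids, 0.1, 0.5, rng)
```

Thinning individual AEs, rather than removing whole patients, keeps the patient and visit structure intact, so only the reported AE counts change.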
To compare the performance metrics of the approaches, we categorized the different scenarios as either high AE volume or medium AE volume scenarios. The main findings are summarized in Fig. 1. Under high AE volume conditions, under-reporting is easier to detect, and 50–75% of all sites with AE under-reporting can be detected. With a medium AE volume, 20–50% of all under-reporting sites can still be detected.
These results depend on further scenario parameters, which are summarized in Table 2; more details are provided in the Supplementary Materials.
For the high AE volume scenario, we selected the “optimal study conditions” scenario from the Roche evaluation. It was not based on any study in the portfolio; instead, it uses fixed simulation parameters that emulate a study with high AE rates. From the Boehringer Ingelheim approach, the large simulated trial provides a similar scenario. Merck stratified its reported results by the number of patients per site and by the selected evaluation visit (visit_med75), defined as 0.75 times the median of the maximum visit count of each patient. Both the number of patients and the number of visits correlate directly with the total AE count expected at a site. Hence, the Merck analysis including only sites with visit_med75 ≥ 10 was selected as the high AE volume scenario.
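The visit_med75 definition can be expressed directly in code. In this sketch, the input maps each site to the maximum visit count of each of its patients. Computing visit_med75 per site rather than per study, and applying no rounding, are assumptions; `visit_med75` and `high_volume_sites` are hypothetical helper names.

```python
from statistics import median


def visit_med75(max_visits_per_patient):
    """0.75 times the median of each patient's maximum visit count (no rounding; assumed)."""
    return 0.75 * median(max_visits_per_patient)


def high_volume_sites(site_to_max_visits, threshold=10):
    """Sites whose visit_med75 reaches the threshold (>= 10 in the Merck high-volume analysis)."""
    return sorted(
        site
        for site, max_visits in site_to_max_visits.items()
        if visit_med75(max_visits) >= threshold
    )


sites = {
    "A": [12, 16, 20],  # median 16 -> visit_med75 = 12.0
    "B": [8, 10, 14],   # median 10 -> visit_med75 = 7.5
    "C": [14, 14],      # median 14 -> visit_med75 = 10.5
}
print(high_volume_sites(sites))  # ['A', 'C']
```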
Maximum TPRs were 1.000 (Roche), 0.875 (Boehringer Ingelheim), and 0.714 (Merck). In the Roche scenario, with its many similar sites, each hosting 10 patients with frequent visits, under-reporting is relatively easy to detect. In the Boehringer Ingelheim and Merck scenarios, however, sites are more heterogeneous and include sites with fewer patients and visits, which makes under-reporting harder to detect. Collectively, these results show that under conditions with high AE rates, high TPRs can be obtained using the algorithm. Decreasing the number of patients in the Merck scenario (or the rate of under-reporting in the other scenarios) increased the detection difficulty and reduced the TPR. Roche’s ideal scenario reported a very low FPR of 0.002, while Boehringer Ingelheim’s more heterogeneous scenario reported a fairly high FPR of 0.1488. Depending on the follow-up action for flagged sites, the Boehringer Ingelheim flagging threshold would need to be tightened, accepting a lower TPR, to bring the FPR down to a more practical level.
We can combine these empirical results into likely performance estimates for future applications of the algorithm. These estimates are not universal: they depend on the AE volume, the tolerable FPR, and the rate at which non-compliant sites under-report AEs, and they represent estimated averages across all three evaluations. Altogether, for high AE volume scenarios and a target FPR < 0.01, the results suggest a TPR > 0.5 for a Site-UR of 0.25–0.5 and a TPR > 0.75 for a Site-UR of 0.5–0.75.
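The trade-off between TPR and FPR can be made concrete with a small helper. The sketch below assumes that each site has a score (for example, an estimated under-reporting probability) and a known ground-truth label. It flags sites whose score is at or above a threshold and picks the most permissive threshold that keeps the FPR at or below a target. The function names and toy data are hypothetical.

```python
def tpr_fpr(scores, truth, threshold):
    """TPR and FPR when sites with score >= threshold are flagged.

    Assumes at least one under-reporting (True) and one compliant (False) site.
    """
    tp = sum(1 for s, t in zip(scores, truth) if t and s >= threshold)
    fp = sum(1 for s, t in zip(scores, truth) if not t and s >= threshold)
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    return tp / positives, fp / negatives


def best_threshold_for_fpr(scores, truth, max_fpr):
    """Lowest threshold (highest TPR) whose FPR stays <= max_fpr, as (threshold, TPR, FPR)."""
    best = None
    # Lowering the threshold can only add flags, so FPR never decreases along this loop.
    for threshold in sorted(set(scores), reverse=True):
        tpr, fpr = tpr_fpr(scores, truth, threshold)
        if fpr > max_fpr:
            break
        best = (threshold, tpr, fpr)
    return best


scores = [0.99, 0.97, 0.95, 0.90, 0.80, 0.60, 0.40, 0.20]
truth = [True, True, False, True, False, False, False, False]
print(best_threshold_for_fpr(scores, truth, 0.1))  # (0.97, 0.666..., 0.0)
print(best_threshold_for_fpr(scores, truth, 0.2))  # (0.9, 1.0, 0.2)
```

In this toy data, relaxing the FPR target from 0.1 to 0.2 raises the TPR from 2/3 to 1.0. This mirrors, in miniature, the trade-off discussed for the Boehringer Ingelheim scenario.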
For studies with medium AE volume, we compared the portfolio-based results of the Roche and Merck approaches with the medium trial scenario of the Boehringer Ingelheim approach. Merck reported a TPR of 0.2; Boehringer Ingelheim reported a TPR of 0.889 at the cost of a very high FPR of 0.405 for a fixed under-reporting rate of 0.25; and Roche reported TPRs of 0.213, 0.493, and 0.695 for Site-URs of 0.25, 0.5, and 0.75, respectively. As for the high AE volume scenarios, we can combine these results into estimates. For medium AE volume scenarios, sites with a Site-UR of 0.25 can expect a TPR of approximately 0.25, and sites with higher Site-URs of 0.5–0.75 can expect a TPR greater than 0.5. At these TPRs, the expected FPR is approximately 0.025. When a higher FPR can be tolerated, a higher TPR can be obtained (see the Boehringer Ingelheim approach).
These estimates assume that the Study-UR in a medium AE volume trial is low and does not exceed 0.16, the highest ratio tested in the Boehringer Ingelheim experiments.
Each evaluation approach included some features that generated unique insights. The Roche approach compared the algorithm against heuristic detection methods and found that its flagging pattern resembled flagging by boxplot outlier statistics of site AE rates. However, flagging on the basis of the under-reporting probability yielded a more favorable TPR and FPR. Moreover, the algorithm's default settings were found to provide the best performance. Roche also tested a low level of under-reporting (Site-UR 0.1), which resulted in a TPR below 0.07, too low to be relevant. This implies that the Site-UR should be greater than 0.25 for under-reporting to be detected effectively. In the Boehringer Ingelheim approach, the Benjamini–Hochberg [14] multiplicity correction offered by the algorithm was found to be too conservative; it reduced overall classification performance as measured by the area under the receiver operating characteristic curve (ROC AUC), a metric that is independent of the classification cut-off threshold [19]. Additionally, approximately 50% of all sites with accurately detectable under-reporting could already be identified once 25% of the study time following the first visit had elapsed in the medium trial, and 32% in the large trial.
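For comparison, the boxplot heuristic can be sketched as flagging sites whose AE rate lies below the lower Tukey fence, Q1 minus 1.5 times the IQR. The exact boxplot rule and rate definition used by Roche are not stated here. The fence multiplier `k = 1.5`, the dictionary of site AE rates, and the helper name `boxplot_low_outliers` are assumptions.

```python
import numpy as np


def boxplot_low_outliers(site_rates, k=1.5):
    """Hypothetical heuristic: flag sites whose AE rate lies below Q1 - k * IQR.

    Only the lower fence is used, because the concern is under-reporting.
    """
    names = list(site_rates)
    rates = np.array([site_rates[name] for name in names], dtype=float)
    q1, q3 = np.percentile(rates, [25, 75])
    lower_fence = q1 - k * (q3 - q1)
    return [name for name, rate in zip(names, rates) if rate < lower_fence]


rates = {"S1": 0.50, "S2": 0.52, "S3": 0.48, "S4": 0.55,
         "S5": 0.47, "S6": 0.51, "S7": 0.10}
print(boxplot_low_outliers(rates))  # ['S7'] (lower fence is about 0.415 here)
```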
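The Benjamini–Hochberg adjustment and the ROC AUC can likewise be sketched in plain Python. The code shows the textbook step-up procedure and the rank-based (Mann–Whitney) form of the AUC. It is not the algorithm's own code, and how the algorithm groups p-values before adjusting them is not described here. One way to check whether a correction degrades discrimination is to compare `roc_auc` on raw and adjusted p-values, using 1 - p as the score.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up), returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def roc_auc(scores, truth):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))  # approx. [0.04, 0.0533, 0.0533, 0.2]
print(roc_auc([0.9, 0.8, 0.3, 0.1], [True, False, True, False]))  # 0.75
```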
Merck reported that performance varied between therapeutic areas and that therapeutic areas with high enrollment, such as vaccine trials, had the highest TPRs. Merck also noted that for 30% of the sites with AE-related protocol deviations, the average AE count reported by the site at visit_med75 was higher than the corresponding study-wide average. Such sites cannot be detected by the algorithm because it only identifies sites with a (significantly) lower average than the study.
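A simplified resampling test shows why such sites cannot be flagged. This is an illustrative stand-in, not the algorithm's actual implementation. It estimates the probability that a pseudo-site, built by drawing the same number of patients at random from the study pool, has a mean AE count at or below the observed site mean. The test is one-sided, so a site reporting more AEs than the study average receives a value near 1 and is never flagged. The function name `lower_tail_prob`, the use of per-patient AE counts at visit_med75, and the inclusion of the site's own patients in the pool are assumptions.

```python
import numpy as np


def lower_tail_prob(site_counts, study_counts, n_draws=10_000, seed=0):
    """Fraction of resampled pseudo-sites whose mean AE count is <= the observed site mean.

    Small values suggest under-reporting. Sites above the study average get values
    near 1, so a lower-tail test like this cannot flag them.
    """
    rng = np.random.default_rng(seed)
    site_counts = np.asarray(site_counts, dtype=float)
    study_counts = np.asarray(study_counts, dtype=float)
    draws = rng.choice(study_counts, size=(n_draws, site_counts.size), replace=True)
    return float(np.mean(draws.mean(axis=1) <= site_counts.mean()))


study = [3, 4, 5, 6, 7] * 20                 # hypothetical per-patient AE counts
print(lower_tail_prob([0, 0, 1, 0, 1], study))  # 0.0: strongly below the study average
print(lower_tail_prob([10, 9, 11], study))      # 1.0: above average, never flagged
```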