Using matrix assisted laser desorption ionisation mass spectrometry combined with machine learning for vaccine authenticity screening

Analysis of vaccines and falsified constituents by MALDI-MS

Four authentic, commercially available vaccines and eight falsified surrogates, previously reported in falsified vaccine products3, were used in this study. The authentic vaccines were Nimenrix (Pfizer Ltd, Sandwich, UK), a conjugate vaccine that protects against Neisseria meningitidis groups A, C, W-135 and Y; Engerix B (GlaxoSmithKline, Brentford, UK), which protects against hepatitis B virus (HBV) infection; Flucelvax Tetra (Seqirus Ltd., Maidenhead, UK), which protects against influenza (Sept/Oct 2021 to early 2022 season); and Ixiaro (Valneva Ltd., Fleet, UK), for immunisation against Japanese encephalitis virus infection. Information about the genuine vaccines and falsified vaccine surrogates is provided in Table 1 (refs. 3, 8, 44–49).

Table 1 Samples used for analysis

We performed sample analysis in parallel on two separate MALDI-MS systems, both routinely used for microorganism clinical testing with worldwide deployment: a MALDI Biotyper Sirius (Bruker Daltonics) and a VITEK MS (bioMérieux, Craponne, France). The two instruments provided very similar performance when combined with data modelling but, interestingly, provided slightly different mass spectral profiles when visually compared. First, we acquired mass spectra using methods adapted from the standard in vitro diagnostic (IVD) parameters provided on both instruments. We made slight adjustments to the laser raster pattern and percentage energy range to accommodate a broader range of sample types. Spectra were acquired over three different overlapping m/z ranges: 0–900, 700–2500 and 2000–20,000. Representative spectra for Engerix B and the eight falsified constituent samples at the m/z 700–2500 and m/z 2000–20,000 mass ranges are shown in Supplementary Figs. 1, 2 for the Biotyper Sirius and VITEK MS instruments, respectively. Visible peaks in the low-mass range included matrix peaks that were common to all samples and could be identified from matrix blanks, as well as analyte peaks related to the individual samples. Given the rich spectral data obtained in the m/z 0–900 range, where vaccine-specific excipients were found, we decided to focus on this m/z range in further analyses. Figure 2 shows representative mass spectra for the Engerix B vaccine and each of the surrogate falsified samples, as well as blank CHCA matrix, in the m/z 0–900 range (similar comparisons for the other vaccines are provided in Supplementary Figs. 3, 4). Non-matrix peaks that were unique to either individual vaccines or falsified constituents were identified by manual inspection of the spectra. The spectral peaks in Fig. 3a, b illustrate the presence and absence of mass spectral peaks observed for Engerix B and the falsified vaccine constituents. These analyses established the proof-of-principle that the MALDI-MS systems were capable of measuring mass spectral peaks that can distinguish genuine comparator vaccines from falsified vaccine surrogates.

Fig. 2: Representative mass spectra (m/z 0–900) for α-cyano-4-hydroxycinnamic acid (CHCA) matrix, Engerix B vaccine and eight samples of other compounds and mixtures previously reported as being constituents of falsified vaccines.

a Biotyper Sirius mass spectra. b VITEK mass spectrometry (MS) spectra. Through the presence, absence and relative intensity ratios of peaks in the spectra, the genuine vaccine can be distinguished from the falsified constituents by manual inspection of spectra. Common matrix peaks are indicated by shaded bars.

Fig. 3: Schema showing samples and a pooled quality control (QC) sample being spotted onto a matrix-assisted laser desorption/ionisation (MALDI) target.

A pooled QC sample was prepared from the vaccines and falsified samples. An Assist Plus robot was used to combine the matrix with each sample in a 1:1 (V/V) ratio and then spot onto the MALDI plate. Only the QC and first three samples are illustrated, but all four authentic vaccines and eight falsified constituent samples were prepared in the same way across multiple MALDI plates which were analysed in a random sequence within the MALDI instruments. CHCA: α-cyano-4-hydroxycinnamic acid; MALDI-MS: matrix-assisted laser desorption/ionisation-mass spectrometry. This figure was created using BioRender.com.

MALDI method development and validation

Having established the feasibility of distinguishing vaccines and falsified constituents by manual inspection, we next developed and validated a method and workflow for data processing and analysis. The reproducibility of MALDI-MS mass spectra is known to be largely affected by matrix type, sample composition and matrix-sample crystallisation conditions, as well as the specific laser ablation parameters50,51,52. We, therefore, investigated analytical reproducibility on both platforms.

To determine analytical “spot-to-spot” reproducibility and intra-batch (vaccine vial-to-vial) reproducibility, we analysed replicates of the four authentic vaccine samples and eight falsified surrogates. For each sample vial, we created four replicate spots on the MALDI target plate and repeated this for three separate vials (same manufacturer batch number/part number), giving 12 MALDI sample spots for each vaccine and falsified constituent. All samples were distributed across three Bruker MALDI plates or six bioMérieux MALDI slides (the plates for the two systems have different dimensions). We also created a pooled quality control (QC) sample comprising an equal-volume mixture of each of the four authentic vaccines and eight falsified vaccine samples. A schematic illustrating how the samples were spotted and the plates configured is shown in Fig. 3.

Each MALDI spot was analysed under the same settings for each instrument. A randomised acquisition sequence was used to control for any bias in sample preparation or run order. Table 2 provides the percentage relative standard deviation (RSD) of the total ion intensity for all 12 replicates of each sample and for the 24 QC replicates, prior to intensity calibration, from analysis on the Sirius MALDI platform (equivalent data for the VITEK are given in Supplementary Table 1). These results show the total variation of the vaccine or falsified constituent samples. Excluding Amikacin, the RSD values ranged from 18 to 44% over all sample replicates for each group. This reproducibility in signal intensity was similar to the RSDs reported in other MALDI-based profiling studies using other sample types53. Figure 4a shows the vial-to-vial reproducibility specifically (i.e., inter-vial variability) for each genuine vaccine and falsified constituent, comprising individual percentage RSD calculations for the four sample preparation replicates of each vial. Equivalent data for the VITEK are shown in Supplementary Fig. 5a.
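The percentage RSD of total ion intensity described above can be illustrated with a minimal Python sketch. The replicate spectra are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def total_ion_intensity(intensities):
    """Sum of all intensity values in one spectrum."""
    return sum(intensities)


def percent_rsd(values):
    """Percentage relative standard deviation: 100 * SD / mean.

    Uses the sample standard deviation (n - 1), which is an assumption.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero, so the RSD is undefined")
    return 100.0 * stdev(values) / m


# Hypothetical intensity vectors for three replicate spots of one sample.
replicates = [
    [10.0, 20.0, 30.0],
    [12.0, 18.0, 36.0],
    [9.0, 22.0, 23.0],
]

tic = [total_ion_intensity(spectrum) for spectrum in replicates]
rsd = percent_rsd(tic)
print(f"Total ion intensities: {tic}, RSD = {rsd:.1f}%")
```

Applying the same function to the four spots from a single vial, rather than to all 12 replicates, would give per-vial values of the kind plotted in Fig. 4a.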

Table 2 Evaluated reproducibility of the raw data from the Biotyper Sirius (0–900 m/z)

Fig. 4: Method validation using mass spectrometry data.

a The percentage relative standard deviation (RSD) values for each vial per sample are plotted showing the range and mean. b The total ion count (TIC) for each quality control (QC) sample replicate plotted in consecutive run order shows no particular bias (replicates spotted on different target plates are alternately shaded/white). c TIC, laser power, and number of shots of the laser for replicates plotted consecutively for each QC sample.

Analysis of Amikacin, Gentamicin, and Nimenrix gave some of the highest RSD values and the total RSD for all 12 replicates of Amikacin was anomalously high at 122% in the Sirius data (see Table 2). These higher percentage RSD values correlated with poorer co-crystallisation of the sample with the CHCA matrix on the MALDI plate prior to analysis. For these three samples, all 12 replicates exhibited a shiny appearance on the spot surface as opposed to appearing matte with visible matrix crystals observed for most other samples. For Amikacin, the dried spots maintained a droplet-like three-dimensional structure (unlike all other samples which dried flat) and may have resulted in poor sample ionisation and, subsequently, greater intensity variation reflected in the percentage RSD values. This demonstrates the importance of ensuring optimal sample-matrix crystallisation conditions.

To investigate whether there was any observable bias in the intensity measurements, we next plotted the relationship between run order and peak intensity across the QC samples. Figure 4b illustrates the result from the Sirius showing no observable bias (similar results were obtained from the VITEK shown in Supplementary Fig. 5b). This suggested the process of analysing the MALDI plate in the ion source does not lead to bias in intensity measurement over time. Finally, in order to establish whether the variability observed in replicates of intensity measurements (indicated by the RSD values) was influenced by the laser power or the number of times the laser was fired, we plotted the laser power of the last 50 shots acquired (in the analysis of each sample spot) against the corresponding TICs and the total number of accumulated shots for each replicate in run order for the QC samples for the Bruker Sirius analysis (Fig. 4c). No correlation was observed suggesting total signal intensity was not biased by any variation in the laser power or in the number of laser firings that may occur between the analysis of different spots.

Developing a data processing and analysis workflow using MALDIquant

After establishing that multiple authentic and falsified vaccine constituents could be reproducibly differentiated by the identification of unique mass spectral peaks, and having established reproducibility of peak intensities across replicate samples, we next developed a spectral data processing workflow using the MALDIquant R package. Figure 5a illustrates the main steps in the workflow developed. This includes combining the full spectrum data from all samples into a table for each replicate across all samples, baseline correction, peak intensity normalisation and peak identification. These steps were performed to reduce experimental and analytical variability in the dataset, and to align peaks and their intensities between samples. To do this, we evaluated each step using our vaccine and falsified vaccine sample dataset. The data processing was performed using data from both MALDI platforms. Spectra files were imported into R in mzXML format, with quality control by visual inspection.

Fig. 5: Data processing steps using MALDIquant (Bruker Biotyper Sirius data).

a MALDIquant workflow. b Baseline drift correction using the TopHat algorithm, shown for hyaluronic acid spectra. c The effect of probabilistic quotient normalisation (PQN) on the percentage relative standard deviation (RSD) for the vaccine, falsified constituent and quality control (QC) sample replicates, comparing values before and after normalisation. d QC spectrum showing peaks labelled A–E used to illustrate m/z variation. e Box plots illustrating variation in m/z across 24 QC samples for the peaks labelled A–E in panel d. The line in the grey box indicates the median value, with the box limits showing the interquartile range. Whiskers extend to the maximum and minimum values. f Comparison of different signal-to-noise ratio (SNR) thresholds using an averaged mass spectrum incorporating authentic and falsified vaccines/constituents. Colour-coded numbers represent the SNR thresholds.

Baseline drift across the mass range is a common feature of MALDI-mass spectra, and this can interfere with peak intensity comparisons between samples. For example, in Fig. 5b the upper spectrum without correction shows the baseline drifts with increasing m/z. MALDIquant provides either a statistics-sensitive non-linear iterative peak-clipping (SNIP) algorithm developed by Ryan et al.54, a TopHat approach derived from mathematical morphology55, ConvexHull or median algorithm to correct for this, based on user selection. We applied the TopHat baseline correction to each acquired spectrum which mimicked the default algorithm set in Bruker flexControl software. The lower mass spectrum in Fig. 5b shows the result of applying the baseline correction with the beneficial effect of lowering the baseline, especially towards the higher end of the mass range.
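To make the idea behind TopHat baseline correction concrete, the following is a minimal pure-Python sketch of a morphological top-hat filter: the baseline is estimated by an erosion (moving minimum) followed by a dilation (moving maximum), and is then subtracted. This is not the MALDIquant implementation; the function names, the half-window size and the toy spectrum are all illustrative assumptions.

```python
def _moving(values, half_window, func):
    """Apply func (min or max) over a centred window, truncated at the edges."""
    n = len(values)
    return [
        func(values[max(0, i - half_window): min(n, i + half_window + 1)])
        for i in range(n)
    ]


def tophat_baseline(intensities, half_window):
    """Morphological opening: moving minimum followed by moving maximum."""
    eroded = _moving(intensities, half_window, min)
    return _moving(eroded, half_window, max)


def remove_baseline(intensities, half_window):
    """Subtract the estimated baseline from the spectrum."""
    baseline = tophat_baseline(intensities, half_window)
    return [y - b for y, b in zip(intensities, baseline)]


# Toy spectrum (hypothetical): a linearly drifting baseline with one narrow peak.
spectrum = [float(i) for i in range(11)]
spectrum[5] += 10.0

corrected = remove_baseline(spectrum, half_window=1)
print(corrected)
```

The half-window must be wider than the peaks of interest and narrower than the scale of the drift; otherwise peaks are absorbed into the baseline or the drift is left uncorrected.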

Intensity shifts from one replicate spectrum to another were identified in the analysis of the vaccine and falsified constituent samples (see sample RSD variation in Fig. 6a, b and QC sample analysis in Fig. 5c). Post-acquisition data normalisation can be used to minimise these variations and reduce the influence of experimental or analytical variability. There are various statistical approaches (used extensively in metabolomics, for example) where large datasets are compared, and here a probabilistic quotient normalisation (PQN) was applied56. This was found to have a positive effect by lowering the RSD values in almost all cases (Fig. 5c).
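The core of probabilistic quotient normalisation can be sketched in a few lines of Python. This is an illustrative sketch rather than the workflow's actual code: it assumes the spectra are already aligned to a common set of features, uses the feature-wise median spectrum as the reference, and omits the total-intensity scaling that commonly precedes PQN. The input data are hypothetical.

```python
from statistics import median


def pqn(spectra):
    """Probabilistic quotient normalisation of aligned intensity vectors.

    Each spectrum is divided by the median of its feature-wise quotients
    against a reference spectrum (here, the median across all spectra).
    """
    n_features = len(spectra[0])
    reference = [median(s[j] for s in spectra) for j in range(n_features)]
    normalised = []
    for s in spectra:
        # Features with a zero reference are skipped to avoid division by zero.
        quotients = [s[j] / reference[j] for j in range(n_features) if reference[j] > 0]
        if not quotients:
            raise ValueError("reference spectrum has no positive features")
        factor = median(quotients)
        if factor == 0:
            raise ValueError("median quotient is zero; spectrum cannot be scaled")
        normalised.append([v / factor for v in s])
    return normalised


# Hypothetical replicates: the same profile measured at three overall intensities.
base = [10.0, 20.0, 30.0, 40.0]
spectra = [base, [2.0 * v for v in base], [0.5 * v for v in base]]

normalised = pqn(spectra)
print(normalised)
```

Because the scaling factor is a median of quotients, a few strongly changing peaks have little influence on it, which is why the method suits datasets where most features are shared between spectra.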

Fig. 6: Multivariate statistical analysis discriminates the authentic vaccines Engerix B, Flucelvax Tetra, Ixiaro and Nimenrix from falsified vaccine constituents.

a Biotyper Sirius dendrogram. b VITEK MS dendrogram. Hierarchical clustering dendrogram of all samples sorts almost all sample replicates (n = 12 for each sample type) into their respective groups.

After data normalisation, variations in m/z were evaluated and corrected to ensure effective comparisons could be made across multiple samples in the experiment. Figure 5d shows a representative mass spectrum of the QC sample with five peaks labelled (A–E). These peaks show a variation in m/z across the 24 QC replicates, illustrated by the box plots in Fig. 5e. The mean range in m/z value per peak was 0.231 Da, with a standard deviation of 0.06 Da. This variability is largely due to differences in peak shape, where flat-topped peaks lead to fluctuation in the centroided m/z value (exemplar peak shapes are shown in Supplementary Fig. 6). To correct for this, peaks were aligned using non-linear warping with the locally weighted scatterplot smoothing (LOWESS) method57,58, with tolerance, SNR and half-window size parameters selected to optimise the spectral alignment of the dataset.

To evaluate how mass spectral peaks are 'picked' (i.e. automatically recognised as individual mass spectral peaks) and accurately assigned across samples, we tested various signal-to-noise ratio (SNR) threshold settings. MALDIquant can identify local maxima and minima across the mass spectrum and then determine which peaks are above a set SNR threshold, so that the signal is identified as a spectral peak for inclusion in the dataset. Figure 5f illustrates the effect of different SNR thresholds using an averaged mass spectrum of all the genuine and falsified vaccine samples. Peak binning (with a user-defined threshold) was also used at this stage to ensure individual m/z features were correctly assigned across all the mass spectra. This increases mass spectral precision to ensure a more effective data comparison. The threshold for peak binning was chosen based on an evaluation of the spectral resolution across the dataset.
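The peak picking and binning steps can be illustrated with a short Python sketch. It is a simplified stand-in, not MALDIquant's code: noise is assumed to be estimated from the median absolute deviation (MAD) of the intensities, a peak is a local maximum within a fixed half-window whose intensity exceeds SNR × noise, and binning groups nearby m/z values using an absolute tolerance in Da. All parameter values and data are hypothetical.

```python
from statistics import median


def mad_noise(intensities):
    """Noise estimate: scaled median absolute deviation of the intensities."""
    med = median(intensities)
    return 1.4826 * median(abs(y - med) for y in intensities)


def pick_peaks(mz, intensities, snr=3.0, half_window=2):
    """Return (m/z, intensity) pairs for local maxima above snr * noise."""
    noise = mad_noise(intensities)
    n = len(intensities)
    peaks = []
    for i, y in enumerate(intensities):
        window = intensities[max(0, i - half_window): min(n, i + half_window + 1)]
        if y == max(window) and y > snr * noise:
            peaks.append((mz[i], y))
    return peaks


def bin_peaks(mz_values, tolerance):
    """Group sorted m/z values lying within tolerance of the running bin mean."""
    bins = []
    for m in sorted(mz_values):
        if bins and abs(m - sum(bins[-1]) / len(bins[-1])) <= tolerance:
            bins[-1].append(m)
        else:
            bins.append([m])
    return [sum(b) / len(b) for b in bins]


# Hypothetical spectrum: low-level noise with two genuine peaks.
intensities = [1.0, 2.0, 1.0, 2.0, 40.0, 2.0, 1.0, 2.0, 1.0, 25.0, 1.0, 2.0, 1.0]
mz = [100.0 + 0.5 * i for i in range(len(intensities))]

peaks = pick_peaks(mz, intensities, snr=3.0)
strict_peaks = pick_peaks(mz, intensities, snr=20.0)
binned = bin_peaks([100.0, 100.2, 250.0, 100.1, 250.1], tolerance=0.25)
print(peaks, strict_peaks, binned)
```

Raising the SNR threshold removes the weaker of the two peaks in this toy example, which mirrors the trade-off shown in Fig. 5f between retaining low-abundance features and excluding noise.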

Vaccine authentication using machine learning (ML)

Having developed and validated a combined sample analysis and data processing workflow, we applied this to analyse and compare authentic and falsified vaccine constituents using both MALDI platforms in parallel. We analysed samples from three replicate vials of each of the four authentic vaccines and eight falsified vaccine surrogates. Four analytical replicates were also analysed for each vial replicate to investigate analytical and vaccine vial-to-vial reproducibility. The samples were spotted and then analysed using the 0–900 m/z range. The resulting data were processed using the MALDIquant workflow developed, and a data table representing all the results was produced (example given in Supplementary Table 2). The heatmap in Supplementary Fig. 7 provides a visual overview of the dataset and was used to confirm that no individual or experimental class outliers were present (equivalent figure for the VITEK MS in Supplementary Fig. 8). To explore whether the vaccines and falsified constituents could be distinguished from each other using a multivariate statistical machine learning approach, we first performed hierarchical clustering (based on a Euclidean distance measure and a Ward clustering algorithm). We found that the replicates of each sample clustered together (Fig. 6) in almost all cases for the data collected on both MALDI platforms, which showed that both datasets contained m/z features that could differentiate authentic and falsified vaccines. To statistically model how well the data could distinguish the different sample groups, we compared each individual authentic vaccine with all the falsified vaccine samples using partial least squares-discriminant analysis (PLS-DA), commonly used in untargeted data modelling59,60. PLS-DA is a supervised dimensionality reduction method that builds models based on input variables and identifies which of these variables maximise separation between the groups. Validated models can be used to make future predictions on new data presented to the model. We first created a PLS-DA model using the Biotyper Sirius data for the authentic Engerix B vaccine with all the falsified vaccines. To illustrate the results, the PLS-DA scores plot (Fig. 7a) shows that sample replicates cluster by sample type. The model distinguished the authentic vaccine from the falsified vaccine constituents (and also the falsified constituents from each other) and was shown to be a strong model that was not overfitting the data (Fig. 7b, c). We subsequently created models for each authentic vaccine using both the Sirius and VITEK datasets. To demonstrate that the PLS-DA models were reliable and not overfitting the datasets, we performed cross-validation, permutation testing and a modified external validation for each model61. For the Engerix B Sirius data model, R-squared (R2) and Q-squared (Q2) were between 0.8 and 1 and the permutation test statistic was P < 0.01 (Fig. 7b, c)62. Tabulated values for the PLS-DA cross-validation are displayed in Supplementary Table 3 (and the equivalent PLS-DA plots for the VITEK Engerix B data are shown in Supplementary Fig. 9). Similar results were obtained when comparing the other three genuine vaccines with all falsified vaccine surrogates across both MALDI platforms (Supplementary Figs. 10–15). We also performed an independent external validation where each dataset was randomly split into a training set (80% of the data) and an external test set (20% of the data). The models were created using the training set, and then the classifications were confirmed using the test set (which had not been seen by the model previously). Confusion matrices (see Supplementary Tables 4–27, with the genuine vaccine highlighted in yellow) were created for the external validation datasets, and in each case (for both Sirius and VITEK results), the authentic vaccines were predicted correctly63. In some cases, the different types of water and saline falsified constituents were not fully resolved, but this was not unexpected considering their compositional similarity, and it did not compromise the identification of the authentic vaccines. In summary, our PLS-DA modelling demonstrated that the MALDI-MS data could be used to reliably distinguish each genuine vaccine from the falsified constituents.
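The external validation procedure (an 80/20 split, model fitting on the training set, and a confusion matrix on the held-out set) can be sketched as follows. To keep the example self-contained, a simple nearest-centroid classifier is swapped in for PLS-DA, so this illustrates the validation logic only and not the modelling method used in the study. The stratified split, the class names and the synthetic intensities are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Randomly assign about test_fraction of each class to the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_test = max(1, round(test_fraction * len(indices)))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


def fit_centroids(rows, labels):
    """Mean feature vector per class (stand-in for a fitted PLS-DA model)."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: [sum(col) / len(col) for col in zip(*members)]
        for label, members in grouped.items()
    }


def predict(centroids, row):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], row)),
    )


def confusion_matrix(true_labels, predicted_labels):
    """Counts keyed by (true label, predicted label)."""
    counts = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        counts[(t, p)] += 1
    return dict(counts)


# Synthetic data: 12 replicates per class, three hypothetical m/z features.
rng = random.Random(1)
class_means = {
    "vaccine": [10.0, 0.0, 5.0],
    "saline": [0.0, 10.0, 0.0],
    "glucose": [0.0, 0.0, 10.0],
}
X, y = [], []
for label, means in class_means.items():
    for _ in range(12):
        X.append([m + rng.uniform(-1.0, 1.0) for m in means])
        y.append(label)

train_idx, test_idx = stratified_split(y, test_fraction=0.2, seed=0)
centroids = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
predicted = [predict(centroids, X[i]) for i in test_idx]
matrix = confusion_matrix([y[i] for i in test_idx], predicted)
print(matrix)
```

Off-diagonal entries in such a matrix would correspond to the kind of confusion reported between the water and saline constituents, while a correct authentic-vaccine row is the outcome that matters for screening.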

Fig. 7: Biotyper Sirius partial least squares-discriminant analysis (PLS-DA) of authentic vaccine Engerix B and all falsified vaccine constituents, m/z 0–900.

a PLS-DA two-dimensional scores plot shows sample group clustering. b Cross-validation shows a minimum of four components (mass spectral peaks) are required to differentiate the experimental groups for the best Q-squared (Q2) value (shown by *). Supplementary Table 3 gives the numerical values for the performance of accuracy, R-squared (R2) and Q2 in the cross-validation. The performance axis indicates the predictive ability of the model. c Permutation testing showed the model was significant with P < 0.01.

Next, we identified the most discriminatory mass spectral peaks in the models by examining the top 15 m/z features in the Variable Importance in the Projection (VIP) plot. Figure 8a shows the ranking of the top 15 m/z values from the Sirius data by way of example. The mass spectral abundance differences for the top 15 VIPs were statistically significant for at least one of the falsified constituents individually compared to Engerix B (two-way ANOVA with Dunnett multiple comparison test, Fig. 8b). Supplementary Figs. 16–22 further illustrate Sirius and VITEK MS VIP plots and ANOVA summaries for the falsified surrogates compared to the genuine vaccines. The PLS-DA results demonstrated that the MALDI data modelling, based on the full MALDI mass spectrum, could be used to discriminate between authentic vaccines and falsified vaccine constituents, and also between the four genuine vaccines themselves (Supplementary Fig. 23).

Fig. 8: Biotyper Sirius analyses of compound feature significance.

a Variable importance in the projection (VIP) of the peaks at m/z 0–900 for the Engerix B vaccine compared to the eight falsified constituents. The top 15 m/z values are plotted based on their VIP score. The heatmaps to the right of the plot represent the relative intensities of the m/z values for each sample group averaged over the group. b Two-way analysis of variance (ANOVA) with Dunnett multiple comparison test results for the top 15 m/z values from the VIP analysis. m/z values with at least one statistically significant comparison (P < 0.05) for a falsified constituent compared to Engerix B are marked with a check.

One way to implement the MALDI-MS method as a tool for vaccine supply chain screening would be to automate the matching and scoring of multiple spectral peaks identified in experimental samples against an online database containing multiple discriminatory m/z features previously collected and validated using samples of authentic vaccines. For example, a real-time score or percentage match for the mass spectral profile could be used to indicate the likelihood of vaccine authenticity. This approach is analogous to that currently used for bacterial strain identification by MALDI-MS in clinical laboratories worldwide. A falsified product would, therefore, need to reproduce a complex profile of multiple m/z features to generate a positive match, and creating such a product with the necessary specificity would likely be impractical and uneconomic.
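A percentage match of this kind could, in principle, be computed as the fraction of reference m/z features found in an observed peak list within a mass tolerance. The sketch below is purely illustrative: the function name, the 0.25 Da tolerance and all m/z values are hypothetical, and a deployed system would likely also weight peak intensities and account for peaks that must be absent.

```python
def match_score(observed_mz, reference_mz, tolerance=0.25):
    """Percentage of reference features matched by any observed peak.

    A reference feature counts as matched when at least one observed
    m/z value lies within +/- tolerance (in Da) of it.
    """
    if not reference_mz:
        raise ValueError("reference feature list is empty")
    matched = sum(
        1
        for ref in reference_mz
        if any(abs(obs - ref) <= tolerance for obs in observed_mz)
    )
    return 100.0 * matched / len(reference_mz)


# Hypothetical reference profile for an authentic product.
reference_profile = [150.0, 300.5, 656.2, 720.4]

# Hypothetical peak lists from two test samples.
authentic_like = [150.1, 300.4, 410.0, 656.3, 720.5]
surrogate_like = [150.1, 410.0]

score_authentic = match_score(authentic_like, reference_profile)
score_surrogate = match_score(surrogate_like, reference_profile)
print(score_authentic, score_surrogate)
```

In this toy case, the sample reproducing every reference feature scores 100% and the surrogate scores 25%; a decision threshold between such values would need to be set and validated empirically.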

Finally, we manually validated the multivariate model’s ability to predict important biomarker m/z values and identify candidate peaks. To do this, we interrogated the processed dataset independently from the PLS-DA model, comparing the peak intensity of each individual m/z value in the list of all identified peaks measured across all samples to look for statistically significant differences in mean abundance. For example, we compared each mass spectral peak from the Engerix B analysis with each peak from the analysis of the falsified vaccine constituents using ANOVA with the Dunnett multiple comparison test. In total, 3699 m/z values were compared statistically; of these, 143 showed a statistically significant difference between Engerix B and at least one of the falsified vaccine constituents. Of the 143 significant peaks, 63 were present in a falsified vaccine sample and not present at all in the genuine Engerix B, or vice versa, and were therefore unique differentiators of authenticity or falsification. Using these peaks, it was straightforward to unambiguously differentiate Engerix B from all of the falsified vaccine surrogate samples. This analysis showed that there were many mass spectral peaks that could be used to discriminate the falsified from the authentic vaccine samples. This provided strong redundancy and, therefore, demonstrated the potential for developing a database of distinguishing mass spectral peaks that could be used for vaccine authenticity testing. We have purposefully, on public health security grounds, not provided the full list of these features so as not to reveal specific features that may be used in any future databases for authenticity testing. However, Fig. 9 summarises the numbers of m/z features and those found to be significant, and Fig. 10 presents two illustrative peaks from the group of 63. All of the top 15 VIP m/z values from the PLS-DA modelling in Fig. 8a were also found in the 143 peaks identified by univariate statistical analysis for Engerix B, illustrating the overlap between the machine learning and manual inspection approaches for the identification of potential “biomarker” peaks suitable for differentiating genuine from fake vaccine samples.
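The presence/absence filter used to find unique differentiators can be sketched in Python as below. The sketch covers only that final filtering step and assumes the statistical testing (ANOVA with the Dunnett test) has already been done elsewhere. The group names, m/z labels, intensities and the zero presence threshold are all hypothetical.

```python
def unique_differentiators(mean_intensity, authentic, present_above=0.0):
    """Return m/z features whose presence differs between the authentic
    product and at least one other group.

    mean_intensity maps group name -> {m/z: mean intensity}. A feature is
    treated as present when its mean intensity exceeds present_above.
    """
    result = []
    for mz, value in mean_intensity[authentic].items():
        in_authentic = value > present_above
        for group, values in mean_intensity.items():
            if group == authentic:
                continue
            if (values[mz] > present_above) != in_authentic:
                result.append(mz)
                break
    return sorted(result)


# Hypothetical group-averaged intensities for three m/z features.
mean_intensity = {
    "authentic": {101.0: 0.0, 202.0: 5.0, 303.0: 8.0},
    "saline": {101.0: 4.0, 202.0: 5.5, 303.0: 0.0},
    "glucose": {101.0: 3.0, 202.0: 4.5, 303.0: 0.0},
}

differentiators = unique_differentiators(mean_intensity, authentic="authentic")
print(differentiators)
```

Here the feature at 202.0 is present in every group and is therefore excluded, whereas the other two features are present on only one side of the authentic/falsified comparison.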

Fig. 9: Bar graph of the numbers of m/z values and spectral peaks following two-way analysis of variance with Dunnett multiple comparison test between Engerix B and the falsified vaccine surrogates for Biotyper Sirius data.

Bar A represents the 3699 total m/z values identified by MALDIquant peak detection and binning. B represents the 143 peaks in the raw spectra that yielded a statistically significant P value (P ≤ 0.05) for at least one falsified constituent compared to Engerix B. Bar C represents the 63 significant peaks in the raw spectra that have a clear presence in Engerix B and absence in at least one falsified constituent (or vice versa).

Fig. 10: Exemplar peaks in raw spectra that could be targeted to confirm genuine Engerix B (Biotyper Sirius spectra).

a Peaks present at m/z 148.661 in 0.9% (m/V) sodium chloride, 5% (m/V) glucose, tap water, Milli-Q and water for injection but not the genuine vaccine Engerix B. b A peak at m/z 656.246 unique to Engerix B against the falsified vaccine constituents 5% (m/V) glucose, Amikacin, Gentamicin, Milli-Q and water for injection.
