Accurate genotyping of three major respiratory bacterial pathogens with ONT R10.4.1 long-read sequencing [RESEARCH]

Nora Zidane1, Carla Rodrigues1,2, Valérie Bouchez1,2, Martin Rethoret-Pasty1, Virginie Passet1,3, Sylvain Brisse1,2,3 and Chiara Crestani1

1Institut Pasteur, Université Paris Cité, Biodiversity and Epidemiology of Bacterial Pathogens, 75015 Paris, France
2Institut Pasteur, National Reference Center for Whooping Cough and Other Bordetella Infections, 75015 Paris, France
3Institut Pasteur, National Reference Center for Corynebacteria of the Diphtheriae Species Complex, 75015 Paris, France

Corresponding author: chiara.crestani@pasteur.fr

Abstract

High-throughput massive parallel sequencing has significantly improved bacterial pathogen genomics, diagnostics, and epidemiology. Despite its high accuracy, short-read sequencing struggles with the complete genome reconstruction and assembly of extrachromosomal elements such as plasmids. Long-read sequencing with Oxford Nanopore Technologies (ONT) presents an alternative that offers benefits including real-time sequencing and cost efficiency, particularly useful in resource-limited settings. However, the historically higher error rates of ONT data have so far limited its application in high-precision genomic typing. The recent release of ONT's R10.4.1 chemistry, with significantly improved raw read accuracy (Q20+), offers a potential solution to this problem. The aim of this study is to evaluate the performance of ONT's latest chemistry for bacterial genomic typing against the gold-standard Illumina technology, focusing on three respiratory pathogens of public health importance, Klebsiella pneumoniae, Bordetella pertussis, and Corynebacterium diphtheriae, and their related species. Using the Rapid Barcoding Kit V14, we generate and analyze genome assemblies with different basecalling models, at different simulated depths of coverage. ONT assemblies are compared to the Illumina reference for completeness and core genome multilocus sequence typing (cgMLST) accuracy (number of allelic mismatches). Our results show that genomes obtained from raw ONT data basecalled with Dorado SUP v0.9.0, assembled with Flye, and with a minimum coverage depth of 35×, optimized accuracy for all bacterial species tested. Error rates are consistently <0.5% for each cgMLST scheme, indicating that ONT R10.4.1 data are suitable for high-resolution genomic typing applied to outbreak investigations and public health surveillance.
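The accuracy metric described above (allelic mismatches between an ONT assembly and its Illumina reference, expressed as a fraction of cgMLST loci) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `cgmlst_mismatch_rate`, the toy locus names and allele numbers, and the choice to skip loci that are uncalled in either profile. It is not the authors' pipeline, which may treat missing loci differently.

```python
from typing import Dict, Optional, Tuple

# A cgMLST profile maps locus name -> allele identifier (None if uncalled).
Profile = Dict[str, Optional[str]]


def cgmlst_mismatch_rate(reference: Profile, query: Profile) -> Tuple[int, int, float]:
    """Compare two cgMLST profiles locus by locus.

    Returns (mismatches, loci_compared, mismatch_rate).
    Assumption: loci uncalled (None or absent) in either profile are
    excluded from the comparison rather than counted as mismatches.
    """
    compared = 0
    mismatches = 0
    for locus, ref_allele in reference.items():
        query_allele = query.get(locus)
        if ref_allele is None or query_allele is None:
            continue  # skip loci that cannot be compared
        compared += 1
        if ref_allele != query_allele:
            mismatches += 1
    rate = mismatches / compared if compared else 0.0
    return mismatches, compared, rate


# Hypothetical toy profiles; real schemes contain hundreds to thousands of loci.
illumina_profile: Profile = {"locus1": "1", "locus2": "5", "locus3": "2", "locus4": None}
ont_profile: Profile = {"locus1": "1", "locus2": "7", "locus3": "2", "locus4": "3"}

mismatches, compared, rate = cgmlst_mismatch_rate(illumina_profile, ont_profile)
print(f"{mismatches} mismatch(es) over {compared} loci ({rate:.2%})")
```

Under this definition, an error rate below 0.5% on a scheme of, say, 2,000 comparable loci would correspond to fewer than 10 mismatched alleles; the scheme size used here is an illustrative figure, not one taken from the study.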

Received October 3, 2024. Accepted May 28, 2025.
