FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies

Multilocus sequence typing (MLST) is a precise microbial typing approach at the intra-species level, used for epidemiologic and evolutionary purposes. It assigns a sequence type (ST) identifier to each specimen based on the combination of alleles of multiple housekeeping genes included in a defined scheme. The use of MLST has multiplied due to the availability of large numbers of genomic sequences and epidemiologic data in public repositories. However, data processing speed has become problematic due to the massive size of modern datasets. Here, we present FastMLST, a tool designed to perform PubMLST searches using BLASTn and a divide-and-conquer approach that processes each genome assembly in parallel. The output of FastMLST includes a table with the ST, allelic profile, and clonal complex or clade (when available) detected for each query, as well as either a multi-FASTA file with the concatenated allele sequences or a series of FASTA files with the single allele sequences. FastMLST was validated with 91 different species covering a wide range of guanine-cytosine content (%GC), genome sizes, and fragmentation levels, and a speed test was performed on 3 datasets with varying genome sizes. Compared with other tools such as mlst, CGE/MLST, MLSTar, and PubMLST, FastMLST takes advantage of multiple processors to simultaneously type up to 28 000 genomes in less than 10 minutes, reducing processing times by at least 3-fold with 100% concordance to PubMLST when contaminated genomes are excluded from the analysis. The source code, installation instructions, and documentation of FastMLST are available at https://github.com/EnzoAndree/FastMLST.
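
To make the typing step concrete, the following is a minimal sketch of how an ST can be looked up from an allelic profile and how independent genomes can be typed concurrently. It is not FastMLST's implementation: the loci names, the profile table, and the functions `assign_st` and `type_genomes` are hypothetical, the allele calls are assumed to have been obtained already (for example from a BLASTn search), and a thread pool stands in for whatever parallel mechanism the tool actually uses.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy scheme (not real PubMLST data): three loci and a
# table that maps each known combination of allele numbers to an ST.
LOCI = ("geneA", "geneB", "geneC")
PROFILE_TO_ST = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 4): 3,
}


def assign_st(allele_calls):
    """Return the ST for one genome, or None if the profile is incomplete or unknown."""
    # Build the allelic profile in the fixed locus order of the scheme.
    profile = tuple(allele_calls.get(locus) for locus in LOCI)
    if None in profile:
        return None  # at least one locus was not detected
    return PROFILE_TO_ST.get(profile)  # None when the combination is not in the table


def type_genomes(calls_by_genome, workers=4):
    """Type each genome independently and return a {genome name: ST} mapping."""
    names = list(calls_by_genome)
    # Each genome is an independent unit of work, so the batch can be
    # split across workers (assumption: a thread pool as a stand-in).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sts = list(pool.map(assign_st, (calls_by_genome[n] for n in names)))
    return dict(zip(names, sts))


# Hypothetical allele calls for three assemblies.
EXAMPLE_CALLS = {
    "genome_1": {"geneA": 1, "geneB": 2, "geneC": 1},
    "genome_2": {"geneA": 3, "geneB": 2, "geneC": 4},
    "genome_3": {"geneA": 1, "geneB": 1},  # geneC missing
}
RESULT = type_genomes(EXAMPLE_CALLS)
```

In this sketch a genome with a missing locus or an unlisted allele combination receives no ST, which is one plausible way to represent an incomplete or novel profile.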

