From large language models to multimodal AI: a scoping review on the potential of generative AI in medicine

Acosta JN, Falcone GJ, Rajpurkar P, Topol EJ. Multimodal biomedical AI. Nat Med. 2022;28(9):1773–84.

Tayebi Arasteh S, Han T, Lotfinia M, Kuhl C, Kather JN, Truhn D, Nebelung S. Large language models streamline automated machine learning for clinical studies. Nat Commun. 2024;15(1):1603.

Cai X, Liu S, Han J, Yang L, Liu Z, Liu T. ChestXRayBERT: a pretrained language model for chest radiology report summarization. IEEE Trans Multimed. 2021;25:845–55.

Li Y, Li Z, Zhang K, Dan R, Jiang S, Zhang Y. ChatDoctor: a medical chat model fine-tuned on a large language model Meta-AI (LLaMA) using medical domain knowledge. Cureus. 2023;15(6):65.

Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–9.

Acosta JN, Dogra S, Adithan S, Wu K, Moritz M, Kwak S, Rajpurkar P. The impact of AI assistance on radiology reporting: a pilot study using simulated AI draft reports. 2024; arXiv preprint arXiv:2412.12042.

Van Veen D, Van Uden C, Blankemeier L, Delbrouck J-B, Aali A, Bluethgen C, Pareek A, Polacin M, Reis EP, Seehofnerová A, et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat Med. 2024;30(4):1134–42.

Johnson AE, Pollard TJ, Berkowitz SJ, Greenbaum NR, Lungren MP, Deng C-Y, Mark RG, Horng S. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci Data. 2019;6(1):317.

Huang S-C, Huo Z, Steinberg E, Chiang C-C, Lungren MP, Langlotz CP, Yeung S, Shah NH, Fries JA. INSPECT: a multimodal dataset for pulmonary embolism diagnosis and prognosis. 2023; arXiv preprint arXiv:2311.10798.

Tayebi Arasteh S, Siepmann R, Huppertz M, Lotfinia M, Puladi B, Kuhl C, Truhn D, Nebelung S. The treasure trove hidden in plain sight: the utility of GPT-4 in chest radiograph evaluation. Radiology. 2024;313(2):233441.

Khader F, Müller-Franzes G, Wang T, Han T, Tayebi Arasteh S, Haarburger C, Stegmaier J, Bressem K, Kuhl C, Nebelung S, et al. Multimodal deep learning for integrating chest radiographs and clinical parameters: a case for transformers. Radiology. 2023;309(1):230806.

Frankish A, Diekhans M, Jungreis I, Lagarde J, Loveland JE, Mudge JM, Sisu C, Wright JC, Armstrong J, Barnes I, et al. GENCODE 2021. Nucleic Acids Res. 2021;49(D1):916–23.

Zhang K, Zhou R, Adhikarla E, Yan Z, Liu Y, Yu J, Liu Z, Chen X, Davison BD, Ren H, et al. A generalist vision-language foundation model for diverse biomedical tasks. Nat Med. 2024;30(11):1–13.

Hamamci IE, Er S, Almas F, Simsek AG, Esirgun SN, Dogan I, Dasdelen MF, Wittmann B, Simsar E, Simsar M, et al. A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities. 2024; arXiv preprint arXiv:2403.17834.

Li C, Wong C, Zhang S, Usuyama N, Liu H, Yang J, Naumann T, Poon H, Gao J. LLaVA-Med: training a large language-and-vision assistant for biomedicine in one day. Adv Neural Inf Process Syst. 2023;36.

Özsoy E, Pellegrini C, Keicher M, Navab N. ORacle: large vision-language models for knowledge-guided holistic OR domain modeling. In: International conference on medical image computing and computer-assisted intervention. Springer; 2024. pp. 455–465.

Yin S, Fu C, Zhao S, Li K, Sun X, Xu T, Chen E. A survey on multimodal large language models. 2023; arXiv preprint arXiv:2306.13549.

Wang J, Jiang H, Liu Y, Ma C, Zhang X, Pan Y, Liu M, Gu P, Xia S, Li W, et al. A comprehensive review of multimodal large language models: performance and challenges across different tasks. 2024; arXiv preprint arXiv:2408.01319.

Kline A, Wang H, Li Y, Dennis S, Hutch M, Xu Z, Wang F, Cheng F, Luo Y. Multimodal machine learning in precision health: a scoping review. NPJ Digit Med. 2022;5(1):171.

He Y, Huang F, Jiang X, Nie Y, Wang M, Wang J, Chen H. Foundation model for advancing healthcare: challenges, opportunities, and future directions. 2024; arXiv preprint arXiv:2404.03264.

Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, Moher D, Peters MD, Horsley T, Weeks L, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169(7):467–73.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.

Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan—a web and mobile app for systematic reviews. Syst Rev. 2016;5:1–10.

Jiang LY, Liu XC, Nejatian NP, Nasir-Moin M, Wang D, Abidin A, Eaton K, Riina HA, Laufer I, Punjabi P, et al. Health system-scale language models are all-purpose prediction engines. Nature. 2023;619(7969):357–62.

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. Adv Neural Inf Process Syst. 2017;30.

Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36(4):1234–40.

Naseem U, Khushi M, Reddy V, Rajendran S, Razzak I, Kim J. BioALBERT: a simple and effective pre-trained language model for biomedical named entity recognition. In: 2021 International joint conference on neural networks (IJCNN). IEEE; 2021. pp. 1–7.

Labrak Y, Bazoge A, Morin E, Gourraud P-A, Rouvier M, Dufour R. BioMistral: a collection of open-source pretrained large language models for medical domains. 2024; arXiv preprint arXiv:2402.10373.

Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, Scales N, Tanwani A, Cole-Lewis H, Pfohl S, et al. Large language models encode clinical knowledge. Nature. 2023;620(7972):172–80.

Wang L, Chen X, Deng X, Wen H, You M, Liu W, Li Q, Li J. Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs. NPJ Digit Med. 2024;7(1):41.

Lee H, Phatale S, Mansoor H, Lu KR, Mesnard T, Ferret J, Bishop C, Hall E, Carbune V, Rastogi A. RLAIF: scaling reinforcement learning from human feedback with AI feedback; 2023.

Zhang H, Chen J, Jiang F, Yu F, Chen Z, Li J, Chen G, Wu X, Zhang Z, Xiao Q, et al. HuatuoGPT, towards taming language model to be a doctor. 2023; arXiv preprint arXiv:2305.15075.

Zakka C, Shad R, Chaurasia A, Dalal AR, Kim JL, Moor M, Fong R, Phillips C, Alexander K, Ashley E, et al. Almanac: retrieval-augmented language models for clinical medicine. NEJM AI. 2024;1(2):2300068.

Luo R, Sun L, Xia Y, Qin T, Zhang S, Poon H, Liu T-Y. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Brief Bioinform. 2022;23(6):409.

Wang H, Gao C, Dantona C, Hull B, Sun J. DRG-LLaMA: tuning LLaMA model to predict diagnosis-related group for hospitalized patients. NPJ Digit Med. 2024;7(1):16.

Yang X, Chen A, PourNejatian N, Shin HC, Smith KE, Parisien C, Compas C, Martin C, Costa AB, Flores MG, et al. A large language model for electronic health records. NPJ Digit Med. 2022;5(1):194.

Johnson AE, Bulgarelli L, Pollard TJ. Deidentification of free-text medical records using pre-trained bidirectional transformers. In: Proceedings of the ACM conference on health, inference, and learning; 2020. pp. 214–221.

Kresevic S, Giuffrè M, Ajcevic M, Accardo A, Crocè LS, Shung DL. Optimization of hepatological clinical guidelines interpretation by large language models: a retrieval augmented generation-based framework. NPJ Digit Med. 2024;7(1):102.

Mahendran D, McInnes BT. Extracting adverse drug events from clinical notes. AMIA Summits Transl Sci Proc. 2021;2021:420.

Lanfredi RB, Mukherjee P, Summers RM. Enhancing chest X-ray datasets with privacy-preserving large language models and multi-type annotations: a data-driven approach for improved classification. Med Image Anal. 2025;99:103383.

Liu N, Hu Q, Xu H, Xu X, Chen M. Med-BERT: a pretraining framework for medical records named entity recognition. IEEE Trans Ind Inf. 2021;18(8):5600–8.

Han T, Adams LC, Papaioannou J-M, Grundmann P, Oberhauser T, Löser A, Truhn D, Bressem KK. MedAlpaca: an open-source collection of medical conversational AI models and training data. 2023; arXiv preprint arXiv:2304.08247.

Chen Z, Cano AH, Romanou A, Bonnet A, Matoba K, Salvi F, Pagliardini M, Fan S, Köpf A, Mohtashami A, et al. Meditron-70B: scaling medical pretraining for large language models. 2023; arXiv preprint arXiv:2311.16079.

Qiu P, Wu C, Zhang X, Lin W, Wang H, Zhang Y, Wang Y, Xie W. Towards building multilingual language model for medicine. Nat Commun. 2024;15(1):8384.

Mu Y, Tizhoosh HR, Tayebi RM, Ross C, Sur M, Leber B, Campbell CJ. A BERT model generates diagnostically relevant semantic embeddings from pathology synopses with active learning. Commun Med. 2021;1(1):11.

Wu C, Lin W, Zhang X, Zhang Y, Xie W, Wang Y. PMC-LLaMA: toward building open-source language models for medicine. J Am Med Inform Assoc. 2024;6:045.

Jia S, Bit S, Searls E, Claus LA, Fan P, Jasodanand VH, Lauber MV, Veerapaneni D, Wang WM, Au R, et al. MedPodGPT: a multilingual audio-augmented large language model for medical research and education. medRxiv. 2024.

Yan A, McAuley J, Lu X, Du J, Chang EY, Gentili A, Hsu C-N. RadBERT: adapting transformer-based language models to radiology. Radiol Artif Intell. 2022;4(4):210258.

Schmidt RA, Seah JC, Cao K, Lim L, Lim W, Yeung J. Generative large language models for detection of speech recognition errors in radiology reports. Radiol Artif Intell. 2024;6(2):230205.

Prihoda D, Maamary J, Waight A, Juan V, Fayadat-Dilman L, Svozil D, Bitton DA. BioPhi: a platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning. mAbs. 2022;14(1):2020203.

Schubach M, Maass T, Nazaretyan L, Röner S, Kircher M. CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions. Nucleic Acids Res. 2024;52(D1):1143–54.

Ji Y, Zhou Z, Liu H, Davuluri RV. DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics. 2021;37(15):2112–20.

Theodoris CV, Xiao L, Chopra A, Chaffin MD, Al Sayed ZR, Hill MC, Mantineo H, Brydon EM, Zeng Z, Liu XS, et al. Transfer learning enables predictions in network biology. Nature. 2023;618(7965):616–24.

Hie BL, Shanker VR, Xu D, Bruun TU, Weidenbacher PA, Tang S, Wu W, Pak JE, Kim PS. Efficient evolution of human antibodies from general protein language models. Nat Biotechnol. 2024;42(2):275–83.

Rao RM, Liu J, Verkuil R, Meier J, Canny J, Abbeel P, Sercu T, Rives A. MSA Transformer. In: International conference on machine learning. PMLR; 2021. pp. 8844–8856.

Madani A, Krause B, Greene ER, Subramanian S, Mohr BP, Holton JM, Olmos JL, Xiong C, Sun ZZ, Socher R, et al. Large language models generate functional protein sequences across diverse families. Nat Biotechnol. 2023;41(8):1099–106.

Ferruz N, Schmidt S, Höcker B. ProtGPT2 is a deep unsupervised language model for protein design. Nat Commun. 2022;13(1):4348.

Elnaggar A, Heinzinger M, Dallago C, Rehawi G, Wang Y, Jones L, Gibbs T, Feher T, Angerer C, Steinegger M, et al. ProtTrans: toward understanding the language of life through self-supervised learning. IEEE Trans Pattern Anal Mach Intell. 2021;44(10):7112–27.

Yang F, Wang W, Wang F, Fang Y, Tang D, Huang J, Lu H, Yao J. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat Mach Intell. 2022;4(10):852–66.

Rathore AS, Choudhury S, Arora A, Tijare P, Raghava GP. ToxinPred 3.0: an improved method for predicting the toxicity of peptides. Comput Biol Med. 2024;179:108926.

Chen J, Cai Z, Ji K, Wang X, Liu W, Wang R, Hou J, Wang B. HuatuoGPT-o1, towards medical complex reasoning with LLMs. 2024; arXiv preprint arXiv:2412.18925.

Lewis P, Perez E, Piktus A, Petroni F, Karpukhin V, Goyal N, Küttler H, Lewis M, Yih W-T, Rocktäschel T, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv Neural Inf Process Syst. 2020;33:9459–74.

Tayebi Arasteh S, Lotfinia M, Bressem K, Siepmann R, Adams L, Ferber D, Kuhl C, Kather JN, Nebelung S, Truhn D. RadioRAG: factual large language models for enhanced diagnostics in radiology using online retrieval augmented generation. 2024; arXiv preprint arXiv:2407.15621.

Gilbert S, Kather JN, Hogan A. Augmented non-hallucinating large language models as medical information curators. NPJ Digit Med. 2024;7(1):100.

Nowak S, Biesner D, Layer Y, Theis M, Schneider H, Block W, Wulff B, Attenberger U, Sifa R, Sprinkart A. Transformer-based structuring of free-text radiology report databases. Eur Radiol. 2023;33(6):4228–36.

Johnson AE, Bulgarelli L, Shen L, Gayles A, Shammout A, Horng S, Pollard TJ, Hao S, Moody B, Gow B, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. 2023;10(1):1.

Pollard TJ, Johnson AE, Raffa JD, Celi LA, Mark RG, Badawi O. The eICU collaborative research database, a freely available multi-center database for critical care research. Sci Data. 2018;5(1):1–13.

Zeng G, Yang W, Ju Z, Yang Y, Wang S, Zhang R, Zhou M, Zeng J, Dong X, Zhang R, et al. MedDialog: large-scale medical dialogue datasets. In: Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP); 2020. pp. 9241–9250.

UniProt Consortium. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res. 2023;51(D1):523–531.

Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2012;41(D1):36–42.
