Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning

Cao Y, Liu F, Simpson P, Antieau L, Bennett A, Cimino JJ, et al. AskHERMES: An online question answering system for complex clinical questions. J Biomed Inform. 2011;44(2):277–88.

Weissenborn D, Tsatsaronis G, Schroeder M. Answering factoid questions in the biomedical domain. In: Proceedings of the first Workshop on Bio-Medical Semantic Indexing and Question Answering, a Post-Conference Workshop of Conference and Labs of the Evaluation Forum (CLEF); 2013 September 27; Valencia, Spain. Aachen: CEUR-WS.org; 2013;1094:1-6. Available from: https://ceur-ws.org/Vol-1094/bioasq2013_submission_5.pdf.

Abacha AB, Zweigenbaum P. MEANS: A medical question-answering system combining NLP techniques and semantic Web technologies. Inf Process Manag. 2015;51(5):570–94.

Yang Z, Zhou Y, Nyberg E. Learning to Answer Biomedical Questions: OAQA at BioASQ 4B. In: Kakadiaris IA, Paliouras G, Krithara A, editors. Proceedings of the Fourth BioASQ workshop. Berlin: Association for Computational Linguistics; 2016. pp. 23–37. https://doi.org/10.18653/v1/W16-3104.

Wiese G, Weissenborn D, Neves M. Neural Domain Adaptation for Biomedical Question Answering. In: Levy R, Specia L, editors. Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017). Vancouver: Association for Computational Linguistics; 2017. pp. 281–289. https://doi.org/10.18653/v1/K17-1029.

Jin Q, Yuan Z, Xiong G, Yu Q, Ying H, Tan C, et al. Biomedical Question Answering: A Survey of Approaches and Challenges. ACM Comput Surv. 2022;55(2). https://doi.org/10.1145/3490238.

Zhang Y, Lu W, Ou W, Zhang G, Zhang X, Cheng J, et al. Chinese medical question answer selection via hybrid models based on CNN and GRU. Multimedia Tools Appl. 2020;79:14751–76.

Lamurias A, Sousa D, Couto FM. Generating biomedical question answering corpora from Q&A forums. IEEE Access. 2020;8:161042–51.

Ekakristi AS, Mahendra R, Adriani M. Finding Questions in Medical Forum Posts Using Sequence Labeling Approach. In: International Conference on Computational Linguistics and Intelligent Text Processing. Springer; 2018. pp. 62–73.

Rohman WN. Pengenalan entitas kesehatan pada forum kesehatan online dengan menggunakan recurrent neural networks [Health entity recognition in online health forums using recurrent neural networks] [Bachelor’s thesis]. Kampus UI Depok: Universitas Indonesia; 2017.

Saputra IF, Mahendra R, Wicaksono AF. Keyphrases extraction from user generated contents in healthcare domain using long short-term memory networks. In: Proceedings of the BioNLP 2018 Workshop; 2018 July; Melbourne, Australia. USA: Association for Computational Linguistics; 2018. p. 28–34. Available from: https://doi.org/10.18653/v1/W18-2304.

Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein J, Doran C, Solorio T, editors. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis: Association for Computational Linguistics; 2019. pp. 4171–4186. https://doi.org/10.18653/v1/N19-1423.

Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Adv Neural Inf Process Syst. 2020;33:1877–901.

Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, et al. Llama 2: Open Foundation and Fine-Tuned Chat Models. 2023. arXiv:2307.09288.

Wang S, Sun X, Li X, Ouyang R, Wu F, Zhang T, et al. GPT-NER: Named entity recognition via large language models. 2023. arXiv:2304.10428.

Hu Y, Ameer I, Zuo X, Peng X, Zhou Y, Li Z, et al. Zero-shot clinical entity recognition using ChatGPT. 2023. arXiv:2303.16416.

Hu Y, Chen Q, Du J, Peng X, Keloth VK, Zuo X, Zhou Y, Li Z, Jiang X, Lu Z, et al. Improving large language models for clinical named entity recognition via prompt engineering. J Am Med Inform Assoc. 2024;31(9):1812–20. Available from: https://doi.org/10.1093/jamia/ocad259.

Roberts K, Kilicoglu H, Fiszman M, Demner-Fushman D. Decomposing Consumer Health Questions. In: Cohen K, Demner-Fushman D, Ananiadou S, Tsujii Ji, editors. Proceedings of BioNLP 2014. Baltimore: Association for Computational Linguistics; 2014. pp. 29–37. https://doi.org/10.3115/v1/W14-3405.

Mahendra R, Hakim AN, Adriani M. Towards question identification from online healthcare consultation forum post in bahasa. In: 2017 International Conference on Asian Language Processing (IALP). 2017. pp. 399–402. https://doi.org/10.1109/IALP.2017.8300627.

Abacha AB, Zweigenbaum P. Medical entity recognition: A comparaison of semantic and statistical methods. In: Proceedings of BioNLP 2011 Workshop; 2011 June. Portland: Association for Computational Linguistics; 2011. p. 56–64. Available from: https://aclanthology.org/W11-0207/.

Wu Y, Jiang M, Xu J, Zhi D, Xu H. Clinical named entity recognition using deep learning models. In: AMIA Annual Symposium Proceedings, vol. 2017. American Medical Informatics Association; 2017. p. 1812.

Xu K, Zhou Z, Hao T, Liu W. A bidirectional LSTM and conditional random fields approach to medical named entity recognition. In: Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2017. Springer; 2018. pp. 355–365.

Cho M, Ha J, Park C, Park S. Combinatorial feature embedding based on CNN and LSTM for biomedical named entity recognition. J Biomed Inform. 2020;103:103381.

Yu X, Hu W, Lu S, Sun X, Yuan Z. BioBERT Based Named Entity Recognition in Electronic Medical Record. In: 2019 10th International Conference on Information Technology in Medicine and Education (ITME). 2019. pp. 49–52. https://doi.org/10.1109/ITME.2019.00022.

Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2019;36(4):1234–40. https://doi.org/10.1093/bioinformatics/btz682.

Peng Y, Yan S, Lu Z. Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets. In: Demner-Fushman D, Cohen KB, Ananiadou S, Tsujii J, editors. Proceedings of the 18th BioNLP Workshop and Shared Task. Florence: Association for Computational Linguistics; 2019. pp. 58–65. https://doi.org/10.18653/v1/W19-5006.

Dai Z, Wang X, Ni P, Li Y, Li G, Bai X. Named Entity Recognition Using BERT BiLSTM CRF for Chinese Electronic Health Records. In: 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). 2019. pp. 1–5. https://doi.org/10.1109/CISP-BMEI48845.2019.8965823.

Ashok D, Lipton ZC. Promptner: Prompting for named entity recognition. 2023. arXiv:2305.15444.

Alsentzer E, Murphy J, Boag W, Weng WH, Jindi D, Naumann T, et al. Publicly Available Clinical BERT Embeddings. In: Rumshisky A, Roberts K, Bethard S, Naumann T, editors. Proceedings of the 2nd Clinical Natural Language Processing Workshop. Minneapolis: Association for Computational Linguistics; 2019. pp. 72–78. https://doi.org/10.18653/v1/W19-1909.

Suwarningsih W, Supriana I, Purwarianti A. ImNER Indonesian medical named entity recognition. In: 2014 2nd International Conference on Technology, Informatics, Management, Engineering & Environment. IEEE; 2014. pp. 184–188.

Sadikin M, Fanany MI, Basaruddin T. A new data representation based on training data characteristics to extract drug name entity in medical text. Comput Intell Neurosci. 2016;2016:3483528. Available from: https://doi.org/10.1155/2016/3483528.

Herwando R, Jiwanggi MA, Adriani M. Medical entity recognition using conditional random field (CRF). In: 2017 International Workshop on Big Data and Information Security (IWBIS). IEEE; 2017. pp. 57–62.

Hasan KS, Ng V. Automatic keyphrase extraction: A survey of the state of the art. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); 2014 June; Baltimore, Maryland. USA: Association for Computational Linguistics; 2014. p. 1262–73. Available from: https://doi.org/10.3115/v1/P14-1119.

Chen M, Sun JT, Zeng HJ, Lam KY. A practical system of keyphrase extraction for web pages. In: Proceedings of the 14th ACM International Conference on Information and Knowledge Management; 2005; Bremen, Germany. New York: Association for Computing Machinery; 2005. p. 277–8. Available from: https://doi.org/10.1145/1099554.1099625.

Nguyen TD, Kan MY. Keyphrase extraction in scientific publications. In: International conference on Asian digital libraries. Springer; 2007. pp. 317–326.

Mahata D, Kuriakose J, Shah R, Zimmermann R. Key2vec: Automatic ranked keyphrase extraction from scientific articles using phrase embeddings. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers); 2018 June; New Orleans, Louisiana. USA: Association for Computational Linguistics; 2018. p. 634–9. Available from: https://doi.org/10.18653/v1/N18-2100.

Devika R, Vairavasundaram S, Mahenthar CSJ, Varadarajan V, Kotecha K. A deep learning model based on BERT and sentence transformer for semantic keyphrase extraction on big social data. IEEE Access. 2021;9:165252–61.

Dredze M, Wallach HM, Puller D, Pereira F. Generating summary keywords for emails using topics. In: Proceedings of the 13th International Conference on Intelligent User Interfaces; 2008; Gran Canaria, Spain. New York: Association for Computing Machinery; 2008. p. 199–206. Available from: https://doi.org/10.1145/1378773.1378800.

Kim SN, Baldwin T. Extracting keywords from multi-party live chats. In: Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation; 2012 November; Bali, Indonesia. Indonesia: Faculty of Computer Science, Universitas Indonesia; 2012. p. 199–208. Available from: https://aclanthology.org/Y12-1021.

Papagiannopoulou E, Tsoumakas G. A review of keyphrase extraction. Wiley Interdiscip Rev Data Min Knowl Disc. 2020;10(2):e1339.

Campos R, Mangaravite V, Pasquali A, Jorge A, Nunes C, Jatowt A. YAKE! Keyword extraction from single documents using multiple local features. Inf Sci. 2020;509:257–89.

Mihalcea R, Tarau P. TextRank: Bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing; 2004 July; Barcelona, Spain. USA: Association for Computational Linguistics; 2004. p. 404–11. Available from: https://aclanthology.org/W04-3252.

Rose S, Engel D, Cramer N, Cowley W. Text mining: applications and theory. Hoboken, USA: John Wiley & Sons, Ltd; 2010. Chapter 1, Automatic keyword extraction from individual documents; p. 1–20.

Witten IH, Paynter GW, Frank E, Gutwin C, Nevill-Manning CG. KEA: Practical automatic keyphrase extraction. In: Proceedings of the Fourth ACM Conference on Digital Libraries; 1999; Berkeley, California, USA. New York: Association for Computing Machinery; 1999. p. 254–5. Available from: https://doi.org/10.1145/313238.313437.

Meng R, Zhao S, Han S, He D, Brusilovsky P, Chi Y. Deep Keyphrase Generation. In: Barzilay R, Kan MY, editors. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vancouver: Association for Computational Linguistics; 2017. pp. 582–592. https://doi.org/10.18653/v1/P17-1054.

Chen J, Zhang X, Wu Y, Yan Z, Li Z. Keyphrase Generation with Correlation Constraints. In: Riloff E, Chiang D, Hockenmaier J, Tsujii J, editors. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels: Association for Computational Linguistics; 2018. pp. 4057–4066. https://doi.org/10.18653/v1/D18-1439.

Basaldella M, Antolli E, Serra G, Tasso C. Bidirectional LSTM recurrent neural network for keyphrase extraction. In: Digital Libraries and Multimedia Archives: 14th Italian Research Conference on Digital Libraries, IRCDL 2018, Udine, Italy, January 25-26, 2018, Proceedings 14. Springer; 2018. pp. 180–187.

Kulkarni M, Mahata D, Arora R, Bhowmik R. Learning Rich Representation of Keyphrases from Text. In: Carpuat M, de Marneffe MC, Meza Ruiz IV, editors. Findings of the Association for Computational Linguistics: NAACL 2022. Seattle: Association for Computational Linguistics; 2022. pp. 891–906. https://doi.org/10.18653/v1/2022.findings-naacl.67.

Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In: Jurafsky D, Chai J, Schluter N, Tetreault J, editors. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics; 2020. pp. 7871–7880. https://doi.org/10.18653/v1/2020.acl-main.703.

Zhu X, Lou Y, Zhao J, Gao W, Deng H. Generative non-autoregressive unsupervised keyphrase extraction with neural topic modeling. Eng Appl Artif Intell. 2023;120:105934. Available from: https://doi.org/10.1016/j.engappai.2023.105934.

Chen W, Chan HP, Li P, Bing L, King I. An Integrated Approach for Keyphrase Generation via Exploring the Power of Retrieval and Extraction. In: Burstein J, Doran C, Solorio T, editors. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis: Association for Computational Linguistics; 2019. pp. 2846–2856. https://doi.org/10.18653/v1/N19-1292.

Wu H, Liu W, Li L, Nie D, Chen T, Zhang F, et al. UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction. In: Zong C, Xia F, Li W, Navigli R, editors. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Online: Association for Computational Linguistics; 2021. pp. 825–835. https://doi.org/10.18653/v1/2021.findings-acl.73.

Cao Y, Cimino JJ, Ely J, Yu H. Automatically extracting information needs from complex clinical questions. J Biomed Inform. 2010;43(6):962–71. https://doi.org/10.1016/j.jbi.2010.07.007.

Sarkar K. A Hybrid Approach to Extract Keyphrases from Medical Documents. Int J Comput Appl. 2013;63(18):14–9. https://doi.org/10.5120/10565-5528.

Ding L, Zhang Z, Liu H, Li J, Yu G. Automatic keyphrase extraction from scientific Chinese medical abstracts based on character-level sequence labeling. J Data Inf Sci. 2021;6(3):35–57.

Ding L, Zhang Z, Zhao Y. Bert-based chinese medical keyphrase extraction model enhanced with external features. In: International Conference on Asian Digital Libraries. Springer; 2021. pp. 167–176.

Hakim AN, Mahendra R, Adriani M, Ekakristi AS. Corpus development for Indonesian consumer-health question answering system. In: 2017 International Conference on Advanced Computer Science and Information Systems (ICACSIS). IEEE; 2017. pp. 222–227.

Tkachenko M, Malyuk M, Holmanyuk A, Liubimov N. Label Studio: Data labeling software. San Francisco: HumanSignal; 2020-2025. Accessed 2023 Sep 18. Available from: https://github.com/heartexlabs/label-studio.

Lehman E, DeYoung J, Barzilay R, Wallace BC. Inferring Which Medical Treatments Work from Reports of Clinical Trials. In: Burstein J, Doran C, Solorio T, editors. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis: Association for Computational Linguistics; 2019. pp. 3705–3717. https://doi.org/10.18653/v1/N19-1371.

Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20(1):37–46.

Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med. 2005;37(5):360–3.

Hripcsak G, Rothschild AS. Agreement, the f-measure, and reliability in information retrieval. J Am Med Inform Assoc. 2005;12(3):296–8.

Grouin C, Rosset S, Zweigenbaum P, Fort K, Galibert O, Quintard L. Proposal for an extension of traditional named entities: From guidelines to evaluation, an overview. In: Proceedings of the 5th Linguistic Annotation Workshop; 2011 June. Portland: Association for Computational Linguistics; 2011. p. 92–100. Available from: https://aclanthology.org/W11-0411.

Deleger L, Li Q, Lingren T, Kaiser M, Molnar K, Stoutenborough L, et al. Building gold standard corpora for medical natural language processing tasks. In: AMIA Annual Symposium Proceedings, vol. 2012. American Medical Informatics Association; 2012. p. 144.

Brandsen A, Verberne S, Wansleeben M, Lambers K. Creating a dataset for named entity recognition in the archaeology domain. In: Proceedings of the Twelfth Language Resources and Evaluation Conference; 2020 May; Marseille, France. France: European Language Resources Association; 2020. p. 4573–7. Available from: https://aclanthology.org/2020.lrec-1.562.

Lafferty J, McCallum A, Pereira F. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning (ICML 2001). Williamstown; 2001. p. 282–9.

Koto F, Rahimi A, Lau JH, Baldwin T. IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP. In: Proceedings of the 28th International Conference on Computational Linguistics. Barcelona: International Committee on Computational Linguistics; 2020. pp. 757–770. https://doi.org/10.18653/v1/2020.coling-main.66.

Wilie B, Vincentio K, Winata GI, Cahyawijaya S, Li X, Lim ZY, Soleman S, Mahendra R, Fung P, Bahar S, Purwarianti A. IndoNLU: Benchmark and resources for evaluating Indonesian natural language understanding. In: Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing; 2020 December; Suzhou, China. USA: Association for Computational Linguistics; 2020. p. 843–57. Available from: https://doi.org/10.18653/v1/2020.aacl-main.85.

Koto F, Lau JH, Baldwin T. IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization. In: Moens MF, Huang X, Specia L, Yih SWt, editors. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Online and Punta Cana: Association for Computational Linguistics; 2021. pp. 10660–10668. https://doi.org/10.18653/v1/2021.emnlp-main.833.

Conneau A, Lample G. Cross-lingual language model pretraining. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems; 2019. Red Hook, NY, USA: Curran Associates Inc.; 2019. p. 7059-69. Available from: https://doi.org/10.5555/3454287.3454921.

Conneau A, Khandelwal K, Goyal N, Chaudhary V, Wenzek G, Guzmán F, et al. Unsupervised Cross-lingual Representation Learning at Scale. In: Jurafsky D, Chai J, Schluter N, Tetreault J, editors. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics; 2020. pp. 8440–8451. https://doi.org/10.18653/v1/2020.acl-main.747.

Del Barrio E, Cuesta-Albertos JA, Matran C. An optimal transportation approach for assessing almost stochastic order. The Mathematics of the Uncertain: A Tribute to Pedro Gil. Cham, Switzerland: Springer International Publishing; 2018. p. 33–44. (Studies in Systems, Decision and Control; vol 142).

Dror R, Shlomov S, Reichart R. Deep Dominance - How to Properly Compare Deep Neural Models. In: Korhonen A, Traum D, Màrquez L, editors. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence: Association for Computational Linguistics; 2019. pp. 2773–2785. https://doi.org/10.18653/v1/P19-1266.

Sang EFTK, De Meulder F. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL; 2003. Edmonton: Association for Computational Linguistics; 2003. p. 142-7. Available from: https://doi.org/10.3115/1119176.1119195.

Han R, Peng T, Yang C, Wang B, Liu L, Wan X. Is information extraction solved by ChatGPT? An analysis of performance, evaluation criteria, robustness and errors. 2023. arXiv:2305.14450.
