SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization

Electronic health records (EHRs) have brought about a revolutionary change in the accessibility and utilization of patient information by healthcare providers. This transformation has reshaped how medical professionals make informed decisions about patient care [1]. Among the many benefits of EHRs, the ability to summarize clinical notes is of particular importance [2], as it enables healthcare providers to quickly identify potential health risks and make better-informed decisions. By presenting the most relevant and up-to-date patient information, clinical note summarization ultimately contributes to a reduction in diagnosis errors and an improvement in patient outcomes [3]. However, manually distilling clinical findings and reports into summaries is both time-consuming and error-prone [4]. Moreover, given the volume and complexity of the data, even experienced clinicians can inadvertently overlook significant aspects of the patient's condition. Thus, there is a pressing need for automated summarization methods that enhance both efficiency and accuracy in patient care.

Recent developments in natural language processing, particularly the advent of large language models (LLMs), have shown substantial potential for improving the efficiency of automated clinical note summarization [5], [6], [7]. It has been noted that these models exhibit a remarkable ability to follow input instructions [8], suggesting that text prompts are a viable means of improving LLM performance on summarization tasks [9]. For instance, a carefully crafted instruction prompt such as "Summarize the key findings in the patient's medical history" effectively directs the LLM's focus toward extracting the most pertinent information from the clinical notes. However, designing an effective text prompt, a practice termed "prompt engineering", is challenging: it requires precision, informativeness, and a confluence of knowledge from several domains [10]. In this context, LLM-generated prompts can help non-NLP experts meet the demands of prompt engineering by drawing on the knowledge already encoded in pre-trained LLMs. However, our study reveals that even minor modifications to an LLM-generated prompt can result in significant variations in the summarization outcome, as demonstrated in Table 1. This unstable quality of LLM-generated prompts may limit the ability of non-NLP experts to effectively leverage LLMs for their intended tasks.
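To make the prompting setup concrete, the following minimal sketch shows how a discrete instruction prompt steers an instruction-tuned model toward summarization. It illustrates the general mechanism rather than our pipeline; it assumes the Hugging Face transformers library and the public google/flan-t5-base checkpoint, and the instruction and clinical note texts are invented examples.

```python
# Minimal sketch: steering an instruction-tuned LLM with a discrete text prompt.
# Assumes the Hugging Face `transformers` library and the public
# google/flan-t5-base checkpoint; the prompt and note below are illustrative.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

instruction = "Summarize the key findings in the patient's medical history."
clinical_note = "62 y/o male with hypertension presents with chest pain ..."

# The discrete instruction prompt is simply prepended to the clinical note.
inputs = tokenizer(f"{instruction}\n{clinical_note}", return_tensors="pt")
summary_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

Because the instruction enters the model as ordinary text, small wording changes to it can shift the generated summary, which is the source of the variability discussed above.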

In response to this challenge, we present a model-agnostic approach named Soft Prompt-Based Calibration (SPeC), designed to address performance variability in clinical note summarization [11]. Unlike conventional discrete text prompts, in which each input token carries a definite meaning, soft prompts are flexible, learnable tokens devoid of pre-defined significance [12]; this flexibility enables the model to learn parameters specifically tailored to them. By leveraging soft prompts, our approach aims to mitigate variance while retaining the benefits of discrete prompt-based summarization. Specifically, we propose a soft prompt encoder that allows soft prompts to interact with discrete text prompts within the token embedding space. The proposed soft prompt encoder is a zero-shot learning model that requires no gold-standard reference summaries as ground-truth labels during training. Our aim is to ensure that the summary generated from the prompted input (e.g., input clinical notes combined with an LLM-generated prompt) preserves a semantic meaning similar to that of the input clinical notes, and to embed this information into the soft prompts so that the model is robust to quality variations in LLM-generated prompts. Experimental findings demonstrate that SPeC not only improves overall performance but also effectively reduces variance across different LLMs; for instance, SPeC reduces performance variability in the Flan-T5 model [13] by up to 43.1%. This yields a more uniform and reliable solution for summarizing crucial medical information, ultimately supporting healthcare professionals in making well-informed decisions and providing optimal patient care. The success of SPeC in addressing performance variability has significant implications for the future of clinical note summarization and the broader application of LLMs in the healthcare domain.
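For readers unfamiliar with soft prompts, the following PyTorch sketch illustrates the general mechanism described above: learnable soft-prompt vectors are conditioned on a discrete prompt's token embeddings and prepended to the input in embedding space. It is not the exact SPeC encoder; the class name SoftPromptEncoder, the cross-attention design, and hyperparameters such as n_soft are our own illustrative choices.

```python
# Minimal PyTorch sketch of the soft-prompt mechanism described above --
# not the paper's exact SPeC architecture. Learnable soft-prompt vectors
# are conditioned on the discrete prompt's token embeddings and prepended
# to the input in embedding space; `SoftPromptEncoder` and `n_soft` are
# illustrative names of our own.
import torch
import torch.nn as nn

class SoftPromptEncoder(nn.Module):
    def __init__(self, embed_dim: int, n_soft: int = 20):
        super().__init__()
        # Learnable soft-prompt tokens with no pre-defined meaning.
        self.soft_prompts = nn.Parameter(torch.randn(n_soft, embed_dim) * 0.02)
        # Cross-attention lets the soft prompts interact with the discrete
        # prompt's embeddings within the shared token-embedding space.
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads=8,
                                                batch_first=True)

    def forward(self, prompt_embeds: torch.Tensor) -> torch.Tensor:
        # prompt_embeds: (batch, prompt_len, embed_dim), the embeddings of
        # the LLM-generated discrete prompt. The frozen LLM is not shown.
        batch = prompt_embeds.size(0)
        queries = self.soft_prompts.unsqueeze(0).expand(batch, -1, -1)
        calibrated, _ = self.cross_attn(queries, prompt_embeds, prompt_embeds)
        return calibrated  # (batch, n_soft, embed_dim)

# Usage sketch: prepend the calibrated soft prompts to the note's token
# embeddings before feeding the concatenation to the frozen LLM's encoder.
encoder = SoftPromptEncoder(embed_dim=768)
prompt_embeds = torch.randn(1, 12, 768)   # embeddings of a discrete prompt
note_embeds = torch.randn(1, 256, 768)    # embeddings of the clinical note
inputs_embeds = torch.cat([encoder(prompt_embeds), note_embeds], dim=1)
```

Because only the soft-prompt parameters are trained while the underlying LLM stays frozen, the approach remains model-agnostic in the sense described above.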
