ADHD-KG: a knowledge graph of attention deficit hyperactivity disorder

This section begins with a performance evaluation, focusing on graph size and average execution time. We then discuss the outcomes of a preliminary validation of the information linking procedures and the query results. Finally, we discuss potential use cases of the proposed knowledge graph, including maintenance considerations, as well as the impact of ADHD-KG on ADHD researchers and practitioners and on researchers working on applications of AI in healthcare.

Performance

An overview of the distribution of triples in ADHD-KG is shown in Table 1. Triples are organised in named graphs, which introduces a logical structure over the data without compromising interoperability, while also making query composition smoother and more intuitive. ADHD-KG contains 43 M triples, 6% of which correspond to ADHD-specific information, a scale comparable to other disease-specific graphs [10, 13]. It is worth noting that the graph intentionally includes complete copies of the MeSH, DrugBank and SIDER data sources, even for resources that are not referred to by existing ADHD publications. This strategy facilitates scalability and transferability: the graph can be extended with additional ADHD resources, such as information on ADHD in children and adolescents, or adjusted to different disorders, provided that the new data sources are subjected to the processes described in Section “Methods”.

Table 1 Size of ADHD-KG in triples

To optimise query execution time, we exploit the data grouping induced by named graphs, which allows for localised graph search. This can make a notable difference in performance, because restricting a query to a named graph limits the set of candidate results. For instance, searching for the drug mesh:D008774 (“Methylphenidate”) within the 124K triples of the clinical trials graph is more efficient than searching the full 43 M-triple graph. We quantify query performance using a deliberately demanding, and therefore unrealistic, query template that links information from every data source within ADHD-KG. Note that we exclude queries that implement online information linking, since they introduce latency that depends heavily on the complexity of the regular expression used.
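The intuition behind graph-scoped search can be illustrated with a toy in-memory quad store. This is a minimal sketch, not the ADHD-KG implementation: the `QuadStore` class, the graph names and the `ex:` predicates are hypothetical, and production triple stores rely on indexes rather than linear scans, so the sketch only conveys why a smaller candidate set helps.

```python
from collections import defaultdict


class QuadStore:
    """Toy quad store: each (s, p, o) triple is filed under a named graph.

    Hypothetical illustration only; graph names and predicates are placeholders.
    """

    def __init__(self):
        self.graphs = defaultdict(list)  # graph name -> list of (s, p, o) triples

    def add(self, graph, s, p, o):
        self.graphs[graph].append((s, p, o))

    def match(self, s=None, p=None, o=None, graph=None):
        """Return (matches, triples_scanned); None acts as a wildcard.

        With graph=None every named graph is scanned, i.e. the "full graph" case.
        """
        names = [graph] if graph is not None else list(self.graphs)
        matches, scanned = [], 0
        for name in names:
            for triple in self.graphs.get(name, []):
                scanned += 1
                if all(q is None or q == t for q, t in zip((s, p, o), triple)):
                    matches.append((name, triple))
        return matches, scanned


store = QuadStore()
store.add("clinical_trials", "trial:1", "ex:studiesDrug", "mesh:D008774")
for i in range(1000):  # filler standing in for a much larger graph
    store.add("pubmed", f"pmid:{i}", "ex:hasTitle", f"title {i}")

scoped, scanned_scoped = store.match(o="mesh:D008774", graph="clinical_trials")
full, scanned_full = store.match(o="mesh:D008774")
print(scoped == full, scanned_scoped, scanned_full)  # True 1 1001
```

Both searches return the same match, but the scoped one examines a single triple instead of 1001.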

The test query aims to find all drug actions (DrugBank) and side effects (SIDER) of a concept named after a known drug such as “Methylphenidate” (MeSH), which must appear as a semantic annotation (annotation graph) within publications (PubMed) or clinical trials. The average execution time recorded for the test query does not exceed 0.8 s on an Intel Core i7-4790 processor at 3.60 GHz with 16 GB RAM. We consider such execution times suitable for integrating ADHD-KG queries into clinical, training or research processes (as discussed in Section “Use cases”), especially given that retrieving the same information by traditional means would require significant time and effort to manually search and combine results from various data sources.
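A measurement of this kind could be set up as follows. This is a sketch under stated assumptions: `TEST_QUERY` is a simplified, hypothetical rendering of the cross-source test query (the graph IRIs and predicates are placeholders, not the ADHD-KG vocabulary, and the restriction to PubMed or clinical trial documents is omitted), and `average_execution_time` is an illustrative harness, not the authors' benchmarking code.

```python
import statistics
import time
from typing import Callable

# Hypothetical, simplified shape of the cross-source test query. Graph IRIs and
# predicates are placeholders, not the actual ADHD-KG vocabulary.
TEST_QUERY = """
SELECT ?doc ?action ?sideEffect WHERE {
  GRAPH <urn:example:mesh>        { ?drug <urn:example:label> "Methylphenidate" . }
  GRAPH <urn:example:annotations> { ?doc <urn:example:mentions> ?drug . }
  GRAPH <urn:example:drugbank>    { ?drug <urn:example:action> ?action . }
  GRAPH <urn:example:sider>       { ?drug <urn:example:sideEffect> ?sideEffect . }
}
"""


def average_execution_time(run_query: Callable[[], object], runs: int = 10, warmup: int = 1) -> float:
    """Mean wall-clock time (seconds) of `run_query` over `runs` timed calls."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):  # untimed calls, e.g. to warm caches
        run_query()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


# Usage with a stand-in workload; in practice run_query would send TEST_QUERY
# to the triple store's SPARQL endpoint.
calls = []
avg = average_execution_time(lambda: calls.append(1), runs=5, warmup=2)
print(len(calls), avg >= 0.0)  # 7 True
```

Warm-up calls are excluded from the average so that one-off costs such as cache population do not skew the reported mean.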

Validation

We focused on validating two important aspects of ADHD-KG: the introduced information links and the quality of query results. For the first aspect, we employed the Unified Medical Language System (UMLS), which interconnects various biomedical vocabularies, including MeSH. In particular, every pair of textual mention and medical concept is validated through online term-based queries against the UMLS Metathesaurus database (Footnote 10). The final graph contains only pairs that pass this validation.
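The filtering step amounts to keeping only those mention/concept pairs that a validator accepts. The sketch below assumes an offline stand-in for the online UMLS lookup: `ACCEPTED_TERMS` and `stub_validator` are hypothetical and do not call any UMLS service.

```python
from typing import Callable, Iterable

Pair = tuple[str, str]  # (textual mention, MeSH descriptor ID)


def keep_validated(pairs: Iterable[Pair], is_valid: Callable[[str, str], bool]) -> list[Pair]:
    """Keep only the mention/concept pairs that the validator confirms."""
    return [(mention, concept) for mention, concept in pairs if is_valid(mention, concept)]


# Offline stand-in for the online UMLS Metathesaurus lookup (hypothetical data):
# maps a concept ID to the lower-cased terms accepted for it.
ACCEPTED_TERMS = {"D008774": {"methylphenidate"}}


def stub_validator(mention: str, concept: str) -> bool:
    return mention.lower() in ACCEPTED_TERMS.get(concept, set())


pairs = [("Methylphenidate", "D008774"), ("sleep", "D008774")]
print(keep_validated(pairs, stub_validator))  # [('Methylphenidate', 'D008774')]
```

Passing the validator in as a function keeps the filter independent of how the terminology service is actually queried.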

Regarding the quality of query output, results were thoroughly examined by a team of 5 clinical experts in adult ADHD (one of whom is a co-author), who were involved in the planning, design and validation of ADHD-KG. These experts were asked to provide 10 questions to be answered using ADHD-KG, ranging from simple questions (e.g., what is the most effective medicine) to open questions (e.g., the association of ADHD diagnosis with social stigma). The objective of the validation process was to compare the answers generated by ADHD-KG with those expected by the experts. The evaluation criteria were the relevance and correctness of the produced answers. A detailed summary of the evaluation results is given in Table 2.

Generation of answers

Each question is transformed into a SPARQL query, which is issued against ADHD-KG and returns a set of resources. These include text summaries (e.g., publication abstracts or report summaries), semantic annotations within text, and database entries such as medicines or medical concepts. The resources result from the triple matching performed when querying ADHD-KG and are used to inform the answer to the question under consideration. Depending on the complexity of the question, aggregation is applied to provide a more direct answer. In particular, for questions that afford quantification, such as ordering entities under certain constraints, post-hoc aggregation procedures were applied to generate representative frequencies or plots.
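One simple form of post-hoc aggregation is counting how often each entity occurs across result rows and ordering by frequency. The sketch assumes result rows shaped as dictionaries of variable bindings; the rows and drug names are made up, and a frequency of this kind measures how often an entity appears in the retrieved resources, not clinical effectiveness.

```python
from collections import Counter


def rank_entities(bindings: list[dict[str, str]], key: str) -> list[tuple[str, int]]:
    """Count each value bound to `key` across result rows, most frequent first."""
    counts = Counter(row[key] for row in bindings if key in row)
    return counts.most_common()


# Made-up result rows of the kind a SPARQL SELECT might return (one dict per row).
rows = [
    {"doc": "pmid:1", "drug": "drug_a"},
    {"doc": "pmid:2", "drug": "drug_b"},
    {"doc": "pmid:3", "drug": "drug_a"},
    {"doc": "pmid:4"},  # row without a drug binding (e.g. OPTIONAL unmatched)
]
print(rank_entities(rows, "drug"))  # [('drug_a', 2), ('drug_b', 1)]
```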

Evaluation

Upon resource retrieval and generation of direct answers (when applicable), each expert was contacted separately and was asked to rate generated answers according to the criteria of relevance and correctness.

In terms of relevance, experts were presented with a random sample of 10 retrieved resources for each question, which they rated as “irrelevant” or “relevant”. Correctness was estimated by asking experts to classify the generated plots or frequencies as “valid” or “invalid”. Where the retrieved resources could not be aggregated, the question was excluded from the estimation of overall correctness; these cases are denoted “N/A” under correctness. The final relevance score per question is obtained by averaging the individual expert ratings and converting them into a percentage that reflects the overall agreement. For correctness, we used majority voting (e.g., the agreed classification is “valid” when at least 3 out of 5 experts classified an answer as “valid”).
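The two scoring rules can be expressed compactly. This sketch assumes that "averaging the individual expert ratings" means averaging each expert's share of relevant resources (with 10 resources per expert this equals pooling all ratings); the ratings themselves are hypothetical.

```python
from statistics import mean


def relevance_percentage(ratings: list[list[bool]]) -> float:
    """ratings[e][r] is True when expert e rated sampled resource r as relevant.

    Each expert's share of relevant resources is averaged across experts.
    """
    per_expert = [sum(expert) / len(expert) for expert in ratings]
    return 100 * mean(per_expert)


def majority_label(votes: list[bool], threshold: int = 3) -> str:
    """'valid' when at least `threshold` experts voted valid (3 of 5 in the text)."""
    return "valid" if sum(votes) >= threshold else "invalid"


# Hypothetical ratings: 5 experts x 10 sampled resources for one question.
counts_relevant = [7, 8, 9, 6, 10]
ratings = [[True] * k + [False] * (10 - k) for k in counts_relevant]
print(round(relevance_percentage(ratings), 1))  # 80.0
print(majority_label([True, True, True, False, False]))  # valid
print(majority_label([True, True, False, False, False]))  # invalid
```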

Results

For most questions, ADHD-KG results were considered relevant and correct by the experts, confirming their correspondence to relevant research in ADHD. The experts did note that ADHD-KG query output sometimes included information that may be considered outdated. This is because the executed queries were time-insensitive, so an exhaustive set of results was returned. The overall relevance of the retrieved resources across all 10 questions is 77%, whereas the validity of the generated results is 85%. It is worth noting that 7 out of 10 questions were addressed with a direct answer in the form of a chart, whereas the remaining three (questions 6, 8 and 10) are too elaborate to be quantified with simple aggregation. The score of 85% considers only the 7 questions where a direct answer was possible, 6 of which returned a valid result.
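The correctness score follows directly from the counts stated for the 7 questions that admitted a direct answer; the ratio is written out here only as a worked check, and evaluates to roughly 85.7%, which the text reports as 85%.

```latex
\text{correctness}
  = \frac{\text{questions with a valid direct answer}}{\text{questions answerable by aggregation}}
  = \frac{6}{7} \approx 0.857
```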

Elaborating on the relevance results, ADHD-KG excels at addressing straightforward questions based on medical concepts, for example associating medical documents with medication, side effects, proteins and so on. Relevance drops for questions that depend on textual information; this is prevalent in open questions such as patterns of ADHD onset or differences in ADHD symptoms in the presence of co-occurring conditions. Lower performance is expected in such scenarios, since they rely on text understanding, which ADHD-KG does not implement beyond a preliminary annotation of text with medical concepts. Extending the annotation and connection graph (Section “ADHD-KG assembly”) with a richer medical thesaurus such as SNOMED, instead of MeSH, can substantially improve the expressiveness of the relevant queries. Improving the ability of ADHD-KG to address text-based questions would require advanced natural language understanding technologies, which is an interesting direction for future work. Nevertheless, question 10, which relates to the correlation of ADHD diagnosis with social stigma, showcases the knowledge discovery potential of ADHD-KG well: 70% resource relevance was achieved, even though the graph cannot distinguish the textual co-occurrence of ADHD diagnosis and social stigma from an actual contextual correlation.

Table 2 Evaluation of produced results in terms of correctness and resource relevance

It should be noted that this is a preliminary validation; hence, the completeness of the generated answers was not assessed. We intend to follow this expert-based validation, which confirmed the appropriateness of ADHD-KG for answering relevant questions, with a wider clinician-oriented validation phase, as discussed in Section “Conclusion”.

Use cases

We identify three major use cases in which ADHD-KG has the potential to improve current practice. These relate to answering clinical, training and medical queries, as follows:

Clinical queries include scenarios where a doctor treating a particular patient needs to investigate relevant knowledge in ADHD literature, in order to decide the best course of action. For instance, they may need to find an alternative medication that limits unwanted side effects or gain insights about a case by examining correlated clinical trials.

Training queries focus on supporting junior clinicians in learning the basics of the ADHD domain, such as ADHD medication that impairs sleep quality or mental disorders known to hinder ADHD diagnosis.

Medical queries refer to queries about the latest developments in the field of ADHD that medical experts and researchers wish to explore. These may involve, for example, applied scenarios, such as progress in using stimulants to improve emotional dysregulation, or prevailing questions, such as reviewing literature that explores whether ADHD diagnosis increases stigma.
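As an illustration of the clinical scenario, finding an alternative medication that avoids a particular side effect reduces to a filter over drug-to-side-effect associations such as those provided by SIDER. The sketch below is hypothetical: drug names and side effects are placeholders rather than real SIDER data, it assumes the candidate set is already restricted to relevant ADHD medications, and the absence of a recorded side effect does not establish that the drug never causes it.

```python
def alternatives_without(side_effects: dict[str, set[str]], current: str, unwanted: str) -> list[str]:
    """Candidate drugs, other than `current`, with no recorded `unwanted` side effect."""
    return sorted(
        drug for drug, effects in side_effects.items()
        if drug != current and unwanted not in effects
    )


# Placeholder drug names and side effects, not real SIDER data.
SIDE_EFFECTS = {
    "drug_a": {"insomnia", "headache"},
    "drug_b": {"nausea"},
    "drug_c": {"insomnia"},
}
print(alternatives_without(SIDE_EFFECTS, current="drug_a", unwanted="insomnia"))  # ['drug_b']
```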

By querying ADHD-KG instead of conducting laborious manual searches, we expect efficiency improvements in all three contexts: reduced time to identify appropriate treatment, faster training processes that reduce the reliance of junior clinicians on senior colleagues, and easier access to the state of the art in adult ADHD for medical experts and researchers. In the long term, these efficiency improvements depend on the ability of ADHD-KG to account for the latest developments and debates in the relevant literature, given that contributions to ADHD research are quite frequent. The modular architecture of ADHD-KG allows for effective maintenance and evolution practices by enabling on-demand updates of the individual constituent graphs through the integration pipeline described in Section “Methods”. This will support frequent releases, at least on a yearly cycle.
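The benefit of modularity can be sketched as replacing one named graph while leaving the others untouched, similar in spirit to the replace-graph semantics of a PUT request in the SPARQL 1.1 Graph Store HTTP Protocol. The store layout, graph names and triples below are hypothetical; this is not the ADHD-KG integration pipeline.

```python
Triple = tuple[str, str, str]
Store = dict[str, frozenset[Triple]]  # named graph -> its triples (hypothetical layout)


def replace_graph(store: Store, name: str, new_triples) -> Store:
    """Return a new release in which only the named graph `name` is replaced."""
    updated = dict(store)  # shallow copy: untouched graphs are shared, not rebuilt
    updated[name] = frozenset(new_triples)
    return updated


release_1: Store = {
    "pubmed": frozenset({("pmid:1", "ex:title", "old title")}),
    "drugbank": frozenset({("drug:1", "ex:action", "ex:inhibitor")}),
}
release_2 = replace_graph(release_1, "pubmed", {("pmid:1", "ex:title", "old title"),
                                                ("pmid:2", "ex:title", "new title")})
print(len(release_2["pubmed"]), release_2["drugbank"] is release_1["drugbank"])  # 2 True
```

The previous release stays intact, so an update to one source does not require rebuilding the graphs of the others.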

Impact of ADHD-KG

In this section, we expand on the contributions of ADHD-KG to relevant academic communities, in particular researchers in ADHD and in Artificial Intelligence in healthcare. In addition, we provide some insights on how the proposed technology can be integrated with complementary emerging technologies such as Large Language Models (LLMs).

ADHD research and practice

As medical practice becomes more complex and the available literature on ADHD keeps growing, experts face difficulties in retrieving the best evidence to inform their practice, research or clinical trials. This becomes a notable challenge when considering that medical literature is often distributed across several knowledge sources with disparate schemas (for instance, medicine databases or clinical trial reports), raising the need for manual inspection and association of different resources. ADHD-KG simplifies information retrieval and has the potential to set the foundation for effective medical question answering. Knowledge about ADHD is integrated into a single resource, which facilitates the transition from time-consuming manual reviews of medical literature towards automated semantic search over encoded knowledge using expressive SPARQL queries.

Using the developed graph, ADHD researchers can issue queries that refer to medical entities and consider multiple sources in their search space. As a result, ADHD-KG speeds up the acquisition of medical knowledge by lifting the burden of information alignment and enabling flexible information retrieval. In particular, search with ADHD-KG goes beyond word matching, benefiting, for instance, from class hierarchies of symptoms or categories of medicine. It can also be customised using complex constraints that combine multiple information sources, e.g., retrieve every stimulant included in clinical trials where participants are classified as obese. In addition, ADHD-KG can support the training of junior clinicians and researchers with limited ADHD-related experience. Instead of the traditional clinical case study approach to demonstrating a learning point, senior colleagues acting as trainers can pose a specific question to ADHD-KG and demonstrate the learning point from the answer provided.
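Search that benefits from class hierarchies can be pictured as expanding a query concept to all of its narrower concepts before matching annotations, much as a SPARQL property path such as `rdfs:subClassOf*` would. The hierarchy, concept labels and document IDs below are illustrative and not taken from ADHD-KG.

```python
from collections import deque


def narrower_closure(children: dict[str, list[str]], root: str) -> set[str]:
    """`root` plus every concept reachable through narrower-than links (breadth-first)."""
    seen, queue = {root}, deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def docs_about(annotations: dict[str, set[str]], children: dict[str, list[str]], concept: str) -> list[str]:
    """Documents annotated with `concept` or any narrower concept."""
    targets = narrower_closure(children, concept)
    return sorted(doc for doc, concepts in annotations.items() if concepts & targets)


# Illustrative hierarchy and annotations (not taken from ADHD-KG).
CHILDREN = {
    "CNS Stimulants": ["Methylphenidate", "Amphetamines"],
    "Amphetamines": ["Dextroamphetamine"],
}
ANNOTATIONS = {
    "pmid:1": {"Methylphenidate"},
    "pmid:2": {"Dextroamphetamine"},
    "pmid:3": {"Sleep"},
}
print(docs_about(ANNOTATIONS, CHILDREN, "CNS Stimulants"))  # ['pmid:1', 'pmid:2']
```

An exact word match on “CNS Stimulants” would return no documents here, since none is annotated with that term directly.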

Another area where ADHD-KG can have a positive impact is the currency of knowledge. On many occasions, clinical guidelines become obsolete soon after they are published, so clinicians and researchers spend valuable time seeking the most up-to-date knowledge. ADHD-KG supports time-sensitive information retrieval: all entries are associated with time references (e.g., publication dates, dates of changes in medicine and so on), allowing users to discern whether an entry is deprecated or contemporaneous and thus to keep track of its currency and validity. Through the frequent update cycle planned for ADHD-KG, we aim to ensure that the latest information is always included.
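Time references make it possible to keep only the most recent value per entity or to restrict results to recent entries. The sketch assumes a hypothetical record shape of (entity, effective date, value); entity names and dates are placeholders.

```python
from datetime import date

Record = tuple[str, date, str]  # (entity, effective date, value), hypothetical shape


def latest_per_entity(records: list[Record]) -> dict[str, tuple[date, str]]:
    """Keep the most recent dated value for each entity."""
    latest: dict[str, tuple[date, str]] = {}
    for entity, when, value in records:
        if entity not in latest or when > latest[entity][0]:
            latest[entity] = (when, value)
    return latest


def since(records: list[Record], cutoff: date) -> list[Record]:
    """Records dated on or after `cutoff`, e.g. to restrict answers to recent evidence."""
    return [r for r in records if r[1] >= cutoff]


RECORDS = [
    ("drug_a", date(2015, 1, 1), "label v1"),
    ("drug_a", date(2021, 6, 1), "label v2"),
    ("drug_b", date(2019, 3, 1), "label v1"),
]
print(latest_per_entity(RECORDS)["drug_a"])  # (datetime.date(2021, 6, 1), 'label v2')
print(len(since(RECORDS, date(2019, 1, 1))))  # 2
```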

Artificial intelligence in healthcare

The importance of ADHD-KG as a collection and integration of high-quality knowledge about a specific disease is topical, as it may complement recent developments in artificial intelligence and its application to healthcare. Since the release of ChatGPT (Footnote 11), LLMs have attracted considerable attention in all sorts of domains, including healthcare and medical research [33, 34]. Despite the huge potential of this technology, there are weaknesses around the quality of its outputs, including so-called hallucinations [35], which make it less suited to mission-critical domains like healthcare. In addition, healthcare often has a paramount need for explainability and for supporting decisions with references to high-quality sources such as medical research publications. It is in this context that ADHD-KG and similar research can play a crucial role, as such disease-specific knowledge graphs collect high-quality knowledge and are inherently capable of explaining the answers they produce. Coupling such knowledge graphs with LLMs promises to combine the strengths of both approaches, and there is already significant research in this direction [36].
