Risks and benefits of ChatGPT in informing patients and families with rare kidney diseases: an explorative assessment by the European Rare Kidney Disease Reference Network (ERKNet)

A total of 54 participants (42 ERKNet experts and 12 ePAG representatives) provided valid responses that were included in this analysis. Responses from four ERKNet experts were excluded due to the use of imprecise disease terminology (e.g., “polycystic kidney disease” rather than specifying ADPKD or ARPKD). Among the 54 participants, 34 were aged 30–50 years, and 20 were over 50 years old. Participants were from various countries, including Germany (n = 13), the Netherlands (n = 8), Italy (n = 6), Spain (n = 5), Belgium (n = 2), Poland (n = 2), Sweden (n = 2), the UK (n = 2), and one participant each from the Czech Republic, France, Ireland, Malta, Romania, and Slovenia, while 8 participants did not wish to reveal their country of origin (Supplemental Fig. 1). In terms of professional background, 32 participants identified as pediatric nephrologists, 7 as adult nephrologists, and 3 as pathologists. Regarding prior experience with ChatGPT, 16 participants reported using it for the first time, 19 had used it only “for fun,” and 19 had also used it for work-related tasks.

Fig. 1 Twenty-eight rare kidney diseases selected by 54 ERKNet experts and ePAGs. ADPKD, autosomal dominant polycystic kidney disease; ADTKD, autosomal dominant tubulointerstitial kidney disease; aHUS, atypical hemolytic uremic syndrome; APRTD, adenine phosphoribosyltransferase deficiency; ARPKD, autosomal recessive polycystic kidney disease; CF, cystic fibrosis; FHHNC, familial primary hypomagnesemia with hypercalciuria and nephrocalcinosis; FMF, familial Mediterranean fever; MSpK, medullary sponge kidney; NDI, nephrogenic diabetes insipidus; NPHP, nephronophthisis; PHA, pseudohypoaldosteronism; PUV, posterior urethral valves; TMA, thrombotic microangiopathy; XLH, X-linked hypophosphatemia

The 54 participants selected a total of 28 different rare kidney diseases. The most frequently selected conditions included atypical hemolytic uremic syndrome (aHUS) (n = 6), autosomal recessive polycystic kidney disease (ARPKD) (n = 6), cystinosis (n = 5), nephrotic syndrome (n = 5), autosomal dominant polycystic kidney disease (ADPKD) (n = 4), nephronophthisis (n = 3), Alport syndrome (n = 3), Gitelman syndrome (n = 2), posterior urethral valves (n = 2), primary hyperoxaluria (n = 2), and thrombotic microangiopathy (n = 2) (Fig. 1).

To evaluate whether ChatGPT’s responses to the survey questions align with current scientific knowledge, we considered only the scores from the 42 ERKNet experts (Table 1). To evaluate whether the responses are helpful for patients and families, we considered the scores from both ERKNet experts and ePAGs (Table 1).

Table 1 Evaluation of the “scientific correctness” and “helpfulness” of ChatGPT responses

Our findings demonstrate that both ChatGPT 3.5 and 4.0 provide explanations of rare kidney diseases to patients and families that are consistent with scientific understanding and are considered helpful for patients and families (Table 1). Additionally, the prognostic information about the underlying disease and guidance on the decision whether to obtain genetic testing are presented accurately and in a helpful manner (Table 1). However, ERKNet experts and ePAGs expressed concerns about ChatGPT’s responses to questions related to alternative treatments, options for seeking a second opinion in various European cities, and recommendations for other reliable information sources (Table 1). ChatGPT’s ability to explain diseases in plain language was considered accurate and helpful (Table 1).

Responses to random expert-level questions (Supplemental Table 2) from participating experts were generally accurate (Table 1). In one instance, an expert remarked that the response in the context of nephrogenic diabetes insipidus, “seek medical attention if signs of dehydration, such as dry mouth, sunken eyes, or decreased urination occur,” was “inappropriate” and “potentially harmful,” as patients should seek medical attention at an earlier stage. ChatGPT’s responses to the “emotional challenges” presented by ePAGs received a median score of 3, with a relatively broad interquartile range of 2.25, indicating mixed satisfaction (Table 2).
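For readers less familiar with the summary statistics reported here, the following is a minimal sketch of how a median and an interquartile range can be computed from Likert-type scores. The scores in the example are hypothetical and are not the study data, the function name `median_and_iqr` is our own, and the quartile convention (the "inclusive" linear interpolation method) is an assumption, since the source does not state which method was used.

```python
from statistics import median, quantiles


def median_and_iqr(scores):
    """Return (median, interquartile range) for a list of numeric scores.

    Quartiles use the "inclusive" method (linear interpolation between
    data points). This is an assumed convention, not one taken from the paper.
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q3 - q1


# Hypothetical Likert scores (1-5), NOT the survey data from this study.
hypothetical_scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]

med, iqr = median_and_iqr(hypothetical_scores)
print(f"median = {med}, IQR = {iqr}")
```

A wide interquartile range relative to the 1 to 5 scale, as in the reported result, means that the middle half of the ratings spans a large part of the scale, which is why it is read as mixed satisfaction.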

Table 2 General aspects of ChatGPT responses

ERKNet experts and ePAGs generally agreed that ChatGPT is helpful and empathetic (Table 2). However, general concerns about the safety of this new technology remained, as reflected by a neutral score on that specific question (Table 2).

About half of the participants shared comments on one or more survey questions. Overall, we received 50 comments from 36/54 participants comprising mixed feedback on specific ChatGPT responses and general matters (Supplemental Table 3). Many appreciated ChatGPT’s clear and readable responses (“I am a bit surprised. The answers are way better than I thought they would be”). However, several participants criticized the responses for being too general, lacking specific medical details, and sometimes offering information that was not directly relevant to the condition in question (“Very generic answer. Uses terms such as ‘reabsorption,’ which is probably meaningless or even confusing without explanation”). Concerns were raised about ChatGPT’s suggestions of alternative treatments, such as herbal remedies and complementary medicine, which were seen as potentially misleading or unsafe (“Recommends ginger and licorice! As well as any other quackery around, such as ‘Mind–body-techniques’”). Participants also observed that ChatGPT often recommended consulting a healthcare provider, which was perceived positively. However, the omission of important resources, such as the National Institutes of Health (NIH), the European Medicines Agency (EMA), the International Pediatric Nephrology Association (IPNA), the Pediatric Nephrology Research Consortium (PNRC), Nephcure, and the European Rare Kidney Disease Reference Network (ERKNet), was considered a notable limitation. Additionally, the language used in responses was sometimes considered too technical or abstract for the average patient, and some answers appeared to be US-centric rather than tailored to a European audience (“ERKNet and ESPN as well as ESPU are missing. ChatGPT suggests rather US organizations like Mayo Clinic and National Kidney Foundation”).
