Testing new versions of ChatGPT in terms of physiology and electrophysiology of hearing: improved accuracy but not consistency

Abstract

Introduction ChatGPT has revolutionized many aspects of modern life, including scientific ones. Since its launch, new versions have been released and advertised as performing better. But is this true? This study aimed to assess the accuracy and consistency of six versions of ChatGPT (3.5, 4, 4o mini, 4o, 4o1 mini, and 4o1 preview). Of particular interest was the variability of responses when the same question was asked multiple times.

Methods We evaluated six versions of ChatGPT based on their responses to 30 single-answer, multiple-choice exam questions from a 1-year course on objective methods of testing hearing. The questions were posed 10 times to each version of ChatGPT across two days (5 times each day). The accuracy of the responses was evaluated against an answer key. To evaluate the consistency (repeatability) of the responses over time, percent agreement and Cohen's Kappa were calculated.
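
The three quantities named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the answer key and the two response runs below are invented for illustration, the function names are hypothetical, and the comparison of just two runs is an assumption, since the abstract does not state how the 10 repetitions were paired when computing agreement and Kappa.

```python
from collections import Counter


def accuracy(responses, key):
    """Fraction of responses that match the answer key."""
    return sum(r == k for r, k in zip(responses, key)) / len(key)


def percent_agreement(run_a, run_b):
    """Fraction of questions answered identically in two runs."""
    return sum(a == b for a, b in zip(run_a, run_b)) / len(run_a)


def cohens_kappa(run_a, run_b):
    """Cohen's Kappa: agreement between two runs corrected for chance."""
    n = len(run_a)
    p_o = percent_agreement(run_a, run_b)  # observed agreement
    count_a, count_b = Counter(run_a), Counter(run_b)
    # Expected chance agreement from each run's marginal answer frequencies.
    p_e = sum(count_a[c] * count_b[c] for c in set(run_a) | set(run_b)) / (n * n)
    if p_e == 1.0:
        # Both runs gave the same single answer to every question.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical data: an 8-question key and two repeated runs of one model.
key = ["A", "B", "C", "D", "A", "B", "C", "D"]
run_1 = ["A", "B", "C", "D", "A", "B", "D", "C"]
run_2 = ["A", "B", "C", "A", "A", "B", "D", "D"]

print("accuracy run 1:", accuracy(run_1, key))
print("accuracy run 2:", accuracy(run_2, key))
print("percent agreement:", percent_agreement(run_1, run_2))
print("Cohen's Kappa:", round(cohens_kappa(run_1, run_2), 3))
```

In this toy example both runs score the same accuracy while disagreeing with each other on a quarter of the questions, which is why accuracy and repeatability are reported as separate measures.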

Results The overall accuracy of ChatGPT increased with each version, starting from around 53% for version 3.5 and rising to 86% for version 4o1 preview. The greatest improvement in accuracy and repeatability came with the introduction of version 4o. Repeatability progressively rose with newer releases with the exception of version 4o1 mini. While the current top version 4o1 preview has similar repeatability to 4o, the faster version, 4o1 mini, had significantly lower repeatability than the older 4o mini.

Conclusion Newer versions of ChatGPT generally show improved accuracy, but repeatability has not improved consistently. The variability of responses is probably the main current limitation of ChatGPT for professional applications. Users must be especially careful with version 4o1 mini.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data produced in the present study are available as a supplementary file.
