Systematic evaluation of AI-based text-to-image models for generating medical illustrations in neurosurgery: a multi-stage comparative study

Clinical Neurology and Neurosurgery, Volume 257, October 2025, 109039

Highlights

Gemini AI model outperforms others in neurosurgical illustration generation.

Advanced prompts significantly improve image accuracy and educational value.

The highest-rated images depicted saccular aneurysms and flow-diverting stents.

Complex anatomy remains challenging for current AI image models.

Study introduces a reproducible framework for evaluating AI medical illustrations.

Abstract

Objective

This study evaluates the effectiveness of artificial intelligence (AI) models in generating accurate, high-quality medical illustrations for vascular neurosurgery. It aims to develop a systematic framework for producing and assessing AI-generated medical images.

Methods

Four AI models (DALL-E, Copilot, Gemini, and Midjourney) were used to generate illustrations of neurovascular structures and procedures (e.g., aneurysms, endovascular techniques). The study had three stages: (1) a proof of concept, in which each model generated images for nine neurovascular topics that were evaluated using a standardized rubric; (2) a focused comparison of simple vs. advanced prompt strategies with Gemini; and (3) validation of the best strategy with neurosurgery trainees and attendings. Images were scored from 1 (worst) to 5 (best) across eight domains: Accuracy, Location, Size/Scale, Color, Complexity, Educational Value, Relevance, and Aesthetic Quality.
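To make the rubric concrete, the following is a minimal sketch of how one rater's eight domain scores might be combined into a single image score. It assumes the per-image score is the plain sum of the eight domains (range 8 to 40); the abstract does not state the aggregation rule, so this is an illustration only. The function names and the ratings shown are hypothetical and are not taken from the study.

```python
# Domain names are taken from the rubric described in the Methods.
DOMAINS = (
    "Accuracy", "Location", "Size/Scale", "Color",
    "Complexity", "Educational Value", "Relevance", "Aesthetic Quality",
)


def total_score(ratings: dict[str, int]) -> int:
    """Sum one rater's domain scores for one image.

    Assumption: the total is a simple unweighted sum of the eight
    domains, each scored 1 (worst) to 5 (best).
    """
    missing = set(DOMAINS) - set(ratings)
    if missing:
        raise ValueError(f"missing domains: {sorted(missing)}")
    for domain in DOMAINS:
        if not 1 <= ratings[domain] <= 5:
            raise ValueError(f"{domain} score must be between 1 and 5")
    return sum(ratings[domain] for domain in DOMAINS)


def mean_total(all_ratings: list[dict[str, int]]) -> float:
    """Average the total score for one image across several raters."""
    totals = [total_score(r) for r in all_ratings]
    return sum(totals) / len(totals)


# Hypothetical ratings from two raters for a single image.
rater_a = dict.fromkeys(DOMAINS, 4)
rater_b = dict.fromkeys(DOMAINS, 5)
print(mean_total([rater_a, rater_b]))
```

Under this assumed scheme, a mean score for an image under one prompt strategy can be compared directly with its mean score under another.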

Results

Gemini consistently outperformed the other models in Stage 1, particularly in accuracy, color, and educational value. In Stage 2, advanced prompting significantly improved image quality across nearly all topics (e.g., the fusiform aneurysm score rose from 22.4 to 35.0, p = 7 × 10⁻⁸). In Stage 3, 85% of respondents indicated they would use the saccular aneurysm image in a manuscript without modification. However, complex anatomy such as the anterior cerebral arteries scored lower in accuracy (2.18) and educational value (2.20).

Conclusions

AI-generated illustrations, especially from Gemini, show strong potential in neurosurgical education and communication. While advanced prompting improves output quality, challenges remain in consistently rendering complex anatomy. This study outlines a reproducible framework for clinical integration of AI-generated medical images.

Keywords

Artificial Intelligence

Neurosurgery

Medical Illustration

Intracranial Aneurysm

Arteriovenous Malformation

Craniotomy

Endovascular Procedures

© 2025 The Author(s). Published by Elsevier B.V.
