Evaluation of GPT-4's Chest X-Ray Impression Generation: A Reader Study on Performance and Perception
Abstract:
Exploring the generative capabilities of the multimodal GPT-4, our study uncovered significant discrepancies between radiologists' assessments and automatic evaluation metrics for chest X-ray impression generation, and revealed bias in the radiologists' judgments.