Autonomous medical evaluation for guideline adherence of large language models.

Fast, Dennis; Adams, Lisa C; Busch, Felix; Fallon, Conor; Huppertz, Marc; Siepmann, Robert; Prucker, Philipp; Bayerl, Nadine; Truhn, Daniel; Makowski, Marcus; Löser, Alexander; Bressem, Keno K

doi:10.1038/s41746-024-01356-6

Institut für Radiologie

Title:: Autonomous medical evaluation for guideline adherence of large language models.
Document type:: Journal Article
Author(s):: Fast, Dennis; Adams, Lisa C; Busch, Felix; Fallon, Conor; Huppertz, Marc; Siepmann, Robert; Prucker, Philipp; Bayerl, Nadine; Truhn, Daniel; Makowski, Marcus; Löser, Alexander; Bressem, Keno K
Abstract:: Autonomous Medical Evaluation for Guideline Adherence (AMEGA) is a comprehensive benchmark designed to evaluate large language models' adherence to medical guidelines across 20 diagnostic scenarios spanning 13 specialties. It includes an evaluation framework and methodology to assess models' capabilities in medical reasoning, differential diagnosis, treatment planning, and guideline adherence, using open-ended questions that mirror real-world clinical interactions. It includes 135 questions and 1337 weighted scoring elements designed to assess comprehensive medical knowledge. In tests of 17 LLMs, GPT-4 scored highest with 41.9/50, followed closely by Llama-3 70B and WizardLM-2-8x22B. For comparison, a recent medical graduate scored 25.8/50. The benchmark introduces novel content to avoid the issue of LLMs memorizing existing medical data. AMEGA's publicly available code supports further research in AI-assisted clinical decision-making, aiming to enhance patient care by aiding clinicians in diagnosis and treatment under time constraints.
Journal title:: NPJ Digit Med
Year:: 2024
Volume:: 7
Issue:: 1
Full text / DOI:: doi:10.1038/s41746-024-01356-6
PubMed:: http://view.ncbi.nlm.nih.gov/pubmed/39668168
TUM institution:: Institut für Diagnostische und Interventionelle Radiologie (Prof. Makowski)