Development and validation of prediction models for stroke and myocardial infarction in type 2 diabetes based on health insurance claims: does machine learning outperform traditional regression approaches?

Stephan AJ, Hanselmann M, Bajramovic M, Schosser S, Laxy M

doi:https://doi.org/10.1186/s12933-025-02640-9

Professur für Public Health und Prevention (Prof. Laxy)

Title:: Development and validation of prediction models for stroke and myocardial infarction in type 2 diabetes based on health insurance claims: does machine learning outperform traditional regression approaches?
Document type:: Journal article
Author(s):: Stephan AJ, Hanselmann M, Bajramovic M, Schosser S, Laxy M
Abstract:: Background: Digitalization and big health system data open new avenues for targeted prevention and treatment strategies. We aimed to develop and validate prediction models for stroke and myocardial infarction (MI) in patients with type 2 diabetes based on routinely collected high-dimensional health insurance claims and compared predictive performance of traditional regression with state-of-the-art machine learning including deep learning methods.
Methods: We used German health insurance claims from 2014 to 2019 with 287 potentially relevant literature-derived variables to predict 3-year risk of MI and stroke. Following a train-test split approach, we compared the performance of logistic methods with and without forward selection, LASSO-regularization, random forests (RF), gradient boosting (GB), multi-layer-perceptrons (MLP) and feature-tokenizer transformers (FTT). We assessed discrimination (Areas Under the Precision-Recall and Receiver-Operator Curves, AUPRC and AUROC) and calibration.
Results: Among n = 371,006 patients with type 2 diabetes (mean age: 67.2 years), 3.5% (n = 13,030) had MIs and 3.4% (n = 12,701) strokes. AUPRCs were 0.035 (MI) and 0.034 (stroke) for a null model, between 0.082 (MLP) and 0.092 (GB) for MI, and between 0.061 (MLP) and 0.073 (GB) for stroke. AUROCs were 0.5 for null models, between 0.70 (RF, MLP, FTT) and 0.71 (all other models) for MI, and between 0.66 (MLP) and 0.69 (GB) for stroke. All models were well calibrated.
Conclusions: Discrimination performance of claims-based models reached a ceiling at around 0.09 AUPRC and 0.7 AUROC. While for AUROC this performance was comparable to existing epidemiological models incorporating clinical information, comparison of other, potentially more relevant metrics, such as AUPRC, sensitivity and Positive Predictive Value was hampered by lack of reporting in the literature. The fact that machine learning including deep learning methods did not outperform more traditional approaches may suggest that feature richness and complexity were exploited before the choice of algorithm could become critical to maximize performance. Future research might focus on the impact of different feature derivation approaches on performance ceilings. In the absence of other more powerful screening alternatives, applying transparent regression-based models in routine claims, though certainly imperfect, remains a promising scalable low-cost approach for population-based cardiovascular risk prediction and stratification.
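The null-model baselines quoted in the abstract (an AUPRC equal to the share of patients with an event, and an AUROC of 0.5) follow directly from how the two metrics are defined. The following is a minimal sketch on synthetic labels, not the authors' code: the function names `average_precision` and `auroc`, the cohort size of 1,000 and the constant score are illustrative assumptions, and average precision is used here as one common estimator of AUPRC.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Tied scores are treated as a single threshold, so a constant
    score produces exactly one (recall, precision) point.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    tp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every observation that shares the current score.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            j += 1
        recall = tp / n_pos
        precision = tp / j  # j observations are flagged at this threshold
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win (pairwise Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Synthetic cohort (assumption): 1,000 patients, 35 events, i.e. a 3.5% event share.
labels = [1] * 35 + [0] * 965
prevalence = sum(labels) / len(labels)

# Null model: every patient receives the same predicted risk.
null_scores = [prevalence] * len(labels)
null_ap = average_precision(labels, null_scores)
null_auc = auroc(labels, null_scores)

# A hypothetical perfect model, for contrast.
perfect_ap = average_precision(labels, labels)
perfect_auc = auroc(labels, labels)

print(f"null model:    AUPRC={null_ap:.3f}  AUROC={null_auc:.2f}")
print(f"perfect model: AUPRC={perfect_ap:.3f}  AUROC={perfect_auc:.2f}")
```

Because the AUPRC floor is the event share rather than 0.5, the reported values of about 0.09 for MI are best read against the 0.035 null-model baseline, not against the AUROC scale.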
Keywords:: Machine learning; Health insurance; Claims database analysis; Predictive algorithms; Prediction model; Risk scores; Type 2 diabetes; Myocardial infarction; Stroke; Logistic regression; Deep learning
Journal title:: Cardiovascular Diabetology
Year:: 2025
Journal volume:: 24, Article number 80(2025)
Year / month:: 2025-02
Quarter:: 1st quarter
Month:: Feb
Journal issue:: 24, Article number 80(2025)
Reviewed:: yes
Language:: en
Fulltext / DOI:: doi:https://doi.org/10.1186/s12933-025-02640-9
WWW:: https://cardiab.biomedcentral.com/articles/10.1186/s12933-025-02640-9
Impact Factor:: 8.5
Scimago quartile:: Q1
Status:: Publisher's version / published
Submitted:: 2024-11-15
Accepted:: 2025-02-08
Date of publication:: 2025-02-18
Format:: Image/Text
Ingested:: 2025-02-18