IMPORTANCE: Differentiating between malignant and benign etiology in large-bowel wall thickening on computed tomography (CT) images can be a challenging task. Artificial intelligence (AI) support systems can improve the diagnostic accuracy of radiologists, as shown for a variety of imaging tasks. Improvements in diagnostic performance, in particular the reduction of false-negative findings, may be useful in patient care.
OBJECTIVE: To develop and evaluate a deep learning algorithm able to differentiate colon carcinoma (CC) from acute diverticulitis (AD) on CT images and to analyze the impact of the AI support system in a reader study.
DESIGN, SETTING, AND PARTICIPANTS: In this diagnostic study, patients who underwent surgery between July 1, 2005, and October 1, 2020, for CC or AD were included. Three-dimensional (3-D) bounding boxes including the diseased bowel segment and surrounding mesentery were manually delineated and used to develop a 3-D convolutional neural network (CNN). A reader study with 10 observers of different experience levels was conducted. Readers were asked to classify the testing cohort under reading room conditions, first without and then with algorithmic support.
MAIN OUTCOMES AND MEASURES: To evaluate the diagnostic performance, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for all readers and reader groups with and without AI support. Metrics were compared using the McNemar test and relative and absolute predictive value comparisons.
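The measures named above can be illustrated with a minimal sketch, under stated assumptions: CC is treated as the positive class, the McNemar test is computed in its exact (binomial) form on discordant reads, and all counts are hypothetical placeholders rather than study data. The abstract does not state which variant of the test was used, and the function names `diagnostic_metrics` and `mcnemar_exact` are illustrative only.

```python
from math import comb


def diagnostic_metrics(tp, fp, tn, fn):
    """Compute the four reported measures from confusion-matrix counts.

    Assumption: CC is the positive class, AD the negative class.
    """
    return {
        "sensitivity": tp / (tp + fn),  # CC cases correctly called CC
        "specificity": tn / (tn + fp),  # AD cases correctly called AD
        "ppv": tp / (tp + fp),          # CC calls that are truly CC
        "npv": tn / (tn + fn),          # AD calls that are truly AD
    }


def mcnemar_exact(b, c):
    """Two-sided exact McNemar test on discordant pair counts.

    b: reads correct without AI support but wrong with it
    c: reads wrong without AI support but correct with it
    Under the null hypothesis, each discordant pair falls in either
    cell with probability 0.5, so the smaller count is compared
    against a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts, for illustration only (not taken from the study).
metrics = diagnostic_metrics(tp=40, fp=5, tn=45, fn=10)
p_value = mcnemar_exact(b=2, c=10)
```

Only the discordant reads (those whose correctness changed between the unsupported and supported sessions) enter the McNemar test, which is why it suits a paired design in which the same readers classify the same cases twice.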
RESULTS: A total of 585 patients (AD: n = 267, CC: n = 318; mean [SD] age, 63.2 [13.4] years; 341 men [58.3%]) were included. The 3-D CNN reached a sensitivity of 83.3% (95% CI, 70.0%-96.6%) and a specificity of 86.6% (95% CI, 74.5%-98.8%) on the test set, compared with a mean reader sensitivity of 77.6% (95% CI, 72.9%-82.3%) and specificity of 81.6% (95% CI, 77.2%-86.1%). With AI support, the combined group of readers improved significantly, from a sensitivity of 77.6% to 85.6% (95% CI, 81.3%-89.3%; P < .001) and from a specificity of 81.6% to 91.3% (95% CI, 88.1%-94.5%; P < .001). Artificial intelligence support also significantly reduced the number of false-negative and false-positive findings, reflected in an increase in NPV from 78.5% to 86.4% and in PPV from 80.9% to 90.8% (P < .001).
CONCLUSIONS AND RELEVANCE: The findings of this study suggest that a deep learning model that distinguishes CC from AD on CT images, used as a support system, may significantly improve the diagnostic performance of radiologists, which may in turn improve patient care.