Although neural networks have been used for pattern classification for decades, convolutional neural networks (CNNs) have become increasingly important over the past several years. In particular, CNNs are utilized in automated scenarios such as traffic sign recognition and disease classification. However, they still suffer from overfitting and a lack of robustness to undesired inputs. Hence, they can generate overconfident false predictions (FPs), which can be dangerous and costly, especially when used in safety- and/or mission-critical applications. Here, overconfident FPs can (1) cause collisions in robotic applications, (2) prompt false treatments in medical applications, or (3) increase costs in financial applications. These significant consequences limit the use of CNNs in the aforementioned fields even though their technological potential is of great interest.
To overcome these limitations and encourage the widespread use of CNNs in safety- and/or mission-critical applications, we aim to prevent FPs by improving the separability between true predictions (TPs) and FPs. To achieve this, we force the degree of confidence (a measure of uncertainty) to be high for TPs and low for FPs. This approach is based on the hypothesis that if the confidence is high for TPs and low for FPs, the two can be well separated by a threshold. The research questions are therefore as follows:
(1) Which method forces the degree of confidence to be high for TPs and low for FPs?
(2) Under what circumstances does the method work?
(3) At what cost does the method help to maintain a low confidence for FPs and a high confidence for TPs?
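The underlying hypothesis can be illustrated with a minimal sketch: given per-prediction confidence scores, a single threshold should accept mostly TPs and reject mostly FPs. The confidence values and correctness labels below are hypothetical, chosen only to show the mechanics of the separation:

```python
import numpy as np

# Hypothetical confidence scores, one per prediction.
# Labels: True = the prediction was correct (TP), False = incorrect (FP).
confidences = np.array([0.97, 0.91, 0.42, 0.88, 0.35, 0.95, 0.51])
is_correct = np.array([True, True, False, True, False, True, False])

threshold = 0.7  # predictions below this confidence are rejected as unreliable

accepted = confidences >= threshold
# In the ideal case, every accepted prediction is a TP and every rejected one an FP.
tp_accepted = int(np.logical_and(accepted, is_correct).sum())
fp_accepted = int(np.logical_and(accepted, ~is_correct).sum())
print(tp_accepted, fp_accepted)  # → 4 0
```

The better a method pushes TP confidence up and FP confidence down, the wider the range of thresholds that achieves this clean separation.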
To address the first question, we develop a method called Monte Carlo averaging (MCA) and compare it to related methods: the baseline (a single CNN), Monte Carlo dropout (MCD), an ensemble of CNNs, and a mixture of Monte Carlo dropout (MMCD). To answer the second question, we gauge the performance of the developed and related methods on four datasets of varying difficulty and on different CNN architectures. Further, we investigate the impact of applying logit instead of probability averaging, as well as the impact of reducing the strength of regularization during training. To address the third question, we evaluate the ability of the developed and related methods to separate TPs and FPs and examine the classification accuracy, calibration error, and inference time.
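All of the Monte Carlo-style methods above share a common core: averaging the softmax outputs of several stochastic forward passes and reading the confidence off the mean. The sketch below illustrates this shared principle only; it is not the paper's MCA, and the noisy-logit stand-in for a dropout-perturbed CNN is a simplifying assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Stand-in for one stochastic forward pass (e.g., a CNN with dropout kept
# active at test time). Here the logits are simply perturbed with Gaussian noise.
def stochastic_forward(noise=0.5):
    base_logits = np.array([2.0, 0.5, -1.0])  # hypothetical 3-class output
    return base_logits + rng.normal(0.0, noise, size=3)

T = 50  # number of Monte Carlo samples
probs = np.stack([softmax(stochastic_forward()) for _ in range(T)])

mean_probs = probs.mean(axis=0)        # probability averaging over samples
prediction = int(mean_probs.argmax())  # predicted class
confidence = float(mean_probs.max())   # confidence of the prediction
print(prediction)  # → 0
```

The spread of the T sampled probabilities around their mean is what carries the uncertainty information; a wide spread yields a lower mean confidence.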
Experimental results show improvements in the developed MCA and the state-of-the-art MMCD compared to the other related methods (baseline, MCD, and ensemble of CNNs).
Specifically, like MMCD, the developed MCA preserves the accuracy of the underlying ensemble, which may exceed the baseline accuracy; among the related methods, only MCD preserves the baseline accuracy. Both MMCD and MCA improve the separability of TPs and FPs at the cost of an increased calibration error and inference time. However, applying logit instead of probability averaging, or reducing the strength of regularization, decreases the calibration error at the cost of degrading the separability of TPs and FPs. Hence, there is a tradeoff between improving calibration and improving the separability of TPs and FPs. Although the performance of all methods depends heavily on the dataset and/or architecture, MCD and MMCD are the most sensitive to them.
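The distinction between the two averaging schemes is mechanical: probability averaging applies the softmax to each member first and averages afterwards, whereas logit averaging averages first and applies a single softmax. A minimal sketch with hypothetical logits from three ensemble members shows that the two generally yield the same predicted class but different confidences:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits from three ensemble members (or MC samples) for one input.
logits = np.array([[4.0, 1.0, 0.0],
                   [0.5, 3.0, 0.0],
                   [3.5, 0.5, 0.0]])

# Probability averaging: softmax each member, then average the probabilities.
p_avg = softmax(logits).mean(axis=0)

# Logit averaging: average the logits, then apply a single softmax.
l_avg = softmax(logits.mean(axis=0))

# Both schemes pick class 0 here, but with different confidence values.
print(np.round(p_avg, 3))
print(np.round(l_avg, 3))
```

Because the softmax is nonlinear, the two means are not interchangeable, which is why switching between them can shift the calibration and the TP/FP separability in opposite directions.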
Overall, we developed MCA to force the degree of confidence to be high for TPs and low for FPs in order to improve the separability of TPs and FPs. Compared to the state-of-the-art MMCD, the developed MCA is more than four times faster, shares the same purpose and underlying principle, and shows similar or sometimes better performance. Therefore, we suggest utilizing MCA instead of MMCD for applications that require separability of TPs and FPs and where the computational budget is limited. MCA may also be advantageous for other fields of machine learning, such as active or reinforcement learning, where uncertainty estimates are required. Moreover, MCA is preferable in the field of explainable artificial intelligence, which explores the role of uncertainty in explaining predictions and increasing the social acceptance of CNN-based decision-making systems. Finally, MCA opens new perspectives for fusing the features of ensemble members.