Many real-world applications in which data gathering and analysis play a central role face the fundamental question of how to select variables (to be instantiated or measured) that can efficiently reduce uncertainty. In this paper, we use the concept of conditional mutual information to approach problems involving the selection of variable observations in the area of medical diagnosis. Computing mutual information requires estimates of joint distributions over collections of variables. However, computing accurate joint distributions conditioned on a large set of variables is, in general, expensive in terms of data and computing power. One must therefore seek alternative ways to calculate the relevant quantities while still using all the available observations. We describe and compare a basic approach, which requires no data modelling assumptions, consisting of averaging mutual information estimates conditioned on individual observations, and an alternative approach in which all observations can be conditioned on at once by making some conditional independence assumptions. This yields a data-efficient variant of information maximization for test selection. We present experimental results on public heart disease data (our test domain) and on data from a controlled study in breast cancer diagnosis (our main application).
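To make the selection criterion concrete, the following is a minimal sketch of greedy test selection by conditional mutual information on a toy discrete model. It is not the paper's implementation: the binary disease variable, the three binary tests, the naive-Bayes-style conditional independence assumption (mirroring the second approach described above), and all probability values are hypothetical, chosen only to illustrate how I(D; X | observations) would drive the choice of the next test.

```python
import numpy as np

# Hypothetical toy model: a binary disease D and three binary tests,
# assumed conditionally independent given D (a naive-Bayes-style
# conditional independence assumption; all numbers are illustrative).
prior = np.array([0.7, 0.3])   # P(D=0), P(D=1)
lik = np.array([               # lik[j, d] = P(X_j = 1 | D = d)
    [0.10, 0.80],
    [0.20, 0.60],
    [0.05, 0.90],
])

def posterior(evidence):
    """P(D | evidence), where evidence maps test index -> outcome (0/1)."""
    p = prior.copy()
    for j, x in evidence.items():
        p *= lik[j] if x == 1 else (1.0 - lik[j])
    return p / p.sum()

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def cond_mutual_info(candidate, evidence):
    """I(D ; X_candidate | evidence): expected reduction in H(D)."""
    p_d = posterior(evidence)
    h_before = entropy(p_d)
    h_after = 0.0
    for x in (0, 1):
        # P(X_candidate = x | evidence) = sum_d P(x | d) P(d | evidence)
        px_given_d = lik[candidate] if x == 1 else 1.0 - lik[candidate]
        px = (px_given_d * p_d).sum()
        h_after += px * entropy(posterior({**evidence, candidate: x}))
    return h_before - h_after

# Greedy selection: score each unmeasured test given what has been
# observed so far, and measure the one with maximal score next.
evidence = {0: 1}  # suppose test 0 came back positive
scores = {j: cond_mutual_info(j, evidence) for j in (1, 2)}
print(scores)
```

Under the conditional independence assumption, the posterior update factors over the observed tests, so conditioning on all observations at once stays cheap even as the evidence set grows; without that assumption, one would fall back on the averaging approach described above.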