Model-based clustering methods are commonly used to cluster multivariate data. Such data usually contain many variables. In many cases, however, including all of them increases the complexity of the model or may even obscure the hidden structure of the data. Variable selection in clustering can therefore simplify the model and improve its accuracy. We discuss several variable selection approaches for Gaussian mixture models, including regularisation, information-criterion-based and hybrid approaches, and compare their results on a real dataset. The dataset was obtained from the Alzheimer’s Disease Neuroimaging Initiative, an organization that brings researchers together with research data in an effort to determine the progression of Alzheimer’s disease. It consists of 10 variables and three disease states: cognitively normal, mild cognitive impairment and dementia. Because some pairs of variables exhibit non-Gaussian dependence, we also present the clustering results of a vine copula mixture model and compare them with those of the Gaussian mixture model.