The Markov aggregation problem concerns approximating a Markov chain with a large alphabet by a Markov chain with a small alphabet, such that the latter preserves relevant aspects of the former while significantly reducing model complexity. Recently, information-theoretic cost functions have been proposed for the Markov aggregation problem; these have been shown to be connected to predictability, spectral clustering, lumpability, and the information bottleneck method. In this talk, we develop an information-theoretic cost function from first principles and show that it includes previously proposed cost functions as special cases. We furthermore propose a simple optimization heuristic and illustrate its performance on examples from natural language processing and data clustering.
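The abstract does not spell out the cost function or the heuristic. As a rough illustration only, the Python sketch below assumes one common information-theoretic choice, namely preserving the one-step predictability I(Y_t; Y_{t+1}) of the aggregated chain Y_t = f(X_t), and optimizes it with a greedy agglomerative merge. The function names and the specific cost are assumptions for illustration, not the method presented in the talk.

    import numpy as np

    def stationary_distribution(P):
        """Left eigenvector of P for eigenvalue 1, normalized to a distribution.
        Assumes the chain is irreducible and aperiodic."""
        vals, vecs = np.linalg.eig(P.T)
        pi = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
        return pi / pi.sum()

    def mutual_information(J):
        """I(Y_t; Y_{t+1}) in bits for a joint distribution J of consecutive states."""
        r = J.sum(axis=1, keepdims=True)   # marginal of Y_t
        c = J.sum(axis=0, keepdims=True)   # marginal of Y_{t+1}
        mask = J > 0
        return float(np.sum(J[mask] * np.log2(J[mask] / (r @ c)[mask])))

    def aggregate_joint(J, labels, m):
        """Sum the joint distribution of (X_t, X_{t+1}) over the blocks of a partition."""
        Jc = np.zeros((m, m))
        for a in range(m):
            for b in range(m):
                Jc[a, b] = J[np.ix_(labels == a, labels == b)].sum()
        return Jc

    def greedy_aggregation(P, m):
        """Merge states pairwise, at each step keeping I(Y_t; Y_{t+1}) as large as possible."""
        n = P.shape[0]
        pi = stationary_distribution(P)
        J = pi[:, None] * P          # joint distribution of (X_t, X_{t+1})
        labels = np.arange(n)        # start from the trivial partition
        for k in range(n, m, -1):    # merge down from k to k-1 clusters
            best, best_mi = None, -np.inf
            for a in range(k):
                for b in range(a + 1, k):
                    trial = np.where(labels == b, a, labels)
                    trial = np.unique(trial, return_inverse=True)[1]  # relabel 0..k-2
                    mi = mutual_information(aggregate_joint(J, trial, k - 1))
                    if mi > best_mi:
                        best, best_mi = trial, mi
            labels = best
        return labels

    # Example: an 8-state chain with two weakly coupled 4-state blocks;
    # the greedy merge should recover the two blocks as the aggregation.
    rng = np.random.default_rng(0)
    B = rng.random((4, 4))
    P = np.block([[B, 0.05 * rng.random((4, 4))],
                  [0.05 * rng.random((4, 4)), B]])
    P /= P.sum(axis=1, keepdims=True)
    print(greedy_aggregation(P, 2))

The agglomerative strategy is one plausible reading of "a simple optimization heuristic"; alternatives such as local label swaps for a fixed number of clusters would fit the same cost function.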