This work investigates several applications of automated event detection using audio-visual pattern recognition methods. The recorded data come from both video surveillance and video conferences. Acoustic, visual and semantic features are extracted from the available data and subsequently analysed with graphical models, which are particularly suitable for modeling multi-modal feature sequences and provide an efficient way to fuse features automatically. Each model is first described in detail theoretically; then the structure needed for both learning the required parameters and the classification process is presented. Finally, the results are summarized and possible directions for further research are outlined. Graphical models prove suitable for these tasks, but the results depend strongly on the kind of problem.