The growing number of time-labeled datasets in science and industry increases the need for algorithms that automatically induce process models. Existing methods typically identify process models only for single-attribute events. We propose a new model type and a corresponding algorithm for mining multi-attribute events, in which each event is described by a vector of attributes. The model is based on timed automata, includes expressive descriptions of states, and can be used for making predictions. The algorithm creates a probabilistic real-time automaton (PRTA) in which each state is annotated with a profile of events. To identify the states of the automaton, similar events are combined by a clustering approach. The method was implemented and tested on a synthetic, a medical, and a biological dataset. Its prediction accuracy was evaluated on a medical dataset and compared to that of a combined logistic regression, which is considered a standard in this application domain. Moreover, the method was experimentally compared to Multi-Output HMMs and to Petri nets learned by standard process mining algorithms. The experimental comparison suggests that the automaton-based approach performs favorably in several dimensions. Most importantly, we show that meaningful medical and biological process knowledge can be extracted from such automata.
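To make the construction concrete, the following is a minimal sketch of how multi-attribute events might be clustered into states and linked by probabilistic, time-annotated transitions. It is an illustration under stated assumptions, not the algorithm evaluated here: the simple leader clustering, the `radius` parameter, the delay intervals on transitions, and the names `cluster_events` and `build_prta` are all hypothetical choices made for this example.

```python
from collections import defaultdict
from math import dist


def cluster_events(vectors, radius):
    """Leader clustering (an assumed stand-in for the clustering step):
    an event joins the first cluster whose centroid lies within `radius`,
    otherwise it opens a new cluster."""
    centroids, members, labels = [], [], []
    for v in vectors:
        for k, c in enumerate(centroids):
            if dist(v, c) <= radius:
                members[k].append(v)
                # Recompute the centroid as the mean of the cluster's members.
                centroids[k] = tuple(sum(col) / len(members[k])
                                     for col in zip(*members[k]))
                labels.append(k)
                break
        else:
            centroids.append(tuple(v))
            members.append([v])
            labels.append(len(centroids) - 1)
    return labels, centroids


def build_prta(sequences, radius):
    """Build state profiles and transitions from timed event sequences.

    Each sequence is a list of (timestamp, attribute_vector) pairs.
    Returns (profiles, transitions): profiles[k] is the mean attribute
    vector of state k; transitions maps (source, target) to a probability
    and an observed (min, max) delay interval."""
    flat = [vec for seq in sequences for _, vec in seq]
    labels, profiles = cluster_events(flat, radius)

    counts = defaultdict(int)
    delays = defaultdict(list)
    start = 0
    for seq in sequences:
        states = labels[start:start + len(seq)]
        start += len(seq)
        # Walk over consecutive event pairs within one sequence.
        for (t0, _), (t1, _), s0, s1 in zip(seq, seq[1:], states, states[1:]):
            counts[(s0, s1)] += 1
            delays[(s0, s1)].append(t1 - t0)

    outgoing = defaultdict(int)
    for (s0, _), n in counts.items():
        outgoing[s0] += n

    transitions = {
        edge: {"prob": n / outgoing[edge[0]],
               "delay": (min(delays[edge]), max(delays[edge]))}
        for edge, n in counts.items()
    }
    return profiles, transitions


# Toy data (invented for illustration): two sequences of 2-attribute events.
sequences = [
    [(0, (1.0, 0.0)), (2, (5.0, 5.0)), (5, (1.2, 0.0))],
    [(0, (0.8, 0.0)), (3, (5.0, 5.2))],
]
profiles, transitions = build_prta(sequences, radius=1.0)
```

On this toy input the events fall into two states, and each state's profile is the mean attribute vector of the events assigned to it. A prediction could then be read off by following the most probable outgoing transition from the state that best matches the current event.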