Data clustering has become a common technique for grouping similar data according to selected features. It also makes it possible to identify a representative for each cluster. However, cluster analysis involves several challenging tasks, e.g., feature selection, the choice among different clustering algorithms, determination of the optimal number of clusters, the use of a distance measure that handles various levels of measurement, cluster validation, and finally the interpretation of results. The objective of this paper is the conceptual design of a scenario catalog that contains extracted representative near-crash and crash scenarios. Two clustering algorithms, based on k-covers and k-medoids, are applied to data from a naturalistic driving study, taking the aforementioned aspects into account. The two algorithms are then compared in terms of cluster representativeness, purity, and average silhouette width. Moreover, the clusters are visualized in a two-dimensional scenario space using t-Distributed Stochastic Neighbor Embedding (t-SNE). The derived scenario catalog covers the selected database at the best possible rate and enables cost-efficient development of predictive safety functions.
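To make the k-medoids step and the average silhouette width more concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes a Gower-style distance for mixed numeric and categorical features (the abstract does not name the distance measure), a simple alternating k-medoids update rather than any specific variant, and a hypothetical toy data set of six scenarios; all function names, feature names, and values are illustrative assumptions. The k-covers algorithm is not sketched, since the abstract does not describe it.

```python
import random

def mixed_distance(a, b, kinds, ranges):
    """Gower-style distance (an assumption, not the paper's measure):
    numeric features use range-normalised absolute difference,
    categorical features use simple mismatch (0 or 1)."""
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "num":
            total += abs(x - y) / rng if rng else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(kinds)

def k_medoids(dist, k, seed=0, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix."""
    n = len(dist)
    medoids = random.Random(seed).sample(range(n), k)

    def assign(meds):
        # Each point joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: member with the smallest total distance to the others.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)

def average_silhouette_width(dist, labels):
    """Mean silhouette value over all points; assumes at least two clusters."""
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    total = 0.0
    for i, c in enumerate(labels):
        own = clusters[c]
        if len(own) == 1:
            continue  # silhouette of a singleton is defined as 0
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != c)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)

# Hypothetical toy scenarios: (speed in km/h, road type, conflict type).
records = [
    (30.0, "urban", "rear_end"),
    (35.0, "urban", "rear_end"),
    (28.0, "urban", "rear_end"),
    (110.0, "highway", "lane_change"),
    (120.0, "highway", "lane_change"),
    (115.0, "highway", "lane_change"),
]
kinds = ("num", "cat", "cat")
speeds = [r[0] for r in records]
ranges = (max(speeds) - min(speeds), None, None)

dist = [[mixed_distance(a, b, kinds, ranges) for b in records] for a in records]
medoids, labels = k_medoids(dist, k=2)
asw = average_silhouette_width(dist, labels)

print("representative scenarios:", [records[m] for m in medoids])
print("average silhouette width:", round(asw, 3))
```

In this sketch the medoids play the role of the representative scenarios in each cluster, and the average silhouette width is one of the comparison criteria named above. Because a medoid is always an actual record, it can be placed in a catalog as a concrete scenario, which a cluster mean of mixed features could not.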