Mode choice modeling is essential for both predicting and understanding travel behavior. For this purpose, machine learning (ML) models have increasingly been applied to stated preference and traditional self-recorded revealed preference data with promising results, particularly for XGBoost and Random Forest (RF) models. Given the growing use of tracking-based smartphone applications for recording travel behavior, we test these ML models for mode choice modeling on such data. Furthermore, because ML approaches are still criticized for producing results that are hard to understand, we provide an in-depth interpretability analysis of the best-performing model. Our results show that the XGBoost and RF models outperform a conventional multinomial logit model by a wide margin, both overall and for each mode. The interpretability analysis using SHAP (SHapley Additive exPlanations) reveals that the XGBoost model can be explained well at both the overall and the mode level. Additionally, we demonstrate how to analyze individual predictions. Lastly, a brief sensitivity analysis gives insight into the relative importance of different data sources, sample size, and user reliability. We conclude that the XGBoost model performs best while also being interpretable. Insights generated by such models can be used, for instance, to predict mode choice decisions for arbitrary origin-destination (O-D) pairs and, in turn, to assess the impact that infrastructural changes would have on mode share.