Predicting Venue Popularity Using Crowd-Sourced and Passive Sensor Data
Efficient and reliable mobility pattern identification is essential for transport planning research. In order to infer mobility patterns, however, a large amount of spatiotemporal data is needed, which is not always available. Hence, location-based social networks (LBSNs) have received considerable attention as a potential data provider. The aim of this study is to investigate the possibility of using several different auxiliary information sources for venue popularity modeling and provide an alternative venue popularity measuring approach. Initially, data from widely used services, such as Google Maps, Yelp and OpenStreetMap (OSM), are used to model venue popularity. To estimate hourly venue occupancy, two different classes of model are used, including linear regression with lasso regularization and gradient boosted regression (GBR). The predictions are made based on venue-related parameters (e.g., rating, comments) and locational properties (e.g., stores, hotels, attractions). Results show that the prediction can be improved using GBR with a logarithmic transformation of the dependent variables. To investigate the quality of social media-based models by obtaining WiFi-based ground truth data, a microcontroller setup is developed to measure the actual number of people attending venues using WiFi presence detection, demonstrating that the similarity between the results of WiFi data collection and Google “ Popular Times” is relatively promising.