It is hard to overstate the importance of transport planning and, more generally, of research on human mobility patterns. Much current research in this field relies heavily on data. However, data availability can be limited by system properties or by endogenous factors, which restricts what a study can achieve. This study therefore explores the use of auxiliary information sources that have so far received limited attention.
A methodology was developed and tested for retrieving and predicting publicly available data related to mobility patterns, namely the shares of people attending a particular venue, taken from the “Popular Times” section of Google Maps. Several sources were used in this study: Google Maps, Yelp, OpenStreetMap, the Google API, and government data on workplaces and population.
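The retrieval scripts themselves are not reproduced here. A minimal sketch of the data shape follows. It assumes each venue's “Popular Times” histogram arrives as seven daily lists of 24 values on a relative 0 to 100 scale, with day 0 taken to be Monday. These hypothetical helpers flatten the histogram into the 168 hour-of-week values used later as dependent variables. They can also rescale those values so they sum to one over the week.

```python
from typing import List, Sequence

HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7
HOURS_PER_WEEK = HOURS_PER_DAY * DAYS_PER_WEEK  # 168 hourly targets


def hour_of_week(day: int, hour: int) -> int:
    """Map (day, hour) to an index 0..167; day 0 is assumed to be Monday."""
    if not (0 <= day < DAYS_PER_WEEK and 0 <= hour < HOURS_PER_DAY):
        raise ValueError("day must be 0-6 and hour 0-23")
    return day * HOURS_PER_DAY + hour


def flatten_popular_times(days: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a 7 x 24 histogram (assumed relative 0-100 values) into 168 values,
    ordered so that position hour_of_week(d, h) holds days[d][h]."""
    if len(days) != DAYS_PER_WEEK or any(len(d) != HOURS_PER_DAY for d in days):
        raise ValueError("expected 7 lists of 24 hourly values")
    return [float(v) for day in days for v in day]


def to_weekly_shares(values: Sequence[float]) -> List[float]:
    """Rescale hourly values so they sum to 1 over the week; an all-zero week stays zero."""
    total = sum(values)
    if total == 0:
        return [0.0] * len(values)
    return [v / total for v in values]
```

For example, a venue that is busy only on Monday at 12:00 (value 100) and 13:00 (value 50) yields weekly shares of 2/3 and 1/3 at indices 12 and 13.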
Dedicated scripts were developed to retrieve and filter information from each data source. Additional procedures prepare the highly aggregated data for use in prediction models. A special procedure combines venue-specific and spatial data; it relies on spatial operations (intersects/within) and uses spatial indexing to speed them up.
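The abstract does not say which spatial index was used; R-trees are a common choice. The sketch below swaps in a simpler uniform grid index to illustrate the same idea: a "within radius" query inspects only nearby cells instead of every point. It counts points of interest per category around a venue, which is one way to derive locational properties such as the number of nearby stores or hotels. Coordinates are assumed to be projected, in metres, and the class and function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Point = Tuple[float, float]  # projected coordinates in metres (assumption)


class GridIndex:
    """Uniform-grid spatial index: points are bucketed by cell so that a
    radius query only scans the cells that can contain matches."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Tuple[int, int], List[Tuple[Point, str]]] = defaultdict(list)

    def _cell(self, p: Point) -> Tuple[int, int]:
        return (math.floor(p[0] / self.cell_size), math.floor(p[1] / self.cell_size))

    def insert(self, p: Point, category: str) -> None:
        self.cells[self._cell(p)].append((p, category))

    def within(self, centre: Point, radius: float) -> Iterator[Tuple[Point, str]]:
        """Yield indexed points whose Euclidean distance to centre is <= radius."""
        cx, cy = self._cell(centre)
        reach = math.ceil(radius / self.cell_size)  # cells to scan in each direction
        for i in range(cx - reach, cx + reach + 1):
            for j in range(cy - reach, cy + reach + 1):
                # .get() avoids creating empty buckets in the defaultdict
                for p, cat in self.cells.get((i, j), ()):
                    if math.dist(p, centre) <= radius:
                        yield p, cat


def count_nearby(index: GridIndex, venue: Point, radius: float) -> Dict[str, int]:
    """Count indexed points of interest per category within `radius` of a venue."""
    counts: Dict[str, int] = defaultdict(int)
    for _, cat in index.within(venue, radius):
        counts[cat] += 1
    return dict(counts)
```

A cell size close to the typical query radius keeps the scan to a 3 x 3 block of cells. A library index such as an R-tree handles uneven point density better.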
A clustering algorithm was developed for the data exploration stage. It is based on visual exploration of a reduced-dimensionality projection of the data, obtained with the t-SNE (t-distributed stochastic neighbor embedding) method.
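The abstract does not specify how clusters are delineated on the t-SNE projection. One plausible reading, offered here only as an assumption, is that the analyst outlines groups on the 2-D scatter plot as polygons. Each venue is then labelled by the polygon that contains its embedded point. The embedding is taken as given; it could come, for instance, from scikit-learn's `TSNE`. Only the ray-casting point-in-polygon labelling is sketched below, and the region names and coordinates are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a ray from p towards +x crosses an edge."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Half-open test on y: the edge straddles the ray, so y1 != y2 here
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_embedding(points: Sequence[Point],
                    regions: Dict[str, Sequence[Point]]) -> List[Optional[str]]:
    """Give each embedded point the name of the first region containing it, else None."""
    labels: List[Optional[str]] = []
    for p in points:
        label = None
        for name, poly in regions.items():
            if point_in_polygon(p, poly):
                label = name
                break
        labels.append(label)
    return labels
```

Points left as `None` fall outside every outlined group and can be inspected separately or assigned to a residual cluster.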
Two classes of prediction models were tested, each with and without transformation of the dependent variables: linear regression with lasso regularization and gradient boosted regression (GBR). Each model group covered 168 dependent variables (one per hour of the week). The explanatory variables were a number of venue parameters (e.g., rating, number of related comments, type of service provided) and locational properties (e.g., the number of stores, hotels, and attractions nearby).
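To illustrate the linear model class, the sketch below fits one lasso model per dependent variable (hour of the week) by cyclic coordinate descent in pure Python. It is not the study's code. The objective scaling, which mirrors scikit-learn's `Lasso`, the fixed iteration count, the assumption of pre-standardised features, and the optional target transformation are all illustrative choices. The abstract does not state which transformation was applied.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Model = Tuple[List[float], float]  # (coefficients, intercept)


def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho towards zero by lam; this is what produces exact zeros in lasso."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def fit_lasso(X: Matrix, y: Sequence[float], lam: float, n_iter: int = 200) -> Model:
    """Minimise (1/2n)||y - Xw - b||^2 + lam * ||w||_1 by cyclic coordinate descent.
    Features are assumed to be standardised beforehand."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = sum(y) / n
    r = [y[i] - b for i in range(n)]  # residuals y - b - Xw
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # constant-zero feature: keep its weight at 0
            # Correlation of feature j with the partial residual (feature j added back)
            rho = sum(col[i] * (r[i] + col[i] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / z
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= col[i] * delta
                w[j] = new_wj
        shift = sum(r) / n  # refit the unpenalised intercept
        b += shift
        r = [ri - shift for ri in r]
    return w, b


def fit_per_hour(X: Matrix, Y: Matrix, lam: float,
                 transform: Optional[Callable[[float], float]] = None) -> List[Model]:
    """Fit one lasso model per column of Y, e.g. 168 hour-of-week targets."""
    models = []
    for h in range(len(Y[0])):
        y = [row[h] for row in Y]
        if transform is not None:
            y = [transform(v) for v in y]  # predictions are then on the transformed scale
        models.append(fit_lasso(X, y, lam))
    return models


def predict(model: Model, x: Sequence[float]) -> float:
    w, b = model
    return b + sum(wj * xj for wj, xj in zip(w, x))
```

Passing, say, `transform=math.log1p` would fit on a log scale, and predictions would then need `math.expm1` to return to the original scale. A GBR counterpart follows the same one-model-per-hour loop with a different estimator.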
Overall, the predictive power of both model classes increased when the dependent variables were transformed.
With transformed dependent variables, GBR models performed better than the linear ones. For at least 50% of the hourly models the difference was relatively small (an R² difference of 0.02), but for certain hours it exceeded 0.20.
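The comparison above is summarised per hour of the week. A minimal sketch of such a summary follows, assuming observed values and both models' predictions are available for each hourly target; the function names are hypothetical. It computes R² = 1 − SS_res / SS_tot per hour, takes the difference between the two models, and reports the median and the maximum. A median difference of at most 0.02 corresponds to the "at least 50% of cases" statement.

```python
from statistics import median
from typing import Dict, Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


def compare_by_hour(y_true: Sequence[Sequence[float]],
                    pred_a: Sequence[Sequence[float]],
                    pred_b: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Per-hour R^2 difference (model A minus model B), summarised by median and max.
    Each argument holds one sequence of values per hourly target."""
    diffs = [r_squared(t, a) - r_squared(t, b)
             for t, a, b in zip(y_true, pred_a, pred_b)]
    return {"median_diff": median(diffs), "max_diff": max(diffs)}
```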
Because Google “Popular Times” data give only relative venue shares, a microcontroller setup was developed and tested to measure the actual number of people attending a particular venue by detecting the presence of Wi-Fi devices. Real-world tests showed that the setup is useful in practice, and it can be recommended for future research.
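The abstract does not describe the firmware or the counting logic. Below is a minimal host-side sketch. It assumes the microcontroller logs one record per captured Wi-Fi frame as (Unix time in seconds, source MAC address, RSSI in dBm). It counts distinct MAC addresses per fixed time window and drops weak signals that probably originate outside the venue. The record format, the 300 s window and the -75 dBm threshold are hypothetical. Many phones also randomise their MAC address when probing, which can inflate such counts. Such addresses usually have the "locally administered" bit (0x02 in the first octet) set, so they can at least be flagged.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[float, str, int]  # (unix time in s, MAC address, RSSI in dBm): assumed log format


def is_locally_administered(mac: str) -> bool:
    """True if bit 0x02 of the first octet is set, as in most randomised MACs."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


def count_devices(records: Iterable[Record], window_s: int = 300,
                  min_rssi: int = -75) -> Dict[int, int]:
    """Count distinct MAC addresses per time window, keeping only strong signals.
    Returns {window_start_time: distinct_device_count}."""
    seen: Dict[int, Set[str]] = defaultdict(set)
    for ts, mac, rssi in records:
        if rssi < min_rssi:
            continue  # weak signal: device is probably outside the venue
        window_start = int(ts // window_s) * window_s
        seen[window_start].add(mac.lower())  # normalise case before de-duplicating
    return {start: len(macs) for start, macs in sorted(seen.items())}
```

Device counts are only a proxy for people, since one person may carry several devices or none. Calibrating against manual counts at the test venue would be a natural next step.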