Combining Open-source Programming Languages with GIS for Spatial Data Science
Extracting knowledge from data has always been a challenge for a wide range of professionals, including scientists, statisticians, and programmers. Visualizing and communicating information is, in turn, a challenge for cartographers and GIS professionals, whose aim is to deliver useful information visually in support of society and its sustainability. Achieving this aim requires unifying theories and techniques from broad fields such as mathematics, statistics, probability, and computer science with knowledge from machine learning, classification, cluster analysis, data mining, databases, and visualization.
This master's thesis addresses these challenges by investigating which tools and software are suitable for the aforementioned tasks. Three main objectives are explored. The first is to decide which programming languages, among the many available, are suitable for integration with GIS software to enhance spatial analysis and prediction. The second is to divide the work among the chosen tools in the most efficient way. The third is an easy-to-understand real-world example whose implementation either supports or refutes the choices made.
This thesis consists of two main parts. The theoretical part reviews the available literature in the domain of interest and draws conclusions about which software and tools to use (in this case Python, R, and ArcGIS Pro). The practical part takes real-world data on crimes in San Francisco and, through extensive analysis, draws conclusions that police departments could apply when reallocating their forces.
Stand-alone Python and R scripts (Appendices 1 and 2) are used for exploratory analysis of the San Francisco crime data. These analyses show which crimes are most prominent in San Francisco, which districts are the most and least dangerous, which month, day of the week, and hour have the highest criminal activity, what percentage of crimes is resolved, and whether crime occurrence depends on other variables in the dataset. ArcGIS Pro is used for inferential analysis and for predicting where crimes are most likely to occur. These analyses give insight into areas with persistent and intensifying crime occurrence and into areas with unusually high numbers of crimes relative to their population. Furthermore, the R-ArcGIS Bridge offers potential explanations of correlations among income, population, and median house value, which can be used as input variables when building a prediction model. Additionally, Python is used to build a predictive model from prominent variables in the dataset. However, the results were not satisfactory, so this part of the study is left to be explored in more detail in future work.
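The kind of exploratory summary described above can be illustrated with a minimal Python sketch. It is not the script from the appendices: the field names (`category`, `district`, `timestamp`, `resolution`), the convention that `"NONE"` marks an unresolved case, and the five sample records are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records; field names and values are assumptions
# made for this sketch and are not taken from the thesis dataset.
records = [
    {"category": "LARCENY/THEFT", "district": "SOUTHERN",
     "timestamp": "2015-05-13 23:53:00", "resolution": "NONE"},
    {"category": "LARCENY/THEFT", "district": "SOUTHERN",
     "timestamp": "2015-05-13 23:10:00", "resolution": "ARREST, BOOKED"},
    {"category": "ASSAULT", "district": "MISSION",
     "timestamp": "2015-05-12 18:30:00", "resolution": "ARREST, BOOKED"},
    {"category": "LARCENY/THEFT", "district": "NORTHERN",
     "timestamp": "2015-05-11 23:05:00", "resolution": "NONE"},
    {"category": "VEHICLE THEFT", "district": "SOUTHERN",
     "timestamp": "2015-05-10 02:15:00", "resolution": "NONE"},
]


def summarize(rows):
    """Return simple frequency-based summaries of a list of crime records."""
    by_category = Counter(r["category"] for r in rows)
    by_district = Counter(r["district"] for r in rows)
    by_hour = Counter(
        datetime.strptime(r["timestamp"], "%Y-%m-%d %H:%M:%S").hour
        for r in rows
    )
    # Assumption: any resolution other than "NONE" counts as resolved.
    resolved = sum(1 for r in rows if r["resolution"] != "NONE")
    return {
        "top_category": by_category.most_common(1)[0][0],
        "top_district": by_district.most_common(1)[0][0],
        "peak_hour": by_hour.most_common(1)[0][0],
        "resolved_share": resolved / len(rows),
    }


summary = summarize(records)
print(summary)
```

With real data, the same counting logic would typically be applied to records read from a file rather than to an inline list.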
The work conducted suggests that combining and integrating programming languages such as Python and R with GIS software, in this case ArcGIS Pro, supports and enhances information extraction and visualisation. For further work, cooperation between cartographers, data scientists, and crime analysts would be highly productive and is recommended.
Dr. Mathias Jahnke, TUM; Prof. Dr. Menno-Jan Kraak (ITC)