Social media provides rapidly growing content in the form of texts and multimedia files, which are rich data sources for many research disciplines.
The data science disciplines used to process these sources, such as machine learning (ML) and natural language processing (NLP), have evolved substantially.
However, the integration of these innovations into the workflows of domain experts who manually analyze big social media data in application domains is understudied.
For instance, in qualitative research, data analyses such as content studies are still carried out entirely manually on small amounts of thoughtfully selected data.
The level of detail that domain experts provide in such analyses is still unmatched by computer algorithms.
Thus, a professional, handcrafted analysis is the gold standard for precise explanations and theories about individuals and their comments on the world as they perceive it.
However, qualitative studies are carried out on small datasets, which are bound to be incomplete and biased.
For example, a specific data source, such as a domain-specific online forum about a particular topic, might be frequented by individuals with a predetermined mindset, a phenomenon commonly known as a filter bubble.
Such a bubble can be shaped by many factors, among others culture, language, and community guidelines.
Consequently, it is desirable to investigate as much data from as many different sources as possible in order to discover all themes discussed and opinions expressed in a domain in a more representative manner.
Luckily, social media provides manifold sources from all sorts of interest groups and mindsets.
At the same time, recent advances in NLP and deep learning provide computational methods with unprecedented performance with regard to semantic coherence and accuracy.
More than ever before, computer algorithms offer a highly connected and contextualized understanding of large amounts of unstructured text, leading to improved summarization of big social media data.
Tailoring that technology to the end user puts the spotlight on the person who uses the methods to gain insight from social media data: the so-called text miner or domain expert.
State-of-the-art methods need to add value to the tasks carried out by the text miner in order to improve understanding of the investigated data domain and to enable better research.
This thesis aims at leveraging state-of-the-art (SOTA) NLP methods to support a domain expert in the big data opinion mining process.
It is dedicated to the exploration of an extended range of discussed themes as well as to providing representative quantitative statistics from social media texts.
We show the potential to improve domain exploration by providing tools and methods that unveil the big picture of social media, i.e., by displaying opinions about as many topics and aspects as possible in multiple languages at a time.
Further, our studies demonstrate how to complement or replace cross-cultural representative surveys by using opinion mining technology.
Last but not least, we advance the SOTA in NLP to transfer explanations, theories, and structured knowledge provided by domain experts from limited data to big data using predictive ML models.