Information about incidents within a conflict, e.g., the shelling of an area of interest, is scattered across different data and media sources. For example, the ACLED dataset continuously documents local incidents recorded in the context of a specific conflict, such as Russia’s war in Ukraine. However, the information recorded for a given incident may be incomplete, so it is useful to collect data from several sources to enrich what is known about it. In this paper, we present a dataset of social media messages covering the same war events as those collected in the ACLED dataset. The information is extracted from automatically geocoded Twitter text data using state-of-the-art natural language processing methods based on large pre-trained language models (LMs). Our method can be applied to various textual data sources. Both the data and the approach can help human analysts obtain a broader understanding of conflict events.