mediaTUM - Media and Publication Server

Linguistics


Title:: TweEvent: A dataset of Twitter messages about events in the Ukraine conflict
Document type:: Research data
Publication date:: 22 March 2023
Responsible person:: Kruspe, Anna
Authors:: Rode-Hasinger, Samyo; Häberle, Matthias; Racek, Daniel; Kruspe, Anna; Zhu, Xiao Xiang
Institutional affiliation:: TUM: Rode-Hasinger, Samyo; Häberle, Matthias; Racek, Daniel; Kruspe, Anna; Zhu, Xiao Xiang
Technische Hochschule Nürnberg: Kruspe, Anna
Publisher:: TUM
Identifier:: doi:10.14459/2023mp1703244
End date of data production:: 11 November 2022
Subject area:: INF Information science, library, documentation, archive and museum studies; LIN Linguistics; POL Political science
Data sources:: Experiments and observations; Statistics and reference data; Text documents
Other data sources:: Social media messages
Data type:: Texts
Other data type:: IDs for retrieving text data from two sources (ACLED & Twitter)
Data collection method:: Long-term collection of Twitter messages, followed by alignment to known events from the ACLED dataset using NLP methods (see paper)
Description:: Information about incidents within a conflict, e.g., shelling of an area of interest, is scattered across different data and media sources. For example, the ACLED dataset continuously documents local incidents recorded within the context of a specific conflict such as Russia’s war in Ukraine. However, these records may be incomplete. It is therefore useful to collect data from several sources to enrich the information available about a given incident. In this paper, we present a dataset of social media messages covering the same war events as those collected in the ACLED dataset. The information is extracted from automatically geocoded Twitter text data using state-of-the-art natural language processing methods based on large pre-trained language models (LMs). Our method can be applied to various textual data sources. Both the data and the approach can help human analysts obtain a broader understanding of conflict events.
Keywords:: Conflict, Ukraine, Dataset, Social Media, NLP
Technical notes:: View and download (1.8 MB total, 2 files)
The data server also offers downloads with FTP
The data server also offers downloads with rsync (password m1703244):
rsync rsync://m1703244@dataserv.ub.tum.de/m1703244/
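The rsync line above gives only the source URL, so a complete invocation also needs a destination. The following is a minimal Python sketch of one way to run the download, not an official client. The URL and password are copied from this record; the destination directory `tweevent_data` and the function names are assumptions made for illustration. It relies on rsync reading the daemon password from the `RSYNC_PASSWORD` environment variable.

```python
import os
import subprocess

# Values copied from the record above.
RSYNC_URL = "rsync://m1703244@dataserv.ub.tum.de/m1703244/"
RSYNC_PASSWORD = "m1703244"


def build_rsync_command(dest_dir="tweevent_data"):
    """Return the rsync argument list for downloading into dest_dir.

    dest_dir is a hypothetical local directory; the record does not name one.
    """
    # -a: archive mode (recursive, preserves timestamps and permissions)
    # -v: verbose output
    return ["rsync", "-av", RSYNC_URL, dest_dir.rstrip("/") + "/"]


def download(dest_dir="tweevent_data"):
    """Run rsync, supplying the password through RSYNC_PASSWORD."""
    env = dict(os.environ, RSYNC_PASSWORD=RSYNC_PASSWORD)
    os.makedirs(dest_dir, exist_ok=True)
    # check=True raises CalledProcessError if rsync exits with an error.
    subprocess.run(build_rsync_command(dest_dir), env=env, check=True)
```

Calling `download()` would then fetch the two files into `tweevent_data/`, assuming rsync is installed and the data server is reachable.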
Language:: en
Rights:: CC BY 4.0, http://creativecommons.org/licenses/by/4.0


