Legal JRC-Acquis Sum – Text Summarization Corpus

Responsible:

Gebendorfer, Christoph; Elnaggar, Ahmed

Authors:

Gebendorfer, Christoph; Elnaggar, Ahmed

Institutional affiliation:

TUM

Publisher:

TUM

Identifier:

doi:10.14459/2018md1446654

End date of data production:

30.01.2018

Subject area:

DAT Data processing, computer science

Additional subject areas:

Legal Domain

Data sources:

Text documents

Data type:

Texts

Method of data collection:

Derived from the JRC-Acquis corpus

Description:

This corpus is derived from the original JRC-Acquis corpus, which contains legislative documents of the European Parliament since 1958. It contains a subset of the original corpus, processed into aligned form (Moses/Giza++). The full texts are taken from the body paragraphs of the JRC documents, and the summaries are the documents' title elements. The data covers 7 languages (cs, de, en, es, fr, it, sv). The files contain training and test sets. S...

Links:

Chair:

https://wwwmatthes.in.tum.de/pages/t5ma0jrv6q7k/sebis-Public-Website-Home

Used to train a deep learning summarization model in the legal domain:

https://wwwmatthes.in.tum.de/pages/s4orjknmqls4/Master-s-Thesis-Christoph-Gebendorfer

Original Corpus:

https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis

Keywords:

legal-jrc-acquis-sum; parallel legal summarization examples; jrc acquis documents

Technical notes:

Moses/Giza++ Format
View and download (502.1 MB, 2 files)
The data server also offers downloads with FTP
The data server also offers downloads with rsync (password m1446654):
rsync rsync://m1446654@dataserv.ub.tum.de/m1446654/
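
The aligned Moses/Giza++ form is commonly stored as two plain-text files in which line i of one file corresponds to line i of the other. The following is a minimal Python sketch for reading such a pair of files as (full text, summary) pairs. It assumes one body text per line in the first file, one title per line in the second, and UTF-8 encoding; none of these details are confirmed by this record, and the function name read_aligned_pairs is hypothetical.

```python
from itertools import zip_longest

# Sentinel used to detect that one file ended before the other.
_MISSING = object()


def read_aligned_pairs(text_path, summary_path, encoding="utf-8"):
    """Yield (full_text, summary) tuples from two line-aligned files.

    Assumption: line i of text_path belongs to line i of summary_path.
    The UTF-8 default is also an assumption, not taken from the record.
    """
    with open(text_path, encoding=encoding) as texts, \
            open(summary_path, encoding=encoding) as summaries:
        pairs = zip_longest(texts, summaries, fillvalue=_MISSING)
        for line_no, (text, summary) in enumerate(pairs, start=1):
            # A length mismatch means the files are not aligned.
            if text is _MISSING or summary is _MISSING:
                raise ValueError(
                    f"files are not line-aligned: mismatch at line {line_no}"
                )
            yield text.rstrip("\n"), summary.rstrip("\n")
```

The function would be called with the paths of one language's text and summary files from the downloaded archive. The actual file names are not given in this record and have to be taken from the download itself. Raising on a length mismatch is a deliberate choice: a silent truncation would pair texts with the wrong summaries.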

Language:

Rights:

CC BY 4.0, http://creativecommons.org/licenses/by/4.0

Other rights:

Rights implied by original corpus (JRC-Acquis), Commission Decision of 12 December 2011 on the re-use of Commission documents, published in Official Journal of the European Union L330 of 14 December 2011, pages 39 to 42
