Title:

Legal Europarl - Translation Corpus

Responsible:
Gebendorfer, Christoph; Elnaggar, Ahmed
Authors:
Gebendorfer, Christoph; Elnaggar, Ahmed
Institutional affiliation:
TUM
Publisher:
TUM
Identifier:
doi:10.14459/2018md1446650
End date of data production:
30.01.2018
Subject area:
DAT Data processing, computer science
Additional subject areas:
Legal Domain
Data sources:
Text documents
Data type:
Texts
Data collection method:
Derivation of the Europarl-v7 corpus
Description:
Legal Europarl is derived from the original Europarl-v7 corpus, which contains proceedings of the European Parliament. It comprises a subset of the original corpus and is provided in aligned form (Moses/Giza++). It contains parallel text in 21 language pairs based on 7 languages (cs, de, en, es, fr, it, sv), which can be used directly to train data-intensive machine translation systems. A separate test set is also included for evaluation purposes. Size: ~32 million se...
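The figure of 21 language pairs matches the number of ways to choose 2 of the 7 listed languages when direction is ignored. A minimal sketch in Python, assuming the pairs are unordered (the record does not say whether each translation direction is stored separately):

```python
from itertools import combinations

# Language codes as listed in the description.
LANGUAGES = ["cs", "de", "en", "es", "fr", "it", "sv"]

# Every unordered pair of distinct languages: C(7, 2) combinations.
pairs = list(combinations(LANGUAGES, 2))

for src, tgt in pairs:
    print(f"{src}-{tgt}")

print(len(pairs))
```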
Links:

Chair:

https://wwwmatthes.in.tum.de/pages/t5ma0jrv6q7k/sebis-Public-Website-Home

Used in order to train a deep learning translation model in the legal domain:

https://wwwmatthes.in.tum.de/pages/s4orjknmqls4/Master-s-Thesis-Christoph-Gebendorfer

 

Original Corpus:

http://www.statmt.org/europarl/

http://opus.nlpl.eu/Europarl.php

Keywords:
legal-europarl; parallel texts of the Proceedings of the European Parliament; Europarl-v7 documents
Technical notes:
Moses/Giza++ format
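In the Moses-style aligned format, a language pair is typically stored as two plain-text files with one sentence per line, where line i of one file corresponds to line i of the other. The sketch below reads such a pair in Python. The function name `read_parallel`, the UTF-8 encoding, and any file names passed to it are assumptions for illustration, not details taken from this record.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_parallel(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned files.

    Assumes UTF-8 text with one sentence per line (hypothetical layout).
    """
    with open(src_path, encoding="utf-8") as src, \
            open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, so a length
        # mismatch is detected instead of being silently truncated.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield s.rstrip("\n"), t.rstrip("\n")
```

Failing on a length mismatch is a deliberate choice here: a single missing line would shift every following sentence pair out of alignment.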
View and download (3.5 GB, 23 files)
The data server also offers downloads with FTP
The data server also offers downloads with rsync (password m1446650):
rsync rsync://m1446650@dataserv.ub.tum.de/m1446650/
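The rsync download above can also be scripted. The following Python sketch wraps the same rsync URL. The helper names, the destination directory, and the `-av` flags (archive mode, verbose) are illustrative choices not given in the record; the password is passed through the `RSYNC_PASSWORD` environment variable, which rsync reads when connecting to an rsync daemon.

```python
import os
import subprocess
from typing import List

# rsync URL as given in the record.
RSYNC_URL = "rsync://m1446650@dataserv.ub.tum.de/m1446650/"


def build_rsync_command(dest_dir: str) -> List[str]:
    """Return the rsync argument list for mirroring the dataset into dest_dir."""
    # -a: archive mode, -v: verbose (illustrative flag choice).
    return ["rsync", "-av", RSYNC_URL, dest_dir]


def download(dest_dir: str, password: str) -> None:
    """Run rsync; requires the rsync binary and network access."""
    os.makedirs(dest_dir, exist_ok=True)
    # Copy the current environment and add the daemon password.
    env = dict(os.environ, RSYNC_PASSWORD=password)
    subprocess.run(build_rsync_command(dest_dir), env=env, check=True)
```

For example, `download("legal-europarl/", "m1446650")` would mirror the files into a hypothetical local directory `legal-europarl/`.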
Language:
de
Rights:
by, http://creativecommons.org/licenses/by/4.0
Other rights:
Rights implied by original corpus (Europarl-v7), free use