The Legal DCEP is derived from the Digital Corpus of the European Parliament (DCEP), a data collection of descriptive legal texts published by the Joint Research Centre (JRC) of the European Union in 2014. The DCEP covers Agendas of plenary sessions, Parliamentary News, Press Releases, Motions for Resolutions, Plenary Sitting Protocols, Reports of the Parliamentary Committees, Rules of Procedure of the European Parliament, Final Texts of Plenary Votes, and Written Questions.
This derivation contains a subset of the original corpus, processed into sentence-aligned form (Moses/Giza++), so it can be used directly in data-intensive machine translation systems. It contains parallel text in 21 language pairs, that is, every pairing of 7 languages (cs, de, en, es, fr, it, sv). The files are split into training and test sets.
Size: ~103 million sentence pairs
Test set: 2%
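
As a rough illustration of the layout described above, the following Python sketch enumerates the 21 unordered pairs that 7 languages give and reads a Moses-style parallel corpus, in which line i of the source file corresponds to line i of the target file. The function names (`language_pairs`, `read_parallel`) are hypothetical helpers and not part of the corpus distribution, and the actual file names of the corpus are not assumed here, so the usage lines work on in-memory text instead.

```python
import io
from itertools import combinations, zip_longest
from typing import Iterable, Iterator, List, TextIO, Tuple

# The seven languages listed in the corpus description.
LANGUAGES = ["cs", "de", "en", "es", "fr", "it", "sv"]


def language_pairs(languages: Iterable[str]) -> List[Tuple[str, str]]:
    """Return every unordered pair of distinct languages, sorted alphabetically."""
    return list(combinations(sorted(languages), 2))


def read_parallel(src: TextIO, tgt: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned streams.

    Raises ValueError if one stream has more lines than the other, since
    that means the two sides are no longer aligned.
    """
    missing = object()  # sentinel marking the end of the shorter stream
    for number, (s, t) in enumerate(zip_longest(src, tgt, fillvalue=missing), 1):
        if s is missing or t is missing:
            raise ValueError(f"streams differ in length at line {number}")
        yield s.rstrip("\n"), t.rstrip("\n")


if __name__ == "__main__":
    print(len(language_pairs(LANGUAGES)))  # 7 choose 2 = 21

    # Illustrative in-memory data only; not taken from the corpus.
    german = io.StringIO("Die Sitzung ist eröffnet.\n")
    english = io.StringIO("The sitting is open.\n")
    for pair in read_parallel(german, english):
        print(pair)
```

Raising an error on a length mismatch, rather than silently truncating, is a deliberate choice in this sketch: a single missing line shifts every later sentence pair out of alignment.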