The Legal Europarl is derived from the original Europarl-v7 corpus, which contains proceedings of the European Parliament. It comprises a subset of the original corpus and is available in aligned form (Moses/GIZA++). It contains parallel text for 21 language pairs built from 7 languages (cs, de, en, es, fr, it, sv); this text can be used directly to train data-intensive machine translation systems. In addition, a separate test set is included for evaluation purposes.
Size: ~32 million sentence pairs
Test set: 2%
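The figure of 21 language pairs is consistent with taking every unordered pair of the 7 listed languages, since 7 × 6 / 2 = 21. The short sketch below enumerates those pairs, assuming that a pair such as de-en is counted once rather than separately for each translation direction. It does not imply anything about how the corpus files or directories are actually named.

```python
from itertools import combinations

# The seven languages listed in the corpus description.
LANGUAGES = ["cs", "de", "en", "es", "fr", "it", "sv"]

def language_pairs(languages):
    """Return every unordered pair of distinct languages, e.g. ("cs", "de").

    Assumption: direction is ignored, so ("de", "en") and ("en", "de")
    count as a single pair.
    """
    return list(combinations(languages, 2))

pairs = language_pairs(LANGUAGES)
print(len(pairs))  # 21
```

Counting both translation directions separately would instead give 42 ordered pairs.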