This corpus is derived from the original JRC Acquis corpus, which contains legislative documents of the European Parliament dating back to 1958. It comprises a subset of the original corpus, processed into aligned form (Moses/Giza++). It contains parallel text in 21 language pairs drawn from 7 languages (cs, de, en, es, fr, it, sv), which can be used directly in data-intensive translation systems. The files are split into training and test sets.
Size: ~24 million sentence pairs
Test set: 2%
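As a rough illustration of how the 21 pairs arise from the 7 languages and how aligned files of this kind are typically consumed, here is a minimal Python sketch. It assumes the usual Moses/Giza++ convention of two UTF-8 text files per pair in which line N of one file is the translation of line N of the other. The helper names (language_pairs, read_parallel) and any file paths passed to them are hypothetical and not part of the corpus distribution.

```python
from itertools import combinations, islice

# The seven languages listed in the corpus description.
LANGUAGES = ["cs", "de", "en", "es", "fr", "it", "sv"]


def language_pairs(languages=LANGUAGES):
    """Return every unordered pair of languages, in alphabetical order."""
    return list(combinations(sorted(languages), 2))


def read_parallel(src_path, tgt_path, limit=None):
    """Yield (source, target) sentence pairs from two line-aligned files.

    Assumption: both files are UTF-8 and line N of src_path is aligned
    with line N of tgt_path.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        pairs = zip(src, tgt)
        if limit is not None:
            pairs = islice(pairs, limit)
        for source_line, target_line in pairs:
            yield source_line.rstrip("\n"), target_line.rstrip("\n")
```

Calling len(language_pairs()) gives 21, matching the number of pairs stated above. read_parallel can be pointed at whatever source and target files a given pair ships with; the actual file names in the distribution are not specified here.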