



100,753,173,053 total sentence pairs
1,004 languages available
This table lists 100 corpora, which together make up 93.36% of the entire OPUS collection.
| Corpus | Sentence pairs | % of OPUS |
|---|---|---|
| OpenSubtitles | 27.2B | 27.04 |
| NLLB | 22.7B | 22.56 |
| CCMatrix | 17.1B | 16.96 |
| ParaCrawl | 4.6B | 4.60 |
| CCAligned | 3.1B | 3.11 |
| MultiParaCrawl | 2.8B | 2.80 |
| MultiCCAligned | 2.4B | 2.39 |
| GNOME | 1.7B | 1.65 |
| LinguaTools-WikiTitles | 1.5B | 1.45 |
| MultiHPLT | 1.5B | 1.45 |
| HPLT | 1.2B | 1.21 |
| DGT | 1.2B | 1.21 |
| XLEnt | 1.1B | 1.06 |
| WikiMatrix | 933.6M | 0.93 |
| UNPC | 543.9M | 0.54 |
| EUbookshop | 459.3M | 0.46 |
| ParaCrawl-Bonus | 316.2M | 0.31 |
| EMEA | 282.5M | 0.28 |
| translatewiki | 264.9M | 0.26 |
| MultiUN | 255.9M | 0.25 |
| EuroPat | 252.2M | 0.25 |
| KDE4 | 224.7M | 0.22 |
| Europarl | 217.4M | 0.22 |
| JRC-Acquis | 215.9M | 0.21 |
| TildeMODEL | 193M | 0.19 |
| QED | 191.9M | 0.19 |
| TED2020 | 153.1M | 0.15 |
| Samanantar | 151.2M | 0.15 |
| Mozilla-I10n | 124.3M | 0.12 |
| bible-uedin | 88.3M | 0.08760 |
| MaCoCu | 81.6M | 0.08099 |
| NeuLab-TedTalks | 79.7M | 0.07911 |
| JParaCrawl | 79.3M | 0.07870 |
| MultiMaCoCu | 79M | 0.07844 |
| wikimedia | 75.9M | 0.07538 |
| giga-fren | 70.1M | 0.06954 |
| GoURMET | 62.7M | 0.06224 |
| StanfordNLP-NMT | 58.3M | 0.05787 |
| Tanzil | 50M | 0.04960 |
| Anuvaad | 49.1M | 0.04874 |
| ECB | 45.9M | 0.04555 |
| Wikipedia | 38.9M | 0.03856 |
| ELITR-ECA | 28.3M | 0.02808 |
| SETIMES | 26.4M | 0.02621 |
| DOGC | 26.1M | 0.02586 |
| Tatoeba | 19.8M | 0.01962 |
| MBS | 15.1M | 0.01497 |
| GlobalVoices | 14M | 0.01388 |
| Finlex | 11.3M | 0.01123 |
| News-Commentary | 11.1M | 0.01100 |
| SciELO | 10.8M | 0.01074 |
| PHP | 10.6M | 0.01047 |
| JESC | 8.4M | 0.00833 |
| pmindia | 8.3M | 0.00820 |
| ParIce | 6.4M | 0.00639 |
| fiskmo | 6.3M | 0.00629 |
| EOPC | 6.1M | 0.00603 |
| MDN_Web_Docs | 6M | 0.00595 |
| TED2013 | 5.7M | 0.00570 |
| EhuHac | 5.7M | 0.00566 |
| IITB | 4.9M | 0.00483 |
| Nunavut_Hansard | 4.1M | 0.00402 |
| infopankki | 4M | 0.00393 |
| ChuBiCo | 3.7M | 0.00369 |
| SCB_MT_EN_TH | 3.5M | 0.00346 |
| CAPES | 3.5M | 0.00345 |
| MIZAN | 3.1M | 0.00307 |
| OpenOffice | 2.7M | 0.00268 |
| WikiTitles | 2.7M | 0.00264 |
| EUconst | 2.3M | 0.00232 |
| Books | 2.2M | 0.00218 |
| SUMMA | 2.1M | 0.00206 |
| Elhuyar | 2M | 0.00197 |
| EiTB-ParCC | 1.9M | 0.00193 |
| TEP | 1.8M | 0.00182 |
| Joshua-IPC | 1.6M | 0.00160 |
| KFTT | 1.3M | 0.00132 |
| KDEdoc | 1M | 0.00101 |
| WMT-News | 1M | 0.00100 |
| tico-19 | 983.8K | 0.00098 |
| tldr-pages | 778.5K | 0.00077 |
| ECDC | 749.1K | 0.00074 |
| memat | 489.3K | 0.00049 |
| hrenWaC | 297K | 0.00029 |
| TedTalks | 260.3K | 0.00026 |
| FFR | 246.4K | 0.00024 |
| SPC | 219.7K | 0.00022 |
| MontenegrinSubs | 211.5K | 0.00021 |
| OfisPublik | 191.1K | 0.00019 |
| Bianet | 186.9K | 0.00019 |
| XhosaNavy | 154.8K | 0.00015 |
| WikiSource | 113.3K | 0.00011 |
| ALT | 54.4K | 0.000053973 |
| sardware | 19.3K | 0.000019112 |
| Salome | 15.9K | 0.000015753 |
| ada83 | 12.5K | 0.000012449 |
| RF | 2.2K | 0.000002196 |
| komi | Not specified | Not specified |
| liv4ever | Not specified | Not specified |
| Ubuntu | Not specified | Not specified |
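The "% of OPUS" column appears to be each corpus's sentence-pair count divided by the total of 100,753,173,053 pairs, times 100. The Python sketch below shows that arithmetic and is only a minimal illustration. It assumes the abbreviated K/M/B counts used in the table, and the helper names (`parse_count`, `share_of_opus`) are hypothetical. The counts in the table are rounded to one decimal place, so a recomputed share can differ slightly from the listed percentage. For example, OpenSubtitles comes out at about 27.00 rather than 27.04. Rows marked "Not specified" have no count and must be skipped.

```python
# Hypothetical helpers illustrating how the "% of OPUS" column can be derived.
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9}

TOTAL_PAIRS = 100_753_173_053  # total sentence pairs reported at the top of the page


def parse_count(text: str) -> float:
    """Convert an abbreviated count such as '27.2B', '933.6M' or '2.2K' to a float."""
    text = text.strip()
    suffix = text[-1]
    if suffix in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[suffix]
    return float(text)  # plain number without a suffix


def share_of_opus(count: str, total: int = TOTAL_PAIRS) -> float:
    """Percentage of all OPUS sentence pairs contributed by one corpus."""
    return parse_count(count) / total * 100


# A few rows from the table; "Not specified" rows are deliberately left out.
rows = {"OpenSubtitles": "27.2B", "NLLB": "22.7B", "CCMatrix": "17.1B"}
for name, count in rows.items():
    print(f"{name}: {share_of_opus(count):.2f}%")
```

Summing `share_of_opus` over every row that has a count should land close to the 93.36% coverage figure, again with small deviations caused by the rounded counts.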





