
State of the Machine Translation by Intento (stock engines, Jun 2019)

Evaluation of 25 major Cloud Machine Translation Services with Stock (pre-trained) models (Alibaba, Amazon, Baidu, CloudTranslate, DeepL, Google Translate, GTCom Yeecloud, IBM Watson v3, Microsoft Text Translator v3, ModernMT, Naver Papago, Niutrans, PROMT, SAP Translation Hub, SDL Language Cloud and BeGlobal, Systran SMT and PNMT, Sogou, Tencent, Tilde, Yandex, Youdao) for 48 language pairs: pricing, performance, quality, and language coverage. We also analyze how the MT landscape changed over the last year.

Published in: Technology


  1. 1. STATE OF THE MACHINE TRANSLATION STOCK* MT MODELS by Intento June 2019 * commercially available pre-trained MT models
  2. 2. June 2019 © Intento, Inc. DISCLAIMER: The MT systems used in this report were accessed from May 10 to June 10, 2019. They may have changed many times since then. This report demonstrates the performance of those systems exclusively on the datasets used for this report (see slide 13) using proximity scores. The final MT decision requires human LQA and depends on the use case. We run multiple evaluations for our clients for various language pairs and domains and observe different rankings of the MT systems. There is no “best” MT system: performance depends on how similar your data is to what the vendors used to train their models, and on their algorithms.
  3. 3. About: We have been evaluating models for Machine Translation since May 2017 (Custom NMT as well). As we show in this report, the Machine Translation landscape is complex: models from 8 different vendors are required to get the best quality across popular language pairs, and prices differ by a factor of 300. And it changes often! To evaluate on your own dataset, reach us at hello@inten.to. To conveniently use many MT engines at once, check out our Enterprise MT Hub (next slide).
  4. 4. Intento Enterprise MT Hub: One place to evaluate and manage MT. Universal API to all MT engines. Single MT dashboard. Connects to many CAT, TMS and CMS. Works with files of any size. Smart Routing with retries and failovers. May be deployed on private cloud. Get your API key at inten.to
  5. 5. Executive Summary: Overall MT quality improved significantly for Finnish<>English, German<>French, Romanian>English, Russian>English and Chinese>English*. The best MT provider has changed for 19 language pairs since January 2019. To get the best quality across 48 language pairs, one needs 8 different engines (see slide 17). Many engines increased their language coverage: Google, Amazon, Kakao, Systran PNMT, SDL, PROMT, ModernMT, IBM (see slide 23). New pre-trained engines: Alibaba launched its MT internationally, and Tencent MT went from preview to production in China. More providers added pre-trained engines: Tilde (EN-PL) and Cloud Translation (EN-ZH Medical). * Beyond fluctuations attributed to the dataset updates.
  6. 6. Overview: 1 Translation Quality; 2 Pricing; 3 Language Coverage; 4 Historical Progress; 5 Conclusions. 48 language pairs, 25 machine translation engines.
  7. 7. Machine Translation Engines* with Pre-Trained General-Purpose Models: Alibaba Cloud MT; Amazon Translate; Baidu Translate API; DeepL API; eBay Translation API; Google Cloud Translation API; GTCom YeeCloud MT; IBM Watson Language Translator; Kakao Developers Translation; Microsoft Translator Text API v3; ModernMT Enterprise API; Naver Cloud Papago NMT; NiuTrans Maverick Translation; PROMT Cloud API; SAP Translation Hub; SDL BeGlobal; SDL (SMT) Language Cloud; Sogou Deepi MT; SYSTRAN PNMT Enterprise Server; SYSTRAN REST Translation API; Tencent Cloud TMT API; Tilde Machine Translation; Yandex Translate API; Youdao Cloud Translation API. (MT systems highlighted in gray were unavailable for quantitative evaluation for different reasons.) * We have evaluated general-purpose Cloud Machine Translation services with pre-trained translation models, provided via API. Some vendors also provide web-based, on-premise or custom MT engines, which may differ in all aspects from what we have evaluated.
  8. 8. 1 Translation Quality: 1.1 Evaluation Methodology; 1.2 Available MT Quality; 1.3 Top-Performing Engines; 1.4 Best General-Purpose Engines; 1.5 Optimal General-Purpose Engines
  9. 9. 1.1 Evaluation methodology (I): Translation quality is evaluated by computing the LEPOR score between reference translations and the MT output (slide 11). Currently, our goal is to evaluate the performance of translation between the most popular languages (slide 12). We use public datasets from StatMT/WMT, CASMACAT News Commentary and Tatoeba (slide 13). We performed a LEPOR metric convergence analysis to identify the minimal viable number of segments in the dataset; see slide 14 for details.
  10. 10. Evaluation methodology (II): We consider MT services BEST for a language pair if their hLEPOR scores are within the top 0.5% of the hLEPOR scores available for this pair. We consider MT services TOP for a language pair if their hLEPOR scores are within the top 5% of the hLEPOR scores available for this pair. We consider MT services OPTIMAL for a language pair if they are the cheapest among those within the top 5% of the hLEPOR scores available for this pair.
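The BEST / TOP / OPTIMAL rules above can be sketched as follows. This is a minimal illustration, not Intento's implementation: it assumes that "within the top X%" means a relative margin to the leading score (score >= leader * (1 - X)), and the function name `classify_engines`, the engine names, scores and prices are all hypothetical.

```python
from typing import Dict, List, Optional


def classify_engines(
    scores: Dict[str, float],
    prices: Dict[str, Optional[float]],
    best_margin: float = 0.005,
    top_margin: float = 0.05,
) -> Dict[str, List[str]]:
    """Label the engines of one language pair as BEST, TOP and OPTIMAL.

    Assumption: "within the top X%" is read as a relative margin,
    i.e. score >= leader_score * (1 - X).
    """
    leader = max(scores.values())
    best = sorted(e for e, s in scores.items() if s >= leader * (1 - best_margin))
    top = sorted(e for e, s in scores.items() if s >= leader * (1 - top_margin))
    # OPTIMAL: cheapest engine(s) among TOP; engines without a public price are skipped.
    priced = [e for e in top if prices.get(e) is not None]
    optimal: List[str] = []
    if priced:
        cheapest = min(prices[e] for e in priced)
        optimal = sorted(e for e in priced if prices[e] == cheapest)
    return {"best": best, "top": top, "optimal": optimal}


# Hypothetical hLEPOR scores (0..1) and prices (USD per 1M characters).
example_scores = {"engine_a": 0.71, "engine_b": 0.69, "engine_c": 0.66, "engine_d": 0.65}
example_prices = {"engine_a": None, "engine_b": 8.0, "engine_c": 15.0, "engine_d": 6.0}
labels = classify_engines(example_scores, example_prices)
print(labels)
```

Note that the cheapest engine overall (`engine_d` in this made-up data) is not OPTIMAL, because it falls outside the 5% margin.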
  11. 11. LEPOR score. LEPOR: an automatic machine translation evaluation metric considering the enhanced Length Penalty, n-gram Position difference Penalty and Recall. In our evaluation, we used hLEPORA v.3.1 (best metric at the ACL-WMT 2013 contest). https://www.slideshare.net/AaronHanLiFeng/lepor-an-augmented-machine-translation-evaluation-metric-thesis-ppt https://github.com/aaronlifenghan/aaron-project-lepor Like BLEU, but better.
  12. 12. 48 Language Pairs. Language groups by web popularity*: P1: ≥ 2.0% of websites; P2: 0.5%-2% of websites; P3: 0.1-0.3% of websites; P4: <0.1% of websites. We focus on en-P1, P1-en and (partially) P1-P1. Evaluated pairs (source > targets, per the dataset list on slide 13): en > ru, ja, de, es, fr, pt, it, zh, cs, tr, fi, ro, ko, ar, nl; ru > en, de, es, fr, pt; ja > en, fr, zh; de > en, ru, ja, fr, it; es > en, zh; fr > en, ru, de, es; pt > en; it > en, de, pt; zh > en, ja, it; cs > en; tr > en; fi > en; ro > en; ko > en; ar > en; nl > en. * https://w3techs.com/technologies/overview/content_language/all
  13. 13. Datasets. WMT-2013 (translation task, news domain): en-es, es-en. WMT-2015 (translation task, news domain): fr-en, en-fr. WMT-2016 (translation task, news domain): ro-en, en-ro. WMT-2018 (translation task, news domain): tr-en, en-tr, cs-en. WMT-2019 (translation task, news domain): zh-en, en-zh, en-cs, de-en, en-de, ru-en, en-ru, fi-en, en-fi, de-fr, fr-de. NewsCommentary-2011: en-ja, ja-en, en-pt, pt-en, en-it, it-en, ru-de, de-ru, ru-es, ru-fr, ru-pt, ja-fr, de-ja, es-zh, fr-ru, fr-es, it-pt, zh-it, en-ar, ar-en, en-nl, nl-en, de-it, it-de, ja-zh, zh-ja. Tatoeba, JHE: en-ko, ko-en.
  14. 14. LEPOR Convergence. We used 1,600 to 2,000 sentences per language pair. The metric stabilizes, and adding more sentences from the same domain won’t change the outcome. [Charts: regularised hLEPOR scores vs. number of sentences, aggregated across all language pairs (aggregated mean with confidence interval), and examples for individual language pairs.]
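One way to see why a mean score stabilizes at this sample size is to watch the confidence interval of the mean shrink as sentences are added. The sketch below is only an illustration under stated assumptions: it uses synthetic per-sentence scores and a normal-approximation 95% interval, which is not necessarily the convergence analysis used for the report; `ci_half_width` is a hypothetical helper.

```python
import math
import random
from statistics import stdev


def ci_half_width(scores, z=1.96):
    """Half-width of a normal-approximation 95% confidence interval of the mean."""
    return z * stdev(scores) / math.sqrt(len(scores))


# Synthetic per-sentence scores clipped to [0, 1]; real hLEPOR scores would go here.
random.seed(0)
synthetic = [min(1.0, max(0.0, random.gauss(0.7, 0.15))) for _ in range(2000)]

widths = {n: ci_half_width(synthetic[:n]) for n in (100, 400, 1600, 2000)}
for n, w in widths.items():
    print(f"{n:>5} sentences: mean known to within +/- {w:.4f}")
```

The half-width falls with the square root of the sample size, so going from 1,600 to 2,000 sentences changes it only slightly.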
  15. 15. Available MT Quality. [Matrix chart over the languages en ru ja de es fr pt it zh cs tr fi ro ko ar nl. Cell color: maximal available hLEPOR score (>80%, 70%, 60%, 50%, 40%, <40%). Cell mark: minimal price for this quality, per 1M characters*: $$$$ ≥$100; $$$ $20-25; $$ $10-15; $ <$10. Cell number: number of top-performing MT providers**.] Number of top-performing providers, row by row in column order with empty cells omitted: en: 5 8 2 9 7 8 7 4 2 3 1 4 1 5 7; ru: 3 5 5 5 3; ja: 5 4 6; de: 5 5 3 2 5; es: 9 5; fr: 9 5 6 8; pt: 9; it: 6 6 5; zh: 3 5 4; cs: 5; tr: 4; fi: 2; ro: 4; ko: 1; ar: 5; nl: 9. Changed since last report: 25 pairs. Improved: fi-en, fr-de, ro-en, zh-en, en-fi, de-fr, ru-en. * base pricing tier. ** up to 5% worse than the leader, SMT and NMT counted separately. Check Appendix A for more detailed data.
  16. 16. 1.3 Top-Performing MT Providers across 48 language pairs. [Bar chart: number of language pairs (0 to 40) per provider: deepl, google, amazon, modernmt, yandex, systran-pnmt, msft-nmt, ibm-nmt, tencent, baidu, sdl-beglobal, promt, gtcom, sdl-smt.] Best: provides the best MT quality for a language pair. Top 5%: within 5% of the best available MT quality for a language pair. Optimal: provides the lowest price among the top 5% MT engines for a language pair.
  17. 17. 1.4 Best general-purpose MT engines. [Matrix chart over the languages en ru ja de es fr pt it zh cs tr fi ro ko ar nl. MT engines in the legend: deepl, google, amazon, yandex, systran-pnmt, modernmt, ibm, promt, microsoft, tencent, baidu.] In several cases, there’s no statistically significant difference between the top engines. Check Appendix A for more detailed data. Changed since Jan 2019: 19 pairs.
  18. 18. Optimal* general-purpose MT engines. [Matrix chart over the languages en ru ja de es fr pt it zh cs tr fi ro ko ar nl. MT engines in the legend: deepl, google, amazon, yandex, systran-pnmt, modernmt, ibm, promt, microsoft, tencent, baidu.] Changed since Jan 2019: 12 pairs. * Cheapest with a performance within 5% of the best available engine for this language pair.
  19. 19. Sample pair analysis: English-Chinese (based on the WMT-19 dataset). LEPOR score, providers, price range per 1M characters: 71%: Tencent (preview). 69%: Baidu, GTCom, Google; $8-20. 66%: Amazon, Systran PNMT; $15-N/A. 65%: Microsoft, ModernMT, Yandex; $6-$1,000. BEST QUALITY: Tencent (preview). TOP 5%: Tencent, Baidu, GTCom, Google. BEST PRICE IN TOP 5%: Baidu.
  20. 20. Sample pair analysis: Finnish-English (based on the WMT-19 dataset). Source: “Oulun poliisi onnistui tiistaina löytämään Raahessa kadonneen sienestäjän dronen avulla vain parissa minuutissa.” Reference (hLEPOR 1): “On Tuesday, using a drone, Oulu police found a missing mushroom picker in Raahe in only a few minutes.” Best MT (0.77): “On Tuesday, the Oulu police managed to find a mushroom picker missing in Raahe with the help of a drone in just a few minutes.” Other MT engines: (0.75) “On Tuesday, the Oulu police managed to find a missing mushroom spider in Raahe in just a few minutes.” (0.71) “On Tuesday, the Oulu police managed to find a lost mushroom maker in Raahe in just a few minutes.” (0.70) “On Tuesday, the Oulu police managed to find a missing mushroom inhibitor with a drone in just a few minutes.” (0.57) “The police of Oulu managed to find a missing mushroom pilot in Raahe in just a couple of minutes.” (🤔) “The police managed to find oulun Raahessa disappeared on tuesday dronen sienestäjän through on just one minute.” (🤦) “Oulu police managed on Tuesday to find the Raahe missing mushroom pickers drone using just a couple of minutes.” (🙈) “On Tuesday, the Oulu police managed to find a funeral in the Bible with the drone of the missing fungicides in just a few minutes.” To validate the MT choice, look at the sentences of median difficulty and high disagreement across different MT engines. [Chart axes: sentence difficulty, MT agreement.]
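The advice to inspect sentences of median difficulty and high disagreement can be sketched as a simple selection step. The slide does not define either quantity, so this sketch assumes difficulty = 1 minus the mean score across engines and disagreement = the population standard deviation of the scores; `pick_for_review`, the `band` parameter and the sample data are hypothetical.

```python
from statistics import mean, median, pstdev
from typing import Dict, List


def pick_for_review(
    scores_by_sentence: Dict[str, List[float]], k: int = 2, band: float = 0.05
) -> List[str]:
    """Return up to k sentence ids near median difficulty, most disputed first.

    Assumptions: difficulty = 1 - mean score across engines;
    disagreement = population standard deviation of the scores.
    """
    difficulty = {sid: 1 - mean(s) for sid, s in scores_by_sentence.items()}
    med = median(difficulty.values())
    near_median = [sid for sid, d in difficulty.items() if abs(d - med) <= band]
    near_median.sort(key=lambda sid: pstdev(scores_by_sentence[sid]), reverse=True)
    return near_median[:k]


# Hypothetical per-engine scores for five sentences.
sample = {
    "s1": [0.90, 0.90, 0.90],  # easy, engines agree
    "s2": [0.77, 0.70, 0.57],  # median difficulty, some disagreement
    "s3": [0.70, 0.69, 0.68],  # median difficulty, engines agree
    "s4": [0.30, 0.20, 0.10],  # hard
    "s5": [0.95, 0.45, 0.65],  # median difficulty, strong disagreement
}
selected = pick_for_review(sample)
print(selected)
```

Sentences picked this way are the ones where the engines differ most while the content is of typical difficulty, which is where a manual look is most informative.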
  21. 21. 2 Public pricing, USD per 1M symbols***. [Pricing chart.] * volume estimation based on 4.79 symbols per word. ** +20% for some language pairs. *** freemium volumes are not shown.
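The 4.79 symbols-per-word figure makes it possible to compare per-word and per-character price lists. A minimal sketch, assuming the conversion is a plain multiplication (the slide does not spell out how it was applied, or whether spaces count as symbols); the function name and the sample price are hypothetical.

```python
SYMBOLS_PER_WORD = 4.79  # figure quoted on the pricing slide


def usd_per_million_symbols(usd_per_word: float) -> float:
    """Convert a per-word price into a price per 1M symbols."""
    words_per_million_symbols = 1_000_000 / SYMBOLS_PER_WORD
    return usd_per_word * words_per_million_symbols


# Hypothetical vendor charging $0.0001 per word.
converted = usd_per_million_symbols(0.0001)
print(f"${converted:.2f} per 1M symbols")
```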
  22. 22. 3 Language Coverage: 3.1 Supported and Unique per Provider; 3.2 Coverage by Language Popularity
  23. 23. 3.1 Supported and Unique Language Pairs*. [Bar chart, logarithmic scale from 1 to 10,000: total and unique language pairs per provider, in chart order: Niutrans, Google, Yandex, Microsoft v3, Sogou, Baidu, Amazon, Kakao, Systran, Tencent, SDL, PROMT, GTCom, SAP, DeepL, ModernMT, IBM Watson v3, Naver, Youdao, Alibaba, eBay, Tilde.] Unique language pairs are supported exclusively by one provider. * where possible, we have checked via API if all language pairs advertised by the documentation are supported and removed the pairs we were unable to locate in the API. ** as advertised (not validated via API).
  24. 24. Language popularity. Language groups by web popularity*: P1 (≥ 2.0% of websites): en, ru, ja, de, es, fr, pt, it, zh. P2 (0.5%-2% of websites): pl, fa, tr, nl, ko, cs, ar, vi, el, sv, in, ro, hu. P3 (0.1-0.3% of websites): da, sk, fi, th, bg, he, lt, uk, hr, no/nb, sr, ca, sl, lv, et. P4 (<0.1% of websites): hi, az, bs, ms, is, mk, bn, eu, ka, sq, gl, mn, kk, hy, se, uz, kr, ur, ta, nn, af, be, si, my, br, ne, sw, km, fil, ml, pa, … A total of 28,730 pairs are possible; 14,136 are supported across all providers. * https://w3techs.com/technologies/overview/content_language/all
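The pair counts above can be checked with a little arithmetic. The slide does not state how many languages are in scope; 170 is inferred here because it is the value for which n × (n − 1) ordered pairs equals 28,730. The helper name is hypothetical.

```python
def ordered_pairs(n_languages: int) -> int:
    """Number of directed (source, target) pairs among n languages."""
    return n_languages * (n_languages - 1)


# Assumption: 170 languages, inferred from 170 * 169 = 28,730.
possible = ordered_pairs(170)
supported = 14136  # from the slide
coverage = supported / possible
print(possible, f"{coverage:.1%}")
```

The resulting share, about 49%, matches the overall coverage figure quoted on the next slide.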
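  25. 25. 3.2 Language coverage by popularity. [Matrix chart: share of supported language pairs for each combination of the popularity groups P1, P2, P3, P4. Cell values in extraction order: 100%, 100%, 63%, 38%, 63%, 100%, 100%, 100%, 63%, 100%, 100%, 100%, 63%, 63%, 63%, 99%.] 49% of possible language pairs.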
  26. 26. Language coverage by service provider: NiuTrans Maverick Translation; Google Cloud Translation API; Yandex Translate API; Microsoft Translator Text API v3; Sogou Deepi MT; Baidu Translate API; Amazon Translate; Tencent Cloud TMT API (preview); Youdao Cloud Translation API; Systran PNMT; SDL BeGlobal; PROMT Cloud API; SAP Translation Hub; DeepL API; IBM Watson Language Translator v3; ModernMT API; Naver Papago NMT; Alibaba Translate; GTCom YeeCloud MT; Kakao MT; eBay MT (preview); Tilde MT.
  27. 27. 4 Historical Progress: 4.1 Number of Cloud MT Vendors; 4.2 MT Quality; 4.3 Performance/Price Efficiency
  28. 28. 4.1 Independent Cloud MT Vendors with pre-trained models. Commercial: Alibaba, Amazon, Baidu, CloudTranslate, DeepL, Google, GTCom, IBM, Microsoft, ModernMT, Naver, Niutrans, PROMT, SAP, SDL, Sogou, Systran, Tilde, Tencent, Yandex, Youdao. Preview / Limited: eBay, Kakao. [Chart: number of vendors (0 to 25), preview and commercial, for Mar 18, Jul 18, Dec 18 and Jun 19.]
  29. 29. 4.2 Best available MT Quality: stable growth of “almost perfect” pairs. [Chart: best and worst pair on a LEPOR scale from 30% to 80% for Mar 18, Jul 18, Dec 18 and Jun 19, with the number of language pairs available at each level of LEPOR quality, out of the 48 pairs we have evaluated since March 2018.]
  30. 30. 4.3 Best available Performance/Price Efficiency: grows as NMT gets cheaper. Efficiency = (hLEPOR in %)² / (USD per 1M symbols). [Chart: best and worst pair on an efficiency scale from 0 to 1,200 for Mar 18, Jul 18, Dec 18 and Jun 19, with the number of language pairs available at each level of efficiency, out of the 48 pairs we have evaluated since March 2018.]
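A short worked example of the efficiency formula above. The scores and prices are made up for illustration and are not taken from the report; the function name is hypothetical.

```python
def efficiency(hlepor_percent: float, usd_per_million_symbols: float) -> float:
    """Efficiency = (hLEPOR in %)^2 / (USD per 1M symbols), as defined on the slide."""
    if usd_per_million_symbols <= 0:
        raise ValueError("price must be positive")
    return hlepor_percent ** 2 / usd_per_million_symbols


# Hypothetical engines: a cheaper one with a lower score and a pricier one with a higher score.
eff_cheaper = efficiency(65.0, 10.0)  # 4225 / 10
eff_pricier = efficiency(70.0, 20.0)  # 4900 / 20
print(eff_cheaper, eff_pricier)
```

Because the score is squared while the price enters linearly, halving the price doubles efficiency, whereas a 5-point quality gain at this level raises it by only about 16%.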
  31. 31. 5 Conclusions. The MT landscape continues to evolve, both in terms of quality and price. Language coverage is increasing faster than ever. Even for the general domain, having the best quality across 48 language pairs requires using 8 engines simultaneously (and those are different from half a year ago). Re-evaluate your MT choice often to stay competitive.
  32. 32. Intento Professional Services: MT Evaluation and Integration. Training and statistically significant evaluation of NMT engines, which may bring the greatest cost and time reduction at the post-editing stage (see the example here). Identifying a subset of MT results for fast and affordable manual inspection (~200x reduction of LQA efforts). LQA and HTER are also available via our LSP partners. MT Integration: SDK and connectors to open platforms and in-house software. Reach us at hello@inten.to
  33. 33. Intento Enterprise MT Hub: One place to evaluate and manage MT. Universal API to all MT engines. Single MT dashboard. Connects to many CAT, TMS and CMS. Works with files of any size. Smart Routing with retries and failovers. May be deployed on private cloud. Get your API key at inten.to
  34. 34. Intento Web-Tools. Human-friendly UI working directly with the Intento API. A quick way to try every MT engine and translate large files without API integration. Available in preview at no added cost to the Intento API. Sign up at https://console.inten.to
  35. 35. Intento Plugins and Connectors. memoQ (private plugin). SDL Trados (private plugin, also in SDL AppStore). MateCat (private plugin). YiCat (coming soon). Also, many of the engines are available in Smartcat. Any Enterprise TMS via XLIFF connector. Missing a connector? Reach us at hello@inten.to!
  36. 36. STATE OF THE MACHINE TRANSLATION: STOCK* MT MODELS, by Intento (https://inten.to), June 2019. * commercially available pre-trained MT models. Intento, Inc., hello@inten.to, 2150 Shattuck Ave, Berkeley CA 94704
  37. 37. Appendix A: Average hLEPOR ranking across all 48 language pairs. WARNING: This chart looks cool but requires a high level of color sensitivity. Also, there are lots of overlapping circles. Please look at slides 18 and 19 for more digestible data. [Chart: average hLEPOR per engine.]
