TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE


A Moses MT engine for legal translation
Modern technology in a traditional sector

By Joël Sigling, Director
Monte Carlo, 25 March 2012
AVB Translations background

•   Amstelveens Vertaalburo: founded 1972 – traditional, high-quality agency

•   Translation World: founded 2002, tech-savvy all-round player

•   Merger in 2010 to form AVB Translations: premium brand with strong tech focus

•   Top-5 player in the Netherlands; 2011 turnover €4.6 million

•   Core business: general translations – legal, financial, technical, …
    NO software localization (yet!)
History of MT interest

•   Member of TAUS since 2008 (first round table: Amsterdam)

•   Attended TAUS User Conferences in the US since 2009

•   Sense of urgency developed, but the 2010 merger was a distraction

•   Action taken in 2011, after the merger

•   2011: decision to build a Dutch <> English engine for the legal (not IT-related!) domain

•   Why SMT, why Moses? Quicker, cheaper, similar quality (as research shows)
Why legal domain MT engine?

•   Legal translations make up approx. 40% of AVB business, 80% of which is Dutch <> English

•   Not the obvious choice: people said MT wouldn’t work for legal texts because the
    sentences are too long and the material too intricate

•   Statistical MT is suited to non-stylistic material, e.g. legal texts

•   If this works, we can make MT happen for all other domains
MT engine objectives

•   Increased productivity: no BLEU % target, but tangible, practical results.
    How much more can a translator produce compared with human translation (HT)?

•   A tool offering usable quality with very quick turnaround for high volumes
    (typical “Friday afternoon lawyer requests”)

•   Becoming an MT front runner for Dutch in the non-localization sector
    (Dutch is the 5th language in Europe after FIGS)
Developing the Moses engine

•   Choice between in-house and external development
     • In-house: control, building expertise, lower long-term cost
     • External: lower initial cost, much more expertise, so best for now

•   Our prerequisites for the development option
     • Ownership of and free access to the engine
     • Assurance that our data will not be used or copied by the builder
     • Acceptable costs for development & usage
     • Skilled partner; candidates: AsiaOnline, CrossLang, Pangeanic, LetsMT,
        SmartMate??

•   CrossLang: met all of the above, closest to our office, independent
What we needed
•   Large quantities of high-quality translation data

     •   Aligning existing high-quality legal translations (took longest to prepare)
     •   Existing legal TMs
     •   Going forward: company-/industry-specific terminology

•   Ways to measure gains

     •   Not just a % increase in automated evaluation scores, but also tangible
         improvements: we are entrepreneurs, not scientists
     •   CrossLang automated assessment tool (TER, BLEU, NIST, METEOR)
     •   Manual assessment, e.g. how many hours does post-editing 10,000 words take?
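BLEU is one of the four metrics named for the CrossLang assessment tool. To make clear what such a score measures, here is a minimal corpus-level BLEU sketch in Python. It is a textbook simplification assumed for illustration (one reference per sentence, whitespace tokenization, no smoothing), not CrossLang's tool; real evaluations should use an established, standardized implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU: one reference per segment, whitespace tokens, no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches, per order n
    totals = [0] * max_n   # n-grams in the MT output, per order n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each MT n-gram count at its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # a zero precision makes the geometric mean zero
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


mt = ["the court rules that the contract is void"]
ref = ["the court rules that the agreement is void"]
print(round(corpus_bleu(mt, ref), 3))
```

For this single sentence, the four n-gram precisions are 7/8, 5/7, 3/6 and 2/5, so the score is 0.125 to the power 1/4, roughly 0.59, even though only one word differs.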
Input data

•   Highest-quality AVB Dutch <> English legal translations: approx.
    700k words per language, predominantly civil law

•   AVB TM, not fully reviewed but still high quality: approx. 10 million
    words per language, predominantly civil law

•   Legal translations harvested by CrossLang, more diverse legal
    material: 7 million words per language
CrossLang automated test results

•   Best results from AVB + harvested data, with the AVB data given extra weight

•   Results particularly good in the civil law domain (the bulk of the AVB input
    data)

•   Results improved dramatically for other legal domains by adding
    harvested data
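The slides say the AVB data was weighted extra but not how. A minimal sketch of one simple, common approach, assumed here for illustration: oversample the in-domain sentence pairs, i.e. repeat them before concatenating them with the harvested data, so they count more in the training statistics. Other options, such as training separate models and tuning their weights, are equally plausible; the function names and the weight value are hypothetical.

```python
def build_training_corpus(in_domain, general, in_domain_weight=2):
    """Concatenate sentence-aligned corpora, repeating the in-domain pairs.

    in_domain, general: lists of (source, target) sentence pairs.
    in_domain_weight: number of copies of the in-domain pairs to include
    (an integer >= 1; the default of 2 is an arbitrary illustration).
    """
    if in_domain_weight < 1:
        raise ValueError("in_domain_weight must be at least 1")
    return in_domain * in_domain_weight + general


def in_domain_share(n_in_domain, n_general, in_domain_weight):
    """Fraction of the combined corpus that comes from the in-domain data."""
    weighted = n_in_domain * in_domain_weight
    return weighted / (weighted + n_general)


avb = [("De overeenkomst is nietig.", "The agreement is void.")]
harvested = [("De rechter beslist.", "The court decides.")]
print(build_training_corpus(avb, harvested, in_domain_weight=3))
```

For example, `in_domain_share(1, 9, 3)` returns 0.25: tripling a corpus that makes up one tenth of the data raises its share to a quarter.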
AVB results in practice

•   Test done in the CrossLang production assessment tool: productivity 5%
    higher for post-editing than for human translation (human output in this
    case was already very high, >1,000 words per hour; PE output even higher)
AVB results in practice

•   Live rush translations done in the past two weeks:

     •   1,500-word trial for a law firm needing high volume in a
         very short time. Post-edited in 75 minutes. Customer happy
         with the quality/price ratio.
     •   25,000 words in two days with moderate PE effort by two
         post-editors. Quality estimate: 80-90% of human translation.
     •   4,500 words in 3 hours with almost full PE effort by one
         post-editor. Quality estimate: >90% of human translation.
     •   15,000 words in one day, done by two post-editors. Quality
         estimate: 80-90% of human translation.
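The live jobs can be converted into throughput figures for comparison with the >1,000 words per hour human baseline from the production test. A small sketch of the arithmetic, assuming the 1,500-word trial was post-edited by one person (the slide does not say). The day-based jobs stay per post-editor per day because the working hours are not given; the function name is hypothetical.

```python
def words_per_editor(words, post_editors, duration):
    """Throughput per post-editor per unit of duration (hours or days)."""
    return words / (post_editors * duration)


# Figures from the live rush jobs listed above.
# Assumption: the 1,500-word trial was post-edited by a single person.
trial = words_per_editor(1_500, 1, 75 / 60)   # 75 minutes -> 1,200 words/hour
small_job = words_per_editor(4_500, 1, 3)     # 3 hours -> 1,500 words/hour
two_day_job = words_per_editor(25_000, 2, 2)  # 6,250 words per editor per day
one_day_job = words_per_editor(15_000, 2, 1)  # 7,500 words per editor per day

# Production test: post-editing 5% faster than a 1,000 words/hour translator.
pe_rate = 1_000 * 105 / 100                   # 1,050 words/hour

for label, rate in [("trial", trial), ("4,500-word job", small_job),
                    ("25,000-word job (per day)", two_day_job),
                    ("15,000-word job (per day)", one_day_job),
                    ("production test PE", pe_rate)]:
    print(f"{label}: {rate:,.0f}")
```

The two hour-based jobs come out at 1,200 and 1,500 words per hour, against more than 1,050 words per hour implied by the 5% gain in the production test. The jobs differ in PE effort and quality target, so the figures are indicative rather than directly comparable.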
AVB results in practice

•   The test and live projects show great potential in two areas:

     •   Producing usable translations very quickly and at 50-60% of
         normal translation cost. Margins are similar to those on normal
         translation, but likely to improve!

     •   Higher productivity, i.e. lower production cost and
         increased margins.
CrossLang Gateway benefits
•   The standard Moses engine offers no high-level functions
     • Only plain-text files, always sentence by sentence, experimental
        recasing, experimental tag handling

•   CrossLang Gateway offers a Java service layer (not wrapper scripts)
     • Most common file formats: Word, XML, XLIFF
     • Adjustable text segmentation
     • Hardened, alignment-based tag handling
     • Advanced recasing tool based on alignment data
     • Named entity recognition & (re)tokenization
     • Terminology checking and replacement

Gateway features were crucial to processing our material properly
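The Gateway slide mentions recasing based on alignment data without describing the method. A minimal sketch of the basic idea, assuming word alignments are available as (source index, target index) pairs: copy capitalization from each source word to the target words aligned to it. The function name and data format are hypothetical; this is not CrossLang's implementation.

```python
def transfer_case(source_tokens, target_tokens, alignment):
    """Capitalize target tokens that are aligned to capitalized source tokens.

    alignment: iterable of (source_index, target_index) pairs (assumed format).
    Target tokens with no capitalized aligned source token are left unchanged.
    """
    result = list(target_tokens)
    for s, t in alignment:
        if source_tokens[s][:1].isupper() and result[t][:1].islower():
            result[t] = result[t][:1].upper() + result[t][1:]
    return result


source = ["De", "Hoge", "Raad", "oordeelt"]
target = ["the", "supreme", "court", "rules"]
print(transfer_case(source, target, [(0, 0), (1, 1), (2, 2), (3, 3)]))
# ['The', 'Supreme', 'Court', 'rules']
```

A production tool needs more than this: English capitalizes days and months where Dutch does not (maandag, januari), so pure case transfer misses those, and sentence-initial capitalization is better decided by position in the target sentence.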
Conclusions

•   Developing a good engine is not an “out of the box” task

•   Sufficient high-quality data is necessary for good results

•   Results are very promising, our objectives can be achieved

•   Working with a value added partner is recommended

•   The need to integrate the MT solution into the translation workflow
    has become apparent
Phone:     +31 20 645.66.10
Mobile:    +31 625.025.475
E-mail:    joel.sigling@avb.nl
Twitter:   @JoelAVB
Address:   Ouderkerkerlaan 50
           1185 AD Amstelveen
           The Netherlands
Website:   www.avb.nl

Injustice - Developers Among Us (SciFiDevCon 2024)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Moses MT Engine Boosts Legal Translation Productivity

  • 1. TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE: A Moses MT engine for legal translation, by Joël Sigling
  • 2. Joël Sigling, Director. A Moses MT engine for legal translation: modern technology in a traditional sector. TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Monte Carlo, 25 March 2012
  • 3. AVB Translations background
      • Amstelveens Vertaalburo: founded 1972, traditional high-quality agency
      • Translation World: founded 2002, tech-savvy all-round player
      • Merged in 2010 into AVB Translations: premium brand with a strong tech focus
      • Top 5 player in the Netherlands, 2011 turnover €4.6 million
      • Core business: general translations (legal, financial, technical, …); NO software localization (yet!)
  • 4. History of MT interest
      • Member of TAUS since 2008 (1st round table, Amsterdam)
      • Attended TAUS User Conferences in the US since 2009
      • Sense of urgency developed, but the 2010 merger was a distraction
      • Action taken in 2011, after the merger
      • 2011: decision to build a Dutch <> English engine for the legal (not IT-related!) domain
      • Why SMT, why Moses? Quicker and cheaper, with similar quality (as research shows)
  • 5. Why a legal-domain MT engine?
      • Legal translations account for approx. 40% of AVB business, 80% of it Dutch <> English
      • Not the obvious choice: people said MT wouldn't work for legal texts because the sentences are too long and the material too intricate
      • Statistical MT is suited to non-stylistic material, e.g. legal texts
      • If this works, we can make MT happen for all other domains
  • 6. MT engine objectives
      • Increased productivity: no BLEU % target, but tangible, practical results. How much more can a translator do compared to HT (human translation)?
      • A tool that offers usable quality with very quick turnarounds for high volumes (typical "Friday afternoon lawyer requests")
      • Becoming an MT front runner for Dutch in the non-localization sector (5th language in Europe after FIGS: French, Italian, German, Spanish)
  • 7. Developing the Moses engine
      • Choice between in-house and external development
      • In-house: control, developing expertise, lower long-term cost
      • External: lower initial cost, much more expertise; best option for now
      • Our prerequisites for the development option:
          • ownership of and free access to the engine
          • assurance that our data will not be used or copied by the builder
          • acceptable costs for development and usage
          • a skilled partner; candidates: AsiaOnline, CrossLang, Pangeanic, LetsMT, SmartMate (?)
      • CrossLang chosen: meets all of the above, closest to our office, independent
  • 8. What we needed
      • Large quantities of high-quality translation data
          • Aligned existing high-quality legal translations (took longest to prepare)
          • Existing legal TMs
          • Going forward: company- and industry-specific terminology
      • Ways to measure gains
          • Not just a % increase in automated evaluation scores, but also tangible improvements: we are entrepreneurs, not scientists
          • CrossLang automated assessment tool (TER, BLEU, NIST, METEOR)
          • Manual assessment: e.g. how many hours does post-editing 10,000 words take?
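The slides do not describe how the CrossLang tool computes its scores. As a rough illustration of what one of the listed metrics measures, here is a minimal corpus-level BLEU sketch, assuming whitespace tokenization, a single reference per sentence, uniform weights up to 4-grams and no smoothing. It is not CrossLang's implementation; real evaluations would normally use an established tool such as sacreBLEU, whose tokenization and smoothing choices change the scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence, uniform weights and
    no smoothing. Inputs are lists of whitespace-tokenized strings."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # a zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

A BLEU score says nothing directly about post-editing hours, which is why the manual assessment measures time instead.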
  • 9. Input data
      • Highest-quality AVB Dutch <> English legal translations: approx. 700k words per language, predominantly civil law
      • Not fully reviewed AVB TM, still high quality: approx. 10 million words per language, predominantly civil law
      • Legal translations harvested by CrossLang, more diverse legal material: 7 million words per language
  • 10. CrossLang automated test results
      • Best results from AVB + harvested data, with extra weight on the AVB data
      • Results particularly good in the civil law domain (the bulk of the AVB input data)
      • Adding harvested data improved results dramatically for other legal domains
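The slides do not say how the AVB data was "weighted extra". One simple technique, shown here purely as an assumption, is oversampling: repeating the in-domain corpus several times before training so that its phrase counts carry more weight. The function names and the default weight of 3 are hypothetical.

```python
from typing import List, Tuple

SentencePair = Tuple[str, str]  # (Dutch source, English target)


def build_training_corpus(
    in_domain: List[SentencePair],
    out_of_domain: List[SentencePair],
    in_domain_weight: int = 3,  # hypothetical default, not from the slides
) -> List[SentencePair]:
    """Concatenate two parallel corpora, repeating the in-domain corpus
    `in_domain_weight` times so its statistics count more heavily."""
    if in_domain_weight < 1:
        raise ValueError("in_domain_weight must be a positive integer")
    return in_domain * in_domain_weight + out_of_domain


def effective_share(n_in: int, n_out: int, weight: int) -> float:
    """Fraction of the combined corpus contributed by the in-domain data."""
    return n_in * weight / (n_in * weight + n_out)
```

Treating all AVB material (approx. 0.7 + 10 million words) as in-domain and the 7 million harvested words as out-of-domain, `effective_share(10_700, 7_000, 1)` gives about 60% (sizes in thousands of words), and a weight of 2 raises the AVB share to about 75%. Other approaches, such as training separate models and weighting them, are also common; the slides do not say which one was used.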
  • 11. AVB results in practice
      • Test done in the CrossLang production assessment tool: post-editing productivity 5% higher than human translation (human output in this case very high, >1,000 words per hour; PE even higher)
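A 5% productivity gain is easy to misread as a 5% time saving. Taking the slide's lower bound of 1,000 words per hour as the human rate (an illustrative assumption, since the measured rate was only described as ">1000"), the 10,000-word benchmark from the manual assessment works out as:

```latex
\begin{aligned}
r_{\mathrm{PE}} &= 1.05\, r_{\mathrm{HT}}, \qquad r_{\mathrm{HT}} = 1000~\text{words/h} \;\Rightarrow\; r_{\mathrm{PE}} = 1050~\text{words/h}\\
t_{\mathrm{HT}} &= \frac{10\,000}{1000} = 10~\text{h}, \qquad t_{\mathrm{PE}} = \frac{10\,000}{1050} \approx 9.52~\text{h}\\
1 - \frac{t_{\mathrm{PE}}}{t_{\mathrm{HT}}} &= 1 - \frac{1}{1.05} \approx 4.8\%
\end{aligned}
```

In other words, a 5% higher rate saves roughly half an hour per 10,000 words at this human baseline.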
  • 12. AVB results in practice
      • Live rush translations done in the past two weeks:
          • 1,500-word trial for a law firm needing high volume in a very short time. Post-edited in 75 minutes. Customer happy with the quality/price ratio.
          • 25,000 words in two days with moderate PE effort by two post-editors. Quality estimate 80-90% of human translation.
          • 4,500 words in 3 hours with almost full PE effort by one post-editor. Quality estimate >90% of human translation.
          • 15,000 words in one day, done by two post-editors. Quality estimate 80-90% of human translation.
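The rush jobs above can be compared by throughput. This sketch computes words per elapsed hour or day and, where the slide gives the number of post-editors, words per post-editor. The class and field names are hypothetical. The trial's headcount is not stated, so no per-editor figure is computed for it, and day-based jobs are not converted to hours because working hours per day are not given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RushJob:
    label: str
    words: int
    duration: float         # elapsed time, measured in `unit`
    unit: str               # "hour" or "day"
    editors: Optional[int]  # None where the slide does not say

    def elapsed_rate(self) -> float:
        """Words delivered per elapsed hour or day, regardless of headcount."""
        return self.words / self.duration

    def rate_per_editor(self) -> Optional[float]:
        """Words per post-editor per hour or day, if the headcount is known."""
        if self.editors is None:
            return None
        return self.words / (self.duration * self.editors)


# Figures taken from the slide; 75 minutes is expressed as 1.25 hours.
JOBS = [
    RushJob("law-firm trial", 1_500, 1.25, "hour", None),
    RushJob("two-day job", 25_000, 2, "day", 2),
    RushJob("three-hour job", 4_500, 3, "hour", 1),
    RushJob("one-day job", 15_000, 1, "day", 2),
]

if __name__ == "__main__":
    for job in JOBS:
        per_editor = job.rate_per_editor()
        per_editor_text = "n/a" if per_editor is None else f"{per_editor:,.0f}"
        print(f"{job.label}: {job.elapsed_rate():,.0f} words/{job.unit} elapsed, "
              f"{per_editor_text} words/{job.unit} per post-editor")
```

Run as a script, it reports 1,200 words per elapsed hour for the 75-minute trial and 1,500 words per hour for the one-person, three-hour job, in the same range as the >1,000 words per hour human rate from the production test.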
  • 13. AVB results in practice
      • Test and live projects show great potential in two areas:
          • Producing usable translations very quickly and at 50-60% of normal translation cost. Margins are similar to normal translation, but likely to improve!
          • Higher productivity, i.e. lower production cost and increased margins.
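The claim that margins stay similar at 50-60% of the normal price has a direct consequence for production cost. Reading "normal translation cost" as the price charged to the customer (an interpretation, not stated on the slide), and writing p for price, c for production cost, m for margin and r for the price ratio of MT to human translation:

```latex
\begin{aligned}
m &= \frac{p - c}{p} \quad\Rightarrow\quad c = (1 - m)\,p\\
p_{\mathrm{MT}} = r\,p_{\mathrm{HT}},\; m_{\mathrm{MT}} = m_{\mathrm{HT}} = m
  &\quad\Rightarrow\quad c_{\mathrm{MT}} = (1 - m)\,r\,p_{\mathrm{HT}} = r\,c_{\mathrm{HT}}, \qquad r \in [0.5,\ 0.6]
\end{aligned}
```

So production cost must also fall to roughly 50-60% of the human-translation cost; at a fixed r, any further drop in c_MT from higher post-editing productivity raises the MT margin above m, which is the second point above.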
  • 14. CrossLang Gateway benefits
      • The standard Moses engine offers no high-level functions
          • Only plain text files, always sentence by sentence, experimental recasing, experimental tag handling
      • CrossLang Gateway offers a Java service layer (not wrapper scripts)
          • Most common file formats: Word, XML, XLIFF
          • Adjustable text segmentation
          • Hardened, alignment-based tag handling
          • Advanced recasing tool based on alignment data
          • Named entity recognition & (re)tokenization
          • Terminology checking and replacement
      • Gateway features were crucial to processing our material properly
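The slides do not describe how the Gateway's alignment-based recasing works. The following minimal, hypothetical heuristic illustrates the idea: given the cased Dutch source, the lowercased English MT output and a word alignment (pairs of source and target token indices), copy the source casing onto target tokens that are spelled identically, such as names and company forms, then capitalize the first word. The function name and alignment format are assumptions.

```python
from typing import List, Sequence, Tuple

Alignment = Sequence[Tuple[int, int]]  # (source index, target index) pairs


def recase(source: List[str], target_lower: List[str], alignment: Alignment) -> List[str]:
    """Restore casing of lowercased MT output using the cased source sentence.

    Hypothetical heuristic: if a target token is aligned to a source token
    with the same lowercased spelling (names, acronyms, numbers), copy the
    source token's casing; finally capitalize the first target token.
    """
    result = list(target_lower)
    for s, t in alignment:
        src = source[s]
        if src.lower() == target_lower[t]:
            result[t] = src  # identical word: take over the source casing
    if result:
        result[0] = result[0][:1].upper() + result[0][1:]
    return result


if __name__ == "__main__":
    src = "De rechtbank Amsterdam wijst de vordering van Jansen BV af".split()
    tgt = "the amsterdam district court dismisses the claim of jansen bv".split()
    align = [(0, 0), (2, 1), (1, 2), (1, 3), (3, 4), (9, 4),
             (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]
    print(" ".join(recase(src, tgt, align)))
```

For this made-up sentence it prints "The Amsterdam district court dismisses the claim of Jansen BV". A real recaser would also need a model for words whose casing differs between the languages (e.g. translated institution names), which simple copying cannot handle.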
  • 15. Conclusions
      • Developing a good engine is not an "out of the box" task
      • Sufficient high-quality data is necessary for good results
      • Results are very promising; our objectives can be achieved
      • Working with a value-added partner is recommended
      • The need to integrate the MT solution into the translation workflow has become apparent
  • 16.
      Phone: +31 20 645.66.10
      Mobile: +31 625.025.475
      E-mail: joel.sigling@avb.nl
      Twitter: @JoelAVB
      Address: Ouderkerkerlaan 50, 1185 AD Amstelveen, The Netherlands
      Website: www.avb.nl