SlideShare ist ein Scribd-Unternehmen logo
1 von 16
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE


A Moses MT engine for legal
translation

By Joël Sigling
Joël Sigling
                                      Director



a Moses MT engine for
   legal translation
  Modern technology in a traditional sector
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE
Monte Carlo, 25 March 2012
AVB Translations background

•   Amstelveens Vertaalburo: founded 1972 – traditional, high-quality agency

•   Translation World: founded 2002, tech-savvy all-round player

•   Merger in 2010 >> AVB Translations: premium brand with strong tech focus

•   Top 5 player in The Netherlands, 2011 turnover € 4.6 million

•   Core business: general translations – legal, financial, technical, …
    NO software localization (yet!)
History of MT interest

•   Member of TAUS since 2008, 1st round table Amsterdam

•   Visited TAUS User Conferences in US since 2009

•   Sense of urgency developed, merger distraction 2010

•   Action in 2011 after merger

•   2011: choice for Dutch <> English legal (not IT-related!) domain engine

•   Why SMT, why Moses? Quicker, cheaper, similar quality (shows research)
Why legal domain MT engine?

•   Legal translations about approx. 40% of AVB business, 80% Dutch <>English

•   Not the obvious choice: people said MT wouldn’t work for legal: sentences
    too long, material too intricate

•   Statistical MT suited to non-stylistic materials: eg legal

•   If this works, we can make MT happen for all other domains
MT engine objectives

•   Increased productivity, no BLEU % target, but tangible, practical results.
    How much extra can a translator do when compared to HT?

•   Tool to offer usable quality with very quick turnarounds for high volume
    (typical “Friday afternoon lawyer requests”)

•   Becoming an MT front runner in the non-localization sector for Dutch
    (5th language in Europe after FIGS)
Developing the Moses engine

•   Choice between in-house and external development
     • In-house: control, developing expertise, lower long-term cost
     • External: lower initial cost, much more expertise > best for now

•   Our pre-requisites for development option
     • ownership and free access to engine
     • assurance data will not be used or copied by builder
     • Acceptable costs for development & usage
     • skilled partner > AsiaOnline, CrossLang, Pangeanic, LetsMT,
        SmartMate??

•   CrossLang > all of the above, closest to our office, independent
What we needed
•   Large quantities of high-quality translation data

     •   Aligning existing high-quality legal translations (took longest to prepare)
     •   Existing legal TMs
     •   Going forward: company-/industry-specific terminology

•   Ways to measure gains

     •   Not just automated evaluation % increase, but also tangible
         improvements > we are entrepreneurs, not scientists
     •   CrossLang automated assessment tool (TER, BLEU, NIST, METEOR)
     •   Manual assessment: eg. how many hours for post-editing 10,000 words?
Input data

•   Highest quality AVB Dutch <>English legal translations: approx.
    700k words per language. Predominantly civil law.

•   Not fully reviewed AVB TM, still high-quality: approx. 10 mi.
    words per language. Predominantly civil law.

•   Legal translations harvested by CrossLang, more diverse legal
    material: 7 mi. words per language
CrossLang automated test results

•   Best results from AVB + harvested data, AVB data weighted extra

•   Results particularly good in civil law domain (bulk of AVB input
    data)

•   Results improved dramatically for other legal domains by adding
    harvested data
AVB results in practice

•   Test done in CrossLang production assessment tool: productivity 5%
    higher for post-editing than human output (human output in this
    case very high >1000 w p/h, PE even higer)
AVB results in practice

•   Live rush translations done in past two weeks:

     •   1,500 word trial done for law firm needing high volume in
         very short time. Post-edited in 75 minutes. Customer happy
         with quality/price ratio.
     •   25,000 words in two days with moderate PE effort by two
         post-editors. Quality estimate 80-90% of human translation.
     •   4,500 words in 3 hours with almost full PE effort by one
         post-editor. Quality estimate >90% of human translation
     •   15,000 words in one day, done by two post-editors. Quality
         estimate 80-90% of human translation
AVB results in practice

•   Test and live project show great potential in two areas:

     •   Producing usable translations very quickly and at 50-60% of
         normal translation cost. Margins are similar to normal
         translation, but likely to improve!

     •   Higher productivity, ie lower production cost and
         increased margins.
CrossLang Gateway benefits
•   Standard Moses engine offers no high-level functions
     • Only plain text files, always sentence by sentence, experimental
        recasing, experimental tag handling

•   CrossLang Gateway offers Java service layer (not wrapper scripts)
     • Most common file formats: Word, XML, XLIFF,
     • Adjustable text segmentation
     • Hardened, aligment-based tag handling
     • Advanced recasing tool based on alignment data
     • Named entity recognition & (re)tokenization
     • Terminology checking and replacement

Gateway features crucial to processing our material properly
Conclusions

•   Developing a good engine is not an “out of the box” task

•   Sufficient high-quality data is necessary for good results

•   Results are very promising, our objectives can be achieved

•   Working with a value added partner is recommended

•   Need to integrate MT solution in translation workflow
    apparent
Phone:     +31 20 645.66.10
Mobile:    +31 625.025.475
E-mail:    joel.sigling@avb.nl
Twitter:   @JoelAVB
Adres:     Ouderkerkerlaan 50
           1185 AD Amstelveen
           The Netherlands
Website:   www.avb.nl

Weitere ähnliche Inhalte

Ähnlich wie Moses MT Engine Boosts Legal Translation Productivity

TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Seattle, Two Practical Use Cas...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Seattle, Two Practical Use Cas...TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Seattle, Two Practical Use Cas...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Seattle, Two Practical Use Cas...TAUS - The Language Data Network
 
Lexcelera MT Breaking Compromises
Lexcelera MT Breaking CompromisesLexcelera MT Breaking Compromises
Lexcelera MT Breaking CompromisesLoriThicke
 
Is Your Enterprise “fire-fighting” translation issues? Optimize the process w...
Is Your Enterprise “fire-fighting” translation issues? Optimize the process w...Is Your Enterprise “fire-fighting” translation issues? Optimize the process w...
Is Your Enterprise “fire-fighting” translation issues? Optimize the process w...dclsocialmedia
 
Using Checker Software for Clear, Concise and Consistent Content | Berry Braster
Using Checker Software for Clear, Concise and Consistent Content | Berry BrasterUsing Checker Software for Clear, Concise and Consistent Content | Berry Braster
Using Checker Software for Clear, Concise and Consistent Content | Berry BrasterLavaConConference
 
LavaCon 2015: Efficient Translation Management - 5 Specific Metrics That Wil...
LavaCon 2015:  Efficient Translation Management - 5 Specific Metrics That Wil...LavaCon 2015:  Efficient Translation Management - 5 Specific Metrics That Wil...
LavaCon 2015: Efficient Translation Management - 5 Specific Metrics That Wil...Scott Carothers
 
Webinar automotive and engineering content 16.06.16
Webinar   automotive and engineering content 16.06.16Webinar   automotive and engineering content 16.06.16
Webinar automotive and engineering content 16.06.16kantanmt
 
Language Quality Management: Models, Measures, Methodologies
Language Quality Management: Models, Measures, Methodologies Language Quality Management: Models, Measures, Methodologies
Language Quality Management: Models, Measures, Methodologies Sajan
 
Localization and DITA: What you Need to Know - LocWorld32
Localization and DITA: What you Need to Know - LocWorld32Localization and DITA: What you Need to Know - LocWorld32
Localization and DITA: What you Need to Know - LocWorld32IXIASOFT
 
Managing Translation Memories for Engineering and Automotive Translation
Managing Translation Memories for Engineering and Automotive TranslationManaging Translation Memories for Engineering and Automotive Translation
Managing Translation Memories for Engineering and Automotive TranslationPoulomi Choudhury
 
Good Applications of Bad Machine Translation
Good Applications of Bad Machine TranslationGood Applications of Bad Machine Translation
Good Applications of Bad Machine Translationbdonaldson
 
Connected and continuous localization systems for content management systems
Connected and continuous localization systems for content management systemsConnected and continuous localization systems for content management systems
Connected and continuous localization systems for content management systemsTolga Secilmis
 
Localizing Prestashop E-Commerce Site with Wordfast
Localizing Prestashop E-Commerce Site with WordfastLocalizing Prestashop E-Commerce Site with Wordfast
Localizing Prestashop E-Commerce Site with WordfastOlga Melnikova
 
Translation management for life sciences
Translation management for life sciencesTranslation management for life sciences
Translation management for life sciencesWordbee S.A
 
An MT Journey Intuit and Welocalize Localization World 2013
An MT Journey Intuit and Welocalize Localization World 2013An MT Journey Intuit and Welocalize Localization World 2013
An MT Journey Intuit and Welocalize Localization World 2013Welocalize
 
Karnov Super power your search with Text Analytics - Findability Day 2014
Karnov Super power your search with Text Analytics - Findability Day 2014Karnov Super power your search with Text Analytics - Findability Day 2014
Karnov Super power your search with Text Analytics - Findability Day 2014Findwise
 
Introducing language technology in the editing process: How to do things righ...
Introducing language technology in the editing process: How to do things righ...Introducing language technology in the editing process: How to do things righ...
Introducing language technology in the editing process: How to do things righ...Loctimize GmbH
 
Translation and Transcreation Workshop
Translation and Transcreation Workshop Translation and Transcreation Workshop
Translation and Transcreation Workshop Conversis
 
Opening the Black Box of Software Localization
Opening the Black Box of Software LocalizationOpening the Black Box of Software Localization
Opening the Black Box of Software LocalizationKenneth Farrall
 
MiTiN 2013 Keynote in Detroit Michigan
MiTiN 2013 Keynote in Detroit MichiganMiTiN 2013 Keynote in Detroit Michigan
MiTiN 2013 Keynote in Detroit MichiganKirti Vashee
 

Ähnlich wie Moses MT Engine Boosts Legal Translation Productivity (20)

TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Seattle, Two Practical Use Cas...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Seattle, Two Practical Use Cas...TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Seattle, Two Practical Use Cas...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Seattle, Two Practical Use Cas...
 
SDL Trados Studio 2017, Jocelyn He (SDL)
SDL Trados Studio 2017, Jocelyn He (SDL)SDL Trados Studio 2017, Jocelyn He (SDL)
SDL Trados Studio 2017, Jocelyn He (SDL)
 
Lexcelera MT Breaking Compromises
Lexcelera MT Breaking CompromisesLexcelera MT Breaking Compromises
Lexcelera MT Breaking Compromises
 
Is Your Enterprise “fire-fighting” translation issues? Optimize the process w...
Is Your Enterprise “fire-fighting” translation issues? Optimize the process w...Is Your Enterprise “fire-fighting” translation issues? Optimize the process w...
Is Your Enterprise “fire-fighting” translation issues? Optimize the process w...
 
Using Checker Software for Clear, Concise and Consistent Content | Berry Braster
Using Checker Software for Clear, Concise and Consistent Content | Berry BrasterUsing Checker Software for Clear, Concise and Consistent Content | Berry Braster
Using Checker Software for Clear, Concise and Consistent Content | Berry Braster
 
LavaCon 2015: Efficient Translation Management - 5 Specific Metrics That Wil...
LavaCon 2015:  Efficient Translation Management - 5 Specific Metrics That Wil...LavaCon 2015:  Efficient Translation Management - 5 Specific Metrics That Wil...
LavaCon 2015: Efficient Translation Management - 5 Specific Metrics That Wil...
 
Webinar automotive and engineering content 16.06.16
Webinar   automotive and engineering content 16.06.16Webinar   automotive and engineering content 16.06.16
Webinar automotive and engineering content 16.06.16
 
Language Quality Management: Models, Measures, Methodologies
Language Quality Management: Models, Measures, Methodologies Language Quality Management: Models, Measures, Methodologies
Language Quality Management: Models, Measures, Methodologies
 
Localization and DITA: What you Need to Know - LocWorld32
Localization and DITA: What you Need to Know - LocWorld32Localization and DITA: What you Need to Know - LocWorld32
Localization and DITA: What you Need to Know - LocWorld32
 
Managing Translation Memories for Engineering and Automotive Translation
Managing Translation Memories for Engineering and Automotive TranslationManaging Translation Memories for Engineering and Automotive Translation
Managing Translation Memories for Engineering and Automotive Translation
 
Good Applications of Bad Machine Translation
Good Applications of Bad Machine TranslationGood Applications of Bad Machine Translation
Good Applications of Bad Machine Translation
 
Connected and continuous localization systems for content management systems
Connected and continuous localization systems for content management systemsConnected and continuous localization systems for content management systems
Connected and continuous localization systems for content management systems
 
Localizing Prestashop E-Commerce Site with Wordfast
Localizing Prestashop E-Commerce Site with WordfastLocalizing Prestashop E-Commerce Site with Wordfast
Localizing Prestashop E-Commerce Site with Wordfast
 
Translation management for life sciences
Translation management for life sciencesTranslation management for life sciences
Translation management for life sciences
 
An MT Journey Intuit and Welocalize Localization World 2013
An MT Journey Intuit and Welocalize Localization World 2013An MT Journey Intuit and Welocalize Localization World 2013
An MT Journey Intuit and Welocalize Localization World 2013
 
Karnov Super power your search with Text Analytics - Findability Day 2014
Karnov Super power your search with Text Analytics - Findability Day 2014Karnov Super power your search with Text Analytics - Findability Day 2014
Karnov Super power your search with Text Analytics - Findability Day 2014
 
Introducing language technology in the editing process: How to do things righ...
Introducing language technology in the editing process: How to do things righ...Introducing language technology in the editing process: How to do things righ...
Introducing language technology in the editing process: How to do things righ...
 
Translation and Transcreation Workshop
Translation and Transcreation Workshop Translation and Transcreation Workshop
Translation and Transcreation Workshop
 
Opening the Black Box of Software Localization
Opening the Black Box of Software LocalizationOpening the Black Box of Software Localization
Opening the Black Box of Software Localization
 
MiTiN 2013 Keynote in Detroit Michigan
MiTiN 2013 Keynote in Detroit MichiganMiTiN 2013 Keynote in Detroit Michigan
MiTiN 2013 Keynote in Detroit Michigan
 

Mehr von TAUS - The Language Data Network

TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...TAUS - The Language Data Network
 
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...TAUS - The Language Data Network
 
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...TAUS - The Language Data Network
 
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...TAUS - The Language Data Network
 
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...TAUS - The Language Data Network
 
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...TAUS - The Language Data Network
 
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)TAUS - The Language Data Network
 
Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
 Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann... Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...TAUS - The Language Data Network
 
A translation memory P2P trading platform - to make global translation memory...
A translation memory P2P trading platform - to make global translation memory...A translation memory P2P trading platform - to make global translation memory...
A translation memory P2P trading platform - to make global translation memory...TAUS - The Language Data Network
 
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...TAUS - The Language Data Network
 
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...TAUS - The Language Data Network
 
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...TAUS - The Language Data Network
 
The Theory and Practice of Computer Aided Translation Training System, Liu Q...
 The Theory and Practice of Computer Aided Translation Training System, Liu Q... The Theory and Practice of Computer Aided Translation Training System, Liu Q...
The Theory and Practice of Computer Aided Translation Training System, Liu Q...TAUS - The Language Data Network
 
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)TAUS - The Language Data Network
 
A use-case for getting MT into your company, Kerstin Berns (berns language c...
 A use-case for getting MT into your company, Kerstin Berns (berns language c... A use-case for getting MT into your company, Kerstin Berns (berns language c...
A use-case for getting MT into your company, Kerstin Berns (berns language c...TAUS - The Language Data Network
 
How Existing Quality Models Get Challenged, by Katka Gasova (Moravia)
How Existing Quality Models Get Challenged, by Katka Gasova (Moravia)How Existing Quality Models Get Challenged, by Katka Gasova (Moravia)
How Existing Quality Models Get Challenged, by Katka Gasova (Moravia)TAUS - The Language Data Network
 

Mehr von TAUS - The Language Data Network (20)

TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
 
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
 
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
 
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
 
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
 
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
 
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
 
Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
 Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann... Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
 
A translation memory P2P trading platform - to make global translation memory...
A translation memory P2P trading platform - to make global translation memory...A translation memory P2P trading platform - to make global translation memory...
A translation memory P2P trading platform - to make global translation memory...
 
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
 
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
 
Farmer Lv (TrueTran)
Farmer Lv (TrueTran)Farmer Lv (TrueTran)
Farmer Lv (TrueTran)
 
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
 
The Theory and Practice of Computer Aided Translation Training System, Liu Q...
 The Theory and Practice of Computer Aided Translation Training System, Liu Q... The Theory and Practice of Computer Aided Translation Training System, Liu Q...
The Theory and Practice of Computer Aided Translation Training System, Liu Q...
 
Translation Technology Showcase in Shenzhen
Translation Technology Showcase in ShenzhenTranslation Technology Showcase in Shenzhen
Translation Technology Showcase in Shenzhen
 
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
 
How we train post-editors - Yongpeng Wei (Lingosail)
How we train post-editors - Yongpeng Wei (Lingosail)How we train post-editors - Yongpeng Wei (Lingosail)
How we train post-editors - Yongpeng Wei (Lingosail)
 
A use-case for getting MT into your company, Kerstin Berns (berns language c...
 A use-case for getting MT into your company, Kerstin Berns (berns language c... A use-case for getting MT into your company, Kerstin Berns (berns language c...
A use-case for getting MT into your company, Kerstin Berns (berns language c...
 
QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)
 
How Existing Quality Models Get Challenged, by Katka Gasova (Moravia)
How Existing Quality Models Get Challenged, by Katka Gasova (Moravia)How Existing Quality Models Get Challenged, by Katka Gasova (Moravia)
How Existing Quality Models Get Challenged, by Katka Gasova (Moravia)
 

Kürzlich hochgeladen

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 

Kürzlich hochgeladen (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

Moses MT Engine Boosts Legal Translation Productivity

  • 1. TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE A Moses MT engine for legal translation By Joël Sigling
  • 2. Joël Sigling Director a Moses MT engine for legal translation Modern technology in a traditional sector TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE Monte Carlo, 25 March 2012
  • 3. AVB Translations background • Amstelveens Vertaalburo: founded 1972 – traditional, high-quality agency • Translation World: founded 2002, tech-savvy all-round player • Merger in 2010 >> AVB Translations: premium brand with strong tech focus • Top 5 player in The Netherlands, 2011 turnover € 4.6 million • Core business: general translations – legal, financial, technical, … NO software localization (yet!)
  • 4. History of MT interest • Member of TAUS since 2008, 1st round table Amsterdam • Visited TAUS User Conferences in US since 2009 • Sense of urgency developed, merger distraction 2010 • Action in 2011 after merger • 2011: choice for Dutch <> English legal (not IT-related!) domain engine • Why SMT, why Moses? Quicker, cheaper, similar quality (shows research)
  • 5. Why legal domain MT engine? • Legal translations about approx. 40% of AVB business, 80% Dutch <>English • Not the obvious choice: people said MT wouldn’t work for legal: sentences too long, material too intricate • Statistical MT suited to non-stylistic materials: eg legal • If this works, we can make MT happen for all other domains
  • 6. MT engine objectives • Increased productivity, no BLEU % target, but tangible, practical results. How much extra can a translator do when compared to HT? • Tool to offer usable quality with very quick turnarounds for high volume (typical “Friday afternoon lawyer requests”) • Becoming an MT front runner in the non-localization sector for Dutch (5th language in Europe after FIGS)
  • 7. Developing the Moses engine • Choice between in-house and external development • In-house: control, developing expertise, lower long-term cost • External: lower initial cost, much more expertise > best for now • Our pre-requisites for development option • ownership and free access to engine • assurance data will not be used or copied by builder • Acceptable costs for development & usage • skilled partner > AsiaOnline, CrossLang, Pangeanic, LetsMT, SmartMate?? • CrossLang > all of the above, closest to our office, independent
  • 8. What we needed • Large quantities of high-quality translation data • Aligning existing high-quality legal translations (took longest to prepare) • Existing legal TMs • Going forward: company-/industry-specific terminology • Ways to measure gains • Not just automated evaluation % increase, but also tangible improvements > we are entrepreneurs, not scientists • CrossLang automated assessment tool (TER, BLEU, NIST, METEOR) • Manual assessment: eg. how many hours for post-editing 10,000 words?
  • 9. Input data • Highest quality AVB Dutch <>English legal translations: approx. 700k words per language. Predominantly civil law. • Not fully reviewed AVB TM, still high-quality: approx. 10 mi. words per language. Predominantly civil law. • Legal translations harvested by CrossLang, more diverse legal material: 7 mi. words per language
  • 10. CrossLang automated test results • Best results from AVB + harvested data, AVB data weighted extra • Results particularly good in civil law domain (bulk of AVB input data) • Results improved dramatically for other legal domains by adding harvested data
  • 11. AVB results in practice • Test done in CrossLang production assessment tool: productivity 5% higher for post-editing than human output (human output in this case very high >1000 w p/h, PE even higer)
  • 12. AVB results in practice • Live rush translations done in past two weeks: • 1,500 word trial done for law firm needing high volume in very short time. Post-edited in 75 minutes. Customer happy with quality/price ratio. • 25,000 words in two days with moderate PE effort by two post-editors. Quality estimate 80-90% of human translation. • 4,500 words in 3 hours with almost full PE effort by one post-editor. Quality estimate >90% of human translation • 15,000 words in one day, done by two post-editors. Quality estimate 80-90% of human translation
  • 13. AVB results in practice • Test and live project show great potential in two areas: • Producing usable translations very quickly and at 50-60% of normal translation cost. Margins are similar to normal translation, but likely to improve! • Higher productivity, ie lower production cost and increased margins.
  • 14. CrossLang Gateway benefits • Standard Moses engine offers no high-level functions • Only plain text files, always sentence by sentence, experimental recasing, experimental tag handling • CrossLang Gateway offers Java service layer (not wrapper scripts) • Most common file formats: Word, XML, XLIFF, • Adjustable text segmentation • Hardened, aligment-based tag handling • Advanced recasing tool based on alignment data • Named entity recognition & (re)tokenization • Terminology checking and replacement Gateway features crucial to processing our material properly
  • 15. Conclusions • Developing a good engine is not an “out of the box” task • Sufficient high-quality data is necessary for good results • Results are very promising, our objectives can be achieved • Working with a value added partner is recommended • Need to integrate MT solution in translation workflow apparent
  • 16. Phone: +31 20 645.66.10 Mobile: +31 625.025.475 E-mail: joel.sigling@avb.nl Twitter: @JoelAVB Adres: Ouderkerkerlaan 50 1185 AD Amstelveen The Netherlands Website: www.avb.nl