Introduction
                                Methodology
                                  Discussion




Integrating Machine Translation with Translation
         Memory: A Practical Approach

            Panagiotis Kanavos and Dimitrios Kartsaklis


                                 November 4, 2010




  Panagiotis Kanavos and Dimitrios Kartsaklis   Integrating MT with TM: A Practical Approach   1/ 18


Introduction


      Despite ongoing research and progress in the field,
      Machine Translation has not been widely accepted by the
      professional translation industry
      Common criticisms:
              MT is only suitable for draft translations of e-mails and web
              pages
              MT is not efficient for morphologically rich languages
              MT is useful only to large companies owning a wealth of
              resources
      In a nutshell: MT is something for researchers to play around
      with





A Case Study


      How MT can be incorporated into professional translation
      workflows, with limited resources, in ways that significantly
      increase productivity.
      We combine both statistical and rule-based MT systems with
      Translation Memory software using two approaches:
             The on-demand, sentence-by-sentence application of MT
             The one-time application of MT to the whole translation
             project
      The case study is conducted under production conditions, with
      final deliverables that require the highest translation quality.



Introduction    Configuration
                                     Methodology     Segment-by-segment workflows
                                       Discussion    One-time MT application workflow


Our setting



      Language pair: English to Greek
      Text to be translated: two Informatics books, one
      technical guide and one academic textbook
      TM size: 140,000 TUs coming from in-domain texts
      Terminology DB size: 30,000 entries
      Fuzzy threshold: 70%






Software programs and combinations


      MT systems:
             Statistical: Moses
             Rule-based: Systran
      CAT programs:
             Swordfish II (Java application) over Linux
             Déjà Vu X over MS Windows
             Wordfast, an MS Word macro template
      Three combinations, based on practical factors:
             Sentence-by-sentence workflow with Swordfish/Moses
             Sentence-by-sentence workflow with Wordfast/Systran
             One-time MT application workflow with Déjà Vu X/Moses





Swordfish/Moses combination
      Swordfish: Allows connection to external programs or scripts
      Connection with Moses achieved with a custom Python script
      Basic workflow:
        if TM match > 80% then
           accept fuzzy match for post-edit
        else if 70% < TM match <= 80% then
           evaluate the fuzzy match
           if quality not acceptable then
              apply MT
           end if
        else
           apply MT
           if quality not acceptable then
              type the translation from scratch
           end if
        end if
        post-edit
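
The decision steps above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' script: the function name `choose_draft`, the `mt` hook and the `acceptable` callback (which stands in for the translator's own quality judgement) are all hypothetical.

```python
from typing import Callable, Optional, Tuple


def choose_draft(
    tm_score: Optional[float],          # best TM match in percent, None if no match
    tm_target: Optional[str],           # target text of that TM match
    source: str,                        # source segment
    mt: Callable[[str], str],           # hypothetical hook that calls the MT engine
    acceptable: Callable[[str], bool],  # stands in for the translator's judgement
) -> Tuple[str, str]:
    """Return (origin, draft); the draft is post-edited afterwards in every case."""
    if tm_score is not None and tm_score > 80:
        # High fuzzy match: accept it for post-editing.
        return "tm", tm_target
    if tm_score is not None and 70 < tm_score <= 80:
        # Borderline fuzzy match: keep it only if it looks usable, else apply MT.
        if acceptable(tm_target):
            return "tm", tm_target
        return "mt", mt(source)
    # No usable TM match: apply MT, fall back to typing from scratch.
    draft = mt(source)
    if acceptable(draft):
        return "mt", draft
    return "scratch", ""
```

An empty draft with origin `"scratch"` signals that the translator types the segment from scratch.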


Swordfish/Moses combination: Results




      [Figure: results chart. Book 1: instructive guide; Book 2: textbook]



Wordfast/Systran combination

      Wordfast: A macro template working on top of MS Word
      Great deal of customization through MS Word macros
      Rule-based version of Systran, supporting user dictionaries
      Basic workflow:
        if TM match < 70% then
           apply pre-editing macros
           send segment to MT engine
           apply post-editing macros
           while MT result not good do
              amend Systran user dictionary and re-send segment to MT
           end while
        else
           accept the translation for post-edit
        end if
        post-edit
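
The loop above can be sketched in Python as follows. All names are hypothetical stand-ins: `pre_edit` and `post_edit` represent the MS Word macros, `translate` the call to the rule-based engine, `amend_dictionary` the manual user-dictionary update, and `is_good` the translator's judgement. Two assumptions go beyond the slide: the post-editing macros are re-applied after each re-send, and a `max_rounds` cap is added so the sketch always terminates.

```python
from typing import Callable, Optional


def draft_segment(
    source: str,
    tm_score: Optional[float],                     # best TM match in percent, or None
    tm_target: Optional[str],
    pre_edit: Callable[[str], str],                # stands in for the pre-editing macros
    translate: Callable[[str], str],               # stands in for the MT engine call
    post_edit: Callable[[str], str],               # stands in for the post-editing macros
    is_good: Callable[[str], bool],                # translator's judgement
    amend_dictionary: Callable[[str, str], None],  # user-dictionary update
    max_rounds: int = 3,                           # assumption: cap not in the slide
) -> str:
    """Return the draft that goes on to final post-editing."""
    if tm_score is not None and tm_score >= 70:
        # A TM match at or above the fuzzy threshold is accepted as the draft.
        return tm_target
    prepared = pre_edit(source)
    draft = post_edit(translate(prepared))
    rounds = 0
    while not is_good(draft) and rounds < max_rounds:
        # Improve the user dictionary, then re-send the same segment.
        amend_dictionary(prepared, draft)
        draft = post_edit(translate(prepared))
        rounds += 1
    return draft
```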



Wordfast/Systran combination: Results




      [Figure: results chart. Book 1: instructive guide; Book 2: textbook]



Déjà Vu X/Moses combination
      Déjà Vu X: similar concept to Swordfish
      However: it cannot be integrated with an MT system, so the
      only option is to pre-translate the whole project with Moses
      Only segments with no TM match, or with a TM match below
      80%, are sent for MT
      Pre-translation stage:
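
A minimal sketch of the selection and batch-translation step, assuming segments arrive as (source, best TM score) pairs. The helper names are hypothetical and this is not the authors' tooling. The decoder is modelled as any command that reads one sentence per line on standard input and writes one translation per line on standard output, which is how the Moses decoder behaves.

```python
import subprocess
from typing import Iterable, List, Optional, Sequence, Tuple

MT_THRESHOLD = 80.0  # segments with a TM match below this are sent to MT


def select_for_mt(segments: Iterable[Tuple[str, Optional[float]]]) -> List[str]:
    """Keep source segments with no TM match or a match below the threshold."""
    return [src for src, score in segments
            if score is None or score < MT_THRESHOLD]


def translate_batch(lines: Sequence[str], command: Sequence[str]) -> List[str]:
    """Pipe one sentence per line through `command` and return its output lines."""
    if not lines:
        return []
    result = subprocess.run(
        list(command),
        input="\n".join(lines) + "\n",
        capture_output=True,
        encoding="utf-8",   # Greek output needs an explicit Unicode encoding
        check=True,         # raise if the decoder exits with an error
    )
    output = result.stdout.splitlines()
    if len(output) != len(lines):
        raise ValueError("decoder returned a different number of lines")
    return output
```

With Moses, `command` would look something like `["moses", "-f", "moses.ini"]`, where both paths are placeholders. Tokenisation, casing and detokenisation steps are left out of this sketch.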






Déjà Vu X/Moses combination
      Basic workflow:
        if TM match > 80% then
           accept the translation for post-edit
        else
           evaluate MT translation
           if quality not acceptable then
              if any TM match exists (between 70-80%) then
                 accept the translation for post-edit
              else
                 apply “auto-assemble” feature
                 if quality not acceptable then
                     type the translation from scratch
                 end if
              end if
           end if
        end if
        post-edit
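
The same workflow as a Python sketch, assuming the MT draft was already produced during pre-translation. The names are hypothetical: `auto_assemble` is a stand-in callable for the CAT tool's "auto-assemble" feature and `acceptable` for the translator's judgement.

```python
from typing import Callable, Optional, Tuple


def choose_pretranslated_draft(
    tm_score: Optional[float],          # best TM match in percent, or None
    tm_target: Optional[str],
    mt_draft: Optional[str],            # MT output from the pre-translation stage
    auto_assemble: Callable[[], str],   # stand-in for the "auto-assemble" feature
    acceptable: Callable[[str], bool],  # translator's judgement
) -> Tuple[str, str]:
    """Return (origin, draft); the draft is post-edited afterwards in every case."""
    if tm_score is not None and tm_score > 80:
        return "tm", tm_target
    if mt_draft is not None and acceptable(mt_draft):
        return "mt", mt_draft
    if tm_score is not None and 70 <= tm_score <= 80:
        # MT was rejected but a lower fuzzy match exists: fall back to it.
        return "tm", tm_target
    assembled = auto_assemble()
    if acceptable(assembled):
        return "assemble", assembled
    # Nothing usable: the translator types the segment from scratch.
    return "scratch", ""
```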


Déjà Vu X/Moses combination: Results




      [Figure: results chart. Book 1: instructive guide; Book 2: textbook]



Productivity increase
       MT & TM combination: productivity increased to a level not
       achievable by applying either technology in isolation






Important factors

      Quantity and quality of TM entries
      The domain of the translation material used to train the
      statistical MT system
              These two factors impose serious limitations on those who
              work with small texts in many different domains. Rule-based
              systems are more suitable in such cases
      Language pair: Coding efficient user dictionaries for
      morphologically rich languages is difficult and requires some
      trial and error. Phrase-based systems like Moses perform
      better
      Style of text: Productivity is higher with repetitive text and
      step-by-step instructions
      User expertise with all technologies involved



A proposal for a unified application

       For general acceptance by the professional translation
       community, MT should be integrated with TM into an
       intuitive unified system
       Basically a TM environment, with the MT engine as an extra
       component working on top of it
       MT suggestions should be presented in a controlled and
       selective way
       Basic components:
              A 2-column translation grid for source and target segments
              Terminology management
              MT engine
              Alignment tool
              Quality assurance control



Advanced issues


      Automation of the training process with TM databases
      Statistical systems require considerable computing resources.
      A solution: MT as Software as a Service (SaaS)
      Terminology databases can be used for more than reference
      purposes
             Additional entry fields for coding MT dictionary entries
             (Systran)
             Linguistic information can be used for creating factored models
             (Moses)
      Automatic suggestions-as-you-type (TransType, Caitra)
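
As an illustration of the first point, a TM exported in the TMX exchange format could be turned into sentence pairs for training a statistical system. This is a sketch under assumptions, not the authors' method: the function name is hypothetical, language codes are matched by prefix, and segments are assumed to contain no inline formatting tags.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# TMX 1.4 marks the language with xml:lang; older files may use a plain "lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_pairs(tmx_text: str, src_lang: str, tgt_lang: str) -> List[Tuple[str, str]]:
    """Extract (source, target) segment pairs from a TMX document."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Collapse whitespace so each segment fits on one line.
                texts[lang] = " ".join("".join(seg.itertext()).split())
        src = next((t for l, t in texts.items() if l.startswith(src_lang)), "")
        tgt = next((t for l, t in texts.items() if l.startswith(tgt_lang)), "")
        if src and tgt:
            # Skip units that lack either side.
            pairs.append((src, tgt))
    return pairs
```

Writing the two sides of each pair to two files, one segment per line and in the same order, gives the line-aligned parallel corpus that statistical training expects.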





Summary



     The combination of MT with TM results in a significant
     productivity increase that is not feasible in a TM-only
     environment
     Currently there is no straightforward way to do that
     The authors are working towards this goal, in the form of a
     Software Specification document that will describe the design
     and the components of such a system in full detail








                            Thank you!

                        Any questions?




Integrating Machine Translation with Translation Memory: A Practical Approach

  • 1. Introduction Methodology Discussion Integrating Machine Translation with Translation Memory: A Practical Approach Panagiotis Kanavos and Dimitrios Kartsaklis November 4, 2010 Panagiotis Kanavos and Dimitrios Kartsaklis Integrating MT with TM: A Practical Approach 1/ 18
  • 2. Introduction Methodology Discussion Introduction Despite the ongoing research and the progress on the field, Machine Translation has not been widely accepted by the professional translation industry Common criticisms: MT is only suitable for draft translations of e-mails and web pages MT is not efficient for morphologically rich languages MT is useful only to large companies owning a wealth of resources In a nutshell: MT is something for researchers to play around with Panagiotis Kanavos and Dimitrios Kartsaklis Integrating MT with TM: A Practical Approach 2/ 18
  • 3. A Case Study. How MT can be incorporated into professional translation workflows, with limited resources, in ways that significantly increase productivity. We combine both statistical and rule-based MT systems with Translation Memory software using two approaches: the on-demand, sentence-by-sentence application of MT; and the one-time application of MT to the whole translation project. The case study is conducted in production conditions, with final deliverables that require the highest translation quality.
  • 4. Our setting. Language pair: English to Greek. Text to be translated: two Informatics books, one technical guide and one academic textbook. TM size: 140,000 TUs (translation units) from in-domain texts. Terminology DB size: 30,000 entries. Fuzzy threshold: 70%.
  • 5. Software programs and combinations. MT systems: Moses (statistical) and Systran (rule-based). CAT programs: Swordfish II (Java application) on Linux; Déjà Vu X on MS Windows; Wordfast, an MS Word macro template. Three combinations, based on practical factors: sentence-by-sentence workflow with Swordfish/Moses; sentence-by-sentence workflow with Wordfast/Systran; one-time MT application workflow with Déjà Vu X/Moses.
  • 6. Swordfish/Moses combination. Swordfish allows connection to external programs or scripts; the connection with Moses is achieved with a custom Python script. Basic workflow:
        if TM match > 80% then
            accept fuzzy match for post-edit
        else if 70% < TM match <= 80% then
            evaluate the fuzzy match
            if quality not acceptable then
                apply MT
            end if
        else
            apply MT
            if quality not acceptable then
                type the translation from scratch
            end if
        end if
        post-edit
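The decision logic of the Swordfish/Moses workflow above can be sketched in Python. This is only an illustration of the branching on the slide, not the authors' script: the function name `swordfish_moses_step`, the callbacks `translate_mt` and `is_acceptable` (standing in for the Moses call and the translator's judgement), and the returned labels are all hypothetical.

```python
from typing import Callable, Optional, Tuple


def swordfish_moses_step(
    tm_score: float,
    tm_match: Optional[str],
    translate_mt: Callable[[], str],
    is_acceptable: Callable[[str], bool],
) -> Tuple[str, Optional[str]]:
    """Return (origin, draft) for one segment; the draft is then post-edited.

    tm_score is the best fuzzy-match percentage (0 when there is no match).
    A draft of None means the translator types the segment from scratch.
    """
    if tm_match is not None and tm_score > 80:
        # High fuzzy match: accept it directly for post-editing.
        return "tm", tm_match
    if tm_match is not None and 70 < tm_score <= 80:
        # Borderline fuzzy match: keep it only if the translator accepts it.
        if is_acceptable(tm_match):
            return "tm", tm_match
        # Otherwise apply MT; as on the slide, this branch has no further check.
        return "mt", translate_mt()
    # No usable TM match: apply MT and fall back to manual translation.
    mt_draft = translate_mt()
    if is_acceptable(mt_draft):
        return "mt", mt_draft
    return "manual", None
```

Passing the MT call as a callback means Moses is only invoked for segments that actually reach an MT branch, which matches the on-demand nature of this workflow.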
  • 7. Swordfish/Moses combination: Results. Book 1: instructive guide; Book 2: textbook. [Results not reproduced in the transcript.]
  • 8. Wordfast/Systran combination. Wordfast is a macro template working on top of MS Word and allows a great deal of customization through MS Word macros. The rule-based version of Systran is used, which supports user dictionaries. Basic workflow:
        if TM match < 70% then
            apply pre-editing macros
            send segment to MT engine
            apply post-editing macros
            while MT result not good do
                amend Systran user dictionary and re-send segment to MT
            end while
        else
            accept the translation for post-edit
        end if
        post-edit
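The Wordfast/Systran loop above can be sketched as follows. All names here are hypothetical placeholders (`pre_edit`, `machine_translate`, `post_edit_macro`, `is_good`, `amend_dictionary`); none of them is a Wordfast or Systran API. Two assumptions go beyond the slide: the pre- and post-editing macros are re-applied on every re-send, and a `max_rounds` bound is added so the loop always terminates.

```python
def wordfast_systran_step(
    tm_score, tm_match, source,
    pre_edit, machine_translate, post_edit_macro,
    is_good, amend_dictionary,
    max_rounds=5,
):
    """Return the draft translation that the translator will post-edit."""
    if tm_score >= 70:
        # A TM match at or above the fuzzy threshold is accepted as the draft.
        return tm_match

    def run_mt():
        # Pre-editing macros, then the MT engine, then post-editing macros.
        return post_edit_macro(machine_translate(pre_edit(source)))

    draft = run_mt()
    rounds = 0
    # Amend the user dictionary and re-send until the result is good
    # (max_rounds is an assumption; the slide's loop has no explicit bound).
    while not is_good(draft) and rounds < max_rounds:
        amend_dictionary(source, draft)
        draft = run_mt()
        rounds += 1
    return draft
```

With a toy engine that simply substitutes dictionary entries, one amendment is enough to turn an untranslated term into its dictionary translation on the second pass.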
  • 9. Wordfast/Systran combination: Results. Book 1: instructive guide; Book 2: textbook. [Results not reproduced in the transcript.]
  • 10. Déjà Vu X/Moses combination. Déjà Vu X is similar in concept to Swordfish. However, it offers no way of integrating with an MT system, so the only option is pre-translation of the whole project with Moses. Only segments with no TM match, or with a TM match below 80%, are sent for MT. Pre-translation stage: [figure not reproduced in the transcript].
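The selection rule for the pre-translation stage (send to MT only segments with no TM match or a match below 80%) can be sketched as a simple filter. The function name `select_for_mt` and the `(segment_id, score)` input shape are assumptions made for illustration; how segments are actually exported from Déjà Vu X and passed to Moses is not described on the slide.

```python
def select_for_mt(segments, threshold=80.0):
    """Return the ids of segments that should be pre-translated with MT.

    segments is an iterable of (segment_id, best_tm_score) pairs, where
    best_tm_score is None when the TM has no match for the segment.
    """
    return [
        seg_id
        for seg_id, score in segments
        if score is None or score < threshold
    ]
```

For example, `select_for_mt([(1, 95.0), (2, None), (3, 79.9)])` keeps segments 2 and 3 and leaves segment 1 to the translation memory.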
  • 11. Déjà Vu X/Moses combination. Basic workflow:
        if TM match > 80% then
            accept the translation for post-edit
        else
            evaluate MT translation
            if quality not acceptable then
                if any TM match exists (between 70-80%) then
                    accept the translation for post-edit
                else
                    apply "auto-assemble" feature
                    if quality not acceptable then
                        type the translation from scratch
                    end if
                end if
            end if
        end if
        post-edit
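The fallback chain of the Déjà Vu X/Moses workflow above can be sketched as below. The names are hypothetical: `mt_draft` stands for the Moses output produced during pre-translation, `auto_assemble` for a callback wrapping the CAT tool's auto-assemble feature, and `is_acceptable` for the translator's judgement. The sketch assumes that `tm_match` is only set when the TM returned a match at or above the 70% fuzzy threshold.

```python
def dejavu_moses_step(tm_score, tm_match, mt_draft, auto_assemble, is_acceptable):
    """Return (origin, draft) for one segment; the draft is then post-edited.

    A draft of None means the translator types the segment from scratch.
    """
    if tm_match is not None and tm_score > 80:
        # High fuzzy match: use the TM suggestion directly.
        return "tm", tm_match
    if is_acceptable(mt_draft):
        # The pre-translated MT output is good enough to post-edit.
        return "mt", mt_draft
    if tm_match is not None:
        # MT rejected, but a 70-80% fuzzy match exists: fall back to it.
        return "tm", tm_match
    # No TM match at all: try auto-assemble, then manual translation.
    assembled = auto_assemble()
    if is_acceptable(assembled):
        return "auto-assemble", assembled
    return "manual", None
```

Unlike the sentence-by-sentence workflows, the MT output is passed in as a value rather than requested on demand, because here it is produced once for the whole project before translation starts.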
  • 12. Déjà Vu X/Moses combination: Results. Book 1: instructive guide; Book 2: textbook. [Results not reproduced in the transcript.]
  • 13. Productivity increase. MT & TM combination: productivity increased to a level not possible by applying either technology in isolation. [Figure not reproduced in the transcript.]
  • 14. Important factors. Quantity and quality of TM entries. The domain of the translation material used to train the statistical MT system. The above impose serious limitations on those who work with small texts in many different domains; rule-based systems are more suitable in such cases. Language pair: coding efficient user dictionaries for morphologically rich languages is difficult and requires some trial and error; phrase-based systems like Moses perform better. Style of text: productivity is higher with repetitive text and step-by-step instructions. User expertise with all the technologies involved.
  • 15. A proposal for a unified application. For general acceptance by the professional translation community, MT should be integrated with TM into an intuitive unified system: basically a TM environment, with the MT engine as an extra component working on top of it. MT suggestions should be presented in a controlled and selective way. Basic components: a 2-column translation grid for source and target segments; terminology management; MT engine; alignment tool; quality assurance control.
  • 16. Advanced issues. Automation of the training process with TM databases. Statistical systems require considerable computing resources; a solution: MT as Software as a Service (SaaS). Terminology databases can be used for more than reference purposes: additional entry fields for coding MT dictionary entries (Systran); linguistic information that can be used for creating factored models (Moses). Automatic suggestions-as-you-type (TransType, Caitra).
  • 17. Summary. The combination of MT with TM results in a significant productivity increase that is not feasible in a TM-only environment. Currently there is no straightforward way of doing that. Work is in progress by the authors towards this purpose, in the form of a Software Specification document that will describe the design and the components of such a system in full detail.
  • 18. Thank you! Any questions?