Outline
            Tools for Translators
            Free Language Data
            Machine Translation




Machine Translation and Translation Technology

                      Jimmy O’Regan

                     The Apertium Project


         OSS Bar Camp, 19 September 2009




                Jimmy O’Regan       Machine Translation and Translation Technology
1   Free Language Data




Tools for Translators




Some Terminology

     Internationalisation
     Giving software the capability to display text in another
     language
     In Open Source, this generally means adding support for
     gettext.
     Localisation
     Customising the messages displayed to the user to appear in
     the manner most appropriate for them.
     In their language, or their dialect.

     Translation
      Converting text from one language to another
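
  As a rough illustration of what "adding support for gettext" looks
  like in practice, here is a minimal sketch using Python's standard
  gettext module. The domain name "myapp", the "locale" directory and
  the choice of Irish ("ga") are assumptions made for the example, not
  anything from the talk.

```python
import gettext

# "myapp" and "locale" are hypothetical names for this sketch.
# With fallback=True, a missing catalogue gives a NullTranslations
# object, so the program keeps working with the original strings.
translations = gettext.translation(
    "myapp", localedir="locale", languages=["ga"], fallback=True
)
_ = translations.gettext

# Internationalisation: every user-visible string is wrapped in _().
# Localisation: translators later supply a catalogue for each language.
message = _("Open file")
print(message)
```

  Without a catalogue installed the string comes back unchanged; once
  a translator provides one, the same code displays the localised text.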

Localisation vs. Translation




  Localisation and translation are sometimes, but not always, the
  same.
   Documents may need to be localised, but not translated:
   A British company with an Irish office still needs to localise their
  documents: any reference to “our London office” will need to be
  changed to “our Dublin office”.




Localisation vs. Translation



  Localised translations can also have additional requirements:
   gettext allows numbers to be specially treated: “%d
  file(s)” ugliness is not necessary.
   English and Spanish need two forms of words for number: singular
  and plural
  Polish needs three: singular, plural, and quantity (5 or more)
  Slovenian needs four: singular, dual, plural, and quantity.
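
  The plural rules above can be sketched as small selector functions.
  These are written after the expressions commonly used in gettext
  "Plural-Forms" headers for these languages; the function names and
  the Polish "file" strings are illustrative choices for this sketch,
  not part of the talk.

```python
def plural_en(n):
    # Two forms: singular for exactly 1, plural for everything else.
    return 0 if n == 1 else 1


def plural_pl(n):
    # Three forms: 1; numbers ending in 2-4 (but not 12-14); the rest.
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2


def plural_sl(n):
    # Four forms, chosen from the last two digits:
    # ...01 singular, ...02 dual, ...03/...04 plural, the rest.
    if n % 100 == 1:
        return 0
    if n % 100 == 2:
        return 1
    if n % 100 in (3, 4):
        return 2
    return 3


# Illustrative Polish forms of "file", indexed by plural_pl().
FORMS_PL = ["%d plik", "%d pliki", "%d plików"]


def files_pl(n):
    return FORMS_PL[plural_pl(n)] % n


for count in (1, 3, 5, 12, 22):
    print(files_pl(count))
```

  In a real project the selector lives in the translation catalogue
  and the program only calls ngettext; the functions here just make
  the arithmetic visible.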




Localisation



  Software localisation is a huge business area for proprietary
  software.
  One that traditionally lags behind Open Source.
  That advantage is usually due to the efforts of a handful of
  dedicated volunteers for the majority of languages.
  But they’re catching up: Facebook is using Open Source-like
  efforts for their translations.




Localisation Tools


  Unsurprisingly then, localisation is very well supported by Open
  Source software:
      Pootle (http://translate.sourceforge.net/wiki/pootle/index -
      Web-based)
      Virtaal (http://translate.sourceforge.net/wiki/virtaal/index -
      cross platform)
      poEdit (http://www.poedit.net/ - cross platform)
      Lokalize (http://userbase.kde.org/Lokalize - KDE)
      GTranslator (http://gtranslator.sourceforge.net/ - GNOME)



Translation Tools




  Unfortunately, there’s only one real equivalent tool for general
  translation: OmegaT




Common Features



  All of these tools include these features:
       Translation Memory – automatically reuse previous translations
       Fuzzy matching – suggest translations similar to previously
       translated sentences
       Terminology Management – give suggestions from a per-project
       dictionary
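
  To make translation memory and fuzzy matching concrete, here is a
  minimal sketch built on Python's standard difflib module. It is not
  how any of the tools named above implement the feature; the function
  name, the 0.75 threshold and the sample English-Polish segments are
  all assumptions for illustration.

```python
from difflib import SequenceMatcher


def tm_lookup(memory, segment, threshold=0.75):
    """Return (source, translation, score) for the best stored match,
    or None if nothing scores at or above the threshold."""
    best = None
    for source, translation in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, translation, score)
    return best


# Hypothetical memory of previously translated segments.
memory = {
    "Open the file": "Otwórz plik",
    "Save the file": "Zapisz plik",
}

exact = tm_lookup(memory, "Open the file")    # identical segment: reuse as is
fuzzy = tm_lookup(memory, "Open the files")   # near match: offer as a suggestion
miss = tm_lookup(memory, "Print preview")     # nothing similar enough
print(exact, fuzzy, miss)
```

  An exact hit can be reused automatically, while a fuzzy hit is only
  a suggestion that the translator still has to adapt.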




Translate Toolkit



  http://translate.sourceforge.net/
  A set of common tools for translation/localisation:
      Translation Memory server
      Format conversion
      Terminology management
      Quality control tools
  All brought to you by the wonderful people of translate.org.za




Free Language Software Needs Free Data




  Just like “Free Software Needs Free Documentation”, so too does
  Free Language Software need Free Data.
  Usually, this means we have to make it ourselves.
  Unfortunately, the community of developers of free language
  software, and thus free language data, is quite small.




The importance of spell checkers




  Spell checking data packages are the absolute bare minimum of
  technological support for a language.
  Usually, the people who develop them tend to be involved in other
  areas of Free language software:




The importance of spell checkers


  Kevin Scannell
  Makes the Irish spell checking data, and An Gramadóir, an Irish
  language grammar checker. He also created a WordNet/thesaurus for
  Irish, and contributed the language data for Apertium’s
  Irish–Scots Gaelic translator, among other things.
  Marcin Miłkowski
  Heavily involved in the Polish spell checking data and in
  LanguageTool, a multilingual grammar checker that’s integrated
  with OpenOffice.org. He also maintains the open Polish thesaurus.




The importance of spell checkers



  translate.org.za
  Make the spelling checkers for several South African languages, as
  well as many of the tools for translators already mentioned: Virtaal,
  Translate Toolkit, Pootle. Much of Apertium’s English–Afrikaans
  translator was made directly by translate.org.za developers, as were
  Apertium’s D-Bus interface and a GUI. (Virtaal allows translators
  to use machine translations as a basis for their work.)




The importance of spell checkers


  Spell checkers are set to become even more important. Hunspell,
  which is fast becoming the standard spell checker in Open Source
  projects, now includes morphological analysis and generation. This
  will greatly improve, among other things, terminology management
  in translators’ tools.
  At the moment, if you have “dog” in your terminology list, the
  translation tool will see that and only that: “dogs” will go
  unrecognised. With morphological analysis, the tool can know that
  “dogs” is not only related to “dog”, but is the plural of a noun:
  another assistance to the translator.
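
  The “dog”/“dogs” example can be sketched as follows. This does not
  use Hunspell; the tiny hand-written table stands in for a real
  morphological analyser, and the glossary entry and function names
  are hypothetical.

```python
# Toy stand-in for a morphological analyser:
# surface form -> (lemma, part of speech, number).
# A real analyser derives this from affix rules rather than a list.
ANALYSES = {
    "dog": ("dog", "noun", "singular"),
    "dogs": ("dog", "noun", "plural"),
}

# Hypothetical English-Polish terminology list for one project.
GLOSSARY = {"dog": "pies"}


def lookup_exact(word):
    """Plain lookup: only the exact listed form is recognised."""
    return GLOSSARY.get(word.lower())


def lookup_with_morphology(word):
    """Reduce the word to its lemma first, then consult the glossary."""
    analysis = ANALYSES.get(word.lower())
    if analysis is None:
        return None
    lemma, pos, number = analysis
    target = GLOSSARY.get(lemma)
    if target is None:
        return None
    return {"term": lemma, "translation": target, "pos": pos, "number": number}


print(lookup_exact("dogs"))            # the plain lookup misses the plural
print(lookup_with_morphology("dogs"))  # the lemma-based lookup finds "dog"
```

  The second lookup also tells the translator that the match is a
  plural noun, which is the extra assistance described above.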



Machine Translation




  Machine translation has a bad reputation.




Mechanical Translation



  Mechanical translation, of any form, does tend, inevitably, to have
  mistakes.
  For centuries, paintings of Moses portrayed him as having horns.
  A translator of the Latin Vulgate added the wrong vowel: he
  thought that Moses had horns, not that his face was glowing.
  And people were killed for wishing to correct that, and other
  mistakes.
  Proofread translations, always.




Types of Machine Translation

     Dictionary lookup – the most basic form of MT
     Translation Memory – also, a basic form of MT
     Example Based Machine Translation – considered the most
     accurate form of MT, but there are few if any examples “in
     the wild”.
     Statistical Machine Translation – currently the darling of
     research and the basis of Google Translate. Solves a lot of old
     problems, but introduces new one.




                        Jimmy O’Regan       Machine Translation and Translation Technology
Outline    Is Machine Translation a Translator’s Tool?
                    Tools for Translators   N-Grams
                    Free Language Data      Moses
                    Machine Translation     Apertium


Types of Machine Translation

     Dictionary lookup – the most basic form of MT
     Translation Memory – also, a basic form of MT
     Example Based Machine Translation – considered the most
     accurate form of MT, but there are few if any examples “in
     the wild”.
     Statistical Machine Translation – currently the darling of
     research and the basis of Google Translate. Solves a lot of old
     problems, but introduces new one. And breaks a lot of things
     that “used to work”.




                        Jimmy O’Regan       Machine Translation and Translation Technology
Outline    Is Machine Translation a Translator’s Tool?
                    Tools for Translators   N-Grams
                    Free Language Data      Moses
                    Machine Translation     Apertium


Types of Machine Translation

     Dictionary lookup – the most basic form of MT
     Translation Memory – also, a basic form of MT
     Example Based Machine Translation – considered the most
     accurate form of MT, but there are few if any examples “in
     the wild”.
     Statistical Machine Translation – currently the darling of
     research and the basis of Google Translate. Solves a lot of old
     problems, but introduces new one. And breaks a lot of things
     that “used to work”.
     Rule Based Machine Translation




                        Jimmy O’Regan       Machine Translation and Translation Technology
Outline    Is Machine Translation a Translator’s Tool?
                    Tools for Translators   N-Grams
                    Free Language Data      Moses
                    Machine Translation     Apertium


Types of Machine Translation

     Dictionary lookup – the most basic form of MT
     Translation Memory – also, a basic form of MT
     Example Based Machine Translation – considered the most
     accurate form of MT, but there are few if any examples “in
     the wild”.
     Statistical Machine Translation – currently the darling of
     research and the basis of Google Translate. Solves a lot of old
     problems, but introduces new one. And breaks a lot of things
     that “used to work”.
     Rule Based Machine Translation – The oldest kind of MT,
     dating back to the 1950s.


                        Jimmy O’Regan       Machine Translation and Translation Technology
Outline    Is Machine Translation a Translator’s Tool?
                    Tools for Translators   N-Grams
                    Free Language Data      Moses
                    Machine Translation     Apertium


Types of Machine Translation

     Dictionary lookup – the most basic form of MT
     Translation Memory – also, a basic form of MT
     Example Based Machine Translation – considered the most
     accurate form of MT, but there are few if any examples “in
     the wild”.
     Statistical Machine Translation – currently the darling of
     research and the basis of Google Translate. Solves a lot of old
     problems, but introduces new one. And breaks a lot of things
     that “used to work”.
     Rule Based Machine Translation – The oldest kind of MT,
     dating back to the 1950s. – The kind I work with, so
     obviously it’s the best

                        Jimmy O’Regan       Machine Translation and Translation Technology
Outline    Is Machine Translation a Translator’s Tool?
                    Tools for Translators   N-Grams
                    Free Language Data      Moses
                    Machine Translation     Apertium


Types of Machine Translation

     Dictionary lookup – the most basic form of MT.
     Translation Memory – also a basic form of MT.
     Example Based Machine Translation – considered the most
     accurate form of MT, but there are few if any examples “in
     the wild”.
     Statistical Machine Translation – currently the darling of
     research and the basis of Google Translate. Solves a lot of old
     problems, but introduces new ones, and breaks a lot of things
     that “used to work”.
     Rule Based Machine Translation – the oldest kind of MT,
     dating back to the 1950s. The kind I work with, so
     obviously it’s the best!

Is Machine Translation a Translator’s Tool?




  Yes.
  That might be hard to accept, particularly if you only speak
  English. But for closely related languages, machine
  translation can be as effective and accurate as a spelling checker.




Uses of Machine Translation




    1   Assimilation: understanding a text.
    2   Dissemination: preparing a text for translation; that is,
        producing a rough draft for a translator, who then edits the
        text.




“Phrase-Based” SMT




  Most current research (and commercial use) of Statistical MT uses
  “phrase-based” SMT.
  The problem is it’s not phrase-based.
  It’s N-Gram based.




N-Grams




  An n-gram is a sequence of n consecutive tokens.
  For text, these are usually (not always!) words,
  and punctuation is counted as a “word”.




Unigrams


  “This is John’s dog.”
  Example
  This
  is
  John
  ’s
  dog
  .




Bigrams



  “This is John’s dog.”
  Example
  This is
  is John
  John ’s
  ’s dog
  dog .




Trigrams
“This is John’s dog.”
Example
This is John
is John ’s
John ’s dog
’s dog .
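
The unigram, bigram, and trigram lists above can be produced mechanically. The following is a minimal sketch; the helper name `ngrams` is made up for illustration, and the sentence is assumed to be already tokenised the way the slides show it (the clitic and the full stop as separate tokens).

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Pre-tokenised by hand: "’s" and "." each count as a "word".
tokens = ["This", "is", "John", "’s", "dog", "."]

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

for gram in trigrams:
    print(" ".join(gram))
```

Note that consecutive n-grams share all but one token, which is the overlap that language models rely on.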




N-Gram Language Models




  An n-gram language model is a collection of n-grams of every order
  from n down to 1: a trigram model includes trigrams, bigrams, and
  unigrams.
  Each n-gram is listed along with its frequency
  (according to a particular corpus).
  But, most importantly...




  N-grams overlap.




  When a sequence of words is queried against a language model,
  the language model software computes the combined likelihood of
  the n-grams, of every order from 1 to n, found in that sequence.
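
  As a rough illustration of such a query, the sketch below counts n-grams in a
  three-sentence toy corpus and scores a word sequence by multiplying relative
  frequencies, falling back to a shorter context when a longer n-gram was never
  seen. The function names, the toy corpus, and the `floor` value are all
  assumptions made for this example; real language model toolkits use more
  careful smoothing than this.

```python
from collections import Counter


def train(sentences, n=3):
    """Count every n-gram of order 1..n in a tokenised corpus."""
    counts = Counter()
    for tokens in sentences:
        for order in range(1, n + 1):
            for i in range(len(tokens) - order + 1):
                counts[tuple(tokens[i:i + order])] += 1
    return counts


def score(counts, tokens, n=3, floor=1e-6):
    """Multiply, word by word, the relative frequency of each word given
    the longest preceding context that was seen in the corpus."""
    total = sum(c for gram, c in counts.items() if len(gram) == 1)
    likelihood = 1.0
    for i in range(len(tokens)):
        p = floor  # used only when even the unigram is unseen
        for order in range(min(n, i + 1), 0, -1):
            gram = tuple(tokens[i - order + 1:i + 1])
            if counts[gram] == 0:
                continue  # unseen: try a shorter context
            context = gram[:-1]
            p = counts[gram] / (counts[context] if context else total)
            break
        likelihood *= p
    return likelihood


corpus = [
    "this is John ’s dog .".split(),
    "this is a dog .".split(),
    "a dog is this .".split(),
]
counts = train(corpus)

fluent = score(counts, "this is a dog .".split())
scrambled = score(counts, "dog a is this .".split())
print(fluent > scrambled)
```

  The word order seen in the corpus scores higher than the scrambled one, which
  is all a language model is asked to decide.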




  Possibly the first use of n-gram language models was in Automatic
  Speech Recognition.




  For basic uses of ASR, such as call centres, a custom grammar is
  used.
  In a mobile phone, such a grammar could look like this:
  Example
  CALLWORD : phone call dial
  ZEROWORD : zero oh
  NUMBER : one two three four five six seven eight nine ZEROWORD
  NUMBERS : NUMBER* NUMBER
  COMMAND : CALLWORD NUMBERS
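
  To make the grammar concrete, here is a small sketch of a recogniser for it.
  It works on already-recognised words rather than audio, and the function name
  `is_command` is invented for this example; it reads NUMBERS as "one or more
  NUMBER", which is what `NUMBER* NUMBER` amounts to.

```python
CALLWORD = {"phone", "call", "dial"}
ZEROWORD = {"zero", "oh"}
NUMBER = {"one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine"} | ZEROWORD


def is_command(utterance):
    """COMMAND : CALLWORD NUMBERS, where NUMBERS is one or more NUMBER."""
    words = utterance.lower().split()
    if len(words) < 2 or words[0] not in CALLWORD:
        return False
    return all(word in NUMBER for word in words[1:])


print(is_command("dial oh eight seven one"))
print(is_command("call me maybe"))
```

  Anything outside the grammar is simply rejected, which is fine for a dialler
  but is exactly what makes this approach unworkable for free dictation.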



  However, for continuous dictation, building such grammars is an
  almost infinite task.
  Instead of writing long, complicated grammars that define, for
  example, when the sound /mi:t/ represents “meet” and when it
  represents “meat”, n-gram language models allow the correct
  word to be chosen based on the context of the surrounding words.
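
  A toy version of that choice, assuming a four-sentence corpus invented for
  this example and a hypothetical helper `choose`: the candidate word whose
  bigrams with its neighbours were seen most often wins.

```python
from collections import Counter

# Made-up training text; a real model would be built from a large corpus.
corpus = [
    "nice to meet you",
    "we meet every week",
    "i eat meat every day",
    "the meat is fresh",
]

bigrams = Counter()
for line in corpus:
    words = line.split()
    bigrams.update(zip(words, words[1:]))


def choose(previous, candidates, following):
    """Pick the candidate whose bigrams with its neighbours are most frequent."""
    return max(candidates,
               key=lambda w: bigrams[(previous, w)] + bigrams[(w, following)])


homophones = ["meet", "meat"]
print(choose("to", homophones, "you"))
print(choose("eat", homophones, "every"))
```

  Summing raw counts is a simplification; a real recogniser would combine
  probabilities from the acoustic model and the language model.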




  This was an obvious progression for ASR, which uses statistical
  modelling to choose in context which sound is most likely, based
  on the surrounding sounds.




  Now, language models are being used in all areas of language
  technology.
  The problem is: useful language models are huge, and can be
  computationally costly to use unless you have a data centre.
  Google, for example, use language models for everything:
      Spell checking (Search, Google Wave, GMail)
      Search queries (“Did you mean?”)
      Machine translation




N-Gram-Based SMT




  Basic Statistical MT uses a probabilistic dictionary: each word pair
  has a probability assigned.
  The interesting part is how they get those dictionaries.
  A program, usually GIZA++ (Open Source), reads a pair of
  texts: one in the source language, and its translation in the
  target language.




  For each word in each sentence of the source language, every word
  in the corresponding target sentence is initially considered a
  possible translation. As more sentences are seen, the translations
  are re-evaluated: the next time word 1 is used, perhaps “possible
  translation 1” is present but “possible translation 2” is absent
  from the sentence, so the probability of the former is increased
  and that of the latter decreased.
  And so on: over the course of the text, the probabilities for each
  word are re-evaluated; then the whole text is processed again, and
  again, until the probabilities settle.
  The result is a probabilistic dictionary.
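
  The procedure described above resembles the expectation-maximisation training
  of the simplest word alignment model (IBM Model 1). The sketch below is a
  stripped-down illustration of that idea, not GIZA++'s actual code: the
  function name, the fixed iteration count, and the three-sentence
  German-English corpus are assumptions made for the example, and refinements
  such as a NULL source word are left out.

```python
from collections import defaultdict


def train_dictionary(pairs, iterations=10):
    """pairs: list of (source_tokens, target_tokens).
    Returns t where t[(src, tgt)] estimates P(tgt | src)."""
    target_vocab = {tw for _, tgt in pairs for tw in tgt}
    # Start by treating every target word as an equally likely translation.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in pairs:
            for tw in tgt:
                # Share each target word among the source words of the
                # sentence, in proportion to the current probabilities.
                norm = sum(t[(sw, tw)] for sw in src)
                for sw in src:
                    share = t[(sw, tw)] / norm
                    count[(sw, tw)] += share
                    total[sw] += share
        # Re-estimate the dictionary from the shared-out counts.
        t = defaultdict(float, {pair: c / total[pair[0]]
                                for pair, c in count.items()})
    return t


pairs = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_dictionary(pairs)

for src in ["das", "Haus", "Buch", "ein"]:
    best = max((tw for (sw, tw) in t if sw == src), key=lambda tw: t[(src, tw)])
    print(src, "->", best)
```

  Nothing tells the program that "Haus" means "house"; it only notices that
  "the" also turns up with "das" elsewhere, so "house" is the better
  explanation for "Haus".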



  N-grams come into the picture in two ways:
    1   Evaluating multiple probable translations
        As in Speech Recognition, each choice is evaluated
        against a language model.
    2   N-grams as “words”
        As well as considering individual words, each n-gram is
        considered as a possible “phrase”, and treated as an individual
        word. This helps to disambiguate terms such as “coach”:
        “basketball coach” vs. “coach driver”.
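
  The second point can be sketched with a tiny phrase table whose keys are
  n-grams. The table entries (English to Spanish) and the greedy
  longest-match lookup are illustrative assumptions only; a real SMT decoder
  scores many competing segmentations instead of taking the first match.

```python
# Hypothetical phrase table: each key is an n-gram treated as a single "word".
PHRASES = {
    ("basketball", "coach"): "entrenador de baloncesto",
    ("coach", "driver"): "conductor de autocar",
    ("coach",): "autocar",
    ("the",): "el",
}


def translate(tokens, max_n=3):
    """Greedy left-to-right lookup that prefers the longest matching n-gram."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            gram = tuple(tokens[i:i + n])
            if gram in PHRASES:
                out.append(PHRASES[gram])
                i += n
                break
        else:
            out.append(tokens[i])  # unknown word: pass through unchanged
            i += 1
    return " ".join(out)


print(translate("the basketball coach".split()))
print(translate("the coach driver".split()))
```

  Because the two-word entries are tried before the single word "coach", the
  ambiguous word never has to be translated on its own in these two sentences.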




Moses

  Moses is an Open Source SMT system. It has several distinct
  advantages over several other SMT systems:
    1   It’s Open Source
        It is actively developed, and supported by a large community.
    2   Factored Models
        Moses is able to make use of linguistic information.
    3   Open Data
        The Moses developers also recognise the importance of Free
        Linguistic Data, and have provided the EuroParl corpus so
        that others may build a statistical MT system using it. The
        maintainers of the JRC Acquis, the corpus of EU legal text
        (and most of the data behind Google Translate’s support for
        most official EU languages), have also prepared their corpus
        for use with Moses.
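
  As a rough illustration of what "linguistic information" can mean
  in a factored model: each token carries several factors (here the
  surface form, lemma and part-of-speech tag) instead of the surface
  word alone. The sketch below joins the factors with a "|"
  separator, the convention used in Moses factored corpora. The
  helper name to_factored and the example words and tags are
  assumptions made up for this illustration; they are not part of
  Moses.

```python
from typing import List, Tuple

# (surface form, lemma, part-of-speech tag); values invented for illustration
Token = Tuple[str, str, str]


def to_factored(tokens: List[Token], sep: str = "|") -> str:
    """Render one sentence with all factors of each token joined by sep."""
    return " ".join(sep.join(factors) for factors in tokens)


sentence = [
    ("houses", "house", "NNS"),
    ("are", "be", "VBP"),
    ("expensive", "expensive", "JJ"),
]

factored_line = to_factored(sentence)
print(factored_line)
```

  A translation model trained on lines like this can, for example,
  translate lemmas and tags separately and then generate the surface
  form, which is one way such a system can cope with inflected words
  it has rarely seen.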

Apertium
  Apertium is an Open Source Machine Translation platform.
      Rule-based
      Statistical disambiguation
      Follows the UNIX philosophy:
          The system is a pipeline
          Each piece “does one thing, and does it well”
          (Not quite: analysis and generation of words are performed
          by separate modes of the same program)
          Each component can be easily replaced
          The apertium program itself is just a shell script that
          calls the correct pipeline.
      Several statistics-based tools for building data: dictionaries,
      rules
      Doesn’t run on Windows yet. Shame on you for cheering! ;)
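
  The pipeline idea can be sketched in a few lines. This is a toy
  model only, assuming three made-up stages (analyse, a word-for-word
  transfer, generate) that each read text and write text; the stage
  names, the dictionaries and the asterisk marking for unknown words
  are assumptions for illustration and do not reproduce Apertium's
  actual modules or data formats.

```python
from functools import reduce
from typing import Callable, Dict, List

Stage = Callable[[str], str]  # every stage reads text and writes text


def run_pipeline(stages: List[Stage], text: str) -> str:
    """Pass text through each stage in order, as a shell pipe would."""
    return reduce(lambda data, stage: stage(data), stages, text)


def analyse(text: str) -> str:
    """Toy analysis: lowercase the words so they can be looked up."""
    return " ".join(word.lower() for word in text.split())


def make_transfer(dictionary: Dict[str, str]) -> Stage:
    """Build a toy word-for-word transfer stage from a bilingual dictionary."""
    def transfer(text: str) -> str:
        # Unknown words are passed through, marked with an asterisk.
        return " ".join(dictionary.get(w, "*" + w) for w in text.split())
    return transfer


def generate(text: str) -> str:
    """Toy generation: restore sentence-initial capitalisation."""
    return text[:1].upper() + text[1:]


# Two interchangeable transfer stages with invented toy dictionaries.
es_en = make_transfer({"casa": "house", "grande": "big"})
es_ca = make_transfer({"casa": "casa", "grande": "gran"})

print(run_pipeline([analyse, es_en, generate], "Casa grande"))
print(run_pipeline([analyse, es_ca, generate], "Casa grande"))
```

  Swapping es_en for es_ca changes the language pair without touching
  the other stages, which is the sense in which each component can be
  easily replaced. The toy output keeps the source word order; a real
  system would add further stages (such as reordering) in the same
  way.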

Some Errors




     (Spanish – English)
     Fondo Monetario Internacional  →  International Monetary bottom
     (Catalan – English)
     Fidel Castro  →  Faithful Castrate
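
  The slide does not say what caused these errors. One plausible
  reading, offered here only as an assumption, is that a fixed term
  or a proper name was translated word by word. The sketch below
  contrasts a word-by-word lookup with a longest-match lookup over
  multiword entries; the dictionaries and function names are invented
  for illustration and are not taken from Apertium.

```python
from typing import Dict, List

# Toy Spanish-English entries, invented for illustration.
WORDS: Dict[str, str] = {
    "fondo": "bottom",
    "monetario": "monetary",
    "internacional": "international",
}
MULTIWORDS: Dict[str, str] = {
    "fondo monetario internacional": "International Monetary Fund",
}


def word_by_word(text: str) -> str:
    """Translate each word on its own, ignoring its neighbours."""
    return " ".join(WORDS.get(w, w) for w in text.lower().split())


def longest_match(text: str) -> str:
    """Prefer the longest multiword entry starting at each position."""
    words = text.lower().split()
    out: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in MULTIWORDS:
                out.append(MULTIWORDS[phrase])
                i = j
                break
        else:
            # No multiword entry starts here: fall back to a single word.
            out.append(WORDS.get(words[i], words[i]))
            i += 1
    return " ".join(out)


print(word_by_word("Fondo Monetario Internacional"))
print(longest_match("Fondo Monetario Internacional"))
```

  The same mechanism, with names listed as multiword or
  do-not-translate entries, would also keep a name such as Fidel
  Castro intact.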




Apertium




  Apertium was born from a set of translators developed at the
  Universitat d’Alacant, as part of the OpenTrad project. Originally
  designed to translate between the Romance languages of Spain, it
  has been expanded over time to support more distant languages:
  first English–Catalan and, more recently, Basque to Spanish.




Community

  As part of the OpenTrad project, Apertium had a community of
  developers, but it was limited to university and business
  development. Thanks mostly to Francis Tyers, Apertium has in
  recent years also begun to acquire a community of volunteer
  contributors.
  The first release from the volunteer community was our Welsh to
  English translator (mostly designed by Kevin Donnelly, who also
  maintains the Welsh spell checking data).
  This summer, we took part in Google’s Summer of Code
  programme, with 8 successful students. One of the translators
  developed during GSoC, for Norwegian Bokmål–Nynorsk, was used
  to translate 30 articles on the Nynorsk Wikipedia within a month
  of its release.




Language Pairs Supported


   Spanish – Catalan         Spanish – Romanian
   French – Catalan          Occitan – Catalan
   English – Galician        Occitan – Spanish
   Spanish – Portuguese      English – Catalan
   English – Spanish         English – Esperanto
   Spanish – Galician        French – Spanish
   Esperanto – Spanish       Welsh – English
   Breton – French           Esperanto – Catalan
   Portuguese – Catalan      Portuguese – Galician
   Basque – Spanish          Nynorsk – Bokmål




Seminar: Language Variation in Parliamentary Speeches_ First Steps Towards Ro...
 
Seminar: PSST challenge LREC
Seminar: PSST challenge LRECSeminar: PSST challenge LREC
Seminar: PSST challenge LREC
 
30% seminar "kappa"
30% seminar "kappa"30% seminar "kappa"
30% seminar "kappa"
 
Language Variation in Parliamentary Speeches: First Steps Towards Robust Phon...
Language Variation in Parliamentary Speeches: First Steps Towards Robust Phon...Language Variation in Parliamentary Speeches: First Steps Towards Robust Phon...
Language Variation in Parliamentary Speeches: First Steps Towards Robust Phon...
 
PSST challenge LREC
PSST challenge LRECPSST challenge LREC
PSST challenge LREC
 
Shallow-transfer rule-based machine translation from Czech to Polish
Shallow-transfer rule-based machine translation from Czech to PolishShallow-transfer rule-based machine translation from Czech to Polish
Shallow-transfer rule-based machine translation from Czech to Polish
 

Kürzlich hochgeladen

Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 

Kürzlich hochgeladen (20)

Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

MT and Translator's Tools

  • 1. Outline Tools for Translators Free Language Data Machine Translation Machine Translation and Translation Technology Jimmy O’Regan The Apertium Project OSS Bar Camp, 19 September 2009 Jimmy O’Regan Machine Translation and Translation Technology
  • 2. Outline: 1. Free Language Data
  • 3. Tools for Translators (subsections: Terminology; Localisation vs. Translation)
  • 10. Some Terminology. Internationalisation: giving software the capability to display text in another language; in Open Source, this generally means adding support for gettext. Localisation: customising the messages displayed to the user to appear in the manner most appropriate for them, in their language or their dialect. Translation: converting text from one language to another.
  • 13. Localisation vs. Translation: Localisation and translation are sometimes, but not always, the same. Documents may need to be localised, but not translated: a British company with an Irish office still needs to localise its documents; any reference to “our London office” will need to be changed to “our Dublin office”.
  • 21. Localisation vs. Translation: Localised translations can also have additional requirements. gettext allows numbers to be treated specially, so “%d file(s)” ugliness is not necessary. English and Spanish need two forms of words for number: singular and plural. Polish needs three: singular, plural, and quantity (roughly, 5 or more). Slovenian needs four: singular, dual, plural, and quantity.
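The plural handling described on this slide can be sketched in a few lines of Python. This is only an illustration, not the speaker's code: the function names and the Polish strings ("plik", "pliki", "plików") are assumptions chosen for the example, and the index rule mirrors the expression commonly listed for Polish in a gettext `Plural-Forms` header. Without a message catalogue, `gettext.NullTranslations` falls back to the two-form English rule.

```python
import gettext

# Illustrative Polish forms for "file": singular, plural (2-4), quantity (5+).
POLISH_FORMS = ("%d plik", "%d pliki", "%d plików")


def polish_plural_index(n: int) -> int:
    """Pick one of three forms, following the usual Polish Plural-Forms rule:
    plural = n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
    """
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and not (10 <= n % 100 < 20):
        return 1
    return 2


def describe_files_pl(n: int) -> str:
    """Format a file count using the hypothetical Polish strings above."""
    return POLISH_FORMS[polish_plural_index(n)] % n


def describe_files_en(n: int) -> str:
    """With no catalogue loaded, ngettext applies the English two-form rule."""
    translations = gettext.NullTranslations()
    return translations.ngettext("%d file", "%d files", n) % n


if __name__ == "__main__":
    for count in (1, 2, 5, 12, 22):
        print(describe_files_en(count), "/", describe_files_pl(count))
```

In a real project the three Polish strings would live in a `.po` file and the rule in its header, so the program itself only ever calls `ngettext`.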
  • 25. Localisation: Software localisation is a huge business area for proprietary software, but one that traditionally lags behind Open Source. That advantage is usually due to the efforts of a handful of dedicated volunteers for the majority of languages. But proprietary software is catching up: Facebook is using Open Source-like efforts for its translations.
  • 26. Localisation Tools: Unsurprisingly then, localisation is very well supported by Open Source software: Pootle (http://translate.sourceforge.net/wiki/pootle/index - Web-based); Virtaal (http://translate.sourceforge.net/wiki/virtaal/index - cross-platform); poEdit (http://www.poedit.net/ - cross-platform); Lokalize (http://userbase.kde.org/Lokalize - KDE); GTranslator (http://gtranslator.sourceforge.net/ - GNOME).
  • 27. Translation Tools: Unfortunately, there’s only one real equivalent tool for general translation: OmegaT.
  • 34. Common Features: All of these tools include these features. Translation Memory: automatically reuse previous translations. Fuzzy matching: suggest translations similar to previously translated sentences. Terminology Management: give suggestions from a per-project dictionary.
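As a rough illustration of translation memory with fuzzy matching, the sketch below scores a new source sentence against previously translated ones using Python's standard `difflib.SequenceMatcher`. The function name, the dictionary-based memory, the example sentences, and the 0.7 threshold are all assumptions made for the example; the tools named above use their own matching algorithms and storage formats.

```python
from difflib import SequenceMatcher


def fuzzy_matches(source, memory, threshold=0.7):
    """Return (score, previous_source, previous_target) tuples, best first.

    `memory` maps previously translated source sentences to their
    translations. A score of 1.0 is an exact match that can be reused
    as-is; lower scores are suggestions for the translator to edit.
    """
    scored = []
    for prev_source, prev_target in memory.items():
        score = SequenceMatcher(None, source.lower(), prev_source.lower()).ratio()
        if score >= threshold:
            scored.append((score, prev_source, prev_target))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical English-to-Polish memory used only for this sketch.
    memory = {
        "Open the file": "Otwórz plik",
        "Close the window": "Zamknij okno",
    }
    for score, prev_source, prev_target in fuzzy_matches("Open the files", memory):
        print(f"{score:.2f}  {prev_source!r} -> {prev_target!r}")
```

Character-level similarity is the simplest choice here; a real tool might compare word sequences instead, so that short sentences are not over-matched.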
  • 35. Translate Toolkit (http://translate.sourceforge.net/): A set of common tools for translation/localisation: Translation Memory server; format conversion; terminology management; quality control tools. All brought to you by the wonderful people of translate.org.za.
  • 38. Free Language Software Needs Free Data: Just like “Free Software Needs Free Documentation”, so too does Free Language Software need Free Data. Usually, this means we have to make it ourselves. Unfortunately, the community of developers of free language software, and thus free language data, is quite small.
  • 39. The importance of spell checkers: Spell checking data packages are the absolute bare minimum of support for a language with technology. Usually, the people who develop them tend to be involved in other areas of Free language software:
  • 48. The importance of spell checkers: Kevin Scannell makes the Irish spell checking data, and An Gramadóir, an Irish language grammar checker; created a WordNet/thesaurus for Irish; and contributed the language data for Apertium’s Irish–Scots Gaelic translator, etc. Marcin Miłkowski is heavily involved in the Polish spell checking data and in LanguageTool, a multilingual grammar checker that’s integrated with Open Office, and maintains the open Polish thesaurus.
  • 49. The importance of spell checkers: translate.org.za make the spelling checkers for several South African languages, as well as many of the tools for translators already mentioned: Virtaal, Translate Toolkit, Pootle. Much of Apertium’s English–Afrikaans translator was made directly by translate.org.za developers, as were Apertium’s D-Bus interface and a GUI. (Virtaal allows translators to use machine translations as a basis for their work.)
  • 50. The importance of spell checkers: Spell checkers are set to become even more important. Hunspell, which is fast becoming the standard spell checker in Open Source projects, now includes morphological analysis and generation. This will greatly improve, among other things, terminology management in translators’ tools. At the moment, if you have “dog” in your terminology list, the translation tool will see that and only that: “dogs” will go unrecognised. With morphological analysis, the tool can know that “dogs” is not only related to “dog”, but is the plural of a noun: another assistance to the translator.
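The “dog”/“dogs” point can be made concrete with a toy sketch. Everything here is hypothetical: the `ANALYSES` table stands in for a real morphological analyser (such as the one Hunspell provides), and the one-entry terminology list and function names exist only for the example. It shows how an exact lookup misses the inflected form while a lemma-based lookup finds it and also reports that the word is a plural noun.

```python
# Hypothetical per-project terminology list: source lemma -> preferred target term.
TERMINOLOGY = {"dog": "pies"}

# Stand-in for a morphological analyser: surface form -> (lemma, description).
ANALYSES = {
    "dog": ("dog", "noun, singular"),
    "dogs": ("dog", "noun, plural"),
}


def lookup_exact(word):
    """Terminology lookup on the surface form only: 'dogs' goes unrecognised."""
    return TERMINOLOGY.get(word.lower())


def lookup_with_morphology(word):
    """Analyse the word first, then look up its lemma in the terminology list.

    Returns (lemma, description, target_term), or None if the word cannot be
    analysed or its lemma is not a listed term.
    """
    analysis = ANALYSES.get(word.lower())
    if analysis is None:
        return None
    lemma, description = analysis
    target = TERMINOLOGY.get(lemma)
    if target is None:
        return None
    return (lemma, description, target)


if __name__ == "__main__":
    print(lookup_exact("dogs"))            # no match on the surface form
    print(lookup_with_morphology("dogs"))  # match via the lemma "dog"
```

With morphological generation as well, a tool could go one step further and offer the target term already inflected to match the source.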
  • 51. Machine Translation. Machine translation has a bad reputation.
  • 56. Mechanical Translation. Mechanical translation, of any form, does tend, inevitably, to have mistakes. For centuries, paintings of Moses portrayed him as having horns. A translator of the Latin Vulgate added the wrong vowel: he thought that Moses had horns, not that his face was glowing. And people were killed for wishing to correct that, and other mistakes. Proofread translations, always.
  • 70. Types of Machine Translation. Dictionary lookup: the most basic form of MT. Translation Memory: also a basic form of MT. Example-Based Machine Translation: considered the most accurate form of MT, but there are few if any examples “in the wild”. Statistical Machine Translation: currently the darling of research and the basis of Google Translate; solves a lot of old problems, but introduces new ones, and breaks a lot of things that “used to work”. Rule-Based Machine Translation: the oldest kind of MT, dating back to the 1950s; the kind I work with, so obviously it’s the best!!!
  • 74. Is Machine Translation a Translator’s Tool? Yes. That might be hard to accept, particularly if you only speak English. But for closely related languages, machine translation can be as effective and accurate as a spelling checker.
  • 80. Uses of Machine Translation. 1. Assimilation: understanding a text. 2. Dissemination: preparing a text for translation; that is, preparing a rough draft for a translator, who then edits the text.
  • 83. “Phrase-Based” SMT. Most current research on (and commercial use of) Statistical MT uses “phrase-based” SMT. The problem is that it’s not phrase-based: it’s n-gram based.
  • 86. N-Grams. An n-gram is a sequence of n consecutive tokens. For text, these are usually (not always!) words; punctuation is counted as a “word”.
  • 88. Unigrams. “This is John’s dog.” Example: This | is | John | ’s | dog | .
  • 90. Bigrams. “This is John’s dog.” Example: This is | is John | John ’s | ’s dog | dog .
  • 92. Trigrams. “This is John’s dog.” Example: This is John | is John ’s | John ’s dog | ’s dog .
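The unigram, bigram and trigram examples above can be reproduced with a few lines of Python. This is a sketch under one assumption: the tokeniser below is a simple regular expression chosen to match the slides (the clitic “’s” and the full stop each become a token); real tokenisers are more involved.

```python
import re


def tokenize(sentence):
    """Split into words, the clitic 's, and single punctuation marks."""
    return re.findall(r"['’]s|\w+|[^\w\s]", sentence)


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("This is John's dog.")
for n in (1, 2, 3):
    print(n, [" ".join(gram) for gram in ngrams(tokens, n)])
```

Six tokens give six unigrams, five bigrams and four trigrams, and each n-gram shares all but one token with its neighbour.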
  • 97. N-Gram Language Models. An n-gram language model is a collection of n-grams for every order from n down to 1: a trigram model includes trigrams, bigrams, and unigrams. Each n-gram is listed along with its frequency (according to a particular corpus). But, most importantly...
  • 98. N-Gram Language Models. N-grams overlap.
  • 99. N-Gram Language Models. When a sequence of words is queried against a language model, the language model software computes the combined likelihood of the n-grams of orders 1 to n in that sequence.
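A rough sketch of such a query, assuming a bigram model trained on a three-sentence toy corpus. The interpolation weight, the add-one smoothing and the `<s>` start marker are illustrative choices, not a description of any particular language model toolkit.

```python
import math
from collections import Counter


def train(corpus):
    """Count unigrams and bigrams; <s> marks the start of each sentence."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams, weight=0.7):
    """Combine bigram and unigram estimates for every word in the sequence."""
    total = sum(unigrams.values())
    vocab = len(unigrams)
    tokens = ["<s>"] + sentence.split()
    score = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen words from scoring zero.
        p_uni = (unigrams[word] + 1) / (total + vocab + 1)
        p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
        score += math.log(weight * p_bi + (1 - weight) * p_uni)
    return score


unigrams, bigrams = train(["the dog barks", "the dog sleeps", "a cat sleeps"])
print(log_prob("the dog sleeps", unigrams, bigrams))
print(log_prob("sleeps dog the", unigrams, bigrams))
```

The word order seen in the corpus scores higher than the scrambled one because its bigrams were observed, while the scrambled sequence can only fall back on unigram counts.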
  • 100. N-Gram Language Models. Possibly the first use of n-gram language models was in Automatic Speech Recognition.
  • 102. N-Gram Language Models. For basic uses of ASR, such as call centres, a custom grammar is used. In a mobile phone, such a grammar could look like this. Example:
      CALLWORD : phone | call | dial
      ZEROWORD : zero | oh
      NUMBER   : one | two | three | four | five | six | seven | eight | nine | ZEROWORD
      NUMBERS  : NUMBER* NUMBER
      COMMAND  : CALLWORD NUMBERS
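A grammar this small is regular, so one way to try it out is to translate it into a regular expression. The sketch below assumes the recogniser has already turned speech into lower-case words separated by single spaces; the function name `is_command` is made up for the example.

```python
import re

# Each rule of the slide's grammar becomes a regular-expression fragment.
CALLWORD = r"(?:phone|call|dial)"
ZEROWORD = r"(?:zero|oh)"
NUMBER = rf"(?:one|two|three|four|five|six|seven|eight|nine|{ZEROWORD})"
NUMBERS = rf"(?:{NUMBER} )*{NUMBER}"  # NUMBER* NUMBER: one or more numbers
COMMAND = re.compile(rf"{CALLWORD} {NUMBERS}")


def is_command(utterance):
    """True when the whole utterance matches COMMAND."""
    return COMMAND.fullmatch(utterance.lower().strip()) is not None


print(is_command("call oh eight seven one two three"))
print(is_command("please call one"))
```

Anything outside the grammar, such as “please call one”, is simply rejected, which is acceptable for a dialler and hopeless for dictation.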
  • 104. N-Gram Language Models. However, for continuous dictation, building such grammars is an almost infinite task. Instead of defining long, complicated grammars that specify, for example, when the sound /mi:t/ represents “meet” and when it represents “meat”, n-gram language models allow the correct word to be chosen based on the context of the surrounding words.
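The “meet”/“meat” choice can be illustrated with nothing more than bigram counts. This is a deliberately crude sketch: the four-sentence corpus is invented, and a real recogniser would combine acoustic scores with smoothed language model probabilities rather than raw counts.

```python
from collections import Counter

# Invented toy corpus; a real model is trained on far more text.
corpus = [
    "nice to meet you",
    "we will meet them tomorrow",
    "they eat meat and fish",
    "the meat is fresh",
]

bigrams = Counter()
for sentence in corpus:
    tokens = sentence.split()
    bigrams.update(zip(tokens, tokens[1:]))


def choose(prev_word, candidates, next_word):
    """Pick the candidate seen most often next to its neighbours.
    Ties go to the first candidate listed."""
    def score(candidate):
        return bigrams[(prev_word, candidate)] + bigrams[(candidate, next_word)]
    return max(candidates, key=score)


print(choose("to", ["meet", "meat"], "you"))
print(choose("eat", ["meet", "meat"], "and"))
```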
  • 105. N-Gram Language Models. This was an obvious progression for ASR, which already uses statistical modelling to choose which sound is most likely in context, based on the surrounding sounds.
  • 108. N-Gram Language Models. Now, language models are being used in all areas of language technology. The problem is that useful language models are huge, and can be computationally costly to use unless you have a data centre. Google, for example, use language models for everything: spell checking (Search, Google Wave, GMail); search queries (“Did you mean?”); machine translation.
  • 111. N-Gram-Based SMT. Basic Statistical MT uses a probabilistic dictionary: each word pair has a probability assigned. The interesting part is how those dictionaries are built. A program, usually GIZA++ (Open Source), reads a pair of texts: one in the source language, and its translation in the target language.
  • 112. N-Gram-Based SMT. For each word in each sentence of the source language, every word in the corresponding target sentence is initially considered a possible translation. As more sentences are seen, the translations are re-evaluated: the next time word 1 is used, perhaps “possible translation 1” is present but “possible translation 2” is absent from the sentence, so the probability of the former is increased and that of the latter decreased. And so on: over the course of the text, the probabilities for each word are re-evaluated; then the whole text is processed again, and again, until the probabilities settle at a reasonable level. The result is a probabilistic dictionary.
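The procedure described above is essentially expectation-maximisation over word co-occurrences. The sketch below follows the simplest textbook form of it (in the style of IBM Model 1, without a NULL word); it is an illustration of the idea only, since GIZA++ itself trains more elaborate alignment models. The function names and the three-sentence German–English corpus are assumptions made for the example.

```python
from collections import defaultdict


def train_dictionary(pairs, iterations=10):
    """pairs: list of (source_tokens, target_tokens).
    Returns t[(target_word, source_word)] ~ P(target_word | source_word)."""
    target_vocab = {w for _, tgt in pairs for w in tgt}
    # Start with every target word equally likely for every source word.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in pairs:
            for tw in tgt:
                # Share one unit of evidence for tw among the source words,
                # in proportion to the current probabilities.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # Re-estimate the probabilities from the shared-out counts.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def best_translation(t, source_word):
    """Most probable target word among those seen with source_word."""
    seen = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(seen, key=seen.get)


pairs = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_dictionary(pairs)
for word in ("das", "Haus", "Buch", "ein"):
    print(word, "->", best_translation(t, word))
```

Nothing in the corpus says that “Haus” means “house”; the pairing emerges because “the” is better explained by “das”, which leaves “house” for “Haus”.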
  • 118. N-Gram-Based SMT. N-grams come into the picture in two ways. 1. Evaluating multiple probable translations: similarly to Speech Recognition, each choice is evaluated against a language model. 2. N-grams as “words”: as well as considering individual words, each n-gram is considered as a possible “phrase”, and treated as an individual word. This helps to resolve ambiguous terms: “basketball coach” vs. “coach driver”.
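To see why treating n-grams as “words” helps with “coach”, consider a hand-made phrase table and a greedy longest-match lookup. Both are simplifications assumed for this sketch: the English to Spanish entries are illustrative, and a real decoder scores many competing segmentations against the translation and language models instead of taking the longest match.

```python
# Illustrative English -> Spanish entries; keys are n-grams of tokens.
PHRASE_TABLE = {
    ("basketball", "coach"): "entrenador de baloncesto",
    ("coach", "driver"): "conductor de autocar",
    ("coach",): "entrenador",
    ("the",): "el",
}


def translate(tokens, table, max_n=3):
    """Greedy left-to-right lookup that prefers the longest known n-gram."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in table:
                out.append(table[phrase])
                i += n
                break
        else:
            out.append(tokens[i])  # unknown word: copy it through unchanged
            i += 1
    return " ".join(out)


print(translate("the basketball coach".split(), PHRASE_TABLE))
print(translate("the coach driver".split(), PHRASE_TABLE))
```

With only the single-word entry, both sentences would get “entrenador”; the bigram entries carry the context that picks the right sense.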
  • 124. Moses. Moses is an Open Source SMT system. Moses has distinct advantages over several other SMT systems. 1. It’s Open Source: actively developed, and supported by a large community. 2. Factored models: Moses is able to make use of linguistic information. 3. Open data: the Moses developers also recognise the importance of Free Linguistic Data, and have provided the EuroParl corpus so that others may build a statistical MT system using it. Also, the maintainers of the JRC-Acquis, the corpus of EU legal text (and most of the data behind Google Translate’s support for most official EU languages), have prepared their corpus for use with Moses.
  • 134. Outline Is Machine Translation a Translator’s Tool? Tools for Translators N-Grams Free Language Data Moses Machine Translation Apertium Apertium Apertium is an Open Source Machine Translation platform. Rule Based Statistical disambiguation Follows the UNIX philosophy: The system is a pipeline Each piece “does one thing, and does it well” (Not quite: analysis and generation of words are performed by separate modes of the same program) Each component can be easily replaced The apertium program itself is just a shell script that calls the correct pipeline. Several statistics-based tools for building data: dictionaries Jimmy O’Regan Machine Translation and Translation Technology
  • 135. Outline Is Machine Translation a Translator’s Tool? Tools for Translators N-Grams Free Language Data Moses Machine Translation Apertium Apertium Apertium is an Open Source Machine Translation platform. Rule Based Statistical disambiguation Follows the UNIX philosophy: The system is a pipeline Each piece “does one thing, and does it well” (Not quite: analysis and generation of words are performed by separate modes of the same program) Each component can be easily replaced The apertium program itself is just a shell script that calls the correct pipeline. Several statistics-based tools for building data: dictionaries, rules Doesn’t run on Windows Jimmy O’Regan Machine Translation and Translation Technology
  • 136. Outline Is Machine Translation a Translator’s Tool? Tools for Translators N-Grams Free Language Data Moses Machine Translation Apertium Apertium Apertium is an Open Source Machine Translation platform. Rule Based Statistical disambiguation Follows the UNIX philosophy: The system is a pipeline Each piece “does one thing, and does it well” (Not quite: analysis and generation of words are performed by separate modes of the same program) Each component can be easily replaced The apertium program itself is just a shell script that calls the correct pipeline. Several statistics-based tools for building data: dictionaries, rules Doesn’t run on Windows yet. Jimmy O’Regan Machine Translation and Translation Technology
  • 137. Outline Is Machine Translation a Translator’s Tool? Tools for Translators N-Grams Free Language Data Moses Machine Translation Apertium Apertium Apertium is an Open Source Machine Translation platform. Rule Based Statistical disambiguation Follows the UNIX philosophy: The system is a pipeline Each piece “does one thing, and does it well” (Not quite: analysis and generation of words are performed by separate modes of the same program) Each component can be easily replaced The apertium program itself is just a shell script that calls the correct pipeline. Several statistics-based tools for building data: dictionaries, rules Doesn’t run on Windows yet. Shame on you for cheering! Jimmy O’Regan Machine Translation and Translation Technology
  • 138. Apertium: Apertium is an Open Source Machine Translation platform. It is rule based, with statistical disambiguation. It follows the UNIX philosophy: the system is a pipeline; each piece “does one thing, and does it well” (not quite: analysis and generation of words are performed by separate modes of the same program); each component can be easily replaced; the apertium program itself is just a shell script that calls the correct pipeline. There are several statistics-based tools for building data: dictionaries, rules. Doesn’t run on Windows yet. Shame on you for cheering! ;)
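The pipeline idea on this slide can be illustrated with a small sketch. This is not Apertium's code: the stage names (`analyse`, `transfer`, `generate`), the toy dictionary `TOY_BIDIX`, the `PIPELINES` table, and the use of `*` to flag unknown words are all assumptions made for illustration. It only shows stages that each do one job, a wrapper that picks the right chain for a language pair, and how a stage can be swapped out.

```python
from functools import reduce
from typing import Callable, Dict, List

# A stage is any function from text to text, like a filter in a UNIX pipe.
Stage = Callable[[str], str]

# Hypothetical toy bilingual dictionary (not real Apertium data).
TOY_BIDIX: Dict[str, str] = {"el": "the", "gato": "cat", "negro": "black"}


def analyse(text: str) -> str:
    """Stand-in for analysis: lower-case and normalise whitespace."""
    return " ".join(text.lower().split())


def transfer(text: str) -> str:
    """Word-for-word lookup; unknown words are flagged with '*' (toy convention)."""
    return " ".join(TOY_BIDIX.get(word, "*" + word) for word in text.split())


def generate(text: str) -> str:
    """Stand-in for generation: capitalise the first letter."""
    return text[:1].upper() + text[1:]


def run_pipeline(stages: List[Stage], text: str) -> str:
    """Feed the output of each stage into the next one."""
    return reduce(lambda acc, stage: stage(acc), stages, text)


# The wrapper only has to know which chain of stages belongs to which pair.
PIPELINES: Dict[str, List[Stage]] = {"es-en": [analyse, transfer, generate]}


def translate(pair: str, text: str) -> str:
    return run_pipeline(PIPELINES[pair], text)


if __name__ == "__main__":
    print(translate("es-en", "El gato negro"))
    # Replacing one component leaves the others untouched.
    shouting = [analyse, transfer, str.upper]
    print(run_pipeline(shouting, "El gato negro"))
```

Because every stage has the same text-in, text-out shape, replacing one of them is just a matter of putting a different function in the list. The toy does no reordering, so adjectives stay in source order.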
  • 144. Some Errors: Spanish – English: “Fondo Monetario Internacional” was translated as “International Monetary bottom”. Catalan – English: “Fidel Castro” was translated as “Faithful Castrate”.
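One plausible reading of these errors is that each word was looked up on its own because the dictionary had no entry for the whole expression. The sketch below assumes that reading; the `WORDS` and `MULTIWORDS` tables and the `translate` function are hypothetical, and the toy does not reorder adjectives, so its word-by-word output differs in order from the output on the slide.

```python
from typing import Dict, Tuple

# Hypothetical single-word entries: "fondo" can mean "bottom" as well as "fund".
WORDS: Dict[str, str] = {
    "fondo": "bottom",
    "monetario": "monetary",
    "internacional": "international",
}

# Hypothetical multiword entry covering the whole expression.
MULTIWORDS: Dict[Tuple[str, ...], str] = {
    ("fondo", "monetario", "internacional"): "International Monetary Fund",
}


def translate(text: str, multiwords: Dict[Tuple[str, ...], str]) -> str:
    """Prefer the longest matching multiword entry, else translate word by word."""
    tokens = text.lower().split()
    longest = max((len(key) for key in multiwords), default=1)
    out = []
    i = 0
    while i < len(tokens):
        # Try spans of decreasing length, down to two tokens.
        for n in range(min(longest, len(tokens) - i), 1, -1):
            chunk = tuple(tokens[i:i + n])
            if chunk in multiwords:
                out.append(multiwords[chunk])
                i += n
                break
        else:
            # No multiword matched here: fall back to the single-word table.
            out.append(WORDS.get(tokens[i], "*" + tokens[i]))
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(translate("Fondo Monetario Internacional", {}))
    print(translate("Fondo Monetario Internacional", MULTIWORDS))
```

With an empty multiword table the phrase comes out word by word, including "bottom"; with the multiword entry present, the whole expression is translated as a unit.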
  • 148. Apertium: Apertium was born from a set of translators developed at the Universitat d’Alacant, as part of the OpenTrad project. Originally designed to translate between the Romance languages of Spain, it has been expanded over time to support more distant languages: first English–Catalan and, more recently, Basque to Spanish.
  • 151. Community: As part of the OpenTrad project, Apertium had a community of developers, but one limited to university and business development. Thanks mostly to Francis Tyers, Apertium has in recent years also begun to acquire a community of volunteer contributors. The first release from the volunteer community was our Welsh to English translator (mostly designed by Kevin Donnelly, who also maintains the Welsh spell checking data). This summer, we took part in Google’s Summer of Code programme, with 8 successful students. One of the translators developed during GSoC, for Norwegian Bokmål–Nynorsk, has (within a month of release) been used to translate 30 articles on the Nynorsk Wikipedia.
  • 152. Language Pairs Supported: Spanish – Catalan; Spanish – Romanian; French – Catalan; Occitan – Catalan; English – Galician; Occitan – Spanish; Spanish – Portuguese; English – Catalan; English – Spanish; English – Esperanto; Spanish – Galician; French – Spanish; Esperanto – Spanish; Welsh – English; Breton – French; Esperanto – Catalan; Portuguese – Catalan; Portuguese – Galician; Basque – Spanish; Nynorsk – Bokmål.