1. Outline
Tools for Translators
Free Language Data
Machine Translation
Machine Translation and Translation Technology
Jimmy O’Regan
The Apertium Project
OSS Bar Camp, 19 September 2009
Jimmy O’Regan Machine Translation and Translation Technology
Tools for Translators
Some Terminology
Internationalisation
Giving software the capability to display text in another
language
In Open Source, this generally means adding support for
gettext.
Localisation
Customising the messages displayed to the user to appear in
the manner most appropriate for them.
In their language, or their dialect.
Translation
Converting text from one language to another
Localisation vs. Translation
Localisation and translation are sometimes, but not always, the
same.
Documents may need to be localised, but not translated:
A British company with an Irish office still needs to localise its
documents: any reference to “our London office” will need to be
changed to “our Dublin office”.
Localisation vs. Translation
Localised translations can also have additional requirements:
gettext allows numbers to be treated specially, so “%d file(s)”
ugliness is not necessary.
English and Spanish need two forms of words for number: singular
and plural
Polish needs three: singular, plural, and quantity (typically 5 or
more)
Slovenian needs four: singular, dual, plural, and quantity.
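The number-dependent forms above can be sketched in Python. The two index functions below mirror the plural rules commonly given for Polish and Slovenian in gettext catalogue headers; the word lists and the helper count_files are illustrative assumptions, not part of gettext itself. Note that the real Polish rule is a little subtler than "5 or more": 22 takes the plural again.

```python
def plural_index_pl(n):
    """Index of the Polish form: 0 singular, 1 plural, 2 quantity."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2


def plural_index_sl(n):
    """Index of the Slovenian form: 0 singular, 1 dual, 2 plural, 3 quantity."""
    if n % 100 == 1:
        return 0
    if n % 100 == 2:
        return 1
    if n % 100 in (3, 4):
        return 2
    return 3


# Illustrative forms of "file" in each language (assumed for this sketch).
FORMS_PL = ["plik", "pliki", "plików"]
FORMS_SL = ["datoteka", "datoteki", "datoteke", "datotek"]


def count_files(n, forms, plural_index):
    """Pick the form that matches the number n."""
    return f"{n} {forms[plural_index(n)]}"


for n in (1, 2, 5, 22):
    print(count_files(n, FORMS_PL, plural_index_pl), "|",
          count_files(n, FORMS_SL, plural_index_sl))
```

In a real project the translator writes the forms in the message catalogue and gettext chooses among them; the program only passes the number.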
Localisation
Software localisation is a huge business area for proprietary
software.
It is also an area where proprietary software has traditionally
lagged behind Open Source.
For the majority of languages, that advantage is usually due to the
efforts of a handful of dedicated volunteers.
But proprietary vendors are catching up: Facebook is using Open
Source-like efforts for its translations.
Localisation Tools
Unsurprisingly then, localisation is very well supported by Open
Source software:
Pootle (http://translate.sourceforge.net/wiki/pootle/index - Web-based)
Virtaal (http://translate.sourceforge.net/wiki/virtaal/index - cross platform)
poEdit (http://www.poedit.net/ - cross platform)
Lokalize (http://userbase.kde.org/Lokalize - KDE)
GTranslator (http://gtranslator.sourceforge.net/ - GNOME)
Translation Tools
Unfortunately, there’s only one real equivalent tool for general
translation: OmegaT
Common Features
All of these tools include these features:
Translation Memory
Automatically reuse previous translations
Fuzzy matching
Suggest translations similar to previously translated sentences
Terminology Management
Give suggestions from a per-project dictionary
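To make translation memory and fuzzy matching concrete, here is a small Python sketch built on the standard library's difflib. The in-memory dictionary, the sample sentences, and the 0.6 cutoff are all assumptions for illustration; the tools named above use their own storage formats and similarity measures.

```python
import difflib

# Hypothetical translation memory: source sentence -> stored translation.
MEMORY = {
    "Open the file.": "Otwórz plik.",
    "Save the file.": "Zapisz plik.",
}


def suggest(sentence, memory=MEMORY, cutoff=0.6):
    """Return (stored source, translation, similarity), or None."""
    # Exact hit: reuse the previous translation as-is.
    if sentence in memory:
        return sentence, memory[sentence], 1.0
    # Fuzzy hit: take the most similar stored sentence above the cutoff.
    close = difflib.get_close_matches(sentence, list(memory), n=1, cutoff=cutoff)
    if not close:
        return None
    source = close[0]
    score = difflib.SequenceMatcher(None, sentence, source).ratio()
    return source, memory[source], score


print(suggest("Open the file."))
print(suggest("Open the files."))
print(suggest("zzz"))
```

A fuzzy match is only a suggestion: the translator still has to adapt the stored translation to the new sentence.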
Translate Toolkit
http://translate.sourceforge.net/
A set of common tools for translation/localisation:
Translation Memory server
Format conversion
Terminology management
Quality control tools
All brought to you by the wonderful people of translate.org.za
Free Language Software Needs Free Data
Just like “Free Software Needs Free Documentation”, so too does
Free Language Software need Free Data.
Usually, this means we have to make it ourselves.
Unfortunately, the community of developers of free language
software, and thus free language data, is quite small.
The importance of spell checkers
Spell checking data packages are the absolute bare minimum of
technological support for a language.
The people who develop them usually tend to be involved in other
areas of Free language software:
Kevin Scannell
Makes the Irish spell checking data, and An Gramadóir, an Irish
language grammar checker. He also created a WordNet/thesaurus for
Irish, and contributed the language data for Apertium’s
Irish–Scots Gaelic translator, among other things.
Marcin Miłkowski
Heavily involved in the Polish spell checking data and in
LanguageTool, a multilingual grammar checker that’s integrated
with OpenOffice.org. He also maintains the open Polish thesaurus.
translate.org.za
Make the spelling checkers for several South African languages, as
well as many of the tools for translators already mentioned: Virtaal,
Translate Toolkit, and Pootle. Much of Apertium’s English–Afrikaans
translator was made directly by translate.org.za developers, as were
Apertium’s D-Bus interface and a GUI. (Virtaal allows translators
to use machine translations as a basis for their work.)
The importance of spell checkers
Spell checkers are set to become even more important. Hunspell,
which is fast becoming the standard spell checker in Open Source
projects, now includes morphological analysis and generation. This
will greatly improve, among other things, terminology management
in translators’ tools.
At the moment, if you have “dog” in your terminology list, the
translation tool will see that and only that: “dogs” will go
unrecognised. With morphological analysis, the tool can know that
“dogs” is not only related to “dog”, but is the plural of a noun,
which is a further aid to the translator.
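The “dog”/“dogs” case can be sketched as follows. This is a toy stand-in, not Hunspell: the analyse function knows a single English rule (strip a final “s”), and the tag names and the one-entry glossary are invented for the example. A real analyser would draw on the language's dictionary and affix data.

```python
def analyse(word):
    """Toy morphological analysis: return candidate (lemma, tag) pairs."""
    candidates = [(word, "n.sg")]
    # Single hard-coded rule for this sketch: a final "s" may mark a plural.
    if word.endswith("s") and len(word) > 1:
        candidates.append((word[:-1], "n.pl"))
    return candidates


def find_term(word, glossary):
    """Look a word up in the terminology list via its possible lemmas."""
    for lemma, tag in analyse(word.lower()):
        if lemma in glossary:
            return lemma, tag, glossary[lemma]
    return None


# Hypothetical English-Polish terminology list.
GLOSSARY = {"dog": "pies"}

print(find_term("dog", GLOSSARY))
print(find_term("dogs", GLOSSARY))
print(find_term("cat", GLOSSARY))
```

Without the analysis step, only the first lookup would succeed; with it, the tool can also report that the matched word is a plural.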
Machine Translation
Machine translation has a bad reputation.
Mechanical Translation
Mechanical translation, of any form, does tend, inevitably, to have
mistakes.
For centuries, paintings of Moses portrayed him as having horns.
A translator of the Latin Vulgate added the wrong vowel: he
thought that Moses had horns, not that his face was glowing.
And people were killed for wishing to correct that, and other
mistakes.
Proofread translations, always.
Types of Machine Translation
Dictionary lookup – the most basic form of MT
Translation Memory – also, a basic form of MT
Example Based Machine Translation – considered the most
accurate form of MT, but there are few if any examples “in
the wild”.
Statistical Machine Translation – currently the darling of
research and the basis of Google Translate. Solves a lot of old
problems, but introduces new ones, and breaks a lot of things
that “used to work”.
Rule Based Machine Translation – the oldest kind of MT,
dating back to the 1950s. It’s the kind I work with, so
obviously it’s the best!
Is Machine Translation a Translator’s Tool?
Yes.
That might be hard to accept. Particularly if you only speak
English. But for closely-related, similar languages, machine
translation can be as effective and accurate as a spelling checker.
Uses of Machine Translation
1 Assimilation: understanding a text.
2 Dissemination: preparing a text for translation. That is,
producing a rough draft for a translator, who then edits the
text.
“Phrase-Based” SMT
Most current research on (and commercial use of) Statistical MT
uses “phrase-based” SMT.
The problem is it’s not phrase-based.
It’s N-Gram based.
N-Grams
An n-gram is a sequence of “n” consecutive tokens
For text, these are usually (not always!) words
...punctuation is counted as a “word”.
Unigrams
“This is John’s dog.”
Example
This
is
John
’s
dog
.
Bigrams
“This is John’s dog.”
Example
This is
is John
John ’s
’s dog
dog .
Trigrams
“This is John’s dog.”
Example
This is John
is John ’s
John ’s dog
’s dog .
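The unigram, bigram and trigram lists for “This is John’s dog.” can be reproduced with a few lines of Python. This is only a minimal sketch: the `tokenize` and `ngrams` helpers are names made up for this illustration, and the tokenisation rule (split off punctuation and the possessive “’s”) is an assumption chosen to match the example, not the rule any particular toolkit uses.

```python
import re

def tokenize(text):
    # Assumed rule for this sketch: the possessive clitic ('s or ’s) and each
    # punctuation mark become tokens of their own, like the words around them.
    return re.findall(r"[’']s\b|\w+|[^\w\s]", text)

def ngrams(tokens, n):
    # Slide a window of width n over the tokens; neighbouring windows overlap.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = tokenize("This is John’s dog.")
for n in (1, 2, 3):
    print(n, [" ".join(gram) for gram in ngrams(tokens, n)])
```

A sentence of six tokens yields six unigrams, five bigrams and four trigrams, because each window starts one token after the previous one.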
N-Gram Language Models
An n-gram language model is a collection of n-grams of every order
from n down to 1: a trigram model includes trigrams, bigrams, and
unigrams.
Each n-gram is listed along with its frequency (according to a
particular corpus).
But, most importantly...
N-grams overlap.
When a sequence of words is queried against a language model,
the language model software computes the combined likelihood of
the overlapping n-grams (of every order from 1 to n) in that
sequence.
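As a rough illustration of such a query, the sketch below counts n-grams of orders 1 to 3 in a tiny invented corpus and scores a sentence word by word, using the longest context that has been seen and falling back to shorter ones. The fallback scheme (a fixed penalty per step, often called “stupid backoff”) is a simplification picked for brevity; real language model toolkits use more careful smoothing. The function names and the corpus are assumptions made for this example.

```python
import math
from collections import Counter

def train(sentences, n=3):
    # Count every n-gram of order 1..n in a list of tokenised sentences.
    counts = Counter()
    for tokens in sentences:
        for order in range(1, n + 1):
            for i in range(len(tokens) - order + 1):
                counts[tuple(tokens[i:i + order])] += 1
    return counts

def word_score(counts, context, word, alpha=0.4):
    # Try the longest context first; each time we shorten it, apply a penalty.
    penalty = 1.0
    while context:
        seen = counts[context + (word,)]
        if seen:
            return penalty * seen / counts[context]
        context = context[1:]
        penalty *= alpha
    # Unigram fallback, with +1 so that unseen words never score zero.
    total = sum(c for gram, c in counts.items() if len(gram) == 1)
    return penalty * (counts[(word,)] + 1) / (total + 1)

def log_score(counts, tokens, n=3):
    # Combined score of the sequence: sum of per-word log scores.
    return sum(
        math.log(word_score(counts, tuple(tokens[max(0, i - n + 1):i]), word))
        for i, word in enumerate(tokens)
    )

# Toy corpus, invented for this sketch.
corpus = [s.split() for s in (
    "we meet at noon", "we meet on monday",
    "they eat meat", "she eats meat daily",
)]
counts = train(corpus)
good = log_score(counts, "we meet at noon".split())
bad = log_score(counts, "we meat at noon".split())
print(good > bad)
```

The values are scores rather than normalised probabilities, but they are enough to rank “we meet at noon” above “we meat at noon”.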
Possibly the first use of n-gram language models was in Automatic
Speech Recognition.
For basic uses of ASR, such as call centres, a custom grammar is
used.
In a mobile phone, such a grammar could look like this:
Example
CALLWORD : phone call dial
ZEROWORD : zero oh
NUMBER : one two three four five six seven eight nine ZEROWORD
NUMBERS : NUMBER* NUMBER
COMMAND : CALLWORD NUMBERS
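To make the grammar concrete, here is a minimal sketch of a recogniser for it, working on text that has already been split into words (a real ASR system would apply the grammar to acoustic hypotheses, which is not modelled here). The function name `is_command` is made up for this example.

```python
# Word classes copied from the grammar: each class is a set of alternatives.
CALLWORD = {"phone", "call", "dial"}
ZEROWORD = {"zero", "oh"}
NUMBER = {"one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine"} | ZEROWORD

def is_command(utterance):
    # COMMAND : CALLWORD NUMBERS, where NUMBERS is one or more NUMBER.
    words = utterance.lower().split()
    if len(words) < 2:
        return False
    return words[0] in CALLWORD and all(w in NUMBER for w in words[1:])

print(is_command("call five five five oh one two"))
print(is_command("please call one two three"))
```

Anything the grammar does not list, such as the leading “please” in the second call, is simply rejected.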
However, for continuous dictation, building such grammars is a
practically endless task.
Instead of defining long, complicated grammars that specify, for
example, when the sound /mi:t/ represents “meet” and when it
represents “meat”, n-gram language models allow the correct
word to be chosen based on the context of the surrounding words.
This was an obvious progression for ASR, which uses statistical
modelling to choose in context which sound is most likely, based
on the surrounding sounds.
Now, language models are being used in all areas of language
technology.
The problem is: useful language models are huge, and can be
computationally costly to use unless you have a data centre.
Google, for example, use language models for everything:
Spell checking (Search, Google Wave, GMail)
Search queries (“Did you mean?”)
Machine translation
N-Gram-Based SMT
Basic Statistical MT uses a probabilistic dictionary: each word pair
has a probability assigned.
The interesting part is how they get those dictionaries.
A program, usually GIZA++ (Open Source), reads a pair of parallel
texts: one in the source language, and one in the target language.
For each word in each sentence of the source language, every word
in the corresponding target sentence is initially considered a
probable translation. As more sentences are seen, the translations
are re-evaluated: the next time word 1 is used, perhaps “possible
translation 1” is present but “possible translation 2” is absent
from the sentence, so the probability of the former is increased
and that of the latter decreased.
And so on: over the course of the text, the probabilities for each
word are re-evaluated; then the whole text is processed again, and
again, until the probabilities settle.
The result is a probabilistic dictionary.
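The re-estimation loop described here can be sketched in a few lines. The code below is not GIZA++ itself: it is a bare-bones version of the simplest word alignment model of this kind (commonly known as IBM Model 1), without refinements such as a NULL word, and the three-sentence German–English corpus and the function names are invented for the illustration.

```python
from collections import defaultdict

def train_dictionary(pairs, iterations=10):
    # Start by treating every co-occurring word pair as equally plausible.
    t = {}
    for src, tgt in pairs:
        for sw in src:
            for tw in tgt:
                t[(sw, tw)] = 1.0
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # Pass over the text: share each target word among the source words
        # of its sentence, in proportion to the current probabilities.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(sw, tw)] for sw in src)
                for sw in src:
                    share = t[(sw, tw)] / norm
                    count[(sw, tw)] += share
                    total[sw] += share
        # Re-evaluate: pairs that keep co-occurring gain probability.
        for (sw, tw), c in count.items():
            t[(sw, tw)] = c / total[sw]
    return t

def best_translation(t, source_word):
    candidates = {tw: p for (sw, tw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)

# Toy parallel corpus, invented for this sketch.
pairs = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_dictionary(pairs)
for word in ("das", "Haus", "Buch", "ein"):
    print(word, "->", best_translation(t, word))
```

“das” appears with “the” in two sentences but with “house” and “book” only once each, so “the” wins; that in turn frees “house” to be explained by “Haus”, and so on with each pass.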
N-grams come into the picture in two ways:
1 Evaluating multiple probable translations
Similarly to Speech Recognition, each choice is evaluated
against a language model
2 N-grams as “words”
As well as considering individual words, each n-gram is
considered as a possible “phrase”, and treated as an individual
word. This helps to cut down on ambiguous terms:
“basketball coach” vs. “coach driver”.
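The effect of treating n-grams as “words” can be shown with a toy lookup table. This is a deliberately simplified sketch: the table entries (English to Spanish) are invented for the example, and the greedy longest-match strategy stands in for what a real decoder does, which is to weigh many possible segmentations using probabilities and a language model.

```python
# Hypothetical phrase table: keys are n-grams, values are translations.
PHRASE_TABLE = {
    ("basketball", "coach"): "entrenador de baloncesto",
    ("coach", "driver"): "conductor de autocar",
    ("basketball",): "baloncesto",
    ("coach",): "autocar",
    ("driver",): "conductor",
    ("the",): "el",
}

def translate(tokens, table, max_n=3):
    out = []
    i = 0
    while i < len(tokens):
        # Prefer the longest n-gram starting here that the table knows.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in table:
                out.append(table[phrase])
                i += n
                break
        else:
            # Unknown word: copy it through unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)

print(translate("the basketball coach".split(), PHRASE_TABLE))
print(translate("the coach driver".split(), PHRASE_TABLE))
```

Looked up word by word, “coach” would get the same translation in both sentences; as part of a bigram it is translated correctly in each.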
Moses
Moses is an Open Source SMT system. Moses has distinct
advantages over several other SMT systems:
1 It’s Open Source
Actively developed, and supported by a large community
2 Factored Models
Moses is able to make use of linguistic information.
3 Open Data
The Moses developers also recognise the importance of Free
Linguistic Data, and have provided the EuroParl corpus so
that others may build a statistical MT system using it. Also,
the maintainers of the JRC Acquis, the corpus of EU legal text
(and most of the data behind Google Translate’s support for
most official EU languages), have prepared their corpus for
use with Moses.
Apertium
Apertium is an Open Source Machine Translation platform.
Rule Based
Statistical disambiguation
Follows the UNIX philosophy:
  The system is a pipeline
  Each piece “does one thing, and does it well”
    (Not quite: analysis and generation of words are performed by
    separate modes of the same program)
  Each component can be easily replaced
  The apertium program itself is just a shell script that calls
  the correct pipeline.
Several statistics-based tools for building data: dictionaries,
rules
Doesn’t run on Windows yet. Shame on you for cheering! ;)
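The pipeline idea can be illustrated with a toy model in which each stage is a function and the translator is just the stages run in order. Everything in this sketch is hypothetical: the stage names, the tag strings and the two tiny dictionaries are invented, and none of it reflects Apertium’s actual programs or data formats.

```python
from functools import reduce

# Invented data for the sketch: a Spanish lexicon and a bilingual dictionary.
LEXICON = {"gato": ("gato", "n.sg"), "gatos": ("gato", "n.pl")}
BILINGUAL = {"gato": "cat"}

def analyse(words):
    # Surface form -> (lemma, tags); unknown words pass through untagged.
    return [LEXICON.get(w, (w, "unknown")) for w in words]

def transfer(units):
    # Swap each source lemma for its target lemma, keeping the tags.
    return [(BILINGUAL.get(lemma, lemma), tags) for lemma, tags in units]

def generate(units):
    # (lemma, tags) -> surface form in the target language.
    return [lemma + "s" if tags == "n.pl" else lemma for lemma, tags in units]

def run_pipeline(stages, data):
    # Feed the output of each stage into the next, like a shell pipe.
    return reduce(lambda value, stage: stage(value), stages, data)

PIPELINE = [analyse, transfer, generate]
print(run_pipeline(PIPELINE, ["gatos"]))
```

Because each stage only depends on the shape of its input, any one of them can be swapped for a different implementation without touching the others.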
Some Errors
(Spanish – English)
Fondo Monetario Internacional
International Monetary bottom
(Catalan – English)
Fidel Castro
Faithful Castrate
Apertium
Apertium was born from a set of translators developed at the
Universitat d’Alacant, as part of the OpenTrad project. Originally
designed to translate between the Romance languages of Spain, it
has been expanded over time to support more distant languages:
first English–Catalan and, more recently, Basque to Spanish.
Community
As part of the OpenTrad project, Apertium had a community of
developers, but one limited to universities and businesses.
Thanks mostly to Francis Tyers, Apertium has in recent years
begun to also acquire a community of volunteer contributors.
The first release from the volunteer community was our Welsh to
English translator (mostly designed by Kevin Donnelly – who also
maintains the Welsh spell checking data).
This summer, we took part in Google’s Summer of Code
programme, with 8 successful students. One of the translators
developed during GSoC, for Norwegian Bokmål–Nynorsk, has
(within a month of release) been used to translate 30 articles on
the Nynorsk Wikipedia.
Language Pairs Supported
Spanish – Catalan        Spanish – Romanian
French – Catalan         Occitan – Catalan
English – Galician       Occitan – Spanish
Spanish – Portuguese     English – Catalan
English – Spanish        English – Esperanto
Spanish – Galician       French – Spanish
Esperanto – Spanish      Welsh – English
Breton – French          Esperanto – Catalan
Portuguese – Catalan     Portuguese – Galician
Basque – Spanish         Nynorsk – Bokmål