4. Internet
• 90% of content in just 12 languages
• How big an issue is extinction?
• Language transformation vs. transformation
of old media (TV, newspapers, radio)
• Unicode - first major breakthrough
5. Slovenian (my language)
• Roughly 2 million speakers
• More speakers than 96% of languages
• Official EU language - enforcement policies
• Endangerment?
7. Use of foreign words in scientific text where
appropriate Slovenian counterparts exist.
15. The Rosetta Project
• http://rosettaproject.org/
• Publicly accessible digital library
• Aiming to preserve information about
eventually all human languages
16. Preservation of knowledge
contained in a language
• Smithsonian Institution
• Rosetta Project
• UNESCO
• Revitalization (non-extinct)
• Resurrection (extinct)
• Only known successful example: Hebrew
17. Keeping use of a language
viable/economical
• Consistent use
• Dictionaries, tools
• Translation tools
• Advanced language software (TTS, SR)
18. Language technologies
• Machine translation
• Speech synthesis
• Speech recognition
• ...
• Advances in one field accelerate advances
in others by making them more feasible
21. 2005
• Systran (fr.)
• Yahoo!, Altavista Babelfish
• Google
• Rule based + statistical approach
22. Live translation
• Done in 2005 as Ethnocon project
(presented at MS Imagine Cup)
• Speech recognition (language 1)
• Text machine translation (Systran API)
• Speech synthesis (language 2)
• MT quality poor
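The three stages above chain into a simple pipeline. The following is a minimal sketch only: the stage signatures and the toy stand-in functions are hypothetical, and they do not reproduce the actual recognizer, the Systran API, or the synthesizer used in the project.

```python
from typing import Callable

# Hypothetical stage signatures (assumptions, not the project's real interfaces).
Recognizer = Callable[[bytes], str]    # audio in language 1 -> text in language 1
Translator = Callable[[str], str]      # text in language 1 -> text in language 2
Synthesizer = Callable[[str], bytes]   # text in language 2 -> audio in language 2


def live_translate(audio: bytes,
                   recognize: Recognizer,
                   translate: Translator,
                   synthesize: Synthesizer) -> bytes:
    """Chain speech recognition, text MT, and speech synthesis."""
    text_l1 = recognize(audio)
    text_l2 = translate(text_l1)
    return synthesize(text_l2)


# Toy stand-ins so the sketch runs without any speech or MT engine.
TOY_DICTIONARY = {"dober": "good", "dan": "day"}


def fake_recognize(audio: bytes) -> str:
    # Pretend the audio bytes are already the spoken text.
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_DICTIONARY.get(w, w) for w in text.lower().split())


def fake_synthesize(text: str) -> bytes:
    # Pretend the encoded text is the synthesized audio.
    return text.encode("utf-8")


if __name__ == "__main__":
    out = live_translate("Dober dan".encode("utf-8"),
                         fake_recognize, fake_translate, fake_synthesize)
    print(out.decode("utf-8"))
```

Because each stage feeds the next, an error in recognition or translation propagates to the spoken output, which is why poor MT quality limits the whole chain.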
23. 2006+
• Google Translate moves away from Systran
• Google obtained United Nations parallel
corpora
• Words = data, grammar = code
• Purely statistical approach (a huge amount
of data, little code)
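To illustrate how translations can be learned from parallel text alone, here is a minimal sketch of one classic statistical alignment model (IBM Model 1, trained with expectation-maximization). It is a swapped-in textbook technique, not a description of Google's system, and the three-sentence Slovenian-English corpus is a made-up toy example.

```python
from collections import defaultdict

# Toy parallel corpus (invented for illustration): (source words, target words).
CORPUS = [
    ("rdeča hiša".split(), "red house".split()),
    ("rdeča knjiga".split(), "red book".split()),
    ("nova knjiga".split(), "new book".split()),
]


def train_ibm1(pairs, iterations=10):
    """Estimate t(target word | source word) with EM (IBM Model 1, no NULL word)."""
    tgt_vocab = {tw for _, tgt in pairs for tw in tgt}
    # Start with a uniform score for every co-occurring word pair.
    t = {(tw, sw): 1.0 / len(tgt_vocab)
         for src, tgt in pairs for sw in src for tw in tgt}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each target word over the source words in its sentence.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize the expected counts per source word.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def best_translation(t, source_word):
    """Return the target word with the highest learned probability."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


table = train_ibm1(CORPUS)

if __name__ == "__main__":
    for word in ["rdeča", "hiša", "knjiga", "nova"]:
        print(word, "->", best_translation(table, word))
```

No grammar rules or dictionaries are coded here. The word correspondences come entirely from co-occurrence in the data, which is the sense in which more parallel text can substitute for hand-written rules.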
35. Google Translator Toolkit
• June, 2009 (200+ languages in October)
• “Open Trados”
• Global parallel TM
• Google TT + Google Translate
• 345 languages, 10,664 language pairs
38. Google Translator Toolkit
• Incentive for professionals: productivity
• Motivated to contribute to global TM
• GT pre-translates text with:
• Huge parallel corpora
• Professional translation!
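The pre-translation idea can be sketched as a translation memory (TM) lookup with a machine translation fallback. This is a minimal illustration under stated assumptions: the function names, the 0.8 similarity threshold, and the use of Python's difflib for fuzzy matching are all hypothetical choices, not how Google Translator Toolkit is implemented.

```python
from difflib import SequenceMatcher


def pretranslate(segment, memory, machine_translate, threshold=0.8):
    """Return (translation, origin), where origin is "tm" or "mt".

    memory maps source segments to professional translations.
    machine_translate is a hypothetical fallback callable.
    """
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        # Close enough to an earlier professional translation: reuse it.
        return memory[best_source], "tm"
    # No good match in the memory: fall back to machine translation.
    return machine_translate(segment), "mt"


def contribute(memory, segment, corrected_translation):
    """Store the translator's final version so later lookups can reuse it."""
    memory[segment] = corrected_translation


if __name__ == "__main__":
    tm = {"Good morning": "Dobro jutro"}
    stub_mt = lambda s: "[MT] " + s  # stand-in for a real MT engine
    print(pretranslate("Good morning!", tm, stub_mt))
    print(pretranslate("Where is the station?", tm, stub_mt))
    contribute(tm, "Where is the station?", "Kje je postaja?")
    print(pretranslate("Where is the station?", tm, stub_mt))
```

Each corrected segment that is written back makes later lookups more likely to hit the memory, which is the productivity incentive for the translator and the source of new parallel data.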
39. Professional translations are fed into the
crowdsourced Google Translate parallel
corpora.
Like Wikipedia with professional editors.
Huge quality gains over time if Google
Translator Toolkit takes off.