This document discusses an MT case study of localizing Spanish translations for different Latin American countries on a small budget. It describes translating from English to standard Spanish, then using MT to adapt the Spanish to different Latin American variants like Argentine Spanish, Mexican Spanish, etc. Testing showed the prototype engine reduced errors from 7.78% to 1.21% compared to human localization. While not perfect, combining human and machine translation helped achieve high quality at a lower cost. Further work would focus on improving glossaries and expanding language coverage.
2013 GALA Miami: Breaking into Latin American Markets on a Small Budget
1. An MT Case Study: Breaking into Latin American Markets on a Small Budget
María Azqueta (SeproTec) & Diego Bartolomé (tauyou)
2. Spanish Worldwide
Spanish Language:
• Also known as Castellano.
• Latin-derived Romance language.
• Spanish is one of the six official languages of
the United Nations and an official language of
the European Union.
4. Spanish Worldwide
[Bar chart: native speakers, in millions]
Mandarin Chinese: 955 million
Spanish: 407 million
English: 360 million
Hindi/Urdu: 311 million
Second most spoken language by number of native speakers
5. Spanish Worldwide
• For demographic reasons, the percentage of the
world’s population that speaks Spanish as a native
language is increasing, while the percentage of
Chinese and English speakers is decreasing.
• Within three or four generations, […]% of the world’s
population will communicate in Spanish.
• In 2050, the United States will be the world’s
foremost Spanish-speaking country.
6. Spanish on the Internet
• Spanish is the third most widely used language on
the Net.
• The use of Spanish on the Net has experienced a
growth rate of 807.4% between 2000 and 2011.
• Spain and Mexico are among the 20 countries with
the highest number of internet users.
• The demand for documents in Spanish is the fourth
largest from among the world’s languages.
7. Spanish Worldwide and its Differences
High demand for translations into Spanish.
But… is the same Spanish spoken
everywhere?
8. Spanish Worldwide and its Differences
RAE (Royal Spanish Academy) :
– Created in the 18th century, it is widely seen as
the arbiter of what is considered standard
Spanish.
– It produces authoritative dictionaries and
grammar guides.
– Although its decisions are not formally binding,
they are widely followed in both Spain and Latin
America.
9. Spanish Worldwide and its Differences
Different dialects and many differences:
• Lexical variations
• Grammatical differences
• Idioms
10. Spanish Worldwide and its Differences
Market Trend:
• ‘Neutral’ or ‘International’ Spanish
• Latin American Spanish & European Spanish
11. Why Adapt to the Local Spanish of Each Country?
To reach different markets
People are most likely to buy when a product is
advertised in their dialect
12. Why Adapt to the Local Spanish of Each Country?
Client A (Gaming Industry)
EN: Take a card from the deck
ES: Coge una carta de la baraja
13. Why Adapt to the Local Spanish of Each Country?
ES: Coge una carta de la baraja
AR: Agarrá una carta del mazo
CL: Toma una carta del naipe
CO: Coge una carta de la baraja
MX: Saca una carta de la baraja
PR: Coge una carta de la baraja
16. Advise Clients
If you really want to break into a specific
market, you must decide which country
you want to target and localize your
material for the different Spanish dialects
spoken in each individual country.
18. Is there a cost-efficient solution on the market?
19. tauyou MT Solution at SeproTec
Hybrid machine translation since January 2011
Languages: EN, ES, PT, GA, FR, IT…
Domains: Legal, Technical…
Glossaries and forbidden words lists
Average translated words per month: 700,000
21. Final Scope of the Project
Human translation + revision
English > Spanish (Spain)
MT of Spanish (Spain) into the Spanish of:
• Argentina
• Chile
• Colombia
• Mexico
• Puerto Rico
22. Initial Approach for Latin American MT
Traditional Workflow
1. Gather translation memories (EN → ES-XX)
2. Add generic material
3. Develop engine
4. Add linguistic pre- and post-processing
5. Improve quality over time
23. Drawbacks
Varying MT Quality
Depending on the domain and dialect
Initial Inconsistencies among Dialects
Handled with glossaries
Medium Post-Editing Effort
Could be improved over time
24. New Approach
Translate EN to Standard ES
Via standard high-quality human translation
Convert Standard ES to Latin American Variants
From Spanish to Spanish
Better final quality is achieved
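The Spanish-to-Spanish step can be pictured with a toy glossary lookup. This is only a minimal sketch under stated assumptions, not the hybrid engine described in the talk: the `GLOSSARIES` table, `match_case`, and `localize` are hypothetical names, and the entries are taken solely from the card example on slide 13.

```python
import re

# Hypothetical per-country glossaries built from the slide 13 card example.
# Keys are European Spanish phrases; values are the local equivalents.
GLOSSARIES = {
    "AR": {"coge": "agarrá", "de la baraja": "del mazo"},
    "CL": {"coge": "toma", "de la baraja": "del naipe"},
    "MX": {"coge": "saca"},
    "CO": {},  # the slide shows no change for Colombia
    "PR": {},  # the slide shows no change for Puerto Rico
}


def match_case(source, replacement):
    """Capitalize the replacement when the matched source text was capitalized."""
    if source[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement


def localize(text, country):
    """Apply one country's glossary to a European Spanish sentence."""
    glossary = GLOSSARIES[country]
    if not glossary:
        return text
    # Longest phrases first, so multi-word entries win over single words.
    phrases = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )

    def swap(match):
        found = match.group(0)
        return match_case(found, glossary[found.lower()])

    return pattern.sub(swap, text)


for code in ("AR", "CL", "CO", "MX", "PR"):
    print(code, localize("Coge una carta de la baraja", code))
```

A plain lookup like this cannot handle the agreement and verb-tense issues reported later in the deck, which is presumably why the real engine adds linguistic pre- and post-processing.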
26. Testing the Prototype Engine
• Several texts were extracted (fashion, real estate, human resources, automobile)
• They were sent to linguists and/or translators in each target country for localization
• The engine performed the same localizations
• Human and machine localization results were compared
27. First Bug Report
• Not all terms were localized
• Concordance issues (masc./fem.; sing./pl.)
• Verbal tenses for Argentina
Human vs. Machine
MT: 7.78% error rate
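The slides do not say how the error rate was calculated. As one possible illustration, the sketch below assumes a word-level measure: the share of words in the human localization that the machine output fails to reproduce. The function name `error_rate` and this definition are assumptions, not the metric used in the study.

```python
from difflib import SequenceMatcher


def error_rate(reference, candidate):
    """Percentage of reference words that the candidate does not reproduce.

    Assumed definition for illustration only; the talk does not specify
    how its error rate was measured.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens:
        return 0.0
    matcher = SequenceMatcher(a=ref_tokens, b=cand_tokens, autojunk=False)
    # Count the words that appear, in order, in both versions.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * (len(ref_tokens) - matched) / len(ref_tokens)


human = "Agarrá una carta del mazo"
machine = "Agarrá una carta de la baraja"  # terms left unlocalized
print(f"{error_rate(human, machine):.2f}% error rate")
```

A measure like this treats the human localization as ground truth, which the following slides call into question.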
28. First Bug Report
Some terms were changed/localized by the
engine, but not by the humans.
(example)
Human error or MT error?
29. Testing the Prototype Engine
A glossary was created by
extracting the terms localized by the
linguists/translators.
This glossary was then sent to
the same people who localized
the texts to verify that all the
terms were correctly localized
and nothing was missing.
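Glossary extraction of this kind can be approximated by aligning each source sentence with its human localization and collecting the word runs that were replaced. The sketch below is a simplified illustration; `extract_glossary` is a hypothetical helper, and the slides do not describe the actual extraction procedure.

```python
from difflib import SequenceMatcher


def extract_glossary(source, localized):
    """Return {source phrase: localized phrase} for each replaced word run."""
    src = source.split()
    loc = localized.split()
    matcher = SequenceMatcher(a=src, b=loc, autojunk=False)
    glossary = {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" marks a run of source words swapped for different words.
        if tag == "replace":
            src_phrase = " ".join(src[i1:i2]).lower()
            loc_phrase = " ".join(loc[j1:j2]).lower()
            glossary[src_phrase] = loc_phrase
    return glossary


print(extract_glossary("Coge una carta de la baraja",
                       "Agarrá una carta del mazo"))
```

Candidate pairs found this way would still need the verification step described on the slide, since a difference between two texts is not always a genuine localization.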
31. Testing the Prototype Engine
People can miss things.
Although many different variants of Spanish
exist, Spanish speakers understand many
terms that are foreign to their own dialect
when they read them in context,
sometimes to the point of accepting them
as their own. I believe that this may be
due to the phenomenon of globalization
and the internet.
34. Conclusions
Human localization is not perfect.
MT is not perfect either.
Combining human and machine translation
helps achieve high quality and reduce cost.
35. Further Work
Improving Glossaries
Through a simple web interface for PE
Extending Spanish Language Coverage
More dialects
Traductor.cervantes.es
Incorporating more languages
English, French and Portuguese
36. Bibliography
Yule, G. (2006). The Study of Language (3rd ed.). New York: Cambridge University Press.
RAE
Instituto Cervantes
http://www.linguapress.com
37. THANK YOU FOR YOUR TIME!
María Azqueta
mazqueta@seprotec.com
Diego Bartolomé
diego.bartolome@tauyou.com