Presentation by CPSL and tauyou at the tekom annual conference. It presents a case study of a successful implementation of machine translation at a mid-size language service provider.
2011 Tekom Wiesbaden: Implementation of a machine translation engine at CPSL
1. Implementation of a Machine Translation Engine at CPSL
Speaker: Belén García-Ochoa (CPSL)
Co-speaker: Diego Bartolomé (tauyou <language technology>)
2. The speaker
Localization Director at CPSL
CPSL has been a multilingual service
provider since 1963
Headquarters in Barcelona, Spain
Other offices in:
Madrid, Spain
Germany
UK
CPSL staff includes over 50 people
Belén García-Ochoa
3. The co-speaker
CEO tauyou <language technology>
tauyou has provided language
technologies for the localization
industry since 2006
Main clients: medium-sized LSPs
Headquarters in Barcelona
Diego Bartolomé
4. CPSL and Machine Translation
Post-editing services provided to a software
company for a huge project
Lots of translated words in a tight timeframe
7. Human post-editing vs. human translation
The standard number of words that a translator can translate per day is 2,500.
The standard number of words that a reviewer of human translation can review per day is 12,000.
The average number of words that can be post-edited per day is 8,000.
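As a rough illustration of what these daily rates imply, the sketch below compares the person-days needed for one project under each rate. The daily figures are the ones quoted on the slide; the 100,000-word project size and the function name `person_days` are assumptions chosen only to make the arithmetic concrete.

```python
import math

# Daily throughput figures quoted on the slide (words per person per day).
TRANSLATION_RATE = 2_500
REVIEW_RATE = 12_000
POST_EDITING_RATE = 8_000


def person_days(words: int, words_per_day: int) -> int:
    """Return the whole person-days needed to process `words` at the given rate."""
    return math.ceil(words / words_per_day)


# Hypothetical project size, not a figure from the presentation.
project_words = 100_000

translation_days = person_days(project_words, TRANSLATION_RATE)
review_days = person_days(project_words, REVIEW_RATE)
post_editing_days = person_days(project_words, POST_EDITING_RATE)

print(f"Human translation: {translation_days} person-days")
print(f"Review of human translation: {review_days} person-days")
print(f"Post-editing of MT output: {post_editing_days} person-days")
```

Under these assumptions, post-editing needs roughly a third of the person-days of translating from scratch. The sketch does not model quality or any review pass after post-editing.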
8. The tauyou solution
Dedicated hybrid machine translation engine that is continuously customized
Corpus-based with rules for pre- and post-processing
Data confidentiality is guaranteed
Translation speed
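The idea of a corpus-based engine wrapped in rules for pre- and post-processing can be sketched as follows. This is not the tauyou implementation, which the slides do not describe: the rule format (regular expression plus replacement), the example rules, and `toy_engine` (a tiny word-for-word lookup standing in for a trained engine) are all assumptions for illustration.

```python
import re
from typing import Callable, List, Tuple

Rule = Tuple[str, str]  # (regular expression, replacement)


def apply_rules(text: str, rules: List[Rule]) -> str:
    """Apply each regex rule in order and return the rewritten text."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def translate_hybrid(
    source: str,
    engine: Callable[[str], str],
    pre_rules: List[Rule],
    post_rules: List[Rule],
) -> str:
    """Pre-process the source, run the engine, then post-process its output."""
    prepared = apply_rules(source, pre_rules)
    raw_output = engine(prepared)
    return apply_rules(raw_output, post_rules)


# Hypothetical rules: collapse runs of whitespace before translation,
# and remove the space before punctuation afterwards.
PRE_RULES: List[Rule] = [(r"\s+", " ")]
POST_RULES: List[Rule] = [(r"\s+([.,;:!?])", r"\1")]


def toy_engine(text: str) -> str:
    """Stand-in for the corpus-based engine: a tiny word-for-word lookup."""
    table = {"hello": "hola", "world": "mundo"}
    return " ".join(table.get(token.lower(), token) for token in text.split())


print(translate_hybrid("Hello   world !", toy_engine, PRE_RULES, POST_RULES))
```

Keeping the engine as a plain callable means the rules can be updated without retraining, which fits the "continuously customized" description, though the actual update mechanism is not specified in the slides.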
9. Main characteristics
Any type of document
Glossary prioritization
Fast domain creation/update
Fully customizable
Quality metrics computation
Terminology extraction
10. Optimum domain creation
gather in-domain data
train the translation solution
enrich solution with related text
terminology prioritization
update the translation solution
add rules to enhance quality
weekly updates
11. CPSL workflow 1
Optimize translation quality for a client
gather client data
train the translation solution
add rules to enhance quality
continuous improvement
12. CPSL workflow 2
General purpose translator
gather clients' data
add generic texts to provide a good sample
train the translation solution
add rules to enhance quality
periodic improvement
13. Other use cases
Data creation and enhancement
user defined
unaligned translated documents
generic translations
optimum corpus/memories creation
rule-based extension/filtering
15. Quality metrics
Detailed analysis of translated documents
Several customized parameters, including word error rate, number of word edits, tag differences, etc.
Useful in machine translation but also in the normal quality process
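Two of the parameters named on this slide, number of word edits and word error rate, can be computed with a word-level edit distance. The sketch below uses the standard definition (edits divided by the number of reference words) and assumes the post-edited text is taken as the reference and the raw MT output as the hypothesis; the slides do not say how tauyou computes these values. The function names and the example sentences are illustrative.

```python
from typing import List


def word_edits(hypothesis: List[str], reference: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(reference) + 1))
    for i, hyp_word in enumerate(hypothesis, start=1):
        current = [i]
        for j, ref_word in enumerate(reference, start=1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(
                previous[j] + 1,         # delete a hypothesis word
                current[j - 1] + 1,      # insert a reference word
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word edits divided by the number of words in the reference."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edits(hypothesis.split(), ref_words) / len(ref_words)


# Illustrative sentences, not data from the presentation.
mt_output = "the engine translate documents quickly"
post_edited = "the engine translates the documents quickly"

print(word_edits(mt_output.split(), post_edited.split()))
print(round(word_error_rate(mt_output, post_edited), 3))
```

Tag differences would need a separate comparison of the markup in source and target, which this sketch does not cover.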
16. Terminology extraction
Unilingual and bilingual terminology lists
Customized according to position in the sentence, word type, number of words, etc.
Feed the MT engine or a tool for human translators
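A unilingual term list filtered by number of words and word type could be approximated with frequency counts over n-grams. The sketch below is a deliberately simple stand-in, not the method used in the presentation: the stopword list, the `max_words` and `min_count` parameters, and the rule that a candidate may not begin or end with a stopword are all assumptions.

```python
import re
from collections import Counter
from typing import List, Tuple

# Small hypothetical stopword list; a real one would be language-specific.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "and", "for"}


def extract_terms(
    sentences: List[str], max_words: int = 2, min_count: int = 2
) -> List[Tuple[str, int]]:
    """Return candidate terms (n-grams) that occur at least `min_count` times."""
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for size in range(1, max_words + 1):
            for start in range(len(tokens) - size + 1):
                ngram = tokens[start:start + size]
                # Skip candidates that begin or end with a stopword.
                if ngram[0] in STOPWORDS or ngram[-1] in STOPWORDS:
                    continue
                counts[" ".join(ngram)] += 1
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


# Illustrative sentences, not data from the presentation.
sentences = [
    "The translation memory stores segments.",
    "Update the translation memory weekly.",
]

for term, count in extract_terms(sentences):
    print(term, count)
```

The resulting list could then be reviewed by a terminologist before it is fed to the MT engine or to a translator's tool. The tokenizer here only handles unaccented Latin letters, so other languages would need a different pattern.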
17. The future
Increase usage of translation memories
Automatic domain classification
Source text enhancement
spelling, grammar, structure, terminology ...
Special words detection
New domains/language pairs creation