MT is useful, and it becomes more useful when it is customized to the terminology and style of the documents to be translated. Customization takes extra work, though not much. In this talk you’ll get an overview of MT domain customization: its benefits, pitfalls, and conditions for making it work, as well as the actual work involved and which training documents help and which do not. The theory of MT. Introduction to MT: a short history and the pros and cons of different techniques. Statistical MT versus rule-based MT, what the brand-new model-based MT can offer, hybridization, and the challenges and possible breakthroughs.
7. Slide 8
Start With
•Parallel sentences
•Monolingual data
•Decoding Algorithm
Build These Components
•Translation Model
•Language Model – P(E)
•Decoder
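How the three components could fit together, as a minimal sketch only. It assumes the textbook noisy-channel formulation, in which the decoder picks the candidate E that maximizes P(F|E) · P(E). The slide names only P(E), so the P(F|E) notation, the function name `decode`, the probability floor, and the toy tables are all assumptions made for illustration, not the Microsoft Translator implementation.

```python
import math

def decode(source, candidates, translation_model, language_model):
    """Pick the candidate E maximizing log P(F|E) + log P(E) (noisy-channel sketch)."""
    floor = 1e-12  # assumed floor so unseen entries do not cause log(0)

    def score(candidate):
        p_f_given_e = translation_model.get((source, candidate), floor)
        p_e = language_model.get(candidate, floor)
        return math.log(p_f_given_e) + math.log(p_e)

    return max(candidates, key=score)

# Toy tables, invented for illustration only.
translation_model = {
    ("das Haus", "the house"): 0.5,  # P(F|E): how well E explains the source
    ("das Haus", "house the"): 0.5,
}
language_model = {
    "the house": 0.09,   # P(E): fluent target-language word order
    "house the": 0.001,  # same words, unlikely order
}

best = decode("das Haus", ["house the", "the house"], translation_model, language_model)
print(best)
```

In this toy case the translation model cannot tell the two candidates apart, so the language model decides. That is the role the slide assigns to P(E).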
9. Slide 10
Diagram components:
•Your site or application
•Translator Service (Custom Models, Generic Models)
•Collaborative Translations Store
•Microsoft Translator Hub
•Your own, previously translated documents
Diagram flows:
•Supply Corrections
•Consume Translations
•Supply Documents
•Build custom models
•Import Corrections for training
10. Slide 11
API methods:
•Translate()
•AddTranslation()
•GetTranslations()
•GetUserTranslations()
•Speak()
•Detect()
•BreakSentences()
Thorough customization
•Retrain every 2 months, or 20000 segments
Continuous Improvement
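The retraining rule of thumb on this slide can be written as a simple check. This is a sketch under assumptions: "2 months" is approximated as 60 days, the two conditions are read as "whichever comes first", and the function and constant names are hypothetical.

```python
from datetime import date, timedelta

RETRAIN_INTERVAL = timedelta(days=60)  # assumption: "2 months" taken as 60 days
RETRAIN_SEGMENTS = 20_000              # new segments collected since last training

def should_retrain(last_trained, today, new_segments):
    """Return True when either threshold from the slide has been reached."""
    old_enough = (today - last_trained) >= RETRAIN_INTERVAL
    enough_data = new_segments >= RETRAIN_SEGMENTS
    return old_enough or enough_data

print(should_retrain(date(2024, 1, 1), date(2024, 1, 15), 500))
print(should_retrain(date(2024, 1, 1), date(2024, 3, 5), 500))
print(should_retrain(date(2024, 1, 1), date(2024, 1, 15), 20_000))
```

The first call stays below both thresholds. The second passes the time threshold and the third passes the segment threshold.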
11. Slide 12
What goes in | What it does | Rules to follow
[label missing] | Calculate the BLEU score – just for you. | Be strict. Compose them to be optimally representative of what you are going to translate in the future.
Dictionaries | Forces the given translation with a probability of 1. | Be restrictive. Safe to use only for compound nouns and named entities. Better to not use and let the system learn.
[label missing] | Build the translation model aka phrase table. Teaches how to translate. | Be liberal. Any in-domain human translation is better than MT. Add and remove documents as you go and try to improve the score.
[label missing] | Build the target language model. Improve grammar and fluency. | Be liberal. Use any in-domain target language material you can get.
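For readers who have not seen how a BLEU score is computed, here is a simplified sketch. It assumes one reference per sentence, whitespace tokenization, n-grams up to length 4, no smoothing, and a 0 to 100 scale. The function names are hypothetical, and the scoring used by the Hub may differ in its details.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), single reference per sentence, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: an n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any empty n-gram order gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output that is shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)

reference = ["click the save button to store your changes"]
print(corpus_bleu(["click the save button to store your changes"], reference))
print(corpus_bleu(["click the save button to keep your changes"], reference))
```

Because the score depends entirely on the chosen test sentences, it is only comparable between systems scored on the same test set. That fits the slide's remark that the score is "just for you".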
12. Slide 13
• Humans can easily detect 0.5 to 1.0 points
Faster post-editing
Higher document comprehension
• Small: Higher improvement within the domain
• Large: Better suited for input variability; better exploitation of training docs
Better to build a larger domain (lower BLEU delta)
15. Slide 16
Post-Editing
•Goal: Human translation quality
•Increase human translator’s productivity
•In practice: 0% to 25% productivity increase
•Varies by content, style and language
Raw publishing
•Goals:
  •Good enough for the purpose
  •Speed
  •Cost
•Publish the output of the MT system directly to the end user
•Best with bilingual UI
•Good results with technical audiences
•Cost-effective way for inbound material
•Triage
•Analysis and classification
P3 – Post-Publish Post-Editing
•Know what you are having humans translate, and why
•Make use of community
  •Domain experts
  •Enthusiasts
  •Employees
  •Professional translators
•Best of both worlds
  •Fast
  •Better than raw
  •Always current
16. Slide 17
Assimilation | Dissemination | Post-Edit
Use customized machine translation
Never miss a chance to collect a human edit
Make the source visible on demand | Show the source
Show domain-relevant dictionaries
Apply TM with 100% | Apply TM with 80%
Reveal alternatives
Publish raw first, collect human feedback | Use modern, collaborative TM systems (e.g. MemSource)
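The "Apply TM with 100%" and "Apply TM with 80%" entries refer to translation memory match thresholds. A minimal sketch of the idea follows. It assumes a character-level similarity from Python's `difflib` as a stand-in for a real fuzzy-match score; TM systems compute their own match percentages. The `tm_lookup` name and the memory contents are hypothetical.

```python
from difflib import SequenceMatcher

def tm_lookup(source, memory, threshold):
    """Return (target, score) for the best TM entry at or above threshold, else (None, score)."""
    best_target, best_score = None, 0.0
    for tm_source, tm_target in memory.items():
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if score > best_score:
            best_target, best_score = tm_target, score
    if best_score >= threshold:
        return best_target, best_score
    return None, best_score

# Toy translation memory, invented for illustration.
memory = {"Click the Save button.": "Klicken Sie auf die Schaltfläche Speichern."}

# Exact matches only (the 100% setting).
print(tm_lookup("Click the Save button", memory, threshold=1.0))
# Fuzzy matches accepted (the 80% setting), leaving the rest to a post-editor.
print(tm_lookup("Click the Save button", memory, threshold=0.8))
```

With the strict threshold, the near match is rejected and the segment would go to MT. With the 80% threshold, the stored translation is offered as a starting point for a human edit.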
Statistical machine translation uses a large corpus of parallel texts (the same content in two languages) to train statistical models that then help translate new text. The Microsoft Translator machine translation service is built on these principles and serves millions of translations every day through various Microsoft products (Bing, Office, Internet Explorer) and a rich API.
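To make "training statistical models from parallel text" concrete, here is a sketch of IBM Model 1, a classic textbook word-translation model trained with expectation-maximization. It is an illustration of the general principle, not a description of how Microsoft Translator is built. The function name, the three-sentence corpus, and the iteration count are assumptions.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=20):
    """Estimate t[(f, e)] = P(source word f | target word e) with EM (IBM Model 1, no NULL word)."""
    src_vocab = {f for f_sent, _ in pairs for f in f_sent}
    uniform = 1.0 / len(src_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform distribution
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each source word over the target words in its sentence.
        for f_sent, e_sent in pairs:
            for f in f_sent:
                norm = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Tiny parallel corpus (source tokens, target tokens), invented for illustration.
pairs = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm1(pairs)
print(round(t[("das", "the")], 3), round(t[("Haus", "the")], 3))
```

Even with three sentence pairs and no dictionary, the model learns that "das" pairs with "the" because the two words co-occur across sentences. Larger in-domain parallel corpora sharpen these estimates, which is why previously translated documents are valuable training material.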
Slide is heavily animated.
Although this example demonstrates the technology with deaf or hard of hearing students, it’s not much of a stretch, since the components already exist, to adapt it to hearing students who speak other languages. In fact, it could be used in that manner now. We haven’t tested it in this scenario…yet.