TAUS Scotland Asia Online Technology Platform V1
1. Translation Technology Platform
Kirti Vashee
VP Sales, Asia Online
Kirti.vashee@asiaonline.net
2. The Consumer Market (Large Buyer & Publisher Perspective)
• Revolutionize the Internet experience for non-English speakers in Asia
• Provide 1 billion+ local-language pages online using mostly translated open-license content, combined with compelling portal and social-networking-style services in Thailand, Indonesia, India, Malaysia, the Philippines, Vietnam, China, Japan and Korea
The Enterprise Market (Translation Tools Vendor Perspective)
• Revolutionize the enterprise translation process with a comprehensive, continuous-learning SMT platform
• A SaaS environment that allows data cleaning and preparation, on-demand SMT engine development, and ongoing, comprehensive post-editing and correction to continuously improve engines
TM Copyright © 2008, All Rights Reserved
3. • The only SMT technology provider that is also a major user of ALT technology, on one of the largest translation projects in the world: translating English Wikipedia (1B+ words) into 11 Asian languages using SMT and crowdsourcing
• The translation tools and technology platform used to accomplish this are also being made available as a SaaS product for the enterprise translation market
4. Battlefield of words
Fusion with customer support
Continuous translation
Community translation
Industry-shared language data
Massive online collaboration
Translation automation
5. [Diagram contrasting the User Manual with Interactive Support; channels shown: user manuals, support documentation, knowledge base, knowledge base data, email, instant messaging, voice, blogs, user-generated content]
• Web 2.0 is much more interactive and dynamic
• Globalization will be further driven by Internet penetration into Asia
• Word-of-mouth marketing is gaining prominence all over the world
• Unstructured content in blogs and review sites is becoming critical
• The dialogue with global customers needs to be more interactive
6. Continuous Improvement HDSMT Engines
[Diagram labels: The Global Customer; Sales / Marketing; Product Management; Human Resources; Content Management; Customer Support; CRM; Business Intelligence; ECM; BPM; Blogs; Email; IM]
• Highly adaptive, human-driven process for continuous improvement of output quality in SMT engines and translation automation
• Intensive collaboration with human translators to raise SMT quality
• Integration with content creation and content refinement tools to enhance speed and improve business process management
• Continued evolution in standards to facilitate the sharing of linguistic assets
7. • Comprehensive SaaS platform that facilitates the translation and continued refinement of any large, high-value translatable corpus using HDSMT
• Existing Feature Set
– Data Cleaning & Preparation Tools
– On-demand SMT engine development
– Support for both user-created and online dictionaries and glossaries
– Ability to pool data for greater leverage
– Multi-level domain support
– Seamless integration with a collaborative post-editing environment
– Real-time updates of translated assets
– Web services-based APIs for integration
• System and process foundation for managed online community collaboration
8. Data Management
• Bilingual Data Preparation & Cleaning
• Bilingual Data Normalization & Optimization
• Source Cleanup and Preparation
• Grammar and Spelling validation
• Monolingual Data Extraction & Analysis
SMT Engine
• SMT System Training & Development
• Monolingual Data Training
• Ongoing Corpus Refinement and Tuning
• Analysis and Evaluation of N-grams
Output Proofing & Editing
• Error Pattern Identification & Correction
• Automated error correction tools
• Continuing Cycle of Exception Identification and Correction
• Development of small sets of new data to correct errors
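The slide lists "Analysis and Evaluation of Ngrams" but does not say how Asia Online performs it. Under that caveat, here is a minimal sketch of one simple form of n-gram analysis. It counts the n-grams in target-side training text and flags any n-gram in engine output that never occurs there, so that a human can review it as a candidate error pattern. Whitespace tokenization, lowercasing and the helper names are illustrative assumptions. For unsegmented languages such as Thai, Chinese or Japanese, a word segmenter would have to run first.

```python
from collections import Counter
from typing import Iterable


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous n-grams in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(sentences: Iterable[str], n: int) -> Counter:
    """Count n-grams across sentences (lowercased, whitespace-tokenized)."""
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), n))
    return counts


def unseen_ngrams(output: str, reference: Counter, n: int) -> list[tuple[str, ...]]:
    """N-grams in engine output that never occur in the reference counts.

    These are candidates for human review, not confirmed errors.
    """
    return [g for g in ngrams(output.lower().split(), n) if g not in reference]


# Example: target-side training text vs. one line of engine output.
training_target = ["the cat sat on the mat", "the dog sat on the rug"]
counts = ngram_counts(training_target, 2)
print(unseen_ngrams("the mat sat on the cat", counts, 2))  # [('mat', 'sat')]
```

An unseen n-gram is only a signal. Rare but correct phrasing will also be flagged, which is why the slide pairs analysis with human-driven correction.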
10. • Data Cleaning Utilities to normalize and standardize data
prior to consolidation to provide maximum leverage
• A recent study for TAUS proves conclusively that sharing clean data provides leverage
– A smaller amount of clean data can produce better results than datasets twice as large
– Consistent terminology matters and provides real leverage
– Data optimized for TM tools can be “dirty data” for SMT
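The slide does not describe Asia Online's data cleaning utilities. The following is a minimal, hypothetical sketch of the kind of normalization and filtering applied to a bilingual corpus before consolidation. Each pair goes through these steps:

1. Apply Unicode NFC normalization.
2. Strip inline markup of the sort TM exports often carry (one reason TM-ready data can be "dirty" for SMT).
3. Collapse whitespace.
4. Drop empty pairs and exact duplicates.
5. Drop pairs whose length ratio suggests misalignment.

The function names, the crude tag regex and the 3.0 ratio threshold are all assumptions. The threshold in particular would need tuning per language pair.

```python
import re
import unicodedata
from typing import Iterable

TAG_RE = re.compile(r"<[^>]+>")  # crude: inline tags left over from TM/CAT exports
SPACE_RE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Unicode-normalize (NFC), strip inline tags and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = TAG_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def clean_pairs(pairs: Iterable[tuple[str, str]], max_ratio: float = 3.0) -> list[tuple[str, str]]:
    """Return normalized, de-duplicated sentence pairs.

    Drops pairs with an empty side, and pairs whose character-length ratio
    exceeds max_ratio (a likely misalignment). max_ratio is an assumed default.
    """
    seen: set[tuple[str, str]] = set()
    cleaned: list[tuple[str, str]] = []
    for src, tgt in pairs:
        src, tgt = normalize(src), normalize(tgt)
        if not src or not tgt:
            continue  # one side is empty after cleaning
        if max(len(src), len(tgt)) / min(len(src), len(tgt)) > max_ratio:
            continue  # lengths too different to be a plausible translation pair
        if (src, tgt) in seen:
            continue  # exact duplicate after normalization
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned
```

Normalizing before de-duplicating matters here. Two pairs that differ only in tags or spacing collapse into one, which is the "consistent data provides leverage" point in miniature.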
11. • Initial system put into production
• Trained internal experts begin the initial clean-up and correction process
• Expert users are also allowed to make changes
• All users are allowed to suggest changes, which go through a vetting process (community process)
• Changes are collected and added to the initial corpus to drive continuous retraining
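The correction cycle on this slide can be modeled in a few lines. This is a hypothetical sketch, not Asia Online's implementation. It reads the slide as saying that internal experts and expert users make changes directly, while changes suggested by all other users wait in a vetting queue. Accepted changes of either kind then become sentence pairs for the next retraining run. The class names, the two-role split and the queue semantics are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    EXPERT = "expert"        # trained internal experts and expert users
    COMMUNITY = "community"  # all other users


@dataclass
class Correction:
    source: str
    old_target: str
    new_target: str
    author_role: Role


@dataclass
class CorrectionPipeline:
    """Hypothetical model of the expert/community correction loop."""
    pending: list[Correction] = field(default_factory=list)           # awaiting vetting
    retraining_batch: list[Correction] = field(default_factory=list)  # accepted changes

    def submit(self, correction: Correction) -> None:
        # Assumption: expert changes skip vetting; community suggestions are queued.
        if correction.author_role is Role.EXPERT:
            self.retraining_batch.append(correction)
        else:
            self.pending.append(correction)

    def vet(self, correction: Correction, approved: bool) -> None:
        # A reviewer approves or rejects a queued suggestion (ValueError if not queued).
        self.pending.remove(correction)
        if approved:
            self.retraining_batch.append(correction)

    def corpus_additions(self) -> list[tuple[str, str]]:
        # Sentence pairs to add to the training corpus before the next retraining run.
        return [(c.source, c.new_target) for c in self.retraining_batch]
```

Keeping vetting separate from retraining lets the community contribute at scale without unreviewed changes reaching the engine.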
12. Targeted Corrections
[Chart: stages labeled “Initial System”, “Bad Learning”, “Spelling & Terminology” and “Correct”; correction categories: mistranslation, syntax/grammar, terminology, spelling, punctuation]
Human feedback can raise the raw output to previously unseen quality levels
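Terminology is one of the targeted correction categories, and parts of it can be checked automatically against a glossary. The following is a minimal sketch under several assumptions: the glossary is a plain source-term to approved-target-term mapping, source terms match as whole words without regard to case, and the target term only needs to appear somewhere in the translation. Substring matching is used on the target side so the check also works for languages written without spaces. The function name is hypothetical.

```python
import re


def terminology_violations(source: str, target: str,
                           glossary: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source_term, expected_target_term) pairs the translation misses.

    A glossary term is 'used' if it appears in the source as a whole word
    (case-insensitive); the approved target term must then appear in the
    target (case-insensitive substring match).
    """
    missing = []
    for src_term, tgt_term in glossary.items():
        if re.search(rf"\b{re.escape(src_term)}\b", source, re.IGNORECASE):
            if tgt_term.lower() not in target.lower():
                missing.append((src_term, tgt_term))
    return missing


glossary = {"hard drive": "disque dur", "file": "fichier"}
print(terminology_violations("Copy the file to the hard drive.",
                             "Copiez le fichier sur le disque.", glossary))
# [('hard drive', 'disque dur')]
```

A check like this only finds where the approved term is missing. Deciding the correct fix, and whether the glossary itself is right, stays with the human reviewer.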
15. Information Requests: GetAccountInformation, GetAccountUsageHistory, GetAvailableDomainCombinationsForLanguagePair, GetAvailableDomainsForLanguagePair, GetAvailableLanguagePairs, GetCustomDomainsForLanguagePair
Data Storage: CreateDataset, DeleteDataset, DeleteDataFromDataset, DownloadDataset, DownloadDatasetItem, GetDatasetList, GetDatasetItemList, LinkDataToDataset, MergeDatasets, UploadData, UploadGlossary, UploadImage, UploadLanguageModel, UploadMonolingualText, UploadOCRPageLayout, UploadPhrasePairs, UploadTranslationMemory, UploadZIP
Data Preparation: CleanText, ExtractText, NormalizeText, OCRImage, ParagraphAlignLanguagePairText, SentenceAlignLanguagePairText, SentenceSegmentText, SpellCheckText, WordSegmentText
Data Training: CancelTrainingJob, GetTrainingJobList, GetTrainingJobStatus, SubmitDatasetForTraining
Translation: CancelTranslationJob, GetTranslationJobList, GetTranslationJobStatus, SubmitDatasetForTranslation, SubmitSinglePhraseForTranslation
Request parameters:
• sUsername (String): The username of the person making the request.
• sPassword (String): The password of the person making the request.
• iAccountNo (Integer): The account number that this request is associated with.
• iDepartmentNo (Integer): The department number that this request is associated with.
• iLanguagePairCode (Integer): The code for the language pair that is being looked up.
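The slide names the API methods and a few request parameters, but not the wire protocol, endpoint or response format. The following is therefore only a hypothetical client sketch. It shows the credential parameters from the slide attached to every call, plus a polling loop for a job-status method such as GetTrainingJobStatus. Several details are assumptions, not part of the documented API:

- The `Transport` callable stands in for whatever web-services binding is actually used.
- The job-identifier parameter name is unknown.
- The `status` response field and the terminal status values are also unknown.

```python
import time
from dataclasses import asdict, dataclass
from typing import Any, Callable

# Hypothetical transport: (method name, parameters) -> decoded response.
# The real protocol and endpoint are not given on the slide.
Transport = Callable[[str, dict[str, Any]], dict[str, Any]]

# Assumed terminal job states; the real API's values are unknown.
TERMINAL_STATES = ("Completed", "Failed", "Cancelled")


@dataclass
class Credentials:
    # Parameter names and types as listed on the slide.
    sUsername: str
    sPassword: str
    iAccountNo: int
    iDepartmentNo: int


def call(transport: Transport, creds: Credentials, method: str, **params: Any) -> dict[str, Any]:
    """Send one request with the shared credential parameters attached."""
    return transport(method, {**asdict(creds), **params})


def wait_for_job(transport: Transport, creds: Credentials, status_method: str,
                 job_params: dict[str, Any], poll_seconds: float = 5.0,
                 max_polls: int = 100) -> dict[str, Any]:
    """Poll a status method until the job reports a terminal state."""
    for _ in range(max_polls):
        response = call(transport, creds, status_method, **job_params)
        if response.get("status") in TERMINAL_STATES:
            return response
        time.sleep(poll_seconds)
    raise TimeoutError(f"{status_method} did not report completion")
```

A typical flow would upload data, call SubmitDatasetForTraining, and then poll with `wait_for_job(transport, creds, "GetTrainingJobStatus", {...})`. The contents of `{...}` depend on the real API.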
21. [Diagram: content flow between Publishers, Translation Systems, the Asia Online SaaS Portal, Human Proof Readers and Users in Social Networks / Community]
• Publishers provide existing human-translated content for training language engines, which undergo constant improvement
• Publishers leverage the ASP translation service for translation of new material
• Original content is translated to the local language
• Proofreading is still required, whether the translation is human or machine
• Translations are proofread via the ASP proofreading system, combining community principles with paid proofreaders using the Asia Online proofreading system
• New translations are sent back to the publisher
• Translated content is made available to users, who access online content in their local language
22. • Integrated data cleaning, data preparation, SMT systems development and post-editing environment
• Comprehensive proofreading and post-editing environment, integrated with the core SMT engines to enable instant updates
• Greater transparency of many key SMT building blocks, enabling users to see and modify what the system has learned, resulting in greater control and better systems
• A richer and deeper domain taxonomy to ensure the best quality (better systems)
• Incremental additions of new training data to any existing system to enable rapid updates (faster updates)
• Easy handling of terminology, glossaries and dictionaries
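The slide does not explain how incremental additions are implemented in the platform. The sketch below illustrates the general idea only, using a deliberate simplification. It assumes a count-based model component (a toy bigram model with maximum-likelihood estimates, far simpler than a real SMT engine). New sentences are counted and merged into the existing tables, so the original corpus never has to be re-processed. The class and method names are hypothetical.

```python
from collections import Counter


class BigramModel:
    """Toy count-based bigram model that supports incremental updates."""

    def __init__(self) -> None:
        self.bigrams: Counter = Counter()   # counts of (previous word, word)
        self.contexts: Counter = Counter()  # counts of each word as a bigram history

    def add_data(self, sentences: list[str]) -> None:
        # Only the new sentences are counted; existing counts are kept as-is.
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            for prev, word in zip(tokens, tokens[1:]):
                self.bigrams[(prev, word)] += 1
                self.contexts[prev] += 1

    def prob(self, prev: str, word: str) -> float:
        # Maximum-likelihood estimate of P(word | prev); 0.0 for unseen histories.
        total = self.contexts[prev]
        return self.bigrams[(prev, word)] / total if total else 0.0


model = BigramModel()
model.add_data(["the cat sat"])
print(model.prob("the", "cat"))  # 1.0
model.add_data(["the dog sat"])  # incremental update, no recount of earlier data
print(model.prob("the", "cat"))  # 0.5
```

Real engines also re-tune model weights after an update. This sketch covers only the counting step that makes fast incremental additions possible.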