2. Example-Based Machine Translation
• Translation example sets: (S₁→T₁), (S₂→T₂), (S₃→T₃), ...
• Given a query text S, find the closest match S’ for which a pair (S’→T’) exists in the example set
• T’ is accepted as the translation of S
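The lookup described above can be sketched in a few lines. This is only an illustration: the example pairs, the function names, and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions, not the project's actual method.

```python
from difflib import SequenceMatcher

# Hypothetical example set of (S_i, T_i) pairs, made up for illustration.
EXAMPLES = [
    ("good morning", "buenos días"),
    ("good night", "buenas noches"),
    ("thank you very much", "muchas gracias"),
]


def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] (an assumed stand-in metric)."""
    return SequenceMatcher(None, a, b).ratio()


def translate_by_example(query: str, examples=EXAMPLES):
    """Return (S', T', score), where S' is the stored source closest to the query."""
    best_source, best_target = max(
        examples, key=lambda pair: similarity(query, pair[0])
    )
    return best_source, best_target, similarity(query, best_source)


# The query S is not in the set, so the closest S' supplies the translation T'.
source, target, score = translate_by_example("good morning!")
```

A linear scan like this does not scale to a large example set, which is one reason to index the examples by fingerprint instead.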
4. Relationship with Content Addressability
• Content recognizability
• Hash-based fingerprints (winnowing)
• Content recoverability
• By locating or reconstructing
• Unlike other projects such as NDN or Receipt, mine is relatively straightforward
• Simple key-value storage
• Key: hash
• Value: (reference to original text, offset)
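A minimal sketch of the key-value scheme above, assuming character k-grams, a truncated SHA-1 as the hash, and an in-memory dictionary as the store. The function names and the parameters `k` and `w` are illustrative choices, not taken from the project.

```python
import hashlib
from collections import defaultdict


def kgram_hashes(text, k):
    """Hash every k-character substring; returns a list of (hash, offset)."""
    return [
        (int.from_bytes(hashlib.sha1(text[i:i + k].encode("utf-8")).digest()[:4], "big"), i)
        for i in range(len(text) - k + 1)
    ]


def winnow(text, k=5, w=4):
    """Pick the minimum hash in each window of w consecutive k-grams."""
    hashes = kgram_hashes(text, k)
    if not hashes:
        return []
    fingerprints = []
    for start in range(max(1, len(hashes) - w + 1)):
        window = hashes[start:start + w]
        # On ties, prefer the rightmost minimum.
        smallest = min(window, key=lambda h: (h[0], -h[1]))
        # Record each selected (hash, offset) only once.
        if not fingerprints or fingerprints[-1] != smallest:
            fingerprints.append(smallest)
    return fingerprints


def build_index(documents, k=5, w=4):
    """Key: hash. Value: list of (reference to original text, offset)."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for h, offset in winnow(text, k, w):
            index[h].append((doc_id, offset))
    return index


def lookup(index, query, k=5, w=4):
    """Return every (doc_id, offset) that shares a fingerprint with the query."""
    hits = set()
    for h, _ in winnow(query, k, w):
        hits.update(index.get(h, []))
    return hits


# Hypothetical data: one stored text and a query sharing a long substring with it.
index = build_index({"doc1": "the quick brown fox jumps over the lazy dog"})
hits = lookup(index, "a quick brown fox appears")
```

With these parameters, any substring of at least w + k - 1 = 8 characters shared by the query and a stored text produces at least one common fingerprint, so the shared phrase can be located through the stored offset.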
5. Text Matching
• Full-text search may be an effective solution, but...
• Loses information about the order of the query words
• Limited support for phrase search
• Certain linguistic features are ignored (e.g., stop words such as “a” and “the”)
• Matching sufficiently long partial text
• Longer text: lower probability of finding matches
• Shorter text: higher probability of ambiguity (e.g., homonyms, false cognates)
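The full-text search limitations listed on this slide can be seen in a toy bag-of-words model. This is a simplified illustration only; the stop list and the function name are assumptions, and real search engines are more sophisticated.

```python
from collections import Counter

# Assumed stop list, for illustration only.
STOP_WORDS = {"a", "an", "the"}


def bag_of_words(text):
    """Count the words in a text, ignoring word order and dropping stop words."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)


# Two sentences with opposite meanings become indistinguishable.
same = bag_of_words("the dog bit a man") == bag_of_words("a man bit the dog")
# same is True
```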
6. Grand Plan
• Winnowing algorithm implementation
• Index a large number of samples (10,000+)
• Translation sample search engine with a simple RESTful interface
• Integrate it with Better Translator
7. Better Translator
• Language translator exploiting an indirect translation trick
• e.g., (Korean)→(Japanese)→(English)
• A perfect platform to test the hypothesis
• Example input (Korean): 여러분이 몰랐던 구글 번역기
• Google Translate: You did not know GoogleTranslate
• Better Translator: GoogleTranslate you did not know