Project Proposal: Translation Example Search Engine

CSC 630, Fall 2013, University of Arizona
Sumin Byeon

Example-Based Machine Translation
• Translation example sets (S₁→T₁), (S₂→T₂), (S₃→T₃), ...
• Given a query text S, find the closest match S′ such that (S′→T′) is in the example set
• T′ is accepted as the translation of S
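
The lookup described above can be sketched in a few lines. This is only an illustration: the example pairs are made up, and character-level similarity from Python's `difflib` stands in for whatever "closest match" measure the project ends up using.

```python
from difflib import SequenceMatcher

# Hypothetical toy example set of (source, target) pairs; not from the proposal.
EXAMPLES = [
    ("good morning", "buenos días"),
    ("good night", "buenas noches"),
    ("thank you very much", "muchas gracias"),
]

def closest_example(query, examples):
    """Return the (S', T') pair whose source S' is most similar to the query."""
    def similarity(pair):
        # ratio() is a character-level similarity score in [0, 1].
        return SequenceMatcher(None, query, pair[0]).ratio()
    return max(examples, key=similarity)

source, target = closest_example("good mornin", EXAMPLES)
print(source, "->", target)
```

A linear scan like this compares the query against every example, which is the cost that a hash-based index is meant to avoid.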

Hypothesis
[Figure: query S and example pairs (S1, T1), (S2, T2), …, (Sn, Tn); labels h(S), h(Sσ), φ(S), and Ti]
Which hash function? Optimal value of k? Window size?
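
To make the three open questions concrete, here is a minimal sketch of winnowing fingerprinting, the technique the proposal names elsewhere. The choices in it are assumptions for illustration only: truncated MD5 as the hash function, k = 5 characters per k-gram, and a window of 4 hashes. None of these values come from the proposal.

```python
import hashlib

def kgram_hashes(text, k):
    """Hash every k-character substring (k-gram) of the text, in order."""
    hashes = []
    for i in range(len(text) - k + 1):
        digest = hashlib.md5(text[i:i + k].encode("utf-8")).digest()
        # Assumption: keep only the first 4 bytes as a 32-bit hash value.
        hashes.append(int.from_bytes(digest[:4], "big"))
    return hashes

def winnow(text, k=5, window=4):
    """Return a set of (hash, position) fingerprints selected by winnowing."""
    hashes = kgram_hashes(text, k)
    fingerprints = set()
    for start in range(len(hashes) - window + 1):
        win = hashes[start:start + window]
        smallest = min(win)
        # Break ties by taking the rightmost occurrence of the minimum.
        offset = window - 1 - win[::-1].index(smallest)
        fingerprints.add((smallest, start + offset))
    return fingerprints

print(len(winnow("translation example search engine")))
```

With these parameters, two texts that share a substring of at least window + k − 1 = 8 characters are guaranteed to share at least one fingerprint hash. Texts shorter than that produce no fingerprints in this sketch.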

Relationship with Content Addressability
• Content recognizability
  • Hash: winnowing
• Content recoverability
  • By locating or reconstructing
• Unlike other projects such as NDN or Receipt, this one is relatively straightforward
• Simple key-value storage
  • Key: hash
  • Value: (reference to original text, offset)
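
The key-value layout above might look like the following in memory. This is a sketch under assumptions: CRC32 of every 5-character substring is used as the key for brevity (rather than winnowed fingerprints), and the document ids and texts are hypothetical.

```python
import zlib
from collections import defaultdict

def kgram_hashes(text, k=5):
    """Yield (hash, offset) for each k-character substring of the text."""
    for offset in range(len(text) - k + 1):
        yield zlib.crc32(text[offset:offset + k].encode("utf-8")), offset

def build_index(documents, k=5):
    """Map each hash (key) to a list of (document id, offset) values."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for h, offset in kgram_hashes(text, k):
            index[h].append((doc_id, offset))
    return index

def lookup(index, query, k=5):
    """Return every (document id, offset) that shares a k-gram hash with the query."""
    hits = set()
    for h, _ in kgram_hashes(query, k):
        hits.update(index.get(h, []))
    return hits

# Hypothetical example texts, keyed by a made-up document id.
docs = {"ex1": "good morning everyone", "ex2": "see you tomorrow"}
index = build_index(docs)
print(sorted(lookup(index, "morning")))
```

Storing the offset alongside the reference is what allows the matching passage to be located in the original text rather than just identifying the document.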

Text Matching
• Full-text search may be an effective solution, but...
  • Loses information regarding the ordering of the query words
  • Limited support for phrase search
  • Certain linguistic features will be ignored (e.g., “a”, “the”)
• Matching long enough partial text
  • Longer text: lower probability of finding matches
  • Shorter text: higher probability of ambiguity (e.g., homonyms, false cognates)
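
The word-ordering point can be seen in a small example. It assumes a plain bag-of-words comparison as a stand-in for full-text search, which is a simplification; real search engines vary in how much positional information they keep.

```python
from collections import Counter

def bag_of_words(text):
    """Count words, discarding their order."""
    return Counter(text.lower().split())

def word_ngrams(text, n=2):
    """Return the set of consecutive n-word sequences, which keeps local order."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

a = "dog bites man"
b = "man bites dog"

# The two sentences are indistinguishable as bags of words...
print(bag_of_words(a) == bag_of_words(b))  # True
# ...but share no ordered word pairs.
print(word_ngrams(a) & word_ngrams(b))     # set()
```

The parameter `n` plays the same role as the length trade-off above: larger values make matches rarer, smaller values make them more ambiguous.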

Grand Plan
• Winnowing algorithm implementation
• Index a large number of samples (10,000+)
• Translation sample search engine with a simple RESTful interface
• Integrate it with Better Translator

Better Translator
• Language translator exploiting an indirect translation trick
  • e.g., (Korean)→(Japanese)→(English)
• A perfect platform to test the hypothesis
• Example input (Korean): 여러분이 몰랐던 구글 번역기
  • Google Translate: You did not know GoogleTranslate
  • Better Translator: GoogleTranslate you did not know
