Project Proposal
CSC 630, Fall 2013, University of Arizona
Sumin Byeon
Example-Based Machine Translation
• Translation example sets: (S₁→T₁), (S₂→T₂), (S₃→T₃), ...
• Given a query text S, find the closest match S′ for which an example (S′→T′) exists
• T′ is accepted as the translation of S
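The lookup described above can be sketched as a nearest-example search. This is only an illustration: the similarity measure (`difflib.SequenceMatcher`), the function names, and the tiny English-to-Spanish example set are assumptions, not part of the proposal.

```python
from difflib import SequenceMatcher

# Hypothetical example set of (source, target) pairs, for illustration only.
EXAMPLES = [
    ("good morning", "buenos días"),
    ("good night", "buenas noches"),
    ("thank you", "gracias"),
]


def similarity(a, b):
    """Character-level similarity in [0, 1]; an assumed stand-in metric."""
    return SequenceMatcher(None, a, b).ratio()


def translate_by_example(query, examples):
    """Return the (S', T') pair whose source S' is closest to the query S."""
    return max(examples, key=lambda pair: similarity(query, pair[0]))


closest_source, translation = translate_by_example("good mornin", EXAMPLES)
print(closest_source, "->", translation)
```

A linear scan like this compares the query with every stored example, which is why an index over hashed fragments becomes attractive once the example set grows.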
Hypothesis
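[Diagram: a query S, hashed as h(S), is matched against stored example pairs (S₁, T₁), (S₂, T₂), …, (Sₙ, Tₙ); the matching step is labeled h(Sσ), φ(S) and the output is Tᵢ.]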
Which hash function? Optimal value of k? Window size?
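The open parameters above (hash function, k, window size) are the inputs of the winnowing algorithm. A minimal sketch follows, assuming CRC32 as the hash and arbitrary defaults of k = 5 and w = 4; none of these choices come from the proposal, which leaves them as open questions.

```python
import zlib


def kgram_hashes(text, k):
    """Hash every k-character substring; CRC32 is an assumed stand-in hash."""
    return [zlib.crc32(text[i:i + k].encode("utf-8"))
            for i in range(len(text) - k + 1)]


def winnow(text, k=5, w=4):
    """Return a set of (hash, position) fingerprints selected by winnowing.

    Each window of w consecutive k-gram hashes contributes its minimum
    hash; ties are broken by taking the rightmost occurrence.
    """
    hashes = kgram_hashes(text, k)
    fingerprints = set()
    if not hashes:
        return fingerprints
    # Texts with fewer than w k-grams are treated as a single short window.
    last_start = max(0, len(hashes) - w)
    for start in range(last_start + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        rightmost = max(i for i, h in enumerate(window) if h == smallest)
        fingerprints.add((smallest, start + rightmost))
    return fingerprints


a = winnow("xxxx the quick brown fox yyyy")
b = winnow("zz the quick brown fox ww")
shared = {h for h, _ in a} & {h for h, _ in b}
print(len(a), len(b), len(shared))
```

Two texts that share a substring of at least w + k - 1 characters are guaranteed to share at least one fingerprint hash, so larger k and w yield a smaller index but raise the minimum match length that is guaranteed to be detected.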
Relationship with Content Addressability
• Content recognizability
• Hash - Winnowing
• Content recoverability
• By locating or reconstructing
• Unlike other projects such as NDN or Receipt, this one is relatively straightforward
• Simple key-value storage
• Key: hash
• Value: (reference to original text, offset)
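The key-value layout above can be sketched with an in-memory dictionary. As a simplification, this sketch hashes every k-gram rather than a winnowed subset, and the names (`build_index`, `lookup`), the CRC32 hash, and k = 5 are assumptions made for illustration.

```python
import zlib
from collections import Counter, defaultdict

K = 5  # assumed k-gram length


def kgram_keys(text, k=K):
    """Yield (hash, offset) for every k-character substring of text."""
    for offset in range(len(text) - k + 1):
        yield zlib.crc32(text[offset:offset + k].encode("utf-8")), offset


def build_index(texts):
    """Map each hash key to a list of (text reference, offset) values."""
    index = defaultdict(list)
    for text_id, text in texts.items():
        for key, offset in kgram_keys(text):
            index[key].append((text_id, offset))
    return index


def lookup(index, query):
    """Rank stored texts by how many hash keys they share with the query."""
    votes = Counter()
    for key, _ in kgram_keys(query):
        for text_id, _offset in index.get(key, []):
            votes[text_id] += 1
    return votes.most_common()


texts = {"s1": "the quick brown fox", "s2": "a lazy dog sleeps"}
index = build_index(texts)
print(lookup(index, "quick brown"))
```

Storing the offset alongside the reference lets a later step locate the matching region inside the original text instead of only identifying which text matched.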
Text Matching
• Full-text search may be an effective solution, but...
• Loses information regarding the ordering of the query words
• Limited support for phrase search
• Certain linguistic features will be ignored (e.g., “a”, “the”)
• Matching sufficiently long partial text
• Longer text: lower probability of finding matches
• Shorter text: higher probability of ambiguity (e.g., homonyms, false cognates)
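The ordering problem can be illustrated with a toy comparison. The sketch assumes a bag-of-words model as a stand-in for order-insensitive full-text search; real search engines differ in detail, and the function names are hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Order-insensitive representation: word counts only."""
    return Counter(text.lower().split())


def word_ngrams(text, n=2):
    """Order-sensitive representation: the set of consecutive word n-grams."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


a = "dog bites man"
b = "man bites dog"

# Identical word counts, so a bag-of-words match cannot tell them apart.
print(bag_of_words(a) == bag_of_words(b))

# No shared bigrams, so an order-preserving match keeps them distinct.
print(word_ngrams(a) & word_ngrams(b))
```

The same trade-off in length applies here: larger n makes matches more specific but rarer, while smaller n finds more matches at the cost of ambiguity.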
Grand Plan
• Winnowing algorithm implementation
• Index a large number of samples (10,000+)
• Translation sample search engine with a simple RESTful interface
• Integrate it with Better Translator
Better Translator
• Language translator exploiting an indirect translation trick
• e.g., (Korean)→(Japanese)→(English)
• A perfect platform to test the hypothesis
• Source text (Korean): 여러분이 몰랐던 구글 번역기
• Google Translate: You did not know GoogleTranslate
• Better Translator: GoogleTranslate you did not know

Project Proposal: Translation Example Search Engine
