ACL 2016 reading
Neural Machine Translation of
Rare Words with Subword Units
authors: Rico Sennrich,
Barry Haddow, Alexandra Birch

presentation: Sekizawa Yuuki
Komachi lab M1

16/10/26	 1
Neural Machine Translation of
Rare Words with Subword Units
•  NMT: fixed vocabulary
•  translation: open-vocabulary problem
→ NMT has to address out-of-vocabulary (OOV) words,
   such as rare and unknown words
•  proposed method
    •  encode OOV words as sequences of subword units
•  results (BLEU, WMT 2015, compared with the baseline)
    •  En-De: +1.1, En-Ru: +1.3
•  main contributions
    •  open-vocabulary NMT by encoding words via subword units
    •  adapt byte pair encoding to word segmentation

word categories with transparent translations
•  named entities
    •  copy src → trg
    •  need transcription (if alphabets or syllabaries differ)
•  cognates, loanwords
    •  differ at the character level
•  morphologically complex words
    •  contain multiple morphemes
    •  translate the morphemes separately

related work
•  Durrani et al. 2014
    •  copy unknown words (if the alphabet is shared)
    •  transliteration is required (if the alphabets differ)
•  Mikolov et al. 2012
    •  investigate subword language models
    •  propose to use syllables (speech recognition)

byte pair encoding (BPE) (Gage, 1994)
•  BPE: a simple data compression technique
    •  iteratively replace the most frequent pair of bytes in
       a sequence with a single, unused byte
•  this paper
    •  merge characters or character sequences instead of bytes
    •  most frequent pair (‘A’, ‘B’) → ‘AB’
    •  merges don’t cross word boundaries (for efficiency)
    •  the attention model operates on variable-length units

BPE example
•  learning
    •  word:freq: {low: 5, lowest: 2, newer: 6, wider: 3}
    •  merge & count
1.  ‘r’ ‘</w>’: 9 → merge into ‘r</w>’
2.  ‘e’ ‘r</w>’: 9 → merge into ‘er</w>’
3.  ‘l’ ‘o’: 7 → merge into ‘lo’
4.  ‘lo’ ‘w’: 7 → merge into ‘low’
→ OOV: ‘lower’ is segmented as ‘low er</w>’
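
The merge-and-count loop above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`get_pair_counts`, `merge_pair`, `learn_bpe`, `segment`) are hypothetical, and ties between equally frequent pairs are broken by first occurrence. With that tie-breaking rule, ‘e’ ‘r’ (count 9) is merged before ‘r’ ‘</w>’ (also count 9), so the first two merges differ from the slide, but the resulting segmentation of ‘lower’ is the same.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` in every word with one merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn an ordered list of merge operations from a word-frequency dict."""
    # Each word starts as a sequence of characters plus an end-of-word symbol,
    # so merges never cross word boundaries.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # assumption: first-seen pair wins ties
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = next(iter(merge_pair(pair, {symbols: 1})))
    return list(symbols)


merges = learn_bpe({"low": 5, "lowest": 2, "newer": 6, "wider": 3}, num_merges=4)
print(segment("lower", merges))  # → ['low', 'er</w>']
```

Because segmentation only replays the merge list, any unseen word can be encoded; in the worst case it falls back to single characters.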

Evaluation
•  data: shared translation task of WMT 2015
    •  En-De train: 4.2M sentence pairs, 100M tokens
    •  En-Ru train: 2.6M sentence pairs, 50M tokens
    •  dev: newstest2013, test: newstest2015
•  metrics: BLEU and CHRF3 (character n-gram F3 score)
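
CHRF3 is a character n-gram F-score with β = 3, which weights recall more heavily than precision. The following is a simplified sketch for a single sentence pair, not the official scoring script: the function names are hypothetical, and the maximum n-gram order of 6, the removal of spaces, and the uniform averaging over n-gram orders are assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Multiset of character n-grams, ignoring spaces (an assumption)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=3.0):
    """Character n-gram F-beta score, averaged over n-gram orders 1..max_n."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # text too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


print(chrf("abc", "abc"))  # → 1.0
```

Unlike BLEU, a score of this kind gives partial credit when a rare word is only partly correct, which is why it is a natural companion metric for subword models.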
	
segmentation statistics (train)
number of unknown tokens in newstest2013
[Table: segmentation statistics. Row groups: segmentation techniques used in SMT; BPE with 59,500 merge operations; BPE with 89,500 merge operations; unsegmented words.]

results (En-De)
•  Wunk: word-level model; OOV words are output as UNK
•  Wdict: Wunk with a back-off dictionary for rare words (baseline)
•  C2-50k: character bigrams with a shortlist of 50,000 unsegmented words
•  BPE-J90k: BPE symbols learned jointly on the union of the source and target vocabularies
•  BPE-60k: BPE symbols learned separately for each language
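
To make the C2-50k setting concrete, here is a rough sketch of character-bigram segmentation with a shortlist: words in the shortlist stay whole, and every other word is split into character bigrams. The function name `segment_char_ngrams`, the tiny shortlist, and the "@@" continuation marker are assumptions for illustration; the slides do not specify how segment boundaries are marked.

```python
def segment_char_ngrams(tokens, shortlist, n=2, marker="@@"):
    """Keep shortlisted words whole; split all other words into character n-grams."""
    out = []
    for token in tokens:
        if token in shortlist or len(token) <= n:
            out.append(token)
            continue
        pieces = [token[i:i + n] for i in range(0, len(token), n)]
        # Mark every non-final piece so the original word can be restored later.
        out.extend(piece + marker for piece in pieces[:-1])
        out.append(pieces[-1])
    return out


shortlist = {"the", "said"}  # stands in for the 50,000 most frequent words
segmented = segment_char_ngrams(["Mirzayeva", "said"], shortlist)
print(segmented)  # → ['Mi@@', 'rz@@', 'ay@@', 'ev@@', 'a', 'said']

# Restoring the original text after translation:
restored = " ".join(segmented).replace("@@ ", "")
print(restored)  # → Mirzayeva said
```

The same shortlist idea does not apply to BPE, where frequent words end up as single symbols simply because their characters are merged early.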

results (En-De)
•  words: 44,085
•  not in top 50,000 words: 2,900
•  OOV: 1,168

results (En-De)
•  words: 55,654
•  not in top 50,000 words: 5,442
•  OOV: 851

translation examples
[Table: example translations for En-De and En-Ru]

Neural Machine Translation of
Rare Words with Subword Units
•  main contributions
    •  enables open-vocabulary translation in NMT
    •  represents OOV words as sequences of subword units
    •  using byte pair encoding
    •  simpler and more effective than a back-off model
•  future work
    •  learn the optimal vocabulary size for a translation task
    •  e.g. depending on the language pair and the amount of training data
