SlideShare ist ein Scribd-Unternehmen logo
1 von 16
Downloaden Sie, um offline zu lesen
System Combination of 

RBMT plus SPE and 

Preordering plus SMT	
Terumasa EHARA	
Ehara NLP Research Laboratory	
http://www.ne.jp/asahi/eharate/eharate/	
	
2015. 10. 16	
WAT2015
Introduction	
• Hybrid system combining rule-based
method (RBMT, preordering) and
statistical method (SMT, SPE) may
make MT more accurate.	
• Several improvements are
implemented.	
	
 2
System architecture	
RBMT
Semi-­‐monotone
SPE
Preordering
Semi-­‐monotone
SMT
Candidate
selection
Input
Output
RBMT
3
RBMT part	
Task Engi
ne
U ser
di
cti
onary
Sentence
pattern	
 di
c.
En-j
a C om m erci
al -- --
Zh-j
a C om m erci
al
JPO 	
 di
c
1,
463,
265
--
JPO zh-j
a C om m erci
al
JPO 	
 di
c
1,
463,
265
13
JPO ko-j
a C om m erci
al 434,
334 --
4
SPE part	
• Phrase based Moses	
• LM for en-ja and zh-ja:

3.6M sent., KenLM, order 6	
• LM for JPOzh-ja and JPOko-ja:

5M sent. (with NTCIR-10 data),
KenLM, order 6	
• TM size: see later	
• Distortion limit: 6 (semi-monotone)	
	
5
Preordering part (1)	
• En-ja, zh-ja, JPOzh-ja task only	
• Stanford segmenter	
• Berkeley parser	
• Rule-based preordering

	
VP	
  ⇒ ADVP	
  VBN	
  PP	
  2	
  0	
  1	
  	
  (widely	
  u3lized	
  in	
  many	
  fields)	
  
	
  
VP ⇒ VV	
  NP	
  IP	
  1	
  2	
  0	
  (使 各个 电动机 13	
  旋转 驱动)	
6
Preordering part (2)	
• Improvements of the parser	
ü Chinese	
  grammar	
  improvement 	
ü 	
  Reranking	
  of	
  k-­‐best	
  parse	
  trees



	
7
Chinese grammar improvement (1)	
• Determine gold standard of Chinese
sentence head	
• Select best parse tree from 100 parse
trees	
シンチレータ 54 は 、 X線 を 受けて 光 を 発生 する 。	
	
	
闪烁器 54 接收 X 射线 产生 光 。 	
  
	
8
Chinese grammar improvement (2)	
• Select the best parse tree ⇒ (b)	
• Retraining the grammar	
┗IP
┣NP
┃┣NR━闪烁器
┃┗NN━54
┣VP
┃┣VV━接收
┃┣NP
┃┃┣NN━X
┃┃┗NN━射线
┃┗IP
┃ ┗VP
┃ ┣VV━产生
┃ ┗NP
┃ ┗NN━光
┗PU━。
┗IP
┣NP
┃┣NR━闪烁器
┃┗NN━54
┣VP
┃┣VP
┃┃┣VV━接收
┃┃┗NP
┃┃ ┣NN━X
┃┃ ┗NN━射线
┃┗VP
┃ ┣VV━产生
┃ ┗NP
┃ ┗NN━光
┗PU━。
(a)	
 (b)	
9
Chinese grammar improvement (3)	
• Sentence head agreement rate with
the gold standard (devtest data)

	
G ram m ar Agreem ent	
 
rate	
 
(
%)
O ri
gi
nal
(
chn_
sm 5.
gr)
50.
5
I
m proved
(
chn_
j
po.
gr)
63.
0
10
Reranking of k-best parse trees (1)	
• Reranking with the reordering accuracy
(WER, for training data)



Gold:闪烁器 54  X 射线 接收 光 产生 。
P(a):闪烁器 54 X 射线 光 产生 接收 。


P(b):闪烁器 54  X 射线 接收 光 产生 。	
11
Reranking of k-best parse trees (2)	
• Reranking with the LM score for dev,
devtest and test data



LM: trained by the reordered training data
(1M)



LM score = output of query1 / # of words



1. ~/mosesdecoder/bin/query	
12
SMT part	
• Phrase based Moses	
• LM: same as SPE’s LM	
• TM:	
• Distortion limit: 6 (semi-monotone SMT)	
Task C orpus	
 si
ze
JPC ko-j
a 994,
998
JPC zh-j
a 995,
385
zh-j
a 668,
468
en-j
a 2,
351,
575
13
Candidate selection part	
• Selection with the LM score



Number of selected candidates	
Task RBM T RBM T+SPE
Preorderi
ng
+SM T
Total
JPC ko-j
a 25 1177 798 2000
JPC zh-j
a 2 875 1123 2000
zh-j
a 9 1270 828 2107
en-j
a 5 658 1149 1812
14
JPOzh-ja results for devtest data	
System Addi
ti
onal
	
 feature BLEU RI
BES
RBM T 16.
55 0.
7192
RBM T User	
 di
cti
onary 23.
54 0.
7510
RBM T+SPE User	
 di
cti
onary 41.
35 0.
8203
SM T (DL=10) 40.
86 0.
8071
Preorderi
ng+SM T ori
gi
nal
	
 gram m ar 40.
15 0.
8164
Preorderi
ng+SM T i
m proved	
 gram m ar 41.
84 0.
8218
Preorderi
ng+SM T
i
m proved	
 gram m ar	
 +
reranki
ng	
 of	
 parse
trees
42.
75 0.
8237
System 	
 com bi
nati
on 43.
01 0.
8265
15
Conclusion and future works	
• Combination of rule-based method and
statistical method.	
• Human evaluation results:

	
	
• To improve parsing accuracy.	
Task C rowd JPO 	
 
adequacy
En-j
a 11/15 --
Zh-j
a 2/11 3/3
JPO zh-j
a 3/12 2/3
JPO ko-j
a 4/13 --
16

Weitere ähnliche Inhalte

Was ist angesagt?

【DL輪読会】Incorporating group update for speech enhancement based on convolutio...
【DL輪読会】Incorporating group update for speech enhancement  based on convolutio...【DL輪読会】Incorporating group update for speech enhancement  based on convolutio...
【DL輪読会】Incorporating group update for speech enhancement based on convolutio...Deep Learning JP
 
GTC Japan 2016 Chainer feature introduction
GTC Japan 2016 Chainer feature introductionGTC Japan 2016 Chainer feature introduction
GTC Japan 2016 Chainer feature introductionKenta Oono
 
Optimizing High Performance Big Data Cancer Workflows
Optimizing High Performance Big Data Cancer WorkflowsOptimizing High Performance Big Data Cancer Workflows
Optimizing High Performance Big Data Cancer WorkflowsIvan Jimenez
 
Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Preferred Networks
 
QuantumChemistry500
QuantumChemistry500QuantumChemistry500
QuantumChemistry500Maho Nakata
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to ChainerSeiya Tokui
 
Implementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on SparkImplementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on SparkDalei Li
 
Benchmark Analysis of Multi-core Processor Memory Contention April 2009
Benchmark Analysis of Multi-core Processor Memory Contention April 2009Benchmark Analysis of Multi-core Processor Memory Contention April 2009
Benchmark Analysis of Multi-core Processor Memory Contention April 2009James McGalliard
 
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...Preferred Networks
 
Efficient Parallel Set-Similarity Joins Using MapReduce - Poster
Efficient Parallel Set-Similarity Joins Using MapReduce - PosterEfficient Parallel Set-Similarity Joins Using MapReduce - Poster
Efficient Parallel Set-Similarity Joins Using MapReduce - Posterrvernica
 
Introduction to Radial Basis Function Networks
Introduction to Radial Basis Function NetworksIntroduction to Radial Basis Function Networks
Introduction to Radial Basis Function NetworksESCOM
 
Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+Seiya Tokui
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentShaleen Kumar Gupta
 
การคอนฟิกส์ OSPF บน Ubiquiti Edge Router
การคอนฟิกส์ OSPF บน Ubiquiti Edge Routerการคอนฟิกส์ OSPF บน Ubiquiti Edge Router
การคอนฟิกส์ OSPF บน Ubiquiti Edge RouterTũi Wichets
 

Was ist angesagt? (16)

【DL輪読会】Incorporating group update for speech enhancement based on convolutio...
【DL輪読会】Incorporating group update for speech enhancement  based on convolutio...【DL輪読会】Incorporating group update for speech enhancement  based on convolutio...
【DL輪読会】Incorporating group update for speech enhancement based on convolutio...
 
GTC Japan 2016 Chainer feature introduction
GTC Japan 2016 Chainer feature introductionGTC Japan 2016 Chainer feature introduction
GTC Japan 2016 Chainer feature introduction
 
Optimizing High Performance Big Data Cancer Workflows
Optimizing High Performance Big Data Cancer WorkflowsOptimizing High Performance Big Data Cancer Workflows
Optimizing High Performance Big Data Cancer Workflows
 
Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018
 
Radial Basis Function
Radial Basis FunctionRadial Basis Function
Radial Basis Function
 
QuantumChemistry500
QuantumChemistry500QuantumChemistry500
QuantumChemistry500
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to Chainer
 
Implementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on SparkImplementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on Spark
 
Benchmark Analysis of Multi-core Processor Memory Contention April 2009
Benchmark Analysis of Multi-core Processor Memory Contention April 2009Benchmark Analysis of Multi-core Processor Memory Contention April 2009
Benchmark Analysis of Multi-core Processor Memory Contention April 2009
 
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
 
676.v3
676.v3676.v3
676.v3
 
Efficient Parallel Set-Similarity Joins Using MapReduce - Poster
Efficient Parallel Set-Similarity Joins Using MapReduce - PosterEfficient Parallel Set-Similarity Joins Using MapReduce - Poster
Efficient Parallel Set-Similarity Joins Using MapReduce - Poster
 
Introduction to Radial Basis Function Networks
Introduction to Radial Basis Function NetworksIntroduction to Radial Basis Function Networks
Introduction to Radial Basis Function Networks
 
Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate Descent
 
การคอนฟิกส์ OSPF บน Ubiquiti Edge Router
การคอนฟิกส์ OSPF บน Ubiquiti Edge Routerการคอนฟิกส์ OSPF บน Ubiquiti Edge Router
การคอนฟิกส์ OSPF บน Ubiquiti Edge Router
 

Mehr von Association for Computational Linguistics

Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Association for Computational Linguistics
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Association for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsAssociation for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsAssociation for Computational Linguistics
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Association for Computational Linguistics
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Association for Computational Linguistics
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Association for Computational Linguistics
 

Mehr von Association for Computational Linguistics (20)

Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal TextMuis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
 
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
 
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour AnalysisCastro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text SimplificationElior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 

Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering plus SMT

  • 1. System Combination of 
 RBMT plus SPE and 
 Preordering plus SMT Terumasa EHARA Ehara NLP Research Laboratory http://www.ne.jp/asahi/eharate/eharate/ 2015. 10. 16 WAT2015
  • 2. Introduction • Hybrid system combining rule-based method (RBMT, preordering) and statistical method (SMT, SPE) may make MT more accurate. • Several improvements are implemented. 2
  • 4. RBMT part Task Engi ne U ser di cti onary Sentence pattern di c. En-j a C om m erci al -- -- Zh-j a C om m erci al JPO di c 1, 463, 265 -- JPO zh-j a C om m erci al JPO di c 1, 463, 265 13 JPO ko-j a C om m erci al 434, 334 -- 4
  • 5. SPE part • Phrase based Moses • LM for en-ja and zh-ja:
 3.6M sent., KenLM, order 6 • LM for JPOzh-ja and JPOko-ja:
 5M sent. (with NTCIR-10 data), KenLM, order 6 • TM size: see later • Distortion limit: 6 (semi-monotone) 5
  • 6. Preordering part (1) • En-ja, zh-ja, JPOzh-ja task only • Stanford segmenter • Berkeley parser • Rule-based preordering
 VP  ⇒ ADVP  VBN  PP  2  0  1    (widely  u3lized  in  many  fields)     VP ⇒ VV  NP  IP  1  2  0  (使 各个 电动机 13  旋转 驱动) 6
  • 7. Preordering part (2) • Improvements of the parser ü Chinese  grammar  improvement ü   Reranking  of  k-­‐best  parse  trees
 
 7
  • 8. Chinese grammar improvement (1) • Determine gold standard of Chinese sentence head • Select best parse tree from 100 parse trees シンチレータ 54 は 、 X線 を 受けて 光 を 発生 する 。 闪烁器 54 接收 X 射线 产生 光 。   8
  • 9. Chinese grammar improvement (2) • Select the best parse tree ⇒ (b) • Retraining the grammar ┗IP ┣NP ┃┣NR━闪烁器 ┃┗NN━54 ┣VP ┃┣VV━接收 ┃┣NP ┃┃┣NN━X ┃┃┗NN━射线 ┃┗IP ┃ ┗VP ┃ ┣VV━产生 ┃ ┗NP ┃ ┗NN━光 ┗PU━。 ┗IP ┣NP ┃┣NR━闪烁器 ┃┗NN━54 ┣VP ┃┣VP ┃┃┣VV━接收 ┃┃┗NP ┃┃ ┣NN━X ┃┃ ┗NN━射线 ┃┗VP ┃ ┣VV━产生 ┃ ┗NP ┃ ┗NN━光 ┗PU━。 (a) (b) 9
  • 10. Chinese grammar improvement (3) • Sentence head agreement rate with the gold standard (devtest data)
 G ram m ar Agreem ent rate ( %) O ri gi nal ( chn_ sm 5. gr) 50. 5 I m proved ( chn_ j po. gr) 63. 0 10
  • 11. Reranking of k-best parse trees (1) • Reranking with the reordering accuracy (WER, for training data)
 
 Gold:闪烁器 54  X 射线 接收 光 产生 。 P(a):闪烁器 54 X 射线 光 产生 接收 。 
 P(b):闪烁器 54  X 射线 接收 光 产生 。 11
  • 12. Reranking of k-best parse trees (2) • Reranking with the LM score for dev, devtest and test data
 
 LM: trained by the reordered training data (1M)
 
 LM score = output of query1 / # of words
 
 1. ~/mosesdecoder/bin/query 12
  • 13. SMT part • Phrase based Moses • LM: same as SPE’s LM • TM: • Distortion limit: 6 (semi-monotone SMT) Task C orpus si ze JPC ko-j a 994, 998 JPC zh-j a 995, 385 zh-j a 668, 468 en-j a 2, 351, 575 13
  • 14. Candidate selection part • Selection with the LM score
 
 Number of selected candidates Task RBM T RBM T+SPE Preorderi ng +SM T Total JPC ko-j a 25 1177 798 2000 JPC zh-j a 2 875 1123 2000 zh-j a 9 1270 828 2107 en-j a 5 658 1149 1812 14
  • 15. JPOzh-ja results for devtest data System Addi ti onal feature BLEU RI BES RBM T 16. 55 0. 7192 RBM T User di cti onary 23. 54 0. 7510 RBM T+SPE User di cti onary 41. 35 0. 8203 SM T (DL=10) 40. 86 0. 8071 Preorderi ng+SM T ori gi nal gram m ar 40. 15 0. 8164 Preorderi ng+SM T i m proved gram m ar 41. 84 0. 8218 Preorderi ng+SM T i m proved gram m ar + reranki ng of parse trees 42. 75 0. 8237 System com bi nati on 43. 01 0. 8265 15
  • 16. Conclusion and future works • Combination of rule-based method and statistical method. • Human evaluation results:
 • To improve parsing accuracy. Task C rowd JPO adequacy En-j a 11/15 -- Zh-j a 2/11 3/3 JPO zh-j a 3/12 2/3 JPO ko-j a 4/13 -- 16