Left-to-Right Hierarchical Phrase-based Translation System (LR-Hiero)
Maryam Siahbani
Overview
• History of Machine Translation
• Rule based MT
• Statistical MT
– Training
– Decoding
• Left-to-Right Hierarchical Phrase-based MT
• Using LR-Hiero in Simultaneous Translation
History of Machine Translation
• Late 1940’s: Early rule-based systems
– Prediction at the time: computers would replace human translators within 5 years!
• 1966: ALPAC report cuts research funding
• Early 1970’s: First commercial system (Systran)
• Late 1980’s: IBM developed first statistical
models inspired by speech research
• Late 2000’s: Explosion in MT research
• 2006: First version of Google Translate
Rule-based Machine Translation
• Rules hand-written by linguists
• State of the art until early 2000’s
– e.g. Systran
• Expensive to create, maintain, and adapt
Example (noun phrase reordering):
French: NP → Noun (chat) Adjective (noir)
English: NP → Adjective (black) Noun (cat)
Statistical Machine Translation
• Data driven approaches to MT
• Learn translation from textual data
– Parallel Data
• Language independent
• Normally use probabilistic models
– The best translation = the most probable translation
$e^* = \arg\max_e P(e \mid f)$, where $f$ is the source sentence
• State of the art for most language pairs
– Best systems include rules (hybrid)
Statistical Machine Translation
[Diagram: training data (monolingual & bilingual data) → training pipeline → translation model → decoder; the decoder maps an input sentence to its translation]
Translation Data
Parallel Text:
(Web, United Nations, European/Canadian Parliament,
Wikipedia, etc.)
Statistical Machine Translation (SMT)
Learn alignment from parallel text
[Figure: word alignment (Zh-En) between "我们 十分 关注 非洲 地区 发生 的 事情" and "we are very much concerned with what happens in African region"]
Learn weighted translation rules from word-aligned text
Id | Source | Target | Weight
r1 | 关注 X_1 | concerned with X_1 | -5.3
r2 | X_1 发生 X_2 事情 | what happens X_2 X_1 | -4.8
r3 | 非洲 地区 | African region | -3.1
Translation Rules (phrase-pairs)
Source | Target | p(e|f)
den Vorschlag | the proposal | 0.6227
den Vorschlag | ‘s proposal | 0.1068
den Vorschlag | a proposal | 0.0341
den Vorschlag | the idea | 0.0250
den Vorschlag | this proposal | 0.0227
den Vorschlag | proposal | 0.0205
den Vorschlag | of the proposal | 0.0159
den Vorschlag | the proposals | 0.0159
* German-English phrase table trained on Europarl
• Millions of translation rules
• Log probability: -1.7986
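As a rough illustration of how a phrase table like the one on this slide can be turned into log-probability weights, here is a minimal Python sketch. The dictionary layout and the names `PHRASE_TABLE`, `log_weights`, and `best_translation` are assumptions made for this example and are not taken from any SMT toolkit; only the probabilities come from the slide.

```python
import math

# Hypothetical in-memory phrase table: source phrase -> {target phrase: p(e|f)}.
# The probabilities are the ones listed on the slide.
PHRASE_TABLE = {
    "den Vorschlag": {
        "the proposal": 0.6227,
        "'s proposal": 0.1068,
        "a proposal": 0.0341,
        "the idea": 0.0250,
        "this proposal": 0.0227,
        "proposal": 0.0205,
        "of the proposal": 0.0159,
        "the proposals": 0.0159,
    }
}


def log_weights(options):
    """Convert probabilities to natural-log weights (always <= 0)."""
    return {target: math.log(p) for target, p in options.items()}


def best_translation(source):
    """Return the target phrase with the highest log weight for a source phrase."""
    weights = log_weights(PHRASE_TABLE[source])
    return max(weights, key=weights.get)


weights = log_weights(PHRASE_TABLE["den Vorschlag"])
best = best_translation("den Vorschlag")
```

Working in log space lets a decoder add rule weights instead of multiplying many small probabilities.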
Statistical Machine Translation (SMT)


$e^* = \arg\max_e P(e \mid f) = \arg\max_{e = y(d)} \sum_{r \in d} w \cdot h(r)$
• Decoder: finds the translation e for any given input f
• The decoder generates many candidate translations, scores them, and returns the most likely one
Measuring Translation Quality:
BLEU score
• BLEU is a simple but effective scoring metric shown to correlate with human judgment of translation quality
• The idea is to measure overlap between the translation generated by the MT system and the reference translation
• Measure one-word overlaps, two-word overlaps, … (n-grams)
• Compute a precision score for each n-gram order
• Impose a brevity penalty for candidates that are shorter than the reference
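The steps above can be sketched in a few lines of Python. This is a simplified, sentence-level version with a single reference and no smoothing, written for illustration only; the function names `ngrams` and `bleu` are assumptions of this sketch, and its values are not expected to match scores reported by standard BLEU tools.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified BLEU: clipped n-gram precisions, geometric mean, brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: only candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "I was in my twenties before I ever went to an art museum ."
low = bleu("I was twenty I ever went to art .", reference)
high = bleu("I was in my twenties before I first went to an art museum .", reference)
```

With these inputs the short, lossy candidate scores lower than the near-complete one, and a candidate identical to the reference scores 1.0.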
Measuring Translation Quality:
BLEU score
• Input:
– Ich war in meinen zwanzigern bevor ich erstmals in ein kunstmuseum ging .
• Reference translation:
– I was in my twenties before I ever went to an art
museum .
• Low BLEU score (41.1):
– I was twenty I ever went to art .
• High BLEU score (89.0):
– I was in my twenties before I first went to an art
museum .
Hierarchical Phrase-based
Translation (Hiero)
Synchronous Context-Free Grammar (SCFG)
Translation rules learned from aligned words (Zh-En):
X -> <我们 十分 X_1 / we are very much X_1>
X -> <关注 X_1 发生 的 X_2 / concerned with X_2 happens in X_1>
X -> <事情 / what>
X -> <非洲 地区 / african region>
Source: 我们 十分 关注 非洲 地区 发生 的 事情
Target: we are very much concerned with what happens in african region
Hiero Decoder
• Bottom-up dynamic programming algorithm: O(n^3)
• LM computation
Source: 我们 十分 关注 非洲 地区 发生 的 事情 。
Target: we are very much concerned with what happens in african regions .
Example: X -> <关注 X_1 发生 的 X_2 / concerned with X_2 happens in X_1>, with X_1 = african region and X_2 = what
Left-to-Right Hierarchical
Phrase-based Translation System
Left-to-Right Target Generation
(Watanabe et al. 2006)
[Figure: source and target derivation trees; the target side is generated left to right]
Source: 我们 十分 关注 非洲 地区 发生 的 事情
Target: we are very much concerned with what happens in african region
Rules in Greibach Normal Form (GNF):
X -> <我们 十分 X_1 / we are very much X_1>
X -> <关注 X_1 / concerned with X_1>
X -> <X_1 发生 X_2 事情 / what happens X_2 X_1>
Non-GNF rule:
X -> <X_1 发生 的 X_2 / X_2 happens in X_1>
LR-Hiero Rule Extraction
• Search for sub-phrases within larger ones
– Smaller phrases are replaced by non-terminal X
• A novel dynamic programming algorithm to extract rules for LR-Hiero
– Linear time complexity (in number of rules) vs. exhaustive search
• Can easily extract rules with more non-terminals
Example rules extracted from the aligned sentence pair:
<我们 十分 X_1 / we are very much X_1>
<X_1 发生 X_2 事情 / what happens X_2 X_1>
[Figure: Effect of the number of non-terminals (1 to 4) on extraction time (sec.), comparing the Hiero heuristic with the DP extractor]
Source: Expressive Hierarchical Rule Extraction for Left-to-Right Translation. M. Siahbani and A. Sarkar. AMTA (2014)
Left-to-Right Decoding
Source (positions 0 to 8): 0 我们 1 十分 2 关注 3 非洲 4 地区 5 发生 6 的 7 事情 8
Rules:
X -> <我们 十分 X_1 / we are very much X_1>
X -> <关注 X_1 / concerned with X_1>
X -> <X_1 发生 X_2 事情 / what happens X_2 X_1>
X -> <的 / in>
X -> <非洲 地区 / African region>
Partial hypotheses (target prefix, followed by the source spans still to be translated):
<s> [0,8]
<s> we are very much [2,8]
<s> we are very much concerned with [3,8]
<s> we are very much concerned with what happens [6,7][3,5]
<s> we are very much concerned with what happens in [3,5]
<s> we are very much concerned with what happens in African region
Time complexity: $O(n^2)$, compared with $O(n^3)$ for typical CKY decoding
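The sequence of partial hypotheses on this slide can be replayed with a small Python sketch. It only tracks the target prefix and the list of uncovered source spans; rule matching, scoring, and search are left out. The `Hypothesis` class and `apply_rule` function are hypothetical names chosen for this example, not the LR-Hiero implementation.

```python
from dataclasses import dataclass
from typing import Tuple

Span = Tuple[int, int]


@dataclass(frozen=True)
class Hypothesis:
    target: Tuple[str, ...]   # target words generated so far
    spans: Tuple[Span, ...]   # uncovered source spans, in translation order


def apply_rule(hyp, terminals, subspans):
    """Expand the first uncovered span: append the rule's target terminals and
    replace the span by the spans of the rule's non-terminals (target order)."""
    if not hyp.spans:
        raise ValueError("hypothesis is already complete")
    first, rest = hyp.spans[0], hyp.spans[1:]
    for start, end in subspans:
        if not (first[0] <= start < end <= first[1]):
            raise ValueError(f"sub-span {(start, end)} lies outside {first}")
    return Hypothesis(hyp.target + tuple(terminals.split()), tuple(subspans) + rest)


# Derivation from the slide: (target terminals, non-terminal spans in target order).
steps = [
    ("we are very much", [(2, 8)]),
    ("concerned with", [(3, 8)]),
    ("what happens", [(6, 7), (3, 5)]),
    ("in", []),
    ("African region", []),
]

history = [Hypothesis(("<s>",), ((0, 8),))]
for terminals, subspans in steps:
    history.append(apply_rule(history[-1], terminals, subspans))

final = history[-1]
```

Each step only appends words to the right of the target prefix, which is what makes the translation available incrementally.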


Candidate translations are scored by:
$t^* = \arg\max_{t = y(d)} \sum_{r \in d} w \cdot f(r)$
Rule weights:
<我们 十分 X_1 / we are very much X_1>, -4.7
<X_1 发生 X_2 事情 / what happens X_2 X_1>, -3.6
<非洲 地区 / African region>, -2.7
<关注 X_1 / concerned with X_1>, -3.8
<的 / in>, -1.2
Hypothesis scores shown on the slide: -7.7, -7.1, -5.9, -4.5, -3.3, 0
LR-Hiero Results
[Figure: BLEU (translation accuracy) vs. LM calls (translation time) for Czech-English, German-English, and Chinese-English, comparing LR-Hiero with the state of the art]
• 3 times faster
• Comparable translation accuracy
Statistical Machine Translation (SMT)
• Available SMT systems:
– Moses (Edinburgh)
– Phrasal (Stanford)
– Jane 2 (Aachen University)
– Joshua (JHU)
– Kriya (SFU)
– CDEC (CMU)
– LR-Hiero
• System types: phrase-based, hierarchical phrase-based (Hiero), left-to-right hierarchical phrase-based
• LR-Hiero is available at https://github.com/sfu-natlang/lrhiero
– Time efficient
– Can model complex translation
– Generates translation in a left-to-right manner
– Suitable choice for online translation
Simultaneous Translation
Speech to Speech Translation
• Karlsruhe (KIT) Lecture Translator
• NICT Speech Translator
• Skype Translator
Incremental Translation
• Facilitate continuous translation with low
latency
– Latency: time difference between start of source
sentence (speech) and start of target sentence
(speech)
• Ensure acceptable translation accuracy
Example (English to Spanish):
Non-incremental (6 sec):
– "Good evening, I would like a taxi to the airport please" → "Buenas noches. Quiero un taxi al aeropuerto por favor"
Incremental:
– "Good evening, I would" → "Buenas noches quiero" (0.7 sec)
– "like a taxi" → "como un taxi" (0.2 sec)
– "to the airport please" → "al aeropuerto por favor" (0.2 sec)
Integrating Segmentation with Translation Process
[Figure: as words arrive ("Good", "Good evening", "Good evening I"), the system decides at each step whether to segment; a completed segment is passed to translation ("Good evening" → "Buenas noches")]
Incremental Translation Results
• Task: English-German TED speech translation
• MT system training data: IWSLT 2013 train data + Europarl v7 data [Koehn 2005]
Method | BLEU (translation accuracy) | Latency (sec) | Segs/Second
Non-incremental | 21.08 | 6.353 | 0.15
Prosodic | 20.88 | 0.468 | 2.27
Incremental | 20.86 | 0.311 | 3.22
Publications
• Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering. Siahbani, Maryam and Sankaran, Baskaran and Sarkar, Anoop. EMNLP (2013)
• Two Improvements to Left-to-Right Decoding for Hierarchical Phrase-based Machine Translation. Siahbani, Maryam and Sarkar, Anoop. EMNLP (2014)
• Expressive Hierarchical Rule Extraction for Left-to-Right Translation. Siahbani, Maryam and Sarkar, Anoop. AMTA (2014)
• Incremental Translation using a Hierarchical Phrase-based Translation System. Siahbani, Maryam and Mehdizadeh Seraj, Ramtin and Sankaran, Baskaran and Sarkar, Anoop. SLT (2014)
Question?
Partial Hypothesis
Source (positions 0 to 8): 0 我们 1 十分 2 关注 3 非洲 4 地区 5 发生 6 的 7 事情 8
<s> [0,8], -3.3
<s> we are very much [2,8], -4.5
<s> we are very much concerned with [3,8], -5.9
<s> we are very much concerned with what happens [6,7][3,5], -7.1
LR-Decoding with Beam Search
• LR-Decoding integrated with beam-search
(Watanabe et al. 2006)
• Stacks: hypotheses with same number of source side
words covered
• Exhaustively generating all possible partial
hypotheses for a given stack
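To make the stack organisation concrete, the following Python sketch groups partial hypotheses by the number of source words they already cover. It assumes a hypothesis is just a target string plus its uncovered source spans; `covered_words` and `build_stacks` are hypothetical helper names for this illustration, and no pruning or scoring is shown.

```python
from collections import defaultdict


def covered_words(spans, sentence_length):
    """Number of source words already translated, given the uncovered spans."""
    return sentence_length - sum(end - start for start, end in spans)


def build_stacks(hypotheses, sentence_length):
    """Place each hypothesis in the stack indexed by its covered-word count."""
    stacks = defaultdict(list)
    for target, spans in hypotheses:
        stacks[covered_words(spans, sentence_length)].append(target)
    return dict(stacks)


# Partial hypotheses from the running example (8 source words).
hypotheses = [
    ("<s>", [(0, 8)]),
    ("<s> we are very much", [(2, 8)]),
    ("<s> we are very much concerned with", [(3, 8)]),
    ("<s> we are very much concerned with what happens", [(6, 7), (3, 5)]),
]

stacks = build_stacks(hypotheses, sentence_length=8)
```

Here the last hypothesis lands in stack 5, because only the spans [6,7] and [3,5] (three words) remain uncovered.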
Cube pruning
• Each cube: a group of hypotheses and applicable
rules
• Cubes are fed to a priority queue which fills the
current stack
Cube pruning
• Rows: hypotheses
• Columns: rules
• Rows and columns are sorted based on the scores
• Assumption: The best hypothesis is in the top left
– The next best are the neighbours of this entry
Example cube (hypothesis and rule scores in parentheses, combined scores in the cells):
 | made (0.9) | done (1.1) | do (3.2)
students have not yet (10.2) | 12.5 | 12.4 | 14.3
pupils have not yet (11.5) | 12.6 | 12.8 | 14.7
student has not (12.7) | 13.3 | 13.5 | 15.4
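The lazy exploration of a cube can be sketched with a priority queue in Python. This is a generic illustration, not the decoder's code: it assumes the cell values are costs (lower is better), uses the combined scores from the example cube on this slide, and the name `cube_prune` is hypothetical.

```python
import heapq


def cube_prune(cost, n_rows, n_cols, k):
    """Pop up to k cells, starting at the top-left corner and pushing the
    right and lower neighbours of every popped cell onto a priority queue."""
    heap = [(cost(0, 0), 0, 0)]
    seen = {(0, 0)}
    popped = []
    while heap and len(popped) < k:
        c, i, j = heapq.heappop(heap)
        popped.append((c, i, j))
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < n_rows and nj < n_cols and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (cost(ni, nj), ni, nj))
    return popped


# Combined scores from the example cube (rows: hypotheses, columns: rules).
GRID = [
    [12.5, 12.4, 14.3],
    [12.6, 12.8, 14.7],
    [13.3, 13.5, 15.4],
]

popped = cube_prune(lambda i, j: GRID[i][j], 3, 3, k=4)
costs = [c for c, _, _ in popped]
```

With this grid the cells come out in the order 12.5, 12.4, 12.6, 12.8: the top-left cell is popped first even though its neighbour is cheaper, which shows that the pop order is only approximately sorted once the combined score is not monotone in the row and column scores.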
Time Efficiency: average number of LM queries
[Figure: average number of LM queries per system, including Watanabe et al. (2006)]
Source: Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering. M. Siahbani, B. Sankaran and A. Sarkar. EMNLP (2013)
Reordering Features
• LR-Hiero as proposed by Watanabe et al. (2006) scores about 2 BLEU points lower than Hiero
• Distortion feature (computed each time a rule is applied)
• Number of reordering rules (rules whose non-terminals are reordered between the source and target sides)
– r<> = 1 for <X_1 发生 X_2 事情 / what happens X_2 X_1>
– r<> = 0 for <X_1 发生 X_2 事情 / what happens X_1 X_2>
• Distortion example for <X_1 发生 X_2 事情 / what happens X_2 X_1> over source positions 0 to 8:
d = (5-3) + (7-6) + (8-6) + (7-3) + (8-5)
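The reordering-rule indicator r<> can be illustrated with a short Python check that compares the order of non-terminals on the two sides of a rule. The `X_n` notation follows the slides; the function names `nonterminal_order` and `is_reordering_rule` are assumptions of this sketch.

```python
import re


def nonterminal_order(side):
    """Return the non-terminal indices of a rule side in order of appearance."""
    return [int(index) for index in re.findall(r"X_(\d+)", side)]


def is_reordering_rule(source, target):
    """True if the non-terminals appear in a different order on the two sides."""
    return nonterminal_order(source) != nonterminal_order(target)


# r<> = 1: X_1 and X_2 are swapped on the target side.
swapped = is_reordering_rule("X_1 发生 X_2 事情", "what happens X_2 X_1")
# r<> = 0: same order on both sides.
monotone = is_reordering_rule("X_1 发生 X_2 事情", "what happens X_1 X_2")
```

Summing this indicator over the rules of a derivation gives the "number of reordering rules" feature.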
Translation Quality
[Figure: translation quality comparison, including Watanabe et al. (2006)]
Source: Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering. M. Siahbani, B. Sankaran and A. Sarkar. EMNLP (2013)
Search Error in Cube Pruning
• Assumption: The best hypothesis is in the top left
– The next best are the neighbours of this entry
• Adding LM score violates the assumption
• Queue diversity
Cube 1 (hypothesis scores 6.6, 6.7, 6.9; rule scores 0.9, 1.3, 3.2):
8.1 8.2 8.5
8.0 8.4 8.6
8.3 8.9 8.8
Cube 2 (hypothesis scores 6.2, 6.3, 6.5; rule scores 1.0, 1.3, 1.5):
9.1 8.9 9.3
8.0 8.5 9.0
7.7 7.9 8.1
Queue Diversity
[Figure: Chinese-English BLEU score and number of LM calls for LR-Hiero, LR-Hiero+CP, and LR-Hiero+CP (QD=10)]
Source: Two Improvements to Left-to-Right Decoding for Hierarchical Phrase-based Machine Translation. M. Siahbani and A. Sarkar. EMNLP (2014)
Lexicalized Reordering Model
• Distortion penalty is weak
– It only measures deviation from the monotonic translation
• Learn reordering preferences for each phrase (with respect to the previous phrase)
– Monotone
– Swap
– Discontinuous
[Figure: phrase orientations on an alignment grid with axes F and E; from "Statistical Machine Translation", Koehn 2010]
Lexicalized Reordering Model
• Collect orientation information during rule extraction
– Convert each rule to a phrase-pair (possibly discontinuous)
– M: If there is a phrase-pair on the top-left
– S: If there is a phrase-pair on the top right
– D: otherwise
• Estimation by relative frequency
$P_o(\mathrm{orientation} \mid e, f) = \dfrac{\mathrm{count}(\mathrm{orientation}, e, f)}{\sum_{o} \mathrm{count}(o, e, f)}$
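The relative-frequency estimate above can be sketched in Python as follows. The observation format (a list of `(orientation, e, f)` tuples), the toy counts, and the function name `orientation_probabilities` are assumptions made for this illustration; smoothing, which practical systems often add, is omitted.

```python
from collections import Counter

ORIENTATIONS = ("M", "S", "D")  # monotone, swap, discontinuous


def orientation_probabilities(observations, e, f):
    """Estimate P_o(orientation | e, f) by relative frequency over the
    orientation events collected for the phrase pair (e, f)."""
    counts = Counter(o for o, obs_e, obs_f in observations if (obs_e, obs_f) == (e, f))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("phrase pair was never observed")
    return {o: counts[o] / total for o in ORIENTATIONS}


# Toy orientation events, invented for this example.
observations = [
    ("M", "African region", "非洲 地区"),
    ("M", "African region", "非洲 地区"),
    ("S", "African region", "非洲 地区"),
    ("D", "African region", "非洲 地区"),
    ("M", "in", "的"),
]

probs = orientation_probabilities(observations, "African region", "非洲 地区")
```

For the toy counts above the estimate is 0.5 for monotone and 0.25 each for swap and discontinuous.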

Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

  • 1. Left-to-Right Hierarchical Phrase- based Translation System (LR-Hiero) Maryam Siahbani
  • 2. Overview • History of Machine Translation • Rule based MT • Statistical MT – Training – Decoding • Left-to-Right Hierarchical Phrase-based MT • Using LR-Hiero in Simultaneous Translation 2
  • 3. History of Machine Translation • Late 1940’s: Early rule-based systems – computers would replace human translations within 5 years! • 1966: ALPAC report cuts research funding • Early 1970’s: First commercial system (Systran) • Late 1980’s: IBM developed first statistical models inspired by speech research • Late 2000’s: Explosion in MT research • 2006: First version of Google Translate 3
  • 4. Rule-based Machine Translation • Rules hand-written by linguists • State of the art until early 2000’s – e.g. Systran • Expensive to create maintain and adapt 4 French NP Noun chat Adjective noir English NP Noun cat Adjective black
  • 5. Statistical Machine Translation • Data driven approaches to MT • Learn translation from textual data – Parallel Data • Language independent • Normally use probabilistic models – The best translation = the most probable translation 𝑒∗ = 𝑎𝑟𝑔𝑚𝑎𝑥 𝑒 𝑃(𝑒|𝑓) where f: source sentence • State of the art for most language pairs – Best systems include rules (hybrid) 5
  • 6. translation model Statistical Machine Translation 6 Training Pipeline Training data Monolingual & Bilingual data Decoder Input sentence translation
  • 7. Translation Data Parallel Text: (Web, United Nations, European/Canadian Parliament, Wikipedia, etc.)
  • 8. Statistical Machine Translation (SMT) 8 Aligned Words EnZh happens 发生 事情我们十分 关注 的 we are very much concerned with what in region 地区非洲 African Learn alignment from parallel text
  • 9. Statistical Machine Translation (SMT) 9 Aligned Words EnZh Translation rules happens 发生 事情我们十分 关注 的 we are very much concerned with what in region 地区非洲 African Learn alignment from parallel text Id Source Target Weight r1 关注 X_1 concerned with X_1 -5.3 r2 X_1 发生 X_2事情 what happens X_2 X_1 -4.8 r3 非洲 地区 African region -3.1 Learn weighted translation rules from word aligned text
  • 10. Translation Rules (phrase-pairs) 10 Source Target p(e|f) den Vorschlag the proposal 0.6227 den Vorschlag ‘s proposal 0.1068 den Vorschlag a proposal 0.0341 den Vorschlag the idea 0.0250 den Vorschlag this proposal 0.0227 den Vorschlag proposal 0.0205 den Vorschlag of the proposal 0.0159 den Vorschlag the proposals 0.0159 * German-English phrase table trained on Europarl Millions of translation rules Log probability -1.7986
  • 11. translation model Statistical Machine Translation (SMT) 11   drdyee rhwfePe )(.maxarg)|(maxarg* )( Aligned Words EnZh Translation rules Decoder happens 发生 事情我们十分 关注 的 we are very much concerned with what in region 地区非洲 African Learn alignment from parallel text Id Source Target Weight r1 关注 X_1 concerned with X_1 -5.3 r2 X_1 发生 X_2事情 what happens X_2 X_1 -4.8 r3 非洲 地区 African region -3.1 Learn weighted translation rules from word aligned text Decoder generates many candidate translations, scores them and returns the most likely one Find the translation for any given input (f) f e
  • 12. Measuring Translation Quality: BLEU score • BLEU is a simple but effective scoring metric shown to be proportional to human judgment of translation quality • The idea is to measure overlap between the translation generated by MT system and the reference translation • Measure one word overlaps, two word overlaps,… (n-grams) • Compute precision score for each n-gram • Impose a brevity penalty for candidates that are shorter than reference 12
  • 13. Measuring Translation Quality: BLEU score • Input: – Ich war in meinen zwangzigern bevor ich erstmals in ein kunstmuseum ging . • Reference translation: – I was in my twenties before I ever went to an art museum . • Low BLEU score (41.1): – I was twenty I ever went to art . • High BLEU score (89.0): – I was in my twenties before I first went to an art museum . 13
  • 15. SCFG Hierarchical Phrase-based Translation Synchronous Context-Free Grammar 15 Aligned Words EnZh Translation Rules X -> <我们十分X_1 / we are very much X_1> X -> <事情 / what > 我们 十分 关注 发生 的 事情地区非洲 (Hiero) X -> <非洲 地区 / african region > we are very much X-> <关注 X_1 发生 的 X_2 /concerned with X_2 happens in X_1> concerned with happens inwhat african region X -> <我们十分X_1 / we are very much X_1> X-> <关注 X_1 发生 的 X_2 /concerned with X_2 happens in X_1> X -> <事情 / what > X -> <非洲 地区 / african region > translation model Decoder
  • 16. Hiero Decoder O(n^3) LM computation 我们 关注 发生 的 事情地区十分 非洲 。 we are very much concerned with what happens in african regions . X_2 X_1 X_2= what X -> <关注 X_1 发生 的 X_2 / concerned with X_2 happens in X_1> X_1= african region concerned with happens in what african region LM LM LM Bottom-up Dynamic Programing algorithm we are very much concerned with 16
  • 18. Left-to-Right Target Generation (Watanabe et al. 2006) 18 X1 X1 X1 we are very much concerned with X2what happens X1 in african region X1 X1 X1 我们十分 关注 X2发生X1 的非洲 地区 发生 的我们 关注 发生 事情地区十分 非洲 we are very muchconcerned with what happens african regionin X -> <我们十分 X_1 / we are very much X_1> X -> <X_1 发生 X_2事情 / what happens X_2 X_1> X -> < 关注 X_1 / concerned with X_1> X -> <X_1 发生 的 X_2 / X_2 happens in X_1>Non-GNF Greibach Normal Form (GNF)
  • 19. • Search for sub-phrases within larger ones – Smaller phrases are replaced by non-terminal X • Dynamic programming algorithm to extract rules for LR- – Linear time complexity (in number of rules) LR-Hiero Rule Extraction 19 <我们十分X_1 / we are very much X_1> 事情 happens 发生我们十分 关注 的 we are very much concerned with what in region 地区非洲 AfricanX_1 X_1
  • 20. • Search for sub-phrases within larger ones – Smaller phrases are replaced by non-terminal X • A novel Dynamic programming algorithm to extract rules for LR-Hiero – Linear time complexity vs. exhaustive search LR-Hiero Rule Extraction 20 <我们十分X_1 / we are very much X_1> 事情 happens 发生我们十分 关注 的 we are very much concerned with what in region 地区非洲 African X2X_1 < X_1 发生 X_2事情 / what happens X_2 X_1> X2 X_1
  • 21. • Linear time complexity vs. exhaustive search • Can easily extract rules with more non-terminals LR-Hiero Rule Extraction 21 0 1000 2000 3000 4000 1 2 3 4 Time(sec.) No. of Non-terminals Effect of No. of Non-terminals on extraction time Hiero Heuristic DP Extractor Expressive Hierarchical Rule Extraction for Left-to-Right Translation. M. Siahbani and A. Sarkar. AMTA(2014)
  • 22. 的 Left-to-Right Decoding X -> <我们十分 X_1 / we are very much X_1> X -> <X_1 发生 X_2事情 / what happens X_2 X_1> X -> <非洲 地区 / African region > <s> [0,8] <s> <s> we are very much <s> we are very much concerned with <s> we are very much concerned with what happens <s> we are very much concerned with what happens in 0 1 2 3 4 5 6 7 8 我们 关注 发生 事情地区十分 非洲 X -> < 关注 X_1 / concerned with X_1> X -> <的 / in > we are very much[2,8] concerned with[3,8] what happens[6,7] [3,5] in [3,5] African region 22
  • 23. 的 Left-to-Right Decoding <s> [0,8] <s> we are very much [2,8] <s> we are very much concerned with [3,8] <s> we are very much concerned with what happens [6,7][3.5] <s> we are very much concerned with what happens in [3,5] <s> we are very much concerned with what happens in African region 0 1 2 3 4 5 6 7 8 我们 关注 发生 事情地区十分 非洲 𝑶(𝒏 𝟐 ) Typical CKY: 𝑶(𝒏 𝟑 ) 23   drdyt rfwt )(.maxarg* )( Candidate translations are scored by: <我们十分 X_1 / we are very much X_1>, -4.7 <X_1 发生 X_2事情 / what happens X_2 X_1>, -3.6 <非洲 地区 / African region >, -2.7 < 关注 X_1 / concerned with X_1>, -3.8 <的 / in >, -1.2 , -7.7 , -7.1 , -5.9 , -4.5 , -3.3 , 0
  • 24. LR-Hiero State-of-the-art 17 19 21 23 25 27 29 0 2000 4000 6000 8000 BLEU(translationaccuracy) LM Calls (translation time) Czech-English German-English Chinese-English LR-Hiero Results 3 Times Faster Comparable Translation Accuracy
  • 25. Statistical Machine Translation (SMT) • Available SMT systems: – Moses (Edinburgh) – Phrasal (Stanford) – Jane 2 (Aachen University) – Joshua (JHU) – Kriya (SFU) – CDEC (CMU) – LR-Hiero Phrase-Based Hierarchical Phrase-Based (Hiero) Left-to-Right Hierarchical Phrase-based Available : https://github.com/sfu- natlang/lrhiero • Time efficient • Can model complex translation • Generates translation in left-to-right manner • Suitable choice for online translation
  • 27. Speech to Speech Translation Karlsruhe (KIT) Lecture Translator NICT Speech Translator Skype Translator
  • 28. Incremental Translation • Facilitate continuous translation with low latency – Latency: time difference between start of source sentence (speech) and start of target sentence (speech) • Ensure acceptable translation accuracy Good evening, I would like a taxi to the airport please Buenas noches. Quiero un taxi al aeropuerto por favor 6 sec Good evening, I would 0.7 sec 0.2 sec 0.2 sec like a taxi to the airport please Non-incremental Buenas noches quiero como un taxi al aeropuerto por favor Incremental
  • 31. Integrating Segmentation with Translation Process
    [Diagram: incoming words ("Good evening I") pass a "segment?" decision; each completed segment is translated ("Buenas noches")]
  • 32. Incremental Translation Results
    • Task: English-German TED speech translation
    • MT system training data: IWSLT 2013 train data + Europarl v7 data [Koehn 2005]
    • BLEU is the translation accuracy measure
    System            BLEU    Latency (sec)   Segs/second
    Non-incremental   21.08   6.353           0.15
    Prosodic          20.88   0.468           2.27
    Incremental       20.86   0.311           3.22
  • 33. Publications
    • Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering. Siahbani, Maryam and Sankaran, Baskaran and Sarkar, Anoop. EMNLP (2013)
    • Two Improvements to Left-to-Right Decoding for Hierarchical Phrase-based Machine Translation. Siahbani, Maryam and Sarkar, Anoop. EMNLP (2014)
    • Expressive Hierarchical Rule Extraction for Left-to-Right Translation. Siahbani, Maryam and Sarkar, Anoop. AMTA (2014)
    • Incremental Translation using a Hierarchical Phrase-based Translation System. Siahbani, Maryam and Mehdizadeh Seraj, Ramtin and Sankaran, Baskaran and Sarkar, Anoop. SLT (2014)
  • 35. Partial Hypothesis
    Source (positions 0 to 8): 我们 十分 关注 非洲 地区 发生 的 事情
    Each partial hypothesis: translation so far, uncovered source spans, score
      <s> [0,8], -3.3
      <s> we are very much [2,8], -4.5
      <s> we are very much concerned with [3,8], -5.9
      <s> we are very much concerned with what happens [6,7] [3,5], -7.1
  • 36. LR-Decoding with Beam Search • LR-decoding is integrated with beam search (Watanabe et al. 2006) • Stacks: each stack holds the hypotheses that cover the same number of source-side words • All possible partial hypotheses for a given stack are generated exhaustively
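The stack organisation described above can be sketched as follows. This is an illustrative sketch, not the decoder's code: `words_covered` and `build_stacks` are hypothetical names, the four hypotheses and their scores are the ones listed on the "Partial Hypothesis" slide, and the sketch assumes that a larger (less negative) score is better. The number of covered words is derived from the uncovered spans, so no coverage vector is stored.

```python
from collections import defaultdict
import heapq

N_SOURCE = 8  # source length in the running example


def words_covered(uncovered_spans, n_source=N_SOURCE):
    """Number of source words already translated by a hypothesis."""
    return n_source - sum(end - start for start, end in uncovered_spans)


def build_stacks(hypotheses, beam_size):
    """Group (score, target, uncovered_spans) triples by coverage and prune each stack."""
    stacks = defaultdict(list)
    for hyp in hypotheses:
        stacks[words_covered(hyp[2])].append(hyp)
    # Assumption: higher score is better, so keep the beam_size largest.
    return {k: heapq.nlargest(beam_size, v, key=lambda h: h[0])
            for k, v in stacks.items()}


hyps = [
    (-3.3, "<s>", [(0, 8)]),
    (-4.5, "<s> we are very much", [(2, 8)]),
    (-5.9, "<s> we are very much concerned with", [(3, 8)]),
    (-7.1, "<s> we are very much concerned with what happens", [(6, 7), (3, 5)]),
]
stacks = build_stacks(hyps, beam_size=2)
for covered in sorted(stacks):
    print(covered, [target for _, target, _ in stacks[covered]])
```

In this example the hypotheses land in stacks 0, 2, 3 and 5, because a rule with source terminals on both sides of a gap can cover non-adjacent words in one step.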
  • 37. Cube pruning • Each cube: a group of hypotheses and applicable rules • Cubes are fed to a priority queue which fills the current stack
  • 38. Cube pruning
    • Rows: hypotheses
    • Columns: rules
    • Rows and columns are sorted by score
    • Assumption: the best hypothesis is in the top-left corner
      – The next best are the neighbours of this entry
    Example cube (hypothesis and rule scores in parentheses, combined scores in the cells):
                                     made (0.9)   done (1.1)   do (3.2)
      students have not yet (10.2)   12.5         12.4         14.3
      pupils have not yet (11.5)     12.6         12.8         14.7
      student has not (12.7)         13.3         13.5         15.4
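The exploration order that cube pruning follows inside one cube can be sketched with a priority queue. This is a simplified sketch under stated assumptions: `cube_prune` is a hypothetical name, the grid holds the nine cell values from the example slide, lower values are treated as better (the slide's scores grow away from the top-left corner), and all cells are precomputed, whereas a real decoder would score a cell only when it is pushed.

```python
import heapq


def cube_prune(grid, k):
    """Pop up to k cells from one cube, starting at the top-left corner.

    Each popped cell pushes its right and lower neighbours, so only a
    fraction of the grid is ever visited.
    """
    rows, cols = len(grid), len(grid[0])
    queue = [(grid[0][0], 0, 0)]
    seen = {(0, 0)}
    popped = []
    while queue and len(popped) < k:
        cost, i, j = heapq.heappop(queue)
        popped.append((cost, i, j))
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < rows and nj < cols and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(queue, (grid[ni][nj], ni, nj))
    return popped


# Cell values from the example cube (rows: hypotheses, columns: rules).
grid = [
    [12.5, 12.4, 14.3],
    [12.6, 12.8, 14.7],
    [13.3, 13.5, 15.4],
]
first_four = cube_prune(grid, 4)
print(first_four)
```

The second cell popped (12.4) is better than the first (12.5): the corner is not the best cell once the combined score is used, which is the effect discussed on the later "Search Error in Cube Pruning" slides.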
  • 39. Time Efficiency: average number of LM queries
    [Figure: comparison with Watanabe et al. (2006)]
    Source: Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering. M. Siahbani, B. Sankaran and A. Sarkar. EMNLP (2013)
  • 40. Reordering Features • LR-Hiero as proposed by Watanabe et al. (2006) scores about 2 BLEU points lower than Hiero
  • 41. Reordering Features
    Source (positions 0 to 8): 我们 十分 关注 非洲 地区 发生 的 事情
    • Distortion feature (computed each time a rule is applied)
      – Example for <X_1 发生 X_2 事情 / what happens X_2 X_1>:
        d = (5-3) + (7-6) + (8-6) + (7-3) + (8-5) = 12
    • Number of reordering rules (rules whose non-terminals appear in a different order on the source and target sides)
      – <X_1 发生 X_2 事情 / what happens X_2 X_1>: r<> = 1
      – <X_1 发生 X_2 事情 / what happens X_1 X_2>: r<> = 0
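The distortion sum on this slide can be reproduced with a short function. The reading used here is an assumption inferred from the five terms of the sum, not a definition quoted from the slides: the rule covers source span [3,8], its pieces are visited in target order (发生 at [5,6], 事情 at [7,8], X_2 at [6,7], X_1 at [3,5]), and each term is the distance jumped from the end of one piece to the start of the next, with a final jump to the end of the span. The names `distortion` and `is_reordering` are hypothetical.

```python
def distortion(span, pieces_in_target_order):
    """Sum of absolute jumps over the source when pieces are read in target order."""
    start, end = span
    position = start
    total = 0
    for piece_start, piece_end in pieces_in_target_order:
        total += abs(piece_start - position)
        position = piece_end
    total += abs(end - position)  # jump to the end of the rule's span
    return total


def is_reordering(nonterminal_target_order):
    """r<> feature: 1 if the non-terminal indices are not in source order, else 0."""
    order = list(nonterminal_target_order)
    return int(order != sorted(order))


# <X_1 发生 X_2 事情 / what happens X_2 X_1> applied to span [3,8]
pieces = [(5, 6), (7, 8), (6, 7), (3, 5)]  # 发生, 事情, X_2, X_1
d = distortion((3, 8), pieces)
print(d, is_reordering([2, 1]), is_reordering([1, 2]))
```

A fully monotone visit of the same span, `[(3, 5), (5, 6), (6, 7), (7, 8)]`, gives a distortion of 0 under this reading.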
  • 42. Translation Quality
    [Figure: comparison with Watanabe et al. (2006)]
    Source: Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering. M. Siahbani, B. Sankaran and A. Sarkar. EMNLP (2013)
  • 43. Search Error in Cube Pruning
    • Assumption: the best hypothesis is in the top-left corner
      – The next best are the neighbours of this entry
    • Adding the LM score violates the assumption
    Cube 1 (rule scores 0.9, 1.3, 3.2; hypothesis scores 6.6, 6.7, 6.9):
      8.1   8.2   8.5
      8.0   8.4   8.6
      8.3   8.9   8.8
    Cube 2 (rule scores 1.0, 1.3, 1.5; hypothesis scores 6.2, 6.3, 6.5):
      9.1   8.9   9.3
      8.0   8.5   9.0
      7.7   7.9   8.1
    Values highlighted on the slide: 8.1, 8.0, 8.2
  • 44. Search Error in Cube Pruning
    • Assumption: the best hypothesis is in the top-left corner
      – The next best are the neighbours of this entry
    • Adding the LM score violates the assumption
    • Remedy shown on the slide: queue diversity
    Cube 1 (rule scores 0.9, 1.3, 3.2; hypothesis scores 6.6, 6.7, 6.9):
      8.1   8.2   8.5
      8.0   8.4   8.6
      8.3   8.9   8.8
    Cube 2 (rule scores 1.0, 1.3, 1.5; hypothesis scores 6.2, 6.3, 6.5):
      9.1   8.9   9.3
      8.0   8.5   9.0
      7.7   7.9   8.1
    Values highlighted on the slide: 8.0, 8.0, 7.7
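The search error can be reproduced with the two 3x3 grids whose values appear on these slides. The sketch below is illustrative and rests on assumptions: `prune_cubes` is a hypothetical name, lower values are treated as better, both cubes feed one shared priority queue seeded with their top-left cells, and only three entries are popped. It demonstrates the problem only; it does not implement queue diversity.

```python
import heapq


def prune_cubes(cubes, pops):
    """Fill a stack with `pops` entries from several cubes sharing one queue."""
    queue = [(grid[0][0], c, 0, 0) for c, grid in enumerate(cubes)]
    heapq.heapify(queue)
    seen = {(c, 0, 0) for c in range(len(cubes))}
    stack = []
    while queue and len(stack) < pops:
        cost, c, i, j = heapq.heappop(queue)
        stack.append((cost, c))
        grid = cubes[c]
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(grid) and nj < len(grid[0]) and (c, ni, nj) not in seen:
                seen.add((c, ni, nj))
                heapq.heappush(queue, (grid[ni][nj], c, ni, nj))
    return stack


cube1 = [[8.1, 8.2, 8.5], [8.0, 8.4, 8.6], [8.3, 8.9, 8.8]]
cube2 = [[9.1, 8.9, 9.3], [8.0, 8.5, 9.0], [7.7, 7.9, 8.1]]

stack = prune_cubes([cube1, cube2], pops=3)
best_overall = min(min(row) for grid in (cube1, cube2) for row in grid)
print(stack, best_overall)
```

All three popped entries come from the first cube, while the best cell overall (7.7) sits in the second cube behind its poor top-left corner (9.1) and is never reached.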
  • 45. Queue Diversity
    [Figures: Chinese-English BLEU score and Chinese-English number of LM calls for LR-Hiero, LR-Hiero+CP and LR-Hiero+CP (QD=10)]
    Source: Two Improvements to Left-to-Right Decoding for Hierarchical Phrase-based Machine Translation. M. Siahbani and A. Sarkar. EMNLP (2014)
  • 46. Lexicalized Reordering Model
    • The distortion penalty is weak: it only measures deviation from monotonic translation
    • Learn reordering preferences for each phrase (with respect to the previous phrase)
      – Monotone
      – Swap
      – Discontinuous
    Figure from "Statistical Machine Translation", Koehn 2010
  • 47. Lexicalized Reordering Model
    • Collect orientation information during rule extraction
      – Convert each rule to a phrase-pair (possibly discontinuous)
      – M: if there is a phrase-pair on the top left
      – S: if there is a phrase-pair on the top right
      – D: otherwise
    • Estimation by relative frequency:
      $P_o(\mathit{orientation} \mid e, f) = \dfrac{\mathrm{count}(\mathit{orientation}, e, f)}{\sum_{o} \mathrm{count}(o, e, f)}$
    Figure from "Statistical Machine Translation", Koehn 2010
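The relative-frequency estimate above amounts to normalising orientation counts per phrase pair. The following sketch assumes that orientation events have already been collected as `(orientation, e, f)` triples; the function name `orientation_probs` and the toy counts are hypothetical and are not taken from the talk.

```python
from collections import Counter


def orientation_probs(events):
    """Relative-frequency estimate of P(orientation | e, f).

    events: iterable of (orientation, e, f) triples, orientation in {"M", "S", "D"}.
    """
    counts = Counter(events)
    totals = Counter()
    for (_, e, f), c in counts.items():
        totals[(e, f)] += c  # denominator: sum over orientations for this pair
    return {(o, e, f): c / totals[(e, f)] for (o, e, f), c in counts.items()}


# Made-up counts for one phrase pair, for illustration only.
events = [("M", "in", "的")] * 3 + [("S", "in", "的")]
probs = orientation_probs(events)
print(probs)
```

Orientations never observed for a phrase pair get no entry here; a practical model would usually smooth the counts, which this sketch leaves out.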

Editor's notes

  1. In statistical machine translation, we look for the translation e that maximizes the probability of e given the source sentence f. Statistical approaches to machine translation have achieved impressive performance by leveraging large parallel corpora. However, such data are available only for a few dozen language pairs in limited domains, for example French-English and Arabic-English. More than 5,000 languages are spoken in the world, and for most pairs of them we have no parallel data.
  4. Hiero uses a simple rule extraction algorithm based on word alignments. To avoid excessively large grammars, it applies constraints on the length of phrase-pairs and on rule configuration. It assumes a unit count for each phrase-pair and distributes that count uniformly, as fractional counts, over all rules extracted from the phrase-pair.
  6. Left-to-right decoding is a potential alternative. It is an Earley-style decoder that generates the target side in left-to-right order. Each partial hypothesis consists of a partial translation and a sequence of uncovered spans on the source side. It is faster than CKY-based decoding.
  8. In incremental translation we need to optimize two criteria. First, facilitate continuous translation with low latency, where latency is the time difference between the start of the source-language speech and the start of the target-language speech. Second, ensure acceptable translation accuracy.