Lexically Constrained
Decoding for Sequence
Generation Using Grid Beam
Search
Chris Hokamp, Qun Liu
(ACL 2017)
B4 Satoru Katsumata
Abstract
● They present Grid Beam Search (GBS), an algorithm which extends beam
search to allow the inclusion of pre-specified lexical constraints.
● They demonstrate the feasibility and flexibility of Lexically Constrained
Decoding by conducting 2 experiments.
○ Neural Interactive-Predictive Translation
○ Domain Adaptation for Neural Machine Translation
● Result
○ GBS can provide large improvements in translation quality in interactive scenarios.
○ even without any user input, GBS can be used to achieve significant gains in performance
in domain adaptation scenarios.
Introduction: What is this research?
● The output of many natural language processing models is a sequence of text.
→ In particular, they focus on machine translation use cases.†
○ such as Post-Editing (PE) and Interactive-Predictive MT
● Their goal is to find a way to force the output of a model to contain such lexical
constraints.
● In this paper, they propose a decoding algorithm which allows the specification
of subsequences to be present in a model’s output.
(diagram: input → model → output sequence)
†: lexically constrained decoding is relevant to any scenario where a model is asked to generate a sequence given an input.
(image description, dialog generation, abstractive summarization, question answering)
Background: Beam Search
(algorithm figure from: Kyoto University Participation to WAT 2016. Cromieres et al., WAT2016)
Grid Beam Search: Abstract
● their goal is to organize decoding in such a way that they can constrain the
search space to outputs which contain one or more pre-specified
sub-sequences.
○ want to place lexical constraints correctly and
to generate the parts of the output which are not covered by the constraints.
(figure: decoding grid with time step t on one axis and constraint coverage c on the other)
constraints is an array of sequences.
← constraints = [[Ihre, Rechte, müssen], [Abreise, geschützt]]
more detail on the next slides →
Grid Beam Search: Algorithm
numC: total number of tokens in all constraints
k: beam window size
(this pseudo-code works for any model)
1. open hypotheses can either generate from the model’s distribution or start constraints
2. closed hypotheses can only generate the next token of a currently unfinished constraint
To fill each beam Grid[t][c]:
1. open hypotheses (Grid[t-1][c]) may generate continuations from the model’s distribution
2. open hypotheses (Grid[t-1][c-1]) may start new constraints
3. closed hypotheses (Grid[t-1][c-1]) may continue constraints
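The grid traversal above can be sketched in runnable Python. This toy version is a deliberate simplification: it supports only single-token constraints and uses a fixed unigram log-probability table in place of a real model state, so hypotheses are always open; the names `logprobs` and `grid_beam_search` are illustrative, not the paper's API.

```python
import math

def grid_beam_search(logprobs, constraints, max_t, k=3):
    """Toy GBS for single-token constraints over a fixed unigram model.
    Returns the best length-max_t sequence covering every constraint."""
    num_c = len(constraints)
    # Grid[t][c]: k-best hypotheses as (score, tokens, covered constraint indices)
    grid = [[[] for _ in range(num_c + 1)] for _ in range(max_t + 1)]
    grid[0][0] = [(0.0, (), frozenset())]

    for t in range(1, max_t + 1):
        for c in range(num_c + 1):
            cands = []
            # 1. hypotheses in the same coverage row generate freely from the model
            for score, toks, cov in grid[t - 1][c]:
                for tok, lp in logprobs.items():
                    cands.append((score + lp, toks + (tok,), cov))
            # 2. hypotheses one coverage level down place an unused constraint
            if c > 0:
                for score, toks, cov in grid[t - 1][c - 1]:
                    for i, con in enumerate(constraints):
                        if i not in cov:
                            lp = logprobs.get(con, math.log(1e-9))
                            cands.append((score + lp, toks + (con,), cov | {i}))
            cands.sort(key=lambda h: h[0], reverse=True)
            grid[t][c] = cands[:k]  # keep the k-best per grid cell

    # finished outputs live in the top-right cell: all timesteps, all constraints
    best = max(grid[max_t][num_c], key=lambda h: h[0], default=None)
    return list(best[1]) if best else None
```

With `num_c = 0` the grid collapses to a single row and the sketch reduces to standard beam search.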
Grid Beam Search: Detail
(conceptual drawing: decoding grid, time step t × constraint coverage c)
A colored strip inside a beam box represents an individual hypothesis in the beam’s k-best stack. Hypotheses with circles are closed; all other hypotheses are open.
Grid Beam Search: Multi-token Constraints
● By distinguishing between open and closed hypotheses,
→ GBS allows arbitrary multi-token phrases in the search.
● Each hypothesis maintains a coverage vector.
○ this ensures that constraints cannot be repeated in a search path
(hypotheses which have already covered constraint_i can only generate freely or start new constraints)
● Discontinuous lexical constraints, such as phrasal verbs in English or German:
○ it is easy to apply GBS by adding filters to the search ↓
example constraint: ask <someone> out
constraint0 = ask, constraint1 = out
use 2 filters:
1. constraint1 cannot be used before constraint0
2. there must be at least one generated token between the constraints
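The two filters can be written as simple predicates over a hypothesis's bookkeeping; `started` (constraint indices in placement order) and `positions` (output indices where each part was placed) are assumed hypothesis fields for illustration, not the paper's data structures.

```python
def in_order(started):
    """Filter 1: constraint1 ('out') cannot be used before constraint0 ('ask').
    `started` lists constraint indices in the order they were placed."""
    return started == sorted(started)

def has_gap(positions):
    """Filter 2: at least one freely generated token must separate the parts.
    `positions` are output indices of the placed parts, in constraint order."""
    return all(later - earlier > 1
               for earlier, later in zip(positions, positions[1:]))

# During search, candidates failing either filter would be pruned, e.g.:
# cands = [h for h in cands if in_order(h.started) and has_gap(h.positions)]
```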
Grid Beam Search: Addressing unknown words
● Their decoder can handle arbitrary constraints.
→ there is a risk that constraints will contain unknown tokens.
○ Especially in domain adaptation scenarios, user-specified constraints are very likely to
contain unseen tokens.
→ use subword units (presumably Byte Pair Encoding).
(In the experiments, they use this technique to ensure that no input tokens are unknown, even if a constraint contains
words which never appeared in the training data.)
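The effect of subword units on unseen constraint words can be illustrated with a greedy longest-match segmenter; this is a stand-in for real BPE inference (an assumption, since the slides only say "subword units"), using the `@@` continuation marker convention from subword-nmt.

```python
def segment(word, subword_vocab):
    """Greedy longest-match segmentation into known subword units.
    Single characters are always allowed as a fallback, so any word
    can be segmented and no token remains unknown."""
    units, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            piece = word[i:j]
            if piece in subword_vocab or j - i == 1:
                units.append(piece)
                i = j
                break
    # mark all non-final units with the '@@' continuation marker
    return [u + "@@" for u in units[:-1]] + units[-1:]
```

For example, an out-of-vocabulary constraint word like "software" would be segmented into known units such as `["soft@@", "ware"]`, so the model never sees an unknown token.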
Grid Beam Search: Efficiency
● The runtime complexity of a naive implementation of GBS is O(ktc)
(standard time-based beam search is O(tk)).
↓ some consideration must be given to the efficiency of this algorithm
● The beams in each column c of the grid are independent.
→ GBS can be parallelized to allow all beams at each timestep to be filled
simultaneously.
● They find that the most time is spent computing the states for the hypothesis candidates.
→ by keeping the beam size small, they can make GBS significantly faster.
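A back-of-envelope count makes the O(ktc) overhead concrete (my own illustration, not a figure from the paper): a naive GBS fills one beam per reachable (t, c) grid cell, versus one per timestep for standard beam search.

```python
def beams_filled(max_t, num_c):
    """Number of grid cells (beams) a naive GBS fills: at timestep t only
    coverage levels c <= min(t, num_c) are reachable. With num_c = 0 this
    reduces to standard beam search (one beam per timestep)."""
    return sum(min(t, num_c) + 1 for t in range(1, max_t + 1))

print(beams_filled(25, 5))  # 140 beams, vs. 25 for standard beam search
```

Since the cells within a column (and the per-cell k-best expansions) are independent, this extra work parallelizes well, which is the point made above.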
Experiments setup (including appendix)
● The models are SOTA NMT systems using the authors’ own implementation of NMT
with attention over the source sequence (Bahdanau et al., 2014).
○ bidirectional GRUs
○ AdaDelta
○ train all systems for 500,000 iterations, with validation every 5,000 steps;
the best single model from validation is used.
○ use ℓ2 regularization (α = 1e-5)
○ dropout is used on the output layers, rate is 0.5.
○ beam size is 10.
● corpus
○ English-German
■ 4.4M segments from the Europarl and CommonCrawl
○ English-French
■ 4.9M segments from the Europarl and CommonCrawl
○ English-Portuguese
■ 28.5M segments from the Europarl, JRC-Aquis and OpenSubtitles
● subword
○ they created a shared subword representation for each language pair
by extracting a vocabulary of 80,000 symbols from the concatenated source and target data.
Experiments: Pick-Revise for Interactive PE
● Pick-Revise is an interaction cycle for MT Post-Editing.
● They modify this experiment:
○ the user only provides sequences of up to three words
which are missing from the hypothesis, taken from the reference.
○ in these experiments, four cycles are run.
○ strict setting:
the complete phrase must be missing from the hypothesis.
○ relaxed setting:
only the first word must be missing.
○ when a 3-token phrase cannot be found,
they back off to 2-token phrases, then to single tokens as constraints.
○ If a hypothesis already matches the reference, no constraints are added.
(fig1: PRIMT: A Pick-Revise Framework for Interactive Machine Translation. Cheng et al., NAACL HLT 2016)
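The simulated user from this setup can be sketched as follows; `pick_constraint` and its token-level matching are illustrative names for the selection step described above, not code from the paper.

```python
def contains(tokens, phrase):
    """True if `phrase` occurs as a contiguous subsequence of `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def pick_constraint(hyp, ref, strict=True):
    """Pick one reference n-gram missing from the hypothesis, backing off
    from 3-token phrases to 2-token phrases to single tokens."""
    hyp_toks, ref_toks = hyp.split(), ref.split()
    for n in (3, 2, 1):
        for i in range(len(ref_toks) - n + 1):
            phrase = ref_toks[i:i + n]
            if strict:
                missing = not contains(hyp_toks, phrase)  # whole phrase absent
            else:
                missing = phrase[0] not in hyp_toks       # only first word absent
            if missing:
                return phrase
    return None  # hypothesis already matches the reference: add no constraint
```

One such constraint would be added per cycle, for up to four cycles.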
Result: Pick-Revise for Interactive PE
Experiments: Domain Adaptation via Terminology
● The requirement for use of domain-specific terminologies is common in real
world applications of MT.
→ provide the term as constraints.
● target domain data
○ Autodesk Post-Editing corpus
↑ focused upon software localization, very different from the training data.
● Extract Terminology
○ divide the corpus into approximately 100,000 training sentences and 1,000 test segments.
automatically generate a terminology by computing PMI (normalized to [-1, +1])
between source and target n-grams in the training set.
○ extract all n-grams of length 2-5 as terminology candidates.
○ filter the candidates to only include pairs
whose PMI is >= 0.9 and where both src and tgt occur at least 5 times.
→ this makes the set of the terminology.
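The extraction step above can be sketched with normalized PMI and the stated thresholds; the count dictionaries (`cooc`, `src_counts`, `tgt_counts`) are assumed inputs for illustration.

```python
import math

def npmi(pair_n, src_n, tgt_n, total):
    """PMI between a source and target n-gram, normalized to [-1, +1];
    counts are taken over aligned segments of the training set."""
    p_xy = pair_n / total
    pmi = math.log(p_xy / ((src_n / total) * (tgt_n / total)))
    return pmi / -math.log(p_xy)

def extract_terms(cooc, src_counts, tgt_counts, total,
                  min_npmi=0.9, min_count=5):
    """Keep (src, tgt) n-gram pairs with normalized PMI >= 0.9 where both
    sides occur at least 5 times, mirroring the thresholds above."""
    return {
        s: t
        for (s, t), n in cooc.items()
        if src_counts[s] >= min_count
        and tgt_counts[t] >= min_count
        and npmi(n, src_counts[s], tgt_counts[t], total) >= min_npmi
    }
```

A pair that always co-occurs (e.g. a domain term and its fixed translation) scores 1.0, while loosely associated pairs fall below the threshold and are dropped.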
Results: Domain Adaptation via Terminology
● To confirm that these improvements are caused
by the placement of the terms by GBS,
they evaluate other insertion methods:
○ randomly inserting terms into the baseline output
○ prepending terms to the baseline output
● even an automatically created terminology
yields improvements.
○ manually created domain terminologies
are likely to lead to greater gains.
(conceptual drawing for the previous slide: for each test-set segment, terminology phrases found on the src side and their corresponding tgt phrases are added as constraints for that segment)
Output Analysis
Phrases added as constraints have global effects upon translation quality.
related work
● Most related work has presented modifications of SMT systems for specific
use cases (focusing on prefixes and suffixes).
○ The largest body of work considers Interactive Machine Translation (IMT).
● Some attention has also been given to SMT decoding with multiple lexical
constraints (Cheng et al., 2016).
○ their approach relies upon the phrase segmentation provided by the SMT system.
○ the decoding algorithm can only make use of constraints that match phrase boundaries.
In contrast, the GBS approach decodes at the token level.
● “To the best of our knowledge, ours is the first work which considers
general lexically constrained decoding for any model which outputs
sequences, without relying upon alignments between input and output, and
without using a search organized by coverage of the input.”
Conclusion
● Lexically constrained decoding is a flexible way to incorporate arbitrary
subsequences into the output of any model that generates output sequences
token-by-token.
● In interactive translation, user inputs can be used as constraints,
generating a new output each time a user fixes an error.
○ such a workflow can provide a large improvement in translation quality.
● Using a domain-specific terminology to generate target-side constraints,
○ a general domain model can be adapted to a new domain without any retraining.
● future work
○ evaluate GBS with models outside of MT, such as automatic summarization, image captioning
or dialog generation
○ introduce new constraint-aware models (via secondary attention mechanisms over lexical
constraints)
reference
● their implementation of GBS
○ https://github.com/chrishokamp/constrained_decoding
● Kyoto University Participation to WAT 2016. Cromieres et al., WAT2016
● PRIMT: A Pick-Revise Framework for Interactive Machine Translation. Cheng
et al., NAACL HLT 2016

Weitere ähnliche Inhalte

Ähnlich wie Lexically constrained decoding for sequence generation using grid beam search

A minimization approach for two level logic synthesis using constrained depth...
A minimization approach for two level logic synthesis using constrained depth...A minimization approach for two level logic synthesis using constrained depth...
A minimization approach for two level logic synthesis using constrained depth...IAEME Publication
 
Connectionist Temporal Classification
Connectionist Temporal ClassificationConnectionist Temporal Classification
Connectionist Temporal ClassificationJulius Hietala
 
M ESH S IMPLIFICATION V IA A V OLUME C OST M EASURE
M ESH S IMPLIFICATION V IA A V OLUME C OST M EASUREM ESH S IMPLIFICATION V IA A V OLUME C OST M EASURE
M ESH S IMPLIFICATION V IA A V OLUME C OST M EASUREijcga
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...PyData
 
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deploymenttaeseon ryu
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Deep Learning Italia
 
Comprehensive Performance Evaluation on Multiplication of Matrices using MPI
Comprehensive Performance Evaluation on Multiplication of Matrices using MPIComprehensive Performance Evaluation on Multiplication of Matrices using MPI
Comprehensive Performance Evaluation on Multiplication of Matrices using MPIijtsrd
 
Parallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMPParallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMPAnil Bohare
 
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
AI optimizing HPC simulations (presentation from  6th EULAG Workshop)AI optimizing HPC simulations (presentation from  6th EULAG Workshop)
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)byteLAKE
 
3 article azojete vol 7 24 33
3 article azojete vol 7 24 333 article azojete vol 7 24 33
3 article azojete vol 7 24 33Oyeniyi Samuel
 
[20240422_LabSeminar_Huy]Taming_Effect.pptx
[20240422_LabSeminar_Huy]Taming_Effect.pptx[20240422_LabSeminar_Huy]Taming_Effect.pptx
[20240422_LabSeminar_Huy]Taming_Effect.pptxthanhdowork
 

Ähnlich wie Lexically constrained decoding for sequence generation using grid beam search (20)

Recurrent Instance Segmentation (UPC Reading Group)
Recurrent Instance Segmentation (UPC Reading Group)Recurrent Instance Segmentation (UPC Reading Group)
Recurrent Instance Segmentation (UPC Reading Group)
 
A minimization approach for two level logic synthesis using constrained depth...
A minimization approach for two level logic synthesis using constrained depth...A minimization approach for two level logic synthesis using constrained depth...
A minimization approach for two level logic synthesis using constrained depth...
 
Connectionist Temporal Classification
Connectionist Temporal ClassificationConnectionist Temporal Classification
Connectionist Temporal Classification
 
M ESH S IMPLIFICATION V IA A V OLUME C OST M EASURE
M ESH S IMPLIFICATION V IA A V OLUME C OST M EASUREM ESH S IMPLIFICATION V IA A V OLUME C OST M EASURE
M ESH S IMPLIFICATION V IA A V OLUME C OST M EASURE
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Thesis Giani UIC Slides EN
Thesis Giani UIC Slides ENThesis Giani UIC Slides EN
Thesis Giani UIC Slides EN
 
L046056365
L046056365L046056365
L046056365
 
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
 
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 
FrackingPaper
FrackingPaperFrackingPaper
FrackingPaper
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
 
Comprehensive Performance Evaluation on Multiplication of Matrices using MPI
Comprehensive Performance Evaluation on Multiplication of Matrices using MPIComprehensive Performance Evaluation on Multiplication of Matrices using MPI
Comprehensive Performance Evaluation on Multiplication of Matrices using MPI
 
Bh36352357
Bh36352357Bh36352357
Bh36352357
 
Parallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMPParallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMP
 
cug2011-praveen
cug2011-praveencug2011-praveen
cug2011-praveen
 
I0343047049
I0343047049I0343047049
I0343047049
 
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
AI optimizing HPC simulations (presentation from  6th EULAG Workshop)AI optimizing HPC simulations (presentation from  6th EULAG Workshop)
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
 
Genetic Algorithms
Genetic AlgorithmsGenetic Algorithms
Genetic Algorithms
 
3 article azojete vol 7 24 33
3 article azojete vol 7 24 333 article azojete vol 7 24 33
3 article azojete vol 7 24 33
 
[20240422_LabSeminar_Huy]Taming_Effect.pptx
[20240422_LabSeminar_Huy]Taming_Effect.pptx[20240422_LabSeminar_Huy]Taming_Effect.pptx
[20240422_LabSeminar_Huy]Taming_Effect.pptx
 

Mehr von Satoru Katsumata

Exploiting Monolingual Data at Scale for Neural Machine Translation
Exploiting Monolingual Data at Scale for Neural Machine TranslationExploiting Monolingual Data at Scale for Neural Machine Translation
Exploiting Monolingual Data at Scale for Neural Machine TranslationSatoru Katsumata
 
Incorporating Syntactic and Semantic Information in Word Embeddings using Gra...
Incorporating Syntactic and Semantic Information in Word Embeddings using Gra...Incorporating Syntactic and Semantic Information in Word Embeddings using Gra...
Incorporating Syntactic and Semantic Information in Word Embeddings using Gra...Satoru Katsumata
 
How Contextual are Contextualized Word Representations?
How Contextual are Contextualized Word Representations?How Contextual are Contextualized Word Representations?
How Contextual are Contextualized Word Representations?Satoru Katsumata
 
Corpora Generation for Grammatical Error Correction
Corpora Generation for Grammatical Error CorrectionCorpora Generation for Grammatical Error Correction
Corpora Generation for Grammatical Error CorrectionSatoru Katsumata
 
Understanding Back-Translation at Scale
Understanding Back-Translation at ScaleUnderstanding Back-Translation at Scale
Understanding Back-Translation at ScaleSatoru Katsumata
 
2018年度レトリバインターン参加報告
2018年度レトリバインターン参加報告2018年度レトリバインターン参加報告
2018年度レトリバインターン参加報告Satoru Katsumata
 
On the Limitations of Unsupervised Bilingual Dictionary Induction
On the Limitations of Unsupervised Bilingual Dictionary InductionOn the Limitations of Unsupervised Bilingual Dictionary Induction
On the Limitations of Unsupervised Bilingual Dictionary InductionSatoru Katsumata
 
Guiding neural machine translation with retrieved translation pieces
Guiding neural machine translation with retrieved translation piecesGuiding neural machine translation with retrieved translation pieces
Guiding neural machine translation with retrieved translation piecesSatoru Katsumata
 
Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Mo...
Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Mo...Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Mo...
Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Mo...Satoru Katsumata
 
Memory-augmented Neural Machine Translation
Memory-augmented Neural Machine TranslationMemory-augmented Neural Machine Translation
Memory-augmented Neural Machine TranslationSatoru Katsumata
 
Deep Neural Machine Translation with Linear Associative Unit
Deep Neural Machine Translation with Linear Associative UnitDeep Neural Machine Translation with Linear Associative Unit
Deep Neural Machine Translation with Linear Associative UnitSatoru Katsumata
 
A convolutional encoder model for neural machine translation
A convolutional encoder model for neural machine translationA convolutional encoder model for neural machine translation
A convolutional encoder model for neural machine translationSatoru Katsumata
 

Mehr von Satoru Katsumata (13)

Exploiting Monolingual Data at Scale for Neural Machine Translation
Exploiting Monolingual Data at Scale for Neural Machine TranslationExploiting Monolingual Data at Scale for Neural Machine Translation
Exploiting Monolingual Data at Scale for Neural Machine Translation
 
Incorporating Syntactic and Semantic Information in Word Embeddings using Gra...
Incorporating Syntactic and Semantic Information in Word Embeddings using Gra...Incorporating Syntactic and Semantic Information in Word Embeddings using Gra...
Incorporating Syntactic and Semantic Information in Word Embeddings using Gra...
 
How Contextual are Contextualized Word Representations?
How Contextual are Contextualized Word Representations?How Contextual are Contextualized Word Representations?
How Contextual are Contextualized Word Representations?
 
Word-node2vec
Word-node2vecWord-node2vec
Word-node2vec
 
Corpora Generation for Grammatical Error Correction
Corpora Generation for Grammatical Error CorrectionCorpora Generation for Grammatical Error Correction
Corpora Generation for Grammatical Error Correction
 
Understanding Back-Translation at Scale
Understanding Back-Translation at ScaleUnderstanding Back-Translation at Scale
Understanding Back-Translation at Scale
 
2018年度レトリバインターン参加報告
2018年度レトリバインターン参加報告2018年度レトリバインターン参加報告
2018年度レトリバインターン参加報告
 
On the Limitations of Unsupervised Bilingual Dictionary Induction
On the Limitations of Unsupervised Bilingual Dictionary InductionOn the Limitations of Unsupervised Bilingual Dictionary Induction
On the Limitations of Unsupervised Bilingual Dictionary Induction
 
Guiding neural machine translation with retrieved translation pieces
Guiding neural machine translation with retrieved translation piecesGuiding neural machine translation with retrieved translation pieces
Guiding neural machine translation with retrieved translation pieces
 
Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Mo...
Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Mo...Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Mo...
Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Mo...
 
Memory-augmented Neural Machine Translation
Memory-augmented Neural Machine TranslationMemory-augmented Neural Machine Translation
Memory-augmented Neural Machine Translation
 
Deep Neural Machine Translation with Linear Associative Unit
Deep Neural Machine Translation with Linear Associative UnitDeep Neural Machine Translation with Linear Associative Unit
Deep Neural Machine Translation with Linear Associative Unit
 
A convolutional encoder model for neural machine translation
A convolutional encoder model for neural machine translationA convolutional encoder model for neural machine translation
A convolutional encoder model for neural machine translation
 

Kürzlich hochgeladen

SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICEayushi9330
 
Unit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oUnit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oManavSingh202607
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxBhagirath Gogikar
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptxAlMamun560346
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)AkefAfaneh2
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformationAreesha Ahmad
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Servicemonikaservice1
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxRizalinePalanog2
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 

Kürzlich hochgeladen (20)

SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
Unit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oUnit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 o
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 

Lexically constrained decoding for sequence generation using grid beam search

constraints = [[Ihre, Rechte, müssen], [Abreise, geschützt]]
(more detail on the next slides →)
Grid Beam Search: Algorithm
(this pseudo-code is for any model)
numC: total number of tokens in all constraints
k: beam window size
Open hypotheses can either generate from the model’s distribution or start constraints; closed hypotheses can only generate the next token of a currently unfinished constraint.
1. open hypotheses (Grid[t-1][c]) may generate continuations from the model’s distribution
2. open hypotheses (Grid[t-1][c-1]) may start new constraints
3. closed hypotheses (Grid[t-1][c-1]) may continue constraints
6
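The three expansion cases above can be sketched end-to-end in Python. This is a toy re-implementation under stated assumptions, not the paper’s actual code: `step_fn` stands in for the model’s next-token log-probability distribution, hypotheses are plain tuples, and a constraint token absent from the model’s distribution is given an assumed fixed penalty of -20.0. The real implementation (linked on the reference slide) handles decoder states and pruning differently.

```python
import math

def grid_beam_search(step_fn, bos, eos, constraints, k=4, max_len=10):
    """Toy GBS. Grid[(t, c)] holds up to k hypotheses of length t that
    cover c constraint tokens. A hypothesis is a tuple:
    (tokens, logprob, coverage, current_constraint_or_None, offset)."""
    num_c = sum(len(con) for con in constraints)

    def mark(cov, i):                      # mark constraint i as fully covered
        return cov[:i] + (True,) + cov[i + 1:]

    grid = {(0, 0): [([bos], 0.0, (False,) * len(constraints), None, 0)]}
    finished = []
    for t in range(1, max_len + 1):
        for c in range(min(t, num_c) + 1):
            cand = []
            # case 1: open hypotheses generate from the model's distribution
            for toks, lp, cov, cur, off in grid.get((t - 1, c), []):
                if cur is None and toks[-1] != eos:
                    for w, w_lp in step_fn(toks).items():
                        cand.append((toks + [w], lp + w_lp, cov, None, 0))
            for toks, lp, cov, cur, off in grid.get((t - 1, c - 1), []):
                if toks[-1] == eos:
                    continue
                if cur is None:
                    # case 2: open hypotheses may start any uncovered constraint
                    for ci, done in enumerate(cov):
                        if done:
                            continue
                        w = constraints[ci][0]
                        w_lp = step_fn(toks).get(w, -20.0)   # assumed OOV penalty
                        if len(constraints[ci]) == 1:
                            cand.append((toks + [w], lp + w_lp, mark(cov, ci), None, 0))
                        else:                                # hypothesis becomes closed
                            cand.append((toks + [w], lp + w_lp, cov, ci, 1))
                else:
                    # case 3: closed hypotheses continue their current constraint
                    w = constraints[cur][off]
                    w_lp = step_fn(toks).get(w, -20.0)
                    if off + 1 == len(constraints[cur]):
                        cand.append((toks + [w], lp + w_lp, mark(cov, cur), None, 0))
                    else:
                        cand.append((toks + [w], lp + w_lp, cov, cur, off + 1))
            beam = sorted(cand, key=lambda h: -h[1])[:k]
            grid[(t, c)] = beam
            if c == num_c:               # only fully-constrained hypotheses may finish
                finished.extend(h for h in beam if h[0][-1] == eos)
    pool = finished or grid.get((max_len, num_c), [])
    return max(pool, key=lambda h: h[1])[0] if pool else None

def toy_model(prefix):                   # stand-in for an NMT decoder step
    return {"a": math.log(0.5), "b": math.log(0.2), "</s>": math.log(0.3)}

print(grid_beam_search(toy_model, "<s>", "</s>", [["b"]], k=4, max_len=3))
# → ['<s>', 'b', '</s>']
```

The `(t, c)` indexing matches the time step × constraint coverage grid pictured on slides 5 and 7.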
Grid Beam Search: Detail
← conceptual drawing: time step (t) on one axis, Constraint Coverage (c) on the other.
A colored strip inside a beam box represents an individual hypothesis in the beam’s k-best stack.
Hypotheses with circles are closed; all other hypotheses are open.
7
Grid Beam Search: Multi-token Constraints
● By distinguishing between open and closed hypotheses,
→ allow for arbitrary multi-token phrases in the search.
● Each hypothesis maintains a coverage vector.
○ ensures that constraints cannot be repeated in a search path (hypotheses which have already covered constraint_i can only generate or start new constraints)
● Discontinuous lexical constraints, such as phrasal verbs in English or German:
○ it is easy to apply GBS by adding filters to the search
↓ example: the constraint “ask <someone> out” becomes constraint0 = ask, constraint1 = out, and uses 2 filters:
constraint1 cannot be used before constraint0;
there must be at least one generated token between the constraints.
8
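The two filters above can be sketched as a single predicate over a hypothesis’s surface tokens. This is a simplification: the real decoder would check the positions where constraints were actually placed rather than searching the token list, and the function name is illustrative.

```python
def valid_phrasal_verb(tokens, first="ask", second="out"):
    """Both filters for the discontinuous constraint 'ask <someone> out':
    (1) constraint1 ('out') must not appear before constraint0 ('ask'),
    (2) at least one generated token must separate them."""
    if first not in tokens or second not in tokens:
        return True            # nothing to filter until both are placed
    i, j = tokens.index(first), tokens.index(second)
    return i < j and j - i > 1

print(valid_phrasal_verb(["I", "ask", "her", "out"]))   # → True
print(valid_phrasal_verb(["I", "ask", "out"]))          # → False (no gap)
print(valid_phrasal_verb(["out", "I", "ask", "her"]))   # → False (wrong order)
```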
Grid Beam Search: Addressing unknown words
● Their decoder can handle arbitrary constraints.
→ there is a risk that constraints will contain unknown tokens.
○ Especially in domain adaptation scenarios, some user-specified constraints are very likely to contain unseen tokens.
→ use subword units (e.g. Byte Pair Encoding).
(In the experiments, they use this technique to ensure that no input tokens are unknown, even if a constraint contains words which never appeared in the training data.)
9
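The idea can be illustrated with a greedy longest-match segmenter that splits an unseen constraint word into known subword units. This is a stand-in for real BPE merge operations, and it assumes the subword-nmt-style `@@` continuation marker; single characters are a last-resort fallback so no token is ever out-of-vocabulary.

```python
def to_subwords(word, vocab):
    """Greedily segment `word` into the longest subword units found in
    `vocab`, emitting '@@' on every non-final piece. Falls back to a
    single character when no known unit matches."""
    pieces, rest = [], word
    while rest:
        for end in range(len(rest), 0, -1):
            if rest[:end] in vocab or end == 1:
                pieces.append(rest[:end])
                rest = rest[end:]
                break
    return [p + "@@" for p in pieces[:-1]] + [pieces[-1]]

print(to_subwords("translation", {"trans", "la", "tion"}))
# → ['trans@@', 'la@@', 'tion']
```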
Grid Beam Search: Efficiency
● The runtime complexity of a naive implementation of GBS is O(ktc), whereas standard time-based beam search is O(kt).
↓ some consideration must be given to the efficiency of this algorithm
● The beams in each column c of the figure are independent.
→ GBS can be parallelized, allowing all beams at each timestep to be filled simultaneously.
● They find that most time is spent computing the states for the hypothesis candidates.
→ by keeping the beam size small, they can make GBS significantly faster.
10
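The parallelization claim can be sketched as follows. `fill_beam` is a hypothetical callback that computes Grid[t][c] from Grid[t-1][c] and Grid[t-1][c-1]; since the columns of one timestep never read each other, they can be dispatched concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def fill_timestep(fill_beam, t, num_c):
    """Fill every column c of timestep t concurrently.
    Executor.map preserves argument order, so column c lands at index c."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda c: fill_beam(t, c), range(min(t, num_c) + 1)))

# usage with a dummy callback that just records its cell
print(fill_timestep(lambda t, c: (t, c), 3, 2))
# → [(3, 0), (3, 1), (3, 2)]
```

In a real NMT decoder the heavy work (computing decoder states) runs on the GPU or in native code, so threads can genuinely overlap it; a pure-Python callback like the dummy above gains little under the GIL.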
Experiments setup (including appendix)
● The models are SOTA NMT systems using their own implementation of NMT with attention over the source sequence (Bahdanau et al., 2014).
○ bidirectional GRUs
○ AdaDelta
○ train all systems for 500,000 iterations, with validation every 5,000 steps; the best single model from validation is used
○ ℓ2 regularization (α = 1e-5)
○ dropout on the output layers, with rate 0.5
○ beam size is 10
● corpus
○ English-German: 4.4M segments from Europarl and CommonCrawl
○ English-French: 4.9M segments from Europarl and CommonCrawl
○ English-Portuguese: 28.5M segments from Europarl, JRC-Acquis and OpenSubtitles
● subword
○ They created a shared subword representation for each language pair by extracting a vocabulary of 80,000 symbols from the concatenated source and target data.
11
Experiments: Pick-Revise for Interactive PE
● Pick-Revise is an interaction cycle for MT Post-Editing.
● They modify this experiment:
○ the user only provides sequences of up to three words which are missing from the hypothesis, taken from the reference.
○ four cycles are run.
○ strict setting: the complete phrase must be missing from the hypothesis.
○ relaxed setting: only the first word must be missing.
○ when a 3-token phrase cannot be found, they back off to 2-token phrases, then to single tokens as constraints.
○ if a hypothesis already matches the reference, no constraints are added.
fig1: PRIMT: A Pick-Revise Framework for Interactive Machine Translation. Cheng et al., NAACL HLT 2016
12
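The simulated user pick with its 3 → 2 → 1 backoff can be sketched as below. The function name is illustrative, and matching the phrase as a substring of the joined hypothesis is a simplification (it ignores word boundaries).

```python
def pick_constraint(hyp, ref, max_n=3, strict=True):
    """Return the first reference n-gram missing from the hypothesis,
    backing off n = 3 -> 2 -> 1. strict: the whole phrase must be absent;
    relaxed: only its first word. Returns None when the hypothesis
    already needs no constraint."""
    hyp_text = " ".join(hyp)
    for n in range(max_n, 0, -1):
        for i in range(len(ref) - n + 1):
            phrase = ref[i:i + n]
            missing = (" ".join(phrase) not in hyp_text) if strict else (phrase[0] not in hyp)
            if missing:
                return phrase
    return None

hyp = "the cat sat".split()
ref = "the black cat sat down".split()
print(pick_constraint(hyp, ref))            # → ['the', 'black', 'cat']
print(pick_constraint(ref, ref))            # → None (hypothesis matches)
```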
Result: Pick-Revise for Interactive PE
13
Experiments: Domain Adaptation via Terminology
● The requirement to use domain-specific terminologies is common in real-world applications of MT.
→ provide the terms as constraints.
● target domain data
○ Autodesk Post-Editing corpus
↑ focused upon software localization, very different from the training data.
● Extract Terminology
○ divide the corpus into approximately 100,000 training sentences and 1,000 test segments, then automatically generate a terminology by computing PMI between source and target n-grams in the training set.
○ extract all n-grams of length 2-5 as terminology candidates.
○ filter the candidates to only include pairs whose PMI (normalized to [-1, +1]) is >= 0.9 and where both src and tgt n-grams occur at least 5 times.
→ this yields the terminology set.
14
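The extraction pipeline above can be sketched as follows. The normalization used here is NPMI (PMI divided by -log of the joint probability), which is one standard way to map PMI into [-1, +1]; the paper’s exact formula may differ, and the function names are illustrative.

```python
import math
from collections import Counter
from itertools import product

def ngrams(tokens, nmin=2, nmax=5):
    """All n-grams of length nmin..nmax as tuples."""
    return [tuple(tokens[i:i + n]) for n in range(nmin, nmax + 1)
            for i in range(len(tokens) - n + 1)]

def extract_terminology(pairs, min_npmi=0.9, min_count=5):
    """Keep (src n-gram, tgt n-gram) pairs whose NPMI >= min_npmi and
    where both sides occur at least min_count times in the segment pairs."""
    src_c, tgt_c, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s, t = set(ngrams(src)), set(ngrams(tgt))
        src_c.update(s)
        tgt_c.update(t)
        joint.update(product(s, t))        # co-occurrence within one segment pair
    n = len(pairs)
    terms = {}
    for (s, t), c in joint.items():
        if src_c[s] < min_count or tgt_c[t] < min_count:
            continue
        p_st = c / n
        npmi = 1.0 if p_st == 1.0 else \
            math.log(p_st / ((src_c[s] / n) * (tgt_c[t] / n))) / -math.log(p_st)
        if npmi >= min_npmi:
            terms[s] = t
    return terms

pairs = [("grid beam".split(), "gitter strahl".split())] * 5 \
      + [("other words".split(), "andere worte".split())] * 5
print(extract_terminology(pairs)[("grid", "beam")])
# → ('gitter', 'strahl')
```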
Results: Domain Adaptation via Terminology
● To confirm that the improvement is caused by GBS’s placement of the terms, they also evaluate other insertion methods:
○ randomly inserting terms into the baseline output
○ prepending terms to the baseline output
● Even an automatically created terminology yields improvements.
○ manually created domain terminologies are likely to lead to greater gains.
↑ conceptual drawing for the previous slide: for each test-set segment whose source side contains a terminology phrase, the corresponding target phrase is added as a constraint for that segment.
15
Output Analysis
Phrases added as constraints have global effects upon translation quality.
16
Related work
● Most related work has presented modifications of SMT systems for specific use cases (focusing on prefixes and suffixes).
○ The largest body of work considers Interactive Machine Translation (IMT).
● Some attention has also been given to SMT decoding with multiple lexical constraints (Cheng et al., 2016).
○ their approach relies upon the phrase segmentation provided by the SMT system.
○ the decoding algorithm can only make use of constraints that match phrase boundaries; in contrast, the GBS approach decodes at the token level.
● “To the best of our knowledge, ours is the first work which considers general lexically constrained decoding for any model which outputs sequences, without relying upon alignments between input and output, and without using a search organized by coverage of the input.”
17
Conclusion
● Lexically constrained decoding is a flexible way to incorporate arbitrary subsequences into the output of any model that generates output sequences token-by-token.
● In interactive translation, user inputs can be used as constraints, generating a new output each time a user fixes an error.
○ such a workflow can provide a large improvement in translation quality.
● Using a domain-specific terminology to generate target-side constraints:
○ a general-domain model can be adapted to a new domain without any retraining.
● future work
○ evaluate GBS with models outside of MT, such as automatic summarization, image captioning or dialog generation
○ introduce new constraint-aware models (via secondary attention mechanisms over lexical constraints)
18
Reference
● their implementation of GBS
○ https://github.com/chrishokamp/constrained_decoding
● Kyoto University Participation to WAT 2016. Cromieres et al., WAT 2016
● PRIMT: A Pick-Revise Framework for Interactive Machine Translation. Cheng et al., NAACL HLT 2016
19