Dynamic Pooling and Unfolding Recursive Autoencoders
for Paraphrase Detection [1]
Richard Socher, Eric Huang, Jeffrey Pennington, Andrew Ng,
Christopher Manning
Feynman Liang
May 16, 2013
[1] Richard Socher, Eric H. Huang, Jeffrey Pennington, Andrew Y. Ng, and Christopher D. Manning. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. In Advances in Neural Information Processing Systems (NIPS 2011).
F. Liang Unfolding RAE for Paraphrase Detection May 2013 1 / 26
Motivation
Consider the following phrases:
The judge also refused to postpone the trial date of Sept. 29.
Obus also denied a defense motion to postpone the September trial
date.
Paraphrase Detection Problem
Given: A pair of sentences $S_1 = (w_1, \dots, w_m)$ and $S_2 = (w_1, \dots, w_n)$, $w \in V$
Task: Classify whether $S_1$ and $S_2$ are paraphrases or not
Overview
Background
Neural Language Models
Recursive Autoencoders
Contributions
Unfolding RAEs
Dynamic Pooling of Similarity Matrix
Experiments
Prior Work
Similarity Metrics
n-gram Overlap / Longest Common Subsequence
Ordered Tree Edit Distance
WordNet hypernyms
Language Models
n-gram HMMs
$P(w_t \mid w_1^{t-1}) \approx P(w_t \mid w_{t-n+1}^{t-1})$
Log-Linear Models
$P(y \mid w_1^t; \theta) \approx \dfrac{e^{\theta^\top f(w_1^t, y)}}{\sum_{y' \in Y} e^{\theta^\top f(w_1^t, y')}}$
Neural Language Models [2]
[2] R. Collobert and J. Weston. A unified architecture for natural language processing: deep neural networks with multitask learning. In ICML, 2008.
Neural Language Models
Vocabulary $V$
Embedding matrix $L \in \mathbb{R}^{n \times |V|}$
$L : V \to \mathbb{R}^n$
Each column of $L$ embeds a word $w \in V$ in an n-dimensional feature space
Captures semantic and syntactic information about a word
A sentence $S = (w_1, \dots, w_m)$, $w_i \in V$, is represented as an ordered list $(x_1, \dots, x_m)$, $x_i \in \mathbb{R}^n$
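The lookup described above can be sketched as follows. This is only an illustration: the three-word vocabulary, the dimension n = 3, and the numeric entries of L are made up here, and plain Python lists stand in for a real matrix library.

```python
# Toy vocabulary and embedding matrix L (n x |V|); all values are illustrative.
vocab = ["the", "judge", "refused"]
word_to_index = {w: i for i, w in enumerate(vocab)}

# L has n = 3 rows and |V| = 3 columns; column j is the embedding of vocab[j].
L = [
    [0.1, 0.4, -0.2],
    [0.0, 0.3, 0.5],
    [-0.3, 0.2, 0.1],
]

def embed_word(word):
    """Return the column of L that embeds `word`, as a length-n list."""
    j = word_to_index[word]
    return [row[j] for row in L]

def embed_sentence(words):
    """Map a sentence (w_1, ..., w_m) to the ordered list (x_1, ..., x_m)."""
    return [embed_word(w) for w in words]

x = embed_sentence(["the", "judge", "refused"])
```

In a trained model the columns of L would be learned rather than written by hand; only the column lookup itself is the point here.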
Neural Language Models
Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural probabilistic language
model. J. Mach. Learn. Res., 3, March 2003.
Recursive Autoencoders (RAEs)
Assume we are given a binary parse tree $T$:
A binary parse tree is a list of triplets, each a parent with its two children: $(p \to (c_1, c_2))$
$c_1, c_2$ are either a terminal word vector $x_i \in \mathbb{R}^n$ or a non-terminal parent $y_i \in \mathbb{R}^n$
Figure: Parse tree for $((y_1 \to x_2 x_3), (y_2 \to x_1 y_1))$, $\forall x, y \in \mathbb{R}^n$
Recursive Autoencoders (RAEs)
Non-terminal parent $p$ computed as
$p = f(W_e [c_1; c_2] + b)$
$f$ is an activation function (e.g. sigmoid, tanh)
$W_e \in \mathbb{R}^{n \times 2n}$ is the encoding matrix to be learned
$[c_1; c_2] \in \mathbb{R}^{2n}$ is the concatenation of the children
$b$ is a bias term
Figure: $y_2 = f(W_e [x_1; f(W_e [x_2; x_3] + b_1)] + b_2)$
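A minimal sketch of the bottom-up encoding for the example tree, assuming f = tanh and a single bias b shared by both parents (the figure writes b_1 and b_2). The helper name `encode` and all parameter values are hypothetical and chosen only to make the example run.

```python
import math

def encode(W_e, b, c1, c2):
    """Compute p = f(W_e [c1; c2] + b) with f = tanh."""
    children = c1 + c2  # list concatenation gives [c1; c2] in R^{2n}
    return [
        math.tanh(sum(w * c for w, c in zip(row, children)) + b_i)
        for row, b_i in zip(W_e, b)
    ]

# Illustrative parameters for n = 2: W_e is n x 2n, b has length n.
W_e = [[0.5, -0.1, 0.2, 0.0],
       [0.1, 0.3, -0.4, 0.2]]
b = [0.0, 0.1]

# Illustrative word vectors x1, x2, x3 in R^2.
x1, x2, x3 = [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]

# Tree ((y1 -> x2 x3), (y2 -> x1 y1)): encode the inner parent first.
y1 = encode(W_e, b, x2, x3)
y2 = encode(W_e, b, x1, y1)
```

Because every parent lives in the same n-dimensional space as the word vectors, the same `encode` call can be reused at every level of the tree.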
Recursive Autoencoders (RAEs)
$W_d$ inverts $W_e$ s.t. $[c_1'; c_2'] = f(W_d p + b_d)$ is the decoding of $p$
$E_{rec}(p) = \| [c_1; c_2] - [c_1'; c_2'] \|_2^2$
To train:
Minimize $E_{rec}(T) = \sum_{p \in T} E_{rec}(p) = E_{rec}(y_1) + E_{rec}(y_2)$ (for the example tree)
Add length normalization layer $p = \frac{p}{\|p\|_2}$ to avoid degenerate solution
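To make the per-node reconstruction error and the length normalization concrete, here is a small sketch, assuming f = tanh. The function names and the toy values are illustrative and are not taken from the paper.

```python
import math

def decode(W_d, b_d, p):
    """Return [c1'; c2'] = f(W_d p + b_d) with f = tanh."""
    return [
        math.tanh(sum(w * v for w, v in zip(row, p)) + b_i)
        for row, b_i in zip(W_d, b_d)
    ]

def reconstruction_error(W_d, b_d, p, c1, c2):
    """E_rec(p) = || [c1; c2] - [c1'; c2'] ||_2^2."""
    target = c1 + c2  # [c1; c2]
    decoded = decode(W_d, b_d, p)
    return sum((t - d) ** 2 for t, d in zip(target, decoded))

def normalize(p):
    """Length normalization p / ||p||_2 (assumes p is not the zero vector)."""
    norm = math.sqrt(sum(v * v for v in p))
    return [v / norm for v in p]

# Toy values for n = 2; W_d is 2n x n and b_d has length 2n.
p = normalize([3.0, 4.0])
W_d = [[0.2, 0.1], [0.0, -0.3], [0.4, 0.4], [-0.1, 0.2]]
b_d = [0.0, 0.0, 0.0, 0.0]
error = reconstruction_error(W_d, b_d, p, [1.0, 0.0], [0.0, 1.0])
```

After `normalize`, every parent vector has unit length, so the error cannot be reduced simply by scaling the hidden representation.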
Unfolding RAEs
Measure reconstruction error down to the terminals $x_i$:
For a node $y$ that spans words $i$ to $j$:
$E_{rec}(y_{(i,j)}) = \| [x_i; \dots; x_j] - [x_i'; \dots; x_j'] \|_2^2$
Hidden layer norms no longer shrink
Children with larger subtrees get more weight
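The unfolding error can be sketched as a recursive decode that follows the tree shape down to the leaves. This is an illustration under assumptions: f = tanh, trees are represented as nested pairs of 0-based word indices, and all names and parameter values are hypothetical.

```python
import math

def affine_tanh(W, b, v):
    """Return tanh(W v + b) for a matrix W given as a list of rows."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, v)) + b_i)
        for row, b_i in zip(W, b)
    ]

def encode(tree, x, W_e, b_e):
    """Bottom-up encoding; a tree is a word index (leaf) or a (left, right) pair."""
    if isinstance(tree, int):
        return x[tree]
    left = encode(tree[0], x, W_e, b_e)
    right = encode(tree[1], x, W_e, b_e)
    return affine_tanh(W_e, b_e, left + right)

def unfold(tree, p, W_d, b_d):
    """Decode p top-down along the tree shape; return reconstructed leaves in order."""
    if isinstance(tree, int):
        return [p]
    decoded = affine_tanh(W_d, b_d, p)
    n = len(decoded) // 2
    return (unfold(tree[0], decoded[:n], W_d, b_d)
            + unfold(tree[1], decoded[n:], W_d, b_d))

def leaf_indices(tree):
    """Word indices spanned by the tree, left to right."""
    if isinstance(tree, int):
        return [tree]
    return leaf_indices(tree[0]) + leaf_indices(tree[1])

def unfolding_error(tree, x, W_e, b_e, W_d, b_d):
    """Squared distance between the original and the unfolded leaf vectors."""
    reconstructed = unfold(tree, encode(tree, x, W_e, b_e), W_d, b_d)
    originals = [x[i] for i in leaf_indices(tree)]
    return sum(
        (a - r) ** 2
        for orig, rec in zip(originals, reconstructed)
        for a, r in zip(orig, rec)
    )

# The example tree y2 -> (x1, y1 -> (x2, x3)), written with 0-based indices.
tree = (0, (1, 2))
x = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W_e = [[0.5, -0.1, 0.2, 0.0], [0.1, 0.3, -0.4, 0.2]]
b_e = [0.0, 0.1]
W_d = [[0.3, 0.1], [-0.2, 0.4], [0.1, 0.1], [0.0, -0.5]]
b_d = [0.0, 0.0, 0.0, 0.0]
error = unfolding_error(tree, x, W_e, b_e, W_d, b_d)
```

A node with a larger subtree contributes more leaf vectors to the sum, which is one way to read the remark that children with larger subtrees get more weight.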
Deep RAEs
$h = f(W_e^{(1)} [c_1; c_2] + b_e^{(1)})$
$p = f(W_e^{(2)} h + b_e^{(2)})$
Andrew Ng. Autoencoders (CS294A Lecture notes).
Training RAEs
Data: A set of parse trees
Objective: Minimize
$J = \frac{1}{|T|} \sum_{n \in T} E_{rec}(n; W_e) + \frac{\lambda}{2} \|W_e\|^2$
Gradient descent (backpropagation, L-BFGS)
Objective is non-convex; convergence is smooth, but only to a local optimum
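As a worked instance of the objective, the sketch below evaluates J from a list of already-computed reconstruction errors. It assumes the sum runs over per-node errors and that the norm is the squared Frobenius norm of W_e; the function name and the numbers are illustrative only.

```python
def objective(rec_errors, W_e, lam):
    """J = mean reconstruction error + (lam / 2) * ||W_e||^2."""
    data_term = sum(rec_errors) / len(rec_errors)
    reg_term = 0.5 * lam * sum(w * w for row in W_e for w in row)
    return data_term + reg_term

# Two reconstruction errors (mean 2.0) and a 1 x 2 weight matrix (squared norm 5.0).
J = objective([1.0, 3.0], [[1.0, 2.0]], lam=0.5)
```

With these toy inputs the data term is 2.0 and the regularizer is 0.25 * 5.0 = 1.25, so J = 3.25.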
Sentence Similarity Matrix
For two sentences $S_1$, $S_2$ of lengths $n$ and $m$, concatenate the terminal $x_i$s (in sentence order) with the non-terminal $y_i$s (depth-first, right-to-left)
Compute similarity matrix $S \in \mathbb{R}^{(2n-1) \times (2m-1)}$, where $S_{i,j}$ is the Euclidean distance (2-norm of the difference) between the $i$th element of $S_1$'s feature vector list and the $j$th element of $S_2$'s
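A minimal sketch of the similarity matrix, assuming each sentence has already been turned into its list of word and parent vectors. The function name and the toy vectors are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math

def similarity_matrix(feats1, feats2):
    """S[i][j] = Euclidean distance between feats1[i] and feats2[j]."""
    return [[math.dist(a, b) for b in feats2] for a in feats1]

# A 2-word sentence has 2 word vectors plus 1 parent vector (2n - 1 = 3);
# a 1-word sentence has just its word vector (2m - 1 = 1).
feats1 = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
feats2 = [[0.0, 0.0]]
S = similarity_matrix(feats1, feats2)
```

The count 2n - 1 comes from a binary tree over n words having n leaves and n - 1 internal nodes. Small entries mean similar vectors, since the entries are distances.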
Dynamic Pooling
Sentence lengths may vary $\implies$ the dimensions of $S$ may vary.
Want to map $S \in \mathbb{R}^{(2n-1) \times (2m-1)}$ to $S_{pooled} \in \mathbb{R}^{n_p \times n_p}$ with $n_p$ constant
Dynamically partition the rows and columns of $S$ into $n_p$ roughly equal parts
Min-pool over each part
Normalize to $\mu = 0$, $\sigma = 1$ and pass on to a classifier (e.g. softmax)
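The partition-and-pool step can be sketched as below. Two things are assumptions of this sketch rather than statements from the slides: the boundary rule floor(k * length / n_p), and the requirement that each dimension has at least n_p entries. The slide does not say how remainders or very short sentences are handled, and the final normalization step is left out.

```python
def partition(length, parts):
    """Split range(length) into `parts` contiguous, roughly equal index ranges."""
    if length < parts:
        raise ValueError("this sketch assumes length >= parts")
    bounds = [k * length // parts for k in range(parts + 1)]
    return [range(bounds[k], bounds[k + 1]) for k in range(parts)]

def dynamic_min_pool(S, n_p):
    """Min-pool a rows x cols matrix S (list of rows) down to n_p x n_p."""
    row_parts = partition(len(S), n_p)
    col_parts = partition(len(S[0]), n_p)
    return [
        [min(S[i][j] for i in rows for j in cols) for cols in col_parts]
        for rows in row_parts
    ]

# A 3 x 5 matrix pooled to 2 x 2: rows split as [0], [1, 2]; columns as [0, 1], [2, 3, 4].
S = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
]
S_pooled = dynamic_min_pool(S, 2)
```

Taking the minimum keeps the closest pair of vectors in each region, which fits a matrix whose entries are distances.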
Qualitative Evaluation of Unsupervised Feature Learning
Dataset
150,000 sentences from NYT and AP sections of Gigaword corpus for
RAE training
Setup
$\mathbb{R}^{100}$ unsupervised feature vectors provided by Turian et al. [3] for initial word embeddings
Stanford parser [4] to extract parse trees
Hidden layer $h$ set to 200 units in both standard and unfolding RAE (0 in the nearest-neighbor qualitative evaluation)
[3] J. Turian, L. Ratinov, and Y. Bengio. Word representations: a simple and general method for semi-supervised learning. In Proceedings of ACL, pages 384–394, 2010.
[4] D. Klein and C. D. Manning. Accurate unlexicalized parsing. In ACL, 2003.
Nearest Neighbor
Figure: Comparison of nearest 2-norm neighbors
Recursive Decoding
Figure: Phrase reconstruction via recursive decoding
Paraphrase Detection Task
Dataset
Microsoft Research paraphrase corpus (MSRP) [5]
5,801 sentence pairs, 3,900 labeled as paraphrases
[5] B. Dolan, C. Quirk, and C. Brockett. Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources. In COLING, 2004.
Paraphrase Detection Task
Setup
4,076 training pairs (67.5% positive), 1,725 test pairs (66.5% positive)
For all $(S_1, S_2)$ in training data, $(S_2, S_1)$ also added
Negative examples selected for high lexical overlap
Add features $\in \{0, 1\}$ to $S_{pooled}$ related to the sets of numbers in $S_1$ and $S_2$:
Numbers in $S_1$ = numbers in $S_2$
(Numbers in $S_1$ $\cup$ numbers in $S_2$) $= \emptyset$
Numbers in one sentence $\subset$ numbers in other
Softmax classifier on top of $S_{pooled}$
Hyperparameter selection: 10-fold cross-validation
$n_p = 15$
$\lambda_{RAE} = 10^{-5}$
$\lambda_{softmax} = 0.05$
Two annotators (83% agreement), third to resolve conflict
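The three number features listed in the setup above can be sketched as follows. The regular expression used to pull numbers out of a sentence and the reading of the subset relation as a strict subset are assumptions of this sketch; the slide does not specify either.

```python
import re

def numbers_in(sentence):
    """Return the set of number strings in a sentence (illustrative regex)."""
    return set(re.findall(r"\d+(?:\.\d+)?", sentence))

def number_features(s1, s2):
    """Return the three binary number features for a sentence pair."""
    n1, n2 = numbers_in(s1), numbers_in(s2)
    same = int(n1 == n2)                # numbers in S1 = numbers in S2
    none = int(not (n1 | n2))           # union of both number sets is empty
    subset = int(n1 < n2 or n2 < n1)    # one set is a strict subset of the other
    return [same, none, subset]

s1 = "The judge also refused to postpone the trial date of Sept. 29."
s2 = "Obus also denied a defense motion to postpone the September trial date."
features = number_features(s1, s2)
```

For the motivating pair, only the first sentence contains a number, so the first two features are 0 and the strict-subset feature is 1.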
Example Results
State of the Art
“Paraphrase Identification (State of the Art).” ACLWiki. Web. 14 May 2013.
Comparison of Unsupervised Feature Learning Methods
Setup
Dynamic pooling layer
Hyperparameters optimized over the cross-validation (C.V.) set
Results
Recursive averaging: 75.9%
Standard RAE: 75.5%
Unfolding RAE without hidden layers: 76.8%
Unfolding RAE with hidden layers: 76.6%
Evaluating Contribution of Dynamic Pooling Layer
Setup
Unfolding RAE used to compute $S$
Hyperparameters optimized over the C.V. set
Results
$S$-histogram: 73.0%
Only added number features: 73.2%
Only $S_{pooled}$: 72.6%
Top URAE node: 74.2%
$S_{pooled}$ + number features: 76.8%
Critique
Pros:
Novel unfolding reconstruction error metric, dynamic pooling layer
State-of-the-art (2011) performance
Cons:
Vague training details / time to convergence
Unconvincing improvement over baselines (recursive averaging, top RAE node)
Training requires labeled parse trees (unsupervised performance depends on parser accuracy)
Representing phrases in the same feature space as words
Critique
Suggestions:
Add further features to $S_{pooled}$
Overlap pooling regions
Let $W_e$ vary depending on the labels of the children in the parse tree
Capture the operational meaning of a word in a sentence (MV-RNN [6])
$p = f(W_e [c_1; c_2] + b) \to p = f(W_e [Ba + b_0; Ab + a_0] + p_0)$
[6] Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. Conference on Empirical Methods in Natural Language Processing.
F. Liang Unfolding RAE for Paraphrase Detection May 2013 26 / 26

Weitere ähnliche Inhalte

Was ist angesagt?

Align, Disambiguate and Walk : A Unified Approach forMeasuring Semantic Simil...
Align, Disambiguate and Walk  : A Unified Approach forMeasuring Semantic Simil...Align, Disambiguate and Walk  : A Unified Approach forMeasuring Semantic Simil...
Align, Disambiguate and Walk : A Unified Approach forMeasuring Semantic Simil...
Koji Matsuda
 
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
iyo
 

Was ist angesagt? (19)

Incremental View Maintenance for openCypher Queries
Incremental View Maintenance for openCypher QueriesIncremental View Maintenance for openCypher Queries
Incremental View Maintenance for openCypher Queries
 
Automatic Mathematical Information Retrieval to Perform Translations up to Co...
Automatic Mathematical Information Retrieval to Perform Translations up to Co...Automatic Mathematical Information Retrieval to Perform Translations up to Co...
Automatic Mathematical Information Retrieval to Perform Translations up to Co...
 
Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...
Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...
Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...
 
A Unifying Four-State Labelling Semantics for Bridging Abstract Argumentation...
A Unifying Four-State Labelling Semantics for Bridging Abstract Argumentation...A Unifying Four-State Labelling Semantics for Bridging Abstract Argumentation...
A Unifying Four-State Labelling Semantics for Bridging Abstract Argumentation...
 
A Concurrent Language for Argumentation: Preliminary Notes
A Concurrent Language for Argumentation: Preliminary NotesA Concurrent Language for Argumentation: Preliminary Notes
A Concurrent Language for Argumentation: Preliminary Notes
 
深層意味表現学習 (Deep Semantic Representations)
深層意味表現学習 (Deep Semantic Representations)深層意味表現学習 (Deep Semantic Representations)
深層意味表現学習 (Deep Semantic Representations)
 
Align, Disambiguate and Walk : A Unified Approach forMeasuring Semantic Simil...
Align, Disambiguate and Walk  : A Unified Approach forMeasuring Semantic Simil...Align, Disambiguate and Walk  : A Unified Approach forMeasuring Semantic Simil...
Align, Disambiguate and Walk : A Unified Approach forMeasuring Semantic Simil...
 
Teaching algebra through functional programming
Teaching algebra through functional programmingTeaching algebra through functional programming
Teaching algebra through functional programming
 
Machine Learning : Latent variable models for discrete data (Topic model ...)
Machine Learning : Latent variable models for discrete data (Topic model ...)Machine Learning : Latent variable models for discrete data (Topic model ...)
Machine Learning : Latent variable models for discrete data (Topic model ...)
 
Harnessing Deep Neural Networks with Logic Rules
Harnessing Deep Neural Networks with Logic RulesHarnessing Deep Neural Networks with Logic Rules
Harnessing Deep Neural Networks with Logic Rules
 
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
 
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
 
A Matrix Based Approach for Weighted Argumentation Frameworks
A Matrix Based Approach for Weighted Argumentation FrameworksA Matrix Based Approach for Weighted Argumentation Frameworks
A Matrix Based Approach for Weighted Argumentation Frameworks
 
A Labelling Semantics for Weighted Argumentation Frameworks
A Labelling Semantics for Weighted Argumentation FrameworksA Labelling Semantics for Weighted Argumentation Frameworks
A Labelling Semantics for Weighted Argumentation Frameworks
 
Extending Labelling Semantics to Weighted Argumentation Frameworks
Extending Labelling Semantics to Weighted Argumentation FrameworksExtending Labelling Semantics to Weighted Argumentation Frameworks
Extending Labelling Semantics to Weighted Argumentation Frameworks
 
Optimization of dfa
Optimization of dfaOptimization of dfa
Optimization of dfa
 
grammer
grammergrammer
grammer
 
4 informed-search
4 informed-search4 informed-search
4 informed-search
 
A Concurrent Argumentation Language for Negotiation and Debating
A Concurrent Argumentation Language for Negotiation and DebatingA Concurrent Argumentation Language for Negotiation and Debating
A Concurrent Argumentation Language for Negotiation and Debating
 

Andere mochten auch

Sprawozdanie finansowe 2013 pl
Sprawozdanie finansowe 2013 plSprawozdanie finansowe 2013 pl
Sprawozdanie finansowe 2013 pl
odfoundation
 
Odf sprawozdanie finansowe-2012_pl
Odf sprawozdanie finansowe-2012_plOdf sprawozdanie finansowe-2012_pl
Odf sprawozdanie finansowe-2012_pl
odfoundation
 
Stay yourself, just converse
Stay yourself, just converseStay yourself, just converse
Stay yourself, just converse
Kailin Chang
 
Odf report-massacre-in-odessa-pl
Odf report-massacre-in-odessa-plOdf report-massacre-in-odessa-pl
Odf report-massacre-in-odessa-pl
odfoundation
 
History of Afghanistan Railway Network
History of Afghanistan Railway NetworkHistory of Afghanistan Railway Network
History of Afghanistan Railway Network
Ahmad Basim Hamza
 
Defensem els animals
Defensem els animals Defensem els animals
Defensem els animals
martita56
 

Andere mochten auch (20)

SEO-АПОКАЛИПСИС: Как выживать при новом поиске (2014, К. Скобеев, "Интернет и...
SEO-АПОКАЛИПСИС: Как выживать при новом поиске (2014, К. Скобеев, "Интернет и...SEO-АПОКАЛИПСИС: Как выживать при новом поиске (2014, К. Скобеев, "Интернет и...
SEO-АПОКАЛИПСИС: Как выживать при новом поиске (2014, К. Скобеев, "Интернет и...
 
25 08-2014-odf-report-case-of-nadezhda-savchenko-ua
25 08-2014-odf-report-case-of-nadezhda-savchenko-ua25 08-2014-odf-report-case-of-nadezhda-savchenko-ua
25 08-2014-odf-report-case-of-nadezhda-savchenko-ua
 
Sprawozdanie finansowe 2013 pl
Sprawozdanie finansowe 2013 plSprawozdanie finansowe 2013 pl
Sprawozdanie finansowe 2013 pl
 
23 06-2014-odf-report-russian-federation-supports-terrorists-in-eastern-ukrai...
23 06-2014-odf-report-russian-federation-supports-terrorists-in-eastern-ukrai...23 06-2014-odf-report-russian-federation-supports-terrorists-in-eastern-ukrai...
23 06-2014-odf-report-russian-federation-supports-terrorists-in-eastern-ukrai...
 
Odf sprawozdanie finansowe-2012_pl
Odf sprawozdanie finansowe-2012_plOdf sprawozdanie finansowe-2012_pl
Odf sprawozdanie finansowe-2012_pl
 
Facebook in education
Facebook in educationFacebook in education
Facebook in education
 
Stay yourself, just converse
Stay yourself, just converseStay yourself, just converse
Stay yourself, just converse
 
History of afghanistan railway network
History of afghanistan railway networkHistory of afghanistan railway network
History of afghanistan railway network
 
12 hadits lemah dan palsu seputar ramadhan
12 hadits lemah dan palsu seputar ramadhan12 hadits lemah dan palsu seputar ramadhan
12 hadits lemah dan palsu seputar ramadhan
 
Odf report-massacre-in-odessa-pl
Odf report-massacre-in-odessa-plOdf report-massacre-in-odessa-pl
Odf report-massacre-in-odessa-pl
 
Zbiórki sprawozdanie 2015
Zbiórki sprawozdanie 2015Zbiórki sprawozdanie 2015
Zbiórki sprawozdanie 2015
 
Linguine
LinguineLinguine
Linguine
 
Burabod
BurabodBurabod
Burabod
 
3wish game walk through edugaming presentation
3wish game walk through edugaming presentation3wish game walk through edugaming presentation
3wish game walk through edugaming presentation
 
Web Components: back to the future
Web Components: back to the futureWeb Components: back to the future
Web Components: back to the future
 
Google docs pagrindai
Google docs pagrindaiGoogle docs pagrindai
Google docs pagrindai
 
History of Afghanistan Railway Network
History of Afghanistan Railway NetworkHistory of Afghanistan Railway Network
History of Afghanistan Railway Network
 
Defensem els animals
Defensem els animals Defensem els animals
Defensem els animals
 
Parang Machete
Parang MacheteParang Machete
Parang Machete
 
Minds head2
Minds head2Minds head2
Minds head2
 

Ähnlich wie Recursive Autoencoders for Paraphrase Detection (Socher et al)

AllenClarkStarek.eaIROS2014.Presentation
AllenClarkStarek.eaIROS2014.PresentationAllenClarkStarek.eaIROS2014.Presentation
AllenClarkStarek.eaIROS2014.Presentation
Joseph Starek
 
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
Seokhwan Kim
 
RECENT ADVANCES in PREDICTIVE (MACHINE) LEARNING
RECENT ADVANCES in PREDICTIVE (MACHINE) LEARNINGRECENT ADVANCES in PREDICTIVE (MACHINE) LEARNING
RECENT ADVANCES in PREDICTIVE (MACHINE) LEARNING
butest
 
SNLI_presentation_2
SNLI_presentation_2SNLI_presentation_2
SNLI_presentation_2
Viral Gupta
 

Ähnlich wie Recursive Autoencoders for Paraphrase Detection (Socher et al) (20)

Framester: A Wide Coverage Linguistic Linked Data Hub
Framester: A Wide Coverage Linguistic Linked Data HubFramester: A Wide Coverage Linguistic Linked Data Hub
Framester: A Wide Coverage Linguistic Linked Data Hub
 
AllenClarkStarek.eaIROS2014.Presentation
AllenClarkStarek.eaIROS2014.PresentationAllenClarkStarek.eaIROS2014.Presentation
AllenClarkStarek.eaIROS2014.Presentation
 
AINL 2016: Maraev
AINL 2016: MaraevAINL 2016: Maraev
AINL 2016: Maraev
 
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTGENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
 
MT SUMMIT2013 poster boaster slides.Language-independent Model for Machine Tr...
MT SUMMIT2013 poster boaster slides.Language-independent Model for Machine Tr...MT SUMMIT2013 poster boaster slides.Language-independent Model for Machine Tr...
MT SUMMIT2013 poster boaster slides.Language-independent Model for Machine Tr...
 
Basic review on topic modeling
Basic review on  topic modelingBasic review on  topic modeling
Basic review on topic modeling
 
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
 
Analysis And Indexing General Terms Experimentation
Analysis And Indexing General Terms ExperimentationAnalysis And Indexing General Terms Experimentation
Analysis And Indexing General Terms Experimentation
 
Topic model an introduction
Topic model an introductionTopic model an introduction
Topic model an introduction
 
Invitation to Scala
Invitation to ScalaInvitation to Scala
Invitation to Scala
 
Understanding R for Epidemiologists
Understanding R for EpidemiologistsUnderstanding R for Epidemiologists
Understanding R for Epidemiologists
 
Tutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and SystemsTutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and Systems
 
Link Discovery Tutorial Part II: Accuracy
Link Discovery Tutorial Part II: AccuracyLink Discovery Tutorial Part II: Accuracy
Link Discovery Tutorial Part II: Accuracy
 
Latent Relational Model for Relation Extraction
Latent Relational Model for Relation ExtractionLatent Relational Model for Relation Extraction
Latent Relational Model for Relation Extraction
 
Pertemuan 5_Relation Matriks_01 (17)
Pertemuan 5_Relation Matriks_01 (17)Pertemuan 5_Relation Matriks_01 (17)
Pertemuan 5_Relation Matriks_01 (17)
 
RECENT ADVANCES in PREDICTIVE (MACHINE) LEARNING
RECENT ADVANCES in PREDICTIVE (MACHINE) LEARNINGRECENT ADVANCES in PREDICTIVE (MACHINE) LEARNING
RECENT ADVANCES in PREDICTIVE (MACHINE) LEARNING
 
Machine Learning on Knowledge Graphs: a Quick Tour of Knowledge Graph Embeddings
Machine Learning on Knowledge Graphs: a Quick Tour of Knowledge Graph EmbeddingsMachine Learning on Knowledge Graphs: a Quick Tour of Knowledge Graph Embeddings
Machine Learning on Knowledge Graphs: a Quick Tour of Knowledge Graph Embeddings
 
LAF Fabric
LAF FabricLAF Fabric
LAF Fabric
 
SNLI_presentation_2
SNLI_presentation_2SNLI_presentation_2
SNLI_presentation_2
 
information theoretic subspace clustering
information theoretic subspace clusteringinformation theoretic subspace clustering
information theoretic subspace clustering
 

Mehr von Feynman Liang (6)

Accelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationAccelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference Compilation
 
transplantation-isospectral-poster
transplantation-isospectral-postertransplantation-isospectral-poster
transplantation-isospectral-poster
 
A Unifying Review of Gaussian Linear Models (Roweis 1999)
A Unifying Review of Gaussian Linear Models (Roweis 1999)A Unifying Review of Gaussian Linear Models (Roweis 1999)
A Unifying Review of Gaussian Linear Models (Roweis 1999)
 
Engineered histone acetylation using DNA-binding domains (DBD), chemical ind...
 Engineered histone acetylation using DNA-binding domains (DBD), chemical ind... Engineered histone acetylation using DNA-binding domains (DBD), chemical ind...
Engineered histone acetylation using DNA-binding domains (DBD), chemical ind...
 
A LOV2 Domain-Based Optogenetic Tool to Control Protein Degradation and Cellu...
A LOV2 Domain-Based Optogenetic Tool to Control Protein Degradation and Cellu...A LOV2 Domain-Based Optogenetic Tool to Control Protein Degradation and Cellu...
A LOV2 Domain-Based Optogenetic Tool to Control Protein Degradation and Cellu...
 
Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metab...
Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metab...Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metab...
Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metab...
 

Kürzlich hochgeladen

LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
Silpa
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
1301aanya
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
Silpa
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
Silpa
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
seri bangash
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Silpa
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
 

Kürzlich hochgeladen (20)

Genome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxGenome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptx
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdf
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLGwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 

Recursive Autoencoders for Paraphrase Detection (Socher et al)

  • 1. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection1 Richard Socher, Eric Huang, Jeffrey Penningotn, Andrew Ng, Christopher Manning Feynman Liang May 16, 2013 1 Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection Richard Socher, Eric H. Huang, Jeffrey Pennington, Andrew Y. Ng, and Christopher D. Manning Advances in Neural Information Processing Systems (NIPS 2011) F. Liang Unfolding RAE for Paraphrase Detection May 2013 1 / 26
  • 2. Motivation Consider the following phrases: The judge also refused to postpone the trial date of Sept. 29. Obus also denied a defense motion to postpone the September trial date. F. Liang Unfolding RAE for Paraphrase Detection May 2013 2 / 26
  • 3. Paraphrase Detection Problem Given: A pair of sentences S1 = (w1, . . . , wm) and S2 = (w1, . . . , wn), w ∈ V Task: Classify whether S1 and S2 are paraphrases or not F. Liang Unfolding RAE for Paraphrase Detection May 2013 3 / 26
  • 4. Overview Background Neural Language Models Recursive Autoencoders Contributions Unfolding RAEs Dynamic Pooling of Similarity Matrix Experiments F. Liang Unfolding RAE for Paraphrase Detection May 2013 4 / 26
  • 5. Prior Work Similarity Metrics n-gram Overlap / Longest Common Subsequence Ordered Tree Edit Distance WordNet hypernyms Language Models n-gram HMMs P(wt|wt−1 1 ) ≈ P(wt|wt−1 t−n+1) Log-Linear Models P(y|wt 1; θ) ≈ eθ f (wt 1 ,y) y ∈Y eθ f (wt 1 ,y ) Neural Language Models2 2 R. Collobert and J. Weston. A unified architecture for natural language processing: deep neural networks with multitask learning. In ICML, 2008. F. Liang Unfolding RAE for Paraphrase Detection May 2013 5 / 26
  • 6. Neural Language Models Vocabulary V Embedding Matrix L ∈ Rn×|V| L : V → Rn Each column of L “embeds” w ∈ V on a n-dimensional feature space Capture semantic and syntactic information about a word A sentence S = (w1, . . . , wm), wi ∈ V is represented as an ordered list (x1, . . . , xm), xi ∈ Rn F. Liang Unfolding RAE for Paraphrase Detection May 2013 6 / 26
  • 7. Neural Language Models F. Liang Unfolding RAE for Paraphrase Detection May 2013 7 / 26 Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural probabilistic language model. J. Mach. Learn. Res., 3, March 2003.
  • 8. Recursive Autoencoders (RAEs)
      Assume we are given a binary parse tree $T$:
      ◦ A binary parse tree is a list of triplets of parents with children: $(p \to (c_1, c_2))$
      ◦ Each child $c_1, c_2$ is either a terminal word vector $x_i \in \mathbb{R}^n$ or a non-terminal parent $y_i \in \mathbb{R}^n$
      Figure: parse tree for $((y_1 \to x_2 x_3), (y_2 \to x_1 y_1))$, $\forall x, y \in \mathbb{R}^n$
  • 9. Recursive Autoencoders (RAEs)
      A non-terminal parent $p$ is computed as $p = f(W_e [c_1; c_2] + b)$, where
      ◦ $f$ is an activation function (e.g. sigmoid, tanh)
      ◦ $W_e \in \mathbb{R}^{n \times 2n}$ is the encoding matrix to be learned
      ◦ $[c_1; c_2] \in \mathbb{R}^{2n}$ is the concatenation of the children
      ◦ $b$ is a bias term
      Figure: $y_2 = f\left(W_e \left[x_1;\ f(W_e [x_2; x_3] + b_1)\right] + b_2\right)$
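A minimal sketch of the encoding step on this slide, using plain Python lists. The weights below are made-up toy values rather than learned parameters, tanh is picked as one of the activations the slide mentions, and a single shared bias `b` is assumed for both parents.

```python
import math

def encode(W_e, b, c1, c2, f=math.tanh):
    """Compute p = f(W_e [c1; c2] + b) for one parent node."""
    children = c1 + c2  # list concatenation plays the role of [c1; c2]
    return [f(sum(w * c for w, c in zip(row, children)) + b_i)
            for row, b_i in zip(W_e, b)]

# Toy parameters for n = 2 (W_e is n x 2n); values are illustrative only.
W_e = [[0.5, 0.0, 0.5, 0.0],
       [0.0, 0.5, 0.0, 0.5]]
b = [0.0, 0.0]

x1, x2, x3 = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
y1 = encode(W_e, b, x2, x3)  # y1 -> x2 x3
y2 = encode(W_e, b, x1, y1)  # y2 -> x1 y1
```

Because every parent lives in the same $\mathbb{R}^n$ as the word vectors, the output of one `encode` call can be fed straight back in as a child, which is what makes the recursion over the parse tree possible.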
  • 10. Recursive Autoencoders (RAEs)
      ◦ $W_d$ inverts $W_e$ such that $[c_1'; c_2'] = f(W_d\, p + b_d)$ is the decoding of $p$
      ◦ Reconstruction error: $E_{rec}(p) = \left\| [c_1; c_2] - [c_1'; c_2'] \right\|_2^2$
      ◦ To train, minimize $E_{rec}(T) = \sum_{p \in T} E_{rec}(p) = E_{rec}(y_1) + E_{rec}(y_2)$
      ◦ Add a length normalization layer $p = p / \|p\|_2$ to avoid a degenerate solution
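The encode, normalize, decode, and compare steps on this slide could look roughly as follows. This is a sketch under assumptions: the parameter values are arbitrary toy numbers, tanh is used for `f`, and the helper names are hypothetical.

```python
import math

def affine(W, b, v):
    """Return W v + b for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]

def normalize(p):
    """Scale p to unit 2-norm (assumes p is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in p))
    return [x / norm for x in p]

def reconstruction_error(c1, c2, W_e, b_e, W_d, b_d, f=math.tanh):
    """E_rec(p) = || [c1; c2] - [c1'; c2'] ||_2^2 for a single parent."""
    children = c1 + c2
    p = normalize([f(z) for z in affine(W_e, b_e, children)])
    decoded = [f(z) for z in affine(W_d, b_d, p)]  # [c1'; c2']
    return sum((c - d) ** 2 for c, d in zip(children, decoded))

# Toy parameters for n = 2: W_e is n x 2n, W_d is 2n x n.
W_e = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
b_e = [0.0, 0.0]
W_d = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
b_d = [0.0, 0.0, 0.0, 0.0]

err = reconstruction_error([0.0, 1.0], [1.0, 1.0], W_e, b_e, W_d, b_d)
```

Without the normalization, the error for a parent whose children are themselves hidden vectors can be driven down simply by shrinking those hidden vectors, which is the degenerate solution the slide refers to.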
  • 11. Unfolding RAEs
      Measure the reconstruction error all the way down to the terminal $x_i$'s. For a node $y$ that spans words $i$ to $j$:
      $E_{rec}(y_{(i,j)}) = \left\| [x_i; \ldots; x_j] - [x_i'; \ldots; x_j'] \right\|_2^2$
      ◦ Hidden layer norms no longer shrink
      ◦ Children with larger subtrees get more weight
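One way to picture the unfolding error is to encode a subtree bottom-up, decode the resulting vector top-down along the same tree shape, and compare only at the leaves. The sketch below assumes a tree representation chosen for illustration (a leaf is a list of floats, an internal node is a `(left, right)` tuple), toy parameter values, and tanh activations; it omits the length normalization.

```python
import math

def affine(W, b, v):
    """Return W v + b for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]

def encode_tree(node, W_e, b_e):
    """Bottom-up encoding; a leaf is its own representation."""
    if isinstance(node, list):
        return node
    left = encode_tree(node[0], W_e, b_e)
    right = encode_tree(node[1], W_e, b_e)
    return [math.tanh(z) for z in affine(W_e, b_e, left + right)]

def unfold(p, node, W_d, b_d):
    """Decode p along the shape of `node`; return reconstructed leaves in order."""
    if isinstance(node, list):
        return [p]
    n = len(p)
    decoded = [math.tanh(z) for z in affine(W_d, b_d, p)]
    return (unfold(decoded[:n], node[0], W_d, b_d)
            + unfold(decoded[n:], node[1], W_d, b_d))

def leaves(node):
    """Original leaf vectors in left-to-right order."""
    if isinstance(node, list):
        return [node]
    return leaves(node[0]) + leaves(node[1])

def unfolding_error(node, W_e, b_e, W_d, b_d):
    """Squared distance between original and reconstructed leaves."""
    recon = unfold(encode_tree(node, W_e, b_e), node, W_d, b_d)
    return sum((x - r) ** 2
               for leaf, rec in zip(leaves(node), recon)
               for x, r in zip(leaf, rec))

# Toy parameters (n = 2) and the tree y2 -> x1 y1, y1 -> x2 x3.
W_e = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
b_e = [0.0, 0.0]
W_d = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
b_d = [0.0, 0.0, 0.0, 0.0]
tree = ([1.0, 0.0], ([0.0, 1.0], [1.0, 1.0]))

recon = unfold(encode_tree(tree, W_e, b_e), tree, W_d, b_d)
err = unfolding_error(tree, W_e, b_e, W_d, b_d)
```

Since the comparison happens only against word vectors, a child that covers more words contributes more terms to the sum, which matches the slide's remark that larger subtrees get more weight.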
  • 12. Deep RAEs
      $h = f(W_e^{(1)} [c_1; c_2] + b_e^{(1)})$
      $p = f(W_e^{(2)} h + b_e^{(2)})$
      Figure from Andrew Ng. Autoencoders (CS294A Lecture notes).
  • 13. Training RAEs
      ◦ Data: a set of parse trees
      ◦ Objective: minimize $J = \dfrac{1}{|T|} \sum_{n \in T} E_{rec}(n; W_e) + \dfrac{\lambda}{2} \|W_e\|^2$
      ◦ Optimization: gradient-based (backpropagation, L-BFGS)
      ◦ The objective is non-convex; convergence is smooth, but only to a local optimum
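As a rough illustration of the objective on this slide, the function below combines per-node reconstruction errors with the weight penalty. It assumes that $|T|$ counts the nodes being summed over and that $\|W_e\|^2$ is the sum of squared entries; the name `objective` and the example numbers are hypothetical.

```python
def objective(node_errors, W_e, lam):
    """J = mean of per-node E_rec values + (lam / 2) * sum of squared weights."""
    data_term = sum(node_errors) / len(node_errors)
    penalty = 0.5 * lam * sum(w * w for row in W_e for w in row)
    return data_term + penalty

# Two nodes with errors 1.0 and 3.0; squared weight norm is 1 + 4 + 4 + 0 = 9.
J = objective([1.0, 3.0], [[1.0, 2.0], [2.0, 0.0]], lam=0.1)
```

In practice the gradient of this quantity with respect to the parameters would be obtained by backpropagation through the tree and handed to an optimizer such as L-BFGS.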
  • 14. Sentence Similarity Matrix
      ◦ For two sentences $S_1, S_2$ of lengths $n$ and $m$, concatenate the terminal $x_i$'s (in sentence order) with the non-terminal $y_i$'s (depth-first, right-to-left) to form each sentence's list of feature vectors
      ◦ Compute the similarity matrix $S \in \mathbb{R}^{(2n-1) \times (2m-1)}$, where $S_{i,j}$ is the Euclidean distance (2-norm of the difference) between the $i$th vector of $S_1$'s feature list and the $j$th vector of $S_2$'s feature list
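The pairwise distance computation can be sketched as below. The feature lists are made-up toy vectors standing in for the word and parent vectors of two parsed sentences (a sentence with 2 words has 3 nodes, one with 3 words has 5), and the function name is hypothetical.

```python
import math

def similarity_matrix(feats1, feats2):
    """S[i][j] = Euclidean distance between feats1[i] and feats2[j]."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
             for v in feats2]
            for u in feats1]

# Toy feature lists: 2n - 1 = 3 vectors and 2m - 1 = 5 vectors, each in R^2.
feats1 = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.5]]
feats2 = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

S = similarity_matrix(feats1, feats2)
```

Small entries mark pairs of words or phrases that sit close together in the feature space, so the interesting structure in `S` is where its minima fall.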
  • 15. Dynamic Pooling
      ◦ Sentence lengths may vary, so the dimensionality of $S$ may vary
      ◦ Want to map $S \in \mathbb{R}^{(2n-1) \times (2m-1)}$ to $S_{pooled} \in \mathbb{R}^{n_p \times n_p}$ with $n_p$ constant
      ◦ Dynamically partition the rows and columns of $S$ into $n_p$ roughly equal parts
      ◦ Min-pool over each part
      ◦ Normalize to mean $\mu = 0$ and standard deviation $\sigma = 1$, then pass on to a classifier (e.g. softmax)
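A minimal sketch of the partition and min-pool steps, leaving out the final normalization. The integer-arithmetic partition and the choice to repeat rows or columns when a side of `S` is shorter than `n_p` are assumptions made here for illustration; the helper names are hypothetical.

```python
def partition(length, n_parts):
    """Split range(length) into n_parts contiguous (start, end) chunks."""
    bounds = [(k * length) // n_parts for k in range(n_parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

def upsample(items, n_p):
    """Repeat each item so that at least n_p items are available."""
    if len(items) >= n_p:
        return items
    reps = -(-n_p // len(items))  # ceiling division
    return [item for item in items for _ in range(reps)]

def dynamic_min_pool(S, n_p):
    """Map a variable-size matrix S to a fixed n_p x n_p matrix by min pooling."""
    S = upsample(S, n_p)                   # enough rows
    S = [upsample(row, n_p) for row in S]  # enough columns
    row_parts = partition(len(S), n_p)
    col_parts = partition(len(S[0]), n_p)
    return [[min(S[i][j] for i in range(r0, r1) for j in range(c0, c1))
             for c0, c1 in col_parts]
            for r0, r1 in row_parts]

# A 4 x 6 matrix with S[i][j] = 6 * i + j, pooled down to 2 x 2.
S = [[6 * i + j for j in range(6)] for i in range(4)]
pooled = dynamic_min_pool(S, 2)  # [[0, 3], [12, 15]]
```

Min pooling fits because the entries of `S` are distances: the smallest value in a region records the closest match found anywhere in that region.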
  • 16. Qualitative Evaluation of Unsupervised Feature Learning
      Dataset
      ◦ 150,000 sentences from the NYT and AP sections of the Gigaword corpus, used for RAE training
      Setup
      ◦ 100-dimensional ($\mathbb{R}^{100}$) unsupervised feature vectors provided by Turian et al. [3] as initial word embeddings
      ◦ Stanford parser [4] to extract the parse trees
      ◦ Hidden layer $h$ set to 200 units in both the standard and the unfolding RAE (0 in the nearest-neighbor qualitative evaluation)
      [3] J. Turian, L. Ratinov, and Y. Bengio. Word representations: a simple and general method for semi-supervised learning. In Proceedings of ACL, pages 384-394, 2010.
      [4] D. Klein and C. D. Manning. Accurate unlexicalized parsing. In ACL, 2003.
  • 17. Nearest Neighbor
      Figure: comparison of nearest 2-norm neighbors
  • 18. Recursive Decoding
      Figure: phrase reconstruction via recursive decoding
  • 19. Paraphrase Detection Task
      Dataset
      ◦ Microsoft Research paraphrase corpus (MSRP) [5]
      ◦ 5,801 sentence pairs, 3,900 labeled as paraphrases
      [5] B. Dolan, C. Quirk, and C. Brockett. Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources. In COLING, 2004.
  • 20. Paraphrase Detection Task
      Setup
      ◦ 4,076 training pairs (67.5% positive), 1,725 test pairs (66.5% positive)
      ◦ For every $(S_1, S_2)$ in the training data, $(S_2, S_1)$ is also added
      ◦ Negative examples were selected for high lexical overlap
      ◦ Add features $\in \{0, 1\}$ to $S_{pooled}$ related to the sets of numbers in $S_1$ and $S_2$:
          Numbers in $S_1$ = numbers in $S_2$
          (Numbers in $S_1$ ∪ numbers in $S_2$) = ∅
          Numbers in one sentence ⊂ numbers in the other
      ◦ Softmax classifier on top of $S_{pooled}$
      ◦ Hyperparameter selection by 10-fold cross-validation: $n_p = 15$, $\lambda_{RAE} = 10^{-5}$, $\lambda_{softmax} = 0.05$
      ◦ Two annotators (83% agreement), with a third to resolve conflicts
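The three binary number features could be computed along the following lines. This sketch takes the slide's three conditions at face value (equal sets, empty union, strict subset), which is one reading of the slide and may not match the original implementation exactly; the regular expression used to pull numbers out of a sentence is likewise an assumption.

```python
import re

def numbers(sentence):
    """Set of number strings in a sentence (integers or simple decimals)."""
    return set(re.findall(r"\d+(?:\.\d+)?", sentence))

def number_features(s1, s2):
    """Three {0, 1} features comparing the numbers in two sentences."""
    n1, n2 = numbers(s1), numbers(s2)
    return [
        int(n1 == n2),            # same set of numbers
        int(not (n1 | n2)),       # union is empty: no numbers at all
        int(n1 < n2 or n2 < n1),  # one set is a strict subset of the other
    ]

s1 = "The judge also refused to postpone the trial date of Sept. 29."
s2 = "Obus also denied a defense motion to postpone the September trial date."
feats = number_features(s1, s2)  # [0, 0, 1]
```

For the motivating pair, only the first sentence contains a number, so the empty set of the second is a strict subset of the first and only the third feature fires.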
  • 21. Example Results
  • 22. State of the Art
      Source: "Paraphrase Identification (State of the Art)." ACLWiki. Web. 14 May 2013.
  • 23. Comparison of Unsupervised Feature Learning Methods
      Setup
      ◦ Dynamic pooling layer
      ◦ Hyperparameters optimized over the cross-validation set
      Results
      ◦ Recursive averaging: 75.9%
      ◦ Standard RAE: 75.5%
      ◦ Unfolding RAE without hidden layers: 76.8%
      ◦ Unfolding RAE with hidden layers: 76.6%
  • 24. Evaluating the Contribution of the Dynamic Pooling Layer
      Setup
      ◦ Unfolding RAE used to compute $S$
      ◦ Hyperparameters optimized over the cross-validation set
      Results
      ◦ S-histogram: 73.0%
      ◦ Only added number features: 73.2%
      ◦ Only $S_{pooled}$: 72.6%
      ◦ Top URAE node: 74.2%
      ◦ $S_{pooled}$ + number features: 76.8%
  • 25. Critique
      Pros:
      ◦ Novel unfolding reconstruction error metric and dynamic pooling layer
      ◦ State-of-the-art (2011) performance
      Cons:
      ◦ Vague training details and time to convergence
      ◦ Unconvincing improvement over the baselines (recursive averaging, top RAE node)
      ◦ Training requires labeled parse trees (unsupervised performance depends on parser accuracy)
      ◦ Phrases are represented in the same feature space as words
  • 26. Critique
      Suggestions:
      ◦ Add additional features to $S_{pooled}$
      ◦ Overlap the pooling regions
      ◦ Let $W_e$ vary depending on the labels of the children in the parse tree
      ◦ Capture the operational meaning of a word to a sentence (MV-RNN [6]):
          $p = f\left(W_e [c_1; c_2] + b\right) \;\to\; p = f\left(W_e [Ba + b_0;\ Ab + a_0] + p_0\right)$
      [6] Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. Conference on Empirical Methods in Natural Language Processing.