Learning Multilingual Semantic Parsers for Question Answering over Linked Data - A comparison of neural and probabilistic graphical model architectures
1. Learning Multilingual Semantic Parsers for Question Answering over Linked Data - A comparison of neural and probabilistic graphical model architectures
PhD Dissertation Defense Talk
March 2019
Sherzod Hakimov
Semantic Computing Group, CITEC
Bielefeld University
7. What is Semantic Parsing?
Give me the route to Jahnplatz
• mapping a natural language sentence to a detailed meaning representation
8. What is Semantic Parsing?
Give me the route to Jahnplatz
route($LOC, “Jahnplatz”)
route(StartLocation, EndLocation)
• mapping a natural language sentence to a detailed meaning representation
9. What is Semantic Parsing?
Give me the route to Jahnplatz
route($LOC, “Jahnplatz”)
• mapping a natural language sentence to a detailed meaning representation
• the meaning representation can be modelled using a formal language
10. What is Semantic Parsing?
Give me the route to Jahnplatz
route($LOC, “Jahnplatz”)
• mapping a natural language sentence to a detailed meaning representation
• the meaning representation can be modelled using a formal language, e.g. lambda calculus
• an ontology with properties, classes, entities, etc. (route, create_calendar_event, set_alarm)
• supports automated execution or reasoning
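The mapping can be illustrated with a toy pattern-based parser. The route predicate follows the slides; the pattern, the function name, and the $LOC placeholder handling are assumptions for illustration only, not the thesis approach:

```python
import re

# Toy sketch: map one utterance pattern to the route(...) meaning representation.
# Real semantic parsers learn such mappings; this only shows the input/output contract.
def parse(utterance: str) -> str:
    m = re.match(r"Give me the route to (?P<dest>.+)", utterance)
    if m:
        # $LOC stands for the (unspecified) current location of the user.
        return f'route($LOC, "{m.group("dest")}")'
    raise ValueError("no parse")

print(parse("Give me the route to Jahnplatz"))  # route($LOC, "Jahnplatz")
```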
11. Why do we need Semantic Parsers?
Give me the route to Jahnplatz -> Query: route($LOC, “Jahnplatz”) -> Knowledge Base -> Answer
14. Motivation
• Building semantic parsers with application to Question Answering
• Building multilingual solutions that can be applied to multiple languages
Which German politicians were born in Bielefeld?
Which metal has a liquid form?
Welche deutschen Politiker wurden in Bielefeld geboren?
Welches Metall hat eine flüssige Form?
¿Qué políticos alemanes nacieron en Bielefeld?
¿Qué metal tiene una forma líquida?
15. Motivation
• Building semantic parsers with application to Question Answering
• Building multilingual solutions that can be extended to other languages
• Comparison and evaluation of different model architectures
16. Motivation
• Building semantic parsers with application to Question Answering
• Building multilingual solutions that can be extended to other languages
• Comparison and evaluation of different model architectures
• Highlight the challenges of building Question Answering systems
17. DBpedia
• based on structured content from Wikipedia
• more than 130 languages supported
• 760 classes, 1105 object & 1622 datatype properties
• ca. 9 million resources
21. Question Answering on RDF Data
Natural Language: Dan Brown is the author of Inferno
Triple: dbr:Inferno_(novel) dbo:author dbr:Dan_Brown
22. Question Answering on RDF Data
Natural Language: Dan Brown is the author of Inferno
Triple: dbr:Inferno_(novel) dbo:author dbr:Dan_Brown
Question format
Natural Language: Who is the author of Inferno?
23. Question Answering on RDF Data
Natural Language: Dan Brown is the author of Inferno
Triple: dbr:Inferno_(novel) dbo:author dbr:Dan_Brown
Question format
Natural Language: Who is the author of Inferno?
SPARQL Query: SELECT ?x WHERE {dbr:Inferno_(novel) dbo:author ?x}
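The step from a resolved triple pattern to the SPARQL query on the slide can be sketched as follows; the helper name build_sparql is illustrative, not from the thesis:

```python
# Assemble the SELECT query from a resolved subject and predicate URI.
def build_sparql(subject: str, predicate: str, variable: str = "?x") -> str:
    return f"SELECT {variable} WHERE {{{subject} {predicate} {variable}}}"

q = build_sparql("dbr:Inferno_(novel)", "dbo:author")
print(q)  # SELECT ?x WHERE {dbr:Inferno_(novel) dbo:author ?x}
```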
25. Research Questions
How to map natural language phrases into knowledge base entries for multiple languages? Which linguistic resources can be used?
Who is the author of Inferno?
Who wrote Inferno?
Who is the writer of Inferno?
SELECT ?x WHERE {dbr:Inferno_(novel) dbo:author ?x}
26. Research Questions
How to map natural language phrases into knowledge base entries for multiple languages? Which linguistic resources can be used?
Who is the author of Inferno?
Who wrote Inferno?
Who is the writer of Inferno?
SELECT ?x WHERE {dbr:Inferno_(novel) dbo:author ?x}
Lexical Gap: write -> dbo:author
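The lexical gap can be narrowed with a lexicon that maps verbalizations to KB properties; the entries below are illustrative stand-ins for the ontology-label, M-ATOLL and embedding-derived entries used in the thesis:

```python
# Tiny illustrative lexicon: several verbalizations map to the same property.
lexicon = {
    "author": "dbo:author",
    "write": "dbo:author",   # lemma of "wrote"
    "writer": "dbo:author",
}

def map_phrase(lemma: str):
    # Returns the KB property for a lemma, or None when the gap remains.
    return lexicon.get(lemma.lower())

print(map_phrase("write"))   # dbo:author (gap bridged)
print(map_phrase("penned"))  # None (gap remains for unseen verbalizations)
```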
27. Research Questions
How to disambiguate URIs when multiple candidates are retrieved from mapping natural language tokens into knowledge base entries?
When was Inferno released?
Candidates: dbr:Inferno_(2016_film), dbr:Inferno_(novel)
SELECT ?x WHERE {dbr:Inferno_(novel) dbo:releaseDate ?x}
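Disambiguation can be sketched as scoring the retrieved candidates. The candidates follow the slide; the scoring heuristic (string match plus a type preference) is a made-up stand-in for the learned disambiguation models discussed later:

```python
# Illustrative candidate ranking for the mention "Inferno".
candidates = {
    "dbr:Inferno_(2016_film)": {"type": "Film"},
    "dbr:Inferno_(novel)": {"type": "Book"},
}

def score(uri: str, info: dict) -> float:
    # Assumed heuristic: both candidates match the mention, but a learned
    # model would prefer the novel from context and ontology restrictions.
    s = 1.0 if "Inferno" in uri else 0.0
    if info["type"] == "Book":
        s += 0.5  # illustrative learned preference
    return s

best = max(candidates, key=lambda u: score(u, candidates[u]))
print(best)  # dbr:Inferno_(novel)
```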
28. Research Questions
How to use syntactic information of a natural language question together with semantic representations of entries in a knowledge base?
Who wrote Inferno?
Dependency parse: wrote (VERB) -nsubj-> Who (PRON), -dobj-> Inferno (PROPN)
SELECT ?x WHERE { dbr:Inferno_(novel) dbo:author ?x }
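As an illustrative sketch, the dependency edges of "Who wrote Inferno?" can be exploited to fill the query slots; the tiny lexicon and entity index here are assumptions, not the thesis pipeline:

```python
# Use the edges from the root verb "wrote" to pick the entity and property.
edges = {"nsubj": "Who", "dobj": "Inferno"}     # dependents of the root verb
lexicon = {"write": "dbo:author"}               # verb lemma -> property
entities = {"Inferno": "dbr:Inferno_(novel)"}   # mention -> entity URI

predicate = lexicon["write"]
subject = entities[edges["dobj"]]               # the verb's object names the entity
query = f"SELECT ?x WHERE {{ {subject} {predicate} ?x }}"
print(query)
```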
29. Research Questions
What are the advantages and the disadvantages of a multilingual QA system vs. a
monolingual system built for each language?
Who is the author of Inferno? dbr:Dan_Brown
dbo:author
dbr:Inferno_(novel)
Wer ist der Autor von Inferno?
¿Quién es el autor de Inferno?
SELECT ?x WHERE { dbr:Inferno_(novel) dbo:author ?x }
30. Research Questions
What effort is required to adapt our QA pipelines to another language?
Who is the author of Inferno? dbr:Dan_Brown
dbo:author
dbr:Inferno_(novel)
Qui est l'auteur de Inferno?
Infernoning muallifi kim?
SELECT ?x WHERE { dbr:Inferno_(novel) dbo:author ?x }
32. Preliminaries
• Logical Form - DUDES, a formalism for specifying meaning representations for dependency tree structures
33. Preliminaries
• Logical Form - DUDES, a formalism for specifying meaning representations for dependency tree structures
• Semantic Composition - acquiring the meaning representations using the syntax of questions
34. Logical Form
• DUDES - Dependency-based Underspecified Discourse Representation Structures (Cimiano 2009 [1])
[1] Cimiano, P. (2009). “Flexible semantic composition with DUDES”. In: Proceedings of the Eighth International Conference on Computational Semantics, pp. 272-276. Association for Computational Linguistics.
35. Logical Form
• DUDES - Dependency-based Underspecified Discourse Representation Structures (Cimiano 2009 [1])
• Formalism for specifying meaning representations
• Flexible semantic composition w.r.t. the order of application
• Built on semantic dependencies, e.g. suitable for working with dependency-based syntactic analysis
[1] Cimiano, P. (2009). “Flexible semantic composition with DUDES”. In: Proceedings of the Eighth International Conference on Computational Semantics, pp. 272-276. Association for Computational Linguistics.
36. DUDES
v : the main variable
vs : the projection variables
l : the label of the main DRS
drs : a DRS (Discourse Representation Structure)
slots : a set of semantic dependencies
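The five components above can be captured in a small data structure. This is a minimal illustrative sketch (field names follow the slide; the class itself and the triple-based DRS conditions are assumptions, not the thesis implementation):

```python
from dataclasses import dataclass, field

@dataclass
class Dudes:
    v: str                                     # main variable
    vs: list = field(default_factory=list)     # projection variables
    l: int = 1                                 # label of the main DRS
    drs: list = field(default_factory=list)    # DRS conditions (triples here)
    slots: list = field(default_factory=list)  # semantic dependencies to fill

d = Dudes(v="x", vs=["x"], drs=[("dbr:Wikipedia", "dbo:author", "x")])
print(d.drs[0][1])  # dbo:author
```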
37. Semantic Composition with DUDES
Who created Wikipedia?
Input: a natural language question and its dependency parse tree
38. Semantic Composition with DUDES
Who created Wikipedia?
Input: a natural language question and its dependency parse tree
Output: a meaning representation for the given domain, e.g. dbr:Wikipedia dbo:author ?x
39. Semantic Composition with DUDES
Each node gets a pair of assignments: DUDES Type + Knowledge base ID
Oracle
49. Dependency parse tree-based Semantic Parsing
Approach
• multilingual semantic parsing approach: English, German & Spanish [1]
[1] Hakimov S, Jebbara S, Cimiano P. AMUSE: Multilingual Semantic Parsing for Question Answering over Linked Data.
In Proceedings of the 16th International Semantic Web Conference (ISWC), 2017
50. Dependency parse tree-based Semantic Parsing
Approach
• multilingual semantic parsing approach: English, German & Spanish [1]
• uses language-independent dependency parse trees from Universal
Dependencies
[1] Hakimov S, Jebbara S, Cimiano P. AMUSE: Multilingual Semantic Parsing for Question Answering over Linked Data.
In Proceedings of the 16th International Semantic Web Conference (ISWC), 2017
51. Dependency parse tree-based Semantic Parsing
Approach
• multilingual semantic parsing approach: English, German & Spanish [1]
• uses language-independent dependency parse trees from Universal
Dependencies
• combines different types of lexical information: DBpedia Ontology labels,
the M-ATOLL[2] lexicon & word embeddings
[1] Hakimov S, Jebbara S, Cimiano P. “AMUSE: Multilingual Semantic Parsing for Question Answering over Linked
Data”. ISWC 2017
[2] Walter S, Unger C, and Cimiano P. “M-ATOLL: A Framework for the Lexicalization of Ontologies in Multiple
Languages”. ISWC 2014
[3] Hakimov S, Walter S, Unger C, and Cimiano P. “Applying semantic parsing to question answering over linked data:
Addressing the lexical gap”. NLDB 2015
55. Inference
• Metropolis-Hastings: exploring a huge search space (ca. 10 million resources, 2000 properties)
• Linking to Knowledge Base (L2KB)
• objective: compare the set of URIs to the expected set of URIs
• Query Construction (QC)
• objective: compare the constructed query to the expected query
Input: initial state
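The sampling idea can be illustrated with a pure-Python Metropolis-Hastings sketch. The three states and their model scores are made-up stand-ins for the thesis' URI-assignment states; with a symmetric proposal, accepting with probability min(1, score ratio) makes the chain visit states roughly in proportion to their scores:

```python
import random
from collections import Counter

random.seed(0)
scores = {"s0": 0.1, "s1": 0.4, "s2": 0.9}  # hypothetical model scores

def propose(current):
    # Symmetric proposal: pick uniformly among the other states.
    return random.choice([s for s in scores if s != current])

state, visits = "s0", Counter()
for _ in range(500):
    candidate = propose(state)
    ratio = scores[candidate] / scores[state]
    if ratio >= 1 or random.random() < ratio:
        state = candidate  # accept the proposal
    visits[state] += 1

print(visits.most_common(1)[0][0])  # the chain favors high-scoring states
```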
56. L2KB Sampling
Explore the edges and assign Knowledge Base IDs based on lemmas of nodes
Inverted index: Ontology labels, lexicon from M-ATOLL & word embeddings
57. L2KB Sampling
Explore the edges and assign Knowledge Base IDs based on lemmas of nodes
Check the triple pattern: ?x dbo:author dbr:Wikipedia -> Slot 2; dbr:Wikipedia dbo:author ?x -> Slot 1
Inverted index: Ontology labels, lexicon from M-ATOLL & word embeddings
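The inverted index lookup can be sketched as a lemma-to-candidates dictionary; the entries below are illustrative, while the thesis builds the index from ontology labels, M-ATOLL entries and embedding-derived candidates:

```python
# Toy inverted index: lowercase lemmas -> candidate KB IDs.
inverted_index = {
    "create": ["dbo:author"],
    "author": ["dbo:author"],
    "wikipedia": ["dbr:Wikipedia"],
}

def kb_candidates(lemma: str):
    # Candidate KB IDs for a node's lemma; empty list if nothing is indexed.
    return inverted_index.get(lemma.lower(), [])

print(kb_candidates("create"))     # ['dbo:author']
print(kb_candidates("Wikipedia"))  # ['dbr:Wikipedia']
```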
66. Evaluation
Dataset: Question Answering over Linked Data (QALD), 6th challenge
English, German, Spanish, Italian, French, Dutch, Romanian, Farsi
350 questions for training, 100 for testing
Unger, Christina, Axel-Cyrille Ngonga Ngomo, and Elena Cabrio (2016). “6th open challenge on question answering over linked data (qald-6)”. In: Semantic Web Evaluation Challenge.
67. Evaluation
DBP: lexicon from DBpedia Ontology & WordNet
M-ATOLL: lexicon induced by the M-ATOLL (Walter et al. 2014)
Embed: lexicon added using pre-trained word embeddings (Mikolov et al. 2013)
Walter, Sebastian, Christina Unger, and Philipp Cimiano. “M-ATOLL: A Framework for the Lexicalization of Ontologies in Multiple Languages”. ISWC 2014
Mikolov, Tomas et al. “Distributed representations of words and phrases and their compositionality”. NIPS 2013
68. Evaluation
DBP: lexicon from DBpedia Ontology & WordNet
M-ATOLL: lexicon induced by the M-ATOLL (Walter et al. 2014)
Embed: lexicon added using pre-trained word embeddings (Mikolov et al. 2013)
Dict: manually defined lexicon
Walter, Sebastian, Christina Unger, and Philipp Cimiano. “M-ATOLL: A Framework for the Lexicalization of Ontologies in Multiple Languages”. ISWC 2014
Mikolov, Tomas et al. “Distributed representations of words and phrases and their compositionality”. NIPS 2013
72. Outline
• SimpleQuestions dataset, 74k samples, Freebase data
• Question: “Who wrote Mildred Pierced?”
• Fact: mildred_pierced, book.written_work.author, stuart_kaminsky
• Answer: mildred_pierced, book.written_work.author, ?x
• Systematic comparison of different model architectures
Hakimov S, Jebbara S, Cimiano P. “Evaluating Architectural Choices for Deep Learning Approaches for Question
Answering over Knowledge Bases”. ICSC 2019
73. Named Entity Recognition
• Used by all models to predict the entity span
• Character & word embeddings
• Trained using weak supervision: inference is correct if the
expected entity has been found
75. Model1: BiLSTM-Softmax
75
Model2: BiLSTM-KB Model3: BiLSTM-Binary Model4: Fasttext [1]
Architectures
[1] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, "Bag of Tricks for Efficient Text Classification", 2016, arxiv.org
82. Discussion
• Manual Effort
• Syntax and Semantics
• Multilinguality
• Cross-domain Transferability
• Training Data Size and Search Space
83. Discussion
CCG-based (Chapter 6)
• Manual Effort: CCG combination rules, manual lexicon
• Syntax & Semantics: learned in tandem; CCG for syntax, lambda calculus for semantics
• Multilinguality: manual effort is required
• Cross-domain transferability: manual effort is required
• Training Data & Number of KB IDs: 600 training instances, 750 entities
Dependency-based (Chapter 7)
• Manual Effort: feature templates
• Syntax & Semantics: syntax is given; DUDES as a formalism
• Multilinguality: an adaptable solution
• Cross-domain transferability: a dependency parser is required, e.g. for the biomedical domain
• Training Data & Number of KB IDs: 300 training instances, <= 10 mil. entities, >= 2000 predicates
BiLSTM-Softmax (Chapter 8)
• Manual Effort: -
• Syntax & Semantics: word & character embeddings for lexical & contextual information; semantics is limited to a single predicate and a subject entity
• Multilinguality: an adaptable solution, only word & character embeddings
• Cross-domain transferability: an adaptable solution, only word & character embeddings
• Training Data & Number of KB IDs: >= 75K instances, <= 2 mil. entities
85. Research Questions
• Ontology lexicalisations, e.g. M-ATOLL (Walter et al. 2014)
• Ontology labels, e.g. DBpedia labels
• Dictionaries
• WordNet synsets
• Induced from contextual embeddings of words
RQ1: How to map natural language phrases into knowledge base entries for
multiple languages? Which linguistic resources can be used?
86. Research Questions
• Supervised models with objective for disambiguation
• CCG-based model
• uses lexical and syntactic information as features
• Dependency tree-based model
• syntactic dependency between words, lexical similarity, ontology restrictions
• Neural network-based model
• ranking objective of predicates
RQ2: How to disambiguate URIs when multiple candidates are retrieved from
mapping natural language tokens into knowledge base entries?
87. Research Questions
• Semantic Parsing
• bottom-up composition
• CCG-based model
• learns the syntax and semantics together
• Dependency tree-based model
• learns composing semantics based on dependency trees
RQ3: How to use syntactic information of a natural language question together
with semantic representations of entries in a knowledge base?
88. Research Questions
RQ4: What are the advantages and the disadvantages of a multilingual QA system
vs. a monolingual system built for each language?
• Advantages
• Multilingual: broader coverage
• Monolingual: higher performance, e.g. Xser (Xu et al. 2014) 0.7 F1 on QALD-4
• Disadvantages
• Multilingual: lower performance, e.g. AMUSE 0.3 F1 on QALD-6
• Monolingual: need expertise, e.g. CCG rules, lexicon
89. Research Questions
• CCG-based model
• grammar rules, manually defined lexicon
• language-specific
• Dependency parse tree-based model
• dependency parse tree generator
• lexicon
• Neural network-based model
• depends on the training data
RQ5: What effort is required to adapt our QA pipelines to another language?
90. Conclusion
• Address the lexical gap for QA systems
• Incorporate ontology lexicalizations to reduce the lexical gap
• Use Universal Dependencies to build language-independent QA pipelines
• Multilingual semantic parsing for Question Answering
• Evaluate different QA models under the same conditions
• Highlight the importance of the building blocks of a pipeline for a fair comparison
92. GENLEX
Barack Obama is married to Michelle Obama
[1] Zettlemoyer, Luke S. and Michael Collins (2005). “Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars”. In: 21st Conference on Uncertainty in Artificial Intelligence.
[2] Hakimov, Sherzod et al. (2015). “Applying semantic parsing to question answering over linked data: Addressing the lexical gap”. In: International Conference on Applications of Natural Language to Information Systems.
96. Lexicon
During sampling, compute the cosine similarity of words to the ontology labels of properties
Vectors of multi-word labels are summed, e.g. V(population) + V(total)
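A minimal sketch of the summation and similarity computation; the tiny made-up 3-dimensional vectors stand in for real pre-trained word embeddings:

```python
import math

# Illustrative embeddings (real models use hundreds of dimensions).
vec = {
    "population": [0.9, 0.1, 0.0],
    "total": [0.2, 0.8, 0.1],
    "inhabitants": [1.0, 0.7, 0.1],
}

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

label_vec = add(vec["population"], vec["total"])  # V(population) + V(total)
sim = cosine(vec["inhabitants"], label_vec)
print(round(sim, 2))  # high similarity links "inhabitants" to the label
```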
103. Semantic Composition
• recursively computing the meaning of each node from the meanings of its child nodes
• build the meaning representation bottom-up
ComposeSemantics(parse-tree):
  if parse-tree is a terminal node (word):
    return an atomic lexical meaning for the word
  else:
    for each child subtree_i of parse-tree:
      create its MR by calling ComposeSemantics(subtree_i)
    return an MR by combining the resulting MRs of the children into an MR for the overall parse-tree
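The procedure above can be turned into a runnable toy version. The lexicon and the combination rule (concatenating child meanings and then assembling one triple) are deliberate simplifications of DUDES composition, not the thesis implementation:

```python
# Atomic lexical meanings as partial triples (subject, predicate, variable).
lexical = {
    "Wikipedia": [("dbr:Wikipedia", None, None)],
    "created": [(None, "dbo:author", None)],
    "Who": [(None, None, "?x")],
}

def compose(node):
    word, children = node
    if not children:                  # terminal: atomic lexical meaning
        return list(lexical.get(word, []))
    mr = list(lexical.get(word, []))  # head contributes its own meaning
    for child in children:            # combine child MRs bottom-up
        mr.extend(compose(child))
    return mr

# "Who created Wikipedia?" with "created" as the root of the parse tree.
tree = ("created", [("Who", []), ("Wikipedia", [])])
mr = compose(tree)

# Assemble one triple pattern from the collected partial meanings.
subj = next(t[0] for t in mr if t[0])
pred = next(t[1] for t in mr if t[1])
var = next(t[2] for t in mr if t[2])
print(subj, pred, var)  # dbr:Wikipedia dbo:author ?x
```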
108. Model Representation
Observed variables: dependency parse tree
Hidden variables: KB IDs, slots, DUDES types
• States can be ranked by
• objective score: compare to the ground truth
• model score: computed using feature weights
• Training procedure
• switch between model & objective score after every iteration
111. Model1: BiLSTM-Softmax
• Softmax layer that predicts predicates seen during training
• Encoding layer: word & character embeddings
• BiLSTM: two LSTM layers (backward, forward)
112. Model2: BiLSTM-KB
• Learns embeddings of predicates in the KB
• Encoding layer: word & character embeddings
• BiLSTM: two LSTM layers (backward, forward)
• Output layer computes the cosine similarity to all predicates and chooses the closest
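The output layer of Model2 can be sketched as a cosine-similarity ranking; the 3-dimensional vectors are made-up stand-ins for the trained question encoding and predicate embeddings:

```python
import math

# Illustrative learned predicate embeddings.
predicate_emb = {
    "book.written_work.author": [0.9, 0.2, 0.1],
    "film.film.director": [0.1, 0.9, 0.2],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

question_enc = [0.8, 0.3, 0.1]  # stand-in for the BiLSTM encoding of "Who wrote ...?"
best = max(predicate_emb, key=lambda p: cosine(question_enc, predicate_emb[p]))
print(best)  # book.written_work.author
```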
113. Model3: BiLSTM-Binary
• Encoding layer: encodes the input question with word & character embeddings
• Encoding layer: encodes the input predicate with word & character embeddings
• Output layer: binary decision
114. Model4: Fasttext
• Document classification tool developed by Facebook*
• Uses word & character embeddings
• Softmax layer that predicts the expected predicate
* http://fasttext.cc
116. Generative vs. Discriminative Models
Generative models -> compute the joint probability distribution p(x, y)
HMM -> y_t depends on y_{t-1}, and the output label y_t generates the input vector x_t
Discriminative models -> compute the conditional probability distribution p(y|x) over inputs x and outputs y
CRF -> has no such independence limitation; models how the feature vector x gets the assignment y_t
118. Manual Effort
• CCG-based model
• define CCG grammar rules, hand-crafted lexicon for domain-independent phrases
• Dependency parse tree-based model
• feature functions
• Neural network-based model (BiLSTM-Softmax)
• not required
119. Syntax and Semantics
• CCG-based model
• syntax and semantics are learned in tandem
• CCG for syntax and the lambda calculus for semantics
• syntax guides the semantics of the sentences
• Dependency parse tree-based model
• syntax is given and the semantics is learned
• DUDES as a formalism for semantics; syntax is based on dependency trees from Universal Dependencies
• Neural network-based model (BiLSTM-Softmax)
• syntactic information is learned, e.g. word and character embeddings provide contextual information
• semantics is based on a single subject and predicate, a simpler task
120. Multilinguality
• CCG-based model
• CCG grammar rules need to be extended
• Dependency parse tree-based model
• a multilingual solution
• Neural network-based model (BiLSTM-Softmax)
• can be adapted to other languages, e.g. words & characters as features
121. Cross-domain Transferability
• CCG-based model
• manual effort is required: CCG rules, lexicon
• Dependency parse tree-based model
• requires a dependency parser, e.g. for the biomedical domain
• Neural network-based model (BiLSTM-Softmax)
• can be easily adapted
122. Training Data Size and Search Space
• CCG-based model
• Dependency parse tree-based model
• Neural network-based model (BiLSTM-Softmax)
• heavily depends on the data