Why Reinvent the Wheel: Let's Build Question Answering Systems Together

Kuldeep Singh, Arun Sethupat, Andreas Both, Saeedeh Shekarpour, Ioanna Lytra,
Ricardo Usbeck, Akhilesh Vyas, Akmal Khikmatullaev, Dharmen Punjani, Christoph Lange,
Maria-Esther Vidal, Jens Lehmann, Sören Auer
The Web Conference 2018 26.04.2018

State of the Art
● More than 60 Question Answering (QA) systems for structured data in the last 6 years.
● QA systems implement similar tasks to answer a user’s question.

Motivating Example

Problem

Approach
● Step 1: Prediction of top k QA components per task
● Step 2: Formation of QA pipelines using a greedy algorithm
QA Optimisation Pipeline Algorithm
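
The two steps above can be sketched roughly as follows. This is only an illustrative sketch, not the Frankenstein implementation: the task order, the function names, and all predicted scores are hypothetical, and the sketch assumes a predictor has already assigned each component a score for the input question. "Greedy" here means taking the locally best candidate for each task instead of scoring every possible combination of components.

```python
from typing import Dict, List, Tuple

# Assumed task order (hypothetical); the slides name the tasks QB, CL, NED and RL.
TASK_ORDER = ["NED", "RL", "CL", "QB"]


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Step 1: names of the k components with the highest predicted score."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def greedy_pipeline(predicted: Dict[str, Dict[str, float]], k: int = 2) -> List[Tuple[str, str]]:
    """Step 2: per task, keep the best of the top-k candidates (greedy choice)."""
    pipeline = []
    for task in TASK_ORDER:
        candidates = top_k(predicted[task], k)  # assumes at least one component per task
        pipeline.append((task, candidates[0]))
    return pipeline


# Hypothetical predicted scores for a single question (not taken from the slides).
predicted_scores = {
    "NED": {"DBpediaSpotlight": 0.61, "Tag Me": 0.74},
    "RL": {"ReMatch": 0.35, "RNLIWOD": 0.52},
    "CL": {"OKBQA DM CLS": 0.58},
    "QB": {"NLIWOD QB": 0.49},
}

pipeline = greedy_pipeline(predicted_scores, k=2)
print(pipeline)
```

A different question would come with different predicted scores and could therefore yield a different pipeline, which is the sense in which the composition is dynamic.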

Frankenstein Framework

Frankenstein - Characteristics

Corpus Creation
Dataset | Questions Considered | Complexity | Simple Questions
LC-QuAD | 3252 | High | 22%
QALD-5 | 204 | Low | 53%

Preparing Training Dataset: Question Features

Evaluation Metrics (per component)
Metric | Description
Precision (Micro) | (#correct answers retrieved for q) / (#answers retrieved for q)
Recall (Micro) | (#correct answers retrieved for q) / (#gold standard answers for q)
Precision | Average of Micro Precision over all questions
Recall | Average of Micro Recall over all questions
F-Score (Micro) | Harmonic mean of Micro Precision and Micro Recall
F-Score | Harmonic mean of Precision and Recall
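
As a worked illustration of the definitions above, the following minimal sketch computes the per-question (micro) scores and their averages. The function names and the toy answer sets are hypothetical; returning 0.0 when a denominator is empty is an assumption, since the slides do not say how that case is handled.

```python
from typing import List, Set, Tuple


def harmonic_mean(precision: float, recall: float) -> float:
    """F-score; defined as 0.0 here when both inputs are 0 (assumption)."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


def micro_scores(retrieved: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Micro precision, recall and F-score for a single question q."""
    correct = len(retrieved & gold)
    precision = correct / len(retrieved) if retrieved else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, harmonic_mean(precision, recall)


def overall_scores(results: List[Tuple[Set[str], Set[str]]]) -> Tuple[float, float, float]:
    """Averages of the micro scores over a non-empty list of questions."""
    per_question = [micro_scores(retrieved, gold) for retrieved, gold in results]
    precision = sum(p for p, _, _ in per_question) / len(per_question)
    recall = sum(r for _, r, _ in per_question) / len(per_question)
    return precision, recall, harmonic_mean(precision, recall)


# Toy data (hypothetical): (retrieved answers, gold standard answers) per question.
results = [
    ({"a", "b"}, {"a"}),  # micro precision 0.5, micro recall 1.0
    ({"c"}, {"c", "d"}),  # micro precision 1.0, micro recall 0.5
]

precision, recall, f_score = overall_scores(results)
print(precision, recall, f_score)
```

Note that the overall F-score is computed from the averaged precision and recall, as in the last table row, and not by averaging the per-question F-scores.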

Evaluating Component Performance
QA Task | Dataset | Precision | Recall | Macro F-score
QB | LC-QuAD | 0.48 | 0.49 | 0.48
QB | QALD 5 | 0.49 | 0.50 | 0.49
CL | LC-QuAD | 0.47 | 0.59 | 0.52
CL | QALD 5 | 0.58 | 0.64 | 0.61
NED | LC-QuAD | 0.69 | 0.66 | 0.67
NED | QALD 5 | 0.67 | 0.75 | 0.71
RL | LC-QuAD | 0.25 | 0.22 | 0.23
RL | QALD 5 | 0.54 | 0.74 | 0.62
Best components: NLIWOD QB for QB and OKBQA DM CLS for CL on both datasets; DBpediaSpotlight and Tag Me for NED; ReMatch and RNLIWOD for RL.

Variation in Performance of Components
QA Task | Question Type | Highest Precision among all components over LC-QuAD
NED | India → dbr:India (#entities=2) | 0.91
NED | English → dbr:English_Speaking_Country (#entity=1) | 0.24
RL | Number of relations = 1, relation is explicit | 0.46
RL | No natural language relation (e.g. “Give me all cosmonauts”) | 0.0

Key Message
No one is perfect!
Question Type defines the perfection!

Evaluating Pipeline Performance (task level)
10-fold Cross-Validation Experiments on the LC-QuAD Dataset
QA Task | Total Questions | Answerable Questions | Frankenstein* Top 1 | Frankenstein* Top 2 | Frankenstein* Top 3 | Best component of LC-QuAD (per task)
QB | 324.3 | 175.4 | 162.7 | 175.4 | - | 159.6
CL | 324.3 | 76 | 68.2 | 76 | - | 68.1
NED | 324.3 | 294.2 | 236.3 | 270.9 | 284.3 | 245.2
RL | 324.3 | 153.1 | 84.2 | 118.9 | 134.4 | 90.2
Message: Frankenstein’s dynamic composition outperforms the best components.

Evaluating Pipeline Performance (task level)
Using LC-QuAD for training, QALD for test data
QA Task | Total Questions | Answerable Questions | Frankenstein* Top 1 | Frankenstein* Top 2 | Frankenstein* Top 3 | Best component of LC-QuAD (per task)
QB | 204 | 119 | 91 | 119 | - | 102
CL | 204 | 55 | 52 | 55 | - | 55
NED | 204 | 168 | 132 | 153 | 163 | 109
RL | 204 | 138 | 83 | 107 | 121 | 46
Message: Frankenstein’s dynamic composition outperforms the best components again!

Evaluating Pipeline Performance (pipeline level)
Using LC-QuAD for training, QALD for test data
Frankenstein Pipeline | Answered Questions | Precision | Recall | Macro F-score
With QALD’s best components | 37 | 0.17 | 0.19 | 0.18
Dynamic | 41 | 0.20 | 0.21 | 0.20
Message:
1. The dynamic pipeline outperforms the static one.
2. The Query Builder is the bottleneck for accuracy.
3. The Relation Linkers used kill the runtime.

No one is perfect!
Frankenstein can compose dynamic pipelines based on the type of question.
Missing intelligence in the Query Builder severely affects the QA pipeline.
Worst runtime → Relation Linker
Using the same feature set for all tasks hurts prediction performance.

Extend the feature set and the dataset
Include more domain-independent components in Frankenstein
Improve the learning mechanism
Research on adding intelligence to the Query Builder
Research on addressing implicit/hidden relations
Research on identifying implicit entities

Join us : https://github.com/WDAqua/Frankenstein
Find us on : http://wdaqua.eu/ and http://sda.cs.uni-bonn.de/
Email : kuldeep.singh@iais.fraunhofer.de
