Information access over linked data requires determining the subgraph(s) of the linked data's underlying graph that correspond to the information need. Usually, an information access framework can retrieve richer information by checking a large number of possible subgraphs. On the other hand, checking a large number of possible subgraphs increases information access complexity, which makes information access frameworks less effective. Many contemporary linked data information access frameworks reduce the complexity by introducing different heuristics, but they suffer in retrieving richer information; other frameworks do not address the complexity at all. A practically usable framework, however, should retrieve richer information with lower complexity. We hypothesize that pre-processed data statistics of linked data can be used to efficiently check a large number of possible subgraphs, which helps to retrieve comparatively richer information with lower data access complexity. Preliminary evaluation of our proposed hypothesis shows promising performance.
inteSearch: An Intelligent Linked Data Information Access Framework
Md-Mizanur Rahoman, Ryutaro Ichise
November 11, 2014
2. Introduction Proposed Retrieval Framework: inteSearch Experiment Conclusion
Outline
Introduction
Background of Linked Data Information Access
Problem and Probable Solution
Proposed Retrieval Framework: inteSearch
Pre-processing of Linked Data
Framework Details
Experiment
Conclusion
Linked Data (LD)
is structured data
represents knowledge as tuples of the form <Subject, Predicate, Object>, which are called RDF triples
can be represented as a graph
can be queried with an expressive, SQL-like query language
is openly available: 2122 datasets, 61 billion RDF triples (as of Apr. 2014)
[Figure: example LD graph. Schema/Ontology layer: classes :Country and :Person (type Class) and properties :birthPlace, :supervisor, :spouse (type Property), each with a label, a domain and a range. Instances layer: persons :amnd, :barl, :clra, :dnld (labels Amanda, Berlusconi, Cleyra, Donald) and countries :grmn, :uk, :grce (labels Germany, United Kingdom, Greece), connected by :spouse, :supervisor and :birthPlace edges.]
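As a minimal sketch of the triple and graph views described on this slide, the snippet below stores a few triples as Python tuples and indexes them as a directed, edge-labelled graph. The label and type triples are read off the example graph; the `:birthPlace` edge between `:dnld` and `:grce` is a hypothetical instance edge added only for illustration, and the helper name `to_graph` is likewise an assumption.

```python
from collections import defaultdict

# Each RDF triple is a (subject, predicate, object) tuple.
triples = [
    (":Country", "type", "Class"),
    (":Country", "label", "Country"),
    (":dnld", "label", "Donald"),
    (":grce", "label", "Greece"),
    (":dnld", ":birthPlace", ":grce"),  # hypothetical edge, for illustration only
]


def to_graph(triples):
    """Index triples as a graph: subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return dict(graph)


graph = to_graph(triples)
print(graph[":dnld"])
```

Each subject becomes a node whose outgoing edges are labelled by predicates, which is the graph that the sub-graph search discussed later operates on.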
Sub-graph finding over the LD graph
imposes substantial execution cost as the graph size gets bigger
Know-how of (dataset-specific) vocabulary, schema, and LD query (i.e., linked data semantics)
demands domain-level expertise
users expect an automated tool to understand linked data semantics
[Figure: the example LD graph (Schema/Ontology and Instances layers) with one sub-graph highlighted, involving :dnld (label Donald), :grce (label Greece) and the properties :spouse and :birthPlace.]
Contemporary LD Information Access Systems
Language-Tool-Based-Systems (PowerAqua'06, TBSL'12, FREyA'11, SemSek'12, CASIA'13, etc.)
use language tools (e.g., parser, POS tagger, etc.) to predict possible sub-graphs (over the LD graph)
convert the sub-graphs into a SPARQL query
Pivot-Point-Based-Systems (Treo'11, NLP-Reduce'07, etc.)
pick a query word (i.e., the pivot point), then try to pick other query words w.r.t. the pivot point and predict a possible sub-graph (over the LD graph)
convert the sub-graph into a SPARQL query
Language-Tool-Based-Systems
Problem
generate many improper parse trees - different parsers give different parse trees, with different parsing tags
tag with improper semantics (e.g., mis-tagging of query words, such as whether the query word "spouse" should be tagged as Object or Predicate)
generate empty or improper results - caused by choosing an incorrect sub-graph
Pivot-Point-Based-Systems
Problem
depend heavily on picking the correct pivot point - in most cases, systems pick NE (named entity) related pivot points first, then other pivot points
impose huge cost if the pivot point needs to change - one pivot point can have multiple LD resources
miss contextual information attachment - e.g., random choice of pivot points could generate very different results
Problem Statement and Probable Solution
Problem Statement
For LD information access, how can we find the required sub-graph (over the LD graph) with minimum execution cost, in a way that
will not generate an empty result
will not miss the contextual information of the query
Solution
To find the correct sub-graph - check as many sub-graph generation possibilities as possible
To achieve minimum execution cost - prepare pre-processed LD statistics that give insight into sub-graph generation possibilities
To not lose the contextual information of the query - adapt a sub-graph joining technique called the Progressive Joining Approach (Rahoman & Ichise'14)
inteSearch - Overview
Pre-processed data statistics
store LD resources in a way that lets them be picked easily
store patterns of LD resources so that they give insight into possible sub-graphs
Development of framework
generate a graph for each single query word (called a Basic Graph)
merge all Basic Graphs to predict all possible sub-graphs (called Keyword Graphs)
rank all possible Keyword Graphs using the pre-processed data statistics
generate a SPARQL query for the best-ranked Keyword Graphs
Pre-processed data statistics
Label Extractor - extract and store the labels of an LD resource
$lv(r) = \{\, o \mid \exists \langle r, p, o \rangle \in \text{RDF triples of dataset} \wedge p \in rrp \,\}$
where $rrp$ is the set of resource representing predicates, e.g., label, title, etc.
Pattern-wise Resource Frequency Generator - compute and store LD resource pattern frequencies
$sf(r) = |\{\, \langle r, p, o \rangle \mid \exists \langle r, p, o \rangle \in \text{RDF triples of dataset} \,\}|$
$pf(r) = |\{\, \langle s, r, o \rangle \mid \exists \langle s, r, o \rangle \in \text{RDF triples of dataset} \,\}|$
$of(r) = |\{\, \langle s, p, r \rangle \mid \exists \langle s, p, r \rangle \in \text{RDF triples of dataset} \,\}|$
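The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain `(subject, predicate, object)` tuples, the set `RRP` contains just the two example predicates named on the slide, and the function name `preprocess` is hypothetical.

```python
from collections import Counter, defaultdict

# Assumed resource representing predicates (rrp); the slide names label and title as examples.
RRP = {"label", "title"}


def preprocess(triples, rrp=RRP):
    """Compute lv, sf, pf and of over a collection of (s, p, o) triples."""
    lv = defaultdict(set)                    # lv(r): labels attached to resource r
    sf, pf, of = Counter(), Counter(), Counter()
    for s, p, o in set(triples):             # a dataset is a set, so duplicates count once
        sf[s] += 1                           # r appears in the subject position
        pf[p] += 1                           # r appears in the predicate position
        of[o] += 1                           # r appears in the object position
        if p in rrp:
            lv[s].add(o)
    return lv, sf, pf, of


triples = [
    (":Country", "label", "Country"),
    (":Country", "type", "Class"),
]
lv, sf, pf, of = preprocess(triples)
print(lv[":Country"], sf[":Country"], pf[":Country"], of[":Country"])
```

With only these two triples, `:Country` has the label "Country" and a subject frequency of 2, while its predicate and object frequencies are 0 because it never occurs in those positions here.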
Example of Pre-processed Data Statistics
[Figure: exemplary LD graph with a Schema/Ontology layer and an Instances layer, highlighting the two triples whose subject is :Country: (:Country, label, Country) and (:Country, type, Class).]
Pre-processed data statistics
r          lv(r)     sf(r)   pf(r)   of(r)
:Country   Country   2       ...     ...
:...       ...       ...     ...     ...
Development of Framework
Basic Graph Generator - generate the Basic Graphs
Keyword Graph Generator - merge all Basic Graphs to predict the Keyword Graphs
Ranker - rank all possible Keyword Graphs using the pre-processed data statistics
SPARQL Query Generator - generate a SPARQL query for the best-ranked Keyword Graphs
Basic Graph Generator
Choose one of the three Basic Graphs for each query word k:
<k, ?p, ?o>, or <?s, k, ?o>, or <?s, ?p, k>
decided by the LD resources that are similar to the query word and by their pattern frequencies
e.g., if the similar LD resources are {R} and the Predicate Pattern-wise Resource Frequency of an LD resource (e.g., pf(r_i)) is bigger than all Subject and Object Pattern-wise Resource Frequencies, then we select the Basic Graph <?s, k, ?o>
weight is computed from the highest pattern frequency of the LD resources {R}
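The selection rule can be sketched as follows. The slide states only the predicate case explicitly; the subject and object cases, the tie-breaking, and the function name `choose_basic_graph` are assumptions made by analogy. How similar LD resources are found for a query word is outside this sketch, so they are passed in as a list.

```python
def choose_basic_graph(keyword, resources, sf, pf, of):
    """Return (basic_graph, weight) for one query word.

    resources: LD resources judged similar to the keyword (matching not shown).
    sf, pf, of: dicts of subject/predicate/object pattern frequencies.
    """
    best_s = max((sf.get(r, 0) for r in resources), default=0)
    best_p = max((pf.get(r, 0) for r in resources), default=0)
    best_o = max((of.get(r, 0) for r in resources), default=0)

    if best_p > best_s and best_p > best_o:
        graph = ("?s", keyword, "?o")    # keyword acts as the predicate
    elif best_s >= best_o:               # assumed tie-break: prefer the subject form
        graph = (keyword, "?p", "?o")    # keyword acts as the subject
    else:
        graph = ("?s", "?p", keyword)    # keyword acts as the object

    # Weight: the highest pattern frequency among the similar resources.
    return graph, max(best_s, best_p, best_o)


# Hypothetical frequencies for a resource matching the query word "spouse".
graph, weight = choose_basic_graph(
    "spouse", [":spouse"], sf={":spouse": 1}, pf={":spouse": 3}, of={}
)
print(graph, weight)
```

Because the predicate frequency dominates in this toy input, the keyword is placed on the edge and the weight is that frequency.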
Keyword Graph Generator
Merge all Basic Graphs in all of their possible merging options, following the Progressive Joining Approach
e.g., merging the 1st and 2nd Basic Graphs at all possible options
[Figure: the 1st Basic Graph (for k1) and the 2nd Basic Graph (for k2), followed by the Keyword Graphs that result from their possible merging options.]
Progressive Joining Approach - if the query words are given in the order {k1, k2, k3, ..., km}, then
join the Basic Graph of k1 and the Basic Graph of k2 to find an Intermediate-version Keyword Graph, then
progressively join the Basic Graph of each remaining query word and update the Intermediate-version Keyword Graph, until no query word remains
The Progressive Joining Approach maintains contextual information attachment
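The progressive, left-to-right structure of the joining can be sketched as below. The slides do not spell out which merging options are allowed, so the rule used here is only an assumption for illustration: a variable end node (subject or object) of the next Basic Graph is identified with an end node of the last joined Basic Graph. The function names are hypothetical, and the number of options this rule produces need not match the figure.

```python
def is_var(term):
    return term.startswith("?")


def join_options(last_bg, next_bg):
    """Yield versions of next_bg attached to last_bg (assumed join rule)."""
    for a in (0, 2):                 # end nodes of the last joined Basic Graph
        for b in (0, 2):             # end nodes of the next Basic Graph
            if is_var(next_bg[b]):
                joined = list(next_bg)
                joined[b] = last_bg[a]
                yield tuple(joined)


def progressive_join(basic_graphs):
    """Fold Basic Graphs in query-word order into candidate Keyword Graphs."""
    if not basic_graphs:
        return []
    candidates = [[basic_graphs[0]]]          # intermediate-version Keyword Graphs
    for next_bg in basic_graphs[1:]:
        candidates = [
            kg + [joined]
            for kg in candidates
            for joined in join_options(kg[-1], next_bg)
        ]
    return candidates


bgs = [("k1", "?p1", "?o1"), ("?s2", "k2", "?o2")]
keyword_graphs = progressive_join(bgs)
for kg in keyword_graphs:
    print(kg)
```

Joining each new Basic Graph only to the one joined just before it is what keeps neighbouring query words connected, which is the contextual attachment the slide refers to.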
Progressive Joining Approach - an Example
[Figure: an Intermediate-version Keyword Graph (built from k1 and k2) and the Basic Graph of the next query word k3, followed by a table of all possible contextually feasible Keyword Graphs. Table columns: Intermediate-version KG; Next BG; Joining between last joined BG and next BG; Increase of KG.]
Ranker
Rank Keyword Graphs by
Weight - the minimum weight of the constituent Basic Graphs
Depth level - how many edges a Keyword Graph holds
Keyword Graphs with a lower depth level are ranked higher than Keyword Graphs with a higher depth level
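A minimal ranking sketch follows. The slide does not say how weight and depth level are combined, so this assumes depth level is compared first (fewer edges ranks higher) and weight breaks ties (larger minimum weight ranks higher). The dictionary layout and the name `rank_keyword_graphs` are hypothetical.

```python
def rank_keyword_graphs(keyword_graphs):
    """Sort Keyword Graphs best-first.

    Each Keyword Graph is a dict with:
      'patterns'   - list of (s, p, o) triple patterns, one per edge
      'bg_weights' - weights of the Basic Graphs it was built from
    """
    def sort_key(kg):
        depth = len(kg["patterns"])       # depth level = number of edges
        weight = min(kg["bg_weights"])    # weight = minimum constituent weight
        return (depth, -weight)           # assumed order: depth first, then weight

    return sorted(keyword_graphs, key=sort_key)


candidates = [
    {"patterns": [("?s1", "k1", "?o1"), ("?o1", "k2", "?o2")], "bg_weights": [9, 4]},
    {"patterns": [("?s1", "k1", "k2")], "bg_weights": [5, 3]},
    {"patterns": [("k2", "k1", "?o1")], "bg_weights": [5, 7]},
]
ranked = rank_keyword_graphs(candidates)
for kg in ranked:
    print(kg["patterns"], min(kg["bg_weights"]))
```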
SPARQL Query Generator
Construct a SPARQL query
for the higher-ranked Keyword Graphs in turn, until the first non-empty result is obtained
the query is converted directly from the Keyword Graph by
putting the variables in the SELECT clause
merging the resources that correspond to each keyword in a UNION clause
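The conversion can be sketched as a small string builder. This is an illustration under assumptions, not the framework's actual generator: a Keyword Graph is a list of triple patterns whose terms are either variables (`?x`) or keywords, `resources` is a hypothetical mapping from each keyword to its candidate resource IRIs, and one UNION branch is emitted per combination of candidates.

```python
from itertools import product


def to_sparql(patterns, resources):
    """Build a SPARQL SELECT query string from a Keyword Graph."""
    terms = [t for tp in patterns for t in tp]
    variables = sorted({t for t in terms if t.startswith("?")})
    keywords = sorted({t for t in terms if not t.startswith("?")})

    branches = []
    for combo in product(*(resources[k] for k in keywords)):
        binding = dict(zip(keywords, combo))
        body = " ".join(
            " ".join(t if t.startswith("?") else f"<{binding[t]}>" for t in tp) + " ."
            for tp in patterns
        )
        branches.append("{ " + body + " }")

    where = " UNION ".join(branches)
    return f"SELECT {' '.join(variables)} WHERE {{ {where} }}"


query = to_sparql(
    [("?s1", "spouse", "?o1")],
    {"spouse": [
        "http://dbpedia.org/ontology/spouse",
        "http://dbpedia.org/property/spouse",
    ]},
)
print(query)
```

In use, such a query would be generated for each Keyword Graph in ranked order and sent to the triple store, stopping at the first one that returns a non-empty result.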
Experimental Setup
Question setup
Questions: the Question Answering over Linked Data 3 (QALD-3) test question set
consists of natural language questions
Dataset   Total Qs   QALD-3
DBpedia   99         99
Keywords: constructed manually w.r.t. the word order of the question words
Evaluation metrics
Recall, Precision, F1-Measure
Evaluated for
detailed performance analysis, execution complexity measure, comparison with other systems
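For reference, the three metrics can be computed per question as sketched below and then averaged over questions. The set-based definitions are the standard ones; the handling of empty answer sets and the function name are assumptions, since the slides do not state them.

```python
def recall_precision_f1(gold, returned):
    """Return (recall, precision, f1) for one question's answer sets."""
    gold, returned = set(gold), set(returned)
    correct = len(gold & returned)
    recall = correct / len(gold) if gold else 0.0
    precision = correct / len(returned) if returned else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


# Hypothetical answer sets: one of two gold answers is found, plus one wrong answer.
scores = recall_precision_f1(gold={"a", "b"}, returned={"a", "c"})
print(scores)
```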
Detailed performance analysis
Analyzed by the number of keywords each question holds
Keyword Group   No. of Qs   Recall (Avg)   Precision (Avg)   F1-Measure (Avg)
One             1           1.00           1.00              1.00
Two             45          0.90           0.96              0.92
Three           13          0.77           0.77              0.77
Four            8           0.75           0.75              0.75
Five            3           1.00           1.00              1.00
Overall                     0.87           0.90              0.88
Observation
according to the One/Two/Three Keyword Group questions, the selection of Basic Graphs works well
according to the more-than-one Keyword Group questions, merging-based Keyword Graph construction and ranking work well
pre-processed data statistics help in efficient sub-graph finding over the linked data graph
Execution-time-wise performance analysis
Environment
Machine: Intel(R) Core(TM) i7-4770K central processing unit (CPU), 3.50 GHz, with 16 GB memory
Triple Store: network-connected Virtuoso (version 06.01.3127)
Keyword Group         One   Two    Three   Four   Five
Execution time (ms)   710   2441   2774    3585   3720
Observation
execution cost increases linearly with the number of keywords
pre-processed data statistics support faster execution
finding the proper sub-graph over the LD graph
We contributed an LD information access (IA) framework that
does not generate empty results
maintains contextual information attachment
retrieves rich information with low execution cost
The single-query-word-based Basic Graph can be extended to multiple query words, which can further increase efficiency