The document describes LODeX, a system that automatically extracts and summarizes schemas from Linked Open Data sources to help users discover and query these datasets. It extracts statistical indexes from the datasets and uses them to generate interactive schema summaries describing classes, properties, and attributes. Users can then build basic queries through a GUI, which are compiled into SPARQL and executed. The system was tested on over 500 datasets and evaluations found the schema summaries and query generation were over 88% accurate. Future work includes improving the interface based on user surveys and extending the system to cluster summaries for large datasets.
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources
1. DBGroup@UNIMO
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia 1
D Day 2015 – Modena Italy
LODeX: Schema Summarization and automatic SPARQL query
generation for Linked Open Data sources
Fabio Benedetti
Department of Engineering “Enzo Ferrari”
University of Modena & Reggio Emilia
D-Day 2015 - Modena
2. DBGroup@UNIMO
3
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia
D Day 2015 – Modena Italy
LODeX: Schema Summarization and automatic SPARQL query
generation for Linked Open Data sources
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia 3
[Schmachtenberg, Max, Christian Bizer, and Heiko Paulheim. "Adoption of the Linked Data Best Practices in
Different Topical Domains." The Semantic Web–ISWC 2014. Springer International Publishing, 2014. 245-260}
3. DBGroup@UNIMO
4
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia
D Day 2015 – Modena Italy
LODeX: Schema Summarization and automatic SPARQL query
generation for Linked Open Data sources
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia 4
*Only 570 datasets belong to the LOD cloud,
the remaining datasets do not contain
ingoing/outgoing links to the LOD Cloud.
2009 2014*
Domain Number % Number %
Cross-domain 41 13.95% 41 4.04%
Geographic 31 10.54% 21 2.07%
Government 49 16.67% 183 18.05%
Life sciences 41 13.95% 83 8.19%
Media 25 8.50% 22 2.17%
Publications 87 29.59% 96 9.47%
Social web 0 0.00% 520 51.28%
User-generated
content 20 6.80% 48 4.73%
Total 294 1014
2009 Domain
Cross-domain
Geographic
Government
Life sciences
Media
Publications
Social web
2014
4. DBGroup@UNIMO
5
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia
D Day 2015 – Modena Italy
LODeX: Schema Summarization and automatic SPARQL query
generation for Linked Open Data sources
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia 5
The Open Access trends encourage the
publication of Open Data in form of
Linked Data
But
discovering LOD sources of interest is a
complex task for a user
Main issues
• Do not exist any standard to document a Dataset
• The structure of the Dataset can be understood only
manually exploring the Dataset
• The Semantic Web technologies are extremely complex for
unskilled user
5. DBGroup@UNIMO
6
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia
D Day 2015 – Modena Italy
LODeX: Schema Summarization and automatic SPARQL query
generation for Linked Open Data sources
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia 6
• To automatically extract and summarize a schema
(Schema Summary) able to describe a LOD Dataset
• Use the Schema Summary to support the user in the
information extraction task
Online & Automatic extraction
• It does not require any additional information by the user
• It works with SPARQL endpoints
– We have to handle the bad performance issues of these Datasets
The Schema Summary has to describe a Dataset
• Ontology/Vocabulary (OWL & RDFS constraints)
• Open Data (i.e. generated from existing RDBMS)
6. DBGroup@UNIMO
7
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia
D Day 2015 – Modena Italy
LODeX: Schema Summarization and automatic SPARQL query
generation for Linked Open Data sources
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia 7
Two main modules
• Extraction & Summarization
• Visualization & Querying
LODeX uses a NoSQL
Database as back-end
Input
URLs of SPARQL endpoints
Output
Interactive Schema Summary
LOD Cloud
SPARQL
Queries
Schema
Summary
NoSQL
LODeX
Post-
processing
Statistical
Indexes
LODeX
Indexes
Extraction
Query
Orchestrator
Schema
Summary
Visualizzation
Schema
Summary
Basic
QueryResults
Endpoint
URLs
Sgvizler
SPARQL
Queries
7. DBGroup@UNIMO
8
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia
D Day 2015 – Modena Italy
LODeX: Schema Summarization and automatic SPARQL query
generation for Linked Open Data sources
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia 8
Statistical Indexes
They are composed by 9 indexes divided in three groups:
• General group
• Intensional group
• Extensional group
The IE process is able to generate the SPARQL queries used to extract the
different indexes.
• Iterative algorithm able to extract the Intensional knowledge
• Pattern Strategy technique
– It is a technique able to produce an higher number of less complex
SPARQL query
The IE process is able to perform online index extraction handling the
performance issues of the SPARQL endpoints
[F. Benedetti, S. Bergamaschi, and L. Po, “Online index extraction from linked open data sources,” 2014, Linked Data for Information
Extraction (LD4IE) Workshop held at International Semantic Web Conference]
8. DBGroup@UNIMO
9
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia
D Day 2015 – Modena Italy
LODeX: Schema Summarization and automatic SPARQL query
generation for Linked Open Data sources
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia 9
The elements composing the Schema Summary are:
• Classes
• Properties
• Attributes
An algorithm combines
the information
contained in the
Statistical Indexes to
produce and store the
Schema Summary
[F. Benedetti, S. Bergamaschi, and L. Po, “A visual summary for linked open data sources,” 2014, International
Semantic Web Conference (Posters & Demos)]
9. DBGroup@UNIMO
10
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia
D Day 2015 – Modena Italy
LODeX: Schema Summarization and automatic SPARQL query
generation for Linked Open Data sources
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia 10
Schema
Summary
SPARQL
compiler
SPARQL
query
Basic
Query
• The User using the Web Application GUI is
driven to building a Basic Query
• A refinement panel helps the user in refine
the Basic Query
A SPARQL compiler automatically generates
the corresponding SPARQL query
Operator supported by the compiler:
• AND
• Optional
• Filter
The query is sent to the SPARQL endpoint
and the results can be visualized in a
tabular, maps or chart view (pie, bar, etc.)
• ORDER BY
• LIMIT
• OFFSET
10. DBGroup@UNIMO
11
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia
D Day 2015 – Modena Italy
LODeX: Schema Summarization and automatic SPARQL query
generation for Linked Open Data sources
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia 11
11. DBGroup@UNIMO
12
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia
D Day 2015 – Modena Italy
LODeX: Schema Summarization and automatic SPARQL query
generation for Linked Open Data sources
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia 12
Try LODeX demo at: http://dbgroup.unimo.it/lodex2
[F. Benedetti, S. Bergamaschi, and L. Po, “Visual Querying LOD sources with LODeX,” 2014, submitted at The
Semantic Web journal]
12. DBGroup@UNIMO
13
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia
D Day 2015 – Modena Italy
LODeX: Schema Summarization and automatic SPARQL query
generation for Linked Open Data sources
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia 13
Test Nov. 2014
Dataset URLs 559
Reachable datasets 302
SPARQL 1.1
compatible
206
Extraction completed 185
Task Correct Answers
Schema Summary browsing 94% (32/34)
Query generation 88% (60/68)
Online survey with 17 anonymous
users:
• 8 Skilled users
• 9 Unskilled user
The survey is divided in two parts:
• Schema Summary browsing
clarity
• Query generation
13. DBGroup@UNIMO
14
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia
D Day 2015 – Modena Italy
LODeX: Schema Summarization and automatic SPARQL query
generation for Linked Open Data sources
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia 14
• Modify the interface of LODeX according to the
results of the online survey
• Extends the VOID descriptor vocabulary in order
to represent the Statistical Indexes and publish our
data as LOD
– Build an observatory for the LOD cloud
• Define clustering techniques to reduce the size of
the Summary for huge dataset
14. DBGroup@UNIMO
15
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia
D Day 2015 – Modena Italy
LODeX: Schema Summarization and automatic SPARQL query
generation for Linked Open Data sources
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia 15
Accepted papers
• Beneventano, D., Bergamaschi, S., Sorrentino, S., Vincini, M., Benedetti, F. “Semantic
annotation of the CEREALAB database by the AGROVOC linked dataset” (2014)
Ecological Informatics journal, . Article in Press.
• F. Benedetti, S. Bergamaschi, and L. Po, “Online index extraction from linked open
data sources” 2014, Linked Data for Information Extraction (LD4IE) Workshop held at
International Semantic Web Conference
• F. Benedetti, S. Bergamaschi, and L. Po, “A visual summary for linked open data
sources” 2014, International Semantic Web Conference (Posters & Demos)
Submitted papers
• F. Benedetti, S. Bergamaschi, and L. Po, “Visual Querying LOD sources with LODeX”
2014, submitted at Semantic Web – Interoperability, Usability, Applicability an IOS
Press Journal
European projects & schools
• Web Science Summer School - Southampton University (20-26 July 2014)
• RDA Research Data Alliance - RDA Fourth Plenary Meeting 22 - 24 September 2014 in
Amsterdam. I won an Early Career Scientist grant and I belong to the Big Data
Analytics Interest group.
• Keystone - COST Action IC1302. Autumn 2014 MC and WG Meetings “QUERYING THE
SEMANTIC WEB” 17-18 October 2014, Riva del Garda, TN.
15. DBGroup@UNIMO
16
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia
D Day 2015 – Modena Italy
LODeX: Schema Summarization and automatic SPARQL query
generation for Linked Open Data sources
Thanks for your attention!