DBGroup@UNIMO
Dot. Fabio Benedetti
Dip. Ing. “Enzo Ferrari” – University of Modena e Reggio Emilia
D Day 2015 – Modena Italy
LODeX: Schema Summarization and automatic SPARQL query
generation for Linked Open Data sources
Fabio Benedetti
Department of Engineering “Enzo Ferrari”
University of Modena & Reggio Emilia
D-Day 2015 - Modena
[Schmachtenberg, Max, Christian Bizer, and Heiko Paulheim. "Adoption of the Linked Data Best Practices in
Different Topical Domains." The Semantic Web–ISWC 2014. Springer International Publishing, 2014. 245-260]
*Only 570 datasets belong to the LOD cloud;
the remaining datasets have no
incoming or outgoing links to the LOD Cloud.
Domain                    2009 Number   2009 %   2014* Number   2014* %
Cross-domain                       41   13.95%             41     4.04%
Geographic                         31   10.54%             21     2.07%
Government                         49   16.67%            183    18.05%
Life sciences                      41   13.95%             83     8.19%
Media                              25    8.50%             22     2.17%
Publications                       87   29.59%             96     9.47%
Social web                          0    0.00%            520    51.28%
User-generated content             20    6.80%             48     4.73%
Total                             294                    1014
[Figure: dataset distribution by domain in 2009 and in 2014]
The Open Access trend encourages the
publication of Open Data in the form of
Linked Data,
but
discovering LOD sources of interest is a
complex task for a user
Main issues
• No standard exists for documenting a Dataset
• The structure of a Dataset can be understood only
by exploring the Dataset manually
• Semantic Web technologies are extremely complex for
unskilled users
• Automatically extract and summarize a schema
(Schema Summary) that describes a LOD Dataset
• Use the Schema Summary to support the user in the
information extraction task
Online & automatic extraction
• It does not require any additional information from the user
• It works with SPARQL endpoints
– We have to handle the poor performance of these endpoints
The Schema Summary has to describe a Dataset
• Ontology/Vocabulary (OWL & RDFS constraints)
• Open Data (i.e. generated from an existing RDBMS)
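A minimal sketch of what "working with SPARQL endpoints" while tolerating poor performance could look like. It is not LODeX code: the function names, the 30-second default timeout and the example endpoint URL are assumptions made for illustration. It relies only on the standard SPARQL Protocol convention of sending the query in a `query` parameter and asking for JSON results.

```python
import json
import urllib.parse
import urllib.request


def build_request(endpoint_url, query):
    """Build a SPARQL Protocol GET request that asks for JSON results.

    Assumes endpoint_url has no query string of its own.
    """
    url = endpoint_url + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint_url, query, timeout_s=30):
    """Return the result bindings, or None if the endpoint is
    unreachable, too slow, or answers with something unparsable."""
    try:
        request = build_request(endpoint_url, query)
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return json.load(response)["results"]["bindings"]
    except (OSError, ValueError, KeyError):
        # OSError covers URLError and timeouts; ValueError covers bad JSON.
        return None


# Hypothetical endpoint, used only to show the request that would be sent.
request = build_request(
    "http://example.org/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
)
print(request.full_url)
```

Returning `None` instead of raising lets a caller skip an unresponsive endpoint and move on to the next URL, which is one simple way to keep an online extraction running.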
Two main modules
• Extraction & Summarization
• Visualization & Querying
LODeX uses a NoSQL database as its back-end
Input: URLs of SPARQL endpoints
Output: interactive Schema Summary
[Figure: LODeX architecture. Components: LOD Cloud, Indexes Extraction, Statistical Indexes, Post-processing, NoSQL, Schema Summary, Schema Summary Visualization, Query Orchestrator, Sgvizler. Labelled flows: Endpoint URLs, SPARQL Queries, Basic Query, Results]
Statistical Indexes
They comprise 9 indexes divided into three groups:
• General group
• Intensional group
• Extensional group
The Index Extraction (IE) process generates the SPARQL queries used to extract the
different indexes.
• Iterative algorithm that extracts the intensional knowledge
• Pattern Strategy technique
– It produces a higher number of less complex
SPARQL queries
The IE process performs online index extraction while handling the
performance issues of the SPARQL endpoints
[F. Benedetti, S. Bergamaschi, and L. Po, “Online index extraction from linked open data sources,” 2014, Linked Data for Information
Extraction (LD4IE) Workshop held at International Semantic Web Conference]
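One plausible reading of "a higher number of less complex SPARQL queries" is sketched below; the slides do not give the actual patterns, so the query texts and the helper name `simple_queries` are assumptions. Instead of one aggregate query over the whole dataset, the classes are listed first and a small counting query is then generated for each class.

```python
# A single heavy query that slow endpoints may fail to answer in time.
HEAVY_QUERY = """
SELECT ?class (COUNT(?s) AS ?n)
WHERE { ?s a ?class }
GROUP BY ?class
"""

# Step 1: a cheap query that only lists the classes in use.
LIST_CLASSES = "SELECT DISTINCT ?class WHERE { ?s a ?class }"

# Step 2: a pattern instantiated once per class found in step 1.
COUNT_PATTERN = "SELECT (COUNT(?s) AS ?n) WHERE {{ ?s a <{cls}> }}"


def simple_queries(class_iris):
    """Instantiate the counting pattern for every class IRI."""
    return [COUNT_PATTERN.format(cls=iri) for iri in class_iris]


for query in simple_queries([
    "http://xmlns.com/foaf/0.1/Person",
    "http://xmlns.com/foaf/0.1/Document",
]):
    print(query)
```

Each generated query touches a single class, so a timeout costs one count rather than the whole index, at the price of more round trips to the endpoint.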
The elements composing the Schema Summary are:
• Classes
• Properties
• Attributes
An algorithm combines the information contained in the
Statistical Indexes to produce and store the Schema Summary
[F. Benedetti, S. Bergamaschi, and L. Po, “A visual summary for linked open data sources,” 2014, International
Semantic Web Conference (Posters & Demos)]
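The sketch below illustrates how index data might be combined into a structure made of classes, properties and attributes. The shape of the three input indexes and every name in the code are hypothetical; the slides do not describe the real index layout or the real algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class ClassNode:
    iri: str
    instances: int
    attributes: list = field(default_factory=list)  # datatype properties


@dataclass
class SchemaSummary:
    classes: dict = field(default_factory=dict)     # class IRI -> ClassNode
    properties: list = field(default_factory=list)  # (source, property, target)


def build_summary(class_counts, object_props, datatype_props):
    """Combine three assumed indexes into one summary graph.

    class_counts:   {class IRI: number of instances}
    object_props:   [(source class, property, target class)]
    datatype_props: [(class, property)]
    """
    summary = SchemaSummary()
    for iri, count in class_counts.items():
        summary.classes[iri] = ClassNode(iri, count)
    for cls, prop in datatype_props:
        if cls in summary.classes:
            summary.classes[cls].attributes.append(prop)
    for source, prop, target in object_props:
        # Keep only edges whose two ends are known classes.
        if source in summary.classes and target in summary.classes:
            summary.properties.append((source, prop, target))
    return summary


summary = build_summary(
    {"ex:Person": 10, "ex:Paper": 5},
    [("ex:Person", "ex:wrote", "ex:Paper")],
    [("ex:Person", "ex:name"), ("ex:Paper", "ex:title")],
)
print(summary.classes["ex:Person"].attributes, summary.properties)
```

In this reading, classes become the nodes of the summary, properties between classes become its edges, and attributes are the literal-valued properties attached to each node.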
[Figure: Schema Summary, Basic Query, SPARQL compiler, SPARQL query]
• Through the Web Application GUI, the user is
guided in building a Basic Query
• A refinement panel helps the user refine
the Basic Query
A SPARQL compiler automatically generates
the corresponding SPARQL query
Operators supported by the compiler:
• AND
• OPTIONAL
• FILTER
• ORDER BY
• LIMIT
• OFFSET
The query is sent to the SPARQL endpoint
and the results can be visualized in a
table, map or chart view (pie, bar, etc.)
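To make the compilation step concrete, here is a small sketch that turns a simplified Basic Query into SPARQL text using the six operators listed above. The argument layout of `compile_basic_query` is an assumption for illustration and is not the LODeX compiler: AND is rendered as consecutive triple patterns, and the other operators map to their SPARQL keywords.

```python
def compile_basic_query(cls, required, optional=(), filters=(),
                        order_by=None, limit=None, offset=None):
    """Compile a simplified Basic Query into a SPARQL SELECT string.

    cls:      IRI of the class selected in the Schema Summary
    required: property IRIs that must be present (AND)
    optional: property IRIs wrapped in OPTIONAL
    filters:  SPARQL boolean expressions, passed through as written
    """
    lines = ["SELECT * WHERE {", f"  ?s a <{cls}> ."]
    for i, prop in enumerate(required):
        lines.append(f"  ?s <{prop}> ?r{i} .")
    for i, prop in enumerate(optional):
        lines.append(f"  OPTIONAL {{ ?s <{prop}> ?o{i} }}")
    for expression in filters:
        lines.append(f"  FILTER ({expression})")
    lines.append("}")
    # Solution modifiers go after the closing brace, ORDER BY first.
    if order_by:
        lines.append(f"ORDER BY {order_by}")
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    if offset is not None:
        lines.append(f"OFFSET {offset}")
    return "\n".join(lines)


print(compile_basic_query(
    "http://xmlns.com/foaf/0.1/Person",
    required=["http://xmlns.com/foaf/0.1/name"],
    optional=["http://xmlns.com/foaf/0.1/mbox"],
    filters=['regex(?r0, "^A")'],
    order_by="?r0", limit=10, offset=20,
))
```

The filter expressions are inserted verbatim, so a real compiler would need to validate or escape user input before building the query.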
Try LODeX demo at: http://dbgroup.unimo.it/lodex2
[F. Benedetti, S. Bergamaschi, and L. Po, “Visual Querying LOD sources with LODeX,” 2014, submitted to the
Semantic Web journal]
Test (Nov. 2014)
Dataset URLs               559
Reachable datasets         302
SPARQL 1.1 compatible      206
Extraction completed       185

Task                       Correct answers
Schema Summary browsing    94% (32/34)
Query generation           88% (60/68)

Online survey with 17 anonymous users:
• 8 skilled users
• 9 unskilled users
The survey is divided into two parts:
• Schema Summary browsing clarity
• Query generation
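The figures above can be re-derived with a few lines of arithmetic. The helper `percent` is introduced here only for illustration; the numbers are the ones reported on the slide.

```python
def percent(part, whole):
    """Share of `part` in `whole`, rounded to a whole percent."""
    return round(100 * part / whole)


# Extraction test: each step keeps a subset of the previous one.
funnel = [
    ("Dataset URLs", 559),
    ("Reachable datasets", 302),
    ("SPARQL 1.1 compatible", 206),
    ("Extraction completed", 185),
]
total_urls = funnel[0][1]
for (_, previous), (step, count) in zip(funnel, funnel[1:]):
    print(f"{step}: {percent(count, previous)}% of the previous step, "
          f"{percent(count, total_urls)}% of all URLs")

# Survey: correct answers over answers given.
print(percent(32, 34), percent(60, 68))  # 94 88
```

Extraction therefore completed for roughly a third of the 559 listed URLs, with unreachable endpoints accounting for the largest drop. The denominators 34 and 68 would be consistent with 2 and 4 questions per user for 17 users, though the slide does not state this.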
• Modify the interface of LODeX according to the
results of the online survey
• Extend the VoID descriptor vocabulary to
represent the Statistical Indexes and publish our
data as LOD
– Build an observatory for the LOD cloud
• Define clustering techniques to reduce the size of
the Summary for huge datasets
Accepted papers
• D. Beneventano, S. Bergamaschi, S. Sorrentino, M. Vincini, and F. Benedetti, “Semantic
annotation of the CEREALAB database by the AGROVOC linked dataset,” 2014,
Ecological Informatics journal. Article in press.
• F. Benedetti, S. Bergamaschi, and L. Po, “Online index extraction from linked open
data sources,” 2014, Linked Data for Information Extraction (LD4IE) Workshop held at
International Semantic Web Conference
• F. Benedetti, S. Bergamaschi, and L. Po, “A visual summary for linked open data
sources,” 2014, International Semantic Web Conference (Posters & Demos)
Submitted papers
• F. Benedetti, S. Bergamaschi, and L. Po, “Visual Querying LOD sources with LODeX,”
2014, submitted to Semantic Web – Interoperability, Usability, Applicability, an IOS
Press journal
European projects & schools
• Web Science Summer School, Southampton University (20-26 July 2014)
• RDA (Research Data Alliance) Fourth Plenary Meeting, 22-24 September 2014,
Amsterdam. I won an Early Career Scientist grant and I belong to the Big Data
Analytics Interest Group.
• Keystone, COST Action IC1302. Autumn 2014 MC and WG Meetings “Querying the
Semantic Web”, 17-18 October 2014, Riva del Garda, TN.
Thanks for your attention!
