How redundant is it? – An empirical analysis on linked datasets 
Honghan Wu¹, Boris Villazon-Terrazas², Jeff Z. Pan¹ and José Manuel Gómez Pérez²
¹ University of Aberdeen, UK
² iSOCO, Spain
20/10/2014
Content
• What is data redundancy in linked data?
• Why is it of special interest to linked data consumption?
• Linked Data redundancy categorisation
• How to analyse it?
• Dataset selection and results
• Conclusion
What is data redundancy in LD?
• Data Redundancy
  – [Database systems] The same piece of data stored in multiple places
  – [Information theory] Wasted "space" used to transmit certain data
• (In this work) Linked Data Redundancy
  – Wasted “space” used to represent a certain meaning (under a given semantics)
  – Duplication-free
Why is it of special interest to LD consumption?
• Bad redundancy and good redundancy
  – Bad for exchange: storage, transmission
  – Good for inference computation
• Relevant consumption tasks
  – Hosting/Sharing
  – Query Answering (SPARQL)
  – Ontology-Based Data Access
  – Reasoning
Redundancy in Linked Data
• Redundancy Categorisation for RDF Data
• Redundancies caused by the “Linked” nature
RDF Redundancies vs. Succinct Representations
[Rule based] A. K. Joshi, P. Hitzler, and G. Dong. Logical linked data compression. In The Semantic Web: Semantics and Big Data, pages 170–184. Springer, 2013.
[HDT] J. D. Fernández, M. A. Martínez-Prieto, C. Gutiérrez, A. Polleres, and M. Arias. Binary RDF representation for publication and exchange (HDT). Web Semant., 19:22–41, Mar. 2013.
[WaterFowl] O. Curé, G. Blin, D. Revuz, and D. C. Faye. WaterFowl: A compact, self-indexed and inference-enabled immutable RDF store. In The Semantic Web: Trends and Challenges, pages 302–316. Springer, 2014.
J. Z. Pan, J. M. Gomez-Perez, Y. Ren, H. Wu, H. Wang, and M. Zhu. Graph pattern based RDF data compression. In Proc. of the 4th Joint International Semantic Technology Conference (JIST), 2014. (To appear)
Semantic Redundancy
Rule representation
  - DL axioms (T-Box)
  - Other semantics (graph pattern substitution)
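The rule-based idea can be illustrated with a small sketch: a triple counts as semantically redundant when it can be re-derived from the remaining triples plus the T-Box. The code below is only an illustration under stated assumptions, not the method of the cited papers: the T-Box is reduced to an acyclic, single-parent subclass map, triples are plain Python tuples, and ex:Student and ex:alice are made-up names.

```python
# Toy illustration of semantic redundancy under subclass rules.
# Assumptions: acyclic single-parent class hierarchy; ex:* names are hypothetical.

RDF_TYPE = "rdf:type"

SUBCLASS_OF = {"ex:Student": "foaf:Person", "foaf:Person": "foaf:Agent"}

TRIPLES = [
    ("ex:alice", RDF_TYPE, "ex:Student"),
    ("ex:alice", RDF_TYPE, "foaf:Person"),  # derivable from ex:Student
    ("ex:alice", RDF_TYPE, "foaf:Agent"),   # derivable from foaf:Person
    ("ex:alice", "foaf:name", "Alice"),
]


def derivable_types(triples, subclass_of):
    """Return every type triple entailed by walking up the class hierarchy."""
    derived = set()
    for s, p, o in triples:
        if p != RDF_TYPE:
            continue
        cls = o
        while cls in subclass_of:
            cls = subclass_of[cls]
            derived.add((s, RDF_TYPE, cls))
    return derived


def strip_redundant(triples, subclass_of):
    """Split the triples into (kept, redundant) with respect to the rules."""
    triples = set(triples)
    redundant = triples & derivable_types(triples, subclass_of)
    return triples - redundant, redundant


kept, redundant = strip_redundant(TRIPLES, SUBCLASS_OF)
```

Because the hierarchy is assumed to be acyclic, the most specific type of each entity is always kept, so the removed triples can be recovered by re-applying the rules.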
Syntactic Redundancy
Concise syntax
  - RDF abbreviation & striping syntax
  - Intra-structure & inter-structure
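As a rough illustration of the abbreviation idea, the sketch below groups triples by subject and predicate, in the spirit of Turtle's ";" and "," shorthand, and counts how many subject and predicate tokens no longer need to be repeated. This is an assumption-laden toy, not the metric used in the study: the input is assumed to contain no duplicate triples, and all ex:* names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data; assumed to contain no duplicate triples.
TRIPLES = [
    ("ex:a", "rdf:type", "foaf:Person"),
    ("ex:a", "foaf:knows", "ex:b"),
    ("ex:a", "foaf:knows", "ex:c"),
    ("ex:b", "rdf:type", "foaf:Person"),
]


def group_by_subject(triples):
    """Nest triples as subject -> predicate -> list of objects."""
    nested = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        nested[s][p].append(o)
    return nested


def abbreviated(triples):
    """Render a Turtle-like form: ';' shares a subject, ',' shares a predicate."""
    lines = []
    for s, preds in group_by_subject(triples).items():
        parts = [f"{p} {', '.join(objs)}" for p, objs in preds.items()]
        lines.append(f"{s} " + " ; ".join(parts) + " .")
    return "\n".join(lines)


def term_occurrences_saved(triples):
    """Count subject/predicate tokens the abbreviated form does not repeat."""
    nested = group_by_subject(triples)
    full = 2 * len(triples)  # one subject and one predicate per triple
    short = sum(1 + len(preds) for preds in nested.values())
    return full - short


text = abbreviated(TRIPLES)
saved = term_occurrences_saved(TRIPLES)
```

Here the subject ex:a is written once instead of three times and foaf:knows once instead of twice, which is the kind of repetition a concise syntax removes.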
Symbolic Redundancy
• http://xmlns.com/foaf/0.1/name
  – 31 bytes in ASCII

URI                               ID (4 bytes)
…                                 …
http://xmlns.com/foaf/0.1/name    128
…                                 …

Fewer bytes for basic data units
  - (Fixed-length) dictionary-based
  - (Variable-length) Huffman coding
  - Predictive encoding
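A minimal sketch of the fixed-length dictionary option, assuming each distinct term is mapped to a consecutive integer and every triple is stored as three unsigned 32-bit IDs. The function names and the example.org URIs are illustrative only; real systems such as HDT use more elaborate dictionaries.

```python
import struct

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical sample data.
TRIPLES = [
    ("http://example.org/alice", FOAF_NAME, '"Alice"'),
    ("http://example.org/bob", FOAF_NAME, '"Bob"'),
]


def build_dictionary(triples):
    """Assign each distinct term a consecutive integer ID, in order of first use."""
    ids = {}
    for triple in triples:
        for term in triple:
            ids.setdefault(term, len(ids))
    return ids


def encode(triples, ids):
    """Pack each triple as three unsigned 32-bit big-endian integers (12 bytes)."""
    return b"".join(
        struct.pack(">III", *(ids[term] for term in triple)) for triple in triples
    )


def decode(blob, ids):
    """Invert encode() by looking the IDs up in the reversed dictionary."""
    terms = {i: term for term, i in ids.items()}
    return [
        tuple(terms[i] for i in struct.unpack_from(">III", blob, offset))
        for offset in range(0, len(blob), 12)
    ]


ids = build_dictionary(TRIPLES)
blob = encode(TRIPLES, ids)
raw_size = sum(len(term.encode("utf-8")) for triple in TRIPLES for term in triple)
```

The saving grows with how often a term recurs, since the dictionary itself still has to be stored once alongside the encoded triples.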
Semantic Redundancy Caused by the “Linked” Nature
• Vocabulary linkage
  – Reuse of other vocabularies: more rules
  – Lower redundancy ratio: more triples derivable
  – More redundancy: co-occurrence triples removable
• Instance linkage
  – sameAs linkages
  – Bring in new assertions (e.g., type assertions)
  – Bring in new axioms
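To make the instance-linkage point concrete, the following sketch shows how sameAs links can bring in type assertions that were not stated for an entity. It is a simplified illustration, not the analysis procedure of the study: linked entities are merged with a small union-find, and the identifiers ex:alice, dbpedia:Alice and ex:Student are hypothetical.

```python
SAME_AS = "owl:sameAs"
RDF_TYPE = "rdf:type"

# Hypothetical sample data spanning two datasets.
TRIPLES = [
    ("ex:alice", SAME_AS, "dbpedia:Alice"),
    ("dbpedia:Alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", RDF_TYPE, "ex:Student"),
]


def same_as_groups(triples):
    """Return a find() function mapping each entity to its sameAs group root."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)
    return find


def imported_type_assertions(triples):
    """Type triples that hold only because of sameAs linkage."""
    find = same_as_groups(triples)
    types_by_group = {}
    members = set()
    for s, p, o in triples:
        if p == SAME_AS:
            members.update((s, o))
        elif p == RDF_TYPE:
            members.add(s)
            types_by_group.setdefault(find(s), set()).add(o)
    existing = set(triples)
    return {
        (entity, RDF_TYPE, cls)
        for entity in members
        for cls in types_by_group.get(find(entity), ())
        if (entity, RDF_TYPE, cls) not in existing
    }


new_assertions = imported_type_assertions(TRIPLES)
```

Each imported assertion is a triple that a consumer could derive rather than store, which is why linkage changes the amount of semantic redundancy.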
How to analyse?
• Two-dimensional analysis
• Methodology
• Metrics
Two-dimensional analysis
Columns: RDF redundancy dimension; rows: linked semantic dimension

                                Semantic   Syntactic   Symbolic
A-Box                           ✔          ✔
A-Box & T-Box: No Linkage       ✔          -           -
A-Box & T-Box: T-Box Reuse      ✔          -           -
A-Box & T-Box: A-Box Linkage    -          -
Methodology: EDP Summarisation
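Virtually Materialised A-Box: expanded EDP
A-Box: A1(o1), B1(o1), A2(o2), B2(o2), R(o1, o2)
T-Box: A1⊆A, A2⊆A, B1⊆B, B2⊆B
[Figure: EDP graph with two nodes, "A1, B1 (1)" and "A2, B2 (1)", connected by an edge "R (1:1)"; in the expanded EDP each node is additionally labelled "A, B"]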
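The toy A-Box and T-Box on this slide can be replayed in a few lines. The sketch below is a simplification made for illustration: a pattern is taken to be just the set of classes asserted for an entity (the relation R is ignored), the T-Box is a single-parent subclass map, and the function names are invented here rather than taken from the authors' implementation.

```python
from collections import Counter

# Class assertions and subclass axioms copied from the slide.
CLASS_ASSERTIONS = [("A1", "o1"), ("B1", "o1"), ("A2", "o2"), ("B2", "o2")]
SUBCLASS_OF = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}


def classes_per_entity(assertions):
    """Collect the asserted classes of every entity."""
    per_entity = {}
    for cls, entity in assertions:
        per_entity.setdefault(entity, set()).add(cls)
    return per_entity


def expand(classes, subclass_of):
    """Add every superclass reachable through the subclass axioms."""
    expanded = set(classes)
    frontier = list(classes)
    while frontier:
        parent = subclass_of.get(frontier.pop())
        if parent is not None and parent not in expanded:
            expanded.add(parent)
            frontier.append(parent)
    return expanded


def summarise(per_entity):
    """Count how many entities share each class-set pattern."""
    return Counter(frozenset(classes) for classes in per_entity.values())


asserted = classes_per_entity(CLASS_ASSERTIONS)
plain = summarise(asserted)
expanded = summarise({e: expand(c, SUBCLASS_OF) for e, c in asserted.items()})
```

The expansion is done on the pattern labels only, so the derived class memberships never have to be written back into the A-Box, which is one way to read "virtually materialised".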
Linked Dataset Analysis Results
• Dataset Selection & Summary
• Analysis Results
Dataset Selection and Summary 
LOD 2011
A-Box Only: Semantic Redundancies
– Redundant triples
– Semantic redundancy ratio
– Number of graph patterns used to substitute redundant triples
A-Box Only: Syntactic Redundancies
– Redundant resource occurrences due to inter-structural redundancies
– Syntactic redundancy ratio
A-Box & T-Box: No Linkage
DBLP2013: SWRC ontology
Ordnance Survey: officially published OS ontology
[Chart labels: 1.7%, 184%, 108%, 4.7%]
A-Box & T-Box: T-Box Reuse
The first 3 datasets reuse the FOAF ontology
– Number of directly used terms from the reused T-Box
– Number of applicable axioms from the (materialised) reused T-Box
[Chart labels: 26.9%, 4%, 45.4%, 1.3%]
Conclusion
• LOD redundancies are heterogeneous and huge
• Vocabulary linkage might lead to a huge number of derivable triples
• Redundancy-aware techniques are needed
Redundancy-aware Consumption
• Compression: different redundancies might need different techniques
• Data access: (high inter-structure redundancy) skewed entity distributions over EDPs -> efficient access?
• OBDA/Reasoning: A-Box redundancy = fewer T-Box axioms
• Data publishers: should be aware of the consequences of reuse
Thanks! Q & A

 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 

Redundancy analysis on linked data #cold2014 #ISWC2014

  • 1. How redundant is it? – An empirical analysis on linked datasets. Honghan Wu (1), Boris Villazon-Terrazas (2), Jeff Z. Pan (1) and José Manuel Gómez Pérez (2). (1) University of Aberdeen, UK; (2) iSOCO, Spain. 20/10/2014
  • 2. Content
      • What is data redundancy in linked data?
      • Why is it of special interest to linked data consumption?
      • Linked Data redundancy categorisation
      • How to analyse it?
      • Dataset selection and results
      • Conclusion
  • 3. What is data redundancy in LD?
      • Data redundancy
          – [Database systems] The same piece of data stored in multiple places
          – [Information theory] Wasted "space" used to transmit certain data
      • (In this work) Linked Data redundancy
          – Wasted "space" used to represent a certain meaning (under a given semantics)
          – Duplication-free
  • 4. Why is it of special interest to LD consumption?
      • Bad redundancy and good redundancy
          – Bad for exchange: storage, transmission
          – Good for inference computation
      • Relevant consumption tasks
          – Hosting/Sharing
          – Query Answering (SPARQL)
          – Ontology Based Data Access
          – Reasoning
  • 5. Redundancy in Linked Data
      • Redundancy categorisation for RDF data
      • Redundancies caused by the "Linked" nature
  • 6. RDF Redundancies vs. Succinct Representations
      • [Rule based] A. K. Joshi, P. Hitzler, and G. Dong. Logical linked data compression. In The Semantic Web: Semantics and Big Data, pages 170–184. Springer, 2013.
      • [HDT] J. D. Fernández, M. A. Martínez-Prieto, C. Gutiérrez, A. Polleres, and M. Arias. Binary RDF representation for publication and exchange (HDT). Web Semant., 19:22–41, Mar. 2013.
      • [WaterFowl] O. Curé, G. Blin, D. Revuz, and D. C. Faye. WaterFowl: A compact, self-indexed and inference-enabled immutable RDF store. In The Semantic Web: Trends and Challenges, pages 302–316. Springer, 2014.
      • J. Z. Pan, J. M. Gomez-Perez, Y. Ren, H. Wu, H. Wang, and M. Zhu. Graph pattern based RDF data compression. In Proc. of the 4th Joint International Semantic Technology Conference (JIST), 2014. (To appear)
  • 7. Semantic Redundancy
      • Rule representation
          – DL axioms (T-Box)
          – Other semantics (graph pattern substitution)
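The T-Box case on slide 7 can be illustrated with a minimal sketch: a type assertion is semantically redundant when it already follows from a more specific asserted type and a subclass axiom. The triple layout, the `ex:` names and both function names are assumptions made for illustration, not the authors' implementation, and the class hierarchy is assumed to be acyclic.

```python
TYPE = "rdf:type"

def superclasses(cls, subclass_of):
    """Return all strict superclasses of cls (assumes an acyclic hierarchy)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def redundant_type_triples(abox, subclass_of):
    """Type assertions that are derivable from a more specific asserted type."""
    types = {}
    for s, p, o in abox:
        if p == TYPE:
            types.setdefault(s, set()).add(o)
    redundant = set()
    for s, asserted in types.items():
        for cls in asserted:
            for sup in superclasses(cls, subclass_of):
                if sup in asserted:
                    redundant.add((s, TYPE, sup))
    return redundant

# Hypothetical data: Student is a subclass of Person.
abox = {
    ("ex:alice", TYPE, "ex:Student"),
    ("ex:alice", TYPE, "ex:Person"),
    ("ex:alice", "foaf:name", '"Alice"'),
}
subclass_of = {"ex:Student": {"ex:Person"}}
removable = redundant_type_triples(abox, subclass_of)
```

Here `removable` holds only the `ex:Person` assertion, which a reasoner could re-derive from the T-Box after the triple is dropped.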
  • 8. Syntactic Redundancy
      • Concise syntax
          – RDF abbreviation and striping syntax
          – Intra-structure and inter-structure
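One common reading of the abbreviation idea on slide 8 is that triples sharing a subject can be written with the subject stated once, as Turtle's `;` syntax allows. The sketch below counts how many subject occurrences such grouping would save. The function names and sample triples are hypothetical, and this is only one possible interpretation of intra-structure redundancy.

```python
from collections import defaultdict

def group_by_subject(triples):
    """Map each subject to its list of (predicate, object) pairs."""
    grouped = defaultdict(list)
    for s, p, o in triples:
        grouped[s].append((p, o))
    return dict(grouped)

def saved_subject_occurrences(triples):
    """Subject occurrences avoided when each subject is written only once."""
    return len(triples) - len(group_by_subject(triples))

# Hypothetical data: three triples about ex:o1, one about ex:o2.
triples = [
    ("ex:o1", "rdf:type", "ex:A1"),
    ("ex:o1", "rdf:type", "ex:B1"),
    ("ex:o1", "ex:R", "ex:o2"),
    ("ex:o2", "rdf:type", "ex:A2"),
]
saved = saved_subject_occurrences(triples)
```

With four triples over two distinct subjects, two subject occurrences become unnecessary after grouping.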
  • 9. Symbolic Redundancy
      • http://xmlns.com/foaf/0.1/name: 30 bytes in ASCII
      • Dictionary table mapping each URI to an ID (4 bytes): …, http://xmlns.com/foaf/0.1/name → 128, …
      • Fewer bytes for basic data units
          – (Fixed-length) dictionary based
          – (Variable-length) Huffman coding
          – Predictive encoding
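The fixed-length dictionary approach on slide 9 can be sketched as follows: every distinct term gets an integer ID, and each triple is stored as three 4-byte unsigned integers. The function names, the `example.org` URIs and the big-endian layout are assumptions for illustration; real systems such as HDT use more elaborate dictionaries.

```python
import struct

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

def build_dictionary(terms):
    """Assign each distinct term a consecutive integer ID, in first-seen order."""
    ids = {}
    for term in terms:
        if term not in ids:
            ids[term] = len(ids)
    return ids

def encode_triples(triples):
    """Return (dictionary, payload) with each triple packed as 3 x 4-byte IDs."""
    ids = build_dictionary(term for triple in triples for term in triple)
    payload = b"".join(
        struct.pack(">III", *(ids[term] for term in triple)) for triple in triples
    )
    return ids, payload

# Hypothetical data: the same predicate URI is used in both triples.
triples = [
    ("http://example.org/alice", FOAF_NAME, '"Alice"'),
    ("http://example.org/bob", FOAF_NAME, '"Bob"'),
]
ids, payload = encode_triples(triples)
raw_size = sum(len(term.encode("utf-8")) for triple in triples for term in triple)
```

The payload takes 12 bytes per triple regardless of URI length. The dictionary itself still has to be stored once, so the saving grows with how often each term recurs.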
  • 10. Semantic Redundancy Caused by the "Linked" Nature
      • Vocabulary linkage
          – Reuse of other vocabularies: more rules
          – Lower redundancy ratio: more triples derivable
          – More redundancy: co-occurrence triples removable
      • Instance linkage
          – sameAs linkages
          – Bring in new assertions (e.g., type assertions)
          – Bring in new axioms
  • 11. How to analyse it?
      • Two-dimensional analysis
      • Methodology
      • Metrics
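  • 12. Two-dimensional analysis
      • Columns (RDF redundancy dimension): Semantic, Syntactic, Symbolic
      • Rows (linked semantic dimension), with marks in the order they appear on the slide:
          – A-Box: ✔ ✔
          – A-Box & T-Box, No Linkage: ✔ - -
          – A-Box & T-Box, T-Box Reuse: ✔ - -
          – A-Box & T-Box, A-Box Linkage: - -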
  • 13. Methodology: EDP Summarisation
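  • 14. Virtually Materialised A-Box: expanded EDP
      • A-Box: A1(o1), B1(o1), A2(o2), B2(o2), R(o1, o2)
      • T-Box: A1 ⊆ A, A2 ⊆ A, B1 ⊆ B, B2 ⊆ B
      • EDP graph: node {A1, B1} (1) linked by R (1:1) to node {A2, B2} (1); in the expanded EDP both nodes additionally carry A and B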
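The expansion on slide 14 can be sketched by adding every superclass implied by the T-Box to each entity's class set, then counting how many entities share each resulting pattern. Only the class part of a pattern is modelled here and relations such as R are left out. The function names and data layout are assumptions for illustration rather than the authors' EDP implementation.

```python
from collections import Counter

def expand(classes, subclass_of):
    """Add all superclasses reachable through the subclass axioms."""
    expanded = set(classes)
    stack = list(classes)
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in expanded:
                expanded.add(parent)
                stack.append(parent)
    return frozenset(expanded)

def class_patterns(type_assertions, subclass_of=None):
    """Count entities per class-set pattern, optionally expanded by the T-Box."""
    per_entity = {}
    for entity, cls in type_assertions:
        per_entity.setdefault(entity, set()).add(cls)
    return Counter(
        expand(classes, subclass_of) if subclass_of else frozenset(classes)
        for classes in per_entity.values()
    )

# The A-Box type assertions and T-Box axioms from the slide.
abox_types = [("o1", "A1"), ("o1", "B1"), ("o2", "A2"), ("o2", "B2")]
tbox = {"A1": {"A"}, "A2": {"A"}, "B1": {"B"}, "B2": {"B"}}

plain = class_patterns(abox_types)
expanded = class_patterns(abox_types, tbox)
```

Both versions yield two patterns with one entity each. After expansion the patterns share A and B, which is the overlap the virtual materialisation makes visible without writing the derived triples into the A-Box.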
  • 15. Linked Dataset Analysis Results
      • Dataset selection and summary
      • Analysis results
  • 16. Dataset Selection and Summary (LOD 2011)
  • 17. A-Box Only: Semantic Redundancies
      – Redundant triples
      – Semantic redundancy ratio (formula not preserved in this transcript)
      – Number of graph patterns used to substitute redundant triples
  • 18. A-Box Only: Syntactic Redundancies
      – Redundant resource occurrences due to inter-structural redundancies
      – Syntactic redundancy ratio (formula not preserved in this transcript)
  • 19. A-Box & T-Box: No Linkage
      • DBLP2013: SWRC ontology
      • Ordnance Survey: officially published OS ontology
      • Chart values: 1.7%, 184%, 108%, 4.7%
  • 20. A-Box & T-Box: No Linkage
      • The first 3 datasets reuse the FOAF ontology
          – The number of directly used terms from the reused T-Box
          – The number of applicable axioms from the (materialised) reused T-Box
      • Chart values: 26.9%, 4%, 45.4%, 1.3%
  • 21. Conclusion
      • LOD redundancies are heterogeneous and large
      • Vocabulary linkage might lead to a huge number of derivable triples
      • Redundancy-aware techniques are needed
  • 22. Redundancy-aware Consumption
      • Compression: different redundancies might need different techniques
      • Data access: (high inter-structure redundancy) skewed entity distributions over EDPs -> efficient access?
      • OBDA/Reasoning: A-Box redundancy = fewer T-Box axioms
      • Data publishers: should be aware of the consequences of reusing