Compact Representation of Large RDF Data Sets for Publishing and Exchange
Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutiérrez
The Motivation
Syntaxes oriented mainly to represent documents
RDF/XML, N3, Turtle, JSON, etc.
Document-centric vs. data-centric view
Redundancy
No structure (chunks)
Lack of metadata
Sequentiality of the information
Use?
Examples:
Billion Triple 2010 (~3200M triples, 318 gzipped chunks, ~27 GB)
Uniprot (~845M triples, 12 gzipped chunks, ~23 GB)
Real World example: Billion Triple 2010
Where is the metadata? Who published this? Do I have all the data?
[Diagram: publication and exchange of 318 gzipped RDF chunks, followed by basic operations]
Needs
The aims of the format are:
Metadata
Compactness
Efficient exchange
RDF compression
Basic data operations
HDT Overview
HDT
Philosophy of publication and exchange
Compact RDF representation
Based on 3 main components: Header, Dictionary and Triples
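To make the three-component split concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a single shared ID space for all terms and plain in-memory lists; the names `encode`, `decode`, `build_container` and the sample data are hypothetical. Real HDT encodings are considerably more compact than this.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]


def encode(triples: List[Triple]) -> Tuple[Dict[str, int], List[IdTriple]]:
    """Split string triples into a term dictionary and integer ID triples."""
    # Dictionary: each distinct term is stored once and mapped to an integer ID.
    # Assumption: one shared ID space for subjects, predicates and objects.
    terms = sorted({term for triple in triples for term in triple})
    dictionary = {term: term_id for term_id, term in enumerate(terms, start=1)}
    # Triples: the graph structure, expressed only with IDs.
    id_triples = sorted(
        (dictionary[s], dictionary[p], dictionary[o]) for s, p, o in triples
    )
    return dictionary, id_triples


def decode(dictionary: Dict[str, int], id_triples: List[IdTriple]) -> List[Triple]:
    """Rebuild the string triples from the two components."""
    terms_by_id = {term_id: term for term, term_id in dictionary.items()}
    return [(terms_by_id[s], terms_by_id[p], terms_by_id[o]) for s, p, o in id_triples]


def build_container(header: Dict[str, str], triples: List[Triple]) -> Dict[str, object]:
    """Bundle the three components into one (hypothetical) container."""
    dictionary, id_triples = encode(triples)
    return {"header": header, "dictionary": dictionary, "triples": id_triples}


# Made-up sample data, not taken from the slides.
SAMPLE: List[Triple] = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:knows", "ex:alice"),
]

container = build_container({"publisher": "example.org"}, SAMPLE)
print(len(container["dictionary"]), "terms,", len(container["triples"]), "ID triples")
```

Repeated terms such as `ex:alice` are stored once in the dictionary, which is where the redundancy of the textual syntaxes is removed; the triples component then carries only integers.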
Header
Metadata information about the RDF collection
Source and provider information
Publication data
Data set statistics
Other information
Information required to retrieve and process the represented data
Location/s, format/s, encoding/s, etc.
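The kinds of information listed for the Header can be pictured as a small record. The following Python sketch is only an illustration: every field name, the `Header` class and the `dataset_statistics` helper are hypothetical, and the slides do not say which statistics the format actually stores.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Header:
    """Hypothetical header record; field names are assumptions for illustration."""
    source: str                      # source and provider information
    publisher: str
    issued: str                      # publication data
    statistics: Dict[str, int] = field(default_factory=dict)  # data set statistics
    locations: List[str] = field(default_factory=list)        # where to retrieve the data
    data_format: str = ""            # how to process the represented data
    encoding: str = ""


def dataset_statistics(triples: Iterable[Triple]) -> Dict[str, int]:
    """Compute a few simple counts a header could carry (an assumed selection)."""
    triples = list(triples)
    return {
        "triples": len(triples),
        "distinct_subjects": len({s for s, _, _ in triples}),
        "distinct_predicates": len({p for _, p, _ in triples}),
        "distinct_objects": len({o for _, _, o in triples}),
    }


# Made-up sample values, not taken from the slides.
sample = [("a", "p", "b"), ("a", "q", "c"), ("b", "p", "c")]
header = Header(
    source="http://example.org/dataset",
    publisher="Example Publisher",
    issued="2010-01-01",
    statistics=dataset_statistics(sample),
    locations=["http://example.org/dump"],
)
print(header.statistics)
```

Keeping such a record separate from the data lets a consumer answer questions like who published the collection and how many triples to expect without decompressing any chunk.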
Header use
[Diagram: the 318-chunk exchange scenario redrawn with HDT in place of gzipped RDF; labels: Header, Dictionary & Triples]
Header in Practice
http://purl.org/HDT/hdt#
SWP
SCOVO, SDMX, hdt
VoID, Dublin Core, etc.
hdt
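As a rough illustration of describing a data set with existing vocabularies, the Python sketch below emits a few N-Triples lines using VoID and Dublin Core terms (`void:Dataset`, `void:triples`, `void:dataDump`, `dcterms:title`, `dcterms:publisher`). The function name, its parameters and the example URIs are hypothetical, and no terms from the `hdt` namespace are used because the slide does not list them; this is not the header layout the authors define.

```python
from typing import List

VOID = "http://rdfs.org/ns/void#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def header_ntriples(
    dataset_uri: str,
    publisher_uri: str,
    title: str,
    triple_count: int,
    dump_uri: str,
) -> List[str]:
    """Describe a data set with VoID and Dublin Core, one N-Triples line per fact."""
    subject = f"<{dataset_uri}>"
    return [
        f"{subject} <{RDF_TYPE}> <{VOID}Dataset> .",
        f'{subject} <{DCTERMS}title> "{escape_literal(title)}" .',
        f"{subject} <{DCTERMS}publisher> <{publisher_uri}> .",
        f'{subject} <{VOID}triples> "{triple_count}"^^<{XSD_INTEGER}> .',
        f"{subject} <{VOID}dataDump> <{dump_uri}> .",
    ]


# Made-up example values, not taken from the slides.
for line in header_ntriples(
    "http://example.org/dataset",
    "http://example.org/publisher",
    "Example data set",
    3,
    "http://example.org/dump.hdt",
):
    print(line)
```

Because the description is itself RDF, a consumer can read it with ordinary RDF tooling before deciding whether to fetch the rest of the data.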
