Graph Databases & data integration
Voxxed Days Athens 2018
Dimitris Kontokostas
Senior Knowledge Engineer @GeoPhy
About me
● Data geek, software engineer & open source enthusiast
● Involved in many R&D projects since 2003
● Participate(d) in graph-related standardization activities
● PhD in knowledge extraction and quality assessment
● Working on the GeoPhy Real Estate Knowledge Graph
Agenda
● Graphs
● RDF Graphs (*)
● Semantics & why they matter (*)
● Linked Data
● Who uses RDF
● How Google uses RDF
● How we (GeoPhy) use RDF
(*) Some concepts are simplified or skipped to make this talk easier to digest in the allocated time
Heatmap for Graph Databases
(*) See also this
A Gartner study in 2013 found:
● many organizations find the variety dimension a greater challenge than volume or velocity.
Graph DBs to the rescue:
● Combine multiple sources with different structures
● Retain the flexibility to add new ones without adapting schemas
● Query combined data, or multiple sources at once
● Detect patterns in the data
© Image by Max De Marzi
● A graph is a way of specifying relationships among a collection of items
● Items can be:
○ Nodes: Alice, Bob, …
○ Edges
■ undirected: knows, …
■ directed: follows, …
○ Attributes: name, age, type, since, ...
○ Values: 18, 2001/10/13, ...
Graphs
Image source: Wikimedia Commons
Graph Data Models
Property graphs
● Industry standards
○ Cypher (mainly Neo4j)
○ Gremlin traversal API (Apache TinkerPop) => Most common
○ GraphQL
● Data import / export using Cypher, Gremlin or vendor-specific tooling
● Usually optimized for specific operations / use cases
RDF Graphs
● W3C standards
○ Like XML, HTML: define once, run everywhere ™
● Standardised way for querying (SPARQL), exporting & importing (RDF)
Slide input from Andy Seaborne @VoxxedDays Bristol
Graph Databases Landscape
Property Graphs: Gremlin traversal API
RDF Graphs: SPARQL
Hybrid: Gremlin API + SPARQL + Cypher
● Each node has
○ unique identifier
○ outgoing edges
○ incoming edges
○ key-value properties collection
● Each edge has
○ unique identifier
○ direction
○ label for the relationship
○ key-value properties collection
● Extreme flexibility
Property Graphs
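The node and edge structure listed on this slide can be sketched in a few lines of Python. This is only an illustration using plain dataclasses, not the data model of any particular product; the names `Node`, `Edge`, `props` and `connect` are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    id: str                                        # unique identifier
    props: dict = field(default_factory=dict)      # key-value properties
    outgoing: list = field(default_factory=list)   # edges leaving this node
    incoming: list = field(default_factory=list)   # edges arriving at this node


@dataclass(eq=False)
class Edge:
    id: str                                        # unique identifier
    label: str                                     # label for the relationship
    source: Node                                   # direction: source -> target
    target: Node
    props: dict = field(default_factory=dict)      # key-value properties


def connect(edge_id, label, source, target, **props):
    """Create a directed edge and register it on both endpoints."""
    edge = Edge(edge_id, label, source, target, props)
    source.outgoing.append(edge)
    target.incoming.append(edge)
    return edge


alice = Node("n1", {"name": "Alice", "age": 18})
bob = Node("n2", {"name": "Bob"})
follows = connect("e1", "follows", alice, bob, since="2001/10/13")
```

Note that both nodes and edges carry their own properties, which is the main structural difference from the RDF triples introduced next.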
RDF - Resource Description Framework
● An RDF Graph is a set of RDF Triples
● An RDF triple consists of only three components (simplified):
○ the subject which is a Thing
○ the predicate which is a (special) Thing
○ the object that can be either a Thing or a Literal (Value)
● Things are represented with URIs
● Literals have a value and a value type or a language tag (defaults to string)
Subject Predicate Object
RDF - Resource Description Framework
Depending on the serialization format, URIs can be abbreviated with namespaces
> just like XML
> Improves readability, e.g.
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix schema: <http://schema.org/> .
Subject Predicate Object
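Prefix abbreviation is plain string substitution. A minimal sketch, assuming the two prefixes declared above; the `expand` helper is hypothetical and not part of any RDF library:

```python
# Prefix table mirroring the @prefix declarations on the slide.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "schema": "http://schema.org/",
}


def expand(curie, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI (KeyError on an unknown prefix)."""
    prefix, local = curie.split(":", 1)
    return prefixes[prefix] + local


friends = expand("dbpedia:Friends")
genre = expand("schema:genre")
```

Here `friends` holds the full URI `http://dbpedia.org/resource/Friends`, which is what the abbreviated form stands for in every serialization.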
RDF is an abstract data model
Many different serialization formats…
Turtle, NTriples, JSON-LD, XML, RDFa, Microdata*
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
dbpedia:Friends
schema:name "Friends"@en ;
schema:datePublished "1994-09-22"^^xsd:date ;
schema:numberOfSeasons 10 ;
schema:genre dbpedia:Sitcom .
dbpedia:The_Office
schema:name "The Office"@en ;
schema:genre dbpedia:Sitcom .
[Fun fact]
What does RSS stand for?
Rich Site Summary but...
Original name was: RDF Site Summary
Based on first versions of RDF/XML
See https://en.wikipedia.org/wiki/RSS
You can store RDF ...
In simple (text) files,
locally, remote, HDFS, ...
Embedded web documents
In graph databases
RDF & Graphs (Separate)
/data/tvseries/metadata.ttl
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
dbpedia:Friends
schema:numberOfSeasons 10 ;
schema:datePublished "1994-09-22"^^xsd:date ;
schema:genre dbpedia:Sitcom .
/data/tvseries/labels.ttl
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
dbpedia:Friends schema:name "Friends"@en .
dbpedia:The_Office schema:name "The Office"@en .
RDF & Graphs (merge)
File_all.ttl
Can you name any other format where files can be merged without losing data integrity?
CSV, SQL, XML, JSON, ...
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
dbpedia:Friends
schema:name "Friends"@en ;
schema:numberOfSeasons 10 ;
schema:datePublished "1994-09-22"^^xsd:date ;
schema:genre dbpedia:Sitcom .
dbpedia:The_Office
schema:name "The Office"@en ;
schema:genre dbpedia:Sitcom .
/data/tvseries.ttl
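Why merging is safe can be sketched in Python: an RDF graph is a set of triples, so merging two files is a set union. This is a simplification that represents each triple as a tuple of abbreviated strings and ignores blank nodes; the variable names are illustrative.

```python
# Triples from the labels file.
labels = {
    ("dbpedia:Friends", "schema:name", '"Friends"@en'),
    ("dbpedia:The_Office", "schema:name", '"The Office"@en'),
}

# Triples from the metadata file.
metadata = {
    ("dbpedia:Friends", "schema:numberOfSeasons", "10"),
    ("dbpedia:Friends", "schema:datePublished", '"1994-09-22"^^xsd:date'),
    ("dbpedia:Friends", "schema:genre", "dbpedia:Sitcom"),
}

# Merging is a set union: order does not matter and nothing is overwritten.
merged = labels | metadata

# Merging the same file again changes nothing (duplicates collapse).
merged_again = merged | labels
```

Because identifiers are global URIs rather than row numbers or positions in a document, the union never needs key remapping or schema alignment.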
Datasets / multi-graph TriG files
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://example.com/labels> {
dbpedia:Friends schema:name "Friends"@en .
dbpedia:The_Office schema:name "The Office"@en .
}
<http://example.com/metadata> {
dbpedia:Friends
schema:datePublished "1994-09-22"^^xsd:date ;
schema:numberOfSeasons 10 .
}
<http://example.com/genre> {
dbpedia:Friends schema:genre dbpedia:Sitcom .
dbpedia:The_Office schema:genre dbpedia:Sitcom .
}
/data/tvseries.trig
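A dataset with named graphs can be pictured as a mapping from graph name to a set of triples. The sketch below mirrors the TriG file above under that assumption; the helper names `union_graph` and `graphs_mentioning` are hypothetical.

```python
# Graph name -> set of triples, mirroring the three named graphs in the TriG file.
dataset = {
    "http://example.com/labels": {
        ("dbpedia:Friends", "schema:name", '"Friends"@en'),
        ("dbpedia:The_Office", "schema:name", '"The Office"@en'),
    },
    "http://example.com/metadata": {
        ("dbpedia:Friends", "schema:datePublished", '"1994-09-22"^^xsd:date'),
        ("dbpedia:Friends", "schema:numberOfSeasons", "10"),
    },
    "http://example.com/genre": {
        ("dbpedia:Friends", "schema:genre", "dbpedia:Sitcom"),
        ("dbpedia:The_Office", "schema:genre", "dbpedia:Sitcom"),
    },
}


def union_graph(dataset):
    """All triples of the dataset, ignoring which graph they came from."""
    return set().union(*dataset.values())


def graphs_mentioning(dataset, subject):
    """Names of the graphs that contain at least one triple about `subject`."""
    return sorted(
        name
        for name, triples in dataset.items()
        if any(s == subject for s, _, _ in triples)
    )


all_triples = union_graph(dataset)
office_graphs = graphs_mentioning(dataset, "dbpedia:The_Office")
```

Keeping the graph name alongside each triple is what lets you later ask where a statement came from, or drop and reload one source without touching the others.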
RDF is persistent, wherever it’s stored
[Diagram: input files are imported into an RDF DB, which can be queried; exporting yields output files that are exactly the same (*) as the input]
(*) The proper term is isomorphic graphs, to cover some special cases called blank nodes
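The import/export round trip can be illustrated with a toy line-based format, one triple per line. This is a sketch only: it is not a real N-Triples parser, it assumes subjects and predicates contain no spaces, and it ignores blank nodes (the case where "isomorphic" rather than "identical" matters).

```python
def export_lines(graph):
    """Write one 'subject predicate object .' line per triple, in sorted order."""
    return sorted(f"{s} {p} {o} ." for s, p, o in graph)


def import_lines(lines):
    """Read the lines back into a set of triples."""
    graph = set()
    for line in lines:
        # Drop the trailing ' .' and split into exactly three parts;
        # the object may itself contain spaces (e.g. a quoted literal).
        s, p, o = line[:-2].split(" ", 2)
        graph.add((s, p, o))
    return graph


graph = {
    ("dbpedia:Friends", "schema:name", '"Friends"@en'),
    ("dbpedia:The_Office", "schema:name", '"The Office"@en'),
    ("dbpedia:Friends", "schema:genre", "dbpedia:Sitcom"),
}

round_tripped = import_lines(export_lines(graph))
```

The round-tripped set equals the original set regardless of the order in which lines were written, which is the property the slide describes.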
Big ecosystem
SPARQL: RDF query language
RDFS, OWL: RDF schema languages
SHACL, ShEx: RDF constraint languages
See http://book.validatingrdf.com (free online)
R2RML: Virtual RDF views on top of RDBMS (e.g. MySQL)
And many more specification & tools...
Takeaway points, so far...
RDF is a graph data model
> can be serialized in many formats
> identifiers are persistent by design
Natively stores & integrates diverse data
RDF is kind of the new XML
> but it is much cooler...
> and you don’t need to write XML ;)
Semantics & RDF
Why they matter
Semantics & RDF
● RDF is a core part of the Semantic Web vision
● Semantics is defined as:
○ the meaning of something (word, phrase, text, etc)
○ the branch of linguistics and logic concerned with meaning
● Too academic?
“A Little Semantics Goes a Long Way”
by prof. J. Hendler
Buzzword alert!!!
RDF & Semantics
Ontologies are the results of modelling a specific domain
Some people prefer the terms: model, vocabulary, taxonomy, schema
(doesn’t make much difference)
Ontologies in RDF deal with classes & properties
> Some part is machine readable
> Some part is human readable
Can you tell which part is more important?
(... a more pragmatic view)
@prefix ex: <http://example.com/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:TVSeries
rdf:type rdfs:Class ;
rdfs:comment "Series dedicated to TV broadcast" ;
rdfs:subClassOf ex:CreativeWork .
ex:CreativeWork
rdf:type rdfs:Class ;
rdfs:comment "A generic kind of creative work, e.g. books, movies, etc." .
RDF Schema - Classes
Classes of Things
Machine-Readable
Semantics
Human-Readable
Semantics
… and we can assign types to Things
(i.e. “Friends” is an instance of “TVSeries”)
dbpedia:Friends rdf:type ex:TVSeries.
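@prefix ex: <http://example.com/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:actor
rdf:type rdf:Property ;
rdfs:comment "The person that is the actor of a TVSeries." ;
rdfs:domain ex:TVSeries ;
rdfs:range ex:Person .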
RDF Schema - Properties
Relationships between subjects and objects
Machine-Readable
Semantics
Human-Readable
Semantics
dbpedia:Friends ex:actor dbpedia:Jennifer_Aniston .
… and we can use this in RDF statements
to Infer or to Validate ?
Given only the following, what can we say about
dbpedia:Jennifer_Aniston and dbpedia:Friends ?
dbpedia:Jennifer_Aniston rdf:type ex:Person.
dbpedia:Friends rdf:type ex:TVSeries .
ex:actor
rdf:type rdf:Property ;
rdfs:domain ex:TVSeries ;
rdfs:range ex:Person.
dbpedia:Friends ex:actor dbpedia:Jennifer_Aniston .
to Infer or to Validate ?
Given only the following, what can we say ?
ex:actor
rdf:type rdf:Property ;
rdfs:domain ex:TVSeries ;
rdfs:range ex:Person.
ex:Dimitris rdf:type ex:Person .
ex:VoxxedDaysAthens rdf:type ex:Conference .
ex:VoxxedDaysAthens ex:actor ex:Dimitris .
Something is
not right…
ex:VoxxedDaysAthens
is not an ex:TVSeries
to Infer or to Validate ?
Given only the following, what can we say ?
ex:actor rdf:type rdf:Property ;
rdfs:domain ex:TVSeries ;
rdfs:range ex:Person.
ex:Dimitris rdf:type ex:Person .
dbpedia:Friends rdf:type ex:TVSeries .
dbpedia:Friends ex:actor ex:Dimitris .
Appears legit
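The same domain/range declaration can be read in two ways, and a small Python sketch makes the difference concrete. It assumes triples are tuples of abbreviated strings and the schema is a plain dictionary; `infer` and `validate` are hypothetical helpers, not the API of an RDFS reasoner or a SHACL engine.

```python
RDF_TYPE = "rdf:type"

# Machine-readable part of the schema: domain and range of ex:actor.
schema = {"ex:actor": {"domain": "ex:TVSeries", "range": "ex:Person"}}


def infer(triples, schema):
    """Inference reading: every use of a property implies the declared types."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in schema:
            inferred.add((s, RDF_TYPE, schema[p]["domain"]))
            inferred.add((o, RDF_TYPE, schema[p]["range"]))
    return inferred


def validate(triples, schema):
    """Validation reading: report every use where the declared type is missing."""
    typed = {(s, o) for s, p, o in triples if p == RDF_TYPE}
    problems = []
    for s, p, o in sorted(triples):
        if p in schema:
            domain, range_ = schema[p]["domain"], schema[p]["range"]
            if (s, domain) not in typed:
                problems.append(f"{s} is not typed as {domain}")
            if (o, range_) not in typed:
                problems.append(f"{o} is not typed as {range_}")
    return problems


data = {
    ("ex:Dimitris", RDF_TYPE, "ex:Person"),
    ("ex:VoxxedDaysAthens", RDF_TYPE, "ex:Conference"),
    ("ex:VoxxedDaysAthens", "ex:actor", "ex:Dimitris"),
}

entailed = infer(data, schema)
report = validate(data, schema)
```

With inference, the conference silently becomes an `ex:TVSeries` as well; with validation, the same statement is flagged as a problem. Which behaviour you want depends on whether you trust the data or the schema more.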
Schema stored & queried as Data
ex:TVSeries
rdf:type rdfs:Class ;
rdfs:subClassOf ex:CreativeWork .
ex:BookSeries
rdf:type rdfs:Class ;
rdfs:subClassOf ex:CreativeWork .
ex:CreativeWork
rdf:type rdfs:Class .
dbpedia:Friends rdf:type ex:TVSeries.
dbpedia:The_Office rdf:type ex:TVSeries.
dbpedia:Narnia rdf:type ex:BookSeries.
SELECT ?s WHERE {
?s rdfs:subClassOf ex:CreativeWork .
}
ex:TVSeries, ex:BookSeries
SELECT ?s WHERE {
?s rdf:type ex:TVSeries .
}
dbpedia:Friends, dbpedia:The_Office
Schema stored & queried as Data
Navigates the
class hierarchy
SELECT ?s WHERE {
?s rdf:type/rdfs:subClassOf*
ex:CreativeWork }
dbpedia:Friends,
dbpedia:The_Office,
dbpedia:Narnia
Hierarchy can be
extended without
breaking the query
ex:TVSeries
rdf:type rdfs:Class ;
rdfs:subClassOf ex:CreativeWork .
ex:BookSeries
rdf:type rdfs:Class ;
rdfs:subClassOf ex:CreativeWork .
ex:CreativeWork
rdf:type rdfs:Class .
dbpedia:Friends rdf:type ex:TVSeries.
dbpedia:The_Office rdf:type ex:TVSeries.
dbpedia:Narnia rdf:type ex:BookSeries.
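What the property path `rdf:type/rdfs:subClassOf*` does can be sketched in plain Python: take each instance's type, then walk zero or more `rdfs:subClassOf` steps upwards. This is an illustration over tuples, not how a SPARQL engine is implemented; `superclasses` and `instances_of` are hypothetical helpers.

```python
triples = {
    ("ex:TVSeries", "rdf:type", "rdfs:Class"),
    ("ex:TVSeries", "rdfs:subClassOf", "ex:CreativeWork"),
    ("ex:BookSeries", "rdf:type", "rdfs:Class"),
    ("ex:BookSeries", "rdfs:subClassOf", "ex:CreativeWork"),
    ("ex:CreativeWork", "rdf:type", "rdfs:Class"),
    ("dbpedia:Friends", "rdf:type", "ex:TVSeries"),
    ("dbpedia:The_Office", "rdf:type", "ex:TVSeries"),
    ("dbpedia:Narnia", "rdf:type", "ex:BookSeries"),
}


def superclasses(cls, triples):
    """`cls` itself plus everything reachable via rdfs:subClassOf (the * part)."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(
                o for s, p, o in triples
                if s == current and p == "rdfs:subClassOf"
            )
    return seen


def instances_of(cls, triples):
    """Subjects whose rdf:type is `cls` or any (transitive) subclass of it."""
    return sorted(
        s for s, p, o in triples
        if p == "rdf:type" and cls in superclasses(o, triples)
    )


creative_works = instances_of("ex:CreativeWork", triples)
```

Adding a new subclass (say a hypothetical `ex:Sitcom` under `ex:TVSeries`) only adds triples; the function, like the query, keeps working unchanged.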
Many Available free Schemas
Many existing free (as in beer) ontologies (or schemas) model different domains
> General purpose (DBpedia, schema.org)
> Geographical (geo)
> Provenance (prov-o)
> Taxonomies / Classification (SKOS family)
> Organizations (org)
> Find ~600 entries at http://lov.okfn.org
Reusing Available (Free) schemas
Get part of your data modeling for free
> Groups of people already worked on modeling the domain
> Spent time defining human and machine-readable semantics
Makes data integration easier
> Data published with common schemas
> Data is easier to consume
Mapping to Available (Free) schemas
Map when not reusing
> integrate data in a loosely coupled way
ex:TVSeries owl:equivalentClass schema:TVSeries .
ex:actor owl:equivalentProperty schema:actor .
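One way to picture what these two mapping statements buy you is to rewrite local terms to their shared equivalents. The Python sketch below does exactly that as a simplification; an OWL reasoner would add the equivalent statements rather than replace the local ones, and `to_shared_vocabulary` is a hypothetical helper.

```python
# Local term -> equivalent schema.org term, taken from the two mappings above.
EQUIVALENT = {
    "ex:TVSeries": "schema:TVSeries",
    "ex:actor": "schema:actor",
}


def to_shared_vocabulary(triples, mapping=EQUIVALENT):
    """Rewrite every mapped term; unmapped terms are left untouched."""
    return {
        tuple(mapping.get(term, term) for term in triple)
        for triple in triples
    }


local = {
    ("dbpedia:Friends", "rdf:type", "ex:TVSeries"),
    ("dbpedia:Friends", "ex:actor", "dbpedia:Jennifer_Aniston"),
}

shared = to_shared_vocabulary(local)
```

The mapping lives outside the data, so the local model and the shared vocabulary can evolve separately, which is what "loosely coupled" means here.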
RDF & Semantics - take away points
It’s all about Classes & Properties
Human-readable semantics
> Commonly accepted modelling conventions
Machine-readable semantics
> Can be used for inference and/or validation
> Can be queried together with data
Reusing [or linking to] common ontologies / schemas
> Integrating data with less variety
> Network effect (the more people/data use it the better)
> Developing reusable applications against schemas
Linked Data & RDF
Given only this, what can we do/say?
<https://voxxeddays.com/athens> <https://schema.org/attendee> <http://kontokostas.com>.
schema:Event (domain), schema:Person (range): A person attending the event.
HTTP GET
<https://voxxeddays.com/athens>
rdf:type schema:Event ;
schema:name "Voxxed Athens" ;
schema:startDate "2018-06-01" ;
schema:endDate "2018-06-02" ;
schema:inLanguage "English" ;
schema:description "..." .
HTTP GET
<http://kontokostas.com>
rdf:type schema:Person ;
schema:givenName "Dimitris" ;
schema:familyName "Kontokostas" ;
schema:birthPlace dbpedia:Greece ;
schema:jobTitle "Data Engineer" ;
schema:worksFor <https://geophy.com> .
HTTP GET
Follow your nose pattern
<http://kontokostas.com> <https://schema.org/birthPlace> <http://dbpedia.org/resource/Greece>.
schema:Person (domain), schema:Place (range): The place where the person was born.
HTTP GET
<http://kontokostas.com>
rdf:type schema:Person ;
schema:givenName "Dimitris" ;
schema:familyName "Kontokostas" ;
schema:birthPlace dbpedia:Greece ;
schema:jobTitle "Data Engineer" ;
schema:worksFor <https://geophy.com> .
HTTP GET
<http://dbpedia.org/resource/Greece>
rdf:type schema:Place, dbpedia:Country ;
dbo:capital dbpedia:Athens ;
dbo:currency dbpedia:Euro ;
geo:lat "39.0"^^xsd:float ;
geo:long "22.0"^^xsd:float .
HTTP GET
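The follow-your-nose pattern is a breadth-first crawl over URIs. The sketch below simulates it offline: the `WEB` dictionary stands in for real HTTP GET requests, and its contents are a cut-down version of the slides (the property names are abbreviated, and only one triple per document is kept). `http_get` and `follow_your_nose` are hypothetical helpers.

```python
# Stand-in for the web: URI -> triples returned when that URI is dereferenced.
WEB = {
    "https://voxxeddays.com/athens": {
        ("https://voxxeddays.com/athens", "schema:attendee", "http://kontokostas.com"),
    },
    "http://kontokostas.com": {
        ("http://kontokostas.com", "schema:birthPlace",
         "http://dbpedia.org/resource/Greece"),
    },
    "http://dbpedia.org/resource/Greece": {
        ("http://dbpedia.org/resource/Greece", "dbo:capital",
         "http://dbpedia.org/resource/Athens"),
    },
}


def http_get(uri):
    """Simulated dereference; an unknown URI returns no triples."""
    return WEB.get(uri, set())


def follow_your_nose(start, max_hops):
    """Collect triples by dereferencing URIs found as objects, hop by hop."""
    graph, frontier, visited = set(), {start}, set()
    for _ in range(max_hops):
        next_frontier = set()
        for uri in frontier - visited:
            visited.add(uri)
            for triple in http_get(uri):
                graph.add(triple)
                obj = triple[2]
                if obj.startswith("http"):   # only URIs can be followed
                    next_frontier.add(obj)
        frontier = next_frontier
    return graph


one_hop = follow_your_nose("https://voxxeddays.com/athens", 1)
three_hops = follow_your_nose("https://voxxeddays.com/athens", 3)
```

Starting from a single triple about the event, three hops are enough to learn where one attendee was born and what the capital of that country is, with each fact served by a different publisher.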
RDF & Linked Data
Things represented with http(s)-based URIs
can be self-published
HTTP GET requests on Things return RDF Triples
where it is a subject (or an object)
Decentralized storage / access / semantics
(*) a.k.a. the Web of Data, see TED talk from Tim Berners-Lee (Creator of WWW)
RDF & Linked Data (on the web)
[Diagram: kontokostas.com, example.com, voxxeddays.com/Athens and DBpedia (Wikipedia as RDF) linked together in the Web of Data]
RDF & Linked Data (in the enterprise)
[Diagram: RDF DBs x, y and z, each exposed as Linked Data (LD x, LD y, LD z), plus LD w, connected to each other and to the Web of Data]
Linked Open Data Cloud
Diagram from 2014
v2018 is too big
1,184 datasets
15,993 links
https://lod-cloud.net/
Reusing available datasets / identifiers
Just like reusing schemas, referencing / reusing external identifiers facilitates:
Data integration
e.g. dbpedia:Friends represents the Friends TV series, not some friends
> use dbpedia:Friends directly
> link it: ex:tv_series_123 owl:sameAs dbpedia:Friends
Data enrichment
e.g. dbpedia:Friends may have more information about the series than our database, and we can easily (http) get it
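Linking with `owl:sameAs` and then enriching can be sketched as rewriting local identifiers to the external one and taking the union with the remote data. This treats `owl:sameAs` as a one-directional alias for simplicity (it is actually symmetric); the property `ex:rating` and the helper `canonicalise` are made up for illustration.

```python
def canonicalise(triples):
    """Replace every identifier by its owl:sameAs target and drop the links."""
    same = {s: o for s, p, o in triples if p == "owl:sameAs"}

    def resolve(term):
        seen = set()
        # Follow alias chains, guarding against cycles.
        while term in same and term not in seen:
            seen.add(term)
            term = same[term]
        return term

    return {
        (resolve(s), p, resolve(o))
        for s, p, o in triples
        if p != "owl:sameAs"
    }


# Our own database, with a local identifier and a link to DBpedia.
local = {
    ("ex:tv_series_123", "ex:rating", "9"),
    ("ex:tv_series_123", "owl:sameAs", "dbpedia:Friends"),
}

# What an HTTP GET on dbpedia:Friends might return (abbreviated).
remote = {
    ("dbpedia:Friends", "schema:numberOfSeasons", "10"),
}

enriched = canonicalise(local | remote)
```

After the rewrite, the local rating and the remote season count describe the same node, so a single query sees both.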
RDF & Linked Data - take away points
Decentralisation of Data Management
Self-documented schemas & data
Scale your [local] graphs to the [Enterprise] Web
Big pool of stable identifiers (e.g. DBpedia)
Pay as you go data integration
You can benefit with little effort
> RDF views on top of RDBMS with R2RML (mappings, SPARQL 2 SQL translation)
> Convert XML/JSON/CSV/… to RDF with RML
The more time you invest the better the results
> Schema development, mapping & linking
> Semi-automatic link discovery with tools like Limes & Silk
e.g.: ex:tv_series_123 owl:sameAs dbpedia:Friends
RDF does not need to be your master dataset
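The low-effort end of this spectrum, turning tabular data into triples, can be sketched with Python's standard `csv` module. This is a hand-written mapping for illustration, not R2RML or RML; the column names, the sample rows and the `ex:tv_series_` identifier pattern are assumptions.

```python
import csv
import io

# Made-up sample input; in practice this would be a file on disk.
CSV_TEXT = "id,title,seasons\n123,Friends,10\n124,The Office,9\n"


def rows_to_triples(text):
    """Map each CSV row to three triples: a type, a name and a season count."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = "ex:tv_series_" + row["id"]          # mint a local identifier
        triples.append((subject, "rdf:type", "ex:TVSeries"))
        triples.append((subject, "schema:name", '"' + row["title"] + '"'))
        triples.append((subject, "schema:numberOfSeasons", row["seasons"]))
    return triples


triples = rows_to_triples(CSV_TEXT)
```

The resulting local identifiers can later be linked to external ones (for example with `owl:sameAs`) once more time is invested, which is the pay-as-you-go idea.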
Who uses RDF
https://github.com/json-ld/json-ld.org/wiki/Users-of-JSON-LD
28% of TLD (or 39% of HTML pages)
> 3.7M Microdata
> 2.7M JSON-LD
> 1.2M RDFa
In total 9 billion Things & 38 billion RDF triples
Full report at http://webdatacommons.org/structureddata/#results-2017-1
Structured data on the web (Nov 2017)
RDF @ Google
RDF Ontology
> Less strict / formal
> Promotes JSON-LD
Funded & maintained by all search engines
Drives many Google products...
Schema.org && Google && Search
https://developers.google.com/search/docs/guides/search-features
Google is...
Using the RDF graph model to integrate diverse
data from webpages & emails
By using the concept of Linked Data
And this is all empowered by a
common ontology (or schema)
RDF @ GeoPhy
GeoPhy provides value,
risk, & quality metrics
for every building in the world
RDF @GeoPhy
We collect & integrate a lot of data
> on properties, their surroundings, and the market conditions
Master dataset on Real Estate (aka Knowledge Graph)
> driving our Machine Learning / Deep Learning models
Challenges...
> We have thousands of sources,
> Sources are updated at arbitrary intervals
> We get our data in CSV, on the good days
And, of course… we are not Google, to make people write RDF for us :-)
GeoPhy Data Management Platform
[Diagram components: CSV, PDF; GeoPhy Ontologies; Transform to RDF; Validate; Identify & Deduplicate; Conflict resolution; Data Fusion; Data Wrangling & Extraction; Annotation & Provenance; Modeling; Mapping; CoreDB; Provenance (value-level); Data Indexing; Data Ingestion; Data Enrichment; Dependency Detection; Geo Enrichment; Trigger ML/DL; API]
And the closing slide...
People think RDF is a pain because it is complicated.
The truth is even worse.
RDF is painfully simplistic, but it allows you to work with
real-world data and problems that are horribly complicated.
While you can avoid RDF, it is harder to avoid complicated
data and complicated computer problems.
Dan Brickley, Schema.org and Google
Libby Miller, BBC
Thank you for your attention
Questions?
Many thanks to Sander, Matt and the whole GeoPhy Eng. Team for their feedback

Weitere ähnliche Inhalte

Was ist angesagt?

Semantic Variation Graphs the case for RDF & SPARQL
Semantic Variation Graphs the case for RDF & SPARQLSemantic Variation Graphs the case for RDF & SPARQL
Semantic Variation Graphs the case for RDF & SPARQLJerven Bolleman
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsNeo4j
 
Open data easy, explicit and fast
Open data easy, explicit and fastOpen data easy, explicit and fast
Open data easy, explicit and fastMetaSolutions AB
 
SHACL: Shaping the Big Ball of Data Mud
SHACL: Shaping the Big Ball of Data MudSHACL: Shaping the Big Ball of Data Mud
SHACL: Shaping the Big Ball of Data MudRichard Cyganiak
 
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011François Scharffe
 
Theory behind Image Compression and Semantic Search
Theory behind Image Compression and Semantic SearchTheory behind Image Compression and Semantic Search
Theory behind Image Compression and Semantic SearchSanti Adavani
 
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Fabrizio Orlandi
 
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant FormatELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant FormatPretaLLOD
 
Semantic Cartography: Using ontologies to create adaptable tools for text exp...
Semantic Cartography: Using ontologies to create adaptable tools for text exp...Semantic Cartography: Using ontologies to create adaptable tools for text exp...
Semantic Cartography: Using ontologies to create adaptable tools for text exp...andyashton
 
RDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesRDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesKurt Cagle
 
Indexing, searching, and aggregation with redi search and .net
Indexing, searching, and aggregation with redi search and .netIndexing, searching, and aggregation with redi search and .net
Indexing, searching, and aggregation with redi search and .netStephen Lorello
 
Semantic Web introduction
Semantic Web introductionSemantic Web introduction
Semantic Web introductionGraphity
 
Why is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz IncWhy is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz IncFranz Inc. - AllegroGraph
 

Was ist angesagt? (20)

Jesús Barrasa
Jesús BarrasaJesús Barrasa
Jesús Barrasa
 
Semantic Variation Graphs the case for RDF & SPARQL
Semantic Variation Graphs the case for RDF & SPARQLSemantic Variation Graphs the case for RDF & SPARQL
Semantic Variation Graphs the case for RDF & SPARQL
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
 
Open data easy, explicit and fast
Open data easy, explicit and fastOpen data easy, explicit and fast
Open data easy, explicit and fast
 
JSON-LD and SHACL for Knowledge Graphs
JSON-LD and SHACL for Knowledge GraphsJSON-LD and SHACL for Knowledge Graphs
JSON-LD and SHACL for Knowledge Graphs
 
SHACL: Shaping the Big Ball of Data Mud
SHACL: Shaping the Big Ball of Data MudSHACL: Shaping the Big Ball of Data Mud
SHACL: Shaping the Big Ball of Data Mud
 
20110728 datalift-rpi-troy
20110728 datalift-rpi-troy20110728 datalift-rpi-troy
20110728 datalift-rpi-troy
 
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
 
Christian Jakenfelds
Christian JakenfeldsChristian Jakenfelds
Christian Jakenfelds
 
Theory behind Image Compression and Semantic Search
Theory behind Image Compression and Semantic SearchTheory behind Image Compression and Semantic Search
Theory behind Image Compression and Semantic Search
 
Presentation shexer
Presentation shexerPresentation shexer
Presentation shexer
 
RDF validation tutorial
RDF validation tutorialRDF validation tutorial
RDF validation tutorial
 
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
 
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant FormatELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
 
Semantic Cartography: Using ontologies to create adaptable tools for text exp...
Semantic Cartography: Using ontologies to create adaptable tools for text exp...Semantic Cartography: Using ontologies to create adaptable tools for text exp...
Semantic Cartography: Using ontologies to create adaptable tools for text exp...
 
RDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesRDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data Frames
 
Indexing, searching, and aggregation with redi search and .net
Indexing, searching, and aggregation with redi search and .netIndexing, searching, and aggregation with redi search and .net
Indexing, searching, and aggregation with redi search and .net
 
Semantic Web introduction
Semantic Web introductionSemantic Web introduction
Semantic Web introduction
 
What's New in RDF 1.1?
What's New in RDF 1.1?What's New in RDF 1.1?
What's New in RDF 1.1?
 
Why is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz IncWhy is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz Inc
 

Ähnlich wie Graph databases & data integration v2

Graph databases & data integration - the case of RDF
Graph databases & data integration - the case of RDFGraph databases & data integration - the case of RDF
Graph databases & data integration - the case of RDFDimitris Kontokostas
 
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIsJosef Petrák
 
A Little SPARQL in your Analytics
A Little SPARQL in your AnalyticsA Little SPARQL in your Analytics
A Little SPARQL in your AnalyticsDr. Neil Brittliff
 
Slides semantic web and Drupal 7 NYCCamp 2012
Slides semantic web and Drupal 7 NYCCamp 2012Slides semantic web and Drupal 7 NYCCamp 2012
Slides semantic web and Drupal 7 NYCCamp 2012scorlosquet
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...Databricks
 
Find your way in Graph labyrinths
Find your way in Graph labyrinthsFind your way in Graph labyrinths
Find your way in Graph labyrinthsDaniel Camarda
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudOntotext
 
Infromation Reprentation, Structured Data and Semantics
Infromation Reprentation,Structured Data and SemanticsInfromation Reprentation,Structured Data and Semantics
Infromation Reprentation, Structured Data and SemanticsYogendra Tamang
 
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseBringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseJimmy Angelakos
 
Introduction to RDFa
Introduction to RDFaIntroduction to RDFa
Introduction to RDFaIvan Herman
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)Dan Brickley
 
Rdf data-model-and-storage
Rdf data-model-and-storageRdf data-model-and-storage
Rdf data-model-and-storage灿辉 葛
 
RDFa: an introduction
RDFa: an introductionRDFa: an introduction
RDFa: an introductionKai Li
 
SemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsSemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsRinke Hoekstra
 
Syntax Reuse: XSLT as a Metalanguage for Knowledge Representation Languages
Syntax Reuse: XSLT as a Metalanguage for Knowledge Representation LanguagesSyntax Reuse: XSLT as a Metalanguage for Knowledge Representation Languages
Syntax Reuse: XSLT as a Metalanguage for Knowledge Representation LanguagesTara Athan
 

Ähnlich wie Graph databases & data integration v2 (20)

Graph databases & data integration - the case of RDF
Graph databases & data integration - the case of RDFGraph databases & data integration - the case of RDF
Graph databases & data integration - the case of RDF
 
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
 
Danbri Drupalcon Export
Danbri Drupalcon ExportDanbri Drupalcon Export
Danbri Drupalcon Export
 
RDFa Tutorial
RDFa TutorialRDFa Tutorial
RDFa Tutorial
 
SWT Lecture Session 2 - RDF
SWT Lecture Session 2 - RDFSWT Lecture Session 2 - RDF
SWT Lecture Session 2 - RDF
 
A Little SPARQL in your Analytics
A Little SPARQL in your AnalyticsA Little SPARQL in your Analytics
A Little SPARQL in your Analytics
 
Slides semantic web and Drupal 7 NYCCamp 2012
Slides semantic web and Drupal 7 NYCCamp 2012Slides semantic web and Drupal 7 NYCCamp 2012
Slides semantic web and Drupal 7 NYCCamp 2012
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
 
Find your way in Graph labyrinths
Find your way in Graph labyrinthsFind your way in Graph labyrinths
Find your way in Graph labyrinths
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
 
Semantic Web talk TEMPLATE
Semantic Web talk TEMPLATESemantic Web talk TEMPLATE
Semantic Web talk TEMPLATE
 
Infromation Reprentation, Structured Data and Semantics
Infromation Reprentation,Structured Data and SemanticsInfromation Reprentation,Structured Data and Semantics
Infromation Reprentation, Structured Data and Semantics
 
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseBringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
 
Introduction to RDFa
Introduction to RDFaIntroduction to RDFa
Introduction to RDFa
 
SWT Lecture Session 10 R2RML Part 1
SWT Lecture Session 10 R2RML Part 1SWT Lecture Session 10 R2RML Part 1
SWT Lecture Session 10 R2RML Part 1
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)
 
Rdf data-model-and-storage
Rdf data-model-and-storageRdf data-model-and-storage
Rdf data-model-and-storage
 
RDFa: an introduction
RDFa: an introductionRDFa: an introduction
RDFa: an introduction
 
SemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsSemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n Bolts
 
Syntax Reuse: XSLT as a Metalanguage for Knowledge Representation Languages
Syntax Reuse: XSLT as a Metalanguage for Knowledge Representation LanguagesSyntax Reuse: XSLT as a Metalanguage for Knowledge Representation Languages
Syntax Reuse: XSLT as a Metalanguage for Knowledge Representation Languages
 

Graph databases & data integration v2

  • 1. Graph Databases & data integration Voxxed Days Athens 2018 Dimitris Kontokostas Senior Knowledge Engineer @GeoPhy
  • 2. About me ● Data geek, software engineer & open source enthusiast ● Involved in many R&D projects since 2003 ● Participate(d) in graph-related standardization activities ● PhD in knowledge extraction and quality assessment ● Working on the GeoPhy Real Estate Knowledge Graph
  • 3. Agenda ● Graphs ● RDF Graphs (*) ● Semantics & why they matter (*) ● Linked Data ● Who uses RDF ● How Google uses RDF ● How we (GeoPhy) use RDF (*) Some concepts are simplified or skipped to make this talk easier to digest in the allocated time
  • 5. Heatmap for Graph Databases A Gartner study in 2013 found: ● Many organizations find the variety dimension a greater challenge than volume or velocity. Graph DBs to the rescue: ● Combine multiple sources with different structures ● Retain the flexibility to add new ones without adapting schemas ● Query combined data, or multiple sources at once ● Detect patterns in the data
  • 6. © Image by Max De Marzi
  • 7. Graphs ● A graph is a way of specifying relationships among a collection of items ● Items can be: ○ Nodes: Alice, Bob, … ○ Edges ■ undirected: knows, … ■ directed: follows, … ○ Attributes: name, age, type, since, ... ○ Values: 18, 2001/10/13, ... Image source: Wikimedia Commons
  • 8. Graph Data Models Property graphs ● Industry standards ○ Cypher (mainly Neo4j) ○ Gremlin traversal API (Apache TinkerPop) => Most common ○ GraphQL ● Data import / export using Cypher, Gremlin or vendor-specific tools ● Usually optimized for specific operations / use cases RDF Graphs ● W3C standards ○ Like XML and HTML: define once, run everywhere ™ ● Standardised way of querying (SPARQL), exporting & importing (RDF) Slide input from Andy Seaborn @VoxxedDays Bristol
  • 9. Graph Databases Landscape Property Graphs Gremlin traversal API RDF Graphs SPARQL Hybrid Gremlin API + SPARQL +Cypher
  • 10. Property Graphs ● Each node has ○ unique identifier ○ outgoing edges ○ incoming edges ○ key-value properties collection ● Each edge has ○ unique identifier ○ direction ○ label for the relationship ○ key-value properties collection ● Extreme flexibility
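The node and edge structure listed on this slide can be sketched in a few lines of Python. This is only an illustration of the data model, not how any particular graph database stores it; the class names, the `connect` helper and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field


# eq=False keeps identity-based comparison, which avoids recursive
# comparisons between nodes and edges that reference each other.
@dataclass(eq=False)
class Node:
    id: str
    properties: dict = field(default_factory=dict)
    outgoing: list = field(default_factory=list)  # edges leaving this node
    incoming: list = field(default_factory=list)  # edges arriving at this node


@dataclass(eq=False)
class Edge:
    id: str
    label: str      # name of the relationship
    source: Node    # direction: source -> target
    target: Node
    properties: dict = field(default_factory=dict)


def connect(edge_id, label, source, target, **properties):
    """Hypothetical helper: create a directed edge and register it on both ends."""
    edge = Edge(edge_id, label, source, target, dict(properties))
    source.outgoing.append(edge)
    target.incoming.append(edge)
    return edge


alice = Node("n1", {"name": "Alice", "age": 18})
bob = Node("n2", {"name": "Bob"})
follows = connect("e1", "follows", alice, bob, since="2001/10/13")
```

Both nodes and edges carry their own key-value properties, which is the main structural difference from the RDF triples introduced next.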
  • 11. RDF - Resource Description Framework ● An RDF Graph is a set of RDF Triples ● An RDF triple consists of only three components (simplified): ○ the subject which is a Thing ○ the predicate which is a (special) Thing ○ the object that can be either a Thing or a Literal (Value) ● Things are represented with URIs ● Literals have a value and a value type or a language tag (defaults to string) Subject Predicate Object
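As a rough sketch of the definition above, an RDF graph can be modelled as a Python set of three-element tuples. The `URI` and `Literal` classes are assumptions for this illustration (real RDF libraries have their own term types), and blank nodes are left out, just as the slide simplifies them away.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class URI:
    """A Thing, identified by its URI."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A value with either a datatype or a language tag."""
    value: str
    datatype: Optional[str] = None  # e.g. an XSD datatype URI
    language: Optional[str] = None  # e.g. "en"


XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
friends = URI("http://dbpedia.org/resource/Friends")
sitcom = URI("http://dbpedia.org/resource/Sitcom")
name = URI("http://schema.org/name")
date_published = URI("http://schema.org/datePublished")
genre = URI("http://schema.org/genre")

# A graph is a *set* of (subject, predicate, object) triples.
graph = {
    (friends, name, Literal("Friends", language="en")),
    (friends, date_published, Literal("1994-09-22", datatype=XSD_DATE)),
    (friends, genre, sitcom),
}

# Adding a triple that is already present changes nothing.
graph.add((friends, genre, sitcom))
```

Because the graph is a set, there is no ordering and no duplicate statement, which is what later makes merging graphs trivial.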
  • 13. RDF - Resource Description Framework Depending on the serialization format, URIs can be abbreviated with namespaces > just like XML > Improves readability, e.g. @prefix dbpedia: <http://dbpedia.org/resource/> . @prefix schema: <http://schema.org/> . Subject Predicate Object
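The abbreviation mechanism amounts to simple string substitution. The sketch below uses the two prefixes from the slide; the function names `expand` and `shorten` are made up for this example and are not part of any RDF library.

```python
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "schema": "http://schema.org/",
}


def expand(curie, prefixes=PREFIXES):
    """Turn an abbreviated name such as 'schema:name' into a full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + local


def shorten(uri, prefixes=PREFIXES):
    """Abbreviate a full URI with the longest matching namespace, if any."""
    matches = [(ns, p) for p, ns in prefixes.items() if uri.startswith(ns)]
    if not matches:
        return uri  # no known namespace: leave the URI untouched
    ns, prefix = max(matches, key=lambda m: len(m[0]))
    return f"{prefix}:{uri[len(ns):]}"
```

For example, `expand("dbpedia:Friends")` yields the full DBpedia URI, and `shorten` reverses it; the abbreviation is purely a convenience of the serialization, not part of the data model.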
  • 14. RDF is an abstract data model Many different serialization formats… Turtle, NTriples, JSON-LD, XML, RDFa, Microdata*
  • 15. RDF is an abstract data model Many different serialization formats… Turtle, NTriples, JSON-LD, XML, RDFa, Microdata* @prefix dbpedia: <http://dbpedia.org/resource/> . @prefix schema: <http://schema.org/> . @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . dbpedia:Friends schema:name "Friends"@en ; schema:datePublished "1994-09-22"^^xsd:date ; schema:numberOfSeasons 10 ; schema:genre dbpedia:Sitcom . dbpedia:The_Office schema:name "The Office"@en ; schema:genre dbpedia:Sitcom .
  • 19. [Fun fact] What does RSS stand for? Rich Site Summary but... Original name was: RDF Site Summary Based on first versions of RDF/XML See https://en.wikipedia.org/wiki/RSS
  • 22. You can store RDF ... In simple (text) files, locally, remote, HDFS, ... Embedded in web documents In graph databases
  • 23. RDF & Graphs (Separate) @prefix dbpedia: <http://dbpedia.org/resource/> . @prefix schema: <http://schema.org/> . @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . dbpedia:Friends schema:numberOfSeasons 10 ; schema:datePublished "1994-09-22"^^xsd:date ; schema:genre dbpedia:Sitcom . @prefix dbpedia: <http://dbpedia.org/resource/> . @prefix schema: <http://schema.org/> . @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . dbpedia:Friends schema:name "Friends"@en . dbpedia:The_Office schema:name "The Office"@en . /data/tvseries/labels.ttl /data/tvseries/metadata.ttl
  • 24. RDF & Graphs (merge) File_all.ttl Can you name any other format where files can be merged without losing data integrity? CSV, SQL, XML, JSON, ... @prefix dbpedia: <http://dbpedia.org/resource/> . @prefix schema: <http://schema.org/> . @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . dbpedia:Friends schema:name "Friends"@en ; schema:numberOfSeasons 10 ; schema:datePublished "1994-09-22"^^xsd:date ; schema:genre dbpedia:Sitcom . dbpedia:The_Office schema:name "The Office"@en ; schema:genre dbpedia:Sitcom . /data/tvseries.ttl
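Why merging works so easily can be seen by treating each file as a set of triples: the merge is then a plain set union. The sketch below keeps terms as abbreviated strings for brevity and ignores blank nodes, which would need renaming in a real merge; the variable names are assumptions for the example.

```python
# Contents of the two separate files, as sets of (subject, predicate, object).
labels = {
    ("dbpedia:Friends", "schema:name", '"Friends"@en'),
    ("dbpedia:The_Office", "schema:name", '"The Office"@en'),
}
metadata = {
    ("dbpedia:Friends", "schema:numberOfSeasons", "10"),
    ("dbpedia:Friends", "schema:datePublished", '"1994-09-22"^^xsd:date'),
    ("dbpedia:Friends", "schema:genre", "dbpedia:Sitcom"),
}

# Merging two RDF graphs is a set union: order does not matter and
# repeated statements collapse into one.
merged = labels | metadata

# Everything known about Friends, regardless of which file it came from.
about_friends = {(p, o) for s, p, o in merged if s == "dbpedia:Friends"}
```

The statements line up because both files use the same global identifier, `dbpedia:Friends`, for the same Thing; no join key or schema alignment is needed.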
  • 25. Datasets / multi-graph TriG files @prefix dbpedia: <http://dbpedia.org/resource/> . @prefix schema: <http://schema.org/> . @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . <http://example.com/labels> { dbpedia:Friends schema:name "Friends"@en . dbpedia:The_Office schema:name "The Office"@en . } <http://example.com/metadata> { dbpedia:Friends schema:datePublished "1994-09-22"^^xsd:date ; schema:numberOfSeasons 10 . } <http://example.com/genre> { dbpedia:Friends schema:genre dbpedia:Sitcom . dbpedia:The_Office schema:genre dbpedia:Sitcom . } /data/tvseries.trig
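A dataset with several named graphs can be pictured as a mapping from graph name to a set of triples. The following sketch mirrors the three graphs of the TriG example; the helper names `graphs_mentioning` and `union_graph` are assumptions for this illustration.

```python
from collections import defaultdict

# Graph name -> set of (subject, predicate, object) triples.
dataset = defaultdict(set)
dataset["http://example.com/labels"] |= {
    ("dbpedia:Friends", "schema:name", '"Friends"@en'),
    ("dbpedia:The_Office", "schema:name", '"The Office"@en'),
}
dataset["http://example.com/metadata"] |= {
    ("dbpedia:Friends", "schema:datePublished", '"1994-09-22"^^xsd:date'),
    ("dbpedia:Friends", "schema:numberOfSeasons", "10"),
}
dataset["http://example.com/genre"] |= {
    ("dbpedia:Friends", "schema:genre", "dbpedia:Sitcom"),
    ("dbpedia:The_Office", "schema:genre", "dbpedia:Sitcom"),
}


def graphs_mentioning(subject, dataset):
    """Names of the graphs holding at least one triple about `subject`."""
    return sorted(
        name for name, triples in dataset.items()
        if any(s == subject for s, _, _ in triples)
    )


def union_graph(dataset):
    """Flatten the dataset into one graph, dropping the graph names."""
    return set().union(*dataset.values())
```

Keeping the graph name next to each triple is what lets a store answer "where did this statement come from?" while still offering the merged view on demand.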
  • 26. RDF is persistent, wherever it’s stored Input Files > Import > RDF DB (Query) > Export > Output Files Input and output are exactly the same (*) (*) The proper term is isomorphic graphs, to cover some special cases called blank nodes
  • 27. Big ecosystem SPARQL: RDF query language RDFS, OWL: RDF schema languages SHACL, ShEx: RDF constraint languages See http://book.validatingrdf.com (free online) R2RML: Virtual RDF views on top of RDBMS (e.g. MySQL) And many more specifications & tools...
  • 28. Takeaway points, so far... RDF is a graph data model > can be serialized in many formats > identifiers are persistent by design Natively stores & integrates diverse data RDF is kind of the new XML > but it is much cooler... > and you don’t need to write XML ;)
  • 29. Semantics & RDF Why they matter
  • 30. Semantics & RDF ● RDF is a core part of the Semantic Web vision ● Semantics is defined as: ○ the meaning of something (word, phrase, text, etc.) ○ the branch of linguistics and logic concerned with meaning ● Too academic? “A Little Semantics Goes a Long Way” by Prof. J. Hendler Buzzword Alert!!!
  • 31. RDF & Semantics Ontologies are the results of modelling a specific domain Some people prefer the terms: model, vocabulary, taxonomy, schema (doesn’t make much difference) Ontologies in RDF deal with classes & properties > Some part is machine readable > Some part is human readable Can you tell which part is more important? (... a more pragmatic view)
  • 32. @prefix ex: <http://example.com/> . ex:TVSeries rdf:type rdfs:Class ; rdfs:comment "Series dedicated to TV broadcast" ; rdfs:subClassOf ex:CreativeWork . ex:CreativeWork rdf:type rdfs:Class ; rdfs:comment "A generic kind of creative work, e.g. books, movies, etc." . RDF Schema - Classes Classes of Things Machine-Readable Semantics Human-Readable Semantics … and we can assign types to Things (e.g. "Friends" is an instance of "TVSeries") dbpedia:Friends rdf:type ex:TVSeries .
  • 33. @prefix ex: <http://example.com/> . ex:actor rdf:type rdf:Property ; rdfs:comment "The person that is the actor of a TVSeries." ; rdfs:domain ex:TVSeries ; rdfs:range ex:Person . RDF Schema - Properties Relationships between subjects and objects Machine-Readable Semantics Human-Readable Semantics dbpedia:Friends ex:actor dbpedia:Jennifer_Aniston . … and we can use this in RDF statements
  • 34. to Infer or to Validate ? Given only the following, what can we say about dbpedia:Jennifer_Aniston and dbpedia:Friends ? dbpedia:Jennifer_Aniston rdf:type ex:Person. dbpedia:Friends rdf:type ex:TVSeries . ex:actor rdf:type rdf:Property ; rdfs:domain ex:TVSeries ; rdfs:range ex:Person. dbpedia:Friends ex:actor dbpedia:Jennifer_Aniston .
  • 35. to Infer or to Validate ? Given only the following, what can we say ? ex:actor rdf:type rdf:Property ; rdfs:domain ex:TVSeries ; rdfs:range ex:Person. ex:Dimitris rdf:type ex:Person . ex:VoxxedDaysAthens rdf:type ex:Conference . ex:VoxxedDaysAthens ex:actor ex:Dimitris . Something is not right… ex:VoxxedDaysAthens is not an ex:TVSeries
  • 36. to Infer or to Validate ? Given only the following, what can we say ? ex:actor rdf:type rdf:Property ; rdfs:domain ex:TVSeries ; rdfs:range ex:Person. ex:Dimitris rdf:type ex:Person . dbpedia:Friends rdf:type ex:TVSeries . dbpedia:Friends ex:actor ex:Dimitris . Appears legit
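The same domain and range statements can be read in two ways, and a small sketch makes the difference concrete. Under an RDFS-style reading they add new `rdf:type` triples; under a validation reading (closer to what SHACL or ShEx are used for) they flag data that lacks the expected type. The code is a simplification under stated assumptions: one domain and one range per property, no class hierarchy, and function names invented for this example.

```python
RDF_TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"


def infer_types(triples):
    """Inference reading: domain/range statements ADD rdf:type triples."""
    triples = set(triples)
    domains = {s: o for s, p, o in triples if p == DOMAIN}
    ranges = {s: o for s, p, o in triples if p == RANGE}
    inferred = set()
    for s, p, o in triples:
        if p in domains:
            inferred.add((s, RDF_TYPE, domains[p]))
        if p in ranges:
            inferred.add((o, RDF_TYPE, ranges[p]))
    return inferred - triples  # only what was not already stated


def validate_types(triples):
    """Validation reading: report subjects/objects lacking the declared type."""
    triples = set(triples)
    domains = {s: o for s, p, o in triples if p == DOMAIN}
    ranges = {s: o for s, p, o in triples if p == RANGE}
    violations = []
    for s, p, o in sorted(triples):
        if p in domains and (s, RDF_TYPE, domains[p]) not in triples:
            violations.append((s, p, domains[p]))
        if p in ranges and (o, RDF_TYPE, ranges[p]) not in triples:
            violations.append((o, p, ranges[p]))
    return violations


schema = {
    ("ex:actor", RDF_TYPE, "rdf:Property"),
    ("ex:actor", DOMAIN, "ex:TVSeries"),
    ("ex:actor", RANGE, "ex:Person"),
}
conference_data = schema | {
    ("ex:Dimitris", RDF_TYPE, "ex:Person"),
    ("ex:VoxxedDaysAthens", RDF_TYPE, "ex:Conference"),
    ("ex:VoxxedDaysAthens", "ex:actor", "ex:Dimitris"),
}
```

On `conference_data`, `infer_types` concludes that `ex:VoxxedDaysAthens` is also an `ex:TVSeries`, while `validate_types` reports the same triple as a violation. Which behaviour is wanted depends on the application, which is the point of the "infer or validate" question.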
  • 37. Schema stored & queried as Data ex:TVSeries rdf:type rdfs:Class ; rdfs:subClassOf ex:CreativeWork . ex:BookSeries rdf:type rdfs:Class ; rdfs:subClassOf ex:CreativeWork . ex:CreativeWork rdf:type rdfs:Class . dbpedia:Friends rdf:type ex:TVSeries. dbpedia:The_Office rdf:type ex:TVSeries. dbpedia:Narnia rdf:type ex:BookSeries. SELECT ?s WHERE { ?s rdfs:subClassOf ex:CreativeWork . } ex:TVSeries, ex:BookSeries SELECT ?s WHERE { ?s rdf:type ex:TVSeries . } dbpedia:Friends, dbpedia:The_Office
  • 38. Schema stored & queried as Data Navigates the class hierarchy SELECT ?s WHERE { ?s rdf:type/rdfs:subClassOf* ex:CreativeWork } dbpedia:Friends, dbpedia:The_Office, dbpedia:Narnia Hierarchy can be extended without breaking the query ex:TVSeries rdf:type rdfs:Class ; rdfs:subClassOf ex:CreativeWork . ex:BookSeries rdf:type rdfs:Class ; rdfs:subClassOf ex:CreativeWork . ex:CreativeWork rdf:type rdfs:Class . dbpedia:Friends rdf:type ex:TVSeries. dbpedia:The_Office rdf:type ex:TVSeries. dbpedia:Narnia rdf:type ex:BookSeries.
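What the property path `rdf:type/rdfs:subClassOf*` does can be approximated in plain Python: take one `rdf:type` step, then follow zero or more `rdfs:subClassOf` steps. This is a sketch of the idea rather than a SPARQL engine; the function names and the extra `ex:Sitcom` class used in the usage note are assumptions.

```python
def superclasses(cls, triples):
    """cls plus every class reachable via zero or more rdfs:subClassOf steps."""
    seen, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def instances_of(target, triples):
    """Things whose rdf:type is `target` or any (transitive) subclass of it."""
    return sorted(
        s for s, p, o in triples
        if p == "rdf:type" and target in superclasses(o, triples)
    )


data = {
    ("ex:TVSeries", "rdf:type", "rdfs:Class"),
    ("ex:TVSeries", "rdfs:subClassOf", "ex:CreativeWork"),
    ("ex:BookSeries", "rdf:type", "rdfs:Class"),
    ("ex:BookSeries", "rdfs:subClassOf", "ex:CreativeWork"),
    ("ex:CreativeWork", "rdf:type", "rdfs:Class"),
    ("dbpedia:Friends", "rdf:type", "ex:TVSeries"),
    ("dbpedia:The_Office", "rdf:type", "ex:TVSeries"),
    ("dbpedia:Narnia", "rdf:type", "ex:BookSeries"),
}
```

Adding a hypothetical `ex:Sitcom` class below `ex:TVSeries`, together with an instance of it, requires no change to `instances_of`: the new instance shows up under `ex:CreativeWork` automatically, because the hierarchy is just more data.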
  • 39. Many Available free Schemas Many existing free (as in beer) ontologies (or schemas) model different domains > General purpose (DBpedia, schema.org) > Geographical (geo) > Provenance (prov-o) > Taxonomies / Classification (SKOS family) > Organizations (org) > Find ~600 entries at http://lov.okfn.org
  • 40. Reusing Available (Free) schemas Get part of your data modeling for free > Groups of people already worked on modeling the domain > Spent time defining human and machine-readable semantics Makes data integration easier > Data published with common schemas > Data is easier to consume
  • 41. Mapping to Available (Free) schemas Map when not reusing > integrate data in a loosely coupled way ex:TVSeries owl:equivalentClass schema:TVSeries . ex:actor owl:equivalentProperty schema:actor .
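One loosely coupled way to use such mappings is to rewrite local terms into the shared vocabulary when exporting or querying. The sketch below is a one-directional simplification: real `owl:equivalentClass` and `owl:equivalentProperty` statements are symmetric and are normally interpreted by a reasoner. The dictionary and function name are assumptions for this example.

```python
# Local term -> shared term, taken from the two mapping triples on the slide.
EQUIVALENT = {
    "ex:TVSeries": "schema:TVSeries",  # owl:equivalentClass
    "ex:actor": "schema:actor",        # owl:equivalentProperty
}


def to_shared_vocabulary(triples, mapping=EQUIVALENT):
    """Rewrite every term that has a known equivalent; leave the rest as-is."""
    return {
        tuple(mapping.get(term, term) for term in triple)
        for triple in triples
    }


local_data = {
    ("dbpedia:Friends", "rdf:type", "ex:TVSeries"),
    ("dbpedia:Friends", "ex:actor", "dbpedia:Jennifer_Aniston"),
}
shared_data = to_shared_vocabulary(local_data)
```

The local model stays untouched; only the mapping table has to change if the shared schema evolves.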
  • 42. RDF & Semantics - take away points It’s all about Classes & Properties Human-readable semantics > Commonly accepted modelling conventions Machine-readable semantics > Can be used for inference and/or validation > Can be queried together with data Reusing [or linking to] common ontologies / schemas > Integrating data with less variety > Network effect (the more people/data use it the better) > Developing reusable applications against schemas
  • 44. Given only this, what can we do/say? <https://voxxeddays.com/athens> <https://schema.org/attendee> <http://kontokostas.com> . schema:Event (domain) schema:Person (range) A person attending the event. HTTP GET <https://voxxeddays.com/athens> rdf:type schema:Event ; schema:name "Voxxed Athens" ; schema:startDate "2018-06-01" ; schema:endDate "2018-06-02" ; schema:inLanguage "English" ; schema:description "..." . HTTP GET <http://kontokostas.com> rdf:type schema:Person ; schema:givenName "Dimitris" ; schema:familyName "Kontokostas" ; schema:birthPlace dbpedia:Greece ; schema:jobTitle "Data Engineer" ; schema:worksFor <https://geophy.com> . HTTP GET
  • 45. Follow your nose pattern <http://kontokostas.com> <https://schema.org/birthPlace> <http://dbpedia.org/resource/Greece> . schema:Person (domain) schema:Place (range) The place where the person was born. HTTP GET <http://kontokostas.com> rdf:type schema:Person ; schema:givenName "Dimitris" ; schema:familyName "Kontokostas" ; schema:birthPlace dbpedia:Greece ; schema:jobTitle "Data Engineer" ; schema:worksFor <https://geophy.com> . HTTP GET <http://dbpedia.org/resource/Greece> rdf:type schema:Place, dbpedia:Country ; dbo:capital dbpedia:Athens ; dbo:currency dbpedia:Euro ; geo:lat "39.0"^^xsd:float ; geo:long "22.0"^^xsd:float . HTTP GET
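The follow-your-nose pattern is essentially a breadth-first crawl over URIs. To keep the sketch runnable offline, the `WEB` dictionary below stands in for real HTTP GET requests; its contents are a trimmed, illustrative subset of the triples on the slides, and `dereference` and `crawl` are hypothetical names.

```python
ATHENS = "https://voxxeddays.com/athens"
DIMITRIS = "http://kontokostas.com"
GREECE = "http://dbpedia.org/resource/Greece"

# Stand-in for the web: URI -> triples a GET on that URI would return.
WEB = {
    ATHENS: {
        (ATHENS, "schema:attendee", DIMITRIS),
        (ATHENS, "schema:name", "Voxxed Athens"),
    },
    DIMITRIS: {
        (DIMITRIS, "schema:givenName", "Dimitris"),
        (DIMITRIS, "schema:birthPlace", GREECE),
    },
    GREECE: {
        (GREECE, "dbo:capital", "http://dbpedia.org/resource/Athens"),
    },
}


def dereference(uri):
    """Simulated HTTP GET; a real client would fetch and parse RDF here."""
    return WEB.get(uri, set())


def crawl(start, max_hops=2):
    """Collect triples by following object URIs up to `max_hops` links away."""
    collected, visited, frontier = set(), set(), {start}
    for _ in range(max_hops + 1):
        next_frontier = set()
        for uri in frontier - visited:
            visited.add(uri)
            for triple in dereference(uri):
                collected.add(triple)
                obj = triple[2]
                if obj.startswith("http"):  # objects that are URIs can be followed
                    next_frontier.add(obj)
        frontier = next_frontier
    return collected
```

Starting from the event URI alone, two hops are enough to learn the attendee's birthplace and that country's capital, without any of the three publishers coordinating beyond using shared identifiers.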
  • 46. RDF & Linked Data Things represented with http(s)-based URIs can be self-published HTTP GET requests on Things return the RDF triples in which the Thing is a subject (or an object) Decentralized storage / access / semantics (*) a.k.a. the Web of Data, see the TED talk from Tim Berners-Lee (creator of the WWW)
  • 47. RDF & Linked Data (on the web) kontokostas.com example.com voxxeddays.com/Athens DBpedia (Wikipedia as RDF) Web of Data
  • 48. RDF & Linked Data (on the enterprise) Web of Data RDF DB x LD x RDF DB y LD y RDF DB z LD z LD w
  • 49. Linked Open Data Cloud Diagram from 2014; v2018 is too big 1,184 datasets 15,993 links https://lod-cloud.net/
  • 50. Reusing available datasets / identifiers Just like reusing schemas, referencing / reusing external identifiers facilitates: Data integration e.g. dbpedia:Friends represents the Friends TV series, not some friends > use dbpedia:Friends directly > link it: ex:tv_series_123 owl:sameAs dbpedia:Friends Data enrichment e.g. dbpedia:Friends may have more information about the series than our database, and we can easily (http) get it
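How an `owl:sameAs` link lets two datasets be combined can be sketched with a small union-find over identifiers: linked identifiers are grouped, one representative is chosen per group, and all triples are rewritten to use it. The choice of the alphabetically smallest identifier as representative, the `ex:rating` property and the function names are assumptions for this illustration.

```python
def canonical_ids(same_as_pairs):
    """Map every linked identifier to one representative per sameAs group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Deterministic choice: the alphabetically smaller id wins.
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return {x: find(x) for x in list(parent)}


def smush(triples, same_as_pairs):
    """Rewrite subjects and objects to their canonical identifier."""
    canon = canonical_ids(same_as_pairs)
    return {(canon.get(s, s), p, canon.get(o, o)) for s, p, o in triples}


ours = {("ex:tv_series_123", "ex:rating", "9")}          # hypothetical local data
theirs = {("dbpedia:Friends", "schema:name", "Friends")}
links = [("ex:tv_series_123", "dbpedia:Friends")]         # owl:sameAs
combined = smush(ours | theirs, links)
```

After rewriting, the local rating and the external name hang off the same node, which is the integration and enrichment effect the slide describes.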
  • 51. RDF & Linked Data - take away points Decentralisation of Data Management Self-documented schemas & data Scale your [local] graphs to the [Enterprise] Web Big pool of stable identifiers (i.e. DBpedia)
  • 52. Pay as you go data integration You can get benefit with low effort > RDF views on top of RDBMS with R2RML (mappings, SPARQL-to-SQL translation) > Convert XML/JSON/CSV/… to RDF with RML The more time you invest the better the results > Schema development, mapping & linking > Semi-automatic link discovery with tools like Limes & Silk e.g.: ex:tv_series_123 owl:sameAs dbpedia:Friends RDF does not need to be your master dataset
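The "low effort" end of this spectrum can be illustrated by turning CSV rows into triples with a column-to-property mapping. This is not RML or R2RML, only a hand-rolled sketch of the same idea; the CSV content, the subject template and the mapping table are made up for the example.

```python
import csv
import io

# Hypothetical input; the empty cell shows how missing values are handled.
CSV_TEXT = "id,title,seasons\n123,Friends,10\n456,The Office,\n"

# Which column becomes which property (an assumption for this sketch).
COLUMN_TO_PROPERTY = {
    "title": "schema:name",
    "seasons": "schema:numberOfSeasons",
}


def rows_to_triples(csv_text, subject_template="ex:tv_series_{id}"):
    """Emit one triple per non-empty mapped cell."""
    triples = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = subject_template.format(**row)
        for column, prop in COLUMN_TO_PROPERTY.items():
            value = row.get(column)
            if value:  # skip empty cells instead of inventing values
                triples.add((subject, prop, value))
    return triples


triples = rows_to_triples(CSV_TEXT)
```

A missing cell simply produces no triple, so sparse or irregular sources do not force schema changes; linking the minted `ex:tv_series_123` identifier to `dbpedia:Friends` can be added later as a separate step.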
  • 54. Structured data on the web (Nov 2017) 28% of TLD (or 39% of HTML pages) > 3.7M Microdata > 2.7M JSON-LD > 1.2M RDFa In total 9 billion Things & 38 billion RDF triples Full report at http://webdatacommons.org/structureddata/#results-2017-1
  • 56. RDF Ontology > Less strict / formal > Promotes JSON-LD Funded & maintained by all search engines Drives many Google products...
  • 58. Schema.org && Google && Search https://developers.google.com/search/docs/guides/search-features
  • 59. Google is... Using the RDF graph model to integrate diverse data from webpages & emails By using the concept of Linked Data And this is all empowered by a common ontology (or schema)
  • 61. GeoPhy provides value, risk, & quality metrics for every building in the world
  • 62. RDF @GeoPhy We collect & integrate a lot of data > on properties, on their surroundings, and on the market conditions Master dataset on Real Estate (aka Knowledge Graph) > driving our Machine Learning / Deep Learning models Challenges... > We have thousands of sources > Sources are updated at arbitrary intervals > We get our data in CSV, on good days And, of course… we are not Google to make people write RDF for us :-)
  • 63. GeoPhy Data Management Platform CSV PDF GeoPhy Ontologies Transform To RDF Validate Identify & Deduplicate Conflict resolution Data Fusion Data Wrangling & Extraction Annotation & Provenance Modeling Mapping CoreDB Provenance (value-level) Data Indexing Data Ingestion Data Enrichment Dependency Detection Geo Enrichment Trigger ML/DL API
  • 64. And the closing slide... People think RDF is a pain because it is complicated. The truth is even worse. RDF is painfully simplistic, but it allows you to work with real-world data and problems that are horribly complicated. While you can avoid RDF, it is harder to avoid complicated data and complicated computer problems. Dan Brickley, Schema.org and Google Libby Miller, BBC
  • 65. Thank you for your attention Questions? Many thanks to Sander, Matt and the whole GeoPhy Eng. Team for their feedback