SlideShare ist ein Scribd-Unternehmen logo
1 von 14
Downloaden Sie, um offline zu lesen
Data Quality
In Real Estate
Dimitris Kontokostas, Andy van der Hoeven, Samur Araujo
Amsterdam, Sep 14th 2017, LDQ Workshop, SEMANTiCS Conference
About Geophy
● Goal to map all buildings in the world
● Provide a quality score for each building
○ Based on location, building status, history, environmental metrics, etc
● Semantic platform
○ RDF eases the data integration process
● Team of 45 with aim to double by next year
Real Estate is a very complex domain
Really!
Possible constraints on addresses?
● An address will start with, or at least include, a building number.
● When there is a building number, it will be all-numeric.
● No buildings are numbered zero
● Well, at the very least no buildings have negative numbers
● A building number will only be used once per street
● A building will only have one number
● A building name won't also be a number
● [...] https://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses
Geophy [set of] ontologies
● 13 ontologies (+ 9 external)
● 125 Classes
○ Buildings
○ Addresses
○ Companies
○ [...]
● 720 properties
○ 500 datatype
○ 160 relation properties
● Growing...
Quality is expensive
● Quality of source data
○ Free, open, closed data sources, etc.
● Data clean up process
○ Violations, deduplication, precision, etc.
○ How much time and effort can one afford?
How much quality is good enough?
Fitness for use
Quality of ...
● Source data
○ Accuracy of the source
● Translation of source data
○ RDF mappings, rml, d2rq, scripts etc.
● Model design
○ Modelling quality
○ Data fitting on schema
● Model definition
○ Mapping of model on RDFS, OWL, ShEx|SHACL Shapes, etc
○ Semantics i.e RDFS, OWL DL/RL/FULL, etc
Evolution & quality
Data evolves
so do ontologies
so do RDF mappings
so does code
so do SPARQL queries
so do constraints
http://aligned-project.eu
Scaling quality ...
● Thousands of triples
● Millions of triples
● Billions of triples
● ?
Try to move validation in the K range (when possible)
Validate closer to the source
Validate the model
Validate the RDF mappings
Validate RDF mapping excerpts
Validate instance data
Automate, automate & automate
Can you spot the error?
rdfs:label ⇒ rdf:langString
:foo rdfs:label ″foo @en″ .
Automate, automate & automate
Can you spot the error?
rdfs:label ⇒ rdf:langString
:foo rdfs:label ″foo @en″ .
:foo rdfs:label ″foo″@en .
CI/CD is your buddy
● Integrate validation with your CI/CD
○ Choose tools & technologies wisely
○ Jenkins, Travis, Gitlab, TeamCity
● Fail the build until data issues are fixed
● Data integration validation checks
○ Standalone datasets can pass CI
Thank you for your attention
Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsNeo4j
 
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...Joshua Shinavier
 
Semantic Cartography: Using ontologies to create adaptable tools for text exp...
Semantic Cartography: Using ontologies to create adaptable tools for text exp...Semantic Cartography: Using ontologies to create adaptable tools for text exp...
Semantic Cartography: Using ontologies to create adaptable tools for text exp...andyashton
 
An Algebraic Data Model for Graphs and Hypergraphs (Category Theory meetup, N...
An Algebraic Data Model for Graphs and Hypergraphs (Category Theory meetup, N...An Algebraic Data Model for Graphs and Hypergraphs (Category Theory meetup, N...
An Algebraic Data Model for Graphs and Hypergraphs (Category Theory meetup, N...Joshua Shinavier
 
Theory behind Image Compression and Semantic Search
Theory behind Image Compression and Semantic SearchTheory behind Image Compression and Semantic Search
Theory behind Image Compression and Semantic SearchSanti Adavani
 
Cogapp Open Studios 2012 - Adventures with Linked Data
Cogapp Open Studios 2012 - Adventures with Linked DataCogapp Open Studios 2012 - Adventures with Linked Data
Cogapp Open Studios 2012 - Adventures with Linked DataCogapp
 
Managing RDF data with graph databases
Managing RDF data with graph databasesManaging RDF data with graph databases
Managing RDF data with graph databasesGraph-TA
 
Open data easy, explicit and fast
Open data easy, explicit and fastOpen data easy, explicit and fast
Open data easy, explicit and fastMetaSolutions AB
 
Why is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz IncWhy is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz IncFranz Inc. - AllegroGraph
 
Dirk Goldhahn: Introduction to the German Wortschatz Project
Dirk Goldhahn: Introduction to the German Wortschatz ProjectDirk Goldhahn: Introduction to the German Wortschatz Project
Dirk Goldhahn: Introduction to the German Wortschatz Projectmbruemmer
 
Indexing, searching, and aggregation with redi search and .net
Indexing, searching, and aggregation with redi search and .netIndexing, searching, and aggregation with redi search and .net
Indexing, searching, and aggregation with redi search and .netStephen Lorello
 
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Fabrizio Orlandi
 
Deriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataDeriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataGraph-TA
 
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUSSemantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUSHarsh Thakkar
 

Was ist angesagt? (20)

Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
 
JSON-LD and SHACL for Knowledge Graphs
JSON-LD and SHACL for Knowledge GraphsJSON-LD and SHACL for Knowledge Graphs
JSON-LD and SHACL for Knowledge Graphs
 
TinkerPop 2020
TinkerPop 2020TinkerPop 2020
TinkerPop 2020
 
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
 
Semantic Cartography: Using ontologies to create adaptable tools for text exp...
Semantic Cartography: Using ontologies to create adaptable tools for text exp...Semantic Cartography: Using ontologies to create adaptable tools for text exp...
Semantic Cartography: Using ontologies to create adaptable tools for text exp...
 
An Algebraic Data Model for Graphs and Hypergraphs (Category Theory meetup, N...
An Algebraic Data Model for Graphs and Hypergraphs (Category Theory meetup, N...An Algebraic Data Model for Graphs and Hypergraphs (Category Theory meetup, N...
An Algebraic Data Model for Graphs and Hypergraphs (Category Theory meetup, N...
 
Theory behind Image Compression and Semantic Search
Theory behind Image Compression and Semantic SearchTheory behind Image Compression and Semantic Search
Theory behind Image Compression and Semantic Search
 
Christian Jakenfelds
Christian JakenfeldsChristian Jakenfelds
Christian Jakenfelds
 
LD4KD 2015 - Demos and tools
LD4KD 2015 - Demos and toolsLD4KD 2015 - Demos and tools
LD4KD 2015 - Demos and tools
 
Cogapp Open Studios 2012 - Adventures with Linked Data
Cogapp Open Studios 2012 - Adventures with Linked DataCogapp Open Studios 2012 - Adventures with Linked Data
Cogapp Open Studios 2012 - Adventures with Linked Data
 
Managing RDF data with graph databases
Managing RDF data with graph databasesManaging RDF data with graph databases
Managing RDF data with graph databases
 
Open data easy, explicit and fast
Open data easy, explicit and fastOpen data easy, explicit and fast
Open data easy, explicit and fast
 
Why is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz IncWhy is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz Inc
 
Presentation shexer
Presentation shexerPresentation shexer
Presentation shexer
 
NoSQL Roundup
NoSQL RoundupNoSQL Roundup
NoSQL Roundup
 
Dirk Goldhahn: Introduction to the German Wortschatz Project
Dirk Goldhahn: Introduction to the German Wortschatz ProjectDirk Goldhahn: Introduction to the German Wortschatz Project
Dirk Goldhahn: Introduction to the German Wortschatz Project
 
Indexing, searching, and aggregation with redi search and .net
Indexing, searching, and aggregation with redi search and .netIndexing, searching, and aggregation with redi search and .net
Indexing, searching, and aggregation with redi search and .net
 
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
 
Deriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataDeriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF Data
 
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUSSemantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
 

Ähnlich wie Data quality in Real Estate

Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dan Lynn
 
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...Neo4j
 
Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dan Lynn
 
Brett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4jBrett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4jBrett Ragozzine
 
Domain driven design: a gentle introduction
Domain driven design:  a gentle introductionDomain driven design:  a gentle introduction
Domain driven design: a gentle introductionAsher Sterkin
 
Nov 2019 kafka with mongo db and confluent sydney
Nov 2019 kafka with mongo db and confluent   sydneyNov 2019 kafka with mongo db and confluent   sydney
Nov 2019 kafka with mongo db and confluent sydneyAndrew Blades
 
Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Holden Karau
 
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Aaron Saray
 
General introduction to AI ML DL DS
General introduction to AI ML DL DSGeneral introduction to AI ML DL DS
General introduction to AI ML DL DSRoopesh Kohad
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022ArangoDB Database
 
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022ArangoDB Database
 
Machine Learning + Graph Databases for Better Recommendations
Machine Learning + Graph Databases for Better RecommendationsMachine Learning + Graph Databases for Better Recommendations
Machine Learning + Graph Databases for Better RecommendationsChristopherWoodward16
 
Introduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & BahrainIntroduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & BahrainNeo4j
 
Devday @ Sahaj - Domain Specific NLP Pipelines
Devday @ Sahaj -  Domain Specific NLP PipelinesDevday @ Sahaj -  Domain Specific NLP Pipelines
Devday @ Sahaj - Domain Specific NLP PipelinesRajesh Muppalla
 
Scala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadiusScala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadiusBoldRadius Solutions
 
Drill Lightning London Big Data
Drill Lightning London Big DataDrill Lightning London Big Data
Drill Lightning London Big DataMapR Technologies
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"Rob Winters
 

Ähnlich wie Data quality in Real Estate (20)

Anything-to-Graph
Anything-to-GraphAnything-to-Graph
Anything-to-Graph
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
 
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
 
Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016
 
Brett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4jBrett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4j
 
Domain driven design: a gentle introduction
Domain driven design:  a gentle introductionDomain driven design:  a gentle introduction
Domain driven design: a gentle introduction
 
Nov 2019 kafka with mongo db and confluent sydney
Nov 2019 kafka with mongo db and confluent   sydneyNov 2019 kafka with mongo db and confluent   sydney
Nov 2019 kafka with mongo db and confluent sydney
 
MongoDB Basics
MongoDB BasicsMongoDB Basics
MongoDB Basics
 
Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016
 
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
 
General introduction to AI ML DL DS
General introduction to AI ML DL DSGeneral introduction to AI ML DL DS
General introduction to AI ML DL DS
 
L15.pptx
L15.pptxL15.pptx
L15.pptx
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
 
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
 
Machine Learning + Graph Databases for Better Recommendations
Machine Learning + Graph Databases for Better RecommendationsMachine Learning + Graph Databases for Better Recommendations
Machine Learning + Graph Databases for Better Recommendations
 
Introduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & BahrainIntroduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & Bahrain
 
Devday @ Sahaj - Domain Specific NLP Pipelines
Devday @ Sahaj -  Domain Specific NLP PipelinesDevday @ Sahaj -  Domain Specific NLP Pipelines
Devday @ Sahaj - Domain Specific NLP Pipelines
 
Scala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadiusScala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadius
 
Drill Lightning London Big Data
Drill Lightning London Big DataDrill Lightning London Big Data
Drill Lightning London Big Data
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"
 

Mehr von Dimitris Kontokostas

Data quality assessment - connecting the pieces...
Data quality assessment - connecting the pieces...Data quality assessment - connecting the pieces...
Data quality assessment - connecting the pieces...Dimitris Kontokostas
 
8th DBpedia meeting / California 2016
8th DBpedia meeting /  California 20168th DBpedia meeting /  California 2016
8th DBpedia meeting / California 2016Dimitris Kontokostas
 
Semantically enhanced quality assurance in the jurion business use case
Semantically enhanced quality assurance in the jurion  business use caseSemantically enhanced quality assurance in the jurion  business use case
Semantically enhanced quality assurance in the jurion business use caseDimitris Kontokostas
 
Graph databases & data integration - the case of RDF
Graph databases & data integration - the case of RDFGraph databases & data integration - the case of RDF
Graph databases & data integration - the case of RDFDimitris Kontokostas
 
DBpedia+ / DBpedia meeting in Dublin
DBpedia+ / DBpedia meeting in DublinDBpedia+ / DBpedia meeting in Dublin
DBpedia+ / DBpedia meeting in DublinDimitris Kontokostas
 
NLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsNLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsDimitris Kontokostas
 
RDFUnit - Test-Driven Linked Data quality Assessment (WWW2014)
RDFUnit - Test-Driven Linked Data quality Assessment (WWW2014)RDFUnit - Test-Driven Linked Data quality Assessment (WWW2014)
RDFUnit - Test-Driven Linked Data quality Assessment (WWW2014)Dimitris Kontokostas
 
DBpedia i18n - Amsterdam Meeting (30/01/2014)
DBpedia i18n - Amsterdam Meeting (30/01/2014)DBpedia i18n - Amsterdam Meeting (30/01/2014)
DBpedia i18n - Amsterdam Meeting (30/01/2014)Dimitris Kontokostas
 

Mehr von Dimitris Kontokostas (12)

Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Data quality assessment - connecting the pieces...
Data quality assessment - connecting the pieces...Data quality assessment - connecting the pieces...
Data quality assessment - connecting the pieces...
 
8th DBpedia meeting / California 2016
8th DBpedia meeting /  California 20168th DBpedia meeting /  California 2016
8th DBpedia meeting / California 2016
 
Semantically enhanced quality assurance in the jurion business use case
Semantically enhanced quality assurance in the jurion  business use caseSemantically enhanced quality assurance in the jurion  business use case
Semantically enhanced quality assurance in the jurion business use case
 
Graph databases & data integration - the case of RDF
Graph databases & data integration - the case of RDFGraph databases & data integration - the case of RDF
Graph databases & data integration - the case of RDF
 
DBpedia past, present & future
DBpedia past, present & futureDBpedia past, present & future
DBpedia past, present & future
 
DBpedia+ / DBpedia meeting in Dublin
DBpedia+ / DBpedia meeting in DublinDBpedia+ / DBpedia meeting in Dublin
DBpedia+ / DBpedia meeting in Dublin
 
DBpedia ♥ Commons
DBpedia ♥ CommonsDBpedia ♥ Commons
DBpedia ♥ Commons
 
NLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsNLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology Constraints
 
RDFUnit - Test-Driven Linked Data quality Assessment (WWW2014)
RDFUnit - Test-Driven Linked Data quality Assessment (WWW2014)RDFUnit - Test-Driven Linked Data quality Assessment (WWW2014)
RDFUnit - Test-Driven Linked Data quality Assessment (WWW2014)
 
DBpedia Viewer - LDOW 2014
DBpedia Viewer - LDOW 2014DBpedia Viewer - LDOW 2014
DBpedia Viewer - LDOW 2014
 
DBpedia i18n - Amsterdam Meeting (30/01/2014)
DBpedia i18n - Amsterdam Meeting (30/01/2014)DBpedia i18n - Amsterdam Meeting (30/01/2014)
DBpedia i18n - Amsterdam Meeting (30/01/2014)
 

Kürzlich hochgeladen

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 

Data quality in Real Estate

  • 1. Data Quality In Real Estate Dimitris Kontokostas, Andy van der Hoeven, Samur Araujo Amsterdam, Sep 14th 2017, LDQ Workshop, SEMANTiCS Conference
  • 2. About Geophy ● Goal to map all buildings in the world ● Provide a quality score for each building ○ Based on location, building status, history, environmental metrics, etc ● Semantic platform ○ RDF eases the data integration process ● Team of 45 with aim to double by next year
  • 3. Real Estate is a very complex domain Really!
  • 4. Possible constraints on addresses? ● An address will start with, or at least include, a building number. ● When there is a building number, it will be all-numeric. ● No buildings are numbered zero ● Well, at the very least no buildings have negative numbers ● A building number will only be used once per street ● A building will only have one number ● A building name won't also be a number ● [...] https://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses
  • 5. Geophy [set of] ontologies ● 13 ontologies (+ 9 external) ● 125 Classes ○ Buildings ○ Addresses ○ Companies ○ [...] ● 720 properties ○ 500 datatype ○ 160 relation properties ● Growing...
  • 6. Quality is expensive ● Quality of source data ○ Free, open, closed data sources, etc. ● Data clean up process ○ Violations, deduplication, precision, etc. ○ How much time and effort can one afford? How much quality is good enough? Fitness for use
  • 7. Quality of ... ● Source data ○ Accuracy of the source ● Translation of source data ○ RDF mappings, rml, d2rq, scripts etc. ● Model design ○ Modelling quality ○ Data fitting on schema ● Model definition ○ Mapping of model on RDFS, OWL, ShEx|SHACL Shapes, etc ○ Semantics i.e RDFS, OWL DL/RL/FULL, etc
  • 8. Evolution & quality Data evolves so do ontologies so do RDF mappings so does code so do SPARQL queries so do constraints http://aligned-project.eu
  • 9. Scaling quality ... ● Thousands of triples ● Millions of triples ● Billions of triples ● ? Try to move validation in the K range (when possible)
  • 10. Validate closer to the source Validate the model Validate the RDF mappings Validate RDF mapping excerpts Validate instance data
  • 11. Automate, automate & automate Can you spot the error? rdfs:label ⇒ rdf:langString :foo rdfs:label ″foo @en″ .
  • 12. Automate, automate & automate Can you spot the error? rdfs:label ⇒ rdf:langString :foo rdfs:label ″foo @en″ . :foo rdfs:label ″foo″@en .
  • 13. CI/CD is your buddy ● Integrate validation with your CI/CD ○ Choose tools & technologies wisely ○ Jenkins, Travis, Gitlab, TeamCity ● Fail the build until data issues are fixed ● Data integration validation checks ○ Standalone datasets can pass CI
  • 14. Thank you for your attention Questions?