Data Quality
Data Quality at the Open Data Hub and
why it is everybody's business
Peter Hopfgartner - ODH Day
What is Data Quality
Possible definition
Data is [...] of high quality if it is
"fit for [its] intended uses in
operations, decision making and
planning".
[https://en.wikipedia.org/wiki/Data_quality]
Dimensions of Data
Quality
This can be a very complex topic.
No general agreement on the
attributes.
Some examples:
➔ accuracy
➔ completeness
➔ consistency
Importance of Data
Quality
Hypothetical example:
Google Maps
Google:
➔ has the geographic data
➔ builds services on data
➔ can evaluate the quality of
the data from usage
◆ did you arrive at the correct
place?
◆ how happy were you with the
navigation?
◆ did you use the route that
Google Maps proposed?
Data of the ODH
Right now, data is used in many
web pages:
https://sparql.opendatahub.bz.it/predefined/accommodation?Id=4E44BB8764234E7B937652859CB7BBAD
(Good) data generates
value
ODH data can be used for many
purposes:
➔ Tourism
➔ Research
➔ Mobility
Inspecting data
Incorrect Data 1
Finding wrong data
# On top of the world?
PREFIX schema: <http://schema.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX : <http://noi.example.org/ontology/odh#>
SELECT ?h ?pos ?posLabel ?altitude
WHERE {
  ?h a schema:LodgingBusiness ;
     schema:name ?posLabel ;
     geo:defaultGeometry/geo:asWKT ?pos ;
     schema:geo/schema:elevation ?altitude .
  FILTER (lang(?posLabel) = 'de')
  FILTER (?altitude > 3500)
} ORDER BY DESC(?altitude)
Restaurants in the
stratosphere?
posLabel                    altitude
Garni Wieterer              26430
Alberhof                    10060
BNB Kreithof                4002
Ferienwohnungen Steinegg    4000
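The same range check can also be run outside the SPARQL endpoint, for example on an exported result set. The following is a minimal Python sketch, not part of the original slides. It assumes each record is a plain dict with the hypothetical keys `name` and `altitude`; the 3500 m threshold comes from the query's FILTER, and the sample rows are the four results listed on this slide.

```python
# Threshold taken from the query's FILTER (?altitude > 3500).
MAX_PLAUSIBLE_ALTITUDE_M = 3500


def implausible_altitudes(records, limit=MAX_PLAUSIBLE_ALTITUDE_M):
    """Return the records whose altitude exceeds the limit, highest first."""
    flagged = [r for r in records if r["altitude"] > limit]
    return sorted(flagged, key=lambda r: r["altitude"], reverse=True)


# Rows as shown on the result slide; the dict keys are an assumption.
rows = [
    {"name": "BNB Kreithof", "altitude": 4002},
    {"name": "Garni Wieterer", "altitude": 26430},
    {"name": "Ferienwohnungen Steinegg", "altitude": 4000},
    {"name": "Alberhof", "altitude": 10060},
]

for row in implausible_altitudes(rows):
    print(f'{row["name"]}: {row["altitude"]} m')
```

A value exactly at the limit is not flagged, which mirrors the strict `>` comparison in the query.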
Incorrect Data 2
Finding wrong data
# Tropical South Tyrol?
PREFIX schema: <http://schema.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
SELECT ?h ?pos ?posLabel ?box ?boxColor ?posColor {
  ?h a schema:LodgingBusiness ;
     schema:name ?posLabel ;
     schema:geo [ schema:latitude ?lat ;
                  schema:longitude ?long ] ;
     geo:defaultGeometry/geo:asWKT ?pos .
  FILTER (?lat = 0.0 || ?long = 0.0) .
  FILTER (lang(?posLabel) = 'de') .
}
Incorrect Data 3
Wrong position
PREFIX schema: <http://schema.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
SELECT *
WHERE {
  ?h a schema:LodgingBusiness ;
     geo:defaultGeometry/geo:asWKT ?hPos ;
     schema:name ?hPosLabel ;
     schema:address ?a .
  ?a schema:postalCode "39100" .
}
Incorrect Data 2
Eno-Camping?
14 hotels are located on "Null Island" (latitude or longitude 0.0)
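The "Null Island" test is simple enough to run on any list of coordinates before the data is published. The following is a minimal Python sketch, not part of the original slides. It mirrors the query's condition (latitude or longitude exactly 0.0); the record keys `name`, `lat` and `lon` and the sample records are hypothetical.

```python
def on_null_island(record):
    """True if latitude or longitude is exactly 0.0, as in the query's FILTER."""
    return record["lat"] == 0.0 or record["lon"] == 0.0


# Hypothetical sample records; names and coordinates are made up.
hotels = [
    {"name": "Hotel A", "lat": 46.4983, "lon": 11.3548},
    {"name": "Hotel B", "lat": 0.0, "lon": 0.0},
    {"name": "Hotel C", "lat": 46.67, "lon": 0.0},
]

suspects = [h["name"] for h in hotels if on_null_island(h)]
print(suspects)
```

A zero in only one coordinate is flagged as well, since a missing value is often stored as 0 in just one field.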
Incomplete Data 1
Hotels without rooms
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX schema: <http://schema.org/>
SELECT ?pos ?posColor ?bName (?bName AS ?posLabel)
WHERE {
  ?b a schema:LodgingBusiness ;
     schema:name ?bName ;
     geo:defaultGeometry/geo:asWKT ?pos .
  MINUS {
    ?r schema:containedInPlace ?b .
    ?r a schema:Accommodation
  }
  FILTER (lang(?bName) = 'en')
}
Incomplete Data 2
Is this Betten-Stop?
449 Hotels have no rooms
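The MINUS pattern in the query is an anti-join: keep every lodging business that no accommodation points to. The following is a minimal Python sketch of the same idea, not part of the original slides. The identifiers and the `contained_in` key are hypothetical stand-ins for `schema:containedInPlace`.

```python
def hotels_without_rooms(hotel_ids, rooms):
    """Return the hotel ids that no room references via 'contained_in'."""
    hotels_with_rooms = {room["contained_in"] for room in rooms}
    return [h for h in hotel_ids if h not in hotels_with_rooms]


# Hypothetical sample data.
hotel_ids = ["h1", "h2", "h3"]
rooms = [
    {"id": "r1", "contained_in": "h1"},
    {"id": "r2", "contained_in": "h1"},
    {"id": "r3", "contained_in": "h3"},
]

print(hotels_without_rooms(hotel_ids, rooms))
```

Collecting the referenced hotel ids into a set first keeps the check linear in the number of hotels and rooms.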
Incomplete Data 3
Missing names
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <http://noi.example.org/ontology/odh#>
SELECT * WHERE {
  ?h a ?c .
  OPTIONAL {
    ?h schema:name ?hName .
    FILTER (langMatches(lang(?hName), "de"))
  }
  VALUES (?c ?p) {
    #(schema:FoodEstablishment schema:name)
    (schema:LodgingBusiness schema:name)
    (schema:FoodEstablishment schema:name)
  }
  VALUES (?l) {
    ("it") ("de") ("en")
  }
  MINUS {
    VALUES (?l) {
      ("it") ("de") ("en")
    }
    ?h ?p ?n .
    FILTER(lang(?n) = ?l)
    FILTER(str(?n) != "")
  }
}
LIMIT 1000
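The query reports every entity and language for which no non-empty name exists. The following is a minimal Python sketch of that completeness check, not part of the original slides. It assumes each entity is a dict with a hypothetical `id` and a `names` mapping from language tag to string; the three languages are the ones listed in the query.

```python
# Languages checked by the query: Italian, German, English.
REQUIRED_LANGUAGES = ("it", "de", "en")


def missing_names(entities, languages=REQUIRED_LANGUAGES):
    """Yield (entity id, language) pairs that lack a non-empty name."""
    for entity in entities:
        names = entity.get("names", {})
        for lang in languages:
            # A missing key and an empty string both count as missing,
            # matching the query's str(?n) != "" condition.
            if not names.get(lang, ""):
                yield entity["id"], lang


# Hypothetical sample data.
entities = [
    {"id": "h1", "names": {"it": "Albergo Uno", "de": "Gasthof Eins", "en": "Inn One"}},
    {"id": "h2", "names": {"it": "Albergo Due", "de": ""}},
]

for entity_id, lang in missing_names(entities):
    print(f"{entity_id}: no name in '{lang}'")
```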
Inconsistent Data 1
Misplaced Hotels
PREFIX uom: <http://www.opengis.net/def/uom/OGC/1.0/>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <http://noi.example.org/ontology/odh#>
SELECT ?hPos
       (CONCAT(?hName, " (", ?munName, ") ", str(?distance), " km away") AS ?hPosLabel)
       ?munName ?distance
WHERE {
  ?h a schema:LodgingBusiness ;
     schema:name ?hName ;
     geo:defaultGeometry/geo:asWKT ?hPos ;
     schema:containedInPlace [
       a :Municipality ;
       schema:name ?munName ;
       geo:defaultGeometry/geo:asWKT ?munPos ] .
  FILTER(langMatches(lang(?hName), "de"))
  FILTER(langMatches(lang(?munName), "de"))
  FILTER(!geof:sfContains(?munPos, ?hPos))
  BIND(geof:distance(?hPos, ?munPos, uom:metre)/1000 AS ?distance)
  FILTER(?distance > 1)
}
Inconsistent Data 2
Misplaced Hotels
20 Hotels are more than 10 km from their
municipality
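A rough version of this consistency check can be done without a GeoSPARQL engine. The following is a minimal Python sketch, not part of the original slides, and it is a simplification: instead of `geof:sfContains` and `geof:distance` on the municipality geometry, it measures the great-circle (haversine) distance to a single municipality reference point. The record keys, the sample coordinates and the default 10 km threshold (taken from this slide's summary) are assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def misplaced(hotels, max_km=10.0):
    """Return (name, distance in km) for hotels farther than max_km
    from the reference point of their declared municipality."""
    result = []
    for h in hotels:
        d = haversine_km(h["lat"], h["lon"], h["mun_lat"], h["mun_lon"])
        if d > max_km:
            result.append((h["name"], round(d, 1)))
    return result


# Hypothetical sample data; Hotel B sits one degree of latitude off.
hotels = [
    {"name": "Hotel A", "lat": 46.5, "lon": 11.35, "mun_lat": 46.5, "mun_lon": 11.35},
    {"name": "Hotel B", "lat": 47.5, "lon": 11.35, "mun_lat": 46.5, "mun_lon": 11.35},
]

print(misplaced(hotels))
```

Because a reference point ignores the municipality's extent, a threshold well above the size of a typical municipality avoids flagging hotels that are merely near the border.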
Inspection tools
The Open Data Hub has
➔ Well integrated data
➔ Powerful inspection tools to
filter and compare data
➔ Full data querying language
Who should care?
Data providers
Can easily fix data upstream, if
they are aware of the issues.
ODH/Data integrators
Can run many routine checks.
Data users
Can notify ODH or the data provider
of any issue they find.
We make data usable

Peter Hopfgartner- Ontopic - Data quality of the open data hub and why it is everybody's business