This presentation was given in a session dedicated to OpenStreetMap studies during the annual meeting of the Association of American Geographers (AAG) in Chicago, IL.
#AAG2015 presentation on OSM attribute inconsistency and semantic heterogeneity
1. An intrinsic approach for the detection and correction of attributive
inconsistencies and semantic heterogeneity in OSM data
Martin Loidl | martin.loidl@sbg.ac.at
Stefan Keller | sfkeller@hsr.ch
AAG Annual Meeting – Workshop OpenStreetMap Studies
Chicago, April 24th 2015
2. OSM bottom-up community approach
Rudimentary data model and attribute structure (tagging scheme key = value)
Attributes: recommendations ≠ conventions ≠ formalized standard
No restriction of tag usage and definition
Problem Statement
http://www.openstreetmap.es
3. Within one way
Within a succession of ways (e.g. street)
Attributive Inconsistencies
highway = motorway
name = Kennedy Expressway
bicycle = yes
highway = motorway
name = Kennedy Expressway
ref = I 90
highway = motorway
name = Fisher Freeway
ref = I 90
highway = motorway
name = Kennedy Expressway
ref = I 90
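The two cases on this slide can be checked mechanically. The following is a minimal sketch, assuming each way is held as a plain Python dict of tags and that a "succession of ways" can be approximated by grouping on `ref`. The function names and the sample data are hypothetical and modelled on the slide's example tags; a deviating name is only a candidate for review, since one route can legitimately carry several names.

```python
from collections import Counter, defaultdict

# Hypothetical sample modelled on the example tags shown on the slide.
ways = [
    {"id": 1, "highway": "motorway", "name": "Kennedy Expressway", "bicycle": "yes"},
    {"id": 2, "highway": "motorway", "name": "Kennedy Expressway", "ref": "I 90"},
    {"id": 3, "highway": "motorway", "name": "Fisher Freeway", "ref": "I 90"},
    {"id": 4, "highway": "motorway", "name": "Kennedy Expressway", "ref": "I 90"},
]


def within_way_conflicts(ways):
    """Ids of ways whose own tags contradict each other (motorway open to bicycles)."""
    return [
        w["id"]
        for w in ways
        if w.get("highway") == "motorway" and w.get("bicycle") == "yes"
    ]


def succession_outliers(ways, group_key="ref", attr="name"):
    """Ids of ways whose `attr` deviates from the majority value within their group."""
    groups = defaultdict(list)
    for w in ways:
        if group_key in w and attr in w:
            groups[w[group_key]].append(w)
    outliers = []
    for members in groups.values():
        majority, _ = Counter(m[attr] for m in members).most_common(1)[0]
        outliers.extend(m["id"] for m in members if m[attr] != majority)
    return outliers


conflicts = within_way_conflicts(ways)   # inconsistency within one way
outliers = succession_outliers(ways)     # inconsistency within a succession of ways
```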
4. Different (correct) description for one and the same entity
Specific to crowd-sourced data (≠ authoritative data, which follow
strict specifications)
Semantic Heterogeneity
highway = cycleway
foot = designated
width = 3
highway = path
bicycle = designated
foot = yes
highway = footway
bicycle = designated
surface = asphalt
5. Considering attributive inconsistencies and semantic
heterogeneity is relevant for …
Visualization (data rendering)
Descriptive statistics (classification)
Spatial analysis (e.g. routing)
Improve results through
Harmonization (remove semantic heterogeneities)
Correction through estimation (gaps, inconsistencies)
Relevance
6. Spatial data quality
Standards (e.g. ISO 19157 = harmonization of
multiple preceding standards) and extensive
body of literature of limited use for OSM data
Quality assessment of OSM data
Primarily focusing on positional accuracy and
geometrical completeness
Reference data set and/or descriptive
statistics
Comparatively little work on attribute quality
Data Quality
Haklay 2010
Hochmair et al. 2015
Barron et al. 2014
7. Why an intrinsic approach?
Extrinsic approach requires reference data set,
which ideally has:
Same geographical coverage
Same data model and attribute structure
[Koukoletsos et al. (2012): multi-stage process
to deal with it to a certain extent]
Quality of reference data set (authoritative data
doesn't necessarily imply better data!)
Data often created for very different purposes
Quality Assessment
Elsbethen (Austria):
authoritative data –
OSM data
8. Exclusively based on respective data set (data-centered
approach)
Makes use of:
Redundancy
Inherent logic, functionally related attributes
Intrinsic Approach
Translation into query
statements
highway = * surface = *
tracktype = *
9. Case Study Area
4,600 km² in Austrian-Bavarian
border region
~ 22,600 km total network length
Rural and urban areas
Data preparation
Extraction from OSM Database
(April 1st 2015)
Conversion to topologically correct
graph (edge-node) in GeoDB
10. Major Road Network
Major road = motorway, primary, secondary
(incl. links)
Consistent for road category (highway = *)
Makes features mappable = primary
intent/purpose of OSM
Attributes incomplete (n = 11,951 segments)
name = *: 64.6%
surface = *: 22.93% [can be estimated: asphalt]
maxspeed = *: 72.19%
lanes = *: 57.86%
Rather an issue of completeness than of
inconsistency and heterogeneity
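Completeness shares like the ones above can be computed directly from the tag set. This is a minimal sketch, assuming segments are plain dicts of tags and that an empty string counts as missing. The function name and the four sample segments are hypothetical and do not reproduce the case study figures.

```python
def completeness(segments, keys):
    """Share of segments (in percent) carrying a non-empty value for each key."""
    total = len(segments)
    if total == 0:
        return {key: 0.0 for key in keys}
    shares = {}
    for key in keys:
        present = sum(1 for seg in segments if seg.get(key) not in (None, ""))
        shares[key] = round(100.0 * present / total, 2)
    return shares


# Hypothetical mini data set of major road segments.
segments = [
    {"highway": "primary", "name": "A", "maxspeed": "50"},
    {"highway": "primary", "name": "B"},
    {"highway": "secondary", "surface": "asphalt", "maxspeed": "80"},
    {"highway": "motorway", "name": "", "lanes": "2"},
]

shares = completeness(segments, ["name", "surface", "maxspeed", "lanes"])
```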
11. Local Road Network
Majority of ways in OSM
Differences in terms of attribute
quality (existence, consistency etc.)
Relevant e.g. for active modes of
transport (cycling, hiking etc.)
In many cases more extensive
(spatial coverage, attribute details)
than authoritative data
12. Define set of logical/legal contradictions
Connect to corresponding tags
Tag specification according to Wiki
Query the dataset for contradictions
Attributive Inconsistencies
approx. 1 in 1,000
("tracktype" = 'grade3' or "tracktype" = 'grade4' or "tracktype" = 'grade5')
and "surface" = 'asphalt'
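The where clause above can be tried out on a toy table. This is a minimal sketch using Python's built-in `sqlite3` module, not the GeoDB used in the case study. The table name `ways`, its column layout and the four sample rows are assumptions for illustration.

```python
import sqlite3

# Where clause as shown on the slide: poorly graded tracks tagged as asphalt.
WHERE_CLAUSE = """
("tracktype" = 'grade3' or "tracktype" = 'grade4' or "tracktype" = 'grade5')
and "surface" = 'asphalt'
"""

conn = sqlite3.connect(":memory:")
# Hypothetical table layout: one row per way, one column per tag key.
conn.execute(
    "CREATE TABLE ways (id INTEGER PRIMARY KEY, highway TEXT, tracktype TEXT, surface TEXT)"
)
conn.executemany(
    "INSERT INTO ways VALUES (?, ?, ?, ?)",
    [
        (1, "track", "grade1", "asphalt"),  # consistent
        (2, "track", "grade4", "asphalt"),  # contradiction
        (3, "track", "grade5", "ground"),   # consistent
        (4, "track", None, "asphalt"),      # tracktype missing, not flagged
    ],
)

rows = conn.execute(f"SELECT id FROM ways WHERE {WHERE_CLAUSE} ORDER BY id").fetchall()
flagged = [row[0] for row in rows]
total = conn.execute("SELECT COUNT(*) FROM ways").fetchone()[0]
share = len(flagged) / total  # fraction of ways with a contradiction
```

Ways with a missing `tracktype` are not flagged here, so the query reports contradictions only and says nothing about completeness.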
14. Correction without ground truthing = estimation
Quality of estimation depends on number of functionally
related attributes
Correction of Inconsistencies
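One way to estimate a value without ground truthing is a majority vote among ways that share the same functionally related attributes. The slides do not specify the estimator, so the following is only a sketch under that assumption. The function name, the sample data and the `surface:estimated` marker key are hypothetical; the marker is not an established OSM tag.

```python
from collections import Counter, defaultdict


def estimate_missing(ways, target, related):
    """Fill missing `target` values with the most frequent value among ways
    that share the same values for the `related` attributes."""
    votes = defaultdict(Counter)
    for way in ways:
        if way.get(target):
            votes[tuple(way.get(k) for k in related)][way[target]] += 1

    result = []
    for way in ways:
        filled = dict(way)  # leave the input untouched
        key = tuple(way.get(k) for k in related)
        if not way.get(target) and votes[key]:
            filled[target] = votes[key].most_common(1)[0][0]
            filled[target + ":estimated"] = "yes"  # hypothetical marker key
        result.append(filled)
    return result


# Hypothetical sample data.
ways = [
    {"id": 1, "highway": "primary", "surface": "asphalt"},
    {"id": 2, "highway": "primary", "surface": "asphalt"},
    {"id": 3, "highway": "primary"},
    {"id": 4, "highway": "track", "tracktype": "grade4", "surface": "gravel"},
    {"id": 5, "highway": "track", "tracktype": "grade4"},
    {"id": 6, "highway": "path"},
]

estimated = estimate_missing(ways, "surface", ["highway", "tracktype"])
```

Adding more keys to `related` narrows the groups that vote, which mirrors the point that estimation quality depends on the number of functionally related attributes. Ways without any comparable neighbour, such as the `path` in the sample, stay unfilled.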
15. How to map a mixed foot-/cycleway in OSM?
Heterogeneity
http://www.stadt-salzburg.at
16. How to map a mixed foot-/cycleway in OSM?
Co-existence vs. “tag war”
Credibility and reputation (Flanagin & Metzger 2008)
Heterogeneity
("highway" = 'footway' and ("bicycle" =
'designated' or "bicycle" = 'yes' or
"bicycle" = 'official'))
OR
("highway" = 'cycleway' and ("foot" =
'designated' or "foot" = 'yes'))
OR
("highway" = 'path' and ("foot" =
'designated' or "foot" = 'official') and
("bicycle" = 'designated' or "bicycle" =
'official'))
OR
("highway" = 'track' and ("foot" =
'designated' or "foot" = 'official') and
("bicycle" = 'designated' or "bicycle" =
'official'))
669 segments
1,202 segments
2,655 segments
73 segments
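The four query variants above can be folded into a single predicate, so that every tagging style of a mixed foot-/cycleway is recognised in one pass. This is a minimal sketch that transcribes the conditions of the query into Python, assuming tags are held in a plain dict. The function name is hypothetical, and the two sample tag sets are the ones shown on the "Different (correct) views on same entity" slide.

```python
def is_mixed_foot_cycleway(tags):
    """True if the tags match one of the four tagging variants in the query."""
    highway = tags.get("highway")
    foot = tags.get("foot")
    bicycle = tags.get("bicycle")

    if highway == "footway":
        return bicycle in {"designated", "yes", "official"}
    if highway == "cycleway":
        return foot in {"designated", "yes"}
    if highway in {"path", "track"}:
        return (foot in {"designated", "official"}
                and bicycle in {"designated", "official"})
    return False


# Two descriptions of the same facility, taken from the slides.
view_a = {"highway": "cycleway", "surface": "asphalt", "ref": "BGL 3",
          "foot": "designated", "bicycle": "designated", "segregated": "no"}
view_b = {"highway": "path", "surface": "asphalt",
          "foot": "designated", "bicycle": "designated"}

same_class = is_mixed_foot_cycleway(view_a) and is_mixed_foot_cycleway(view_b)
```

The boolean result could be stored as a derived attribute, so that rendering, statistics and routing work on one harmonized class instead of four tag combinations.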
17. Different (correct) views on same entity
Heterogeneity
highway = cycleway
surface = asphalt
ref = BGL 3
foot = designated
bicycle = designated
segregated = no
Last editor: j_cook
highway = path
surface = asphalt
foot = designated
bicycle = designated
Last editor: pyram
19. Define derived attributes that fit best for actual purpose
Harmonization of Heterogeneity
Loidl & Zagel (2014)
20. OSMAXX
Extracts OSM data
Data cleaning (capital
letters etc.) and
harmonization
(generalization)
Conversion to GIS formats
For visualization and
geospatial analysis
Harmonization of Heterogeneity
21. Inconsistency = quality issue
Can be detected with intrinsic approach
Heterogeneity = depends on purpose
Definition of derived attributes
Implement assessment routines during editing or in post-
processing?
Tag recommender system during editing (Vandecasteele & Devillers 2014)
Probabilistic approach and/or functionally related attributes
Prevent from contradiction
Data tuning in post-processing allows specification for actual purpose
Combination prevent – detect – repair (Herzog et al. 2007)
Data model issue social complexity of OSM (Spielmann 2014)
Wrap-Up
@gicycle_
gicycle.wordpress.com
Editor's notes
Figure from GIP report (Elsbethen)
+ clustering in test data set (around Goldegg … >> check if it's the same author)