Improving volunteered geographic data quality using semantic similarity measurements
1. 1/26
Improving volunteered geographic data quality
using semantic similarity measurements
Arnaud Vandecasteele - Rodolphe Devillers
Memorial University of Newfoundland, Canada
8th International Symposium on Spatial Data Quality, 30 May - 1 June 2013
8. 8/26
Introduction
Volunteered Geographic Information (VGI)
“the widespread engagement of large numbers of private citizens, often with little in the way of formal qualifications, in the creation of geographic information”
Goodchild - 2007
10. 10/26
Introduction
Data Quality & Volunteered Geographic Information
What about data quality?
Good geometric accuracy
Haklay - 2010, Girres and Touya - 2010, Ludwig et al., - 2011
But
Geographic coverage patchwork
Goodchild - 2007
Semantics can be inconsistent
Ballatore et al., - 2012, Mooney and Corcoran - 2012
11. 11/26
Introduction
VGI changed the way we
produce, publish and share Geographic Information
BUT
Semantic Quality is still an important issue
Research problem: how to improve semantic quality using a VGI approach?
12. 12/26
Semantic Similarity
What is Semantic Similarity?
How to describe a forest in OpenStreetMap:
landuse=forest
natural=wood
One concept, different representations!
Q: When should we use landuse=forest rather than natural=wood?
* https://help.openstreetmap.org/questions/324/when-should-we-use-landuseforest-rather-than-naturalwood
11 different answers and no real general agreement
13. 13/26
Semantic Similarity
How to measure semantic similarity?
Different models exist:
● Geometric Model
● Feature Model
● Alignment Model
● Network models
● Transformation Model
Semantic similarity applied to VGI:
Mooney and Corcoran - 2012
Ballatore et al., - 2012
Semantic Network created from the OpenStreetMap Wiki
Point Pattern analysis and semantic pattern
[Figure: natural=wood compared with landuse=forest. Measure?]
16. 16/26
Measuring Semantic similarity
Two entities are similar if:
1. They are referenced by similar entities
2. They reference similar entities
[Figure: graphs over entities A, B and C illustrating the two cases]
Semantic similarity: P-Rank algorithm
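The two rules above can be sketched in code. The following is a minimal, illustrative implementation of the iterative P-Rank recurrence on a small directed graph; it is not the authors' implementation. The function name `p_rank`, the parameter names, and the default values (`C=0.8`, `lam=0.5`, `iterations=10`) are assumptions chosen for the example. `lam` balances rule 1 (in-links) against rule 2 (out-links), and `C` is a damping factor.

```python
from itertools import product


def p_rank(edges, C=0.8, lam=0.5, iterations=10):
    """Illustrative P-Rank on a directed graph given as (source, target) pairs.

    Returns a dict mapping every ordered node pair (a, b) to a similarity score.
    Parameter defaults are assumptions for this sketch, not values from the talk.
    """
    edges = set(edges)  # ignore duplicate links
    nodes = sorted({n for edge in edges for n in edge})
    in_nb = {n: [] for n in nodes}   # entities that reference n
    out_nb = {n: [] for n in nodes}  # entities that n references
    for src, dst in edges:
        out_nb[src].append(dst)
        in_nb[dst].append(src)

    # Start from the identity: an entity is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}

    def mean_pair_sim(neigh_a, neigh_b, prev):
        """Average previous-round similarity over all neighbour pairs."""
        if not neigh_a or not neigh_b:
            return 0.0
        total = sum(prev[(x, y)] for x in neigh_a for y in neigh_b)
        return total / (len(neigh_a) * len(neigh_b))

    for _ in range(iterations):
        updated = {}
        for a, b in sim:
            if a == b:
                updated[(a, b)] = 1.0
            else:
                incoming = mean_pair_sim(in_nb[a], in_nb[b], sim)    # rule 1
                outgoing = mean_pair_sim(out_nb[a], out_nb[b], sim)  # rule 2
                updated[(a, b)] = C * (lam * incoming + (1 - lam) * outgoing)
        sim = updated
    return sim
```

For instance, `p_rank([("A", "C"), ("B", "C")])` scores A and B as similar because both reference C, while A and C share no neighbours and score zero.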
17. 17/26
Semantic similarity
Semantic similarity and Geography
Tobler's first law of geography
“all things are related, but nearby things are more related than distant things”
Tobler - 1970
18. 18/26
Semantic similarity
Applied Tobler's first law to semantic similarity
[Figure: a new object A in a city, with a P-Rank score for each link to another object]
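One way to picture this idea is to weight the similarity between a candidate tag and the tags of surrounding objects by distance. The slides do not say how the two are combined, so the sketch below is purely illustrative: the function name `neighbourhood_score`, the linear distance decay, the 500 m default radius, and the placeholder tags in the usage note are all assumptions.

```python
def neighbourhood_score(candidate_tag, neighbours, similarity, radius=500.0):
    """Distance-weighted mean similarity between a candidate tag and nearby tags.

    neighbours: list of (tag, distance_in_metres) pairs for surrounding objects.
    similarity: dict mapping (tag_a, tag_b) to a score in [0, 1], for example
                precomputed P-Rank scores.
    The linear decay with distance is an assumption made for this sketch.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for tag, distance in neighbours:
        if distance > radius:
            continue  # objects outside the radius are ignored
        weight = 1.0 - distance / radius  # closer objects count more
        if tag == candidate_tag:
            score = 1.0
        else:
            # Accept either key order; unknown pairs count as dissimilar.
            score = similarity.get(
                (candidate_tag, tag), similarity.get((tag, candidate_tag), 0.0)
            )
        weighted_sum += weight * score
        total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else 0.0
```

Calling `neighbourhood_score("x", [("y", 0.0), ("x", 250.0)], {("x", "y"): 0.5})` with placeholder tags `x` and `y` gives the closer object twice the weight of the one at 250 m.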
21. 21/26
OpenStreetMap Semantic Plugin (aka OSMantic)
Description
How similar are they? P-Rank scores:
A - B: 0.18
A - C: 0.35
A - D: 0.05
22. 22/26
Examples - Creation of a new object
25. 25/26
Conclusion
The next big question: when will VGI be the next authoritative dataset?
Semantic similarity can be used to enhance the quality of VGI datasets
The OSM Semantic plugin uses a collaborative approach to
reduce potential semantic inconsistency
How to improve the results:
● Use the Taginfo database to identify the most used tags
● Combine the geographic and semantic approaches (Ballatore + Mooney)
26. 26/26
Questions ?
Rodolphe Devillers
Marine Geomatics Lab
http://www.marinegis.com/
Memorial University of Newfoundland
Acknowledgements
Natural Science and Engineering Research Council of Canada (NSERC)
Andrea Ballatore for sharing his results