Quantifying Error in Training Data for Mapping and Monitoring the Earth System - A Workshop on “Quantifying Error in Training Data for Mapping and Monitoring the Earth System” was held on January 8-9, 2019 at Clark University, with support from Omidyar Network’s Property Rights Initiative, now PlaceFund.
6. Label Maker
OpenStreetMap is an attractive label/tag database for machine learning
applications: thousands of users rapidly update its mapped objects every day.
8. Training Data Completeness Matters
- OSM tag/label info and popularity
Tag info in France. Landuse is one of the tags that users have applied frequently.
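As a rough illustration of checking tag popularity for an area of interest, the sketch below counts how often each tag key appears across a set of features. It assumes the features have already been extracted as plain tag dictionaries; the function name `tag_key_frequency` and the sample data are hypothetical and are not taken from the slides or from any OSM tool.

```python
from collections import Counter


def tag_key_frequency(features):
    """Count how many features carry each tag key."""
    counts = Counter()
    for tags in features:
        counts.update(tags.keys())
    return counts


# Hypothetical sample of OSM tag dictionaries (illustration only).
sample_features = [
    {"landuse": "residential"},
    {"landuse": "forest", "name": "Bois"},
    {"building": "yes"},
    {"highway": "residential", "name": "Rue A"},
]

frequency = tag_key_frequency(sample_features)
for key, count in frequency.most_common():
    print(key, count)
```

A key that is rare in the area of interest is a hint that labels built from it may be sparse or incomplete.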
11. Training Data Completeness Matters
ISO standard for geographic information data quality: positional accuracy,
completeness, and logical consistency.
Other data quality issues in OSM:
- Vandalism
- Missing details
- Completeness and accuracy
12. Training Data Completeness Matters
Available tools for data quality assessment:
- OSM analytics (OSM vs. Human Settlement Layer)
- OSM-lint (e.g. OSM vs. US Census TIGER in the USA)
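The comparison these tools make can be sketched as a per-cell ratio between OSM feature counts and counts from a reference layer. This is only a minimal sketch of the idea, not how OSM analytics or OSM-lint work internally; the function names, the grid-cell dictionary layout, and the 0.8 threshold are all assumptions chosen for illustration.

```python
def completeness_ratio(osm_count, reference_count):
    """Return OSM count divided by reference count, or None if the reference is empty."""
    if reference_count <= 0:
        return None
    return osm_count / reference_count


def flag_incomplete_cells(cells, threshold=0.8):
    """Return ids of grid cells whose completeness ratio is below the threshold.

    `cells` maps a cell id to a (osm_count, reference_count) pair.
    Cells with an empty reference are skipped because no ratio can be computed.
    """
    flagged = []
    for cell_id, (osm_count, reference_count) in cells.items():
        ratio = completeness_ratio(osm_count, reference_count)
        if ratio is not None and ratio < threshold:
            flagged.append(cell_id)
    return flagged


# Hypothetical counts per grid cell (illustration only).
grid = {"a": (50, 100), "b": (90, 100), "c": (0, 0)}
print(flag_incomplete_cells(grid))
```

Cells flagged this way are candidates to exclude from training or to map further before use.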
13. Training Data Completeness Matters
Building classification in Vietnam with LeNet on AWS SageMaker.
Individual building detection with TensorFlow Object Detection in Mexico.
60% -> 84% from Vietnam to Mexico
14. Training Data Completeness Matters
- OSM label data + satellite images match
- OSM label data is not well-aligned with the paired satellite image
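One way to check alignment between a label mask and its paired image is to compare the label against a trusted reference mask (for example, a small hand-digitized sample) and search for the pixel shift that maximizes their overlap. The sketch below assumes masks are represented as sets of (row, col) pixel coordinates; the function names and the `max_shift` parameter are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two pixel sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def best_shift(label_pixels, reference_pixels, max_shift=3):
    """Find the (row, col) shift of the label that maximizes IoU with the reference."""
    best = (0, 0)
    best_score = iou(label_pixels, reference_pixels)
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            shifted = {(r + dr, c + dc) for r, c in label_pixels}
            score = iou(shifted, reference_pixels)
            if score > best_score:
                best, best_score = (dr, dc), score
    return best, best_score


# Hypothetical 4x4 building footprint; the label is offset from the reference.
reference = {(r, c) for r in range(5, 9) for c in range(5, 9)}
label = {(r, c) for r in range(3, 7) for c in range(6, 10)}

shift, score = best_shift(label, reference)
print(shift, score)
```

A non-zero best shift suggests the labels are offset from the imagery and should be corrected before training a segmentation model.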
21. Training Data Geodiversity Matters
Urban settlement change detection in Ethiopia between 2000 - 2017 with random
22. Conclusions
When applying machine learning:
- Training data quality matters. To use OSM label data for ML applications, I recommend:
1. Do a proper label completeness assessment with currently available tools;
2. Check OSM tag/label info and frequency for your area of interest;
3. For segmentation ML applications, make sure the image tiles align well with your label dataset;
4. Prepare the training dataset using Label Maker, RoboSat, or other data prep tools.
- Training data geodiversity matters. I recommend:
a. data/image feature similarity analysis;
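A feature similarity analysis can be sketched, under simple assumptions, as comparing the mean feature vector of the training region with that of the target region. The slides do not say which features or which similarity measure were used; cosine similarity, the function names, and the sample vectors below are illustrative assumptions.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical per-tile features (e.g. mean band values) for two regions.
training_tiles = [[0.20, 0.40, 0.10], [0.30, 0.50, 0.20]]
target_tiles = [[0.25, 0.45, 0.15], [0.20, 0.40, 0.20]]

similarity = cosine_similarity(mean_vector(training_tiles), mean_vector(target_tiles))
print(round(similarity, 3))
```

A low similarity between regions is a warning that a model trained in one may not transfer well to the other.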
24. Data Completeness Matters
HOT Analytics for Health
With support of the Bill and Melinda Gates Foundation and the Clinton Health Access
Initiative, we have designed an analysis tool to evaluate the accuracy and precision of
OpenStreetMap field data.
25. Training Data Completeness Matters
Other data quality issues in OSM:
- Vandalism
- Missing details
- Completeness and accuracy
The results of this analysis found the positional accuracy of OpenStreetMap data
to be very good in comparison to OS MasterMap, with over 80% overlap between
most of the road objects tested in the two datasets. The results also found a
positive correlation between road name attribute completeness and number of
users per area.
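The two quantities in that finding can be sketched as follows: name completeness is the share of road features with a `name` tag, and the relationship with contributor counts can be summarized with a Pearson correlation. This is not the cited analysis itself; the function names and all numbers are hypothetical and for illustration only.

```python
import math


def name_completeness(roads):
    """Share of road features that carry a non-empty `name` tag."""
    if not roads:
        return 0.0
    named = sum(1 for tags in roads if tags.get("name"))
    return named / len(roads)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Hypothetical per-area values: number of users and road name completeness.
users_per_area = [2, 5, 9, 14]
completeness_per_area = [0.35, 0.50, 0.70, 0.90]
print(round(pearson(users_per_area, completeness_per_area), 3))
```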