Generating Training Data from Noisy Measurements

This document discusses generating training data for machine learning models from noisy measurements of land cover classifications. It describes a workflow that uses Sentinel-2 satellite imagery and GlobeLand30 land cover labels to train a Random Forests model for land cover classification. Key points include:
- Sentinel-2 and GlobeLand30 data are used as input, with GlobeLand30 labels filtered and resampled to the Sentinel-2 grid to create reference labels.
- A Random Forests model is trained separately for each Sentinel-2 scene using stratified samples of pixels.
- Initial results show 88.75% average accuracy across scenes, with some classes like water predicted well and others like wetlands being more difficult.

Generating Training Data from Noisy
Measurements
HAMED ALEMOHAMMAD
LEAD GEOSPATIAL DATA SCIENTIST

ML Hub Earth
 Machine Learning commons for EO
 Training data
 Models
 Standards and best practices

Global Land Cover Training Dataset
 Human-verified training dataset
 Using open-source Sentinel-2 imagery
 10 m spatial resolution.
 Global and geo-diverse

Workflow
[Figure: workflow diagram. Elements: S2 L2A reflectance; S2 L2A classification; GlobeLand30 labels (2010); filtered labels; model training; class predictions; class verification (human).]

Data
 Input Data:
   10 Sentinel-2 bands: Red, Green, Blue, Red-Edge 1-3, NIR, Narrow NIR, SWIR 1-2
   20 m bands resampled to 10 m using bicubic interpolation
 Reference/Label Data:
   GlobeLand30 labels for 2010 used as a source
   Classes mapped to REF Land Cover Taxonomy
   Labels re-gridded to Sentinel-2 grid using nearest neighbor
   Labels filtered by agreement with classes from Sentinel-2's 20 m scene classification (produced as part of atmospheric correction)
   Filtered labels used as reference labels for training
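
The re-gridding and filtering steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 30 m and 10 m grids are assumed to be aligned so that each label cell covers exactly 3 x 3 Sentinel-2 pixels, the scene classification is assumed to be already resampled to 10 m, and the class names and the `COMPATIBLE` table are hypothetical stand-ins for the real taxonomy mapping, which the slides do not list.

```python
# Hypothetical agreement table: reference label -> scene classes that count as agreement.
COMPATIBLE = {
    "water": {"water"},
    "woody_vegetation": {"vegetation"},
    "bare_ground": {"not_vegetated"},
}
NODATA = None  # marker for label pixels that are filtered out


def regrid_nearest(labels_30m, factor=3):
    """Nearest-neighbour upsampling: each coarse cell becomes a factor x factor block."""
    fine = []
    for row in labels_30m:
        fine_row = [value for value in row for _ in range(factor)]
        fine.extend(list(fine_row) for _ in range(factor))
    return fine


def filter_labels(labels_10m, scene_classes_10m, compatible=COMPATIBLE):
    """Keep a label only where the scene classification agrees with it."""
    filtered = []
    for label_row, scene_row in zip(labels_10m, scene_classes_10m):
        filtered.append([
            label if scene in compatible.get(label, set()) else NODATA
            for label, scene in zip(label_row, scene_row)
        ])
    return filtered


# Usage on a tiny made-up 1 x 2 label grid.
labels_10m = regrid_nearest([["water", "bare_ground"]])
scene_10m = [
    ["water", "water", "vegetation", "not_vegetated", "not_vegetated", "water"]
    for _ in range(3)
]
reference = filter_labels(labels_10m, scene_10m)
```

Pixels where the two sources disagree end up as `NODATA`, so they would not contribute to training.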

Methodology
 A pixel-based supervised Random Forests model trained for each scene.
 Pixels without valid reflectance are excluded from training.
 Training on class-stratified samples of half the pixels in a scene with one
Sentinel-2 pixel at 10 m for each label pixel at 30 m.
 Predictions are made on all pixels marked with usable classes during Level-2A
processing, including pixels labeled as unclassified.
 Annual labels will be generated by aggregating time series of predictions and
probabilities from the same tile throughout the year.
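
A class-stratified sample of half the labelled pixels could be drawn roughly as follows. This is a minimal sketch, not the author's implementation: the function name, the `None` convention for excluded pixels, and the rule that every class keeps at least one pixel are assumptions made for illustration. It does not cover the "one 10 m pixel per 30 m label pixel" constraint or the Random Forests training itself.

```python
import random
from collections import defaultdict


def stratified_half_sample(labels, fraction=0.5, seed=0):
    """Return sorted pixel indices of a class-stratified random sample.

    labels: flat list of class labels; None marks pixels to exclude
            (no valid reflectance or no reference label).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        if label is not None:
            by_class[label].append(index)

    sample = []
    for indices in by_class.values():
        # Assumption: keep at least one pixel per class so rare classes survive.
        count = max(1, round(len(indices) * fraction))
        sample.extend(rng.sample(indices, count))
    return sorted(sample)


# Usage on made-up labels: 10 water, 4 wetland, 6 excluded pixels.
labels = ["water"] * 10 + ["wetland"] * 4 + [None] * 6
train_indices = stratified_half_sample(labels)
```

Sampling within each class keeps the class proportions of the scene, which matters when a few classes dominate the pixel count.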

Results
 88.75% average model accuracy across 4 diverse scenes.
 Some classes, like water and snow/ice, predicted with high accuracy and high
confidence across all scenes.
 Other classes, like wetland and (semi) natural vegetation, are subtler and were
expected to be more difficult to classify.
 Woody vegetation and cultivated vegetation were predicted relatively
accurately and not confused with each other, as a result of including 20 m red
edge bands, resampled to 10 m.
 Artificial bare ground tended to be predicted in unclassified regions (in
reference data), taking over areas of natural bare ground and cultivated
vegetation and suggesting that traces of human activity would lead to pixels
classified as artificial bare ground in off-vegetation season.

What about non-categorical variables?
 True value of categorical variables vs true value of continuous variables:
 Crop Yield
 Soil Moisture
 Temperature
 Precipitation
 All measurements of continuous variables are prone to uncertainty (noise and
bias).
 How to reduce/eliminate these uncertainties in training data?

[Figure: three noisy and biased measurement systems (in situ, model, satellite) observing the unknown truth. Slide courtesy of K. McColl.]

Generating Training Dataset
 Triple collocation (TC) is a technique for estimating the unknown error standard
deviations (or RMSEs) of three mutually independent measurement systems,
without treating any one system as zero-error “truth”.
$$Q_{ij} \equiv \mathrm{Cov}(X_i, X_j), \qquad \sigma_{\varepsilon_i} = \sqrt{Q_{ii} - \frac{Q_{ij}\, Q_{ik}}{Q_{jk}}}$$
 TC-based RMSE estimates at each pixel are used to compute the a priori probability ($P_i$) of selecting a particular dataset:
$$P_i = \frac{1/\sigma_{\varepsilon_i}^{2}}{\sum_{j=1}^{3} 1/\sigma_{\varepsilon_j}^{2}}$$
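
The two formulas on this slide can be tried out on synthetic data. The sketch below assumes three series that share one "truth" signal and carry independent zero-mean Gaussian errors with no bias or scaling differences; the function names and the synthetic error levels are made up for illustration. Small negative variance estimates, which can arise from sampling noise, are clipped to zero here as a pragmatic choice.

```python
import math
import random


def covariance(a, b):
    """Sample covariance of two equally long series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def tc_error_std(x1, x2, x3):
    """Triple collocation: sigma_i = sqrt(Q_ii - Q_ij * Q_ik / Q_jk)."""
    series = [x1, x2, x3]
    q = [[covariance(series[i], series[j]) for j in range(3)] for i in range(3)]
    sigmas = []
    for i in range(3):
        j, k = [m for m in range(3) if m != i]
        variance = q[i][i] - q[i][j] * q[i][k] / q[j][k]
        sigmas.append(math.sqrt(max(variance, 0.0)))  # clip sampling artefacts
    return sigmas


def selection_probabilities(sigmas):
    """P_i proportional to 1 / sigma_i^2, normalised to sum to one."""
    weights = [1.0 / s ** 2 for s in sigmas]  # assumes every sigma is non-zero
    total = sum(weights)
    return [w / total for w in weights]


# Synthetic check: known error levels should be recovered approximately.
rng = random.Random(42)
truth = [rng.gauss(0.0, 1.0) for _ in range(20000)]
true_sigmas = [0.1, 0.2, 0.4]
x1, x2, x3 = [[t + rng.gauss(0.0, s) for t in truth] for s in true_sigmas]
sigmas = tc_error_std(x1, x2, x3)
probs = selection_probabilities(sigmas)
```

With these inputs the dataset with the smallest error gets the largest selection probability, since the weights scale with the inverse error variance.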

Sample time series of a pixel
[Figure: time series $X_1$, $X_2$, $X_3$ of a pixel at times $t_1, t_2, t_3, \ldots, t_N$, together with a series $X_T$.]
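
One plausible reading of this figure is that the training series $X_T$ is assembled by picking, at each time step, the value of one of the three datasets at random according to its selection probability. The slide itself does not spell this out, so the sketch below is an assumption; the function name and arguments are hypothetical.

```python
import random


def build_training_series(x1, x2, x3, probabilities, seed=0):
    """At each time step, copy the value of one dataset chosen with the given weights."""
    rng = random.Random(seed)
    series = (x1, x2, x3)
    target = []
    for t in range(len(x1)):
        (chosen,) = rng.choices(range(3), weights=probabilities, k=1)
        target.append(series[chosen][t])
    return target


# Usage on made-up values: the weights favour the third dataset.
x1 = [1, 2, 3]
x2 = [10, 20, 30]
x3 = [100, 200, 300]
x_t = build_training_series(x1, x2, x3, [0.2, 0.3, 0.5], seed=1)
```

Because each value is taken from exactly one source, the resulting series stays within the observed measurements rather than averaging them.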

Alemohammad, et al., Biogeosciences, 2017

Things to check
 Sentinel-2 L2A classes
 What are the usable classes there?
 Plot actual scene + artificial bare ground

