Using active learning to quantify how training
data errors impact classification accuracy over
smallholder-dominated agricultural systems
Stephanie Debats, Lei Song, Su Ye, Sitian Xiong, Kaixi Zhang,
Tammy Woodard, Ron Eastman, Ryan Avery, Kelly Caylor,
Dennis McRitchie, Lyndon Estes
Clark University|Clark Labs
University of California Santa Barbara
AWS Cloud Credits for Research Program
IIASA
Stephanie Debats, Ryan Avery, Su Ye, Lei Song, Sitian Xiong
Problem 1: High spatial variability
Problem 2: High temporal variability
Bing Base Map
PlanetScope Analytic
Problem 3: Interpretation errors in training data
High spatial & temporal resolution imagery
Active Learning
[Figure: active learning loop with the steps Label, Train, Predict, Select, Re-label]
Debats et al, 2017
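The "Select" step of the loop can be sketched as below. The slides do not state the selection rule, so uncertainty sampling (picking the predictions closest to 0.5) is an assumption here, and the function name `select_most_uncertain` is hypothetical.

```python
def select_most_uncertain(probabilities, k):
    """Return the indices of the k predictions closest to 0.5.

    Assumption: "Select" means uncertainty sampling; the slides do not
    say which criterion the authors used.
    """
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]


# Predicted field probabilities for five candidate sites (made-up values).
probs = [0.95, 0.52, 0.10, 0.45, 0.70]
to_relabel = select_most_uncertain(probs, 2)
print(to_relabel)
```

The selected sites would then go back to the labelling platform before the classifier is retrained.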
Study Region
[Figure: probability (%) maps for the growing season and the dry season]
Labelling component:
Crowdsourcing Platform
True positive (TP) False positive (FP) False negative (FN) True negative (TN)
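As a minimal sketch, the four categories can be counted by comparing a worker's binary map with a reference map pixel by pixel. The flat-list representation and the name `confusion_counts` are assumptions for illustration, not the platform's actual code.

```python
def confusion_counts(worker, reference):
    """Count TP, FP, FN, TN between two equal-length binary maps (1 = field)."""
    if len(worker) != len(reference):
        raise ValueError("maps must have the same number of pixels")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for w, r in zip(worker, reference):
        if w and r:
            counts["TP"] += 1      # worker and reference both say field
        elif w and not r:
            counts["FP"] += 1      # worker says field, reference does not
        elif r:
            counts["FN"] += 1      # worker missed a reference field
        else:
            counts["TN"] += 1      # both say no field
    return counts


print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
```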
score = in_accuracy * β0 +
out_accuracy * β1 +
fragmentation * β2 +
edge_accuracy * β3 +
categorical_accuracy * β4
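The weighted sum above can be written out as follows. The β values are not given in the slides, so the equal weights below are placeholders, and `mapper_score` is a hypothetical name.

```python
COMPONENTS = ["in_accuracy", "out_accuracy", "fragmentation",
              "edge_accuracy", "categorical_accuracy"]


def mapper_score(components, betas):
    """Weighted sum of the five accuracy components (β0..β4 in slide order)."""
    return sum(components[name] * betas[name] for name in COMPONENTS)


# Placeholder weights: the actual β values are not stated in the slides.
betas = {name: 0.2 for name in COMPONENTS}
assignment = {"in_accuracy": 0.9, "out_accuracy": 0.8, "fragmentation": 0.7,
              "edge_accuracy": 0.6, "categorical_accuracy": 1.0}
print(round(mapper_score(assignment, betas), 3))
```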
Accuracy assessment and consensus labelling
Probability
Bayesian Model Averaging
Label collection
$P(\theta \mid D) = \sum_{i=1}^{m} P(M_i \mid D)\, P(\theta \mid D, M_i)$
Bayesian Model Averaging
Heat map
$P(\theta \mid D) = \sum_{i=1}^{m} P(M_i \mid D)\, P(\theta \mid D, M_i)$
Consensus Label
Probability
Debats et al (2016)
A generalized computer vision approach to mapping crop fields in
heterogeneous agricultural landscapes
Remote Sensing of Environment 179
Machine Learning component
1. On the fly feature extraction
2. Spark ML RandomForest
GeoTrellis/GeoPySpark
Does Training Data Error Impact Classification Performance?
Next Steps
1. Errors in image atmospheric corrections
2. Increase feature space for classifier
3. Improve label quality
4. Quantify gap between worker and ground
Worker map
Ground truth(y)
Where lies the truth?
Circle bias: many false positives are identified because of overreliance on circular features
https://github.com/ecohydro/CropMask_RCNN
Probability score above 0.7 deemed a center pivot
Tested on never-before-seen 512x512 tiles
Some center pivots are missed because of a date mismatch between the imagery and the labels of the reference dataset
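The 0.7 cut-off can be illustrated with a small filter over detections. The detection format (a dict with a `score` key) and the name `keep_center_pivots` are assumptions; this is not taken from the CropMask_RCNN code.

```python
def keep_center_pivots(detections, threshold=0.7):
    """Keep detections whose probability score is strictly above the threshold."""
    return [d for d in detections if d["score"] > threshold]


# Made-up detections for one tile.
detections = [{"id": "a", "score": 0.92},
              {"id": "b", "score": 0.70},
              {"id": "c", "score": 0.35}]
print([d["id"] for d in keep_center_pivots(detections)])
```

A strict comparison is used because the slide says "above .7"; a score of exactly 0.7 is therefore dropped.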
BAYESIAN MODEL AVERAGING: combining crowdsourcing labels from their mapping history
$P(\theta \mid D) = \sum_{i=1}^{m} P(M_i \mid D)\, P(\theta \mid D, M_i)$
$\theta$: the ground truth, which will be either ‘field’ or ‘no field’
$D$: the given data of crowdsourcing opinions for labeling this pixel
(e.g., $D = \{D_{\text{mapper}_1} = \text{field},\ D_{\text{mapper}_2} = \text{no field}, \dots\}$)
$M_i$: the mappers considered
(1) Mapper $i$'s opinion, $P(\theta \mid D, M_i)$: the probability that the pixel is $\theta$ according to mapper $i$
(2) Weight (or evidence), $P(M_i \mid D)$: how strongly we weigh mapper $i$'s opinion, based on their mapping history
MAPPER OPINION
In our mapping project, mappers may only assign a crisp category to each polygon (either ‘field’ or ‘no field’), so $P(\theta \mid D, M_i)$ is 0 or 1:
(1) $P(\theta = \text{field} \mid D_i = \text{field}, M_i) = 1$
(2) $P(\theta = \text{no field} \mid D_i = \text{field}, M_i) = 0$
(3) $P(\theta = \text{no field} \mid D_i = \text{no field}, M_i) = 1$
(4) $P(\theta = \text{field} \mid D_i = \text{no field}, M_i) = 0$
WEIGHT
Weight: $P(M_i \mid D) \propto P(D \mid M_i)\, P(M_i)$
(1) $P(M_i)$: ‘mapper prior’, our prior belief in mapper $i$. We can use the average score (combining geometric and thematic accuracy) to represent our belief:
$P(M_i) \propto \frac{1}{N} \sum_{j=1}^{N} \text{score}_j$
(2) $P(D \mid M_i)$: ‘mapper likelihood’, $P(D \mid M_i) \propto \exp\left(-\tfrac{1}{2}\,\mathrm{BIC}_i\right)$ [1][2]
BIC (Bayesian Information Criterion) $= \ln(n) \cdot k - 2 \ln L(D \mid \hat{\theta}, M)$
($n$ is the sample number, $k$ is the number of parameters to be estimated (our case has only one, i.e., $\theta$), and $\hat{\theta}$ is the label that maximizes the likelihood function)
‘BIC simply reduces to maximum likelihood when the number of parameters is equal for the models of interest’ [3], so $\mathrm{BIC} \approx -2 \ln L(D \mid \hat{\theta}, M)$. After adjustment,
$P(D \mid M_i) \propto L(D \mid \hat{\theta}, M_i)$ (maximum mapper likelihood)
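The step from the BIC approximation to the maximum mapper likelihood can be written out as a short derivation. It assumes, beyond what the slide states, that the sample number $n$ as well as the parameter number $k$ is the same for every mapper, so the factor $n^{-k/2}$ is a shared constant.

```latex
\begin{align*}
P(D \mid M_i) &\propto \exp\!\left(-\tfrac{1}{2}\,\mathrm{BIC}_i\right) \\
  &= \exp\!\left(-\tfrac{1}{2}\,k \ln n + \ln L(D \mid \hat{\theta}, M_i)\right) \\
  &= n^{-k/2}\, L(D \mid \hat{\theta}, M_i) \\
  &\propto L(D \mid \hat{\theta}, M_i)
\end{align*}
```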
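WEIGHT (CONTI.)
Weight: $P(M_i \mid D) \propto P(D \mid M_i)\, P(M_i)$
Mapper likelihood: $P(D \mid M_i) \propto L(D \mid \hat{\theta}, M_i)$ (maximum mapper likelihood)
$L(D \mid \hat{\theta}, M)$ can be computed as:
(1) $P(\theta = \text{field} \mid \hat{\theta}, M_i) = P(D = \text{field} \mid \theta = \text{field}, M_i) = \frac{1}{N} \sum_{j=1}^{N} \frac{TP_j}{TP_j + FN_j}$
(2) $P(\theta = \text{no field} \mid \hat{\theta}, M_i) = P(D = \text{no field} \mid \theta = \text{no field}, M_i) = \frac{1}{N} \sum_{j=1}^{N} \frac{TN_j}{TN_j + FP_j}$
* The maximum mapper likelihood is in fact the mapper's average producer's accuracy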
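A minimal sketch of the average producer's accuracy for one mapper, computed from per-assignment confusion counts. The history format and the name `mean_producers_accuracy` are hypothetical, and skipping assignments with a zero denominator is an assumption the slides do not address.

```python
def mean_producers_accuracy(history, label="field"):
    """Average producer's accuracy over a mapper's assessed assignments.

    history: list of dicts with TP, FP, FN, TN counts, one per assignment.
    label:   "field" uses TP / (TP + FN); "no field" uses TN / (TN + FP).
    """
    ratios = []
    for c in history:
        if label == "field":
            num, den = c["TP"], c["TP"] + c["FN"]
        else:
            num, den = c["TN"], c["TN"] + c["FP"]
        if den > 0:  # assumption: skip assignments with no reference pixels of this class
            ratios.append(num / den)
    return sum(ratios) / len(ratios) if ratios else 0.0


# Made-up history of two assessed assignments.
history = [{"TP": 8, "FP": 2, "FN": 2, "TN": 8},
           {"TP": 6, "FP": 0, "FN": 4, "TN": 10}]
print(round(mean_producers_accuracy(history, "field"), 3))
print(round(mean_producers_accuracy(history, "no field"), 3))
```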
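SUMMARY
The posterior probability of the pixel label $\theta$ given the data of mappers' opinions ($D$):
$P(\theta \mid D) = \sum_{i=1}^{m} P(M_i \mid D)\, P(\theta \mid D, M_i)$ ($M_i$ is mapper $i$)
→ $P(\theta \mid D) = \dfrac{\sum_{i=1}^{m} \text{weight}_i \cdot P(\theta \mid D, M_i)}{\sum_{i=1}^{m} \text{weight}_i}$, where
$\text{weight}_i = \text{score} \times \text{producer's accuracy} \propto P(M_i \mid D)$
$P(\theta \mid D, M_i) = 0 \text{ or } 1$
Labeling:
If $P(\theta = \text{field} \mid D) > P(\theta = \text{no field} \mid D)$ (or $P(\theta = \text{field} \mid D) > 0.5$), we give a consensus label as field; otherwise, we give a label as no field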
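The summary rule can be sketched for a single pixel as below. The function name `consensus_label` and the example scores and accuracies are made up for illustration; only the weighting (score times producer's accuracy) and the 0.5 threshold come from the slide.

```python
def consensus_label(opinions, weights):
    """Weighted vote for one pixel.

    opinions: list of "field" / "no field", one per mapper.
    weights:  one non-negative weight per mapper.
    Returns (label, probability of "field").
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one mapper needs a positive weight")
    # Each opinion contributes 0 or 1, so the numerator is the summed
    # weight of the mappers who labelled the pixel as field.
    p_field = sum(w for o, w in zip(opinions, weights) if o == "field") / total
    return ("field" if p_field > 0.5 else "no field"), p_field


# Hypothetical mappers: weight = average score * producer's accuracy.
scores = [0.75, 0.9, 0.5]
accuracies = [0.8, 1.0, 0.4]
weights = [s * a for s, a in zip(scores, accuracies)]
label, p = consensus_label(["field", "no field", "field"], weights)
print(label, round(p, 3))
```

In this example two of three mappers vote field, but the single high-weight mapper outweighs them, which is the point of weighting by mapping history rather than taking a simple majority.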
