Pennsylvania
Predictive Model:
Lessons Learned
Matthew D. Harris, AECOM - Burlington, NJ
matthew.d.harris@aecom.com
FHWA Statement
“The contents of the report reflect the views of the author(s) who are
responsible for the facts and accuracy of the data presented within. The
contents do not necessarily reflect the official view or policies of the
Department or FHWA at the time of publication.”
Report available at: www.penndotcrm.org
“Remember that all models are
wrong; the practical question is
how wrong do they have to be
to not be useful.”
~ George E. P. Box, 1987
Organization of talk
• Introduction to PA Model
• Data lessons
• Methodological lessons
• Policy lessons
• Concluding observations
Pennsylvania Predictive
Model
PA Model Specs
• 45,293 square miles
• 1 billion raster cells
• 2 million site-present cells
• 18,226 pre-contact sites
• 132 geographic study areas
• 528 individual models
• 93 model variables
• 102 billion cells processed
• Random Forest, MARS, and Stepwise
Logistic Regression models
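The specs above imply a very low site prevalence. The sketch below works through that arithmetic, treating the rounded slide figures as exact and assuming the 528 models are spread evenly across the 132 study areas (the slides do not say so).

```python
# Rounded figures from the specs slide, treated as exact for illustration.
total_cells = 1_000_000_000
site_present_cells = 2_000_000
study_areas = 132
individual_models = 528

# Share of raster cells that contain a known site.
prevalence = site_present_cells / total_cells

# Average number of models per study area (even spread is an assumption).
models_per_area = individual_models / study_areas

print(f"site prevalence: {prevalence:.2%}")
print(f"models per study area: {models_per_area:.0f}")
```

At roughly 0.2% prevalence, almost every cell is a non-site cell, which is why the choice of background samples matters so much later in the talk.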
Archaeo “Big Data”
PA Model
PA Model in
comparison
DATA Lessons Learned
• Unique characteristics of archaeological data
• Representation of archaeological data
• Archaeological site prevalence
• Covariates and correlation
• Dealing with uncertainty
Characteristics of Archaeological Data
Population Generating Process:
• Highly dynamic & complex
• Non-mechanistic
• Culture and agency
• Dynamic environment
• Changing parameters
• Subjectively defined expression
• Censored through taphonomy
Sample Generating Process:
• Non-systematic
• Subjective & inconsistent
• Extensive measurement error
• Imperfect detectability
• Non-representative of population
• Spatially biased
• Oversimplification
Data Representation
Do centroids represent sites?
Background
Samples and
model variance
How many non-site samples to
use?
[Animation: background samples]
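The question of how many non-site (background) samples to use can be explored with a small simulation. This is a minimal sketch, not the method used for the PA model: the covariate, its distribution, and the sample sizes are all hypothetical. It shows how the spread of an estimate across repeated background draws shrinks as the sample grows.

```python
import random
import statistics

def spread_of_estimate(population, sample_size, repeats, rng):
    """Std. dev. of the sample mean across repeated background draws."""
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

rng = random.Random(42)

# Hypothetical covariate (e.g. distance to water, in metres)
# for 20,000 background cells.
population = [rng.expovariate(1 / 500) for _ in range(20_000)]

# Repeat the background draw 200 times at each sample size.
spreads = {
    n: spread_of_estimate(population, n, repeats=200, rng=rng)
    for n in (50, 500, 5000)
}
for n, s in spreads.items():
    print(f"n={n:>5}: spread of mean = {s:.1f}")
```

Small background samples give estimates that change noticeably from draw to draw, and that instability carries into any model fitted to them.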
Model uncertainty
Quantifying Uncertainty
• Logistic regression (Bayesian GLM)
• 95% credible interval
• 500 simulated plausible models
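The uncertainty slides name a Bayesian logistic regression, a 95% interval, and 500 simulated models. The sketch below illustrates that idea under stated assumptions. It swaps in a simple random-walk Metropolis sampler in place of whatever Bayesian GLM routine the project used, and the data, priors, and tuning values are all hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_posterior(beta, xs, ys, prior_sd=2.5):
    """Bernoulli log likelihood plus independent Normal(0, prior_sd) priors."""
    b0, b1 = beta
    total = -(b0 ** 2 + b1 ** 2) / (2 * prior_sd ** 2)
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(b0 + b1 * x), 1e-12), 1 - 1e-12)
        total += math.log(p) if y else math.log(1 - p)
    return total

def metropolis(xs, ys, n_draws, rng, burn_in=1000, thin=5, step=0.25):
    """Random-walk Metropolis; returns n_draws thinned posterior samples."""
    beta = (0.0, 0.0)
    current = log_posterior(beta, xs, ys)
    draws = []
    for i in range(burn_in + n_draws * thin):
        proposal = (beta[0] + rng.gauss(0, step), beta[1] + rng.gauss(0, step))
        candidate = log_posterior(proposal, xs, ys)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        if i >= burn_in and (i - burn_in) % thin == 0:
            draws.append(beta)
    return draws

def credible_interval(values, level=0.95):
    """Equal-tailed interval from sorted posterior samples."""
    s = sorted(values)
    tail = (1 - level) / 2
    return s[int(tail * len(s))], s[int((1 - tail) * len(s)) - 1]

rng = random.Random(1)

# Hypothetical data: one standardized covariate, site presence as 0/1.
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [1 if rng.random() < sigmoid(-1.0 + 1.5 * x) else 0 for x in xs]

# Each draw is one plausible model (intercept, slope).
draws = metropolis(xs, ys, n_draws=500, rng=rng)

# Predicted site probability for a new cell under every plausible model.
x_new = 1.0
probs = [sigmoid(b0 + b1 * x_new) for b0, b1 in draws]
lower, upper = credible_interval(probs)
print(f"95% credible interval at x={x_new}: [{lower:.3f}, {upper:.3f}]")
```

Reporting the interval alongside the point prediction is one way to communicate how much the prediction for a given cell could plausibly vary.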
Methodological Lessons Learned
• Define your objectives and assumptions
• Reproducibility
• Create a model building system
• ArcGIS is only part of the answer
• Understand your algorithms
• Test and validate all results
Reproducibility
Reproducibility and Accountability
www.rstudio.com
www.python.org
www.esri.com
aws.amazon.com
code example:
pseudo-code example:
Model Building
System
● Variable creation and analysis
● Train model hyperparameters
● Algorithm selection
● Test error with Cross-Validation
● Assess performance
● Model selection
● Mosaic and aggregate
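The cross-validation step in the list above can be sketched as follows. This is a minimal illustration, assuming a one-covariate logistic regression fitted by gradient descent on synthetic data. The function names, the data, and the choice of k = 5 are hypothetical and are not taken from the PA model code.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, epochs=300, lr=0.5):
    """Fit intercept and slope by batch gradient descent on mean log loss."""
    b0 = b1 = 0.0
    n = len(rows)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in rows:
            err = sigmoid(b0 + b1 * x) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def error_rate(model, rows):
    """Share of rows misclassified at a 0.5 probability threshold."""
    b0, b1 = model
    wrong = sum((sigmoid(b0 + b1 * x) >= 0.5) != bool(y) for x, y in rows)
    return wrong / len(rows)

def k_fold_cv(rows, k, rng):
    """Return the held-out error of each of k folds."""
    rows = rows[:]
    rng.shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    errors = []
    for i, test in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        errors.append(error_rate(fit_logistic(train), test))
    return errors

rng = random.Random(7)

# Hypothetical data: (covariate, site present as 0/1).
data = []
for _ in range(300):
    x = rng.gauss(0, 1)
    y = 1 if rng.random() < sigmoid(-0.5 + 2.0 * x) else 0
    data.append((x, y))

fold_errors = k_fold_cv(data, k=5, rng=rng)
print("per-fold test error:", [round(e, 3) for e in fold_errors])
print("mean CV error:", round(sum(fold_errors) / len(fold_errors), 3))
```

Because each fold is scored only on cells the model never saw during fitting, the mean error is an estimate of performance on new data, not of fit to the training data.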
Validation and error
Does this model predict new sites?
“The generalization performance of a
learning method relates to its prediction
capability on independent test data.” ~ Hastie et al. (2008)
Bias & Variance Tradeoff
[Figure: bias and variance tradeoff plots, axis label: Error]
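For reference, the tradeoff is usually stated through the standard decomposition of expected squared prediction error. The notation here is an assumption, not taken from the slides: $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and $\hat{f}$ is the model fitted to a random training sample.

```latex
\mathbb{E}\!\left[\big(y - \hat{f}(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

A more flexible model typically lowers the bias term and raises the variance term, so test error is minimized at an intermediate level of complexity.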
Policy Lessons Learned
• Model purpose dictates policy applications
• Implementation requires explicit assumptions
• Error rates and uncertainty must be known
• Scale of data is critical in scale of use
• Methods to visualize uncertainty
How it all works...
PURPOSE | ASSUMPTIONS | METHODS | ALGORITHMS / MODELS | INTERPRETATION | POLICY
Lessons learned
Reproducibility
Accountability in all aspects of model building
Clear and understandable assumptions
Validation
Test predictions on independent data to assess error
Balance models to achieve appropriate generalization
Uncertainty
Understand and control for sources of uncertainty
Communicate uncertainty in text and visually
Purpose
Assess all aspects of a model relative to its purpose
Policy and implementation are based on model purpose
Not all doom and gloom!
• Face modeling issues head-on
• Model for our unique data
• Standardize our approaches
• Formalize our theory
• Compare our results
THANK
YOU!!!
@md_harris
github.com/mrecos
matthewdharris.com
Report: www.penndotcrm.org

A Statewide Archaeological Predictive Model of Pennsylvania: Lessons Learned