SlideShare ist ein Scribd-Unternehmen logo
1 von 36
Integrating EO with Official
Statistics using Machine
Learning in Mexico
(Work in progress)
November 5
INEGI
abel.coronado@inegi.org.mx
UNECE Machine Learning Project
UNECE Machine Learning Project
This is one of the pilot projects in the Machine Learning Project of the UNECE High-Level
Group on Modernization of Official Statistics.
UNECE Machine Learning Project
UNECE Machine Learning Project
Some Objectives
• Investigate and demonstrate the value added of ML in the production of official statistics,
where "value added" is increase in relevance, better overall quality or reduction in costs.
• Advance the capability of national statistical organisations to use ML in the production of
official statistics.
• Enhance collaboration between statistical organisations in the development and application
of ML.
Imagery Pilot Project
Imagery Pilot Project
Objective of this Imagery Pilot Project (Practical Application)
Expand the use of imagery data in the production of official statistics through the further
development of knowledge and sharing of ML solutions and practices.
Training
2010 Census
(Random Sample)
Annual Satellite Image
Summary Machine Learning
Annual classifications in
Non-Census years
Urban-Rural Classes2010 Imagery
Imagery Pilot Project
Imagery Pilot Project
• This pilot project proposes the use of Landsat satellite data for the mapping of urban and rural areas
in non-census years, using as ground truth an urban and rural grid of 1km x 1km generated from the
field data of the 2010 Population Census at block level and the updated Georeferenced Business
Register to date 2010.
• This pilot project seeks to take advantage of the knowledge, data, and technologies available to show
an example of the use of satellite images in the generation of geographic information that can be
used to monitor the change in urban extent and if the final product meets the appropriate quality,
explore the path to incorporate it into the continuous urban-rural monitoring processes of INEGI.
• Monitoring the growth of the urban and rural areas of a country would enable the adjustments of
models and samples of the household and business surveys as well as to plan censuses. As the
application matures and improves, it could produce new sets of official statistics on its own.
Urban and Rural Locations in Mexico
4, 525 Urban Locations
187, 722 Rural Locations
Locations in Mexico
Draft of Imagery Pipeline
Draft of Imagery Pipeline
Imagery Pipeline
In order to have a road map.
We build an abstract machine learning pipeline for Imagery
Based on:
IBM CRISP-DM
Cross Industry Standard Process for Data Mining
&
Microsoft Data Science lifecycle
InKyung Choi; Refactoring
Draft Imagery Pipeline
Imagery Pipeline
Pilot Project Pipeline
Establish the problem to be solved
Establish the problem to be solved
Monitor the growth of locations of Mexico (Urban and Rural), which would generate a more
timely input for :
• Cartography update
• Estimation of the population in non-census years
• Related with:
• SDG Indicator 11.3.1: Ratio of land consumption rate to population growth rate
• SDG Indicator 15.3.1: Proportion of land that is degraded over total land area
Identify satellite data to address the problem
Identify satellite data to address the problem
In our case, the data to monitor the growth of cities can be Landsat images (NASA & USGS):
• They are open data.
• There is a constant record since the 70s, although they are available from 1985 to date.
• The spatial resolution of Landsat images is 30 meters.
• Temporal resolution is 16 days and 8 days with combined satellites.
• Spectral resolution, in this pilot project we use 6 spectral bands
• All the data we use is Analysis Ready Data (ARD)
• We take advantage that we have just built the Mexican Geospatial Data Cube, with all this
information.
Translate the problem into statistical problem
Translate the problem into statistical problem
We define that is a classification problem:
• Unit of analysis: 1km x 1km squares covering the whole country: 1’ 975, 719 .
• 3 classes were designated:
• Urban
• Rural
• None of the above
Obtain ground-truth data; 2010 Census Data
Obtain ground-truth data
Obtain layers of geographical information:
• Georeferenced Population Census 2010 (Block Level Aggregation)
• Georeferenced Economic Census (Economic Unit Level)
• Visual Inspection
Obtain ground-truth data; 1km x 1km Grid
Obtain ground-truth data
Build the polygons (1 km – 1 km)
1’ 975, 719 Polygons
Obtain ground-truth data; Urban-Rural Grid
Obtain ground-truth data
Assign Labels
Obtain satellite imagery data
Obtain satellite imagery data
• 32 TB of Images in external discs.
• 90 TB decompressed
• March 4, 2019
• The images are ARD, Analysis Ready Data.
• In essence ARD means that the pixels of the entire time series are aligned and
comparable.
Obtain satellite imagery data
Obtain satellite imagery data
• 36 Years
Obtain satellite imagery data
Obtain satellite imagery data
• 2010 same year of the Census
Use the ODC to generate annual summaries
Obtain satellite imagery data
• 2010 same year of the Census
2,737,273,075 pixels – 35 Gb
Geomedians
Geomedians
Geomedians
Integrate ground-truth data with satellite data
Integrate ground-truth data with satellite data
1 km x 1 km tagged polygon
33 x 33 pixels
30 meters of resolution
33 Pixels
6Bands
6 swir 2------
5 swir 1------
4 nir---------
3 red---------
2 green-------
1 blue--------
Python
Function
Multidimensional Array
NumPy
Take a Random Sample
Take a Random Sample
Define features and generate a dataset to be used for analysis
Define features and generate a dataset to be used for analysis
• Feature Engineering
30,000 tagged regions
1km x 1km
33 pixels x 33 pixels x 6 Bands
30,000
4,305
30,000 rows in a data matrix
?
Define features and generate a dataset to be used for analysis
Define features and generate a dataset to be used for analysis
• Feature Engineering
30,000 tagged regions
1km x 1km
33 pixels x 33 pixels x 6 Bands
30,000
4,305
30,000 rows in a data matrix
Auto Machine Learning
https://epistasislab.github.io/tpot/
Evaluate ML
Random Sample
30,000 Elements
Machine Learning
Result of TPOT
2010 National
Classification
78% Accuracy
Training
2010 Geomedian
Classification
30,000
4,305
30,000 rows in a data matrix
Detailed results
Detailed results; First Iteration & Second Iteration
Next Steps
Finish Third Iteration
At this moment finishing third iteration
we seek to improve improve 2 or 3 %
Next Steps
Next Steps
• Finish a third iteration, no later than one month.
• The grid is also an important innovation and is evolving, it is likely
that we will have a new version very soon and we should update
the classification.
• Apply the model to un-labelled years, for example 2019 first
semester.
• Validate a sample of the un-labelled year, requesting support for
the area of visual interpretation of the geography division, to have
a measure of product quality.
• Hold more meetings with the area of sociodemographic statistics,
involve them in the exercise to improve the potential benefit in the
population estimate.
• Hold more meetings with the area of statistical research, to
receive feedback and observations that can improve the product.
• Hold meetings with the cartographic update area, to receive
feedback.
GRACIAS!
@abxda
Integrating eo with official statistics using machine learning in mexico geo week

Weitere ähnliche Inhalte

Was ist angesagt?

Luo-IGARSS2011-2385.ppt
Luo-IGARSS2011-2385.pptLuo-IGARSS2011-2385.ppt
Luo-IGARSS2011-2385.ppt
grssieee
 
Spectral_classification_of_WorldView2_multiangle_sequence.pptx
Spectral_classification_of_WorldView2_multiangle_sequence.pptxSpectral_classification_of_WorldView2_multiangle_sequence.pptx
Spectral_classification_of_WorldView2_multiangle_sequence.pptx
grssieee
 
07 a70110 remotesensingandgisapplications
07 a70110 remotesensingandgisapplications07 a70110 remotesensingandgisapplications
07 a70110 remotesensingandgisapplications
imaduddin91
 
Land Use/Land Cover Detection
Land Use/Land Cover DetectionLand Use/Land Cover Detection
Land Use/Land Cover Detection
Wansoo Im
 

Was ist angesagt? (18)

Luo-IGARSS2011-2385.ppt
Luo-IGARSS2011-2385.pptLuo-IGARSS2011-2385.ppt
Luo-IGARSS2011-2385.ppt
 
Identifying Land Patterns from Satellite Images using Deep Learning
Identifying Land Patterns from Satellite Images using Deep LearningIdentifying Land Patterns from Satellite Images using Deep Learning
Identifying Land Patterns from Satellite Images using Deep Learning
 
Processing of raw astronomical data of large volume by map reduce model
Processing of raw astronomical data of large volume by map reduce modelProcessing of raw astronomical data of large volume by map reduce model
Processing of raw astronomical data of large volume by map reduce model
 
Swiss Territorial Data Lab - geo Data Science - colloque FHNW
Swiss Territorial Data Lab - geo Data Science - colloque FHNWSwiss Territorial Data Lab - geo Data Science - colloque FHNW
Swiss Territorial Data Lab - geo Data Science - colloque FHNW
 
GEM Risk: main achievements during the first implementation phase
GEM Risk: main achievements during the first implementation phaseGEM Risk: main achievements during the first implementation phase
GEM Risk: main achievements during the first implementation phase
 
Using Deep Learning to Derive 3D Cities from Satellite Imagery
Using Deep Learning to Derive 3D Cities from Satellite ImageryUsing Deep Learning to Derive 3D Cities from Satellite Imagery
Using Deep Learning to Derive 3D Cities from Satellite Imagery
 
A new generation of data to monitor landscapes across the tropics
A new generation of data to monitor landscapes across the tropicsA new generation of data to monitor landscapes across the tropics
A new generation of data to monitor landscapes across the tropics
 
Casos de uso do FME no IBGE
Casos de uso do FME no IBGECasos de uso do FME no IBGE
Casos de uso do FME no IBGE
 
State of the Map US 2018: Analytic Support to Mapping Contributors
State of the Map US 2018: Analytic Support to Mapping ContributorsState of the Map US 2018: Analytic Support to Mapping Contributors
State of the Map US 2018: Analytic Support to Mapping Contributors
 
Spectral_classification_of_WorldView2_multiangle_sequence.pptx
Spectral_classification_of_WorldView2_multiangle_sequence.pptxSpectral_classification_of_WorldView2_multiangle_sequence.pptx
Spectral_classification_of_WorldView2_multiangle_sequence.pptx
 
The Seismic Hazard Modeller’s Toolkit: An Open-Source Library for the Const...
The Seismic Hazard Modeller’s Toolkit:  An Open-Source Library for the Const...The Seismic Hazard Modeller’s Toolkit:  An Open-Source Library for the Const...
The Seismic Hazard Modeller’s Toolkit: An Open-Source Library for the Const...
 
Automated Summarisation of Big Data, useR! 2018
Automated Summarisation of Big Data, useR! 2018  Automated Summarisation of Big Data, useR! 2018
Automated Summarisation of Big Data, useR! 2018
 
2nd e-ROSA Stakeholder Workshop: By, EO Based Global Public goods
2nd e-ROSA Stakeholder Workshop: By, EO Based Global Public goods2nd e-ROSA Stakeholder Workshop: By, EO Based Global Public goods
2nd e-ROSA Stakeholder Workshop: By, EO Based Global Public goods
 
How to set achievable tree canopy goals
How to set achievable tree canopy goalsHow to set achievable tree canopy goals
How to set achievable tree canopy goals
 
Big data in GIS Environment
Big data in GIS Environment Big data in GIS Environment
Big data in GIS Environment
 
07 a70110 remotesensingandgisapplications
07 a70110 remotesensingandgisapplications07 a70110 remotesensingandgisapplications
07 a70110 remotesensingandgisapplications
 
Land Use/Land Cover Detection
Land Use/Land Cover DetectionLand Use/Land Cover Detection
Land Use/Land Cover Detection
 
Bayesian Hilbert Maps for Dynamic Continuous Occupancy Mapping
Bayesian Hilbert Maps for Dynamic Continuous Occupancy MappingBayesian Hilbert Maps for Dynamic Continuous Occupancy Mapping
Bayesian Hilbert Maps for Dynamic Continuous Occupancy Mapping
 

Ähnlich wie Integrating eo with official statistics using machine learning in mexico geo week

System for Land-Based Emissions Estimation in Kenya
System for Land-Based Emissions Estimation in KenyaSystem for Land-Based Emissions Estimation in Kenya
System for Land-Based Emissions Estimation in Kenya
ipcc-media
 
flat_presentation_time_evolving_OD_matrix_estimation
flat_presentation_time_evolving_OD_matrix_estimationflat_presentation_time_evolving_OD_matrix_estimation
flat_presentation_time_evolving_OD_matrix_estimation
Luís Moreira-Matias
 
Crowd sourcing gis for global urban area mapping
Crowd sourcing gis for global urban area mappingCrowd sourcing gis for global urban area mapping
Crowd sourcing gis for global urban area mapping
Hiroyuki Miyazaki
 

Ähnlich wie Integrating eo with official statistics using machine learning in mexico geo week (20)

Machine learning and Satellite Images
Machine learning and Satellite ImagesMachine learning and Satellite Images
Machine learning and Satellite Images
 
ONS local presents clustering
ONS local presents clusteringONS local presents clustering
ONS local presents clustering
 
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
 
Singapore 3D Map for a Smart Nation
Singapore 3D Map for a Smart NationSingapore 3D Map for a Smart Nation
Singapore 3D Map for a Smart Nation
 
Multimedia Mining
Multimedia Mining Multimedia Mining
Multimedia Mining
 
Tanay Chaudhari_SIP PPT
Tanay Chaudhari_SIP PPTTanay Chaudhari_SIP PPT
Tanay Chaudhari_SIP PPT
 
NextGEOSS: The Next Generation European Data Hub and Cloud Platform for Earth...
NextGEOSS: The Next Generation European Data Hub and Cloud Platform for Earth...NextGEOSS: The Next Generation European Data Hub and Cloud Platform for Earth...
NextGEOSS: The Next Generation European Data Hub and Cloud Platform for Earth...
 
Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2
 
Introduction to Computational Statistics
Introduction to Computational StatisticsIntroduction to Computational Statistics
Introduction to Computational Statistics
 
ML & Decision Making
ML & Decision MakingML & Decision Making
ML & Decision Making
 
Analysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approachAnalysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approach
 
[DSC Europe 22] Land Boundary Delineation using U-net - Theophilus Aidoo
[DSC Europe 22] Land Boundary Delineation using U-net - Theophilus Aidoo[DSC Europe 22] Land Boundary Delineation using U-net - Theophilus Aidoo
[DSC Europe 22] Land Boundary Delineation using U-net - Theophilus Aidoo
 
mohamed saber c.v
mohamed saber c.vmohamed saber c.v
mohamed saber c.v
 
mohamed saber c.v
mohamed saber c.vmohamed saber c.v
mohamed saber c.v
 
12 SuperAI on Supercomputers
12 SuperAI on Supercomputers12 SuperAI on Supercomputers
12 SuperAI on Supercomputers
 
System for Land-Based Emissions Estimation in Kenya
System for Land-Based Emissions Estimation in KenyaSystem for Land-Based Emissions Estimation in Kenya
System for Land-Based Emissions Estimation in Kenya
 
flat_presentation_time_evolving_OD_matrix_estimation
flat_presentation_time_evolving_OD_matrix_estimationflat_presentation_time_evolving_OD_matrix_estimation
flat_presentation_time_evolving_OD_matrix_estimation
 
Open Land Use - the Current Status and Steps Forward
Open Land Use - the Current Status and Steps ForwardOpen Land Use - the Current Status and Steps Forward
Open Land Use - the Current Status and Steps Forward
 
Resume final upload pdf
Resume final upload pdfResume final upload pdf
Resume final upload pdf
 
Crowd sourcing gis for global urban area mapping
Crowd sourcing gis for global urban area mappingCrowd sourcing gis for global urban area mapping
Crowd sourcing gis for global urban area mapping
 

Mehr von Abel Alejandro Coronado Iruegas

Mehr von Abel Alejandro Coronado Iruegas (20)

Mobility Master Class.pdf
Mobility Master Class.pdfMobility Master Class.pdf
Mobility Master Class.pdf
 
Live UAEMex Cubo de Datos Geoespaciales de Mexico
Live UAEMex Cubo de Datos Geoespaciales de MexicoLive UAEMex Cubo de Datos Geoespaciales de Mexico
Live UAEMex Cubo de Datos Geoespaciales de Mexico
 
Cubo de datos uaemex
Cubo de datos uaemexCubo de datos uaemex
Cubo de datos uaemex
 
Geo Big Data 4 Datalab
Geo Big Data 4 DatalabGeo Big Data 4 Datalab
Geo Big Data 4 Datalab
 
Catedra INEGI Big Data en IBERO
Catedra INEGI Big Data en IBEROCatedra INEGI Big Data en IBERO
Catedra INEGI Big Data en IBERO
 
El Cubo de Datos Geoespaciales de Mexico
El Cubo de Datos Geoespaciales de MexicoEl Cubo de Datos Geoespaciales de Mexico
El Cubo de Datos Geoespaciales de Mexico
 
No Sql
No SqlNo Sql
No Sql
 
Cubo de Datos Geoespaciales de Mexico
Cubo de Datos Geoespaciales de MexicoCubo de Datos Geoespaciales de Mexico
Cubo de Datos Geoespaciales de Mexico
 
Congreso UAA 2018 Animo Tuitero 2 0
Congreso UAA 2018 Animo Tuitero 2 0Congreso UAA 2018 Animo Tuitero 2 0
Congreso UAA 2018 Animo Tuitero 2 0
 
Analisis del Sentimiento en el Estado de Animo de los Tuiteros en Mexico
Analisis del Sentimiento en el Estado de Animo de los Tuiteros en MexicoAnalisis del Sentimiento en el Estado de Animo de los Tuiteros en Mexico
Analisis del Sentimiento en el Estado de Animo de los Tuiteros en Mexico
 
Ejemplos de Proyectos de Ciencia de Datos y Big Data en el INEGI
Ejemplos de Proyectos de Ciencia de Datos y Big Data en el INEGIEjemplos de Proyectos de Ciencia de Datos y Big Data en el INEGI
Ejemplos de Proyectos de Ciencia de Datos y Big Data en el INEGI
 
Big data big opportunities
Big data big opportunitiesBig data big opportunities
Big data big opportunities
 
Big data taller inegi sedesol
Big data taller inegi sedesolBig data taller inegi sedesol
Big data taller inegi sedesol
 
INEGI ESS big data workshop
INEGI ESS big data workshopINEGI ESS big data workshop
INEGI ESS big data workshop
 
Taller de Big Data y Ciencia de Datos en COLMEX dia 2
Taller de Big Data y Ciencia de Datos en COLMEX dia 2Taller de Big Data y Ciencia de Datos en COLMEX dia 2
Taller de Big Data y Ciencia de Datos en COLMEX dia 2
 
Taller de Big Data y Ciencia de Datos en COLMEX dia 1
Taller de Big Data y Ciencia de Datos en COLMEX dia 1 Taller de Big Data y Ciencia de Datos en COLMEX dia 1
Taller de Big Data y Ciencia de Datos en COLMEX dia 1
 
Big data lead colmex
Big data lead colmexBig data lead colmex
Big data lead colmex
 
Explorando Big Data y Ciencia de Datos con GPUs
Explorando Big Data y Ciencia de Datos con GPUsExplorando Big Data y Ciencia de Datos con GPUs
Explorando Big Data y Ciencia de Datos con GPUs
 
Geo Big Data 2015
Geo Big Data 2015 Geo Big Data 2015
Geo Big Data 2015
 
Anatomía de un proyecto de Big Data
Anatomía de un proyecto de Big DataAnatomía de un proyecto de Big Data
Anatomía de un proyecto de Big Data
 

Kürzlich hochgeladen

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Integrating eo with official statistics using machine learning in mexico geo week

  • 1. Integrating EO with Official Statistics using Machine Learning in Mexico (Work in progress) November 5 INEGI abel.coronado@inegi.org.mx
  • 2. UNECE Machine Learning Project UNECE Machine Learning Project This is one of the pilot projects in the Machine Learning Project of the UNECE High-Level Group on Modernization of Official Statistics.
  • 3. UNECE Machine Learning Project UNECE Machine Learning Project Some Objectives • Investigate and demonstrate the value added of ML in the production of official statistics, where "value added" is increase in relevance, better overall quality or reduction in costs. • Advance the capability of national statistical organisations to use ML in the production of official statistics. • Enhance collaboration between statistical organisations in the development and application of ML.
  • 5. Imagery Pilot Project Objective of this Imagery Pilot Project (Practical Application) Expand the use of imagery data in the production of official statistics through the further development of knowledge and sharing of ML solutions and practices. Training 2010 Census (Random Sample) Annual Satellite Image Summary Machine Learning Annual classifications in Non-Census years Urban-Rural Classes2010 Imagery
  • 6. Imagery Pilot Project Imagery Pilot Project • This pilot project proposes the use of Landsat satellite data for the mapping of urban and rural areas in non-census years, using as ground truth an urban and rural grid of 1km x 1km generated from the field data of the 2010 Population Census at block level and the updated Georeferenced Business Register to date 2010. • This pilot project seeks to take advantage of the knowledge, data, and technologies available to show an example of the use of satellite images in the generation of geographic information that can be used to monitor the change in urban extent and if the final product meets the appropriate quality, explore the path to incorporate it into the continuous urban-rural monitoring processes of INEGI. • Monitoring the growth of the urban and rural areas of a country would enable the adjustments of models and samples of the household and business surveys as well as to plan censuses. As the application matures and improves, it could produce new sets of official statistics on its own.
  • 7. Urban and Rural Locations in Mexico 4, 525 Urban Locations 187, 722 Rural Locations Locations in Mexico
  • 8. Draft of Imagery Pipeline
  • 9. Draft of Imagery Pipeline Imagery Pipeline In order to have a road map. We build an abstract machine learning pipeline for Imagery Based on: IBM CRISP-DM Cross Industry Standard Process for Data Mining & Microsoft Data Science lifecycle InKyung Choi; Refactoring
  • 12. Establish the problem to be solved Establish the problem to be solved Monitor the growth of locations of Mexico (Urban and Rural), which would generate a more timely input for : • Cartography update • Estimation of the population in non-census years • Related with: • SDG Indicator 11.3.1: Ratio of land consumption rate to population growth rate • SDG Indicator 15.3.1: Proportion of land that is degraded over total land area
  • 13. Identify satellite data to address the problem Identify satellite data to address the problem In our case, the data to monitor the growth of cities can be Landsat images (NASA & USGS): • They are open data. • There is a constant record since the 70s, although they are available from 1985 to date. • The spatial resolution of Landsat images is 30 meters. • Temporal resolution is 16 days and 8 days with combined satellites. • Spectral resolution, in this pilot project we use 6 spectral bands • All the data we use is Analysis Ready Data (ARD) • We take advantage that we have just built the Mexican Geospatial Data Cube, with all this information.
  • 14. Translate the problem into statistical problem Translate the problem into statistical problem We define that is a classification problem: • Unit of analysis: 1km x 1km squares covering the whole country: 1’ 975, 719 . • 3 classes were designated: • Urban • Rural • None of the above
  • 15. Obtain ground-truth data; 2010 Census Data Obtain ground-truth data Obtain layers of geographical information: • Georeferenced Population Census 2010 (Block Level Aggregation) • Georeferenced Economic Census (Economic Unit Level) • Visual Inspection
  • 16. Obtain ground-truth data; 1km x 1km Grid Obtain ground-truth data Build the polygons (1 km – 1 km) 1’ 975, 719 Polygons
  • 17. Obtain ground-truth data; Urban-Rural Grid Obtain ground-truth data Assign Labels
  • 18. Obtain satellite imagery data Obtain satellite imagery data • 32 TB of Images in external discs. • 90 TB decompressed • March 4, 2019 • The images are ARD, Analysis Ready Data. • In essence ARD means that the pixels of the entire time series are aligned and comparable.
  • 19. Obtain satellite imagery data Obtain satellite imagery data • 36 Years
  • 20. Obtain satellite imagery data Obtain satellite imagery data • 2010 same year of the Census
  • 21. Use the ODC to generate annual summaries Obtain satellite imagery data • 2010 same year of the Census 2,737,273,075 pixels – 35 Gb
  • 25. Integrate ground-truth data with satellite data Integrate ground-truth data with satellite data 1 km x 1 km tagged polygon 33 x 33 pixels 30 meters of resolution 33 Pixels 6Bands 6 swir 2------ 5 swir 1------ 4 nir--------- 3 red--------- 2 green------- 1 blue-------- Python Function Multidimensional Array NumPy
  • 26. Take a Random Sample Take a Random Sample
  • 27. Define features and generate a dataset to be used for analysis Define features and generate a dataset to be used for analysis • Feature Engineering 30,000 tagged regions 1km x 1km 33 pixels x 33 pixels x 6 Bands 30,000 4,305 30,000 rows in a data matrix ?
  • 28. Define features and generate a dataset to be used for analysis Define features and generate a dataset to be used for analysis • Feature Engineering 30,000 tagged regions 1km x 1km 33 pixels x 33 pixels x 6 Bands 30,000 4,305 30,000 rows in a data matrix
  • 30. Evaluate ML Random Sample 30,000 Elements Machine Learning Result of TPOT 2010 National Classification 78% Accuracy Training 2010 Geomedian Classification 30,000 4,305 30,000 rows in a data matrix
  • 31. Detailed results Detailed results; First Iteration & Second Iteration
  • 33. Finish Third Iteration At this moment finishing third iteration we seek to improve improve 2 or 3 %
  • 34. Next Steps Next Steps • Finish a third iteration, no later than one month. • The grid is also an important innovation and is evolving, it is likely that we will have a new version very soon and we should update the classification. • Apply the model to un-labelled years, for example 2019 first semester. • Validate a sample of the un-labelled year, requesting support for the area of visual interpretation of the geography division, to have a measure of product quality. • Hold more meetings with the area of sociodemographic statistics, involve them in the exercise to improve the potential benefit in the population estimate. • Hold more meetings with the area of statistical research, to receive feedback and observations that can improve the product. • Hold meetings with the cartographic update area, to receive feedback.