Understanding the planet using satellites
and deep learning
bcn.AI June 7th 2019
Albert Pujol Torras @AlbertPT71
Lead Machine Learning Platform
Agenda
● Satellogic
● Satellogic Data Science and Solutions
● What we can do with satellites, examples of problems we face
● What type of data do we work with?
● Processing infrastructure, hardware and software
● Some lines of research we are interested in
● Lessons learned
● Questions
Data Science & Solutions: BCN
Delivery platform: TLV
Headquarters & Design: BSAS
Manufacturing Plant: MVD
Comprehensive services: PEK
Object detection/Counting
What kinds of problems do we face?
Object amount/density estimation / regression with lower image resolution
Estimation of other image modalities
HR RGB LR TIR LR SWIR1 LR SWIR2
HR THERMAL
Regression: time series image prediction
-Estimation of the yield at the end of the season
-Monitoring of changes in the estimation to know when and where to act.
Image semantic segmentation: Land use detection
“anomaly change” and “semantic change” detection
Time T0 Time T1 Diff (T1,T0)
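As a rough illustration of the Diff (T1,T0) idea above, the sketch below thresholds the per-pixel absolute difference between two co-registered single-band tiles to obtain a change mask. The function name `change_mask`, the threshold value, and the nested-list tile representation are assumptions made for this example, not the method used in the talk. Real imagery would also need radiometric and geometric alignment before differencing.

```python
def change_mask(tile_t0, tile_t1, threshold=0.2):
    """Mark pixels whose absolute change between T0 and T1 exceeds threshold.

    tile_t0, tile_t1: equally sized 2-D lists of floats (one band, values in [0, 1]).
    Returns a 2-D list of 0/1 flags with the same shape.
    """
    same_shape = len(tile_t0) == len(tile_t1) and all(
        len(r0) == len(r1) for r0, r1 in zip(tile_t0, tile_t1)
    )
    if not same_shape:
        raise ValueError("tiles must have the same shape")
    return [
        [1 if abs(p1 - p0) > threshold else 0 for p0, p1 in zip(r0, r1)]
        for r0, r1 in zip(tile_t0, tile_t1)
    ]


# Hypothetical 2x2 tiles: two pixels change strongly, two barely move.
t0 = [[0.10, 0.12], [0.50, 0.52]]
t1 = [[0.11, 0.60], [0.49, 0.10]]
mask = change_mask(t0, t1)
```

A plain difference like this only flags that something changed. Separating "anomaly change" from "semantic change" would need a model of what the change means, which this sketch does not attempt.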
Satellogic Data
3rd Party Satellite Data
Primary Data Sources
Derived Layers
Temporal Evolution
Land Use Maps
Advanced Indices
Distance to Water
Terrain Orientation
Superresolution Images
...
These sources can be available globally or locally, dynamic or static, high or low res...
nKappa: Data science platform with focus on geographic data and satellite imagery.
Main goal: To scale solution development by automating/accelerating data science work.
nKappa enables solution development using aligned sets of image tiles (Kappas)
World Climate Maps
Geologic Data
Elevation Models
Georef: Man-Made Structure
Political Boundaries
Census Data Maps
Data - Data Sources
Sizes:
-Typical project: 20Gb/day.
-Daily world remap: continental surface processing 5300 hours of video per day.
Sources of image variation:
-Clouds: 70% of the world is cloud covered.
-Perspective changes (off-nadir satellite images, drone images).
-Shadow orientation, intensity, and length variations depending on time of day, clouds, and season.
-Chromatic changes due to aerosols and time of day.
-Variations between sensors (different satellites, drone images, ...).
-Variations/errors in image orthorectification and geolocation.
-Seasonal changes in vegetation growth and color, ...
Data - Data Sources
[Figure panels: clouds; perspective; shadows; chromatic and vegetation changes]
Data - Data Sources
Extremely unbalanced datasets
Rare and expensive: indispensable to train and to assess the quality of ML and computer vision approaches.
Sources of ground truth:
- Land ground truth provided by the client.
- GT generated using the highest resolution imagery.
- Human annotation
  - Our team always annotates ... to understand the problem.
  - Internal and external annotation (Mechanical Turk, Supahands, ...).
  - Sample what to annotate to preserve variability and input domain coverage.
  - Measure biases and variances of annotators (discard annotators or images, rewrite annotation instructions, ...).
- Other GT sources: first-world survey data annotated from visual imagery or using land ground truth (CORINE project, LUCAS, CREAF, SIOSE in Spain, USGS land cover dataset in the USA, ...)
Useful, but with issues we have to deal with:
- Out-of-date data (most of it is correct, but small parts are erroneous).
- Differing resolution (uncertain labels at class borders).
- Domain/covariate shift: how to transfer it to places that differ in land management culture, climate, or relief.
Data - Ground truth
expensive cheaper
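One common way to put a number on annotator agreement is Cohen's kappa, which corrects the raw agreement rate for the agreement expected by chance. The sketch below is only an illustration of that statistic. The talk does not say which measure the team uses, and the function name and example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each annotator labelled at random
    # according to their own class frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:  # both annotators always use the same single class
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four tiles from two annotators.
annotator_1 = ["crop", "crop", "urban", "water"]
annotator_2 = ["crop", "urban", "urban", "water"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

A low kappa for one annotator against the rest would be one possible signal for discarding that annotator or rewriting the annotation instructions.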
GT Data: Covariate shift & Domain adaptation
Existing “good quality” ground truth: rice fields in Europe, urban areas in Europe
Target areas without ground truth: rice fields in China, urban areas in Lagos
● Huge amount of data --> cloud infrastructure.
● nKappa platform for distributed processing (currently using Microsoft Azure) and in-house GPU servers (equipped with 1080 Ti cards).
● nKappa uses the cloud for experiment management: to keep track of, share across the team, and audit datasets, algorithms, and models; to deploy pipelines and models into production; and to handle all the GIS-ETL related work.
● GPU servers are mostly used during EDA and during the development of DS algorithms and models.
Infrastructure - Hardware
Some lines of research: Domain adaptation
“Deep Visual Domain Adaptation: A Survey”, Mei Wang, Weihong Deng
“Domain Adaptation for Visual Applications: A Comprehensive Survey”, Gabriela Csurka
Sampling and sample-weighting based on classifier domain differentiation
Adversarial networks to make embeddings invariant to domain change
GT Barcelona Target Lasa
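The sample-weighting idea above can be sketched as follows, assuming a domain classifier has already been trained to tell source tiles (with ground truth) from target tiles (without). Its predicted probability for a source sample gives an estimate of the density ratio between the two domains, which can then be used as a training weight. The function name, the clipping constant, and the example probabilities are assumptions for illustration only.

```python
def importance_weights(p_target, n_source, n_target, eps=1e-6):
    """Turn domain-classifier outputs into per-sample training weights.

    p_target: P(domain = target | x) for each source training sample.
    n_source, n_target: sample counts used to train the domain classifier,
                        needed to correct for unequal domain sizes.
    Returns weights normalised to have mean 1.
    """
    prior_ratio = n_source / n_target
    weights = []
    for p in p_target:
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid division by zero
        # Density-ratio estimate: p_target(x) / p_source(x)
        weights.append(prior_ratio * p / (1.0 - p))
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]


# Hypothetical classifier outputs for three source tiles.
w = importance_weights([0.5, 0.8, 0.2], n_source=100, n_target=100)
```

Source samples that look like the target domain (high `p_target`) receive larger weights, so the model trained on them leans toward the part of the source data that resembles the target area.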
Some lines of research: Usage of generative models
Image-to-Image Translation with Conditional Adversarial Networks, Isola, Phillip; Zhu, Jun-Yan; Zhou, Tinghui; Efros, Alexei A.
Satellite Image Spoofing: Creating Remote Sensing Dataset with Generative Adversarial Networks, Chunxue Xu, Bo Zhao
GeoGAN: A Conditional GAN with Reconstruction and Style Loss to Generate Standard Layer of Maps from Satellite Images
Invisible cities. https://opendot.github.io/ml4a-invisible-cities/implementation/
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
Evaluation of the effects on semantic segmentation of using samples from Conditional Generative Adversarial Networks:
- Data augmentation: Generation of satellite images (textures) from land use random labels.
- Super-resolution and image enhancement.
Some lines of research: Uncertainty measurement and GT cleaning
“Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning”, Yarin Gal, Zoubin Ghahramani
“Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels”, Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor W. Tsang, Masashi Sugiyama
Measure errors on GT labeling:
- Error and entropy of the classification distribution when using an ensemble of classifiers.
- Entropy of DNN outputs when applying dropout on fully connected layers at inference time.
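A minimal sketch of the entropy measure above: given several class-probability vectors for the same tile, either from ensemble members or from repeated forward passes with dropout kept active, average them and compute the Shannon entropy of the mean. The function names and example numbers are assumptions, and the stochastic forward passes themselves are taken as already computed.

```python
import math


def mean_distribution(samples):
    """Average class-probability vectors from several stochastic predictions."""
    n_classes = len(samples[0])
    return [sum(s[c] for s in samples) / len(samples) for c in range(n_classes)]


def predictive_entropy(samples):
    """Shannon entropy (in nats) of the averaged predictive distribution."""
    mean = mean_distribution(samples)
    return -sum(p * math.log(p) for p in mean if p > 0.0)


# Hypothetical outputs of two passes for two different tiles (2 classes).
confident = [[0.98, 0.02], [0.97, 0.03]]  # passes agree: low entropy
uncertain = [[0.90, 0.10], [0.10, 0.90]]  # passes disagree: high entropy
h_confident = predictive_entropy(confident)
h_uncertain = predictive_entropy(uncertain)
```

Tiles where the entropy is high, or where a confident prediction contradicts the ground truth label, are natural candidates for manual review when cleaning GT.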
Some lines of research: Distance metric - invariant embeddings
How can we use the huge amount of unlabeled data to train models?
- Learning deep NN invariant embeddings and transferable models for encoding land use content.
“Tile2Vec: Unsupervised representation learning for spatially distributed data”, Neal Jean, Sherrie Wang, Anshul Samar, George Azzari, David Lobell, Stefano Ermon
ANCHOR TILES
POSITIVE TILES
NEGATIVE TILES
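The anchor / positive / negative setup above is usually trained with a triplet margin loss: the embedding of a positive tile (for instance a spatial neighbour of the anchor) should be closer to the anchor than the embedding of a negative tile (for instance a distant one) by at least a margin. The sketch below shows that loss on already computed embedding vectors. The margin value and the example embeddings are assumptions, and the encoder network is not shown.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: zero once the negative is at least `margin` farther
    from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings.
anchor = [0.0, 0.0]
near_tile = [0.1, 0.0]
far_tile = [3.0, 4.0]
good = triplet_loss(anchor, near_tile, far_tile)  # well separated
bad = triplet_loss(anchor, far_tile, near_tile)   # roles swapped
```

Because the triplets come from tile locations rather than labels, this kind of objective can use the unlabeled archive directly.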
Lessons learned
- Project success:
  - 5%: selection of the ML algorithm and its parameters.
  - 95%: really understanding what the client needs and how to generate value, anticipating how your output is going to be consumed, and defining good features, good ground truth, a good data sampling policy, and pre- and post-processing.
- Dedicate the time first to ensure success, … after that improve:
  - Use fast ML algorithms.
  - Start with small datasets that keep the input and output variability of the original one.
- Predictive models: accuracy is not always the most important thing; explainability and consistency also matter.
- It is worth investing in automatically measuring dataset quality before starting to train on big datasets:
  - Missing values, constant variables, unaligned bands, duplicated variables, class imbalance, …
- Most of our in-production costs are ETL (extract, transform, load).
- Deep learning is amazing (sometimes too much for the problems to solve) … and it is expensive:
  - In production: computational cost.
  - In development: fine-tuning and network cooking (which does not scale very well).
- Context knowledge + common-sense heuristics + ML vs. end-to-end (is all the target domain variability in your training set?)
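Some of the automatic dataset checks listed above (missing values, constant variables, duplicated variables, class imbalance) can be sketched in a few lines. The function names, the report layout, and the toy band values below are assumptions for illustration. Detecting unaligned bands needs image-level comparison and is left out.

```python
import math
from collections import Counter


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def dataset_report(columns, labels):
    """Summarise basic quality problems before training.

    columns: dict mapping variable name -> list of values (one per sample).
    labels:  list of class labels (one per sample).
    """
    missing = {name: sum(is_missing(v) for v in vals)
               for name, vals in columns.items()}
    # A variable is constant if it has at most one distinct non-missing value.
    constant = [name for name, vals in columns.items()
                if len({v for v in vals if not is_missing(v)}) <= 1]
    names = list(columns)
    # Pairs of variables holding exactly the same values.
    duplicated = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
                  if columns[a] == columns[b]]
    counts = Counter(labels)
    imbalance = max(counts.values()) / min(counts.values())
    return {"missing": missing, "constant": constant,
            "duplicated": duplicated, "imbalance_ratio": imbalance}


# Hypothetical toy dataset: three variables, four samples.
columns = {
    "red": [0.1, 0.2, None, 0.4],
    "nir": [0.5, 0.5, 0.5, 0.5],
    "red_copy": [0.1, 0.2, None, 0.4],
}
labels = ["crop", "crop", "crop", "water"]
report = dataset_report(columns, labels)
```

Running checks like these on a small sample first is cheap compared with discovering the same problems after a long training run.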
Questions?
