1. Understanding the planet using satellites and deep learning
bcn.AI June 7th 2019
Albert Pujol Torras @AlbertPT71
Lead Machine Learning Platform
2. Agenda
● Satellogic
● Satellogic Data Science and Solutions
● What we can do with satellites: examples of problems we face
● What type of data do we work with?
● Processing infrastructure: hardware and software
● Some lines of research we are interested in
● Lessons learned
● Questions
4. Satellogic sites
- BCN: Data Science & Solutions
- TLV: Delivery platform
- BSAS: Headquarters & Design
- MVD: Manufacturing Plant
- PEK: Comprehensive services
7. Estimation of other image modalities
Figure: HR RGB, LR TIR (thermal infrared), LR SWIR1 and LR SWIR2 (short-wave infrared) inputs; estimated HR thermal output.
8. Regression: time-series image prediction
- Estimating the yield at the end of the season.
- Monitoring changes in that estimate to know when and where to act.
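A minimal sketch of the monitoring step, assuming each field already has a series of end-of-season yield estimates (one per image date). The function name `flag_estimate_drops` and the 10% threshold are hypothetical choices for illustration, not part of the talk:

```python
from typing import Dict, List


def flag_estimate_drops(estimates: Dict[str, List[float]],
                        rel_drop: float = 0.1) -> Dict[str, int]:
    """Return, for each field, the index of the first date at which the
    yield estimate fell by more than `rel_drop` relative to the previous date.

    Fields whose estimate never drops that much are not included.
    """
    flagged: Dict[str, int] = {}
    for field, series in estimates.items():
        for t in range(1, len(series)):
            prev, cur = series[t - 1], series[t]
            # Relative drop between consecutive estimates (skip non-positive baselines).
            if prev > 0 and (prev - cur) / prev > rel_drop:
                flagged[field] = t
                break
    return flagged


# Hypothetical estimates (t/ha) for two fields over successive image dates.
estimates = {"field_A": [5.0, 5.1, 4.2, 4.0], "field_B": [6.0, 5.9, 5.8]}
print(flag_estimate_drops(estimates))  # {'field_A': 2}
```

The output tells you *where* (which field) and *when* (which date) the estimate changed enough to deserve attention; the yield regression itself would be a separate model.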
11. Data sources
Primary data sources:
- Satellogic data
- 3rd party satellite data
- World climate maps, geologic data, elevation models, georeferenced man-made structures, political boundaries, census data maps
Derived layers: temporal evolution, land use maps, advanced indices, distance to water, terrain orientation, super-resolution images, ...
These sources can be global or local, dynamic or static, and high or low resolution.
nKappa: data science platform focused on geographic data and satellite imagery.
Main goal: to scale solution development by automating/accelerating data science work.
nKappa enables solution development using aligned sets of image tiles (Kappas).
Data - Data Sources
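A minimal sketch of what an "aligned set of image tiles" could look like, assuming all layers have already been reprojected and resampled to one common grid. The nKappa internals are not described in the talk; `make_kappas` and its dictionary layout are hypothetical:

```python
from typing import Dict, List, Tuple

Raster = List[List[float]]  # row-major 2-D grid of pixel values


def make_kappas(layers: Dict[str, Raster],
                tile: int) -> Dict[Tuple[int, int], Dict[str, Raster]]:
    """Cut co-registered layers into tile x tile windows.

    Assumes every layer is already on the same grid (same shape and
    georeferencing); only the shape can be checked here. Partial tiles at
    the right/bottom edges are dropped.
    """
    names = list(layers)
    height, width = len(layers[names[0]]), len(layers[names[0]][0])
    for name in names:
        grid = layers[name]
        if len(grid) != height or any(len(row) != width for row in grid):
            raise ValueError(f"layer {name!r} is not aligned to the reference grid")

    kappas: Dict[Tuple[int, int], Dict[str, Raster]] = {}
    for r0 in range(0, height - tile + 1, tile):
        for c0 in range(0, width - tile + 1, tile):
            # The same window from every layer forms one aligned multi-layer tile.
            kappas[(r0 // tile, c0 // tile)] = {
                name: [row[c0:c0 + tile] for row in layers[name][r0:r0 + tile]]
                for name in names
            }
    return kappas


# Hypothetical 4x4 layers: one image band and an elevation model.
band = [[0.1 * (r + c) for c in range(4)] for r in range(4)]
dem = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
kappas = make_kappas({"band": band, "dem": dem}, tile=2)
print(sorted(kappas))         # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(kappas[(1, 1)]["dem"])  # [[10.0, 11.0], [14.0, 15.0]]
```

Keeping every layer (primary or derived) on the same grid means a model can consume any combination of them per tile without further resampling.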
12. Sizes:
- Typical project: 20 GB/day.
- Daily world remap: processing the continental surface amounts to 5300 hours of video per day.
Sources of image variation:
- Clouds: about 70% of the world is cloud covered.
- Perspective changes (off-nadir satellite images, drone images).
- Shadow orientation, intensity and length, which vary with time of day, clouds and season.
- Chromatic changes due to aerosols and time of day.
- Variations between sensors (different satellites, drone images, ...).
- Variations/errors in image orthorectification and geolocation.
- Seasonal changes in vegetation growth and color, ...
Figure: examples of clouds, perspective, shadows, and chromatic and vegetation changes.
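A rough worked example of why clouds dominate revisit planning. Assume, as a simplification, that each pass over a location is cloudy with probability c and that passes are independent. Then the chance of at least one clear view in n passes is 1 - c^n. Real cloud cover is correlated in space and time, so this is optimistic. With c = 0.7, a single pass is clear only 30% of the time, and 7 passes are needed to reach 90%:

```python
def prob_clear_view(cloud_prob: float, n_passes: int) -> float:
    """P(at least one cloud-free pass), assuming independent passes."""
    return 1.0 - cloud_prob ** n_passes


def passes_needed(cloud_prob: float, target: float) -> int:
    """Smallest number of independent passes whose clear-view probability reaches `target`."""
    if not 0.0 <= cloud_prob < 1.0:
        raise ValueError("cloud_prob must be in [0, 1)")
    n = 1
    while prob_clear_view(cloud_prob, n) < target:
        n += 1
    return n


print(round(prob_clear_view(0.7, 1), 2))  # 0.3
print(passes_needed(0.7, 0.9))            # 7
```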
13. Extremely unbalanced datasets
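One common mitigation, not necessarily the one used at Satellogic, is to weight each class inversely to its frequency. The sketch below uses the same "balanced" formula as scikit-learn's `class_weight="balanced"`: n_samples / (n_classes * class_count). The labels are hypothetical:

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical pixel labels: 90% cropland, 10% water.
labels = ["crop"] * 90 + ["water"] * 10
weights = balanced_class_weights(labels)
print(weights["water"])  # 5.0
```

With these weights, each class contributes the same total weight to the loss (90 × 0.56 ≈ 10 × 5.0 = 50), so the rare class is not ignored.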
14. Ground truth (GT): rare and expensive, but indispensable to train ML and computer vision approaches and to assess their quality.
Sources of ground truth:
- Land ground truth provided by the client.
- GT generated using the highest-resolution imagery available.
- Human annotation:
  - Our team always annotate ... to understand the problem.
  - Internal and external annotation (Mechanical Turk, Supahands, ...).
  - Sample what to annotate so as to preserve variability and input-domain coverage.
  - Measure annotator biases and variances, and act on them (discard annotators or images, rewrite the annotation instructions, ...).
- Other GT sources: surveyed data from developed countries, annotated from visual imagery or from land ground truth (CORINE Land Cover project, LUCAS, CREAF, SIOSE in Spain, USGS land cover datasets in the USA, ...).
These sources are useful, but we have to deal with:
- Out-of-date data (most of it is correct, but small parts are erroneous).
- Differing resolution (uncertain labels at class borders).
- Domain/covariate shift: how to transfer the GT to places that differ in land management culture, climate or relief.
Data - Ground truth
Figure: GT sources ranked from expensive to cheaper.
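A minimal sketch of one way to measure annotator quality: compare each annotator with the per-item majority vote. This is an assumption about how such a measurement could be done, not the team's actual procedure. Because each annotator's own vote is part of the majority, the estimate is slightly biased; a leave-one-out consensus avoids that. All names and labels here are hypothetical:

```python
from collections import Counter
from typing import Dict

# item id -> {annotator id -> label}
Annotations = Dict[str, Dict[str, str]]


def consensus_labels(annotations: Annotations) -> Dict[str, str]:
    """Majority vote per item (ties go to the label seen first)."""
    return {item: Counter(votes.values()).most_common(1)[0][0]
            for item, votes in annotations.items()}


def annotator_agreement(annotations: Annotations) -> Dict[str, float]:
    """Fraction of each annotator's labels that match the majority vote."""
    consensus = consensus_labels(annotations)
    hits: Counter = Counter()
    totals: Counter = Counter()
    for item, votes in annotations.items():
        for annotator, label in votes.items():
            totals[annotator] += 1
            hits[annotator] += int(label == consensus[item])
    return {a: hits[a] / totals[a] for a in totals}


# Hypothetical annotations of four tiles by three annotators.
annotations = {
    "tile1": {"a": "rice", "b": "rice", "c": "urban"},
    "tile2": {"a": "urban", "b": "urban", "c": "urban"},
    "tile3": {"a": "rice", "b": "rice", "c": "water"},
    "tile4": {"a": "water", "b": "water", "c": "rice"},
}
print(annotator_agreement(annotations))  # {'a': 1.0, 'b': 1.0, 'c': 0.25}
```

An annotator with low agreement, like `c` here, is a candidate for retraining, clearer instructions, or exclusion.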
15. GT Data: Covariate shift & Domain adaptation
Existing “good quality” ground truth → target areas without ground truth:
- Rice fields in Europe → rice fields in China
- Urban areas in Europe → urban areas in Lagos
16. ● Huge amount of data → cloud infrastructure.
● nKappa platform for distributed processing (currently on Microsoft Azure), plus in-house GPU servers (equipped with GTX 1080 Ti cards).
● nKappa uses the cloud for experiment management (tracking, sharing within the team, and auditing datasets, algorithms and models), for deploying pipelines and models into production, and for handling all GIS ETL tasks.
● The GPU servers are mostly used during exploratory data analysis (EDA) and the development of data science (DS) algorithms and models.
Infrastructure - Hardware
17. Some lines of research: Domain adaptation
“Deep Visual Domain Adaptation: A Survey”, Mei Wang, Weihong Deng
“Domain Adaptation for Visual Applications: A Comprehensive Survey”, Gabriela Csurka
- Sampling and sample weighting based on a classifier that differentiates between domains.
- Adversarial networks to make embeddings invariant to domain change.
Figure: GT in Barcelona; target area: Lasa.
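The sample-weighting idea can be sketched like this. Train a classifier to tell source (GT) samples from target samples. Then weight each source sample by the odds p(target|x) / p(source|x), rescaled by n_source / n_target. Source samples that look like the target domain then count more during training. This toy version uses a single 1-D feature and a hand-written logistic regression so that it has no dependencies. Real inputs would be image features or embeddings, and the function names are hypothetical:

```python
import math
from typing import List, Tuple


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_domain_classifier(source: List[float], target: List[float],
                          lr: float = 0.1, epochs: int = 2000) -> Tuple[float, float]:
    """1-D logistic regression p(target | x) = sigmoid(a*x + b), fit by batch gradient descent."""
    xs = source + target
    ys = [0.0] * len(source) + [1.0] * len(target)
    a = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = _sigmoid(a * x + b) - y  # d(log-loss)/d(logit)
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def importance_weights(source: List[float], target: List[float]) -> List[float]:
    """w(x) = p(target|x) / p(source|x) * n_source / n_target for each source sample."""
    a, b = fit_domain_classifier(source, target)
    ratio = len(source) / len(target)
    # The odds p/(1-p) of a logistic model equal exp(logit), which avoids dividing by ~0.
    return [ratio * math.exp(a * x + b) for x in source]


# Hypothetical 1-D features: the target domain is shifted toward larger values.
source = [0.0, 0.5, 1.0, 1.5, 2.0]
target = [1.5, 2.0, 2.5, 3.0]
print([round(w, 2) for w in importance_weights(source, target)])
```

The same weights can also drive sampling: draw source samples with probability proportional to w(x) to build a training set that resembles the target area.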
18. Some lines of research: Usage of generative models
“Image-to-Image Translation with Conditional Adversarial Networks”, Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
“Satellite Image Spoofing: Creating Remote Sensing Dataset with Generative Adversarial Networks”, Chunxue Xu, Bo Zhao
“GeoGAN: A Conditional GAN with Reconstruction and Style Loss to Generate Standard Layer of Maps from Satellite Images”
Invisible cities: https://opendot.github.io/ml4a-invisible-cities/implementation/
“Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”, Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
Evaluating the effect on semantic segmentation of using samples from conditional generative adversarial networks:
- Data augmentation: generating satellite images (textures) from random land-use label maps.
- Super-resolution and image enhancement.
19. Some lines of research: Uncertainty measurement and GT cleaning
“Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning”, Yarin Gal, Zoubin Ghahramani
“Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels”, Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor W. Tsang, Masashi Sugiyama
Measuring errors in GT labeling:
- Error and entropy of the predicted class distribution across an ensemble of classifiers.
- Entropy of the DNN outputs when dropout is kept active in the fully connected layers at inference time.
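Both measurements reduce to the same computation once you have several probability vectors per sample, either from ensemble members or from stochastic dropout passes. Average them, then flag the sample if the averaged prediction disagrees with its GT label or if its entropy is high. This is a minimal sketch: the function names, the 0.6-nat threshold and the data are hypothetical, and the networks that produce the probabilities are out of scope:

```python
import math
from typing import List, Sequence

Probs = Sequence[float]  # one class-probability vector


def mean_probs(samples: Sequence[Probs]) -> List[float]:
    """Average class probabilities over ensemble members or MC-dropout passes."""
    n_classes = len(samples[0])
    return [sum(s[c] for s in samples) / len(samples) for c in range(n_classes)]


def entropy(p: Probs) -> float:
    """Shannon entropy in nats; 0 * log 0 is treated as 0."""
    return -sum(pc * math.log(pc) for pc in p if pc > 0)


def suspicious_labels(samples_per_item: Sequence[Sequence[Probs]],
                      gt_labels: Sequence[int], max_entropy: float) -> List[int]:
    """Indices whose averaged prediction disagrees with the GT label or is too uncertain."""
    flagged = []
    for i, (samples, label) in enumerate(zip(samples_per_item, gt_labels)):
        p = mean_probs(samples)
        predicted = max(range(len(p)), key=p.__getitem__)
        if predicted != label or entropy(p) > max_entropy:
            flagged.append(i)
    return flagged


# Hypothetical outputs of two stochastic passes for three labeled samples (2 classes).
samples_per_item = [
    [[0.9, 0.1], [0.8, 0.2]],  # confident, agrees with label 0
    [[0.1, 0.9], [0.2, 0.8]],  # confident, disagrees with label 0
    [[0.5, 0.5], [0.6, 0.4]],  # agrees, but close to maximum entropy (ln 2)
]
print(suspicious_labels(samples_per_item, [0, 0, 0], max_entropy=0.6))  # [1, 2]
```

Flagged samples are candidates for re-annotation or for exclusion from training, rather than confirmed errors.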
20. Some lines of research: Distance metric learning and invariant embeddings
How can we use the huge amount of unlabeled data to train models?
- Learning invariant deep NN embeddings and transferable models that encode land-use content.
“Tile2Vec: Unsupervised representation learning for spatially distributed data”, Neal Jean, Sherrie Wang, Anshul Samar, George Azzari, David Lobell, Stefano Ermon
Figure: anchor tiles, positive tiles and negative tiles.
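The anchor/positive/negative idea can be sketched as a hinge triplet loss on embeddings. Nearby tiles serve as positives and tiles farther away serve as negatives, so no labels are needed. This is a simplified illustration in the spirit of Tile2Vec, not its reference implementation. The embedding network is omitted, and the neighbourhood `radius` (Chebyshev distance in tiles) and the margin are assumptions:

```python
import math
import random
from typing import Sequence, Tuple

Tile = Tuple[int, int]  # (row, col) position of a tile in a grid


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Hinge triplet loss: max(0, d(a, p) - d(a, n) + margin), with Euclidean d."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)


def sample_triplet(rows: int, cols: int, radius: int,
                   rng: random.Random) -> Tuple[Tile, Tile, Tile]:
    """Anchor tile, a neighbour within `radius` tiles, and a tile farther away.

    Assumes radius >= 1 and a grid with more than 2*radius+1 tiles in at
    least one dimension; otherwise no valid positive or negative may exist.
    """
    anchor = (rng.randrange(rows), rng.randrange(cols))
    while True:  # positive: a different tile inside the neighbourhood
        dr, dc = rng.randint(-radius, radius), rng.randint(-radius, radius)
        pos = (anchor[0] + dr, anchor[1] + dc)
        if (dr, dc) != (0, 0) and 0 <= pos[0] < rows and 0 <= pos[1] < cols:
            break
    while True:  # negative: any tile outside the neighbourhood
        neg = (rng.randrange(rows), rng.randrange(cols))
        if max(abs(neg[0] - anchor[0]), abs(neg[1] - anchor[1])) > radius:
            break
    return anchor, pos, neg


rng = random.Random(0)
print(sample_triplet(50, 50, radius=2, rng=rng))
print(triplet_loss([0.0, 0.0], [1.0, 0.0], [0.0, 1.5]))  # 0.5
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive. Training therefore pulls neighbouring tiles together in embedding space and pushes distant ones apart.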
21. Lessons learned
- Project success:
  - 5%: selecting the ML algorithm and its parameters.
  - 95%: really understanding what the client needs and how to generate value, anticipating how your output is going to be consumed, and defining good features, good ground truth, a good data sampling policy, and pre- and post-processing.
- Spend your time first on ensuring success; improve after that:
  - Use fast ML algorithms.
  - Start with small datasets that keep the input and output variability of the original one.
- Predictive models: accuracy is not always the most important criterion; explainability and consistency also matter.
- It is worth investing in automatically measuring dataset quality before starting to train on big datasets:
  - missing values, constant variables, unaligned bands, duplicated variables, class imbalance, …
- Most of our production costs come from ETL (extract, transform, load).
- Deep learning is amazing (sometimes overkill for the problem at hand), but it is expensive:
  - In production: computational cost.
  - In development: fine-tuning and network "cooking" (which does not scale well).
- Context knowledge + common-sense heuristics + ML vs. end-to-end learning (is all the target-domain variability in your training set?)
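Several of the dataset-quality checks listed above are cheap to automate. The sketch below covers missing values, constant columns, exact duplicate columns and class imbalance. It assumes a hypothetical column-oriented layout with `None` for missing values. Unaligned bands are not covered because that check needs georeferencing information:

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence


def dataset_report(columns: Dict[str, List[Optional[float]]],
                   labels: Sequence[Hashable]) -> dict:
    """Cheap pre-training checks: missing values, constant and duplicated
    columns, and class imbalance. Missing values are represented as None."""
    report: dict = {"missing": {}, "constant": [], "duplicated": []}
    first_seen: Dict[tuple, str] = {}
    for name, values in columns.items():
        n_missing = sum(v is None for v in values)
        if n_missing:
            report["missing"][name] = n_missing / len(values)
        if len({v for v in values if v is not None}) <= 1:
            report["constant"].append(name)
        key = tuple(values)  # exact duplicate of an earlier column?
        if key in first_seen:
            report["duplicated"].append((first_seen[key], name))
        else:
            first_seen[key] = name
    counts = Counter(labels)
    report["class_counts"] = dict(counts)
    report["imbalance_ratio"] = max(counts.values()) / min(counts.values())
    return report


# Hypothetical per-pixel band values and labels.
columns = {
    "b1": [1.0, 2.0, None, 4.0],
    "b2": [5.0, 5.0, 5.0, 5.0],
    "b1_copy": [1.0, 2.0, None, 4.0],
}
report = dataset_report(columns, ["crop", "crop", "crop", "water"])
print(report["constant"], report["duplicated"], report["imbalance_ratio"])
# ['b2'] [('b1', 'b1_copy')] 3.0
```

Running a report like this before a long training job is far cheaper than discovering a constant or duplicated band after the model has been trained.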