Automating Google Workspace (GWS) & more with Apps Script
Big Data, Data and Information Mining for Earth Observation
1. Image Information Mining and
Knowledge Discovery from Earth
Observation Data
Towards the Sentinels Era
P.G. Marchetti ESA, M. Iapaolo Randstad
Ground Segment and Mission Operations Department
Research and Ground Segment Technology Section
Earth Observation Programmes Directorate
pier.giorgio.marchetti@esa.int michele.iapaolo@esa.int
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
2. Outline
1. Background on the European Space Agency
2. Motivation
3. Overview of ESA activities in the IIM field
4. Systems and services for EO data exploitation
5. The road ahead
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
3. The Heritage: ERS and ENVISAT
• ERS and Envisat missions 1991-2012
•
More than 2 Petabytes of data
•
Two decades of global change records
•
Need for data preservation, availability
and exploitation
ESA UNCLASSIFIED – For Official Use
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
4. Ten Years of Envisat Science
5000 scientific
projects
using Envisat data
Iceland
2010
Ozone hole 2005
Arctic 2007
First images
L’Aquila 2009
Global air
pollution
B-15A
iceberg
Chlorophyll
concentration
Japan 2011
Bam earthquake
Prestige tanker
oil slick
CO2 map
Launch
Hurricane
Katrina
Envisat
Symposium
Salzburg (A)
Mar 02
Sep 04
Envisat was the Sentinel
“precursor” for many operational
users
Envisat
Symposium
Montreux (CH)
Apr 07
Living Planet
Symposium
Bergen (N)
Jun 10
Living Planet
Symposium
Edinburgh (UK)
Sep 13
and many workshops dedicated to specific Envisat user communities
ESA UNCLASSIFIED – For Official Use
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
5. The Copernicus Programme
Copernicus (formerly known
as GMES) is a European space
flagship programme led by
the European Union
Space
Component
Provides the necessary data
for operational monitoring of
the environment and for civil
security
ESA coordinates the space(*)
component
(*)spacecraft, flight operation segment, ground
segment
ESA UNCLASSIFIED – For Official Use
In-Situ
Component
Services
Component
6
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
6. Copernicus Space Component: Dedicated
Missions
S1A/B: Radar Mission
S2A/B: High Resolution Optical Mission
S3A/B: Medium Resolution Imaging and Altimetry Mission
S4A/B: Geostationary Atmospheric Chemistry Mission
S5P: Low Earth Orbit Atmospheric Chemistry Precursor Mission
S5A/B/C: Low Earth Orbit Atmospheric Chemistry Mission
Jason-CS A/B: Altimetry Mission
7
ESA UNCLASSIFIED – For Official Use
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
7. Copernicus Contributing Missions
COSMO-Skymed
SPOT (VGT)
TerraSAR–X
Tandem-X
PROBA-V
Radarsat
DMC
Pléiades
Copernicus
Contributing
Missions
Cryosat
Deimos-2
RapidEye
Jason
Atmospheric
missions
SPOT (HRS)
MetOp
ESA UNCLASSIFIED – For Official Use
Meteosat 2nd Generation
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
8. Motivation
1. Foster the use of IIM and derived technologies in support of the EO
data exploitation
2. Develop state-of-the-art data processing for improving access and
dissemination of future EO data (e.g. Sentinels mission)
3. Implement systems and services for supporting the “scientific
exploitation” of EO data
4. Investigate new approaches and methodologies to exploit data from
all available missions and archives (joint effort with Long Term Data
Preservation programme)
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
9. Image Information Mining
Coordination Group (IIMCG)
The Image Information Mining Coordination Group (IIMCG):
Space Agencies (ESA, DLR, CNES, ASI)
European Institutions (EUSC, JRC)
National Research Institutes (Uni-Trento, ETHZ, INGV, Mississippi
State University)
Main objectives:
Inform Agencies and partners, promote research and technological
activities on IIM (automatic information extraction from EO data for
image understanding and retrieval)
Promote the use of IIM techniques for management and exploitation
of very large EO data archives/missions (PB of data)
Foster the role of IIM in the context of future missions and existing
archives
Involve industry and agency partners to increase the relevance of
IIM activities in Europe
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
10. 10 years of IIM activities at ESA
2000
2002
2004
2006
2008
2010
2012
Knowledge Driven Information
Mining Prototype
Knowledge-centred Earth
Observation Prototype
Multi-sensor Evolution Analysis Prototype
Technology activities over last decade
Main achievements:
KIM System: IIM reference prototype @ ESRIN
Platforms for EO data exploitation (KEO, GPOD, SSE, etc.)
Tools for multi-temporal and evolution analysis (MEA)
Issues:
Limited number of scientific and industrial partners involved
National efforts not coordinated and harmonised
Funds limited wrt the size of the research goals
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
11. Image Information Mining
From Data to Information
Data (PB)
Acquisition
EO Data
Catalogue & Ordering
Image Information
Mining (IIM)
Information
Algorithms & Applications
Knowledge
Models & Ground Truth
Information (KB)
Provides processing tools to extract features from images and associate meaning
to extracted features (bridging the gap between data and information)
Empower users (researchers, service providers, decision makers) to identify and
reuse relevant information for their applications
Encourage the use of common cooperative environments to achieve a common
knowledge
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
12. KIM
(Knowledge-based Information Mining)
The KIM prototype developed @ ESRIN permits:
Intelligent and effective access to information in large EO datasets
Improved exploration and use of EO images for scientific research
Extraction of relevant information for different applications (change
detection, global monitoring, disaster management, …)
Implementation, integration and validation of services derived from IIM methods
Three main components:
1. Ingestion Software (Primitive Feature Extraction / Clustering)
2. Database (storing extracted information)
3. Interactive Client Application
i.
ii.
iii.
iv.
Training and definition of “semantic rules”
Application of training (rules) to the entire collection
Definition of “semantic labels” for extracted information
Store for successive re-use
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
13. KIM Architectural Elements
Input
EO images
Output
Identifiers of searched images
Feature Maps / Thematic maps
laveirteR egam I desaB tnetnoC
laveirteR egam I desaB tnetnoC
KIM
seirotisopeR ataD
seirotisopeR ataD
Ingestion
Feature Extraction
Clustering
Database
XUA
XUA
noitacifissalC desivrepusnU
nInformation usnU
oitacifissalC desivrep
Mining
EO Images
seires emiT
ssiisyls na iT
e re a em
sisylana
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
Client
14. KIM Search and label
KIM permits to inspect a collection of images…
…interactively define “semantic features” using the “primitive features” extracted by the
system…
…search for the defined feature within the entire collection…
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
15. KIM Information Extraction
…and extract Feature Maps or Thematic Maps
Cloud masks
Flooded areas
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
Forest Monitoring
16. KIM Primitive features
Spectral
Spectral signature
Texture
Structural information extracted with the Gibbs Marcov Random Fields (GMRF) model
S0 - full resolution images; S1 - sub-sampled images
DCT
Discrete Cosine Transform: transforms signals and images from the spatial domain to
the frequency domain
EMBD
Enhanced-Model-Based-Despeckling: performs a high quality despeckling of SAR
images
Area
Area of the objects detected with the segmentation process
Compactness
Compactness of the objects detected with the segmentation process
Spectral Mean
Mean value of the radiometric information of the image inside the closed area detected
by the segmenter
Spectral
Variance
Variance of the radiometric information of the image inside the closed area detected by
the segmenter
Hu Moments
Hu-Moment Invariants: shape information conveyed by the contour points. Hu moments
are invariant to scale, rotation and translation (the first 4 out of 7 invariant moments as
shape descriptors have been used).
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
17. KIM Validation
KIM has been tested and validated with different datasets:
1. MERIS RR / MERIS FR
2. ERS / ASAR
3. SPOT
4. Landsat
5. Maps (Level 2 / Level 3 products)
Large number of collection created
Low number of significant semantic features identified
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
18. KIM for Information Extraction
1. Flood Detection (SAR data)
2. Cloud Detection (MERIS RR)
3. Long-term Forest Monitoring (Landsat)
4. Rapid Mapping / Damage Assessment (VHR optical data)
Potentialities of the tool have been highlighted and
confirmed in different contexts
End-users expectations not always achieved
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
19. KEO
(Knowledge centred Earth Observation)
KEO is a distributed Component-based Processing Environment (CPE)
permitting to:
a.
Create & semantically identify internal/external Processing Components
b.
Graphically chain Processing Components into processing chains
c.
Create Processing Components from IIM components (KIM training)
d.
Export and store outputs into Web Servers (WFS, WMS, WCS)
KEO also provides some relevant Reference Data Sets:
a.
Heterogeneous data and information, growing with external
contributions (images, documents, DEMs, photos, processors, etc.)
b.
In support of various applications: Classification, Time Series
Analysis, Ortho-rectification, Urban Monitoring, Interferometry, etc.)
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
20. KEO CPE
Graphical Processor Designer
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
21. MEA
(Multi-temporal Evolution Analysis)
Multi-temporal analysis of HR / VHR products:
1. Select multi-temporal applications that might benefit from such extension
2. Design, implement and integrate the automatic multi-temporal algorithms
to support the selected applications
3. Create the needed HR/VHR Reference Data Sets and Evolution Models
4. Develop standard interfaces between the different systems for common
exploitation of ingested data and processing capabilities
5. Integrate algorithms and Evolution Models provided by other independent
projects
6. Validate (with the support of a Validation Group) the Automatic Multitemporal algorithms and Evolution Models
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
22. MEA
(Multi-temporal Evolution Analysis)
The MEA-ASIM system aims at providing:
1. Advanced tools for Land Use / Land Cover change analysis
2. Level-2 EO products for real time exploitation
3. Interfaces to external systems (G-POD, KEO, data providers, etc.)
4. Access to data via standard WCS OGC interface
RSS Data Farm
5. Native support for Sentinel-2 datasets
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
23. MEA Pixel and Coverage analysis
Time-Series Analisys
Single and multi plot functionality
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
24. MEA Pixel and Coverage analysis
Cross-comparison of EO products
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
25. Exploitation Platforms for EO
Development and implementation of collaborative Exploitation
Platforms (G-POD, SSEP, E-CEO, etc.):
1. Fostering the scientific exploitation of EO data
2. Automating the creation data mining and information extraction
experiments and algorithms
3. Supporting the creation of EO-based applications and services
4. Supporting the entire scientific research process:
a.
Addressing specific scientific challenges and tackling new research
problems in a “parallel and collaborative way”
b.
Generation of reproducible results that can be easily shared and
validated
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
26. Research and Service Support:
Research Process
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
27. RSS G-POD Process Steps
Principal Investigators
•
•
RSS
EO algorithms delivery
Data type and range indication
•
Data are made available in the RSS
catalogue
•
Algorithm porting and Integration
•
Output validation
•
Test and validation (involving the PI)
•
On-demand EO data processing
•
On-demand EO data processing
•
Use of produced data (scientific
projects delivery)
•
Delivery
•
Publications
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
28. RSS Flexible Resources
On-demand processing service:
Process
EO data
delivery
G-POD
EO Scientists
Principal
Investigators
Platform
Volume accessed by PI projects in 2012:
Infrastructure
• Total Number Submitted Jobs 38,774
• Average Number of Products per Job: 35
• Average Product Size: 700 MB
• Total Size Data Processed: 906 TB
ESRIN
- 172 cores
- 400 TB
UK-PAC
- 96 cores
- 300 TB
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
Flexible/
Unlimited
Infrastructure
- 10-200 cores
- 1-10 TB
26/11/2013
Flexible Infrastructure satisfies:
HW requirements
Connectivity requirements
SLA (HA, help desk, ticketing
systems, etc.)
29. RSS Facts & Figures
On-demand Processing: actual figures in the last 3 years
•Supported more than 40 active users per year
•Supported >20 processing/re-processing campaigns (included entire missions, e.g.
MERIS, ASAR, SMOS and TPM)
•Integrated ~10 new algorithms per year
•Upgraded ~15 algorithms per year
•Set-up flexible (additional) processing capacity in less than 2 working days
•Managed >450 TB data farm (ESA, TPM and scientific products)
•
ESA – ENVISAT (~320TB), ERS (~50TB), SMOS (~10TB)
•
TPM – MSG (~19TB), METOP (~11TB), ALOS (~2TB)
•
Scientific products – AARDVARC Swansea University and MGVI JRC (produced
by GPOD and distributed via SSE), MKL3 ACRI
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
30. SMOS Testbed
Service Purpose
SMOS Testbed aims to provide a flexible test environment to support the ESA calibration team for L1
calibration, and the Expert Support Laboratories (ESLs) for L2 Soil Moisture and Ocean Salinity pre-validation.
G-POD support elements
–
Fast integration of new versions
–
SMOS L0 NRT ingestion chain set-up for L1 NRT custom re-processing
–
Access to online data for bulk re-processing
–
Access to flexible cloud resources for meeting deadlines
–
On-demand SMOS L1 and L2 processors available for SMOS Teams
SMOS New
Processor
Delivery
Processor
Integration in
G-POD
G-POD
Processing
Campaign
Results
Analysis and
Validation
Auxiliary And
Calibration Datasets
ESA UNCLASSIFIED – For Official Use
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
31. SMOS Testbed
SMOS L1 TESTBED
Processor:s Calibration,
Telemetry, Level 1A, 1B, 1C
Supported versions: 3.46, 5.00,
5.01, 5.02, 5.03, 5.04, 5.05, 6.00,
6.01
Reference data series: L0, L1A,
L1B, L1C (reprocessed)
Auxiliary data baseline: as
per Operational environment
ESA UNCLASSIFIED – For Official Use
SMOS L2 SOIL
MOISTURE TESTBED
SMOS L2 OCEAN
SALINITY TESTBED
Processor: SM L2, SM L2 postprocessing
Supported versions: 4.00, 4.01
Reference data series: L1C
(reprocessed)
Auxiliary data baseline: as
per CESBIO reprocessing
Processor: OS L2
Supported versions: 5.00, 5.50
Reference data series: L1C
(reprocessed)
Auxiliary data baseline: as
per Operational environment
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
32. The road ahead (1)
Funds for a full scale research programme are needed to foster:
The widening of competence and expertise in several research
centres/industrial actors in Europe
The widening of efforts to cover time series analysis and data
analytics in general
Research and development of multi dimensional and scalable DB
solutions (including nosql databases, hadoop, etc.)
Large collaborative and persistent effort on crowdsourcing,
benchmarking, image and feature annotation and evaluation
Establishing a theoretical framework to bridge the semantic gap and
be able to assign “discriminating power” to extracted features and
“categorization” of extracted classes/objects
High quality software and algorithm developments able to reach at
least the “software prototype” readiness level
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
33. The road ahead (2)
To achieve these goals it is necessary to:
Establish a common “Big Data Mining” framework with
interdisciplinary partners
Establish a R&D network to sustain this field
Establish a network of users, and give them access to IIM resources
(system, data, …)
Enlarge the scope of “Image” Mining to the physical parameters
measured by EO instruments
Address the “instrument” gap, instrument-application
Develop methods to use heterogeneous data: in
situ, metadata, linked data, models, etc.
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
34. The road ahead (3)
Activities to be started:
Promote IIM technology acceptance for EO users
Extend and adapt methods from multimedia and social nets
Apply human computing, gather knowledge from the use of the
system, adaptation, personalization, etc.
Focus on Web/Internet based systems
Develop simple and specific HMI and GUI
Focus on Visual Data Mining, Visual Analytics, and related methods
In the PDGS identify “long term data preservation” and “interactive
data exploitation” components
Design data representations: actionable information
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
35. The road ahead (4)
Merge the best of :
• data mining approach
• time series capability
• ability to support and host the user
algorithm
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
36. MANY THANKS!
ESA UNCLASSIFIED – For Official Use
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013)
26/11/2013
Hinweis der Redaktion
European independence in data sources for environment and security monitoringEuropean independence & contribution to global observing system Copernicus is a European system for monitoring the Earth. Copernicus consists of a complex set of systems which collect data from multiple sources: earth observation satellites and in situ sensors such as ground stations, airborne and sea-borne sensors. It processes these data and provides users with reliable and up-to-date information through a set of services related to environmental and security issues.
TheSentinel-Satellites(S1A/B, S2A/B, S3A/B, S4A/B and S5 Precursor) are under development, S-5 and Jason-CS are under definition Satellite launchesas from beginning 2014 Theground segment (data reception, processing and dissemination)is getting ready for Sentinel launchesEUis responsible for Copernicus overall and for servicesESAis responsible forthe Space Component
Available today or planned at European, national and international levelDeveloped for other purposes but making important data available for CopernicusList not exhaustive (+ Seosar, SPOT-6/-7, TanDEM-X, EnMAP, Venμs, Altika, Deimos2, etc. )… will evolve based on service requirements and mission availabilities