Three years of the ExtremeEarth project
Online workshop - December 9th 2021
Theofilos Kakantousis
Desta Haileselassie Hagos
Logical Clocks, KTH
The ExtremeEarth platform: scalable deep learning
pipelines with Earth observation data and Hopsworks
ExtremeEarth
From Copernicus Big Data
to Extreme Earth Analytics
This project has received funding from the European Union’s Horizon 2020
research and innovation programme under grant agreement No 825258.
Contents
1. ExtremeEarth platform architecture
2. End-to-end scalable deep learning
pipelines with Hopsworks
3. Exploitation of results
4. Research
ExtremeEarth Platform
Architecture
Background
• The Copernicus programme produces more than three petabytes (PB) of Earth Observation (EO)
data annually from Sentinel satellites.*
• Data and Information Access Services (DIAS) provide centralised access to Copernicus data and
processing tools.
• European Space Agency (ESA) Thematic Exploitation Platforms (TEPs) make sure complex data
streams are exploited to their full potential.
○ Food Security, Polar
• Hopsworks Data-Intensive AI platform brings scalable AI support for Earth Observation data.
* https://workshop.copernicus.eu/sites/default/files/content/attachments/ajax/copernicus_overview.pdf
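To put the annual volume quoted above in operational terms, the following back-of-the-envelope sketch converts three petabytes per year into an average daily volume and a sustained ingest rate. Decimal units (1 PB = 10^15 bytes) and a 365-day year are assumptions made here for illustration; they are not figures from the slides.

```python
# Back-of-the-envelope conversion.
# Assumptions (not from the slides): decimal units and a 365-day year.
PB = 10**15
TB = 10**12
MB = 10**6

annual_bytes = 3 * PB                                   # "more than three petabytes" per year
tb_per_day = annual_bytes / 365 / TB                    # average daily volume
mb_per_second = annual_bytes / (365 * 24 * 3600) / MB   # sustained average rate

print(f"{tb_per_day:.1f} TB/day, {mb_per_second:.0f} MB/s sustained")
```

Since the slide says "more than" three petabytes, both numbers are lower bounds on the average load.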
How to build AI products with EO data
ExtremeEarth architecture goals
• ExtremeEarth brings together these components
○ Under the same architecture…
○ … and infrastructure.
○ Reduce cost and increase productivity by providing a seamless end-user experience without
having to manage different services
• Combine
○ EO data access from DIASes
○ End-user facing EO data products from TEPs
○ Scalable AI capabilities of Hopsworks
ExtremeEarth architecture overview
ExtremeEarth architecture deep dive 1/2
• Infrastructure provided by Creodias and
managed by the TEPs
○ OpenStack cluster with GPU support
• Data layer with multiple data sources
○ Raw Creodias data
○ Intermediate TEP data
○ Training datasets
• Processing layer provided by Hopsworks.
○ Core AI engine
○ Develop PB-scale machine learning
algorithms with deep learning
architectures.
○ Platform that provides support for
semantic data tools
ExtremeEarth architecture deep dive 2/2
• Product layer
○ Hopsworks serves AI products to
external clients
• User interface
○ Hopsworks is integrated with the
TEPs via APIs
○ TEP users make direct use of AI
models developed in Hopsworks.
Real World Use Cases - Food Security
Real World Use Cases - Polar
ExtremeEarth running in production
• Hopsworks installed alongside TEP
infrastructure on CREODIAS
○ https://hopsworks.polartep.io
• Provides easy EO data access and
machine learning development tooling to
developers and data scientists.
• Deep learning architectures developed on
this Hopsworks cluster for the Food
Security and Polar use cases.
End-to-end scalable
machine learning
pipelines
Hopsworks
Open source platform to develop end-to-end machine learning pipelines at scale
for Enterprise AI.
Use your tools of choice and serve at the lowest latency on any cloud, at any
scale.
The Data Platform for AI
Organizations are struggling to deploy AI
because of Data
● “87% identified data as the reason their organizations failed to successfully implement AI.”*
* VentureBeat: https://venturebeat.com/2021/03/24/employees-attribute-ai-project-failure-to-poor-data-quality/
Where the data is (storage)
Discover and access the data
Clean, join and aggregate the data
Extract the data
Transform the data into features
Validate the data
Make the process repeatable 🔁
Serve for real-time applications or train 🏆
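The data steps listed above can be sketched as a chain of small functions behind one repeatable entry point. This is a minimal illustration in plain Python, not Hopsworks code: the record layout, the tile and band names, and the choice of NDVI as the derived feature are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical raw records; field names and values are made up for illustration.
RAW = [
    {"tile": "A", "band": "red", "value": 0.20},
    {"tile": "A", "band": "nir", "value": 0.60},
    {"tile": "B", "band": "red", "value": 0.30},
    {"tile": "B", "band": "nir", "value": None},  # missing reading
]

def clean(records):
    """Drop records with missing values."""
    return [r for r in records if r["value"] is not None]

def aggregate(records):
    """Join readings per tile: {tile: {band: mean value}}."""
    grouped = {}
    for r in records:
        grouped.setdefault(r["tile"], {}).setdefault(r["band"], []).append(r["value"])
    return {t: {b: mean(v) for b, v in bands.items()} for t, bands in grouped.items()}

def to_features(per_tile):
    """Keep tiles that have both bands and derive one feature (NDVI)."""
    features = {}
    for tile, bands in per_tile.items():
        if {"red", "nir"}.issubset(bands):
            red, nir = bands["red"], bands["nir"]
            features[tile] = {"ndvi": (nir - red) / (nir + red)}
    return features

def validate(features):
    """NDVI must lie in [-1, 1]; fail loudly otherwise."""
    for tile, f in features.items():
        if not -1.0 <= f["ndvi"] <= 1.0:
            raise ValueError(f"invalid ndvi for tile {tile}: {f['ndvi']}")
    return features

def run_pipeline(records):
    """Single entry point, so the whole process can be re-run unchanged."""
    return validate(to_features(aggregate(clean(records))))

features = run_pipeline(RAW)
print(features)
```

Tile B is dropped because its near-infrared reading is missing, which is the kind of case the validation and cleaning steps exist to catch before training or serving.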
Growing Consensus on How to Manage
Complexity of AI
[Diagram: Data Collection, Data Validation, Feature Engineering, Hardware Management, Distributed Training, Hyperparameter Tuning, Pipeline Management, Model Serving, A/B Testing, Monitoring; Data, Model, Prediction, φ(x)]
* Diagram from Google’s paper Hidden Technical Debt in Machine Learning Systems
[Same diagram with overlays added: FEATURE STORE (FEATURE ENGINEERING) and ML PLATFORM (TRAIN and SERVE)]
Scalable end-to-end deep learning pipelines
● Horizontally scalable infrastructure that enables developers to manage the lifecycle of EO
machine learning applications
End-to-end machine learning components
[Diagram labels: Streaming; Data Warehouse; Data Lake; Feature Engineering; Feature Store (Offline Feature Store, Online Feature Store); Train/Test Data (S3, HDFS, etc); Model Training; Model Repository; Deploy; Model Serving; Monitor; Online Application; Feature Vectors; Batch Scoring; Batch Access; Result Sink (DB); HopsFS; Scaleout Metadata]
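The split between the offline and online feature store in the diagram can be illustrated with a toy in-memory class: the offline side keeps the full history for training and batch scoring, while the online side keeps only the latest feature vector per entity for low-latency serving. This is a sketch of the idea only. `ToyFeatureStore`, its method names, and the sample entities are hypothetical and are not the Hopsworks Feature Store API.

```python
from collections import defaultdict

class ToyFeatureStore:
    """Illustrative only: NOT the Hopsworks Feature Store API."""

    def __init__(self):
        self._offline = defaultdict(list)  # entity -> full history (training, batch)
        self._online = {}                  # entity -> latest feature vector (serving)

    def insert(self, entity_id, timestamp, features):
        """Append to the history and refresh the online value if newer."""
        self._offline[entity_id].append((timestamp, dict(features)))
        latest = self._online.get(entity_id)
        if latest is None or timestamp >= latest[0]:
            self._online[entity_id] = (timestamp, dict(features))

    def get_training_data(self):
        """Batch access: every historical row, ordered by timestamp."""
        rows = [(e, ts, f) for e, hist in self._offline.items() for ts, f in hist]
        return sorted(rows, key=lambda r: r[1])

    def get_feature_vector(self, entity_id):
        """Online access: only the most recent values for one entity."""
        return self._online[entity_id][1]

# Hypothetical entities and values.
store = ToyFeatureStore()
store.insert("field-1", 1, {"ndvi": 0.41})
store.insert("field-1", 2, {"ndvi": 0.47})
store.insert("field-2", 1, {"ndvi": 0.12})

print(len(store.get_training_data()), store.get_feature_vector("field-1"))
```

Keeping both views behind one insert path is what lets training and serving read the same feature definitions.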
Hopsworks - one open source platform with
all the tools
[Diagram labels: APPLICATIONS; API; DASHBOARDS; HOPSWORKS; DATASOURCE; Apache Kafka; ORCHESTRATION in Airflow; BATCH (Apache Spark); STREAMING (Apache Spark, Apache Flink); HOPSWORKS FEATURE STORE; DISTRIBUTED ML & DL (Pip, Conda, TensorFlow, scikit-learn, PyTorch); Jupyter Notebooks; TensorBoard; FILESYSTEM & METADATA STORAGE in HopsFS; MODEL SERVING (Kubernetes); MODEL MONITORING (Kafka + Spark Streaming)]
Phases: Data Preparation & Ingestion; Experimentation & Model Training; Deploy & Productionalize
ML experiments management
Distributed deep learning with Hopsworks
# RUNS ON THE WORKERS
def train_fn():
    def input_fn():
        ...  # return dataset

    model = …
    optimizer = …
    model.compile(…)
    history = model.fit(…)
    # Report the final-epoch metrics back to the driver
    metrics = {
        'train_loss': history.history['loss'][-1],
        'train_accuracy': history.history['accuracy'][-1],
        'val_loss': history.history['val_loss'][-1],
        'val_accuracy': history.history['val_accuracy'][-1],
    }
    return metrics

# RUNS ON THE DRIVER
from hops import experiment
experiment.mirrored(train_fn, name='distributed',
                    metric_key='val_accuracy')
[Diagram: Driver, TF_CONFIG, workers W1 to W8, Metrics; HopsFS holding TensorBoard, Checkpoints, Training Data, Models, Logs]
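The TF_CONFIG label in the diagram refers to the environment variable TensorFlow reads to learn the cluster layout and each process's role in multi-worker training. The sketch below builds one such JSON document per worker. The host names and port are hypothetical, and how Hopsworks itself generates and distributes these values is not shown on the slide, so treat this as an illustration of the format only.

```python
import json

def make_tf_configs(worker_hosts):
    """Return one TF_CONFIG JSON string per worker, all sharing the same cluster spec."""
    cluster = {"worker": list(worker_hosts)}
    return [
        json.dumps({"cluster": cluster, "task": {"type": "worker", "index": i}})
        for i in range(len(worker_hosts))
    ]

# Hypothetical host names and port for the eight workers in the diagram.
hosts = [f"worker{i}.example.internal:2222" for i in range(1, 9)]
configs = make_tf_configs(hosts)

print(configs[0])
```

Each worker process would export its own string as the `TF_CONFIG` environment variable before creating the distribution strategy; every worker sees the same `cluster` entry and differs only in `task.index`.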
Hyperparameter tuning with Maggy
● Library for distribution-transparent machine
learning experiments on Apache Spark
● Not bound to stage-based algorithms, unlike
existing frameworks.
● Directed hyperparameter search (ASHA,
Bayesian) on TensorFlow, PyTorch,
scikit-learn, XGBoost
● Unified, real-time logging in Jupyter
notebooks.
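As background for the directed search mentioned above, here is a minimal sketch of synchronous successive halving, the procedure that ASHA runs asynchronously. It is plain Python and does not use Maggy: the function names, the toy scoring function, and the search space are all assumptions for illustration, and a real run would replace `toy_evaluate` with an actual training job.

```python
import random

def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the top 1/eta configurations each round while multiplying the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = scored[: max(1, len(scored) // eta)]
        budget *= eta
    return survivors[0]

def toy_evaluate(config, budget):
    """Stand-in for a training run: the score peaks at lr = 0.01 (made-up objective)."""
    return -abs(config["lr"] - 0.01) * (1 + 1 / budget)

# Hypothetical search space: 16 learning rates sampled log-uniformly.
rng = random.Random(0)
candidates = [{"lr": 10 ** rng.uniform(-4, -1)} for _ in range(16)]
best = successive_halving(candidates, toy_evaluate)

print(best)
```

The synchronous version waits for every trial in a round before promoting any of them. The asynchronous variant promotes a trial as soon as enough results exist, which is what removes the stage barrier the slide refers to.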
Ablation studies with Maggy
● Parallel ablation studies: without
changing your inner training loop in
TensorFlow/Keras, evaluate (in
parallel) the effect of different
layers, dataset features, etc.
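The idea of a parallel ablation study can be sketched as a leave-one-component-out loop whose trials run concurrently. The example below is plain Python with a thread pool, not Maggy's API. The feature names, the contribution values, and `train_and_score` are hypothetical stand-ins for a real training run.

```python
from concurrent.futures import ThreadPoolExecutor

FEATURES = ["red", "nir", "swir"]                         # hypothetical feature names
CONTRIBUTION = {"red": 0.10, "nir": 0.25, "swir": 0.05}   # made-up toy values

def train_and_score(features):
    """Stand-in for a full training run; returns a toy accuracy."""
    return round(0.50 + sum(CONTRIBUTION[f] for f in features), 2)

def ablation_study(features):
    """Score the full feature set, then each leave-one-out variant, in parallel."""
    trials = {"none": list(features)}  # "none" = nothing removed (baseline)
    for f in features:
        trials[f] = [g for g in features if g != f]
    with ThreadPoolExecutor() as pool:
        scores = dict(zip(trials, pool.map(train_and_score, trials.values())))
    baseline = scores["none"]
    # Report the accuracy lost when each feature is removed.
    return {name: round(baseline - s, 2) for name, s in scores.items() if name != "none"}

drops = ablation_study(FEATURES)
print(drops)
```

The training function is untouched by the study: only the list of inputs changes between trials, which mirrors the slide's point about not modifying the inner training loop.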
ML model registry management
Demo
Exploitation of results
Exploitation
● Hopsworks is now extended with EO data support
● Creates opportunities to onboard new use cases for AI with EO data
o Hopsworks as the AI platform for other research projects, H2020 DeepCube
● Hopsworks as a product offering
o With the Polar and Food Security TEPs ExtremeAI platform
o Can be seamlessly integrated with further DIASes
o Offered as SaaS at hopsworks.ai on public clouds such as Amazon AWS and Microsoft Azure
Research
Publications
o The ExtremeEarth Software Architecture for Copernicus Earth Observation Data. (Conference
paper)
▪ Published: Conference on Big Data from Space (BiDS21).
o ExtremeEarth Meets Data From Space (Journal paper).
▪ Published: IEEE Journal of Selected Topics in Applied Earth Observations and Remote
Sensing (JSTARS) (2021).
o Maggy: Scalable Asynchronous Parallel Hyperparameter Search. (Conference paper)
▪ Published: The 1st Workshop on Distributed Machine Learning (DistributedML'20).
o AutoAblation: Automated Parallel Ablation Studies for Deep Learning. (Conference paper)
▪ Published: The 1st Workshop on Machine Learning and Systems (EuroMLSys'21)
o Scalable Artificial Intelligence for Earth Observation Data Using Hopsworks. (Journal paper)
▪ Under preparation: IEEE Journal of Selected Topics in Applied Earth Observations and
Remote Sensing (JSTARS) (2021). ⇒ Will be submitted soon.
• Published papers: http://earthanalytics.eu/publications.html
Blog posts
o AI Software Architecture for Copernicus Data with Hopsworks.
▪ July 2021 (link)
o End-to-end Deep Learning Pipelines with Earth observation Data in Hopsworks
▪ October 2021 (link)
Thank you!
github.com/logicalclocks/hopsworks
@hopsworks

Weitere ähnliche Inhalte

Was ist angesagt?

Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefRobert Grossman
 
Big linked geospatial data tools in ExtremeEarth-phiweek19
Big linked geospatial data tools in ExtremeEarth-phiweek19Big linked geospatial data tools in ExtremeEarth-phiweek19
Big linked geospatial data tools in ExtremeEarth-phiweek19ExtremeEarth
 
Enabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projectsEnabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projectsRob Emanuele
 
Processing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtechProcessing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtechRob Emanuele
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pRobert Grossman
 
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit
 
Opening the Path to Technical Excellence
Opening the Path to Technical ExcellenceOpening the Path to Technical Excellence
Opening the Path to Technical ExcellenceNETWAYS
 
CourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkCourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkDataWorks Summit
 
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTechGeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTechRob Emanuele
 
Update on the Exascale Computing Project (ECP)
Update on the Exascale Computing Project (ECP)Update on the Exascale Computing Project (ECP)
Update on the Exascale Computing Project (ECP)inside-BigData.com
 
Q4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationQ4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationRob Emanuele
 
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...BigData_Europe
 
Sky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor ComputationSky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor ComputationEUDAT
 
GeoMesa LocationTech DC
GeoMesa LocationTech DCGeoMesa LocationTech DC
GeoMesa LocationTech DCCCRinc
 
Exascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing WorldExascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing Worldinside-BigData.com
 
Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?Rob Emanuele
 

Was ist angesagt? (20)

Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster Relief
 
Big linked geospatial data tools in ExtremeEarth-phiweek19
Big linked geospatial data tools in ExtremeEarth-phiweek19Big linked geospatial data tools in ExtremeEarth-phiweek19
Big linked geospatial data tools in ExtremeEarth-phiweek19
 
Enabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projectsEnabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projects
 
Processing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtechProcessing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtech
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9p
 
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
 
Opening the Path to Technical Excellence
Opening the Path to Technical ExcellenceOpening the Path to Technical Excellence
Opening the Path to Technical Excellence
 
CourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkCourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on Spark
 
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTechGeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
 
Update on the Exascale Computing Project (ECP)
Update on the Exascale Computing Project (ECP)Update on the Exascale Computing Project (ECP)
Update on the Exascale Computing Project (ECP)
 
Q4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationQ4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis Presentation
 
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
 
Sky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor ComputationSky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor Computation
 
GeoMesa LocationTech DC
GeoMesa LocationTech DCGeoMesa LocationTech DC
GeoMesa LocationTech DC
 
Working with Scientific Data in MATLAB
Working with Scientific Data in MATLABWorking with Scientific Data in MATLAB
Working with Scientific Data in MATLAB
 
Exascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing WorldExascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing World
 
EOSC-hub & Geohazards TEP
EOSC-hub & Geohazards TEPEOSC-hub & Geohazards TEP
EOSC-hub & Geohazards TEP
 
Working with HDF and netCDF Data in ArcGIS: Tools and Case Studies
Working with HDF and netCDF Data in ArcGIS: Tools and Case StudiesWorking with HDF and netCDF Data in ArcGIS: Tools and Case Studies
Working with HDF and netCDF Data in ArcGIS: Tools and Case Studies
 
Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?
 
Summary of HDF-EOS5 Files, Data Model and File Format
Summary of HDF-EOS5 Files, Data Model and File FormatSummary of HDF-EOS5 Files, Data Model and File Format
Summary of HDF-EOS5 Files, Data Model and File Format
 

Ähnlich wie Hopsworks - ExtremeEarth Open Workshop

ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...Big Data Value Association
 
Gergely Sipos (EGI): Exploiting scientific data in the international context ...
Gergely Sipos (EGI): Exploiting scientific data in the international context ...Gergely Sipos (EGI): Exploiting scientific data in the international context ...
Gergely Sipos (EGI): Exploiting scientific data in the international context ...Gergely Sipos
 
Use r 2013 tutorial - r and cloud computing for higher education and research
Use r 2013   tutorial - r and cloud computing for higher education and researchUse r 2013   tutorial - r and cloud computing for higher education and research
Use r 2013 tutorial - r and cloud computing for higher education and researchkchine3
 
Artificial Intelligence in the Earth Observation Domain: Current European Res...
Artificial Intelligence in the Earth Observation Domain: Current European Res...Artificial Intelligence in the Earth Observation Domain: Current European Res...
Artificial Intelligence in the Earth Observation Domain: Current European Res...ExtremeEarth
 
BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overviewBigData_Europe
 
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hubCloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hubBjörn Backeberg
 
Going deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkusGoing deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkusRed Hat Developers
 
09 The Extreme-scale Scientific Software Stack for Collaborative Open Source
09 The Extreme-scale Scientific Software Stack for Collaborative Open Source09 The Extreme-scale Scientific Software Stack for Collaborative Open Source
09 The Extreme-scale Scientific Software Stack for Collaborative Open SourceRCCSRENKEI
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioAlluxio, Inc.
 
CHAPTER 2 cloud computing technology in cs
CHAPTER 2 cloud computing technology in csCHAPTER 2 cloud computing technology in cs
CHAPTER 2 cloud computing technology in csTSha7
 
Design phase kick-off event and Ceremony
Design phase kick-off event and CeremonyDesign phase kick-off event and Ceremony
Design phase kick-off event and CeremonyArchiver
 
Pathways for EOSC-hub and MaX collaboration
Pathways for EOSC-hub and MaX collaborationPathways for EOSC-hub and MaX collaboration
Pathways for EOSC-hub and MaX collaborationEOSC-hub project
 
WEBINAR: "How to manage your data to make them open and fair"
WEBINAR:  "How to manage your data to make them open and fair"  WEBINAR:  "How to manage your data to make them open and fair"
WEBINAR: "How to manage your data to make them open and fair" OpenAIRE
 
Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud ComputingDavid Wallom
 
Getting Access to ALCF Resources and Services
Getting Access to ALCF Resources and ServicesGetting Access to ALCF Resources and Services
Getting Access to ALCF Resources and Servicesdavidemartin
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneSri Ambati
 
MDIS workshop 2015
MDIS workshop 2015MDIS workshop 2015
MDIS workshop 2015terradue
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.pptVipin Singhal
 

Ähnlich wie Hopsworks - ExtremeEarth Open Workshop (20)

ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
 
Gergely Sipos (EGI): Exploiting scientific data in the international context ...
Gergely Sipos (EGI): Exploiting scientific data in the international context ...Gergely Sipos (EGI): Exploiting scientific data in the international context ...
Gergely Sipos (EGI): Exploiting scientific data in the international context ...
 
Use r 2013 tutorial - r and cloud computing for higher education and research
Use r 2013   tutorial - r and cloud computing for higher education and researchUse r 2013   tutorial - r and cloud computing for higher education and research
Use r 2013 tutorial - r and cloud computing for higher education and research
 
Deep Hybrid DataCloud
Deep Hybrid DataCloudDeep Hybrid DataCloud
Deep Hybrid DataCloud
 
Artificial Intelligence in the Earth Observation Domain: Current European Res...
Artificial Intelligence in the Earth Observation Domain: Current European Res...Artificial Intelligence in the Earth Observation Domain: Current European Res...
Artificial Intelligence in the Earth Observation Domain: Current European Res...
 
BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overview
 
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hubCloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
 
Going deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkusGoing deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkus
 
09 The Extreme-scale Scientific Software Stack for Collaborative Open Source
09 The Extreme-scale Scientific Software Stack for Collaborative Open Source09 The Extreme-scale Scientific Software Stack for Collaborative Open Source
09 The Extreme-scale Scientific Software Stack for Collaborative Open Source
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
CHAPTER 2 cloud computing technology in cs
CHAPTER 2 cloud computing technology in csCHAPTER 2 cloud computing technology in cs
CHAPTER 2 cloud computing technology in cs
 
Design phase kick-off event and Ceremony
Design phase kick-off event and CeremonyDesign phase kick-off event and Ceremony
Design phase kick-off event and Ceremony
 
Pathways for EOSC-hub and MaX collaboration
Pathways for EOSC-hub and MaX collaborationPathways for EOSC-hub and MaX collaboration
Pathways for EOSC-hub and MaX collaboration
 
WEBINAR: "How to manage your data to make them open and fair"
WEBINAR:  "How to manage your data to make them open and fair"  WEBINAR:  "How to manage your data to make them open and fair"
WEBINAR: "How to manage your data to make them open and fair"
 
OCP Summit 2017
OCP Summit 2017OCP Summit 2017
OCP Summit 2017
 
Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud Computing
 
Getting Access to ALCF Resources and Services
Getting Access to ALCF Resources and ServicesGetting Access to ALCF Resources and Services
Getting Access to ALCF Resources and Services
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
 
MDIS workshop 2015
MDIS workshop 2015MDIS workshop 2015
MDIS workshop 2015
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.ppt
 

Mehr von ExtremeEarth

Polar Use Case - ExtremeEarth Open Workshop
Polar Use Case  - ExtremeEarth Open WorkshopPolar Use Case  - ExtremeEarth Open Workshop
Polar Use Case - ExtremeEarth Open WorkshopExtremeEarth
 
ExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - IntroductionExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - IntroductionExtremeEarth
 
AI models for Ice Classification - ExtremeEarth Open Workshop
AI models for Ice Classification - ExtremeEarth Open WorkshopAI models for Ice Classification - ExtremeEarth Open Workshop
AI models for Ice Classification - ExtremeEarth Open WorkshopExtremeEarth
 
Big Linked Data Interlinking - ExtremeEarth Open Workshop
Big Linked Data Interlinking - ExtremeEarth Open WorkshopBig Linked Data Interlinking - ExtremeEarth Open Workshop
Big Linked Data Interlinking - ExtremeEarth Open WorkshopExtremeEarth
 
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...ExtremeEarth
 
ExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth Data Science Pipeline for Linked Earth Observation DataExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth Data Science Pipeline for Linked Earth Observation DataExtremeEarth
 
Snow Monitoring for Water Availability and Irrigation
Snow Monitoring for Water Availability and IrrigationSnow Monitoring for Water Availability and Irrigation
Snow Monitoring for Water Availability and IrrigationExtremeEarth
 
Polar Use Case in ExtremeEarth-phiweek19
Polar Use Case in ExtremeEarth-phiweek19Polar Use Case in ExtremeEarth-phiweek19
Polar Use Case in ExtremeEarth-phiweek19ExtremeEarth
 
The ExtremeEarth infrastructure-phiweek19
The ExtremeEarth infrastructure-phiweek19The ExtremeEarth infrastructure-phiweek19
The ExtremeEarth infrastructure-phiweek19ExtremeEarth
 
Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19ExtremeEarth
 
Food security use case in ExtremeEarth-phiweek19
Food security use case in ExtremeEarth-phiweek19Food security use case in ExtremeEarth-phiweek19
Food security use case in ExtremeEarth-phiweek19ExtremeEarth
 
Copernicus and AI workshop 2020
Copernicus and AI workshop 2020Copernicus and AI workshop 2020
Copernicus and AI workshop 2020ExtremeEarth
 
LPS19 ExtremeEarth Project
LPS19 ExtremeEarth ProjectLPS19 ExtremeEarth Project
LPS19 ExtremeEarth ProjectExtremeEarth
 

Mehr von ExtremeEarth (13)

Polar Use Case - ExtremeEarth Open Workshop
Polar Use Case  - ExtremeEarth Open WorkshopPolar Use Case  - ExtremeEarth Open Workshop
Polar Use Case - ExtremeEarth Open Workshop
 
ExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - IntroductionExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - Introduction
 
AI models for Ice Classification - ExtremeEarth Open Workshop
AI models for Ice Classification - ExtremeEarth Open WorkshopAI models for Ice Classification - ExtremeEarth Open Workshop
AI models for Ice Classification - ExtremeEarth Open Workshop
 
Big Linked Data Interlinking - ExtremeEarth Open Workshop
Big Linked Data Interlinking - ExtremeEarth Open WorkshopBig Linked Data Interlinking - ExtremeEarth Open Workshop
Big Linked Data Interlinking - ExtremeEarth Open Workshop
 
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
 
ExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth Data Science Pipeline for Linked Earth Observation DataExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth Data Science Pipeline for Linked Earth Observation Data
 
Snow Monitoring for Water Availability and Irrigation
Snow Monitoring for Water Availability and IrrigationSnow Monitoring for Water Availability and Irrigation
Snow Monitoring for Water Availability and Irrigation
 
Polar Use Case in ExtremeEarth-phiweek19
Polar Use Case in ExtremeEarth-phiweek19Polar Use Case in ExtremeEarth-phiweek19
Polar Use Case in ExtremeEarth-phiweek19
 
The ExtremeEarth infrastructure-phiweek19
The ExtremeEarth infrastructure-phiweek19The ExtremeEarth infrastructure-phiweek19
The ExtremeEarth infrastructure-phiweek19
 
Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19
 
Food security use case in ExtremeEarth-phiweek19
Food security use case in ExtremeEarth-phiweek19Food security use case in ExtremeEarth-phiweek19
Food security use case in ExtremeEarth-phiweek19
 
Copernicus and AI workshop 2020
Copernicus and AI workshop 2020Copernicus and AI workshop 2020
Copernicus and AI workshop 2020
 
LPS19 ExtremeEarth Project
LPS19 ExtremeEarth ProjectLPS19 ExtremeEarth Project
LPS19 ExtremeEarth Project
 


Hopsworks - ExtremeEarth Open Workshop

  • 1. Three years of the ExtremeEarth project Online workshop - December 9th 2021 Theofilos Kakantousis Desta Haileselassie Hagos Logical Clocks, KTH The ExtremeEarth platform: scalable deep learning pipelines with Earth observation data and Hopsworks
  • 2. ExtremeEarth From Copernicus Big Data to Extreme Earth Analytics This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825258.
  • 3. Contents
      1. ExtremeEarth platform architecture
      2. End-to-end scalable deep learning pipelines with Hopsworks
      3. Exploitation of results
      4. Research
  • 5. Background
      • The Copernicus programme produces more than three petabytes (PB) of Earth Observation (EO) data annually from Sentinel satellites.*
      • Data and Information Access Services (DIAS) provide centralised access to Copernicus data and processing tools.
      • European Space Agency (ESA) Thematic Exploitation Platforms (TEPs) make sure complex data streams are exploited to their full potential.
          ○ Food Security, Polar
      • Hopsworks Data-Intensive AI platform brings scalable AI support for Earth Observation data.
      * https://workshop.copernicus.eu/sites/default/files/content/attachments/ajax/copernicus_overview.pdf
  • 6. How to build AI products with EO data
  • 7. ExtremeEarth architecture goals
      • ExtremeEarth brings together these components
          ○ Under the same architecture…
          ○ … and infrastructure.
          ○ Reduce cost and increase productivity by providing a seamless end-user experience without having to manage different services
      • Combine
          ○ EO data access from DIASes
          ○ End-user facing EO data products from TEPs
          ○ Scalable AI capabilities of Hopsworks
  • 9. ExtremeEarth architecture deep dive 1/2
      • Infrastructure provided by Creodias and managed by the TEPs
          ○ OpenStack cluster with GPU support
      • Data layer with multiple data sources
          ○ Raw Creodias data
          ○ Intermediate TEP data
          ○ Training datasets
      • Processing layer provided by Hopsworks.
          ○ Core AI engine
          ○ Develop PB-scale machine learning algorithms with deep learning architectures.
          ○ Platform that provides support for semantic data tools
  • 10. ExtremeEarth architecture deep dive 2/2
      • Product layer
          ○ Hopsworks serves AI products to external clients
      • User interface
          ○ Hopsworks is integrated with the TEPs via APIs
          ○ TEP users make direct use of AI models developed in Hopsworks.
  • 11. Real World Use Cases - Food Security
  • 12. Real World Use Cases - Polar
  • 13. ExtremeEarth running in production
      • Hopsworks installed alongside TEP infrastructure on CREODIAS
          ○ https://hopsworks.polartep.io
      • Provides easy EO data access and machine learning development tooling to developers and data scientists.
      • Deep learning architectures developed on this Hopsworks cluster for the Food Security and Polar use cases.
  • 15. Hopsworks
      Open source platform to develop end-to-end machine learning pipelines at scale for Enterprise AI. Use your tools of choice and serve at the lowest latency on any cloud, at any scale.
      The Data Platform for AI
  • 16. Organizations are struggling to deploy AI because of Data
      ● “87% identified data as the reason their organizations failed to successfully implement AI.”* (VentureBeat)
      * https://venturebeat.com/2021/03/24/employees-attribute-ai-project-failure-to-poor-data-quality/
      [Diagram labels: Where the data is (storage); Discover and Access the data; Clean, Join and Aggregate the Data; Extract the Data; Transform the data into features; Validate the data; Make the process repeatable 🔁; Serve for real-time applications or train 🏆]
  • 17. Growing Consensus on How to Manage Complexity of AI
      [Diagram labels: Data validation, Distributed Training, Model Serving, A/B Testing, Monitoring, Pipeline Management, HyperParameter Tuning, Feature Engineering, Data Collection, Hardware Management; Data, Model, Prediction, φ(x)]
      * Diagram from Google’s paper Hidden Technical Debt in Machine Learning Systems
  • 18. Growing Consensus on How to Manage Complexity of AI
      [Same diagram as slide 17, with the added labels: FEATURE STORE, FEATURE ENGINEERING]
      * Diagram from Google’s paper Hidden Technical Debt in Machine Learning Systems
  • 19. Growing Consensus on How to Manage Complexity of AI
      [Same diagram as slide 17, with the added labels: FEATURE STORE, FEATURE ENGINEERING, ML PLATFORM, TRAIN and SERVE]
      * Diagram from Google’s paper Hidden Technical Debt in Machine Learning Systems
  • 20. Scalable end-to-end deep learning pipelines
      ● Horizontally scalable infrastructure that enables developers to manage the lifecycle of EO machine learning applications
  • 21. End-to-end machine learning components
      [Diagram labels: Streaming, Train/Test Data (S3, HDFS, etc), Online Application, Data Warehouse, Data Lake, Feature Engineering, Offline Feature Store, Model Training, Model Serving, Online Feature Store, Model Repository, Monitor, Deploy, Feature Vectors, Result Sink (DB), Batch Scoring, Batch Access, Feature Store, HopsFS, Scaleout Metadata]
  • 22. Hopsworks - one open source platform with all the tools
      [Diagram labels]
      • APPLICATIONS, API, DASHBOARDS
      • HOPSWORKS
          ○ DATASOURCE
          ○ ORCHESTRATION in Airflow
          ○ BATCH: Apache Spark
          ○ STREAMING: Apache Spark, Apache Flink
          ○ HOPSWORKS FEATURE STORE
          ○ DISTRIBUTED ML & DL: Pip, Conda, TensorFlow, scikit-learn, PyTorch, Jupyter Notebooks, TensorBoard
          ○ FILESYSTEM & METADATA STORAGE in HopsFS
          ○ MODEL SERVING: Kubernetes
          ○ MODEL MONITORING: Kafka + Spark Streaming
      • Phases: Data Preparation & Ingestion; Experimentation & Model Training; Deploy & Productionalize
      • Apache Kafka
  • 24. Distributed deep learning with Hopsworks
      # RUNS ON THE WORKERS
      def train():
          def input_fn():
              # return dataset
          model = …
          optimizer = …
          model.compile(…)
          history = model.fit(..)
          metrics = {
              'train_loss': history.history['loss'][-1],
              'train_accuracy': history.history['accuracy'][-1],
              'val_loss': history.history['val_loss'][-1],
              'val_accuracy': history.history['val_accuracy'][-1],
          }
          tf.estimator.train_and_evaluate(keras_estimator, input_fn)

      # RUNS ON THE DRIVER
      experiment.mirrored(train, name='distributed', metric_key='val_accuracy')

      [Diagram labels: Driver, TF_CONFIG, workers W1 to W8, HopsFS (Metrics, TensorBoard, Checkpoints, Training Data, Models, Logs)]
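The slide code is schematic, so it may help to see what a mirrored (synchronous, data-parallel) training step amounts to. The following is a minimal pure-Python sketch, assuming that "mirrored" here means every worker holds the same weights, computes a gradient on its own data shard, and applies the averaged gradient. It does not use TensorFlow or the Hopsworks API; all function names (`shard`, `local_gradient`, `all_reduce_mean`, `train_data_parallel`) and the toy linear model are hypothetical and only illustrate the idea.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def shard(data: List[Sample], num_workers: int) -> List[List[Sample]]:
    """Split the dataset round-robin so each worker gets its own shard."""
    return [data[i::num_workers] for i in range(num_workers)]


def local_gradient(w: float, b: float, batch: List[Sample]) -> Tuple[float, float]:
    """Gradient of the mean squared error of y ~ w*x + b on one worker's shard."""
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return grad_w, grad_b


def all_reduce_mean(grads: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Average the per-worker gradients (stand-in for a real all-reduce)."""
    k = len(grads)
    return sum(g[0] for g in grads) / k, sum(g[1] for g in grads) / k


def train_data_parallel(
    data: List[Sample], num_workers: int = 4, steps: int = 2000, lr: float = 0.05
) -> Tuple[float, float]:
    shards = shard(data, num_workers)
    w, b = 0.0, 0.0  # every replica starts from the same weights
    for _ in range(steps):
        # On a cluster these gradients are computed in parallel, one per worker.
        grads = [local_gradient(w, b, s) for s in shards]
        grad_w, grad_b = all_reduce_mean(grads)
        # All replicas apply the same update, so their weights stay identical.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 3x + 1
data = [(x / 10.0, 3.0 * (x / 10.0) + 1.0) for x in range(40)]
w, b = train_data_parallel(data)
```

With equally sized shards, the averaged gradient equals the full-batch gradient, which is why adding workers in this scheme changes throughput rather than the optimisation result.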
  • 25. Hyperparameter tuning with Maggy
      ● Library for distribution-transparent machine learning experiments on Apache Spark
      ● Not bound to stage-based algorithms, contrary to existing frameworks.
      ● Directed hyperparameter search (ASHA, Bayesian) on TensorFlow, PyTorch, scikit-learn, XGBoost
      ● Real-time, unified logging in Jupyter notebooks.
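To make the idea of directed search concrete, here is a minimal sketch of synchronous successive halving, the pruning scheme that ASHA runs asynchronously. It is not ASHA itself and not Maggy's API: the function `successive_halving`, the objective `toy_score`, and the `lr` search space are all hypothetical and chosen only so the example runs with the standard library.

```python
import random
from typing import Callable, Dict, List

Config = Dict[str, float]


def successive_halving(
    configs: List[Config],
    evaluate: Callable[[Config, int], float],
    min_budget: int = 1,
    eta: int = 2,
) -> Config:
    """Keep the top 1/eta configurations each round and give them eta times more budget."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Higher score is better; each survivor is evaluated with the current budget.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = scored[: max(1, len(scored) // eta)]
        budget *= eta
    return survivors[0]


def toy_score(config: Config, budget: int) -> float:
    # Hypothetical objective: best at lr = 0.1, and it improves with more budget.
    return -abs(config["lr"] - 0.1) - 1.0 / budget


rng = random.Random(0)
candidates = [{"lr": rng.uniform(0.001, 1.0)} for _ in range(16)]
best = successive_halving(candidates, toy_score)
```

In the synchronous form every round waits for all trials to finish before pruning. An asynchronous variant promotes a trial as soon as it ranks in the top fraction of the results seen so far, which is what removes the stage barrier the slide refers to.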
  • 26. Ablation studies with Maggy
      ● Parallel ablation studies: without changing your inner training loop in TensorFlow/Keras, evaluate (in parallel) the effect of different layers, dataset features, etc.
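A leave-one-component-out ablation study can be sketched in a few lines. The example below ablates dataset features only and runs the trials sequentially; it does not use Maggy, and the nearest-centroid classifier, the feature names `ndvi` and `noise`, and the six-row dataset are hypothetical stand-ins for a real training function and real EO features.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[Dict[str, float], int]  # (feature values, label)


def centroid_accuracy(rows: Sequence[Row], features: Sequence[str]) -> float:
    """Fit a nearest-centroid classifier on `features` and return its training accuracy."""
    labels = sorted({label for _, label in rows})
    centroids: Dict[int, List[float]] = {}
    for label in labels:
        members = [x for x, y in rows if y == label]
        centroids[label] = [sum(m[f] for m in members) / len(members) for f in features]

    def predict(x: Dict[str, float]) -> int:
        point = [x[f] for f in features]
        # Squared Euclidean distance to each class centroid; ties go to the lowest label.
        return min(
            labels,
            key=lambda lab: sum((p - c) ** 2 for p, c in zip(point, centroids[lab])),
        )

    return sum(predict(x) == y for x, y in rows) / len(rows)


def feature_ablation(rows: Sequence[Row], features: Sequence[str]) -> Dict[str, float]:
    """Score the full feature set, then each variant with exactly one feature removed."""
    results = {"baseline": centroid_accuracy(rows, features)}
    for dropped in features:
        kept = [f for f in features if f != dropped]
        results[dropped] = centroid_accuracy(rows, kept)  # one independent trial
    return results


rows: List[Row] = [
    ({"ndvi": 0.8, "noise": 0.1}, 1),
    ({"ndvi": 0.9, "noise": 0.9}, 1),
    ({"ndvi": 0.7, "noise": 0.5}, 1),
    ({"ndvi": 0.2, "noise": 0.1}, 0),
    ({"ndvi": 0.1, "noise": 0.9}, 0),
    ({"ndvi": 0.3, "noise": 0.5}, 0),
]
results = feature_ablation(rows, ["ndvi", "noise"])
```

Each entry in `results` comes from a trial that shares nothing with the others, which is the property that lets such trials be farmed out to separate workers.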
  • 27. ML model registry management
  • 30. Exploitation
      ● Hopsworks is now extended with EO data support
      ● Creates opportunities to onboard new use cases for AI with EO data
          ○ Hopsworks as the AI platform for other research projects, H2020 DeepCube
      ● Hopsworks as a product offering
          ○ With the Polar and Food Security TEPs ExtremeAI platform
          ○ Can be seamlessly integrated with further DIASes
          ○ Offered as SaaS at hopsworks.ai on public clouds such as Amazon AWS and Microsoft Azure
  • 32. Publications
      ○ The ExtremeEarth Software Architecture for Copernicus Earth Observation Data. (Conference paper)
          ▪ Published: Conference on Big Data from Space (BiDS21).
      ○ ExtremeEarth Meets Data From Space. (Journal paper)
          ▪ Published: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS) (2021).
      ○ Maggy: Scalable Asynchronous Parallel Hyperparameter Search. (Conference paper)
          ▪ Published: The 1st Workshop on Distributed Machine Learning (DistributedML'20).
      ○ AutoAblation: Automated Parallel Ablation Studies for Deep Learning. (Conference paper)
          ▪ Published: The 1st Workshop on Machine Learning and Systems (EuroMLSys'21).
      ○ Scalable Artificial Intelligence for Earth Observation Data Using Hopsworks. (Journal paper)
          ▪ Under preparation: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS) (2021). ⇒ Will be submitted soon.
      • Published papers: http://earthanalytics.eu/publications.html
  • 33. Blog posts
      ○ AI Software Architecture for Copernicus Data with Hopsworks.
          ▪ July 2021
      ○ End-to-end Deep Learning Pipelines with Earth observation Data in Hopsworks
          ▪ October 2021