BDE SC6 workshop - introduction 2016
1. THE BIG DATA EUROPE PROJECT: STATUS & NEXT STEPS
SC6 Workshop, Cologne, 05 December 2016
2. Supporting the Societal Domains with Big Data Technology
BigDataEurope Project
16-Dec-16 www.big-data-europe.eu
3. Stakeholder Engagement Cycle
• Present action, showcase deployments
• Raise awareness about BDE results, what they mean for stakeholders
• Collect requirements to drive further development
Timeline: M6, M12, M18, M24, M30
4. Data Value Chain Evolution
[Figure: data value chain evolution over TIME]
Value chain stages: Extraction, Curation, Quality, Linking, Integration, Publication, Visualization, Analysis
Domains: Health, Food, Climate, Energy, Transport, Societies, Security
Data labels: Data Repositories; Linked Open Data
Technology labels: Proprietary, ‘locked-in’ solutions; OS Solutions, Big Data Stacks
5. A flexible, generic platform for (Big) Data Value Chain Deployment
Big Data Integrator
6. Big Data Integrator
Prototype developed by BDE
o Incorporates existing BD technology
o Facilitates integration and deployment
Main points of the architecture
o Dockerization
o Support layer, including integrated UI
o Semantification layer
8. Demonstrating the Societal Value through 7 Pilot ‘Real-world’ use-cases
BigDataEurope Pilots
9. 7 Pilots
◎ BDI Platform Instantiations
o Allow end-users to easily deploy functionality in their own system environment
o Modularized Docker approach - easier to replace components
o Reduces effort to keep 3rd party software updated & integrated
◎ 7 Societal Challenge Pilots
o Aligned with 7 European Commission H2020 Societal Challenges
o Real-world use-cases (Data, Objectives, Solutions)
o Some pilots have different data & objectives but a similar
10. SC1: Pharmacology research
Life Sciences & Health
Objective: Large-scale heterogeneous pharma-research data linking & integration
• Query a large number of datasets, some large
• Existing elaborate ingestion and homogenization by Open PHACTS
• Extensive toolset developed by OPF and others
11. SC1: Architecture & Components
• Replicate Open PHACTS functionality on the BDE infrastructure using OS solutions
• Based on Virtuoso, proprietary distributed database
• Apply to other domains (e.g. Agriculture)
• Porting to BDI gives flexibility and enables new functionalities
• Logging & system health monitoring
12. SC2: Viticulture resources
Food and Agriculture
Objective: Automate publication ingestion and thematic classification
• AgInfra is a major infrastructure for agriculture researchers, serving cross-linked bibliography, data, and processing services
13. SC2: Architecture & Components
• BDI deployed as an external infrastructure for processing text (viticulture publications)
• Storing and processing text at a larger scale than AgInfra can currently manage
14. SC3: Predictive maintenance
Energy
Objective: Real-time turbine monitoring stream processing and analytics
• Wind turbine monitoring applies computational models to sensor data streams
• Models are re-parameterized weekly using the week’s data from multiple turbines
15. SC3: Architecture & Components
• Existing in-house non-scalable solution for model parameterization
  o Reliable Fortran software for data analysis
  o Efficient, but not scalable to data volume
• Developing a BDI orchestrator
  o Re-uses existing software unmodified
  o Makes it easy to apply in parallel to many datasets and manage the outputs
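The orchestrator idea on this slide (run existing software unmodified, in parallel, over many datasets, and collect the outputs) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the BDE implementation: `ANALYSIS_CMD`, `run_one`, `run_all` and the turbine names are hypothetical, and a small Python one-liner stands in for the Fortran program so the sketch runs anywhere.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Assumption: stand-in for the existing analysis executable. A real setup
# would point this at the unmodified binary; here a Python one-liner that
# counts whitespace-separated values on stdin plays that role.
ANALYSIS_CMD = [
    sys.executable,
    "-c",
    "import sys; print(len(sys.stdin.read().split()))",
]


def run_one(item):
    """Run the external program on one dataset and capture its output."""
    name, payload = item
    result = subprocess.run(
        ANALYSIS_CMD,
        input=payload,
        capture_output=True,
        text=True,
        check=True,  # raise if the external program fails
    )
    return name, result.stdout.strip()


def run_all(datasets, max_workers=4):
    """Apply the unmodified program to many datasets in parallel.

    Threads are enough here because each worker only waits on a subprocess.
    Returns a dict mapping dataset name to the program's output.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, datasets.items()))


if __name__ == "__main__":
    # Hypothetical sensor readings keyed by turbine id.
    readings = {"turbine_a": "1.0 2.0 3.0", "turbine_b": "4.0 5.0"}
    print(run_all(readings))
```

Keeping the external program behind a plain command line means the wrapper never needs to know how the analysis works, which is the point of re-using it unmodified.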
17. SC4: Architecture & Components
• New Flink implementations of map matching and traffic prediction algorithms
• BDI provides access to varied data sources
  o PostGIS database with city map
  o ElasticSearch database of historical data
  o Kafka stream of real-time data
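To make "map matching" concrete: the task is to assign a GPS position to a road segment on the city map. The sketch below uses the simplest possible variant, snapping a point to the nearest segment in planar coordinates. It is an illustrative stand-in, not the pilot's Flink algorithm; the function names and the street data are hypothetical.

```python
import math


def project_onto_segment(p, a, b):
    """Return the point on segment a-b closest to p (planar coordinates)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return (float(ax), float(ay))  # degenerate segment: a == b
    # Parameter of the orthogonal projection, clamped to the segment ends.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def match_point(p, segments):
    """Pick the road segment nearest to p.

    segments maps a segment id to a pair of end points.
    Returns (segment_id, snapped_point, distance), or None if there are
    no segments.
    """
    best = None
    for seg_id, (a, b) in segments.items():
        q = project_onto_segment(p, a, b)
        d = math.dist(p, q)
        if best is None or d < best[2]:
            best = (seg_id, q, d)
    return best


if __name__ == "__main__":
    # Hypothetical two-street map.
    roads = {
        "main_st": ((0, 0), (10, 0)),
        "side_st": ((0, 5), (0, 15)),
    }
    print(match_point((3, 1), roads))
```

Production map matching usually also considers heading, speed and the sequence of previous points; nearest-segment snapping is only the first building block.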
18. SC5: Climate modelling
Climate
Objective: Supporting data-intensive climate research
• Preparing modelling experiments
• Slicing, transforming, combining datasets
• Submission and retrieval from modelling infrastructure
• Discovering and re-using previously computed derivatives
• Lineage annotation: computed derivatives from datasets and model parameters
• Finding appropriate past runs avoids
19. SC5: Architecture & Components
• Existing infrastructure; stable, reliable software for parallel computation of models
• BDI is deployed as an external infrastructure for preparing and managing datasets
• BDI offers:
  o Hive for managing data in a way that can be retrieved and manipulated, rather than file blocks
  o Cassandra stores structured and textual metadata for searching headers and lineage
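The lineage idea from the SC5 slides (a derivative is identified by the datasets and model parameters it was computed from, so a matching past run can be found and re-used) can be illustrated with a small sketch. Everything here is an assumption for illustration: the `lineage_key` function, the `RunCatalogue` class and the in-memory dictionary are hypothetical and stand in for whatever metadata store is actually used.

```python
import hashlib
import json


def lineage_key(dataset_ids, model_params):
    """Derive a deterministic key from input datasets and model parameters.

    Dataset order does not matter, and parameters are serialized with
    sorted keys so that equal inputs always give the same key.
    """
    record = {"datasets": sorted(dataset_ids), "params": model_params}
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class RunCatalogue:
    """In-memory stand-in for a lineage metadata store."""

    def __init__(self):
        self._runs = {}

    def register(self, dataset_ids, model_params, output_location):
        """Record where the output of a finished run is stored."""
        self._runs[lineage_key(dataset_ids, model_params)] = output_location

    def find(self, dataset_ids, model_params):
        """Return the stored output location of a matching run, or None."""
        return self._runs.get(lineage_key(dataset_ids, model_params))


if __name__ == "__main__":
    catalogue = RunCatalogue()
    # Hypothetical dataset ids, parameters and output path.
    catalogue.register(["temp_2015", "wind_2015"], {"resolution_km": 10}, "runs/0001")
    print(catalogue.find(["wind_2015", "temp_2015"], {"resolution_km": 10}))
    print(catalogue.find(["wind_2015", "temp_2015"], {"resolution_km": 5}))
```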
21. SC6: Architecture & Components
• BDI deployed as ingestion and storage infrastructure for external tools
• Homogenizes variety of data (JSON, CSV, XML, etc.)
• Exposes data as SPARQL endpoint serving homogenized data
• Existing analytics and visualization tools
• Use SPARQL queries to retrieve only the relevant slices of the overall data
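As a rough illustration of retrieving only a slice of the data through a SPARQL endpoint, the sketch below builds a query and the matching GET request URL as defined by the SPARQL 1.1 Protocol (the query is passed in the `query` parameter). The endpoint address, the predicate IRI and the function names are hypothetical; no request is actually sent.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint address, not the pilot's real endpoint.
ENDPOINT = "http://localhost:8890/sparql"


def build_slice_query(predicate_iri, limit=100):
    """Build a SELECT query that returns only triples with one predicate."""
    return (
        "SELECT ?s ?o WHERE { "
        f"?s <{predicate_iri}> ?o . "
        "} "
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint, query):
    """Encode the query as a SPARQL 1.1 Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Hypothetical predicate; a real deployment would use its own vocabulary.
    query = build_slice_query("http://example.org/amount", limit=10)
    print(query)
    print(build_request_url(ENDPOINT, query))
```

An external analytics or visualization tool would fetch this URL (typically asking for a JSON result format) and receive only the rows matching the pattern rather than the whole dataset.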
22. SC7: Change detection & verification
Secure Societies
Objective: Detect and Verify Events based on Satellite Imagery, News and Social Media
• Events are extracted from text published by news agencies and on social networking sites
• Events are geo-located and relevant changes are detected by comparing current and previous satellite images
23. SC7: Architecture & Components
Event Detection
Change Detection
• Re-implementation of change detection algorithms for Spark
• Parallel orchestrator for text analytics
  o Re-uses existing software
  o Scales to many input streams
• BDI provides:
  o Cassandra for text content and metadata
  o Strabon GIS store for detected change location
  o Homogeneous access to both for analysis and visualization
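The basic idea of detecting change by comparing a current and a previous image can be shown with a deliberately simple sketch: a cell counts as changed when its intensity differs by more than a threshold. This is an assumption-laden toy, not the pilot's Spark algorithm; the function name, the threshold and the sample grids are hypothetical, and real satellite change detection involves co-registration, calibration and noise handling.

```python
def detect_changes(previous, current, threshold):
    """Return (row, col) cells whose absolute difference exceeds threshold.

    Both images are lists of equally sized rows of numeric intensities.
    """
    same_shape = len(previous) == len(current) and all(
        len(p) == len(c) for p, c in zip(previous, current)
    )
    if not same_shape:
        raise ValueError("images must have the same shape")

    changed = []
    for r, (prev_row, cur_row) in enumerate(zip(previous, current)):
        for c, (old, new) in enumerate(zip(prev_row, cur_row)):
            if abs(new - old) > threshold:
                changed.append((r, c))
    return changed


if __name__ == "__main__":
    # Hypothetical 2x2 intensity grids for the same area at two dates.
    before = [[10, 10], [10, 10]]
    after = [[10, 50], [12, 10]]
    print(detect_changes(before, after, threshold=20))
```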
25. 2nd round of Societal Workshops
Transport | 22 September 2016 | Brussels | Collocated with Big Data for Transport, Tisa workshop
Food&Agri | 30 September 2016 | Brussels | Collocated with DG AGRI WP2018-20 stakeholder consultation
Energy | 4 October 2016 | Brussels | Collocated with EC H2020 Info Day on “Smart Grids and Storage”
Climate | 11 October 2016 | Brussels | Collocated with Melodies Project Event – Exploiting Open Data
Security | 18 October 2016 | Brussels | Standalone Workshop
Societies | 5 December 2016 | Cologne | Collocated with EDDI16- 8th
26. Other Activities
Fresh set (7) of Societal Workshops in 2017
Various SC-focussed and general hangouts. More to follow!
o General (technical): 2 this year
o SC6: 2 so far, next one in the coming weeks
o Recordings & Presentations available online!
o Keep track on the BDE Website (Events)