Keynote presentation of the 3rd workshop on Real-time & Stream Analytics in Big Data & Stream Data Management (https://workshop.euranova.eu/bigdata18.html)
3. TWO KEYNOTES
The Workshop content
Fabian Hüske, co-founder, Data Artisans
Unified Processing of Static and Streaming Data with SQL on Apache Flink.
55 min
Sabri Skhiri, R&D Director, EURA NOVA
The challenge of Data Management in the Big Data Era & its underlying Enterprise architecture shift
15 min
4. THE PAPERS
The Workshop Topics
● Data Streaming Architecture
● CEP / CER
● Stream Mining
● IoT Device integration
5. KEYNOTE 1
Unified Processing of Static and Streaming Data with SQL on Apache Flink.
Fabian Hüske, co-founder, Data Artisans
6. KEYNOTE 2
The challenge of Data Management in the Big Data Era & its underlying Enterprise architecture shift
Sabri Skhiri, Research director @EURA NOVA
7. Agenda
1. Emerging challenges in data management
2. What is a data architecture?
3. The LinkedIn/Confluent vision of data architecture
4. Open Challenges
5. Digazu as an implementation
14. Challenge 1
[Diagram: Application (three boxes), Enterprise Data Warehouse, database, algorithm; labels: data-driven, Decision in real time, Decision batch]
Sharing information in real time between applications and data stores
15. Challenge 2
Implementing algorithms in a real-time-driven environment
16. Challenge 3
Online / Incremental / Reinforcement learning
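To make the "incremental" part of this challenge concrete, here is a minimal Python sketch in which a one-weight linear model is updated one event at a time instead of being retrained on a batch. The function name, the learning rate and the toy stream are illustrative assumptions and do not come from the talk.

```python
def online_sgd(stream, learning_rate=0.1):
    """Fit y ~ w * x one observation at a time (hypothetical helper)."""
    w = 0.0
    for x, y in stream:
        error = y - w * x               # prediction error on the newest event
        w += learning_rate * error * x  # one gradient step, no batch retraining
    return w


# Toy stream (assumption): 100 events generated by y = 2 * x with x = 1.
events = [(1.0, 2.0)] * 100
weight = online_sgd(events)
```

With this toy stream the weight moves towards 2 as events arrive; a real deployment would also have to handle feature scaling and concept drift, which this sketch ignores.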
17. Challenge 4
Integration strategy BI-Datalake
18. Challenge 5
GDPR Compliance
19. Challenge 5
GDPR Compliance
Access policy management
On-purpose storage
● Contracts
● Opt-in
● Legitimate interest
● Regulations
Deletion
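The three GDPR concerns on this slide (access policy, on-purpose storage, deletion) can be sketched in a few lines of Python. This is only an illustration: the class `PurposeStore`, its methods and the legal-basis identifiers are hypothetical names, and reading the four bullets as the accepted legal bases for storage is an assumption.

```python
from dataclasses import dataclass, field

# Legal bases taken from the slide's bullets; the identifiers are assumptions.
LEGAL_BASES = {"contract", "opt_in", "legitimate_interest", "regulation"}


@dataclass
class PurposeStore:
    """Hypothetical store that keeps personal data only for declared purposes."""
    records: dict = field(default_factory=dict)

    def store(self, subject_id, data, basis, purposes):
        # On-purpose storage: refuse data that comes without a legal basis.
        if basis not in LEGAL_BASES:
            raise ValueError(f"unknown legal basis: {basis}")
        self.records[subject_id] = {
            "data": data, "basis": basis, "purposes": set(purposes),
        }

    def read(self, subject_id, purpose):
        # Access policy: data is readable only for a purpose it was stored for.
        record = self.records.get(subject_id)
        if record is None or purpose not in record["purposes"]:
            raise PermissionError(f"{purpose!r} not allowed for {subject_id!r}")
        return record["data"]

    def delete(self, subject_id):
        # Deletion: returns True if something was actually erased.
        return self.records.pop(subject_id, None) is not None
```

In a streaming architecture the same checks would have to be applied to every copy of the data, which is why the slide lists deletion as a concern of its own.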
20. Challenge 6
Data Governance
- data lineage
- where is my data?
- data meaning
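As an illustration of the data lineage question, the Python sketch below records, for each dataset, the datasets it is derived from, and walks that graph back to the raw sources. The dataset names and the `LINEAGE` registry are hypothetical, and the sketch assumes the lineage graph has no cycles.

```python
# Hypothetical lineage registry: each dataset lists its direct inputs.
LINEAGE = {
    "crm_customers": [],
    "web_clicks": [],
    "customer_360": ["crm_customers", "web_clicks"],
    "churn_scores": ["customer_360"],
}


def upstream_sources(dataset, lineage=LINEAGE):
    """Return every raw source a dataset ultimately derives from."""
    inputs = lineage[dataset]
    if not inputs:  # no inputs: the dataset is itself a raw source
        return {dataset}
    sources = set()
    for parent in inputs:
        sources |= upstream_sources(parent, lineage)
    return sources
```

For example, `upstream_sources("churn_scores")` walks through `customer_360` and returns the two raw sources it was built from.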
21. With these 6 features...
1. Sharing information in real time between applications and data stores
2. Implementing algorithms in a real-time-driven environment
3. Online / Incremental / Reinforcement learning
4. Integration strategy BI-Datalake
5. GDPR Compliance
6. Data Governance: data lineage, where is my data?, data meaning
22. ...the business strategy is supported
And it is called a “data architecture”
The objectives of your company + The new Customer’s behaviour
23. What is a Data Architecture?
Organising your data strategy
24. What is a Data architecture?
A global plan depicting how to collect, store, use, & manage data
[Diagram labels: App. 1, App. 2, ..., App. N; Users; Analytics layer; Exposure layer; Governance layer; Security layer; Storage layer; Data processes (Create, Read, Update, Delete)]
Questions
● Where is the master data?
● How do we manage replica consistency?
● Where are the data?
● How to use the data in apps or analytics?
● Best technology stack?
● Convergence of BI/Analytics? (The 3 DW from Gartner)
● How to productize predictive models?
● What about data governance processes?
25. 3 needs in Enterprises
3 facets of the same story
● Business teams want to implement use cases
● The CDO wants to mutualise the use cases
● IT wants to set up the right infrastructure
32. Point-to-point data architecture
Problem
Every new use case increases maintenance cost.
The more closely I follow the roadmap of company use cases, the higher the cost of operating the data becomes.
IT ENTROPY
DATA TCO
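A back-of-the-envelope count shows why the cost grows. The Python sketch below assumes the worst case, in which every consumer needs data from every source, and compares the number of integrations to maintain point-to-point with the number needed when each system connects once to a shared stream platform. Both function names and the figures are illustrative assumptions.

```python
def point_to_point_links(sources, consumers):
    """Worst case: one dedicated integration per (source, consumer) pair."""
    return sources * consumers


def shared_platform_links(sources, consumers):
    """Each source and each consumer connects once to a shared platform."""
    return sources + consumers


# Illustrative figures (assumption): 10 sources, 8 consuming use cases.
before = point_to_point_links(10, 8)
after = shared_platform_links(10, 8)
# Cost of adding one more use case in each model.
extra_p2p = point_to_point_links(10, 9) - before
extra_shared = shared_platform_links(10, 9) - after
```

Under these assumptions the point-to-point model needs 80 integrations against 18 for the shared platform, and each new use case adds up to 10 integrations instead of 1.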
33. The Story of the Data stream
The new wave of architecture
[Diagram: items numbered 1 to 5]
We can use these patterns in
1. DATA ARCHITECTURE
2. SERVICE ARCHITECTURE
https://data-artisans.com/flink-forward-berlin/resources/the-convergence-of-stream-processing-and-microservice-architecture
34. FIRST EFFICIENT SOLUTION @LINKEDIN
DECOUPLING DATA PRODUCERS & CONSUMERS
[Diagram: Apps; OLAP, Newsfeed, Search, Social Graph; Log Search, Monitoring, Security, RT analytic, Samza; Stream Data Platform; Hadoop, Key value storage, Oracle, Teradata]
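The decoupling idea can be sketched with an in-memory append-only log in which producers only append and every consumer keeps its own read position. This is a toy illustration in Python, not the API of the platform used at LinkedIn; the class `Log` and its methods are hypothetical.

```python
class Log:
    """Toy append-only log; each consumer tracks its own read offset."""

    def __init__(self):
        self._events = []
        self._offsets = {}  # consumer name -> index of next unread event

    def append(self, event):
        # Producers write without knowing who will read.
        self._events.append(event)

    def poll(self, consumer):
        # Consumers read at their own pace, independently of each other.
        start = self._offsets.get(consumer, 0)
        batch = self._events[start:]
        self._offsets[consumer] = len(self._events)
        return batch


log = Log()
log.append("profile_updated")
log.append("page_viewed")
search_batch = log.poll("search")   # sees both events
log.append("job_applied")
hadoop_batch = log.poll("hadoop")   # a late consumer still sees all three
search_next = log.poll("search")    # only the event it has not read yet
```

Adding a new consumer requires no change on the producer side, which is the property the slide title refers to.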
36. Open challenges
STILL A LOT OF QUESTIONS
[Diagram: same Stream Data Platform architecture as on slide 34]
● Governance?
● Data exposure management?
● Security & regulation?
● Data transformation? ETL?
● History management in data lake?
● Integration with Data Science Workbench
● Integration with EDW?
37. THE DAV: FUNCTIONAL COMPONENTS
THE RESULT OF 7 YEARS OF R&D @EURANOVA ON DATA MANAGEMENT
[Diagram labels: Operational System 1, Operational System 2, Operational System 3; Applications; Data Warehouse; Historical Storage Layer; Data Profiling; Profiling; Lake; Access & Policy Manager; Audit & Reporting; Management; Lineage tracker; CIM & Data Location Tracker; Governance Stack; Governance; BI Stack; Data Analytics Lab; DAL; Data Service Gateway; Derived-views Transformer; Transformer Layer; Data Collector; Policy Interceptor; CEP Interceptor; Collector; External data]
Legend: External sources of data; Existing operational systems; Existing EDW/BI tooling; DIGAZU components; Labels
38. FROM ARCHITECTURE TO PRODUCT
DATA & IGAZU FALLS => DIGAZU
[Diagram: same functional components as on slide 37]
40. digazu
is an end-to-end data engineering platform which includes
○ data integration
○ data preparation and
○ data lake.
connects to many data sources, collects only once & streams the data to all data consumers.
41. [Diagram]
Sources: data warehouse, open sources and third parties, connected homes, legacy systems, smartwatches, still unused databases, connected devices, sensors
Usages: Live 360° view, Context-aware services, Business Intelligence Cubes, Data Analytics Lab
Users: data scientists, marketing teams
42. [Diagram]
Sources: data warehouse, open sources and third parties, connected homes, legacy systems, smartwatches, still unused databases
Between sources and usages: Data lake, Transformation layer, Collector, Distributor, Exploration tool
Usages: Live 360° view, Context-aware services, Business Intelligence Cubes, Data Analytics Lab
Users: data scientists, marketing teams
43. One-stop-shop data management
[Diagram: Data lake, Transformation layer, Collector, Distributor, Exploration tool]
● Historical data management
● Real-time & batch data pipeline management
● Real-time enrichment process management (built-in)
● Data Registry
● Connector for files, RDB, Kafka, NoSQL
● Fully elastic
● GDPR-ready
● Data Governance pre-built connector
https://digazu.com/
45. CONCLUSION
Key takeaways
The digital transformation drivers all rely on data
New customer behaviors and direct interaction require a new way to think about data architecture
DATA CAN BE SHARED THROUGH STREAMS APPLYING KAPPA-STYLE ARCHITECTURE
=> APPLIES TO EITHER APPLICATIONS OR DATA
YOU STILL NEED TO PUT IN PLACE A GLOBAL DATA MANAGEMENT STRATEGY (GOVERNANCE, SECURITY, REGULATION, INTEGRATION WITH EDWH)
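One property usually associated with a Kappa-style architecture is that derived views are rebuilt by replaying the stream through new logic rather than by running a separate batch pipeline. The Python sketch below illustrates that idea on an in-memory list; `replay`, the two view functions and the sample events are hypothetical and stand in for a real stream processor and log.

```python
def replay(log, apply_event):
    """Rebuild a derived view from scratch by replaying the whole log."""
    view = {}
    for event in log:
        apply_event(view, event)
    return view


def orders_per_user(view, event):
    # First version of the view: number of orders per user.
    view[event["user"]] = view.get(event["user"], 0) + 1


def spend_per_user(view, event):
    # Changed requirement: total amount per user, computed from the same log.
    view[event["user"]] = view.get(event["user"], 0) + event["amount"]


# Sample events (assumption), kept in arrival order.
ORDER_LOG = [
    {"user": "ann", "amount": 30},
    {"user": "bob", "amount": 10},
    {"user": "ann", "amount": 5},
]
counts = replay(ORDER_LOG, orders_per_user)
spend = replay(ORDER_LOG, spend_per_user)
```

The log stays the single source of truth, and each application or data store derives its own view from it.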