The Data Lake paradigm is often considered the scalable successor to the more curated Data Warehouse approach when it comes to democratizing data. However, many who set out to build a centralized Data Lake ended up with a data swamp: unclear responsibilities, no data ownership, and sub-par data availability.
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes Beyond the Data Lake
1. Data Mesh in Practice
How Europe’s Leading Online Platform for Fashion Goes Beyond the Data Lake
Max Schultze - max.schultze@zalando.de - @mcs1408
Arif Wider - awider@thoughtworks.com - @arifwider
25-06-2020
2. 2
Who are we?
Max Schultze
● Lead Data Engineer
● MSc in Computer Science
● Took part in early development of Apache Flink
● Retired semi-professional Magic: the Gathering player
Arif Wider
● Lead Technology Consultant
● Head of AI, ThoughtWorks Germany
● Scala & FP enthusiast
● Coffee geek
13. 13
Centralization Challenges
Datasets are provided by a data-agnostic infrastructure team
● Lack of ownership
Pipeline responsibility lies with a data-agnostic infrastructure team
● Lack of quality
Organizational scaling
● Central team becomes the bottleneck
14. 14
A Recurring Pattern
Source domain product teams generating data
Teams, decision makers, data scientists consuming data
Data & ML engineers maintaining the data platform
20. 20
What is Data Mesh?
Old wine applied to new bottles…
→ Product Thinking
→ Domain-Driven Distributed Architecture
→ Infrastructure as a Platform
… creates value from Data
https://martinfowler.com/articles/data-monolith-to-mesh.html by Zhamak Dehghani
21. 21
Data as a Product
What is my market?
What are the desires of my customers?
What “price” is justified?
How to do marketing?
What’s the USP?
Are my customers happy?
24. 24
Domain-Driven Distributed Architecture… applied to Data
→ The Data Product is the fundamental building block
● Discoverable
● Addressable
● Self-describing
● Trustworthy
● Interoperable (governed by open standard)
● Secure (governed by global access control)
Domain
Aggregated Domain
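The properties listed above could be captured in a small descriptor that each domain team publishes alongside its data. The following is a minimal sketch, not something shown in the talk: the `DataProduct` class, its field names, and the example values (including the storage path) are hypothetical and only illustrate how each property might map to a concrete field.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for one data product (illustration only)."""
    name: str            # discoverable: listed in a catalogue under this name
    address: str         # addressable: a stable location consumers can rely on
    schema: dict         # self-describing: column name -> type
    owner_domain: str    # trustworthy: a named domain team is accountable
    data_format: str     # interoperable: an open standard format
    allowed_readers: frozenset = field(default_factory=frozenset)  # secure

    def missing_properties(self) -> list:
        """Return the names of required properties that were left empty."""
        required = {
            "name": self.name,
            "address": self.address,
            "schema": self.schema,
            "owner_domain": self.owner_domain,
            "data_format": self.data_format,
        }
        return [key for key, value in required.items() if not value]


# Assumed example values; the path is a placeholder, not a real location.
orders = DataProduct(
    name="orders",
    address="s3://example-bucket/orders/",
    schema={"order_id": "string", "amount": "decimal"},
    owner_domain="checkout",
    data_format="parquet",
    allowed_readers=frozenset({"analytics"}),
)
print(orders.missing_properties())
```

A platform could refuse to register a product whose `missing_properties()` list is non-empty, which keeps the building block uniform across domains while leaving the content to the owning team.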
25. 25
...backed by domain-agnostic self-service data infrastructure
Data Infra as a Platform
Storage, pipeline, catalogue, access control, etc.
Data infra engineers
26. 26
It’s a mindset shift
FROM → TO
Centralized ownership → Decentralized ownership
Pipelines as first class concern → Domain Data as first class concern
Data as a by-product → Data as a Product
Siloed Data Engineering Team → Cross-functional Domain-Data Teams
Centralized Data Lake / Warehouse → Ecosystem of Data Products
29. 29
Data Mesh in Practice
Recap:
● From Bottleneck to Infra Platform
● From Data Monolith to Interoperable Services
Data Infra as a Platform
Storage, pipeline, catalogue, access control, etc.
Central data platform
34. 34
Central Services with Global Interoperability
Decentralized ownership does not imply decentralized infrastructure!
Interoperability is created through the convenient solutions of a self-service platform.
Decentralized Storage, Central Infrastructure
Decentralized Ownership, Central Governance
37. 37
How to Ensure Data Quality?
Make conscious decisions
● Opt-in instead of default storage
● Classification of data usage
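One way to read the two bullets above is as a rule the platform enforces: nothing is stored unless the owning team asked for it and stated what the data is used for. The sketch below is an assumption-laden illustration, not the mechanism described in the talk: the `UsageClass` tiers, the `archive` and `usage_class` config keys, and the `should_archive` function are all hypothetical names.

```python
from enum import Enum


class UsageClass(Enum):
    """Hypothetical usage tiers; the talk does not name concrete classes."""
    EXPLORATION = "exploration"
    REPORTING = "reporting"
    BUSINESS_CRITICAL = "business_critical"


def should_archive(dataset_config: dict) -> bool:
    """Store a dataset only if its owner opted in and classified its usage."""
    # An absent "archive" key means "do not store": opt-in, not default.
    opted_in = dataset_config.get("archive", False)
    usage = dataset_config.get("usage_class")
    return bool(opted_in) and isinstance(usage, UsageClass)


# Usage: an unconfigured dataset is not stored; an opted-in, classified one is.
print(should_archive({}))
print(should_archive({"archive": True, "usage_class": UsageClass.REPORTING}))
```

Requiring a classification at opt-in time forces the conscious decision the slide asks for, since a producer cannot enable storage without saying how the data is meant to be used.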
39. 39
Data Quality - A Contract between Consumer and Producer
Behavioral changes for data producers
● Data is a product, not a by-product
● Dedicate resources to
○ Understand usage
○ Ensure quality
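A contract between producer and consumer becomes easier to uphold once it is written down in a form that can be checked automatically. The following is a minimal sketch under stated assumptions, not the approach presented in the talk: the `DataContract` class, its two fields, and the `violations` function are hypothetical, and a real contract would likely cover more than missing values.

```python
from dataclasses import dataclass


@dataclass
class DataContract:
    """Hypothetical producer/consumer agreement (all fields are assumptions)."""
    required_columns: frozenset
    max_null_fraction: float  # share of missing values tolerated per column


def violations(contract: DataContract, rows: list) -> list:
    """Return human-readable contract violations for a batch of dict rows."""
    if not rows:
        return ["batch is empty"]
    problems = []
    for column in sorted(contract.required_columns):
        # A column counts as missing when the key is absent or its value is None.
        missing = sum(1 for row in rows if row.get(column) is None)
        if missing / len(rows) > contract.max_null_fraction:
            problems.append(f"{column}: {missing}/{len(rows)} values missing")
    return problems


contract = DataContract(
    required_columns=frozenset({"order_id", "amount"}),
    max_null_fraction=0.1,
)
batch = [
    {"order_id": "a1", "amount": 10.0},
    {"order_id": "a2", "amount": None},
]
print(violations(contract, batch))
```

A producer could run such a check before publishing each batch, which is one concrete way of dedicating resources to ensuring quality rather than leaving consumers to discover problems downstream.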