Data mesh was among the most discussed and controversial enterprise data management topics of 2021. One reason people struggle with data mesh concepts is that there are still many open questions we are not thinking about:
Are you thinking beyond analytics? Are you thinking about all possible stakeholders? Are you thinking about how to be agile? Are you thinking about standardization and policies? Are you thinking about organizational structures and roles?
Join data.world VP of Product Tim Gasper and Principal Scientist Juan Sequeda for an honest, no-bs discussion about data mesh and its role in data governance.
Five Things to Consider About Data Mesh and Data Governance
1. data.world
How to launch a data catalog in minutes
Five things to consider about Data Mesh and Data Governance
Tim Gasper, VP of Product, data.world
Paul Gancz, Partner Solutions Architect, Snowflake
Juan Sequeda, Principal Scientist, data.world
2. datadotworld data.world
Better together
The Data Cloud: ONE platform, MANY workloads, NO data silos
+
The Modern Data Catalog: Make data discovery, governance, and analysis easy.
The most powerful combined data mesh solution to eliminate data silos and democratize access to well-governed data products.
3. Why Data Mesh?
What is the problem?
Monolithic approaches to data don’t scale socially
Data is treated as an afterthought
Why do we care?
Centralized processes and teams become a bottleneck for the business
Data value is being left untapped
4. Data Mesh Principles
Distribute responsibility for data pipelines and data quality to people with domain knowledge. Serve data as a product using a common self-service IT infrastructure platform.
Domain-Centric Ownership & Architecture
• Data pipelines owned by teams with domain knowledge
• Domains own cleansing, refinement, historization, pre-aggregation, etc.
• Domains responsible for governance, lineage, etc.
Data as-a-Product
• Domains treat data with consumers in mind
• Data is discoverable
• Data is easy to obtain and use
• Data is documented
• Domains responsible for the quality of their data
Self-Service Data Platform
• Common set of tools across domains
• Domain-agnostic
• Easy to use and low maintenance to support
• Easy to deploy repeatable patterns for data cleansing, transformation, automation, storage, security, governance, sharing
Federated Governance
• Global interoperability standards across domains
• Define and use global data governance policies
• Define and apply governance within each domain and propagate downstream
Source: Zhamak Dehghani, https://martinfowler.com/articles/data-monolith-to-mesh.html , https://martinfowler.com/articles/data-mesh-principles.html
5. DATA GOVERNANCE CHALLENGES
Data Is Everywhere
Must be able to eliminate silos inside and outside your organization
Managing Data Is Unnecessarily Complex
Knowing what your data is, and how it is being used, is hard
Security and Governance Are Inherently Rigid
Requires managing risk and changing regulations, while getting the most from your data
6. DATA GOVERNANCE IN THE DATA CLOUD
Know Your Data: Understand, classify, and track data and its usage
Protect Your Data: Secure sensitive data with policy-based access controls
Unlock Your Data: Securely collaborate and share data across teams
7. DATA GOVERNANCE: What it has been...
Risk avoidance and compliance
Top-down policies
Cumbersome processes
8. DATA GOVERNANCE: What it needs to be...
Rules of cooperation and collaboration
Process of data & analytics together
Capture knowledge in real-time
9. What is the goal of data governance?
Data Governance and Data Catalogs
What do catalogs do, and how do they help?
Governance is now about data discoverability, not just data protection.
While application silos pose a governance challenge, inclusive, agile data governance approaches offer solutions.
Governance needs to be a benefit, not a burden. The friction has to go away.
Business users don’t want to install software for governance; SaaS removes that friction and is the way to go.
Understand and trust your data with profiling, sampling, and lineage.
Everyone (producers and consumers) actively contributes to data as they use it.
Accelerate time to value and uncover insights.
Cloud-native, multi-tenant approaches are highly available, scale bigger, perform better, and evolve faster.
10. Five things to consider about Data Mesh and Data Governance
1. What is the scope?
2. Who are the stakeholders?
3. Where should we standardize and productize data?
4. Who is responsible?
5. How to be agile?
14. Data Mesh: Domain-centric Architecture
Diagram: data sources from different domains flow through domain-owned ETL/ELT pipelines and data models in the Customer, Helpdesk & Support, Products, Orders & Sales, Marketing & Promotions, and Customer 360 domains, serving consumers. Interoperability standards, federated governance, and a data catalog span all domains.
• Domain-centric ownership of data sources, pipelines, and data quality
• Ownership sits with domain knowledge → better data quality for consumers
• Domain teams can react faster to source format changes or quality issues
• Overall easier to scale the number of sources & consumers
• Data assets offered as products
• “Serve & pull” instead of “push & ingest” model
15. How scope affects your data catalog
Diagram: three nested scopes, with the Analytics Catalog inside the Data Platform Catalog inside the Enterprise Resource Graph. Each approach adds (+) to the purpose, coverage, and stakeholders of the one before it.
Analytics Catalog
• Purpose: Enabling data consumers to discover assets
• Coverage: Data lake and data mart tables and related reports
• Stakeholders: Analysts, BI team, report writers, report users
Data Platform Catalog
• Purpose: + Enabling the management of the data platform (automation and observability)
• Coverage: + Upstream data sources, lineage, streaming data, ML models, usage information
• Stakeholders: + Data scientists, data engineers
Enterprise Resource Graph
• Purpose: + Managing and protecting the company’s data-related resources
• Coverage: + All data systems, services, classification, access, and provenance
• Stakeholders: + Runtime developers, security, privacy
The approach to managing metadata will depend on the problems that are a priority to solve.
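One way to act on "depends on the problems that are a priority to solve" is to pick the narrowest of the three nested scopes that still serves every stakeholder you need. The Python sketch below assumes the cumulative (+) reading of the slide; the lowercase stakeholder labels and the `narrowest_scope` helper are hypothetical simplifications, not a data.world feature.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogApproach:
    name: str
    # Stakeholders this tier adds on top of the previous tier (hypothetical labels).
    stakeholders: frozenset[str]


# Ordered from narrowest to broadest; each tier includes everything before it.
APPROACHES = [
    CatalogApproach("Analytics Catalog",
                    frozenset({"analysts", "bi team", "report writers", "report users"})),
    CatalogApproach("Data Platform Catalog",
                    frozenset({"data scientists", "data engineers"})),
    CatalogApproach("Enterprise Resource Graph",
                    frozenset({"developers", "security", "privacy"})),
]


def narrowest_scope(priority_stakeholders: set[str]) -> str:
    """Return the first approach whose cumulative coverage serves everyone listed."""
    covered: set[str] = set()
    for approach in APPROACHES:
        covered |= approach.stakeholders  # "+": add this tier to what came before
        if priority_stakeholders <= covered:
            return approach.name
    raise ValueError(f"No approach covers: {sorted(priority_stakeholders - covered)}")
```

For example, `narrowest_scope({"analysts", "data engineers"})` returns `"Data Platform Catalog"`: analysts are covered by the first tier, but data engineers only arrive with the second.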
17. Stakeholders
Data Leadership: Key to buy-in, executive sponsorship, and oversight.
Security: Capture and store what user data exists, where it is, and who is responsible for it.
Privacy: Tell me where the sensitive data is, how it is handled, who has access, and who is responsible for it.
Data Governance: Provide a platform to store and share data best practices, certifications, documentation, and curated data models.
Data Consumers: Tell me what data there is, its usability, how to use it, and who to go to for help.
Data Producers: Tell me who uses my data, and give me a platform to interact with them.
Platforms: Enable automation within data systems (registration, provisioning, validation, access controls, etc.).
19. What is a Data Product?
“A product that facilitates an end goal through the use of data”
DJ Patil, former United States Chief Data Scientist
“Data as a product defines a new concept, called data product that embodies standardized characteristics to make data valuable and usable.”
Zhamak Dehghani, Thoughtworks Director of Emerging Technologies and founder of data mesh
20. Data Product ABCs
A: Accountability
● Who is the owner?
● Who defines the requirements?
● Who fixes it when it breaks?
B: Boundaries
● What is it? What isn’t it?
● Where will it live?
● Inputs and Outputs
C: Contracts & Expectations
● Data Constraints, Definitions, Tests
● SLAs, SLOs, Sharing Agreements, Consents, Purposes
● Performance, Scale, Maintainability, etc.
D: Downstream Consumers
● Current and Potential Consumers
● Use Cases
● Roadmap
E: Explicit Knowledge
● Modeling Schemas
● Documentation
● Relationships with other Data Products
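The ABCs can double as a completeness checklist when a domain registers a data product. The Python sketch below is one hypothetical way to encode it; the field names and the `gaps` helper are illustrative assumptions, not a schema from data.world or Snowflake.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical data product descriptor organized by the Data Product ABCs."""

    name: str
    # A: Accountability
    owner: str | None = None
    requirements_owner: str | None = None
    fixer: str | None = None  # who fixes it when it breaks
    # B: Boundaries
    description: str | None = None  # what it is, and what it isn't
    location: str | None = None  # where it will live
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    # C: Contracts & Expectations
    tests: list[str] = field(default_factory=list)
    slo: str | None = None
    # D: Downstream Consumers
    consumers: list[str] = field(default_factory=list)
    # E: Explicit Knowledge
    documentation: str | None = None

    def gaps(self) -> dict[str, list[str]]:
        """Return unanswered ABC items grouped by letter; an empty dict means complete."""
        checks = {
            "A": {"owner": self.owner, "requirements_owner": self.requirements_owner,
                  "fixer": self.fixer},
            "B": {"description": self.description, "location": self.location,
                  "inputs": self.inputs, "outputs": self.outputs},
            "C": {"tests": self.tests, "slo": self.slo},
            "D": {"consumers": self.consumers},
            "E": {"documentation": self.documentation},
        }
        result = {}
        for letter, items in checks.items():
            missing = [name for name, value in items.items() if not value]
            if missing:
                result[letter] = missing
        return result
```

For example, a draft with only `name` and `owner` set reports `['requirements_owner', 'fixer']` under `"A"`, plus every item under B through E, which gives the owning domain a concrete to-do list before publishing.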
21. What is a Data Product?
Diagram: Data Producer A exposes its internal data through an API as data product(s), for example a dataset, on the data platform, where Data Consumer A and Data Consumer B use it.
The Cloud-Native Data Catalog
22. What is a Data Product?
Diagram: Data Producers A, B, and C each expose their internal data through an API as data product(s) on the data platform. These are combined into aggregate or “enterprise” data product(s), and Data Consumers A, B, and C use the resulting products.
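A minimal Python sketch of the aggregate pattern, assuming each domain serves its data product as a list of records: a Customer 360 product pulls from a Customer product and an Orders & Sales product and joins them on a shared key. The domain names echo the slides; the record shapes, `customer_id`, and the `customer_360` function are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def customer_360(customers: list[dict], orders: list[dict]) -> list[dict]:
    """Combine two domain data products into a hypothetical aggregate product.

    `customers` stands in for the Customer domain's product and `orders` for the
    Orders & Sales domain's product; both inputs are read, never modified.
    """
    order_count = defaultdict(int)
    lifetime_value = defaultdict(float)
    for order in orders:
        order_count[order["customer_id"]] += 1
        lifetime_value[order["customer_id"]] += order["amount"]
    # One enriched record per customer; customers without orders get zeros.
    return [
        {**c,
         "order_count": order_count[c["customer_id"]],
         "lifetime_value": lifetime_value[c["customer_id"]]}
        for c in customers
    ]
```

The aggregate product owns only the combination logic; fixing a bad order record is still the Orders & Sales domain’s job, which keeps accountability with the producer.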
23. Data Mesh Reference Architecture
Snowflake Data Sharing as the preferred interoperability standard. Data Marketplace makes data discoverable.
Diagram: data sources feed the Customer, Sales, Products, Marketing, and Customer 360 domains in the Snowflake Data Cloud. The domains publish an inventory of shared data products to consumers (3rd party marketing agency, reseller, sales analysts, churn & retention, business optimization, finance & controlling), including consumers served through a Snowflake reader account, all under interoperability standards, federated governance, and 3rd party tools.
Data Exchange / Catalog for Consumers:
• Connects providers to consumers
• Inventory of available assets
• No central storage of shared data
• Providers retain full control over shared assets (data, functions)
• Consumers access live provider data, with no copies or ETL required. Register shared data for local SQL access in their environment (no copy)
Data domains:
• Can consume and share data or functions
• Control access policies, data masking, etc. for downstream consumers
• Can share external tables, i.e. provide access to data outside of Snowflake
• Can provide reader accounts for non-Snowflake consumers
Data Catalog for Producers:
• Technical Metadata Inventory, Lineage, Sensitive Data, Business Glossary
24. Global and Multi-Cloud Data Mesh
Snowflake enables a truly global and multi-cloud data mesh across cloud platforms and regions.
Diagram: data sources and Data Domains 1 to 5 in the Snowflake Data Cloud, spread across regions (US East, FRA, Zurich, Tokyo), publish an inventory of shared data products to consumers, including through a Snowflake reader account, under interoperability standards, federated governance, and 3rd party tools.
• Data sources, data domains, and consumers can sit in different regions and different cloud platforms
25. GOVERNANCE IN THE DATA CLOUD
Know, protect, and unlock your data
Know your data (what, where, who)
• Object Tagging
• Auto Classification
• Object Dependencies**
• Access History (writes)**
• Access History (data access audit)
Protect your data
• Row Access Policies
• Dynamic Data Masking
• External Tokenization
• Conditional Masking
Unlock your data
• Secure Data Sharing
• Data Exchange
• Data Marketplace
Lineage and impact analysis: Object Dependencies (impact analysis), Access History (data lineage)
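Row access policies and dynamic data masking are both applied when data is read, based on who is asking, so the stored data itself never changes. The plain-Python sketch below illustrates only that idea; it is not Snowflake’s policy syntax or API, and the roles, regions, and email-masking rule are hypothetical.

```python
from __future__ import annotations

# Hypothetical row access rule: which regions each role may see.
ROW_ACCESS = {
    "sales_emea": {"EMEA"},
    "sales_us": {"US"},
    "governance_admin": {"EMEA", "US"},
}
# Hypothetical masking rule: only these roles see unmasked email addresses.
UNMASKED_ROLES = {"governance_admin"}


def mask_email(value: str) -> str:
    """Hide the local part but keep the domain, e.g. 'ana@example.com' -> '***@example.com'."""
    _, _, domain = value.partition("@")
    return f"***@{domain}" if domain else "***"


def read_customers(rows: list[dict[str, str]], role: str) -> list[dict[str, str]]:
    """Apply the row access rule, then the masking rule, at read time."""
    allowed_regions = ROW_ACCESS.get(role, set())  # unknown roles see no rows
    result = []
    for row in rows:
        if row["region"] not in allowed_regions:
            continue  # row access policy: the row is filtered out entirely
        visible = dict(row)  # copy so the stored row is never modified
        if role not in UNMASKED_ROLES:
            visible["email"] = mask_email(row["email"])  # dynamic masking
        result.append(visible)
    return result
```

Keeping the rules in one place per domain, rather than in each consumer’s query, is what lets a domain control access and masking for all downstream consumers at once.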
27. Who is responsible?
Whether you call them data product managers, data stewards, data owners, data advocates, data custodians, or data trustees…
Let’s revisit the Accountability dimension of the Data Product ABCs framework:
● Who is the owner?
● Who defines the requirements?
● Who fixes it when it breaks?
● Who defines the roadmap?
● Who has the expertise?
What is the smallest number of critical “hats” to wear?
28. Changing the Paradigm
Diagram: two views of a data producer and a data consumer on a data platform.
• Data Management as an Intermediary: data engineering sits between the data producer and the data consumer.
• Direct Data Producer and Data Consumer Collaboration: the data producer and data consumer work together directly, with data management alongside them.
29. Data Mesh: Domain-centric Responsibility
Diagram: the domain-centric architecture from slide 14 (Customer, Helpdesk & Support, Products, Orders & Sales, Marketing & Promotions, and Customer 360 domains, with interoperability standards, federated governance, and a data catalog), divided into three layers of responsibility: Data Integration from data sources, Data Management, and Data Consumption.
31. The Cloud Data Catalog
What is Agile Data Governance?
The process of creating and improving data assets by iteratively capturing knowledge as data producers and consumers work together so that everyone can benefit.
Empowering the usage of data safely.
It adapts the deeply proven best practices of Agile and Open software development to data and analytics.
32. Agile Data Governance Process: iterate!
33. The time impact of being fast, incremental, and iterative
Diagram: a single pass (define standards and principles, build workflows, define policies, release, refine) compared with an iterative approach in which Use Cases 1 to 4 each run their own short cycle: define standards/principles, build workflows, define policies, release, deliver analysis, insight, and value, then measure, learn, and iterate.
34. Takeaways
What is the scope?
● Identify the domains. You are already doing the work; they exist!
● The scope depends on the problems that are a priority to solve: Analytics, Data Platform, Enterprise Resources
Who are the stakeholders of your data catalog?
● You always need Data Leadership
● Consumers, Producers, Governance, Privacy, Security, Platforms
Where to Standardize/Productize Data?
● Data Product ABCs: Accountability, Boundaries, Contracts & Expectations, Downstream Consumers, Explicit Knowledge
● Consumption, Data Mgmt, Data Producing Systems
Who is responsible?
● Accountability: Owner, Requirements, Who Fixes, Roadmap, Expertise
● Consumption, Data Mgmt, Data Producing Systems
How to be agile?
● Empowering the usage of data safely.
● Develop a backlog of questions based on end-user business value
● Sprints, Peer Review, Collaborate, Iterate
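“Develop a backlog of questions based on end-user business value” can be made concrete with a small planning helper. The Python sketch below is a hypothetical illustration: the value and effort scores, the value-per-effort ranking, and the greedy sprint fill are assumptions, not a prescribed data.world process.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """A business question in the governance backlog (scores are hypothetical)."""

    text: str
    business_value: int  # e.g. 1-10, agreed with end users
    effort: int  # e.g. story points; must be positive

    def __post_init__(self) -> None:
        if self.effort <= 0:
            raise ValueError("effort must be positive")


def plan_sprint(backlog: list[Question], capacity: int) -> list[Question]:
    """Greedily fill a sprint with the highest value-per-effort questions that fit."""
    # sorted() is stable, so ties keep their backlog order.
    ranked = sorted(backlog, key=lambda q: q.business_value / q.effort, reverse=True)
    sprint, used = [], 0
    for question in ranked:
        if used + question.effort <= capacity:
            sprint.append(question)
            used += question.effort
    return sprint
```

After each sprint, answered questions leave the backlog and new ones learned from consumers are added, which is the measure, learn, iterate loop from the earlier slides.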
35. Learn more about data mesh governance
What’s inside?
How to…
● Establish a framework for treating data as a product
● Find the right balance of decentralization and centralization
● Transform data into knowledge
Download it here:
data.world/resources/reports-and-tools/data-mesh-governance-white-paper