2. What we’ve heard
To spend less time preparing data
Robust data governance
Platform to actionable insights to the business
Ability to increase the value of hidden data
Improve operational efficiency
Ideally, organizations want to have…
Reduce cost of data engineering
Need for frictionless data governance
Difficult to balance access and data protection
Data and analytics operationalization
Enable lines of business
Poor data quality
Disparate systems and data silos
Too slow moving from data to decision
Barriers to achieve business outcomes
Unified ecosystem
Project prioritization
3. Every application that creates data needs, and will have, a database
Application A Application B
Consequently, when we have two applications, we hypothesize that each application has its own ‘database’.
When there is interoperability between these two applications, we expect data to be transferred from one
application to the other.
In the context of data management, every application that creates data needs, and will have, a
database. Even stateless applications that create data have “databases”; in these scenarios the database
typically sits in RAM or in a temp file.
4. We can’t escape from data integration
Application A Application B
Data transformation is always required because an application’s database schema is designed to meet that
application’s specific requirements. Since requirements differ from application to application, the schemas
differ too, so data must be integrated whenever it moves between applications.
Whether you do ETL or ELT, virtual or physical, batch or real-time, there is no escape from the data
integration dilemma.
Data integration
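A minimal sketch of why moving data between two applications involves a transformation step. The two record shapes and the `integrate` function are hypothetical and only illustrate the point made on this slide: each schema serves its own application, so a mapping is needed in between.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class OrderInAppA:
    """Hypothetical record shape in Application A's database."""
    order_id: int
    customer_name: str   # stored as "Last, First"
    amount_cents: int    # integer cents
    ordered_on: str      # ISO date string, e.g. "2023-04-01"


@dataclass
class OrderInAppB:
    """Hypothetical record shape that Application B expects."""
    id: str
    first_name: str
    last_name: str
    amount: float        # decimal currency units
    order_date: date


def integrate(a: OrderInAppA) -> OrderInAppB:
    """Map a record from A's schema onto B's schema."""
    last, first = [part.strip() for part in a.customer_name.split(",", 1)]
    return OrderInAppB(
        id=f"A-{a.order_id}",                      # B uses prefixed string keys
        first_name=first,
        last_name=last,
        amount=a.amount_cents / 100,               # cents -> currency units
        order_date=date.fromisoformat(a.ordered_on),
    )


order_b = integrate(OrderInAppA(42, "Doe, Jane", 1999, "2023-04-01"))
print(order_b)
```

The same mapping is needed whether it runs in an ETL job, an ELT step, a virtualization layer, or a streaming consumer; only where it runs changes.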
5. Business Drivers
• Lack of data ownership
• Lack of data quality
• Difficult to see interdependencies
• Model conflicts across business concerns
• Tremendous effort for integration and coordination leads to bypasses
• Business and IT work in silos
• Disconnect between the data producers and data consumers
• Central team becomes the bottleneck
• Difficult to apply policy and governance
• Hard to see technical dependencies
• Small changes become risky due to unexpected consequences
• Technical ownership rather than data ownership
Many enterprises are saddled with outdated data architectures that do not scale to the needs of large
multidisciplinary organizations.
6. Problems with Existing Architectures
There’s a deep assumption that centralization is the solution to data management. This includes
centralizing all data and management activities into one central team, building one data platform,
using one ETL framework, using one canonical model, etc.
Centralized Architecture
• Single team with centralized knowledge and book of work
• Centralized pipelines for all extraction / ingestion activities
• Centralized transformations to create harmonized data
• Central platform serves as large integration database: all execution and analysis is done on the same platform
Diagram labels: Data providers (Transactional Sources); Central engineering team; Data consumers (Analytical Consumers)
7. Transformational Trends in the Data Landscape
Increase of computing power
Massive increase of computing power, driven by hardware innovation (SSD storage, in-memory storage, GPU advances), lets us move data to compute faster.
Eco-system connectivity
Cloud and APIs make it easier to integrate. Software and Platform as a Service (SaaS, PaaS) offerings push connectivity and API usage even further.
Explosion of tools
New (open source) concepts are introduced, such as NoSQL database types, blockchain, new database designs, distributed models (Hadoop), new analytical methods, etc.
Exponential growth of data
Exponential growth of data, especially external data sources like open and social data. Internal, external, structured, and unstructured data are all used to deliver additional insights.
Increased regulatory attention
Stronger regulatory requirements, such as GDPR and BCBS 239, are coming into effect worldwide. Data quality and lineage become more important every day.
Increase of read/write ratio
The read/write ratio has changed due to more intensive data consumption: data is read more often, there is increased real-time consumption, and more searches are performed.
9. Data as a Product
Data is no longer a side-effect, it’s a product.
• Who are my "customers"?
• What do my "customers" need?
• Are they happy with the data? Are they using it?
• How do I let my "customers" know my data exists?
• What is in it for the "customer"?
11. Data Product Properties
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (martinfowler.com)
Zhamak Dehghani
Discoverable
• Overview of product in central data catalog
• Provide easy discoverability
Addressable
• Help users access the product programmatically
Trustworthy
• Data Product Owners provide monitored SLOs
• Data is cleansed and up to standard
Self-describing
• Minimal friction for data engineers and scientists to use the data
Interoperable
• Open standards for harmonization
• Field type formatting
Secure
• Access control policies
• Use SSO and RBAC
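The six properties on this slide can be sketched as a small descriptor that a catalog could validate before a data product is published. This is only an illustration: the `DataProductDescriptor` class, its field names, and the `unmet_properties` check are assumptions made here, not part of any real catalog API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataProductDescriptor:
    """Hypothetical metadata record; one field per property on the slide."""
    name: str
    catalog_entry: str = ""                               # Discoverable
    address: str = ""                                     # Addressable
    slo_freshness_hours: Optional[int] = None             # Trustworthy
    schema: Dict[str, str] = field(default_factory=dict)  # Self-describing
    data_format: str = ""                                 # Interoperable
    allowed_roles: List[str] = field(default_factory=list)  # Secure


def unmet_properties(product: DataProductDescriptor) -> List[str]:
    """Return the names of the properties the descriptor does not yet cover."""
    checks = {
        "discoverable": bool(product.catalog_entry),
        "addressable": bool(product.address),
        "trustworthy": product.slo_freshness_hours is not None,
        "self-describing": bool(product.schema),
        "interoperable": bool(product.data_format),
        "secure": bool(product.allowed_roles),
    }
    return [name for name, ok in checks.items() if not ok]


draft = DataProductDescriptor(
    name="sales_forecast",                       # illustrative name
    catalog_entry="catalog/sales/sales_forecast",
    address="https://example.invalid/sales/forecast",
    slo_freshness_hours=24,
    schema={"region": "string", "forecast": "double"},
    data_format="parquet",
)
print(unmet_properties(draft))  # access roles have not been defined yet
```

A check like this only confirms that the metadata is present; whether the SLO is actually met or the access policy is adequate still has to be monitored by the data product owner.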
13. Data Mesh
Data Mesh is a new decentralized socio-technical approach to managing data, designed to work
with organizational complexity and continuous growth. It enables large organizations to get value
from their data, at scale, through reusability, analytics and ML. It builds on the
Domain-Driven Design methodology.
Diagram labels: Data Mesh; Domain-Driven Design; Domain Zones; Data Products; Consumed by other Domains
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (martinfowler.com)
Zhamak Dehghani
14. Centralized Implementation is not working!
GSR
Finance
HR
Travel
Sales
Clinical Ops
Centralized Platform
LOBs are the SMEs, and the Shared Service team is not able to cope with the projects
Dataset sprawl
Competing needs within the organization
• IT needs to standardize
• LOBs need to implement analytics
Primitive Data Strategy
15. Introduction to Data Domains
Data Products: Search Keywords, Promotions, Top Selling Products, Orders, Customer Profiles
Integration Services
Operational Systems
Domains: Marketing Domain, Customer Services Domain, Order Management Domain
• A domain is a collection of people, typically organized around a common business purpose.
• Domains create and serve data products to other domains and end users, independently from other domains.
• Domains ensure data is accessible, usable, available, and meets the defined quality criteria.
• Domains evolve data products based on user feedback and retire data products when they become irrelevant.
16. Domain Zones
Engineering
Finance HR Innovation
Program 1 Operations
Management zone
Data products
Data Domains
Microsoft Enterprise Data Mesh
17. Domain Zone
Environment for each LOB
LOBs: Implement Data Services
• ex: Exploration Service, Data Order System
LOBs: Build and Share Data Products
• ex: Sales Forecast, Clean Room Performance
Automated using templates
• security, integration, monitoring, etc
18. Enterprise Requirements
Security & Privacy
Governance & Compliance
Availability & Recovery
Performance & Scalability
Skills & Training
Licensing & Usage
Observation & Monitoring
Domain Architecture
19. Shift towards Domain Ownership
A new type of ecosystem architecture that shifts left towards a modern distributed architecture: it enables
domain-specific data and data products and empowers each domain to handle its own data pipelines.
Supporting governance and domain-agnostic platform infrastructure
Source-oriented Domains: Data Providers, each with a Data Product
Consumption-oriented Domains: Consumer-specific Transformations, each with a Data Consumer
20. Domain Zones
Data Products
Domain Zone
HR
Recruitment
Time Tracking
Employee Value
And Performance
Training and
Development
Engagement and
Retention
Engineering Operations
New Project :
Digital Twin
Clean Room
Personnel
21. Data Domain Considerations from the Field
• Map your data domains organically, during the onboarding of data providers and consumers.
• Reference your business capabilities (e.g., strategy and processes) while mapping your data domains.
• Isolate your data domains and enable communication through data products like APIs or events.
• Create and document a shared, ubiquitous language that different domains can use to communicate.
• Determine boundaries for both business and technical granularity.
23. Enterprise Scale: Azure Landing Zones
The main purpose of a “Landing Zone”
is to ensure that when a workload
lands on Azure, the required
“plumbing” is already in place,
providing greater agility and
compliance with enterprise security
and governance requirements.
24. Data Management Landing Zone
Business Glossary
Data Discovery
SLAs
Business Rules
Ref. Data Mgmt.
Master Record Mgmt.
Data Policy
Access Governance
Loss Prevention
Privacy Operations
Risk Assessment
Repository for Data Models
Integration
API Documentation
Automation for provisioning landing zones, data integrations, and products
Pre-configured network and monitoring setup
Standard images for deploying analytics and AI services
Azure Subscription
Azure Policy
25. Data Landing Zone
Core Networking: preconfigured network and monitoring setup
Data Lake Services: data lake configured with layers and connectivity
Ingest and Processing: Spark and scheduling engines
Upload: blobs where 3rd parties can upload their data
Metadata Services: scanners for data governance/metadata required by landing zone
Shared Products: analytics engines for exploratory analytics
Data Integration
Data Integration Teams are responsible for the ingestion of data to a read data source. The data shouldn’t
have any data transformation applied apart from data quality checks and data type verification.
Examples:
• Pull SAP Data into Landing Zone
• Streaming interface to pull data from heat sensors
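The rule that a data integration applies only data quality checks and data type verification, with no transformation, could look roughly like the sketch below. The field names in `EXPECTED_TYPES` and the sample heat-sensor records are invented for illustration; a real integration would take its expected schema from the source system.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]

# Hypothetical expected schema for a heat-sensor feed.
EXPECTED_TYPES = {"sensor_id": str, "temperature_c": float, "read_at": str}


def verify(record: Record) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        value = record.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


def ingest(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into accepted and rejected; accepted records are stored unchanged."""
    accepted, rejected = [], []
    for record in records:
        problems = verify(record)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)  # no casting, renaming, or enrichment here
    return accepted, rejected


good = {"sensor_id": "hs-01", "temperature_c": 21.5, "read_at": "2023-04-01T10:00:00Z"}
bad = {"sensor_id": "hs-02", "temperature_c": "21.5", "read_at": "2023-04-01T10:00:05Z"}
accepted, rejected = ingest([good, bad])
print(len(accepted), "accepted;", rejected[0][1])
```

Note that the string `"21.5"` is rejected rather than cast to a float: casting would already be a transformation, which this layer leaves to the data products built on top of it.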
Data Products
Data products fulfil a specific business need using data. Data products manage, organize, and make sense of
data across domains and present the insights gained from the data products.
A data product is the result of one or many data integrations and/or other data products.
Examples:
• Financial Reporting pulling Customers and Sales together
• Streaming Machine Data from Read Data Source
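Because a data product can be built from data integrations and from other data products, its inputs form a dependency graph. The sketch below walks such a graph to find the data integrations a product ultimately relies on. The graph contents are assumptions loosely named after the examples on this slide (for instance, that Customers and Sales both come from an SAP integration); they are not taken from a real deployment.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph: data product -> its direct inputs.
# Anything that does not appear as a key is treated as a data integration.
DEPENDS_ON: Dict[str, List[str]] = {
    "financial_reporting": ["customers_product", "sales_product"],
    "customers_product": ["sap_integration"],
    "sales_product": ["sap_integration"],
    "machine_data_stream": ["heat_sensor_integration"],
}


def upstream_integrations(product: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Collect every data integration reachable from the given data product."""
    integrations: Set[str] = set()
    seen: Set[str] = set()
    stack = [product]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        inputs = depends_on.get(node)
        if inputs is None:
            integrations.add(node)  # no recorded inputs: a data integration
        else:
            stack.extend(inputs)
    return integrations


print(upstream_integrations("financial_reporting", DEPENDS_ON))
```

A lookup like this is one way to see which data products are affected when an integration changes, which relates to the earlier concern that technical dependencies are hard to see.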
Infrastructure as Code
26. Example Reference Architecture for Data Mesh in a Small Company
Landing zones: Data Management Landing Zone, Data Landing Zone
Services: Azure Event Hubs, Azure Data Factory, Azure Data Lake Storage Gen2, Azure Databricks, Synapse Analytics, Azure Purview
Teams: Data Governance Team, Data Onboarding Team, Data Engineering Team, Data Product Teams
Building blocks: Data Integration, Data Lake Services, Shared Service, Data Products
Captions: Storing read-optimized domain data; Transforming into read-optimized data products
Consumers: Real-time applications, operational systems; Self-service BI, semantic models; Analytical applications; Data-driven applications
27. Optimize Existing Implementation Patterns
Take a new approach to data management that supports and evolves with your strategy.
The data management and analytics scenario supports a range of patterns to
build on your current data infrastructure, to help you modernize and scale from where you are.
Data Warehouse, Data Lake, Data Lakehouse, Data Mesh, Data Fabric
28. Integrating your DWH in a Data Mesh
From be-all end-all to yet another Data Product in your mesh
Ownership based on your preference
DWH is a data product on its own: managed by one data product team
DWH serves as a "wrapper" for multiple data products: managed by multiple teams
DWH consumes data from multiple Data Products
Multiple Data Products consume data from DWH
29. Agile Data Management
Prepare your company to:
Enforce data governance and security.
Serve data as a product rather than a byproduct.
Provide an ecosystem of data products.
Create data domains to serve lines of business.
Empower teams to drive analytics solutions that deliver value to the business.
Modernize your teams and operations.
31. Multi Organization Data Mesh
Diagram labels: Organization; Contoso; Management zone; Data Domains (Finance, HR); Data products
The Contoso management zone with its Finance and HR domains appears three times in the diagram.
33. Links
DDD
Best Practice - An Introduction To Domain-Driven Design | Microsoft Docs
Introduction into Domain-Driven Design (DDD) (jannikwempe.com)
IBM Automation Event-Driven Reference Architecture – Domain Driven Design (ibm-cloud-architecture.github.io)
Data Mesh
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (martinfowler.com)
Data Mesh in Practice: How Europe's Leading Online Platform for Fashion Goes Beyond the Data Lake - Databricks