Got data… now what?
An introduction to modern data architecture
Susan Pierce
Product Manager, Data Analytics
Google Cloud
Topics
● Industry context
● History lesson: data warehouses and data lakes
● Data lakehouse
● Data mesh
● Data vault
● Considerations in determining a data strategy
Why you should care about data
Organizations see data and AI/ML potential
Technology is Altering the Way Organizations Operate
Q: Which of these technologies has the most potential to significantly alter the way your business operates over the next 3 to 5 years?
● Big data/analytics: 34%
● Artificial intelligence / machine learning: 30%
● Cloud infrastructure: 30%
● Identity and access management (IAM): 19%
● Cloud databases: 17%
● SaaS: 14%
● Internet of Things / M2M: 11%
● Security Orchestration Automation and Response (SOAR): 10%
● Serverless computing: 9%
● Next-generation WiFi: 9%
CIO Magazine, Oct 2021
but are struggling to make it a reality
10% of organizations achieve significant financial benefits from AI
Boston Consulting Group, Are You Making the Most of Your Relationship with AI?, Oct 2020
Challenge #1
Data is big and multi-format.
● Structured and unstructured
● Real-time streams and at-rest
● Across clouds and on-premises
181 ZB expected by 2025
- Statista, Feb 2022
Challenge #2
Data requires more than SQL.
● Machine learning & AI
● Stream analytics and events
● Data-driven applications
75% of enterprises will shift from piloting to operationalizing artificial intelligence
- Gartner®, Streaming Analytics in the Cloud: A Comparative Analysis of Amazon, Microsoft and Google, Sumit Pal, Shaurya Rana, 14 December 2021
73% of data leaders feel real-time access to data is extremely important
Challenge #3
Data reaches everyone.
● Mission critical
● Accessed by everyone
● Shareable asset
- HBR, 2021
Proprietary + Confidential
68% of companies are unable to realize measurable value from data.
Accenture, Closing the Data Value Gap, 2019
● Data is big and multi-format.
● Data requires more than SQL.
● Data reaches everyone.
[Diagram labels: more data copies, more tech islands, more integrations, more capacity, more security risk, more data silos, constant capacity planning, data unavailable, poor SLAs, unclear compliance; high costs, low productivity, limited access.]
The data warehouse
● Timeframe: 1980s/1990s
● Purpose: Bridge the gap between operational data and business intelligence
● Data type(s): Structured, cleaned data with a known schema. Operational data is aggregated, cleaned/pre-processed, and inserted into the data warehouse in batches
● Good for: Enforcing consistency (quality and integrity), querying known dimensions, large queries
● Typical users: “The Business”
● Other info: Can store data from multiple source databases and does not require a strict 1-1 mapping with transactional databases
● A data mart is a smaller, application-specific data warehouse, usually tied to a specific team or line of business (example: marketing-specific data)
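The batch flow described above (clean operational data, then insert it into a table with a known schema) can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in for a warehouse; the table name `fact_sales`, the sample rows, and the `clean` helper are assumptions made for the example, not part of the original deck.

```python
import sqlite3

# Hypothetical operational rows: (order_id, region, amount as raw text).
operational_rows = [
    (1, " emea ", "120.50"),
    (2, "EMEA", "79.50"),
    (3, "amer", "200.00"),
    (4, "amer", "not-a-number"),  # fails cleaning and is rejected
]


def clean(row):
    """Normalize one operational row, or return None if it is invalid."""
    order_id, region, amount = row
    try:
        return order_id, region.strip().upper(), float(amount)
    except ValueError:
        return None


warehouse = sqlite3.connect(":memory:")
# Schema on write: the table structure is fixed before any data arrives.
warehouse.execute(
    "CREATE TABLE fact_sales ("
    "order_id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
)

# Batch load: clean first, then insert the whole batch in one transaction.
batch = [r for r in map(clean, operational_rows) if r is not None]
with warehouse:
    warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", batch)

# Query a known dimension (region) with plain SQL.
totals = dict(
    warehouse.execute(
        "SELECT region, SUM(amount) FROM fact_sales GROUP BY region"
    ).fetchall()
)
print(totals)
```

Rejecting the malformed row before the load is what "forcing consistency" looks like in practice: the warehouse only ever holds rows that fit the declared schema.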
Data warehouse - single integrated repository of atomic data
● Inmon – Enterprise Data Warehouse: Data sources -> ETL -> EDW -> Datamarts -> Applications
● Kimball – Dimensional Data Warehouse: Data sources -> ETL -> EDW with Datamarts (logical) -> BI tools and applications
Data warehouses are often painful to manage
● Cost challenges: Legacy systems bust customer budgets and it’s a hassle to renew the license. Maintaining the operations of EDW takes too much time.
● Modernization challenges: Doesn’t support machine learning and AI initiatives as well as streaming use cases.
● Data freshness: Data is not fresh or current enough.
● Scaling challenges: Systems can’t keep up with forecasted usage and data growth. It’s hard to scale on compute or storage on-demand.
BigQuery Architecture
Decoupled storage and compute for maximum flexibility.
[Diagram: clients reach BigQuery through the REST API, client libraries in 7 languages, the Web UI, and the CLI. Components shown: SQL:2011-compliant compute (Dremel) on a highly available cluster, a distributed memory shuffle tier, a petabit network, replicated and distributed storage (high durability), streaming ingest, free bulk loading, the Storage API, BI Engine compute (stateful workers), and BQML.]
The data lake
● Timeframe: 2000s/2010s
● Purpose: Storage for data of any type in its native format without pre-processing
● Data type(s): Structured, unstructured, semi-structured in a flat architecture, ingested by streaming, micro batch, or batch
● Good for: Flexibility across use cases, high volumes of data, data exploration, granular/low level data
● Typical users: Data scientists
● Other info: Optimized for lower storage cost (cheaper hardware, open source tools), and considered highly configurable because data is not restricted to a set schema
● Data lakes are usually queried in a programmatic fashion due to the large amounts of high-variability data they contain
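Querying a lake "in a programmatic fashion" usually means applying a schema at read time rather than at load time. The sketch below illustrates that idea with only the Python standard library; the sample records and the `read_with_schema` helper are hypothetical and stand in for files in object storage and a real processing engine.

```python
import json

# Raw records are stored as-is; nothing forces them to share a structure.
raw_events = [
    '{"user": "a1", "event": "click", "ts": 1}',
    '{"user": "b2", "event": "purchase", "amount": 19.99, "ts": 2}',
    '{"device": "sensor-7", "temp_c": 21.5}',
    "free-text log line that is not JSON",
]


def read_with_schema(lines, fields):
    """Project raw records onto the requested fields at read time."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # the raw line stays in the lake; this reader skips it
        if isinstance(record, dict) and all(f in record for f in fields):
            yield {f: record[f] for f in fields}


# Two consumers read the same raw data with two different schemas.
user_events = list(read_with_schema(raw_events, ["user", "event"]))
purchases = list(read_with_schema(raw_events, ["user", "amount"]))
print(user_events)
print(purchases)
```

Each consumer decides which fields matter, which is what makes the lake flexible and also what makes it harder to govern than a warehouse.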
Data lake, a simplified view
[Diagram: sources (business applications, operational databases, documents, log data) feed object storage through ETL and APIs, alongside metadata storage (catalog) and replicated data. Governance, security, and operations span the lake. Consumers: ETL, discovery, exploration, analytics tools, and reporting/BI tools.]
On-premises data lakes are struggling to deliver value
● TCO challenges: Resource utilization and overall TCO of on-premises data lakes become unmanageable.
● Governance challenges: Data governance and security issues open up compliance concerns.
● Scaling challenges: Resource-intensive data and analytics processing can lead to missed SLAs.
● Agility challenges: Analytics experimentation is slow due to resource provisioning time.
Data Warehouse
● Schema on write
● Difficult to change
● Structured data
● Strong business context
● Batch ingestion
● Inherent security
Data Lake
● Schema on read
● Easy to change
● Raw data
● Geared for exploration
● Supports streaming
● Difficult to govern
The big big data decision: data warehouse or data lake?
Data Warehouse (TB scale): Understanding your business
● Use case characteristics: Answer “known” questions, access “known” data
● Data type and access: Structured data, SQL access and manipulation
Data Lake (PB scale): Exploring your business
● Use case characteristics: Answer “unknown” questions, access “unknown” data
● Data type and access: Unstructured (raw) and structured data, code-involved access and exploration
Paper: Build a modern, unified analytics data platform; Blog: Bringing data lakes and data warehouses together
With more data comes more responsibility
[Diagram: data silos. Databases, data marts, data lakes, and data warehouses each carry their own metadata, security, and governance. Inputs: transactions, unstructured files, logs, LoB-specific data, real-time app data. Consumers: BI, machine learning, ETL/ELT tools, consumer app. Cross-cutting concerns: security, connecting tools, standardization, financial governance.]
Closing the data/value gap
“80% of analytics work is still descriptive”
MIT, 2020
Maturity
● Teams: analysts vs. engineers
● Model: self service vs. centralized
● Technology: BI, AI, and data fabric
Silos
● Multiple clouds, on-premises legacy
● No consistent governance
● Duplication: data and definitions
“90% of employees say that their work is slowed by unreliable data sources”
Dimensional Research, 2020
Complexity
● Volume, velocity, and variety
● Data marts, EDW, data lakes, and lakehouses
● Experimentation and production
“86% of analysts struggle with data that's out of date”
Dimensional Research, 2020
Empowering technology and people
Enabling data
Data lakehouse removes the overhead of data lakes and data warehouses
Data warehouse gets the capabilities of the data lake
Data lake gets the capabilities of the data warehouse
Provides:
● Multimodal data access with higher volumes of data
● Schema on read
● The governance that data lakes lack but data warehouses provide
Enabling teams
Data mesh removes the organizational barriers that become the bottleneck
● Emphasizes the data domain/team first, then technology
● Agile teams/more insights
Teams own their data and technology
● Provide API access to other teams
● Decentralized raw and processed data
Provides:
● Well-defined, governed, and secured data meshes
● Still able to leverage several domains with no data movement
Data lakehouse
Data lakehouse building blocks
● Consistent user experience: users, user experience
● Choice of processing and analytic engines: Spark, SQL, Beam, AI/ML; streaming and batch
● Centralized management: automated data discovery, unified permissioning, integrated data catalog
● Decoupled data storage: object storage (low-cost semi-unstructured data store) and structured storage (highly optimized analytical store), spanning data warehouse / data lake and cloud storage
Google Cloud Data lakehouse building blocks
● Consistent user experience: users, user experience
● Choice of processing and analytic engines: Dataproc (Spark, Flink, Presto, Hive), BigQuery (SQL), Data Fusion, Beam, Dataflow, Vertex AI
● Centralized management: Dataplex (Governance, Management), Data Catalog (Discovery, Search, Metadata)
● Decoupled data storage: Cloud Storage (low-cost semi-unstructured data store) and BigQuery Storage (highly optimized analytical store), spanning data warehouse / data lake and cloud storage
Data lakehouse layers
Data mesh
Data mesh enables a successful data culture
Through the provision of:
● Distributed ownership: Federated data domain teams are responsible for maintaining their data and making it useful for others, and are provided the freedom to choose the best course of action.
● Focus on value of data: Teams are incentivized to maximize the value of their data products while staying within policies, and making their data available and useful for others.
● Central support: Central data platform team supports distributed data domains with tooling, processes, standards, and guardrails, and automation in policy enforcement.
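One way to picture the split between domain ownership and central guardrails is a shared catalog that accepts a domain's data product only when it passes centrally defined policy checks. The sketch below is a toy illustration under that assumption; `DataProduct`, `Catalog`, and the three policies are hypothetical names invented for the example and do not correspond to any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    name: str
    domain: str                # owning domain team
    owner: str                 # accountable contact inside the domain
    description: str = ""
    contains_pii: bool = False
    access_policy: str = ""    # empty means none declared


# Central guardrails: each policy returns an error message or None.
def require_owner(product):
    return None if product.owner else "missing owner"


def require_description(product):
    return None if product.description else "missing description"


def require_pii_policy(product):
    if product.contains_pii and not product.access_policy:
        return "PII data needs an access policy"
    return None


CENTRAL_POLICIES = [require_owner, require_description, require_pii_policy]


class Catalog:
    """Shared catalog run by the central platform team."""

    def __init__(self, policies):
        self._policies = policies
        self._products = {}

    def publish(self, product):
        """Register the product if it passes every policy; return violations."""
        results = (check(product) for check in self._policies)
        violations = [message for message in results if message]
        if not violations:
            self._products[(product.domain, product.name)] = product
        return violations

    def discover(self, domain=None):
        """List published products, optionally filtered by domain."""
        return sorted(
            f"{d}.{n}" for (d, n) in self._products if domain in (None, d)
        )


catalog = Catalog(CENTRAL_POLICIES)
accepted = catalog.publish(
    DataProduct("orders", "sales", "sales-data@example.com", "Daily order facts")
)
rejected = catalog.publish(
    DataProduct("profiles", "marketing", owner="", contains_pii=True)
)
print(accepted, rejected, catalog.discover())
```

The domain team decides what to publish and how to build it; the central team only supplies the checks and the catalog, which keeps it out of the critical path.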
Five targeted outcomes
1. Value of data is measured and recognized
2. Teams empowered to generate value from data
3. No central bottleneck
4. Each domain equipped with relevant skills and knowledge to be successful
5. Distributed ownership for data governance
Data mesh data architecture and capability model
One way to build a data mesh
Data vault
What is Data Vault? “add-only” model
● The data model can grow by adding new Links and, from there, new Hubs/Satellites
○ Flexible changes with fewer adjustments to ETL and reporting (minimal impact analysis)
○ Parallel loading
○ Minimal regression testing
Rules
● No direct connections between Hubs
● Hubs don’t reference other entities
● Links reference Hubs
● Satellites reference Hubs or Links
Data Vault modelling - how does it work?
A Data Vault model consists of 3 basic entity types
○ The Hub is a business object and separates the business keys from the rest of the model.
No data (!), no relationships -> no reasons to change.
○ The Link stores relationships between hubs (using the business keys).
It is modelled as a many-to-many relationship.
○ Satellites store the context (the attributes of a business key or relationship).
Can be added without changing Hubs and Links.
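The three entity types and the reference rules can be sketched as tables. The example below uses Python's built-in sqlite3 module and, following the description above, joins on business keys directly; the table names, the `load_ts` column, and the sample data are assumptions made for illustration, and real Data Vault implementations typically add further metadata columns.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # enforce the reference rules
db.executescript(
    """
    -- Hubs: business keys only, no references to other entities.
    CREATE TABLE hub_customer (customer_key TEXT PRIMARY KEY, load_ts TEXT NOT NULL);
    CREATE TABLE hub_product  (product_key  TEXT PRIMARY KEY, load_ts TEXT NOT NULL);

    -- Link: a many-to-many relationship that references Hubs.
    CREATE TABLE link_purchase (
        customer_key TEXT NOT NULL REFERENCES hub_customer(customer_key),
        product_key  TEXT NOT NULL REFERENCES hub_product(product_key),
        load_ts      TEXT NOT NULL,
        PRIMARY KEY (customer_key, product_key)
    );

    -- Satellite: descriptive context, keyed by Hub key plus load time.
    CREATE TABLE sat_customer_details (
        customer_key TEXT NOT NULL REFERENCES hub_customer(customer_key),
        load_ts      TEXT NOT NULL,
        name TEXT,
        city TEXT,
        PRIMARY KEY (customer_key, load_ts)
    );
    """
)

# Add-only loading: a changed attribute becomes a new Satellite row.
with db:
    db.execute("INSERT INTO hub_customer VALUES ('C-1', '2024-01-01')")
    db.execute("INSERT INTO hub_product VALUES ('P-9', '2024-01-01')")
    db.execute("INSERT INTO link_purchase VALUES ('C-1', 'P-9', '2024-01-02')")
    db.execute(
        "INSERT INTO sat_customer_details VALUES ('C-1', '2024-01-01', 'Ada', 'Berlin')"
    )
    db.execute(
        "INSERT INTO sat_customer_details VALUES ('C-1', '2024-02-01', 'Ada', 'Zurich')"
    )

# New context arrives later: add a Satellite, leave Hubs and Links untouched.
db.execute(
    "CREATE TABLE sat_customer_loyalty ("
    "customer_key TEXT NOT NULL REFERENCES hub_customer(customer_key), "
    "load_ts TEXT NOT NULL, tier TEXT, PRIMARY KEY (customer_key, load_ts))"
)

current_city = db.execute(
    "SELECT city FROM sat_customer_details WHERE customer_key = 'C-1' "
    "ORDER BY load_ts DESC LIMIT 1"
).fetchone()[0]
history_rows = db.execute(
    "SELECT COUNT(*) FROM sat_customer_details WHERE customer_key = 'C-1'"
).fetchone()[0]
print(current_city, history_rows)
```

Because existing tables are never altered, the new Satellite can be loaded in parallel with the others and needs no regression testing of the Hubs and Links it hangs off.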
[Diagram: Data Vault pipeline. Source Systems -> Staging -> Raw Vault -> Business Vault -> Information Mart -> Presentation Layer, connected by ELT and SQL steps. Stage annotations: check datatypes; semantic integration; hard business rules; soft business rules; flexible requirements; driven business domains. Identity and Access Management; Scheduling, Logging & Monitoring; and Version Control, Continuous Integration span all stages.]
[Diagram: Data Vault reference architecture. Sources: public data, other clouds, data lake, relational databases, legacy systems, streaming data, NoSQL databases. Flow: staging area -> raw vault -> data vault (with operational vault, metrics vault, and metadata) -> information marts (report collection, meta mart, metrics mart, error mart) -> sheets, flat files, cubes. Labels shown: sales, customer, supplier. Data Governance and Data Life Cycle Management span the architecture.]
People
Technology
Process
Where to start?
Strategic vs Tactical
Further reading
Paper: Build a modern distributed Data Mesh with Google Cloud
Blog: Building a Unified Analytics Data Platform
Paper: Build a modern, unified analytics data platform with Google Cloud
Blog: Data lake and data warehouse convergence
Paper: Converging Architectures: Bringing Data Lakes and Data Warehouses Together
Blog: Data driven transformation using Google's unified analytics platform
Paper: What type of data processing organization are you?
Blog: Open data lakehouse on Google Cloud
Paper: Building a data lakehouse on Google Cloud Platform
Blog: Building the data science driven organization
Blog: Building the data engineering driven organization
Blog: Building the data analyst driven organization from the first principles
Blog: Announcing BigQuery Migration Service
Thank you.
How do you know where to start?

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 

Kürzlich hochgeladen (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Got data?… now what? An introduction to modern data platforms

  • 1. Got data… now what? An introduction to modern data architecture Susan Pierce Product Manager, Data Analytics Google Cloud
  • 2. ● Industry context ● History lesson: data warehouses and data lakes ● Data lakehouse ● Data mesh ● Data vault ● Considerations in determining a data strategy Topics
  • 3. Why you should care about data
  • 5. Organizations see data and AI/ML potential but are struggling to make it a reality
      ● Technology is Altering the Way Organizations Operate. Q: Which of these technologies has the most potential to significantly alter the way your business operates over the next 3 to 5 years? (CIO Magazine, Oct 2021)
          ○ Big data/analytics: 34%
          ○ Artificial intelligence / machine learning: 30%
          ○ Cloud infrastructure: 30%
          ○ Identity and access management (IAM): 19%
          ○ Cloud databases: 17%
          ○ SaaS: 14%
          ○ Internet of Things / M2M: 11%
          ○ Security Orchestration Automation and Response (SOAR): 10%
          ○ Serverless computing: 9%
          ○ Next-generation WiFi: 9%
      ● 10% of organizations achieve significant financial benefits from AI (Boston Consulting Group, Are You Making the Most of Your Relationship with AI?, Oct 2020)
  • 6. Challenge #1 Data is big and multi-format. ● Structured and unstructured ● Real-time streams and at-rest ● Across clouds and on-premise 181ZB expected by 2025 - STATISTA, FEB 2022
  • 7. Challenge #2 Data requires more than SQL. ● Machine learning & AI ● Stream analytics and events ● Data-driven applications 75% of enterprises will shift from piloting to operationalizing artificial intelligence Gartner ®, Streaming Analytics in the Cloud: A Comparative Analysis of Amazon, Microsoft and Google, Sumit Pal, Shaurya Rana, 14 December, 2021
  • 8. Challenge #3 Data reaches everyone. ● Mission critical ● Accessed by everyone ● Shareable asset 73% of data leaders feel real-time access to data is extremely important - HBR, 2021
  • 9. Proprietary + Confidential 68% of companies are unable to realize measurable value from data (Accenture, Closing the Data Value Gap, 2019)
      ● Challenges: Data is big and multi-format. Data requires more than SQL. Data reaches everyone.
      ● Consequences: more data copies, more tech islands, more integrations, high costs, low productivity, limited access, more capacity, more security risk, more data silos, constant capacity planning, data unavailable, poor SLAs, unclear compliance
  • 10. The data warehouse
      ● Timeframe: 1980s/1990s
      ● Purpose: Bridge the gap between operational data and business intelligence
      ● Data type(s): Structured, cleaned data with a known schema. Operational data is aggregated, cleaned/pre-processed, and inserted into the data warehouse in batches
      ● Good for: Forcing consistency (quality and integrity), querying known dimensions, large queries
      ● Typical users: “The Business”
      ● Other info: Can store data from multiple source databases and doesn’t require a strict 1-1 mapping with transactional databases
      ● A data mart is a smaller, application-specific data warehouse, usually tied to a specific team or line of business (example: marketing-specific data)
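The batch flow described on this slide (aggregate, clean, insert into a known schema, then expose a team-specific data mart) can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` as a stand-in for a warehouse; the table name `daily_revenue`, the view `marketing_mart`, and the sample records are assumptions made for the example, not part of the talk.

```python
import sqlite3

# Hypothetical operational records; names and values are illustrative only.
operational_orders = [
    {"order_id": 1, "team": "marketing", "amount": "120.50", "day": "2024-01-01"},
    {"order_id": 2, "team": "marketing", "amount": " 79.50", "day": "2024-01-01"},
    {"order_id": 3, "team": "sales", "amount": "200.00", "day": "2024-01-01"},
    {"order_id": 4, "team": "sales", "amount": None, "day": "2024-01-02"},  # dirty row
]


def clean(rows):
    """Drop rows without an amount and cast the remaining amounts to float."""
    return [
        (r["day"], r["team"], float(r["amount"].strip()))
        for r in rows
        if r["amount"] is not None
    ]


def load_batch(conn, rows):
    """Aggregate cleaned rows per day and team, then insert them as one batch."""
    totals = {}
    for day, team, amount in clean(rows):
        totals[(day, team)] = totals.get((day, team), 0.0) + amount
    conn.executemany(
        "INSERT INTO daily_revenue (day, team, revenue) VALUES (?, ?, ?)",
        [(day, team, revenue) for (day, team), revenue in sorted(totals.items())],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
# The schema is known and fixed before any data is loaded.
conn.execute(
    "CREATE TABLE daily_revenue ("
    "day TEXT NOT NULL, team TEXT NOT NULL, revenue REAL NOT NULL)"
)
load_batch(conn, operational_orders)

# A data mart modeled here as a view scoped to one line of business.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT day, revenue FROM daily_revenue WHERE team = 'marketing'"
)
marketing_rows = conn.execute("SELECT day, revenue FROM marketing_mart").fetchall()
print(marketing_rows)
```

Modeling the data mart as a view is one possible choice; a physically separate table or database would fit the slide's description equally well.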
  • 11. Inmon and Kimball data warehouse designs (diagram)
      ● Inmon, Enterprise Data Warehouse: Data sources → ETL → EDW → Datamarts → Applications. The data warehouse is a single integrated repository of atomic data
      ● Kimball, Dimensional Data Warehouse: Data sources → ETL → EDW → BI tools and applications. Datamarts are logical
  • 12. Data warehouses are often painful to manage
      ● Cost challenges: Legacy systems bust customer budgets and it’s a hassle to renew the license. Maintaining the operations of the EDW takes too much time.
      ● Modernization challenges: Legacy systems don’t support machine learning and AI initiatives or streaming use cases.
      ● Data freshness: Data is not fresh or current enough.
      ● Scaling challenges: Systems can’t keep up with forecasted usage and data growth. It’s hard to scale compute or storage on demand.
  • 13. BigQuery Architecture (diagram): Decoupled storage and compute for maximum flexibility
      ● Interfaces: Web UI, CLI; REST API; client libraries in 7 languages; SQL:2011 compliant
      ● Compute: highly available cluster compute (Dremel); distributed memory shuffle tier; BigQuery BI Engine compute (stateful workers); BQML
      ● Storage: replicated, distributed storage (high durability); Storage API
      ● Ingest: streaming ingest; free bulk loading
      ● Network: petabit network
  • 14. The data lake
      ● Timeframe: 2000s/2010s
      ● Purpose: Storage for data of any type in its native format without pre-processing
      ● Data type(s): Structured, unstructured, semi-structured in a flat architecture, ingested by streaming, micro batch, or batch
      ● Good for: Flexibility across use cases, high volumes of data, data exploration, granular/low level data
      ● Typical users: Data scientists
      ● Other info: Optimized for lower storage cost (cheaper hardware, open source tools), and considered highly configurable because data is not restricted to a set schema
      ● Data lakes are usually queried in a programmatic fashion due to the large amounts of high-variability data they contain
  • 15. Data lake, a simplified view (diagram)
      ● Sources: business applications, operational databases, documents, log data
      ● Ingestion: ETL, APIs
      ● Lake: object storage, metadata storage (catalog), replicated data
      ● Cross-cutting: governance, security, operations
      ● Consumption: discovery, exploration, analytics tools, reporting/BI tools
  • 16. On-premises data lakes are struggling to deliver value
      ● TCO challenges: Resource utilization and overall TCO of on-premises data lakes become unmanageable.
      ● Governance challenges: Data governance and security issues open up compliance concerns.
      ● Scaling challenges: Resource-intensive data and analytics processing can lead to missed SLAs.
      ● Agility challenges: Analytics experimentation is slow due to resource provisioning time.
  • 17. Data Warehouse vs. Data Lake
      ● Data Warehouse: schema on write; difficult to change; structured data; strong business context; batch ingestion; inherent security
      ● Data Lake: schema on read; easy to change; raw data; geared for exploration; supports streaming; difficult to govern
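The difference between schema on write and schema on read can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the `SCHEMA` dictionary, the function names, and the JSON events are hypothetical, and plain Python lists stand in for warehouse and lake storage.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
SCHEMA = {"user_id": int, "event": str}


def write_to_warehouse(table, record):
    """Schema on write: validate before storing and reject anything that does not fit."""
    if set(record) != set(SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    for field, expected in SCHEMA.items():
        if not isinstance(record[field], expected):
            raise ValueError(f"{field} must be {expected.__name__}")
    table.append(record)


def write_to_lake(lake, raw_line):
    """Schema on read: store the raw payload untouched."""
    lake.append(raw_line)


def read_from_lake(lake, fields):
    """Apply a schema only at read time, skipping payloads that cannot be parsed."""
    for raw_line in lake:
        try:
            record = json.loads(raw_line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        yield {field: record.get(field) for field in fields}


warehouse, lake = [], []
write_to_warehouse(warehouse, {"user_id": 1, "event": "login"})
try:
    # Wrong type for user_id, so the write is refused.
    write_to_warehouse(warehouse, {"user_id": "2", "event": "login"})
    rejected = False
except ValueError:
    rejected = True

# The lake accepts everything, including extra fields and malformed lines.
write_to_lake(lake, '{"user_id": 1, "event": "login", "device": "phone"}')
write_to_lake(lake, '{"user_id": "2", "event": "login"}')
write_to_lake(lake, "not json at all")
projected = list(read_from_lake(lake, ["user_id", "device"]))
print(rejected, projected)
```

The sketch also hints at the trade-off on the slide: the warehouse path enforces consistency up front, while the lake path stays easy to change but leaves quality checks to each reader.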
  • 18. The big big data decision: data warehouse or data lake?
      ● Data Warehouse (TB scale)
          ○ Use case characteristics: Understanding your business. Answer “known” questions. Access “known” data
          ○ Data type and access: Structured data. SQL access and manipulation
      ● Data Lake (PB scale)
          ○ Use case characteristics: Exploring your business. Answer “unknown” questions. Access “unknown” data
          ○ Data type and access: Unstructured (raw) and structured data. Code-involved access and exploration
      ● Paper: Build a modern, unified analytics data platform; Blog: Bringing data lakes and data warehouses together
  • 19. With more data comes more responsibility (diagram)
      ● Data sources: transactions, unstructured files, logs, LoB-specific data, real-time app data
      ● Data silos: databases, data marts, data lakes, data warehouses, each with its own metadata, security, and governance
      ● Consumers: BI, machine learning, ETL/ELT tools, consumer app
      ● Across silos: security, connecting tools, standardization, financial governance
  • 20. Closing the data/value gap
      ● Maturity: “80% of analytics work is still descriptive” (MIT, 2020)
          ○ Teams: analysts vs. engineers
          ○ Model: self service vs. centralized
          ○ Technology: BI, AI, and data fabric
      ● Silos: “90% of employees say that their work is slowed by unreliable data sources” (Dimensional Research, 2020)
          ○ Multiple clouds, on-premises legacy
          ○ No consistent governance
          ○ Duplication: data and definitions
      ● Complexity: “86% of analysts struggle with data that's out of date” (Dimensional Research, 2020)
          ○ Volume, velocity, and variety
          ○ Data marts, EDW, data lakes, and lake houses
          ○ Experimentation and production
  • 21. Empowering technology and people
      ● Enabling data: Data lakehouse removes the overhead of data lakes and data warehouses
          ○ Data warehouse gets the capabilities of data lakes
          ○ Data lake gets the capabilities of data warehouses
          ○ Provides: multimodal data access with higher volumes of data; schema on read; the governance that data lakes lack but data warehouses provide
      ● Enabling teams: Data mesh removes the organizational barriers that become the bottleneck
          ○ Emphasizes the data domain/team first, then technology
          ○ Agile teams, more insights
          ○ Teams own their data and technology
          ○ Provides API access to other teams
          ○ Decentralized raw and processed data
          ○ Provides: well defined, governed, and secured data meshes; still able to leverage several domains with no data movement
  • 23. Data lakehouse building blocks (diagram)
      ● Consistent user experience: users, user experience
      ● Choice of processing and analytic engines: Spark, Streaming, SQL, Beam, Batch, AI/ML
      ● Decoupled data storage: Object Storage (low-cost semi-unstructured data store), Structured Storage (highly optimized analytical store)
      ● Centralized management: automated data discovery, unified permissioning, integrated data catalog
  • 24. Google Cloud data lakehouse building blocks (diagram)
      ● Consistent user experience: users, user experience
      ● Choice of processing and analytic engines: Dataproc (Spark, Flink, Presto, Hive), BigQuery SQL, Data Fusion, Beam, Dataflow, Vertex AI, …
      ● Decoupled data storage: Cloud Storage (low-cost semi-unstructured data store), BigQuery Storage (highly optimized analytical store)
      ● Centralized management: Dataplex (Governance, Management), Data Catalog (Discovery, Search, Metadata)
  • 27. Data mesh enables a successful data culture through the provision of:
      ● Distributed ownership: Federated data domain teams are responsible for maintaining their data and making it useful for others, and are provided the freedom to choose the best course of action.
      ● Focus on value of data: Teams are incentivized to maximize the value of their data products while staying within policies, and making their data available and useful for others.
      ● Central support: Central data platform team supports distributed data domains with tooling, processes, standards, guardrails, and automation in policy enforcement.
  • 29. Five targeted outcomes 1. Value of data is measured and recognized 2. Teams empowered to generate value from data 3. No central bottleneck 4. Each domain equipped with relevant skills and knowledge to be successful 5. Distributed ownership for data governance
  • 30. Data mesh data architecture and capability model
  • 31. One way to build a data mesh
  • 33. What is Data Vault? An “add-only” model
      ● The data model can grow by adding new Links and, from there, new Hubs and Satellites (HUB/SAT)
          ○ Flexible changes with fewer adjustments to ETL and reporting (minimal impact analysis)
          ○ Parallel loading
          ○ Minimal regression testing
      ● Rules
          ○ No direct connections between Hubs
          ○ Hubs don’t reference other entities
          ○ Links reference Hubs
          ○ Satellites reference Hubs or Links
  • 34. Data Vault modelling - how does it work? A Data Vault model consists of 3 basic entity types ○ The Hub is a business object and separates the business keys from the rest of the model. No data (!), no relationships -> no reasons to change. ○ The Link stores relationships between hubs (using the business keys). It is modelled as a many-to-many relationship. ○ Satellites store the context (the attributes of a business key or relationship). Can be added without changing Hubs and Links.
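The three entity types can be sketched as plain tables. The example below is a simplified illustration using Python's built-in `sqlite3`; all table and column names (`hub_customer`, `link_purchase`, `sat_customer_details`, `load_ts`, and so on) are assumptions made for the example, and production Data Vault models typically carry additional columns that are left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the reference rules below
conn.executescript("""
-- Hubs hold business keys only and reference no other entity.
CREATE TABLE hub_customer (customer_key TEXT PRIMARY KEY);
CREATE TABLE hub_product (product_key TEXT PRIMARY KEY);

-- A link stores a many-to-many relationship between hubs.
CREATE TABLE link_purchase (
    customer_key TEXT NOT NULL REFERENCES hub_customer (customer_key),
    product_key TEXT NOT NULL REFERENCES hub_product (product_key),
    PRIMARY KEY (customer_key, product_key)
);

-- A satellite stores context for a hub; every change becomes a new row.
CREATE TABLE sat_customer_details (
    customer_key TEXT NOT NULL REFERENCES hub_customer (customer_key),
    load_ts TEXT NOT NULL,
    city TEXT,
    PRIMARY KEY (customer_key, load_ts)
);
""")

conn.execute("INSERT INTO hub_customer VALUES ('C-1')")
conn.execute("INSERT INTO hub_product VALUES ('P-1')")
conn.execute("INSERT INTO link_purchase VALUES ('C-1', 'P-1')")

# Add-only: a change of city is appended as a new row, never updated in place.
conn.execute("INSERT INTO sat_customer_details VALUES ('C-1', '2024-01-01', 'Berlin')")
conn.execute("INSERT INTO sat_customer_details VALUES ('C-1', '2024-02-01', 'Munich')")

# New context arrives later: add a satellite without touching hubs or links.
conn.execute("""
CREATE TABLE sat_customer_loyalty (
    customer_key TEXT NOT NULL REFERENCES hub_customer (customer_key),
    load_ts TEXT NOT NULL,
    tier TEXT,
    PRIMARY KEY (customer_key, load_ts)
)""")
conn.execute("INSERT INTO sat_customer_loyalty VALUES ('C-1', '2024-02-15', 'gold')")

current_city = conn.execute(
    "SELECT city FROM sat_customer_details "
    "WHERE customer_key = 'C-1' ORDER BY load_ts DESC LIMIT 1"
).fetchone()[0]
history_rows = conn.execute("SELECT COUNT(*) FROM sat_customer_details").fetchone()[0]

# A link must reference existing hubs, so an unknown customer key is refused.
try:
    conn.execute("INSERT INTO link_purchase VALUES ('C-404', 'P-1')")
    orphan_link_rejected = False
except sqlite3.IntegrityError:
    orphan_link_rejected = True

print(current_city, history_rows, orphan_link_rejected)
```

Adding `sat_customer_loyalty` required no change to the existing hub or link tables, which is the property the "add-only" description relies on.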
  • 35. Data Vault architecture (diagram)
      ● Sources: public data, other clouds, data lake, relational databases, legacy systems, streaming data, NoSQL databases
      ● Layers, loaded by ELT in SQL: Source Systems → Staging → Raw Vault → Business Vault → Information Mart → Presentation Layer
      ● Rules applied along the way: check datatypes, semantic integration, hard business rules, soft business rules, flexible requirements, driven business domains
      ● Stores: staging area, raw vault, data vault, operational vault, metrics vault, metadata
      ● Information marts: report collection, Meta Mart, Metrics Mart, Error Mart
      ● Cross-cutting: Identity and Access Management; Scheduling, Logging & Monitoring; Version Control, Continuous Integration; Data Governance, Data Life Cycle Management
      ● Other labels: sales, customer, supplier, sheets, flat files, cubes
  • 37. Further reading
      ● Paper: Build a modern distributed Data Mesh with Google Cloud
      ● Blog: Building a Unified Analytics Data Platform
      ● Paper: Build a modern, unified analytics data platform with Google Cloud
      ● Blog: Data lake and data warehouse convergence
      ● Paper: Converging Architectures: Bringing Data Lakes and Data Warehouses Together
      ● Blog: Data driven transformation using Google's unified analytics platform
      ● Paper: What type of data processing organization are you?
      ● Blog: Open data lakehouse on Google Cloud
      ● Paper: Building a data lakehouse on Google Cloud Platform
      ● Blog: Building the data science driven organization
      ● Blog: Building the data engineering driven organization
      ● Blog: Building the data analyst driven organization from the first principles
      ● Blog: Announcing BigQuery Migration Service
  • 39. How do you know where to start?