New York City
9th June, 2016
Logical Data Warehouse,
Data Lakes, and Data
Services Marketplaces
Agenda
1. Introductions
2. Logical Data Warehouse and Data Lakes
3. Coffee Break
4. Data Services Marketplaces
5. Q&A
HEADQUARTERS
Palo Alto, CA.
DENODO OFFICES, CUSTOMERS, PARTNERS
Global presence throughout North America,
EMEA, APAC, and Latin America.
CUSTOMERS
250+ customers, including many
F500 and G2000 companies across every
major industry have gained significant
business agility and ROI.
LEADERSHIP
• Longest continuous focus on data
virtualization and data services.
• Product leadership.
• Solutions expertise.
THE LEADER IN DATA VIRTUALIZATION
Denodo provides agile, high performance data
integration and data abstraction across the broadest
range of enterprise, cloud, big data and unstructured
data sources, and real-time data services at half the
cost of traditional approaches.
Speakers
Paul Moxon
Senior Director of Product
Management, Denodo
Pablo Álvarez
Principal Technical Account
Manager, Denodo
Rubén Fernández
Technical Account Manager,
Denodo
Logical Data Warehouse and
Data Lakes
New York City
June 2016
Agenda
1. The Logical Data Warehouse
2. Different Types, Different Needs
3. Performance in an LDW
4. Customer Success Stories
5. Q&A
What is a Logical Data Warehouse?
A logical data warehouse is a data system that follows
the ideas of traditional EDW (star or snowflake schemas)
and includes, in addition to one (or more) core DWs,
data from external sources.
The main motivations are improved decision making
and/or cost reduction
Logical Data Warehouse
Gartner Definition
Description:
• “The Logical Data Warehouse (LDW) is a new data management architecture for
analytics combining the strengths of traditional repository warehouses with
alternative data management and access strategy. The LDW will form a new
best practice by the end of 2015.”
• “The LDW is an evolution and augmentation of DW practices, not a replacement”
• “A repository-only style DW contains a single ontology/taxonomy, whereas in the
LDW a semantic layer can contain many combination of use cases, many
business definitions of the same information”
• “The LDW permits an IT organization to make a large number of datasets
available for analysis via query tools and applications.”
Source: Gartner Hype Cycle for Enterprise Information Management, 2012
Logical Data Warehouse
Description:
• A semantic layer on top of the data warehouse that keeps the business data
definition.
• Allows the integration of multiple data sources including enterprise systems,
the data warehouse, additional processing nodes (analytical appliances, Big
Data, …), Web, Cloud and unstructured data.
• Publishes data to multiple applications and reporting tools.
Three Integration/Semantic Layer Alternatives
Gartner’s View of Data Integration
1. Application/BI Tool as Data Integration/Semantic Layer
2. EDW as Data Integration/Semantic Layer
3. Data Virtualization as Data Integration/Semantic Layer
[Diagram: the Application/BI Tool over EDW and ODS; the EDW over the ODS; Data Virtualization over EDW and ODS]
Application/BI Tool as the Data Integration Layer
[Diagram: Application/BI Tool as Data Integration/Semantic Layer, over EDW and ODS]
• Integration is delegated to end user tools and applications
  • e.g. BI Tools with ‘data blending’
• Results in duplication of effort – integration defined many times in different tools
• Impact of change in data schema?
• End user tools are not intended to be integration middleware
  • Not their primary purpose or expertise
EDW as the Data Integration Layer
[Diagram: EDW as Data Integration/Semantic Layer, over the ODS]
• Access to ‘other’ data (query federation) via EDW
  • Teradata QueryGrid, IBM FluidQuery, SAP Smart Data Access, etc.
• Often coupled with traditional ETL replication of data into EDW
• EDW ‘center of data universe’
• Provides data integration and semantic layer
• Appears attractive to organizations heavily invested in EDW
• More than one EDW? EDW costs?
Data Virtualization as the Data Integration Layer
[Diagram: Data Virtualization as Data Integration/Semantic Layer, over EDW and ODS]
• Move data integration and semantic layer to independent Data Virtualization platform
• Purpose built for supporting data access across multiple heterogeneous data sources
• Separate layer provides semantic models for underlying data
• Physical to logical mapping
• Enforces common and consistent security and governance policies
• Gartner’s recommended approach
Logical Data Warehouse
[Diagram: a logical data warehouse over heterogeneous sources. Labels shown: EDW, Hadoop Cluster, Sales, HDFS Files, Document Collections, NoSQL Database, ERP Database, Excel]
Logical Data Warehouse
Reference Architecture by Denodo
The State and Future of Data Integration. Gartner, 25 May 2016
Physical data movement architectures that aren’t designed to
support the dynamic nature of business change, volatile
requirements and massive data volume are increasingly being
replaced by data virtualization.
Evolving approaches (such as the use of LDW architectures) include
implementations beyond repository-centric techniques
What about the Logical Data Lake?
A Data Lake will not have a star or snowflake schema, but rather a more
heterogeneous collection of views with raw data from heterogeneous
sources
The virtual layer will act as a common umbrella under which these
different sources are presented to the end user as a single system
However, from the virtualization perspective, a Virtual Data Lake shares
many technical aspects with an LDW, and most of this material also
applies to a Logical Data Lake
Different Types, Different
Needs
Common Patterns for a Logical Data Warehouse
1. The Virtual Data Mart
2. DW + MDM
• Data Warehouse extended with master data
3. DW + Cloud
• Data Warehouse extended with cloud data
4. DW + DW
• Integration of multiple Data Warehouses
5. DW historical offloading
• DW horizontal partitioning with historical data in cheaper storage
6. Slim DW extension
• DW vertical partitioning with rarely used data in cheaper storage
Virtual Data Marts
Simplified semantic models for business users
Business friendly models defined on top of one or multiple systems,
often “flavored” for a particular division
Motivation
• Hide complexity of star schemas for business users
• Simplify model for a particular vertical
• Reuse semantic models and security across multiple reporting engines
Typical queries
• Simple projections, filters and aggregations on top of curated “fat tables”
that merge data from facts and many dimensions
[Diagram: Virtual Data Marts. Labels shown: Fact table (sales), Time Dimension, Product, Retailer Dimension and Prod. Details in the EDW and other sources, exposed as Sales and Product views]
DW + MDM
Slim dimensions with extended information maintained in an external
MDM system
Motivation
• Keep a single copy of golden records in the MDM that can be reused across
systems and managed in a single place
Typical queries
• Join a large fact table (DW) with several MDM dimensions, aggregations on
top
Example
• Revenue by customer, projecting the address from the MDM
[Diagram: DW + MDM dimensions. Labels shown: Fact table (sales), Time Dimension, Product Dimension, Retailer Dimension, EDW, MDM]
DW + Cloud dimensional data
Fresh data from cloud systems (e.g. SFDC) is mixed with the EDW, usually
on the dimensions. DW is sometimes also in the cloud.
Motivation
• Take advantage of “fresh” data coming straight from SaaS systems
• Avoid local replication of cloud systems
Typical queries
• Dimensions are joined with cloud data to filter based on some external attribute
not available (or not current) in the EDW
Example
• Report on current revenue on accounts where the potential for an expansion is
higher than 80%
[Diagram: DW + Cloud dimensional data. Labels shown: Fact table (sales), Time Dimension, Product Dimension, Customer Dimension, EDW, CRM, SFDC Customer]
Multiple DW integration
Use multiple DWs as if they were one
Motivation
• Mergers and acquisitions
• Different DWs by department
• Transition to new EDW deployments (migration to Spark, Redshift, etc.)
Typical queries
• Joins across fact tables in different DWs with aggregations before or after the JOIN
Example
• Get customers with purchases higher than 100 USD who do not have a fidelity
card (purchases and fidelity card data are in different DWs)
[Diagram: Multiple DW integration, a Finance EDW and a Marketing EDW queried together. Labels shown: Sales fact, Customer Fidelity facts, Time Dimension, Product Dimension (shown twice), Region, City, Store]
*Real Examples: Nationwide POC, IBM tests
DW Historical Partitioning
Horizontal partitioning
Only the most current data (e.g. last year) is in the EDW. Historical data is
offloaded to a Hadoop cluster
Motivations
• Reduce storage cost
• Transparently use the two datasets as if they were all together
Typical queries
• Facts are defined as a partitioned UNION based on date
• Queries join the “virtual fact” with dimensions and aggregate on top
Example
• Queries on current date only need to go to the DW, but longer timespans need to merge
with Hadoop
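The "partitioned UNION based on date" can be sketched as follows. This is a minimal illustration, not Denodo's implementation: both partitions are hypothetical tables in a single in-memory SQLite database (`current_sales` standing in for the EDW, `historical_sales` for Hadoop), and the table names, columns and sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical stand-ins: in a real LDW the two tables would live in
# different systems; here both are tables in one in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_sales    (sale_date TEXT, amount REAL);
    CREATE TABLE historical_sales (sale_date TEXT, amount REAL);
    INSERT INTO current_sales    VALUES ('2016-03-01', 100.0), ('2016-05-15', 50.0);
    INSERT INTO historical_sales VALUES ('2014-07-01', 70.0), ('2015-11-30', 30.0);

    -- The "virtual fact": one logical table defined over both partitions.
    CREATE VIEW sales AS
        SELECT sale_date, amount FROM current_sales
        UNION ALL
        SELECT sale_date, amount FROM historical_sales;
""")

def total_sales(since):
    """Total amount for sales on or after the given ISO date."""
    row = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE sale_date >= ?", (since,)
    ).fetchone()
    return row[0]

recent_total = total_sales("2016-01-01")  # only rows from current_sales qualify
full_total = total_sales("2014-01-01")    # rows from both partitions qualify
```

Consumers query the single `sales` view in both cases and never need to know which partition holds a given date.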
[Diagram: DW Historical offloading, horizontal partitioning. Labels shown: Fact table (sales) split into Current Sales and Historical Sales, Time Dimension, Product Dimension, Retailer Dimension, EDW]
Slim DW extension
Vertical partitioning
Minimal DW, with more complete raw data in a Hadoop cluster
Motivation
• Reduce cost
• Transparently use the two datasets as if they were all together
Typical queries
• Tables are defined virtually as 1-to-1 joins between the two systems
• Queries join the facts with dimensions and aggregate on top
Example
• Common queries only need to go to the DW, but some queries need attributes or
measures from Hadoop
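A virtual table defined as a 1-to-1 join can be sketched in the same way. Again this is only an illustration under stated assumptions: `slim_sales` (the minimal DW table) and `extended_sales` (the extra attributes kept in cheaper storage) are hypothetical names, and both live in one in-memory SQLite database rather than in two systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical split of one logical fact table by columns.
    CREATE TABLE slim_sales     (sale_id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE extended_sales (sale_id INTEGER PRIMARY KEY, coupon_code TEXT);
    INSERT INTO slim_sales     VALUES (1, 100.0), (2, 50.0);
    INSERT INTO extended_sales VALUES (1, 'SPRING'), (2, NULL);

    -- The virtual table: a 1-to-1 join on the shared key.
    CREATE VIEW sales AS
        SELECT s.sale_id, s.amount, e.coupon_code
        FROM slim_sales s
        JOIN extended_sales e ON s.sale_id = e.sale_id;
""")

# A common query needs only columns held in the slim table.
common_total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# A rarer query needs an attribute held in the extended table.
spring_sales = conn.execute(
    "SELECT sale_id FROM sales WHERE coupon_code = 'SPRING'"
).fetchall()
```

In a virtualization layer, the first query is the kind that could be answered from the DW alone if the unused join branch is pruned; SQLite is used here only to show the shape of the view.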
[Diagram: Slim DW extension, vertical partitioning. Labels shown: Fact table (sales) split into Slim Sales and Extended Sales, Time Dimension, Product Dimension, Retailer Dimension, EDW]
Performance in an LDW
It is a common assumption that a virtualized solution will
be much slower than a persisted approach via ETL:
1. There is a large amount of data moved through the
network for each query
2. Network transfer is slow
But is this really true?
Debunking the myths of virtual performance
1. Complex queries can be solved transferring moderate data volumes when
the right techniques are applied
• Operational queries
  • Predicate delegation produces small result sets
• Logical Data Warehouse and Big Data
  • Denodo uses characteristics of underlying star schemas to apply
query rewriting rules that maximize delegation to specialized sources
(especially heavy GROUP BY) and minimize data movement
2. Current networks are almost as fast as reading from disk
• 10 Gb and 100 Gb Ethernet are a commodity
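A back-of-the-envelope sketch shows why the volume of data moved matters more than raw network speed. The figures are assumptions for illustration only: a hypothetical 100 bytes per row, a 10 Gb/s link, and no allowance for latency, protocol overhead or source read time.

```python
def transfer_seconds(rows, bytes_per_row, link_gbps):
    """Idealized time to move `rows` rows over a link of `link_gbps` Gb/s."""
    bits = rows * bytes_per_row * 8
    return bits / (link_gbps * 1e9)

# Moving a billion raw fact rows vs. ten thousand pre-aggregated rows
# (row counts and row size are illustrative assumptions).
full_scan_seconds = transfer_seconds(1_000_000_000, 100, 10)
aggregated_seconds = transfer_seconds(10_000, 100, 10)
```

Under these assumptions the raw transfer takes on the order of a minute, while the aggregated result set moves in under a millisecond, which is why delegation to the sources is the main lever.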
Performance Comparison
Logical Data Warehouse vs. Physical Data Warehouse
Denodo has done extensive testing using queries from the standard benchmark
TPC-DS* in the following scenario, which compares the performance of a federated
approach in Denodo with an MPP system where all the data has been replicated via ETL.
[Diagram: Sales Facts (290 M rows), Customer Dim. (2 M rows) and Items Dim. (400 K rows), held in separate sources for the federated case vs. in a single MPP system for the replicated case]
* TPC-DS is the de-facto industry standard benchmark for measuring the performance of
decision support solutions including, but not limited to, Big Data systems.
Performance Comparison
Logical Data Warehouse vs. Physical Data Warehouse

Query description | Returned rows | Time Netezza | Time Denodo (federated Oracle, Netezza & SQL Server) | Optimization technique (automatically selected)
Total sales by customer | 1.99 M | 20.9 sec. | 21.4 sec. | Full aggregation push-down
Total sales by customer and year between 2000 and 2004 | 5.51 M | 52.3 sec. | 59.0 sec. | Full aggregation push-down
Total sales by item brand | 31.35 K | 4.7 sec. | 5.0 sec. | Partial aggregation push-down
Total sales by item where sale price less than current list price | 17.05 K | 3.5 sec. | 5.2 sec. | On the fly data movement
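To read the comparison as relative overhead, the arithmetic can be sketched as below. The timings are copied from the table above; the helper name `overhead_pct` is an assumption made for this example.

```python
# (query, Netezza seconds, Denodo federated seconds), as reported in the table.
RESULTS = [
    ("Total sales by customer", 20.9, 21.4),
    ("Total sales by customer and year between 2000 and 2004", 52.3, 59.0),
    ("Total sales by item brand", 4.7, 5.0),
    ("Total sales by item where sale price less than current list price", 3.5, 5.2),
]

def overhead_pct(physical, federated):
    """Extra time of the federated run, as a percentage of the physical run."""
    return 100.0 * (federated - physical) / physical

overheads = {query: round(overhead_pct(n, d), 1) for query, n, d in RESULTS}
```

The absolute gap stays within a few seconds for every query; the relative overhead is largest for the shortest query, where fixed costs weigh more.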
Performance and optimizations in Denodo
Focused on 3 core concepts
• Dynamic Multi-Source Query Execution Plans
  • Leverages processing power & architecture of data sources
  • Dynamic to support ad hoc queries
  • Uses statistics for cost-based query plans
• Selective Materialization
  • Intelligent caching of only the most relevant and often used information
• Optimized Resource Management
  • Smart allocation of resources to handle high concurrency
  • Throttling to control and mitigate source impact
  • Resource plans based on rules
Performance and optimizations in Denodo
Comparing optimizations in DV vs ETL
Although Data Virtualization is a data integration platform,
architecturally speaking it is more similar to an RDBMS
• Uses relational logic
• Metadata is equivalent to that of a database
• Enables ad hoc querying
Key difference between ETL engines and DV:
• ETL engines are optimized for static bulk movements
  • Fixed data flows
• Data virtualization is optimized for queries
  • Dynamic execution plan per query
Therefore, the performance architecture presented here
resembles that of an RDBMS
Query Optimizer
How Dynamic Query Optimizer Works
Step by Step
Metadata Query Tree
• Maps query entities (tables, fields) to actual metadata
• Retrieves execution capabilities and restrictions for views involved
in the query
Static Optimizer
• Query delegation
• SQL rewriting rules (removal of redundant filters, tree pruning, join
reordering, transformation push-up, star-schema rewritings, etc.)
• Data movement query plans
Cost Based Optimizer
• Picks optimal JOIN methods and orders based on data distribution
statistics, indexes, transfer rates, etc.
Physical Execution Plan
• Creates the calls to the underlying systems in their corresponding
protocols and dialects (SQL, MDX, WS calls, etc.)
How Dynamic Query Optimizer Works
Example: Total sales by retailer and product during the last month for the brand ACME
[Diagram: Fact table (sales) with Time Dimension, Product Dimension and Retailer Dimension, spread across EDW and MDM]
SELECT retailer.name,
       product.name,
       SUM(sales.amount)
FROM sales
  JOIN retailer ON sales.retailer_fk = retailer.id
  JOIN product ON sales.product_fk = product.id
  JOIN time ON sales.time_fk = time.id
WHERE time.date >= ADDMONTH(NOW(), -1)
  AND product.brand = 'ACME'
GROUP BY product.name, retailer.name
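For readers who want to run the example, here is a minimal sketch on SQLite with made-up rows. Several things are assumptions: the time dimension is renamed to `time_dim` with a `day` column, SQLite's `date(?, '-1 month')` stands in for the date arithmetic, the reference date is passed as a parameter so the result is reproducible, and the date filter is written as `>=` so that it selects the last month, as the example's title describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE retailer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product  (id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
    CREATE TABLE time_dim (id INTEGER PRIMARY KEY, day TEXT);
    CREATE TABLE sales (retailer_fk INTEGER, product_fk INTEGER,
                        time_fk INTEGER, amount REAL);
    INSERT INTO retailer VALUES (1, 'North'), (2, 'South');
    INSERT INTO product  VALUES (1, 'Anvil', 'ACME'), (2, 'Rocket', 'ACME'),
                                (3, 'Widget', 'Other');
    INSERT INTO time_dim VALUES (1, '2016-04-01'), (2, '2016-05-20'),
                                (3, '2016-06-01');
    INSERT INTO sales VALUES
        (1, 1, 2, 10.0), (1, 1, 3, 5.0),  -- North / Anvil, within the last month
        (1, 1, 1, 99.0),                  -- North / Anvil, older than a month
        (2, 2, 3, 7.0),                   -- South / Rocket, within the last month
        (2, 3, 3, 50.0);                  -- South / Widget, not an ACME product
""")

QUERY = """
    SELECT retailer.name, product.name, SUM(sales.amount)
    FROM sales
    JOIN retailer ON sales.retailer_fk = retailer.id
    JOIN product  ON sales.product_fk  = product.id
    JOIN time_dim ON sales.time_fk     = time_dim.id
    WHERE time_dim.day >= date(?, '-1 month')
      AND product.brand = 'ACME'
    GROUP BY product.name, retailer.name
    ORDER BY retailer.name, product.name
"""

# Fixed reference date (the assumed "now") keeps the example deterministic.
rows = conn.execute(QUERY, ("2016-06-09",)).fetchall()
```

Here every table sits in one database; the following slides are about what changes when the fact table and the dimensions live in different systems.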
How Dynamic Query Optimizer Works
Example: Non-optimized
Each table is queried separately; the three JOINs and the GROUP BY product.name, retailer.name run in the virtual layer.
• sales: 1,000,000,000 rows
    SELECT sales.retailer_fk, sales.product_fk, sales.time_fk, sales.amount
    FROM sales
• retailer: 100 rows
    SELECT retailer.name, retailer.id
    FROM retailer
• product: 10 rows
    SELECT product.name, product.id
    FROM product
    WHERE product.brand = 'ACME'
• time: 30 rows
    SELECT time.date, time.id
    FROM time
    WHERE time.date >= add_months(CURRENT_TIMESTAMP, -1)
• Rows reaching the GROUP BY after the three JOINs: 10,000,000
How Dynamic Query Optimizer Works
Step 1: Applies JOIN reordering to maximize delegation
• sales JOIN time, delegated as a single query: 100,000,000 rows
    SELECT sales.retailer_fk, sales.product_fk, sales.amount
    FROM sales JOIN time ON sales.time_fk = time.id
    WHERE time.date >= add_months(CURRENT_TIMESTAMP, -1)
• retailer: 100 rows
    SELECT retailer.name, retailer.id
    FROM retailer
• product: 10 rows
    SELECT product.name, product.id
    FROM product
    WHERE product.brand = 'ACME'
• Rows reaching the GROUP BY product.name, retailer.name after the two remaining JOINs: 10,000,000
How Dynamic Query Optimizer Works
Step 2
Since the JOIN is on foreign keys (1-to-many), and the GROUP BY is on attributes
from the dimensions, it applies the partial aggregation push-down optimization.
• sales JOIN time, pre-aggregated in the source: 10,000 rows
    SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount)
    FROM sales JOIN time ON sales.time_fk = time.id
    WHERE time.date >= add_months(CURRENT_TIMESTAMP, -1)
    GROUP BY sales.retailer_fk, sales.product_fk
• retailer: 100 rows
    SELECT retailer.name, retailer.id
    FROM retailer
• product: 10 rows
    SELECT product.name, product.id
    FROM product
    WHERE product.brand = 'ACME'
• Rows reaching the final GROUP BY product.name, retailer.name after the two JOINs: 1,000
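Why the partial aggregation push-down is safe can be sketched in plain Python: grouping the facts by their foreign keys first and re-aggregating after the join gives the same totals as joining every fact row first. The data and function names below are hypothetical; two product ids deliberately share a name to show why the final GROUP BY on dimension attributes is still needed.

```python
from collections import defaultdict

# Hypothetical toy data: (retailer_fk, product_fk, amount).
sales = [(1, 1, 10.0), (1, 1, 5.0), (2, 1, 7.0), (2, 2, 3.0), (1, 3, 50.0)]
retailer = {1: "North", 2: "South"}
acme_products = {1: "Anvil", 2: "Anvil"}  # ids 1 and 2 share a name; id 3 is not ACME

def join_then_aggregate():
    """Baseline: every fact row is joined, then grouped by the names."""
    totals = defaultdict(float)
    for r_fk, p_fk, amount in sales:
        if p_fk in acme_products:
            totals[(retailer[r_fk], acme_products[p_fk])] += amount
    return dict(totals)

def aggregate_then_join():
    """Push-down: group by the foreign keys first, join the subtotals after."""
    partial = defaultdict(float)
    for r_fk, p_fk, amount in sales:       # this part would run in the source
        partial[(r_fk, p_fk)] += amount
    totals = defaultdict(float)
    for (r_fk, p_fk), subtotal in partial.items():  # only subtotals are joined
        if p_fk in acme_products:
            totals[(retailer[r_fk], acme_products[p_fk])] += subtotal
    return dict(totals), len(partial)

direct = join_then_aggregate()
pushed, rows_transferred = aggregate_then_join()
```

Both paths produce the same totals, but the second one moves one row per (retailer, product) pair instead of one row per sale.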
How Dynamic Query Optimizer Works
Step 3
Selects the right JOIN strategy based on costs for data volume estimations:
one JOIN runs as a NESTED JOIN and the other as a HASH JOIN.
• sales JOIN time, pre-aggregated and restricted to the product ids returned by the product query (shown on the slide as WHERE product.id IN (1,2,…)): 1,000 rows
    SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount)
    FROM sales JOIN time ON sales.time_fk = time.id
    WHERE time.date >= add_months(CURRENT_TIMESTAMP, -1)
      AND sales.product_fk IN (1, 2, …)
    GROUP BY sales.retailer_fk, sales.product_fk
• retailer: 100 rows
    SELECT retailer.name, retailer.id
    FROM retailer
• product: 10 rows
    SELECT product.name, product.id
    FROM product
    WHERE product.brand = 'ACME'
• Rows reaching the final GROUP BY product.name, retailer.name after the two JOINs: 1,000
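The two join strategies named in Step 3 can be sketched in plain Python. This is a simplified illustration, not Denodo's engine: `hash_join`, `nested_join` and `fetch_sales` are hypothetical names, and `fetch_sales` stands in for a source query that accepts an IN filter on the product key.

```python
def hash_join(left, right, left_key, right_key):
    """Build a hash table on `right`, then probe it with each row of `left`."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]

def nested_join(small, small_key, fetch_by_keys, big_key):
    """Read the small side first, then ask the other source only for its keys."""
    keys = sorted({row[small_key] for row in small})
    fetched = fetch_by_keys(keys)  # stands in for "... WHERE product_fk IN (keys)"
    return hash_join(fetched, small, big_key, small_key), len(fetched)

# Hypothetical data: pre-aggregated sales per (retailer_fk, product_fk).
partial_sales = [
    {"retailer_fk": 1, "product_fk": 1, "total": 15.0},
    {"retailer_fk": 2, "product_fk": 2, "total": 7.0},
    {"retailer_fk": 2, "product_fk": 3, "total": 50.0},
]
acme_products = [{"id": 1, "product": "Anvil"}, {"id": 2, "product": "Rocket"}]
retailers = [{"id": 1, "retailer": "North"}, {"id": 2, "retailer": "South"}]

def fetch_sales(product_ids):
    """Pretend remote query: returns only sales rows for the requested products."""
    wanted = set(product_ids)
    return [row for row in partial_sales if row["product_fk"] in wanted]

with_products, rows_fetched = nested_join(acme_products, "id", fetch_sales, "product_fk")
result = hash_join(with_products, retailers, "retailer_fk", "id")
summary = sorted((r["retailer"], r["product"], r["total"]) for r in result)
```

The nested strategy pays off when one side is small enough to be turned into a filter on the other source; the hash strategy reads both sides in full and matches them in memory.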
How Dynamic Query Optimizer Works
Summary
1. Automatic JOIN reordering
• Groups branches that go to the same source to maximize query delegation and reduce processing in the DV
layer
• End users don’t need to worry about the optimal “pairing” of the tables
2. The partial aggregation push-down optimization is key in those scenarios. Based on
PK-FK restrictions, it pushes the aggregation (by the keys) to the DW
• Leverages the processing power of the DW, optimized for these aggregations
• Significantly reduces the data transferred through the network (from 1 billion rows to 10 thousand)
3. The Cost-based Optimizer picks the right JOIN strategies based on estimations on data
volumes, existence of indexes, transfer rates, etc.
• Denodo estimates costs in a different way for parallel databases (Vertica, Netezza, Teradata) than for regular
databases to take into consideration the different way those systems operate (distributed data, parallel
processing, different aggregation techniques, etc.)
How Dynamic Query Optimizer Works
Other relevant optimization techniques for LDW and Big Data
Automatic data movement
• Creation of temp tables in one of the systems to enable complete delegation
• Only considered as an option if the target source has the “data movement” option
enabled
• Use of native bulk load APIs for better performance
Execution Alternatives
• If a view exists in more than one system, Denodo can decide at execution time which one
to use
• The goal is to maximize query delegation depending on the other tables involved in the
query
How Dynamic Query Optimizer Works
Optimizations for Virtual Partitioning
Eliminates unnecessary queries and processing based on a pre-execution analysis of the
views and the queries
• Pruning of unnecessary JOIN branches
  • Relevant for vertical partitioning and “fat” semantic models when queries do not
need attributes from all the tables
• Pruning of unnecessary UNION branches
  • Enables detection of unnecessary UNION branches in horizontal partitioning scenarios
• Push down of JOIN under UNION views
  • Enables the delegation of JOINs with dimensions
• Automatic data movement for partition scenarios
  • Enables the delegation of JOINs with dimensions
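Pruning of unnecessary UNION branches can be sketched as a simple range-overlap check done before any source is queried. The partition metadata, date ranges and function name below are assumptions for illustration, not Denodo's internal representation.

```python
from datetime import date

# Hypothetical partition metadata: each UNION branch declares the dates it holds.
PARTITIONS = [
    {"source": "EDW (current sales)",
     "start": date(2015, 7, 1), "end": date(2016, 6, 30)},
    {"source": "Hadoop (historical sales)",
     "start": date(2000, 1, 1), "end": date(2015, 6, 30)},
]

def branches_to_query(query_start, query_end):
    """Keep only the branches whose date range overlaps the query's range."""
    return [p["source"] for p in PARTITIONS
            if p["start"] <= query_end and query_start <= p["end"]]

recent = branches_to_query(date(2016, 1, 1), date(2016, 6, 9))
long_span = branches_to_query(date(2014, 1, 1), date(2016, 6, 9))
```

A query on recent dates is routed to a single branch, while a longer timespan keeps both branches of the UNION.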
Caching
Real time vs. caching
Sometimes, real time access & federation are not a good fit:
• Sources are slow (e.g. text files, cloud apps like Salesforce.com)
• A lot of data processing is needed (e.g. complex combinations, transformations,
matching, cleansing, etc.)
• Access is limited, or the impact on the sources has to be mitigated
For these scenarios, Denodo can replicate just the relevant data in
the cache
Caching
Overview
Denodo’s cache system is based on an external relational database
• Traditional (Oracle, SQLServer, DB2, MySQL, etc.)
• MPP (Teradata, Netezza, Vertica, Redshift, etc.)
• In-memory storage (Oracle TimesTen, SAP HANA)
Works at view level.
• Allows hybrid access (real-time / cached) of an execution tree
Cache Control (population / maintenance)
• Manually - user initiated at any time
• Time based - using the TTL or the Denodo Scheduler
• Event based - e.g. using JMS messages triggered in the DB
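The time-based and manual cache controls can be sketched with a small in-memory class. This is only an illustration of the TTL idea: Denodo's cache is stored in an external relational database, and the class name `ViewCache`, its methods and the injected clock are all assumptions made for this example.

```python
import time

class ViewCache:
    """Per-view cache with a time-to-live and manual invalidation."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # view name -> (loaded_at, rows)

    def get(self, view, load_from_source):
        entry = self._entries.get(view)
        now = self.clock()
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]               # still fresh: serve cached rows
        rows = load_from_source()         # missing or expired: go to the source
        self._entries[view] = (now, rows)
        return rows

    def invalidate(self, view):
        """Manual or event-driven refresh: drop the entry for one view."""
        self._entries.pop(view, None)

# Demo with a fake clock so that expiry can be shown without sleeping.
fake_now = [0.0]
source_calls = []

def load_accounts():
    source_calls.append(1)
    return [("acme", 42)]

cache = ViewCache(ttl_seconds=60, clock=lambda: fake_now[0])
cache.get("accounts", load_accounts)   # first access loads from the source
cache.get("accounts", load_accounts)   # served from the cache
fake_now[0] = 61.0
cache.get("accounts", load_accounts)   # TTL expired, reloads
cache.invalidate("accounts")
cache.get("accounts", load_accounts)   # invalidated, reloads
source_hits = len(source_calls)
```

An event-based trigger, such as a message sent when the source changes, would simply call `invalidate` for the affected view.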
References
Further Reading
Data Virtualization Blog (http://www.datavirtualizationblog.com)
Check the following articles written by our CTO Alberto Pan in our blog:
• Myths in data virtualization performance
• Performance of Data Virtualization in Logical Data Warehouse scenarios
• Physical vs Logical Data Warehouse: the numbers
• Cost Based Optimization in Data Virtualization
Denodo Cookbook
• Data Warehouse Offloading
Success Stories
Customer Case Studies
Autodesk Overview
• Founded 1982 (NASDAQ: ADSK)
• Annual revenues (FY 2015) $2.5B
  • Over 8,800 employees
• 3D modeling and animation software
  • Flagship product is AutoCAD
• Market sectors:
  • Architecture, Engineering, and Construction
  • Manufacturing
  • Media and Entertainment
  • Recently started 3D Printing offerings
Business Drivers for Change
• Software consumption model is changing
  • Perpetual licenses to subscriptions
  • Users want more flexibility in how they use software
• Autodesk needed to transition to subscription pricing
  • 2016 – some products will be subscription only
• Lifetime revenue higher with subscriptions
  • Over 3-5 years, subscriptions = more revenues
• Changing a licensing model is disruptive
Technology Challenges
• Current ‘traditional’ BI/EDW architecture not
designed for data streams from online apps
  • Weblogs, Clickstreams, Cloud/Desktop apps, etc.
• Existing infrastructure can’t simply ‘go away’
  • Regulatory reporting (e.g. SEC)
  • Existing ‘perpetual’ customers
• ‘Subscription’ infrastructure must work in parallel
  • Extend and enhance existing systems
  • With single access point to all data
• Solution – ‘Logical Data Warehouse’
Logical Data Warehouse at Autodesk
[Four architecture diagrams: Logical Data Warehouse at Autodesk; Traditional BI/Reporting; ‘New Data’ Ingestion; Logical Data Warehouse Example, Reporting on Combined Data]
Case Study: Autodesk Successfully Changes Their Revenue Model and Transforms Business
Problem, Solution, Results
• Autodesk was changing their business revenue model from a conventional
perpetual license model to subscription-based license model.
• Inability to deliver high quality data in a timely manner to business
stakeholders.
• Evolution from traditional operational data warehouse to contemporary
logical data warehouse deemed necessary for faster speed.
• General purpose platform to deliver data through logical data warehouse.
• Denodo Abstraction Layer helps live invoicing with SAP.
• Data virtualization enabled a culture of “see before you build”.
• Successfully transitioned to subscription-based licensing.
• For the first time, Autodesk can do single point security enforcement and
have uniform data environment for access.
Autodesk, Inc. is an American multinational software corporation that makes software for the
architecture, engineering, construction, manufacturing, media, and entertainment industries.
Q&A
Thanks!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying and microfilm, without the prior written authorization from Denodo Technologies.

Weitere ähnliche Inhalte

Was ist angesagt?

Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Why Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionWhy Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionDenodo
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Tristan Baker
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 
Why an AI-Powered Data Catalog Tool is Critical to Business Success
Why an AI-Powered Data Catalog Tool is Critical to Business SuccessWhy an AI-Powered Data Catalog Tool is Critical to Business Success
Why an AI-Powered Data Catalog Tool is Critical to Business SuccessInformatica
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeKent Graziano
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?Precisely
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?DATAVERSITY
 
Présentation data vault et bi v20120508
Présentation data vault et bi v20120508Présentation data vault et bi v20120508
Présentation data vault et bi v20120508Empowered Holdings, LLC
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogDATAVERSITY
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationDATAVERSITY
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesDATAVERSITY
 
Data Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureData Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureLorenzo Nicora
 
MDM Strategy & Roadmap
MDM Strategy & RoadmapMDM Strategy & Roadmap
MDM Strategy & Roadmapvictorlbrown
 

Was ist angesagt? (20)

Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Why Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionWhy Data Virtualization? An Introduction
Why Data Virtualization? An Introduction
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Why an AI-Powered Data Catalog Tool is Critical to Business Success
Why an AI-Powered Data Catalog Tool is Critical to Business SuccessWhy an AI-Powered Data Catalog Tool is Critical to Business Success
Why an AI-Powered Data Catalog Tool is Critical to Business Success
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on Snowflake
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
 
Why Data Vault?
Why Data Vault? Why Data Vault?
Why Data Vault?
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Présentation data vault et bi v20120508
Présentation data vault et bi v20120508Présentation data vault et bi v20120508
Présentation data vault et bi v20120508
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital Transformation
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
 
Data Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureData Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and Future
 
MDM Strategy & Roadmap
MDM Strategy & RoadmapMDM Strategy & Roadmap
MDM Strategy & Roadmap
 

Logical Data Warehouse and Data Lakes

  • 1. Logical Data Warehouse, Data Lakes, and Data Services Marketplaces. New York City, 9th June, 2016.
  • 2. Agenda: 1. Introductions; 2. Logical Data Warehouse and Data Lakes; 3. Coffee Break; 4. Data Services Marketplaces; 5. Q&A
  • 3. THE LEADER IN DATA VIRTUALIZATION: Denodo provides agile, high performance data integration and data abstraction across the broadest range of enterprise, cloud, big data and unstructured data sources, and real-time data services at half the cost of traditional approaches.
      • HEADQUARTERS: Palo Alto, CA.
      • DENODO OFFICES, CUSTOMERS, PARTNERS: Global presence throughout North America, EMEA, APAC, and Latin America.
      • CUSTOMERS: 250+ customers, including many F500 and G2000 companies across every major industry, have gained significant business agility and ROI.
      • LEADERSHIP: Longest continuous focus on data virtualization and data services. Product leadership. Solutions expertise.
  • 4. Speakers: Paul Moxon, Senior Director of Product Management, Denodo; Pablo Álvarez, Principal Technical Account Manager, Denodo; Rubén Fernández, Technical Account Manager, Denodo
  • 5. Logical Data Warehouse and Data Lakes. New York City, June 2016.
  • 6. Agenda: 1. The Logical Data Warehouse; 2. Different Types, Different Needs; 3. Performance in a LDW; 4. Customer Success Stories; 5. Q&A
  • 7. What is a Logical Data Warehouse? A logical data warehouse is a data system that follows the ideas of a traditional EDW (star or snowflake schemas) and includes, in addition to one or more core DWs, data from external sources. The main motivations are improved decision making and/or cost reduction.
  • 8. Logical Data Warehouse: Gartner Definition (Gartner Hype Cycle for Enterprise Information Management, 2012)
      - “The Logical Data Warehouse (LDW) is a new data management architecture for analytics combining the strengths of traditional repository warehouses with alternative data management and access strategy. The LDW will form a new best practice by the end of 2015.”
      - “The LDW is an evolution and augmentation of DW practices, not a replacement”
      - “A repository-only style DW contains a single ontology/taxonomy, whereas in the LDW a semantic layer can contain many combination of use cases, many business definitions of the same information”
      - “The LDW permits an IT organization to make a large number of datasets available for analysis via query tools and applications.”
  • 10. Logical Data Warehouse: Description
      - A semantic layer on top of the data warehouse that keeps the business data definition.
      - Allows the integration of multiple data sources including enterprise systems, the data warehouse, additional processing nodes (analytical appliances, Big Data, …), Web, Cloud and unstructured data.
      - Publishes data to multiple applications and reporting tools.
  • 11. Three Integration/Semantic Layer Alternatives (Gartner’s View of Data Integration)
      - Application/BI Tool as Data Integration/Semantic Layer
      - EDW as Data Integration/Semantic Layer
      - Data Virtualization as Data Integration/Semantic Layer
      [Diagram labels: Application/BI Tool, Data Virtualization, EDW, ODS]
  • 12. Application/BI Tool as the Data Integration Layer
      [Diagram labels: Application/BI Tool, EDW, ODS]
      - Integration is delegated to end user tools and applications
          - e.g. BI Tools with ‘data blending’
      - Results in duplication of effort: integration defined many times in different tools
          - Impact of change in data schema?
      - End user tools are not intended to be integration middleware
          - Not their primary purpose or expertise
  • 13. EDW as the Data Integration Layer
      [Diagram labels: EDW, ODS]
      - Access to ‘other’ data (query federation) via EDW
          - Teradata QueryGrid, IBM FluidQuery, SAP Smart Data Access, etc.
      - Often coupled with traditional ETL replication of data into EDW
      - EDW as ‘center of data universe’
          - Provides data integration and semantic layer
      - Appears attractive to organizations heavily invested in EDW
          - More than one EDW? EDW costs?
  • 14. Data Virtualization as the Data Integration Layer
      [Diagram labels: Data Virtualization, EDW, ODS]
      - Move data integration and semantic layer to independent Data Virtualization platform
          - Purpose built for supporting data access across multiple heterogeneous data sources
      - Separate layer provides semantic models for underlying data
          - Physical to logical mapping
      - Enforces common and consistent security and governance policies
      - Gartner’s recommended approach
  • 15. Logical Data Warehouse [Diagram labels: EDW, Hadoop Cluster, Sales, HDFS Files, Document Collections, NoSQL Database, ERP Database, Excel]
  • 16. Logical Data Warehouse Reference Architecture by Denodo
  • 17. The State and Future of Data Integration (Gartner, 25 May 2016)
      - Physical data movement architectures that aren’t designed to support the dynamic nature of business change, volatile requirements and massive data volume are increasingly being replaced by data virtualization.
      - Evolving approaches (such as the use of LDW architectures) include implementations beyond repository-centric techniques.
  • 18. What about the Logical Data Lake?
      - A Data Lake will not have a star or snowflake schema, but rather a more heterogeneous collection of views with raw data from heterogeneous sources.
      - The virtual layer acts as a common umbrella under which these different sources are presented to the end user as a single system.
      - However, from the virtualization perspective, a Virtual Data Lake shares many technical aspects with an LDW, and most of this content also applies to a Logical Data Lake.
  • 19. Different Types, Different Needs Common Patterns for a Logical Data Warehouse
  • 20. Common Patterns for a Logical Data Warehouse
      1. The Virtual Data Mart
      2. DW + MDM: Data Warehouse extended with master data
      3. DW + Cloud: Data Warehouse extended with cloud data
      4. DW + DW: Integration of multiple Data Warehouses
      5. DW historical offloading: DW horizontal partitioning with historical data in cheaper storage
      6. Slim DW extension: DW vertical partitioning with rarely used data in cheaper storage
  • 21. Virtual Data Marts: Simplified semantic models for business users
      Business friendly models defined on top of one or multiple systems, often “flavored” for a particular division
      - Motivation
          - Hide complexity of star schemas for business users
          - Simplify model for a particular vertical
          - Reuse semantic models and security across multiple reporting engines
      - Typical queries
          - Simple projections, filters and aggregations on top of curated “fat tables” that merge data from facts and many dimensions
  • 22. Virtual Data Marts [Diagram labels: Time Dimension, Fact table (sales), Product, Retailer Dimension, Sales, EDW, Others, Product, Prod. Details]
  • 23. DW + MDM
      Slim dimensions with extended information maintained in an external MDM system
      - Motivation
          - Keep a single copy of golden records in the MDM that can be reused across systems and managed in a single place
      - Typical queries
          - Join a large fact table (DW) with several MDM dimensions, aggregations on top
      - Example
          - Revenue by customer, projecting the address from the MDM
  • 24. DW + MDM dimensions [Diagram labels: Time Dimension, Fact table (sales), Product Dimension, Retailer Dimension, EDW, MDM]
  • 25. DW + Cloud dimensional data
      Fresh data from cloud systems (e.g. SFDC) is mixed with the EDW, usually on the dimensions. DW is sometimes also in the cloud.
      - Motivation
          - Take advantage of “fresh” data coming straight from SaaS systems
          - Avoid local replication of cloud systems
      - Typical queries
          - Dimensions are joined with cloud data to filter based on some external attribute not available (or not current) in the EDW
      - Example
          - Report on current revenue on accounts where the potential for an expansion is higher than 80%
  • 26. DW + Cloud dimensional data [Diagram labels: Time Dimension, Fact table (sales), Product Dimension, Customer Dimension, CRM, SFDC Customer, EDW]
  • 27. Multiple DW integration: Use of multiple DWs as if they were one
      - Motivation
          - Mergers and acquisitions
          - Different DWs by department
          - Transition to new EDW deployments (migration to Spark, Redshift, etc.)
      - Typical queries
          - Joins across fact tables in different DWs, with aggregations before or after the JOIN
      - Example
          - Get customers with purchases higher than 100 USD that do not have a fidelity card (purchases and fidelity card data are in different DWs)
  • 28. Multiple DW integration [Diagram labels: Finance EDW (Time Dimension, Sales fact, Product Dimension, Region, City, Store); Marketing EDW (Customer, Fidelity facts, Product Dimension)] *Real Examples: Nationwide POC, IBM tests
  • 29. DW Historical Partitioning: Horizontal partitioning
      Only the most current data (e.g. last year) is in the EDW. Historical data is offloaded to a Hadoop cluster
      - Motivations
          - Reduce storage cost
          - Transparently use the two datasets as if they were all together
      - Typical queries
          - Facts are defined as a partitioned UNION based on date
          - Queries join the “virtual fact” with dimensions and aggregate on top
      - Example
          - Queries on current date only need to go to the DW, but longer timespans need to merge with Hadoop
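The partitioned UNION described on slide 29 can be sketched with two ordinary tables and a view. This is only an illustration using SQLite from Python: the table names `current_sales` and `historical_sales`, the cut-off date and the sample rows are all hypothetical, and both partitions live in one database here, whereas in the scenario on the slide they would sit in the EDW and in Hadoop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the two physical stores.
    CREATE TABLE current_sales    (sale_date TEXT, amount REAL);  -- plays the EDW
    CREATE TABLE historical_sales (sale_date TEXT, amount REAL);  -- plays Hadoop
    INSERT INTO current_sales    VALUES ('2016-03-01', 10.0), ('2016-05-15', 20.0);
    INSERT INTO historical_sales VALUES ('2014-07-04', 5.0), ('2015-11-30', 7.5);

    -- The virtual fact: a UNION whose branches are partitioned on date.
    CREATE VIEW sales AS
        SELECT sale_date, amount FROM current_sales    WHERE sale_date >= '2016-01-01'
        UNION ALL
        SELECT sale_date, amount FROM historical_sales WHERE sale_date <  '2016-01-01';
""")


def total_sales(since):
    """Sum sales on or after the given ISO date, querying only the view."""
    row = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE sale_date >= ?", (since,)
    ).fetchone()
    return row[0]


recent_total = total_sales("2016-01-01")  # only the current partition contributes
full_total = total_sales("2014-01-01")    # both partitions contribute
print(recent_total, full_total)
```

The consumer only ever queries `sales`; which physical store answers is an implementation detail of the view.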
  • 30. DW Historical offloading: Horizontal partitioning [Diagram labels: Time Dimension, Fact table (sales), Product Dimension, Retailer Dimension, Current Sales, Historical Sales, EDW]
  • 31. Slim DW extension: Vertical partitioning
      Minimal DW, with more complete raw data in a Hadoop cluster
      - Motivation
          - Reduce cost
          - Transparently use the two datasets as if they were all together
      - Typical queries
          - Tables are defined virtually as 1-to-1 joins between the two systems
          - Queries join the facts with dimensions and aggregate on top
      - Example
          - Common queries only need to go to the DW, but some queries need attributes or measures from Hadoop
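A vertically partitioned table, as on slide 31, can be pictured as a 1-to-1 join on a shared key. The sketch below uses SQLite from Python with hypothetical names (`slim_sales`, `extended_sales`, `channel`) and sample rows. SQLite always executes the join; a virtualization layer could skip the extended side when a query touches only the slim columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical split of one logical table across two stores.
    CREATE TABLE slim_sales     (sale_id INTEGER PRIMARY KEY, amount REAL);   -- plays the DW
    CREATE TABLE extended_sales (sale_id INTEGER PRIMARY KEY, channel TEXT);  -- plays Hadoop
    INSERT INTO slim_sales     VALUES (1, 10.0), (2, 20.0), (3, 5.0);
    INSERT INTO extended_sales VALUES (1, 'web'), (2, 'store'), (3, 'web');

    -- The virtual table: a 1-to-1 join on the shared key.
    CREATE VIEW sales AS
        SELECT s.sale_id, s.amount, e.channel
        FROM slim_sales s JOIN extended_sales e ON s.sale_id = e.sale_id;
""")

# A common query needs only columns held in the slim table.
common = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# A rarer query needs an attribute that lives only in the extended table.
by_channel = dict(
    conn.execute("SELECT channel, SUM(amount) FROM sales GROUP BY channel")
)
print(common, by_channel)
```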
  • 32. Slim DW extension: Vertical partitioning [Diagram labels: Time Dimension, Fact table (sales), Product Dimension, Retailer Dimension, Slim Sales, Extended Sales, EDW]
  • 34. It is a common assumption that a virtualized solution will be much slower than a persisted approach via ETL:
      1. There is a large amount of data moved through the network for each query
      2. Network transfer is slow
      But is this really true?
  • 35. Debunking the myths of virtual performance
      1. Complex queries can be solved transferring moderate data volumes when the right techniques are applied
          - Operational queries: predicate delegation produces small result sets
          - Logical Data Warehouse and Big Data: Denodo uses characteristics of underlying star schemas to apply query rewriting rules that maximize delegation to specialized sources (especially heavy GROUP BY) and minimize data movement
      2. Current networks are almost as fast as reading from disk
          - 10 Gb and 100 Gb Ethernet are a commodity
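As a rough back-of-the-envelope check of the second point, the idealized transfer time of a result set can be computed from its size and the link speed. The row count and the 100 bytes per row below are assumptions chosen for illustration, not figures from the deck, and latency and protocol overhead are ignored.

```python
def transfer_seconds(rows, bytes_per_row, link_gbits_per_s):
    """Idealized time to ship a result set over a link (no latency or overhead)."""
    total_bits = rows * bytes_per_row * 8
    return total_bits / (link_gbits_per_s * 1e9)


# Assumed workload: 2 million rows of about 100 bytes each.
t_10g = transfer_seconds(2_000_000, 100, 10)    # 10 Gb Ethernet
t_100g = transfer_seconds(2_000_000, 100, 100)  # 100 Gb Ethernet
print(f"10 Gb/s: {t_10g:.3f} s, 100 Gb/s: {t_100g:.3f} s")
```

Under these assumptions the transfer takes a fraction of a second, so the cost that matters is how many rows a query plan moves, which is what the optimizations on the following slides reduce.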
  • 36. Performance Comparison: Logical Data Warehouse vs. Physical Data Warehouse
      Denodo has done extensive testing using queries from the standard benchmarking test TPC-DS* and the following scenario. It compares the performance of a federated approach in Denodo with an MPP system where all the data has been replicated via ETL.
      - Tables on both sides of the comparison: Sales Facts (290 M rows), Customer Dim. (2 M rows), Items Dim. (400 K rows)
      * TPC-DS is the de facto industry standard benchmark for measuring the performance of decision support solutions including, but not limited to, Big Data systems.
  • 37. Performance Comparison: Logical Data Warehouse vs. Physical Data Warehouse
      Query Description | Returned Rows | Time Netezza | Time Denodo (Federated Oracle, Netezza & SQL Server) | Optimization Technique (automatically selected)
      Total sales by customer | 1.99 M | 20.9 sec. | 21.4 sec. | Full aggregation push-down
      Total sales by customer and year between 2000 and 2004 | 5.51 M | 52.3 sec. | 59.0 sec. | Full aggregation push-down
      Total sales by item brand | 31.35 K | 4.7 sec. | 5.0 sec. | Partial aggregation push-down
      Total sales by item where sale price less than current list price | 17.05 K | 3.5 sec. | 5.2 sec. | On the fly data movement
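To make the comparison on slide 37 easier to read, the relative overhead of the federated run can be derived from the two timing columns. The short sketch below copies the timings from the slide; the `overhead_pct` helper is just illustrative arithmetic, not part of any Denodo tooling.

```python
# (query, seconds on Netezza, seconds on federated Denodo), copied from slide 37.
results = [
    ("Total sales by customer", 20.9, 21.4),
    ("Total sales by customer and year between 2000 and 2004", 52.3, 59.0),
    ("Total sales by item brand", 4.7, 5.0),
    ("Total sales by item where sale price less than current list price", 3.5, 5.2),
]


def overhead_pct(physical_s, federated_s):
    """Extra time of the federated run, as a percentage of the physical run."""
    return round((federated_s - physical_s) / physical_s * 100, 1)


overheads = {query: overhead_pct(p, f) for query, p, f in results}
for query, pct in overheads.items():
    print(f"{query}: +{pct}%")
```

On these four queries the overhead ranges from about 2% to about 49%, with absolute differences between 0.3 and 6.7 seconds.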
  • 38. Performance and optimizations in Denodo: Focused on 3 core concepts
      - Dynamic Multi-Source Query Execution Plans
          - Leverages processing power & architecture of data sources
          - Dynamic to support ad hoc queries
          - Uses statistics for cost-based query plans
      - Selective Materialization
          - Intelligent caching of only the most relevant and often used information
      - Optimized Resource Management
          - Smart allocation of resources to handle high concurrency
          - Throttling to control and mitigate source impact
          - Resource plans based on rules
  • 39. Performance and optimizations in Denodo: Comparing optimizations in DV vs ETL
      - Although Data Virtualization is a data integration platform, architecturally speaking it is more similar to an RDBMS
          - Uses relational logic
          - Metadata is equivalent to that of a database
          - Enables ad hoc querying
      - Key difference between ETL engines and DV:
          - ETL engines are optimized for static bulk movements: fixed data flows
          - Data virtualization is optimized for queries: dynamic execution plan per query
      - Therefore, the performance architecture presented here resembles that of an RDBMS
  • 41. How Dynamic Query Optimizer Works: Step by Step
      - Metadata Query Tree
          - Maps query entities (tables, fields) to actual metadata
          - Retrieves execution capabilities and restrictions for views involved in the query
      - Static Optimizer
          - Query delegation
          - SQL rewriting rules (removal of redundant filters, tree pruning, join reordering, transformation push-up, star-schema rewritings, etc.)
          - Data movement query plans
      - Cost Based Optimizer
          - Picks optimal JOIN methods and orders based on data distribution statistics, indexes, transfer rates, etc.
      - Physical Execution Plan
          - Creates the calls to the underlying systems in their corresponding protocols and dialects (SQL, MDX, WS calls, etc.)
  • 42. How Dynamic Query Optimizer Works. Example: Total sales by retailer and product during the last month for the brand ACME
      [Diagram labels: Time Dimension, Fact table (sales), Product Dimension, Retailer Dimension, EDW, MDM]
      SELECT retailer.name, product.name, SUM(sales.amount)
      FROM sales
        JOIN retailer ON sales.retailer_fk = retailer.id
        JOIN product ON sales.product_fk = product.id
        JOIN time ON sales.time_fk = time.id
      WHERE time.date > ADDMONTH(NOW(),-1) AND product.brand = 'ACME'
      GROUP BY product.name, retailer.name
  • 43. How Dynamic Query Optimizer Works. Example: Non-optimized
      Plan: three JOINs over four separate source queries, then GROUP BY product.name, retailer.name
      Row counts shown in the diagram: 1,000,000,000 (sales), 100 (retailer), 10 (product), 30 (time), and 10,000,000
      - SELECT sales.retailer_fk, sales.product_fk, sales.time_fk, sales.amount FROM sales
      - SELECT retailer.name, retailer.id FROM retailer
      - SELECT product.name, product.id FROM product WHERE product.brand = 'ACME'
      - SELECT time.date, time.id FROM time WHERE time.date > add_months(CURRENT_TIMESTAMP, -1)
  • 44. How Dynamic Query Optimizer Works. Step 1: Applies JOIN reordering to maximize delegation
      Plan: two JOINs over three source queries, then GROUP BY product.name, retailer.name
      Row counts shown in the diagram: 100,000,000 (sales joined with time), 100 (retailer), 10 (product), and 10,000,000
      - SELECT sales.retailer_fk, sales.product_fk, sales.amount FROM sales JOIN time ON sales.time_fk = time.id WHERE time.date > add_months(CURRENT_TIMESTAMP, -1)
      - SELECT retailer.name, retailer.id FROM retailer
      - SELECT product.name, product.id FROM product WHERE product.brand = 'ACME'
  • 45. How Dynamic Query Optimizer Works. Step 2: Since the JOIN is on foreign keys (1-to-many), and the GROUP BY is on attributes from the dimensions, it applies the partial aggregation push-down optimization
      Plan: two JOINs over three source queries, then GROUP BY product.name, retailer.name
      Row counts shown in the diagram: 10,000 (pre-aggregated sales), 100 (retailer), 10 (product), and 1,000
      - SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount) FROM sales JOIN time ON sales.time_fk = time.id WHERE time.date > add_months(CURRENT_TIMESTAMP, -1) GROUP BY sales.retailer_fk, sales.product_fk
      - SELECT retailer.name, retailer.id FROM retailer
      - SELECT product.name, product.id FROM product WHERE product.brand = 'ACME'
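The rewrite in Step 2 relies on the fact that pre-aggregating the facts by foreign key and then joining gives the same totals as joining first and aggregating afterwards. The sketch below checks that equivalence on a tiny, made-up dataset in SQLite. The schema is reduced to one dimension, and everything runs in one database, so it illustrates the rewriting rule only, not Denodo's optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
    CREATE TABLE sales   (product_fk INTEGER, amount REAL);
    INSERT INTO product VALUES (1, 'anvil', 'ACME'), (2, 'rocket', 'ACME'), (3, 'widget', 'Other');
    INSERT INTO sales VALUES (1, 10.0), (1, 15.0), (2, 7.0), (3, 99.0), (2, 3.0);
""")

# Naive plan: join every fact row to the dimension, then aggregate.
naive = conn.execute("""
    SELECT p.name, SUM(s.amount)
    FROM sales s JOIN product p ON s.product_fk = p.id
    WHERE p.brand = 'ACME'
    GROUP BY p.name ORDER BY p.name
""").fetchall()

# Pushed-down plan: pre-aggregate the facts by foreign key (the part that would
# be delegated to the DW), then join the smaller result and aggregate again.
pushed = conn.execute("""
    SELECT p.name, SUM(pre.total)
    FROM (SELECT product_fk, SUM(amount) AS total
          FROM sales GROUP BY product_fk) pre
    JOIN product p ON pre.product_fk = p.id
    WHERE p.brand = 'ACME'
    GROUP BY p.name ORDER BY p.name
""").fetchall()

print(naive, pushed)
```

With five fact rows the inner query returns three; on the slide the same rewrite is shown shrinking the transferred set by several orders of magnitude.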
  • 46. How Dynamic Query Optimizer Works. Step 3: Selects the right JOIN strategy based on costs for data volume estimations
      Plan: a NESTED JOIN and a HASH JOIN over three source queries, then GROUP BY product.name, retailer.name
      Row counts shown in the diagram: 1,000 (pre-aggregated sales filtered by product), 100 (retailer), 10 (product), and 1,000
      - SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount) FROM sales JOIN time ON sales.time_fk = time.id WHERE time.date > add_months(CURRENT_TIMESTAMP, -1) AND sales.product_fk IN (1,2,…) GROUP BY sales.retailer_fk, sales.product_fk
      - SELECT retailer.name, retailer.id FROM retailer
      - SELECT product.name, product.id FROM product WHERE product.brand = 'ACME'
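The difference between the two join strategies named in Step 3 can be sketched in a few lines of plain Python. This is a simplified illustration with made-up rows: `remote_sales` is a hypothetical stand-in for the delegated query with its added IN (...) filter, and the second value each function returns is the number of fact rows that had to be fetched.

```python
products = [{"id": 1, "name": "anvil"}, {"id": 2, "name": "rocket"}]
sales = [
    {"product_fk": 1, "total": 25.0},
    {"product_fk": 2, "total": 10.0},
    {"product_fk": 3, "total": 99.0},
]


def remote_sales(product_ids):
    """Hypothetical delegated query: only rows whose key is in the IN list."""
    return [s for s in sales if s["product_fk"] in product_ids]


def nested_join(dim_rows, fetch):
    """Read the small side first, then fetch only the matching fact rows."""
    by_id = {d["id"]: d for d in dim_rows}
    fetched = fetch(set(by_id))
    joined = [(by_id[f["product_fk"]]["name"], f["total"]) for f in fetched]
    return joined, len(fetched)


def hash_join(dim_rows, fact_rows):
    """Fetch both sides in full, build a hash table on one, probe with the other."""
    by_id = {d["id"]: d for d in dim_rows}  # build phase
    joined = [
        (by_id[f["product_fk"]]["name"], f["total"])
        for f in fact_rows
        if f["product_fk"] in by_id          # probe phase
    ]
    return joined, len(fact_rows)


nested_result, nested_fetched = nested_join(products, remote_sales)
hash_result, hash_fetched = hash_join(products, sales)
print(nested_result, nested_fetched, hash_fetched)
```

Both strategies return the same rows. The nested variant pays off when one side is small enough for its keys to be shipped as a filter, which is the kind of decision the slide attributes to cost estimation.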
  • 47. How Dynamic Query Optimizer Works: Summary
      1. Automatic JOIN reordering
          - Groups branches that go to the same source to maximize query delegation and reduce processing in the DV layer
          - End users don’t need to worry about the optimal “pairing” of the tables
      2. The partial aggregation push-down optimization is key in those scenarios. Based on PK-FK restrictions, it pushes the aggregation (for the PKs) to the DW
          - Leverages the processing power of the DW, optimized for these aggregations
          - Significantly reduces the data transferred through the network (from 1 billion rows to 10 thousand)
      3. The Cost-based Optimizer picks the right JOIN strategies based on estimations of data volumes, existence of indexes, transfer rates, etc.
          - Denodo estimates costs differently for parallel databases (Vertica, Netezza, Teradata) than for regular databases, to take into consideration the different way those systems operate (distributed data, parallel processing, different aggregation techniques, etc.)
  • 48. How Dynamic Query Optimizer Works: Other relevant optimization techniques for LDW and Big Data
      - Automatic data movement
          - Creation of temp tables in one of the systems to enable complete delegation
          - Only considered as an option if the target source has the “data movement” option enabled
          - Use of native bulk load APIs for better performance
      - Execution Alternatives
          - If a view exists in more than one system, Denodo can decide at execution time which one to use
          - The goal is to maximize query delegation depending on the other tables involved in the query
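The idea behind automatic data movement can be sketched with two independent SQLite databases standing in for two sources. The table names and rows are hypothetical, and a plain `executemany` replaces the native bulk load APIs the slide mentions. The point is only that copying the small table next to the large one lets a single source execute the whole join and aggregation.

```python
import sqlite3

# Two independent in-memory databases stand in for two separate sources.
dw = sqlite3.connect(":memory:")
crm = sqlite3.connect(":memory:")
dw.executescript("""
    CREATE TABLE sales (customer_fk INTEGER, amount REAL);
    INSERT INTO sales VALUES (1, 10.0), (1, 5.0), (2, 20.0), (3, 1.0);
""")
crm.executescript("""
    CREATE TABLE customer (id INTEGER, segment TEXT);
    INSERT INTO customer VALUES (1, 'gold'), (2, 'silver'), (3, 'gold');
""")

# Data movement: copy the small table into a temp table beside the large one.
rows = crm.execute("SELECT id, segment FROM customer").fetchall()
dw.execute("CREATE TEMP TABLE tmp_customer (id INTEGER, segment TEXT)")
dw.executemany("INSERT INTO tmp_customer VALUES (?, ?)", rows)

# The join and the aggregation can now be delegated entirely to that source.
by_segment = dw.execute("""
    SELECT c.segment, SUM(s.amount)
    FROM sales s JOIN tmp_customer c ON s.customer_fk = c.id
    GROUP BY c.segment ORDER BY c.segment
""").fetchall()
print(by_segment)
```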
  • 49. How Dynamic Query Optimizer Works: Other relevant optimization techniques for LDW and Big Data
      Optimizations for Virtual Partitioning: eliminates unnecessary queries and processing based on a pre-execution analysis of the views and the queries
      - Pruning of unnecessary JOIN branches
          - Relevant for vertical partitioning and “fat” semantic models when queries do not need attributes from all the tables
      - Pruning of unnecessary UNION branches
          - Enables detection of unnecessary UNION branches in horizontal partitioning scenarios
      - Push down of JOIN under UNION views
          - Enables the delegation of JOINs with dimensions
      - Automatic data movement for partition scenarios
          - Enables the delegation of JOINs with dimensions
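UNION branch pruning comes down to an interval-overlap test between the date range each branch covers and the range a query asks for. The sketch below is a minimal illustration: the partition names and the cut-off date are hypothetical, and a real optimizer would derive the ranges from the view definitions rather than from a hard-coded list.

```python
from datetime import date

# Each branch of the partitioned UNION covers a half-open range [start, end).
PARTITIONS = [
    ("historical_sales", date.min, date(2016, 1, 1)),
    ("current_sales", date(2016, 1, 1), date.max),
]


def branches_needed(query_start, query_end):
    """Return the branches whose range overlaps [query_start, query_end)."""
    return [
        name
        for name, start, end in PARTITIONS
        if start < query_end and query_start < end
    ]


print(branches_needed(date(2016, 5, 1), date(2016, 6, 1)))  # recent data only
print(branches_needed(date(2015, 6, 1), date(2016, 2, 1)))  # spans the cut-off
```

A query confined to recent dates touches one source only, matching the example given for historical partitioning earlier in the deck.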
  • 51. Caching: Real time vs. caching
      Sometimes, real-time access and federation are not a good fit:
      - Sources are slow (e.g. text files, cloud apps like Salesforce.com)
      - A lot of data processing is needed (e.g. complex combinations, transformations, matching, cleansing, etc.)
      - Access is limited, or the impact on the sources has to be mitigated
      For these scenarios, Denodo can replicate just the relevant data in the cache
  • 52. Caching: Overview
      - Denodo’s cache system is based on an external relational database
          - Traditional (Oracle, SQLServer, DB2, MySQL, etc.)
          - MPP (Teradata, Netezza, Vertica, Redshift, etc.)
          - In-memory storage (Oracle TimesTen, SAP HANA)
      - Works at view level
          - Allows hybrid access (real-time / cached) of an execution tree
      - Cache Control (population / maintenance)
          - Manually: user initiated at any time
          - Time based: using the TTL or the Denodo Scheduler
          - Event based: e.g. using JMS messages triggered in the DB
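The time-based and manual cache controls listed on slide 52 can be illustrated with a small view-level cache. The class below is a generic sketch, not Denodo's implementation: `TTLViewCache`, its `load` callback and the injectable `clock` are all hypothetical names introduced for the example.

```python
import time


class TTLViewCache:
    """Caches the rows of one view and reloads them once the TTL has elapsed."""

    def __init__(self, load, ttl_seconds, clock=time.monotonic):
        self._load = load          # callable that queries the real source
        self._ttl = ttl_seconds
        self._clock = clock
        self._rows = None
        self._loaded_at = None

    def invalidate(self):
        """Manual or event-driven refresh: drop the cached copy."""
        self._rows = None

    def rows(self):
        """Return cached rows, reloading from the source if missing or expired."""
        now = self._clock()
        expired = self._rows is None or now - self._loaded_at >= self._ttl
        if expired:
            self._rows = self._load()
            self._loaded_at = now
        return self._rows


# Usage: cache a slow source for 60 seconds.
cache = TTLViewCache(lambda: [("ACME", 35.0)], ttl_seconds=60)
print(cache.rows())
```

An event-based refresh would call `invalidate()` from whatever handler receives the change notification.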
  • 54. Further Reading
      - Data Virtualization Blog (http://www.datavirtualizationblog.com). Check the following articles written by our CTO Alberto Pan in our blog:
          - Myths in data virtualization performance
          - Performance of Data Virtualization in Logical Data Warehouse scenarios
          - Physical vs Logical Data Warehouse: the numbers
          - Cost Based Optimization in Data Virtualization
      - Denodo Cookbook
          - Data Warehouse Offloading
  • 56. Autodesk Overview
      - Founded 1982 (NASDAQ: ADSK)
      - Annual revenues (FY 2015) $2.5B
          - Over 8,800 employees
      - 3D modeling and animation software
          - Flagship product is AutoCAD
      - Market sectors:
          - Architecture, Engineering, and Construction
          - Manufacturing
          - Media and Entertainment
          - Recently started 3D Printing offerings
  • 57. Business Drivers for Change
      - Software consumption model is changing
          - Perpetual licenses to subscriptions
          - Users want more flexibility in how they use software
      - Autodesk needed to transition to subscription pricing
          - 2016: some products will be subscription only
      - Lifetime revenue higher with subscriptions
          - Over 3-5 years, subscriptions = more revenues
      - Changing a licensing model is disruptive
  • 58. Technology Challenges
      - Current ‘traditional’ BI/EDW architecture not designed for data streams from online apps
          - Weblogs, Clickstreams, Cloud/Desktop apps, etc.
      - Existing infrastructure can’t simply ‘go away’
          - Regulatory reporting (e.g. SEC)
          - Existing ‘perpetual’ customers
      - ‘Subscription’ infrastructure must work in parallel
          - Extend and enhance existing systems
          - With single access point to all data
      - Solution: ‘Logical Data Warehouse’
  • 59. Logical Data Warehouse at Autodesk
  • 60. Logical Data Warehouse at Autodesk: Traditional BI/Reporting
  • 61. Logical Data Warehouse at Autodesk: ‘New Data’ Ingestion
  • 62. Logical Data Warehouse Example: Reporting on Combined Data
  • 63. Case Study: Autodesk Successfully Changes Their Revenue Model and Transforms Business
      Autodesk, Inc. is an American multinational software corporation that makes software for the architecture, engineering, construction, manufacturing, media, and entertainment industries.
      Problem, Solution, Results:
      - Autodesk was changing their business revenue model from a conventional perpetual license model to subscription-based license model.
      - Inability to deliver high quality data in a timely manner to business stakeholders.
      - Evolution from traditional operational data warehouse to contemporary logical data warehouse deemed necessary for faster speed.
      - General purpose platform to deliver data through logical data warehouse.
      - Denodo Abstraction Layer helps live invoicing with SAP.
      - Data virtualization enabled a culture of “see before you build”.
      - Successfully transitioned to subscription-based licensing.
      - For the first time, Autodesk can do single point security enforcement and have uniform data environment for access.
  • 64. Q&A
  • 65. Thanks! www.denodo.com info@denodo.com © Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.