SlideShare ist ein Scribd-Unternehmen logo
1 von 32
Downloaden Sie, um offline zu lesen
O C T O B E R 1 8 , 2 0 1 6 S A N F R A N C I S C O B A Y A R E A , C A
#DenodoDataFest
RAPID, AGILE DATA STRATEGIES
For Accelerating Analytics, Cloud, and Big Data Initiatives.
Logical Data Lakes: Performance
Considerations
Dr. Alberto Pan
Denodo, CTO
Agenda
1.Logical Architectures for Analytics: The Role of Data Virtualization
2.Example Scenario: The Numbers
3.Performance: The “Move Processing to the Data” Paradigm
4.Performance: Cost-based Optimization in Logical Architectures
Logical Architectures for Analytics
The Role of Data Virtualization
5
The Logical Data Warehouse
6
Logical Data Lake
Real-Time
Decision
Management
Alerts
Scorecards
Dashboards
Reporting
Data Discovery
Self-Service
Search
Predictive
Analytics
Statistical
Analytics (R)
Text Analytics
Data Mining
Data Warehouse
Sensor Data
Machine Data (Logs)
Social Data
Clickstream Data
Internet Data
Image and Video
Enterprise Content
(Unstructured)
Big
Data
Enterprise
Applications
Traditional
Enterprise
Data
Cloud
Cloud
Applications
Metadata Management, Data Governance, Data Security
NoSQL
EDW
In-Memory
(SAP Hana, …)
Analytical
Appliances
Cloud DW
(Redshift,..)
ODS
Big Data
E
T
L
C
D
C
S
q
o
o
p
(Flume, Kafka, …)
Real-Time Data Access (On-Demand / Streaming)
Batch
YARN / Workload Management
HDFS
Hive
Spark
Drill
Impala
Storm HBase Solr
Hunk
DW Streams NoSQL SearchSQL
Hadoop
Tez
Map
Red.
7
The Logical Data Warehouse
8
The Logical Data Warehouse: Data Virtualization
9
Logical Data Lake
Real-Time
Decision
Management
Alerts
Scorecards
Dashboards
Reporting
Data Discovery
Self-Service
Search
Predictive
Analytics
Statistical
Analytics (R)
Text Analytics
Data Mining
Data Warehouse
Sensor Data
Machine Data (Logs)
Social Data
Clickstream Data
Internet Data
Image and Video
Enterprise Content
(Unstructured)
Big
Data
Enterprise
Applications
Traditional
Enterprise
Data
Cloud
Cloud
Applications
Metadata Management, Data Governance, Data Security
NoSQL
EDW
In-Memory
(SAP Hana, …)
Analytical
Appliances
Cloud DW
(Redshift,..)
ODS
Big Data
E
T
L
C
D
C
S
q
o
o
p
(Flume, Kafka, …)
Real-Time Data Access (On-Demand / Streaming)
Batch
YARN / Workload Management
HDFS
Hive
Spark
Drill
Impala
Storm HBase Solr
Hunk
DW Streams NoSQL SearchSQL
Hadoop
Tez
Map
Red.
10
Logical Data Lake: Data Virtualization
Real-Time
Decision
Management
Alerts
Scorecards
Dashboards
Reporting
Data Discovery
Self-Service
Search
Predictive
Analytics
Statistical
Analytics (R)
Text Analytics
Data Mining
Data Warehouse
Sensor Data
Machine Data (Logs)
Social Data
Clickstream Data
Internet Data
Image and Video
Enterprise Content
(Unstructured)
Big
Data
Enterprise
Applications
Traditional
Enterprise
Data
Cloud
Cloud
Applications
NoSQL
EDW
In-Memory
(SAP Hana, …)
Analytical
Appliances
Cloud DW
(Redshift,..)
ODS
Big Data
E
T
L
C
D
C
S
q
o
o
p
(Flume, Kafka, …)
Data Virtualization
Real-Time Data Access (On-Demand / Streaming)
Data Caching
DataServices
Data Search & Discovery
Governance
Security
Optimization
DataAbstraction
DataTransformation
DataFederation
Batch
YARN / Workload Management
HDFS
Hive
Spark
Drill
Impala
Storm HBase Solr
Hunk
DW Streams NoSQL SearchSQL
Hadoop
Tez
Map
Red.
11
The Role of Data Virtualization
Combines data from several systems and publish it to the desired
format with a few clicks
Denodo Data Virtualization is the only option verifying:
12
Example: Distributed Report
Demo Scenario of LDW
Sales Data (TPC-DS)
(280M)
Customer (TPC-DS)
(2M)
MDM
12
SQL on Hadoop
Total Sales by Customer
13
The Role of Data Virtualization
Combines data from several systems and publish it to the desired
format with a few clicks
Abstracts applications from changes in the underlying infrastructure
Expose different logical views over the same data
Single entry point to apply Security and Governance
Denodo Data Virtualization is the only option verifying:
14
DW + MDM Dimensions
Common LDW Patterns
Time
Dimension
Fact table
(sales) Product
Dimension
Retailer
Dimension
EDW MDM
14
DW Historical Offloading (Cold Data Storage)
Common LDW Patterns
Time Dimension Fact table
(sales) Product Dimension
Retailer
Dimension
EDW Hadoop
Current Sales Historical Sales
15
DW + Cloud Dimensional Data
Common LDW Patterns
Time Dimension Fact table
(sales) Product Dimension
Customer
Dimension
EDW CRM
SFDC
Customer
16
Data Warehouse Federation
Common LDW Patterns
Time
Dimensi
on
Fact table
(sales)
Customer
Dimension
Region
EDW
City
EDW
Fact table
(Fidelity)
Customer
Dimension
17
Example Scenario: The Numbers
19
Denodo has done extensive testing using queries from the standard benchmarking test
TPC-DS* and the following scenario
Compares the performance of a federated approach in Denodo with an MPP system where
all the data has been replicated via ETL
Customer Dim.
2 M rows
Sales Facts
290 M rows
Items Dim.
400 K rows
* TPC-DS is the de-facto industry standard benchmark for
measuring the performance of decision support solutions including,
but not limited to, Big Data systems.
vs.
Sales Facts
290 M rows
Items Dim.
400 K rows
Customer Dim.
2 M rows
Denodo 6.0 Architecture
Performance Comparison – Logical Data Warehouse vs. Physical Data Warehouse
20
Denodo 6.0 Architecture
Query Description
Returned
Rows
Time Netezza
Time Denodo
(Federated Oracle,
Netezza & SQL Server)
Optimization Technique
(automatically selected)
Total sales by customer 1,99 M 20.9 sec. 21.4 sec. Full aggregation push-down
Total sales by customer and
year between 2000 and 2004
5,51 M 52.3 sec. 59.0 sec Full aggregation push-down
Total sales by item brand 31,35 K 4.7 sec. 5.0 sec. Partial aggregation push-down
Total sales by item where
sale price less than current
list price
17,05 K 3.5 sec. 5.2 sec On the fly data movement
Performance Comparison – Logical Data Warehouse vs. Physical Data Warehouse
Performance
The “Move Processing to the Data” Paradigm
22
Move Processing to the Data
Process the data where it resides
Process the data locally where
it resides
DV System combines partial
results
Minimizes network traffic
Leverages specialized data
sources
23
Move Processing to the Data: Example 1
Obtain Total Sales By Product (Naive Strategy)
Naive Strategy:
350M rows moved through the network
24
Move Processing to the Data: Example 1
Obtain Total Sales By Product (Move Processing to the Data)
Denodo Strategy:
30k rows moved through the network
25
Move Processing to the Data: Example 1 (Alternative)
Obtain Total Sales By Product (Dimension Table Replicated)
Denodo Strategy:
Dimension Table Replicated
Denodo Strategy:
20k rows moved through the network
26
Move Processing to the Data: Example 2
Execution Strategy:
Full aggregation pushdown not possible
Two possible techniques:
- On-the-fly data movement
- Partial aggregation pushdown
Maximum Sales Discount By Product in The last year
Sales Discount: list_price (Product) – sale_price (Sales)
27
Move Processing to the Data: Example 2
Maximum Sales Discount By Product in the last year: On-the-fly Data Movement
Move Products Data to a Temp table in the DW :
20K rows moved through the network + 10K
rows inserted in the DW
Execute full query on the DW:
10k rows through the network
28
Move Processing to the Data: Example 2
Maximum Sales Discount By Product in the last year: Partial aggregation Pushdown
Products DB:
10K rows through the network
Data Warehouse:
#rows through the network = 10K * average
#sale_prices_per_product
Performance
Cost-based Optimization in Logical Architectures
30
How to Choose the Best Execution Plan?
Cost-Based Optimization in Data Virtualization
Data statistics to estimate size of intermediate result sets
Data Source Indexes (and other physical structures)
Execution Model of data sources: e.g. Parallel Databases VS
Hadoop clusters VS Relational Databases
Features of data sources (e.g. number of processing cores in
parallel database or Hadoop Cluster)
Data Transfer rate
Must take into account:
Q&A
Find more details at: datavirtualization.blog
http://www.datavirtualizationblog.com/myths-in-data-
virtualization-performance/
Thank you!
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any for or by any means, electronic or mechanical, including photocopying and
microfilm, without prior the written authorization from Denodo Technologies.
O C T O B E R 1 8 , 2 0 1 6 S A N F R A N C I S C O B A Y A R E A , C A
#DenodoDataFest

Weitere ähnliche Inhalte

Mehr von Denodo

Enterprise Monitoring and Auditing in Denodo
Enterprise Monitoring and Auditing in DenodoEnterprise Monitoring and Auditing in Denodo
Enterprise Monitoring and Auditing in DenodoDenodo
 
Lunch and Learn ANZ: Mastering Cloud Data Cost Control: A FinOps Approach
Lunch and Learn ANZ: Mastering Cloud Data Cost Control: A FinOps ApproachLunch and Learn ANZ: Mastering Cloud Data Cost Control: A FinOps Approach
Lunch and Learn ANZ: Mastering Cloud Data Cost Control: A FinOps ApproachDenodo
 
Achieving Self-Service Analytics with a Governed Data Services Layer
Achieving Self-Service Analytics with a Governed Data Services LayerAchieving Self-Service Analytics with a Governed Data Services Layer
Achieving Self-Service Analytics with a Governed Data Services LayerDenodo
 
What you need to know about Generative AI and Data Management?
What you need to know about Generative AI and Data Management?What you need to know about Generative AI and Data Management?
What you need to know about Generative AI and Data Management?Denodo
 
Mastering Data Compliance in a Dynamic Business Landscape
Mastering Data Compliance in a Dynamic Business LandscapeMastering Data Compliance in a Dynamic Business Landscape
Mastering Data Compliance in a Dynamic Business LandscapeDenodo
 
Denodo Partner Connect: Business Value Demo with Denodo Demo Lite
Denodo Partner Connect: Business Value Demo with Denodo Demo LiteDenodo Partner Connect: Business Value Demo with Denodo Demo Lite
Denodo Partner Connect: Business Value Demo with Denodo Demo LiteDenodo
 
Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...
Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...
Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...Denodo
 
Drive Data Privacy Regulatory Compliance
Drive Data Privacy Regulatory ComplianceDrive Data Privacy Regulatory Compliance
Drive Data Privacy Regulatory ComplianceDenodo
 
Знакомство с виртуализацией данных для профессионалов в области данных
Знакомство с виртуализацией данных для профессионалов в области данныхЗнакомство с виртуализацией данных для профессионалов в области данных
Знакомство с виртуализацией данных для профессионалов в области данныхDenodo
 
Data Democratization: A Secret Sauce to Say Goodbye to Data Fragmentation
Data Democratization: A Secret Sauce to Say Goodbye to Data FragmentationData Democratization: A Secret Sauce to Say Goodbye to Data Fragmentation
Data Democratization: A Secret Sauce to Say Goodbye to Data FragmentationDenodo
 
Denodo Partner Connect - Technical Webinar - Ask Me Anything
Denodo Partner Connect - Technical Webinar - Ask Me AnythingDenodo Partner Connect - Technical Webinar - Ask Me Anything
Denodo Partner Connect - Technical Webinar - Ask Me AnythingDenodo
 
Lunch and Learn ANZ: Key Takeaways for 2023!
Lunch and Learn ANZ: Key Takeaways for 2023!Lunch and Learn ANZ: Key Takeaways for 2023!
Lunch and Learn ANZ: Key Takeaways for 2023!Denodo
 
It’s a Wrap! 2023 – A Groundbreaking Year for AI and The Way Forward
It’s a Wrap! 2023 – A Groundbreaking Year for AI and The Way ForwardIt’s a Wrap! 2023 – A Groundbreaking Year for AI and The Way Forward
It’s a Wrap! 2023 – A Groundbreaking Year for AI and The Way ForwardDenodo
 
Quels sont les facteurs-clés de succès pour appliquer au mieux le RGPD à votr...
Quels sont les facteurs-clés de succès pour appliquer au mieux le RGPD à votr...Quels sont les facteurs-clés de succès pour appliquer au mieux le RGPD à votr...
Quels sont les facteurs-clés de succès pour appliquer au mieux le RGPD à votr...Denodo
 
Lunch and Learn ANZ: Achieving Self-Service Analytics with a Governed Data Se...
Lunch and Learn ANZ: Achieving Self-Service Analytics with a Governed Data Se...Lunch and Learn ANZ: Achieving Self-Service Analytics with a Governed Data Se...
Lunch and Learn ANZ: Achieving Self-Service Analytics with a Governed Data Se...Denodo
 
How to Build Your Data Marketplace with Data Virtualization?
How to Build Your Data Marketplace with Data Virtualization?How to Build Your Data Marketplace with Data Virtualization?
How to Build Your Data Marketplace with Data Virtualization?Denodo
 
Webinar #2 - Transforming Challenges into Opportunities for Credit Unions
Webinar #2 - Transforming Challenges into Opportunities for Credit UnionsWebinar #2 - Transforming Challenges into Opportunities for Credit Unions
Webinar #2 - Transforming Challenges into Opportunities for Credit UnionsDenodo
 
Enabling Data Catalog users with advanced usability
Enabling Data Catalog users with advanced usabilityEnabling Data Catalog users with advanced usability
Enabling Data Catalog users with advanced usabilityDenodo
 
Denodo Partner Connect: Technical Webinar - Architect Associate Certification...
Denodo Partner Connect: Technical Webinar - Architect Associate Certification...Denodo Partner Connect: Technical Webinar - Architect Associate Certification...
Denodo Partner Connect: Technical Webinar - Architect Associate Certification...Denodo
 
GenAI y el futuro de la gestión de datos: mitos y realidades
GenAI y el futuro de la gestión de datos: mitos y realidadesGenAI y el futuro de la gestión de datos: mitos y realidades
GenAI y el futuro de la gestión de datos: mitos y realidadesDenodo
 

Mehr von Denodo (20)

Enterprise Monitoring and Auditing in Denodo
Enterprise Monitoring and Auditing in DenodoEnterprise Monitoring and Auditing in Denodo
Enterprise Monitoring and Auditing in Denodo
 
Lunch and Learn ANZ: Mastering Cloud Data Cost Control: A FinOps Approach
Lunch and Learn ANZ: Mastering Cloud Data Cost Control: A FinOps ApproachLunch and Learn ANZ: Mastering Cloud Data Cost Control: A FinOps Approach
Lunch and Learn ANZ: Mastering Cloud Data Cost Control: A FinOps Approach
 
Achieving Self-Service Analytics with a Governed Data Services Layer
Achieving Self-Service Analytics with a Governed Data Services LayerAchieving Self-Service Analytics with a Governed Data Services Layer
Achieving Self-Service Analytics with a Governed Data Services Layer
 
What you need to know about Generative AI and Data Management?
What you need to know about Generative AI and Data Management?What you need to know about Generative AI and Data Management?
What you need to know about Generative AI and Data Management?
 
Mastering Data Compliance in a Dynamic Business Landscape
Mastering Data Compliance in a Dynamic Business LandscapeMastering Data Compliance in a Dynamic Business Landscape
Mastering Data Compliance in a Dynamic Business Landscape
 
Denodo Partner Connect: Business Value Demo with Denodo Demo Lite
Denodo Partner Connect: Business Value Demo with Denodo Demo LiteDenodo Partner Connect: Business Value Demo with Denodo Demo Lite
Denodo Partner Connect: Business Value Demo with Denodo Demo Lite
 
Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...
Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...
Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...
 
Drive Data Privacy Regulatory Compliance
Drive Data Privacy Regulatory ComplianceDrive Data Privacy Regulatory Compliance
Drive Data Privacy Regulatory Compliance
 
Знакомство с виртуализацией данных для профессионалов в области данных
Знакомство с виртуализацией данных для профессионалов в области данныхЗнакомство с виртуализацией данных для профессионалов в области данных
Знакомство с виртуализацией данных для профессионалов в области данных
 
Data Democratization: A Secret Sauce to Say Goodbye to Data Fragmentation
Data Democratization: A Secret Sauce to Say Goodbye to Data FragmentationData Democratization: A Secret Sauce to Say Goodbye to Data Fragmentation
Data Democratization: A Secret Sauce to Say Goodbye to Data Fragmentation
 
Denodo Partner Connect - Technical Webinar - Ask Me Anything
Denodo Partner Connect - Technical Webinar - Ask Me AnythingDenodo Partner Connect - Technical Webinar - Ask Me Anything
Denodo Partner Connect - Technical Webinar - Ask Me Anything
 
Lunch and Learn ANZ: Key Takeaways for 2023!
Lunch and Learn ANZ: Key Takeaways for 2023!Lunch and Learn ANZ: Key Takeaways for 2023!
Lunch and Learn ANZ: Key Takeaways for 2023!
 
It’s a Wrap! 2023 – A Groundbreaking Year for AI and The Way Forward
It’s a Wrap! 2023 – A Groundbreaking Year for AI and The Way ForwardIt’s a Wrap! 2023 – A Groundbreaking Year for AI and The Way Forward
It’s a Wrap! 2023 – A Groundbreaking Year for AI and The Way Forward
 
Quels sont les facteurs-clés de succès pour appliquer au mieux le RGPD à votr...
Quels sont les facteurs-clés de succès pour appliquer au mieux le RGPD à votr...Quels sont les facteurs-clés de succès pour appliquer au mieux le RGPD à votr...
Quels sont les facteurs-clés de succès pour appliquer au mieux le RGPD à votr...
 
Lunch and Learn ANZ: Achieving Self-Service Analytics with a Governed Data Se...
Lunch and Learn ANZ: Achieving Self-Service Analytics with a Governed Data Se...Lunch and Learn ANZ: Achieving Self-Service Analytics with a Governed Data Se...
Lunch and Learn ANZ: Achieving Self-Service Analytics with a Governed Data Se...
 
How to Build Your Data Marketplace with Data Virtualization?
How to Build Your Data Marketplace with Data Virtualization?How to Build Your Data Marketplace with Data Virtualization?
How to Build Your Data Marketplace with Data Virtualization?
 
Webinar #2 - Transforming Challenges into Opportunities for Credit Unions
Webinar #2 - Transforming Challenges into Opportunities for Credit UnionsWebinar #2 - Transforming Challenges into Opportunities for Credit Unions
Webinar #2 - Transforming Challenges into Opportunities for Credit Unions
 
Enabling Data Catalog users with advanced usability
Enabling Data Catalog users with advanced usabilityEnabling Data Catalog users with advanced usability
Enabling Data Catalog users with advanced usability
 
Denodo Partner Connect: Technical Webinar - Architect Associate Certification...
Denodo Partner Connect: Technical Webinar - Architect Associate Certification...Denodo Partner Connect: Technical Webinar - Architect Associate Certification...
Denodo Partner Connect: Technical Webinar - Architect Associate Certification...
 
GenAI y el futuro de la gestión de datos: mitos y realidades
GenAI y el futuro de la gestión de datos: mitos y realidadesGenAI y el futuro de la gestión de datos: mitos y realidades
GenAI y el futuro de la gestión de datos: mitos y realidades
 

Kürzlich hochgeladen

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 

Kürzlich hochgeladen (20)

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 

Denodo DataFest 2016: Logical Data Lakes: Performance Considerations

  • 1. O C T O B E R 1 8 , 2 0 1 6 S A N F R A N C I S C O B A Y A R E A , C A #DenodoDataFest RAPID, AGILE DATA STRATEGIES For Accelerating Analytics, Cloud, and Big Data Initiatives.
  • 2. Logical Data Lakes: Performance Considerations Dr. Alberto Pan Denodo, CTO
  • 3. Agenda 1.Logical Architectures for Analytics: The Role of Data Virtualization 2.Example Scenario: The Numbers 3.Performance: The “Move Processing to the Data” Paradigm 4.Performance: Cost-based Optimization in Logical Architectures
  • 4. Logical Architectures for Analytics The Role of Data Virtualization
  • 5. 5 The Logical Data Warehouse
  • 6. 6 Logical Data Lake Real-Time Decision Management Alerts Scorecards Dashboards Reporting Data Discovery Self-Service Search Predictive Analytics Statistical Analytics (R) Text Analytics Data Mining Data Warehouse Sensor Data Machine Data (Logs) Social Data Clickstream Data Internet Data Image and Video Enterprise Content (Unstructured) Big Data Enterprise Applications Traditional Enterprise Data Cloud Cloud Applications Metadata Management, Data Governance, Data Security NoSQL EDW In-Memory (SAP Hana, …) Analytical Appliances Cloud DW (Redshift,..) ODS Big Data E T L C D C S q o o p (Flume, Kafka, …) Real-Time Data Access (On-Demand / Streaming) Batch YARN / Workload Management HDFS Hive Spark Drill Impala Storm HBase Solr Hunk DW Streams NoSQL SearchSQL Hadoop Tez Map Red.
  • 7. 7 The Logical Data Warehouse
  • 8. 8 The Logical Data Warehouse: Data Virtualization
  • 9. 9 Logical Data Lake Real-Time Decision Management Alerts Scorecards Dashboards Reporting Data Discovery Self-Service Search Predictive Analytics Statistical Analytics (R) Text Analytics Data Mining Data Warehouse Sensor Data Machine Data (Logs) Social Data Clickstream Data Internet Data Image and Video Enterprise Content (Unstructured) Big Data Enterprise Applications Traditional Enterprise Data Cloud Cloud Applications Metadata Management, Data Governance, Data Security NoSQL EDW In-Memory (SAP Hana, …) Analytical Appliances Cloud DW (Redshift,..) ODS Big Data E T L C D C S q o o p (Flume, Kafka, …) Real-Time Data Access (On-Demand / Streaming) Batch YARN / Workload Management HDFS Hive Spark Drill Impala Storm HBase Solr Hunk DW Streams NoSQL SearchSQL Hadoop Tez Map Red.
  • 10. 10 Logical Data Lake: Data Virtualization Real-Time Decision Management Alerts Scorecards Dashboards Reporting Data Discovery Self-Service Search Predictive Analytics Statistical Analytics (R) Text Analytics Data Mining Data Warehouse Sensor Data Machine Data (Logs) Social Data Clickstream Data Internet Data Image and Video Enterprise Content (Unstructured) Big Data Enterprise Applications Traditional Enterprise Data Cloud Cloud Applications NoSQL EDW In-Memory (SAP Hana, …) Analytical Appliances Cloud DW (Redshift,..) ODS Big Data E T L C D C S q o o p (Flume, Kafka, …) Data Virtualization Real-Time Data Access (On-Demand / Streaming) Data Caching DataServices Data Search & Discovery Governance Security Optimization DataAbstraction DataTransformation DataFederation Batch YARN / Workload Management HDFS Hive Spark Drill Impala Storm HBase Solr Hunk DW Streams NoSQL SearchSQL Hadoop Tez Map Red.
  • 11. 11 The Role of Data Virtualization Combines data from several systems and publish it to the desired format with a few clicks Denodo Data Virtualization is the only option verifying:
  • 12. 12 Example: Distributed Report Demo Scenario of LDW Sales Data (TPC-DS) (280M) Customer (TPC-DS) (2M) MDM 12 SQL on Hadoop Total Sales by Customer
  • 13. 13 The Role of Data Virtualization Combines data from several systems and publish it to the desired format with a few clicks Abstracts applications from changes in the underlying infrastructure Expose different logical views over the same data Single entry point to apply Security and Governance Denodo Data Virtualization is the only option verifying:
  • 14. 14 DW + MDM Dimensions Common LDW Patterns Time Dimension Fact table (sales) Product Dimension Retailer Dimension EDW MDM 14
  • 15. DW Historical Offloading (Cold Data Storage) Common LDW Patterns Time Dimension Fact table (sales) Product Dimension Retailer Dimension EDW Hadoop Current Sales Historical Sales 15
  • 16. DW + Cloud Dimensional Data Common LDW Patterns Time Dimension Fact table (sales) Product Dimension Customer Dimension EDW CRM SFDC Customer 16
  • 17. Data Warehouse Federation Common LDW Patterns Time Dimensi on Fact table (sales) Customer Dimension Region EDW City EDW Fact table (Fidelity) Customer Dimension 17
  • 19. 19 Denodo has done extensive testing using queries from the standard benchmarking test TPC-DS* and the following scenario Compares the performance of a federated approach in Denodo with an MPP system where all the data has been replicated via ETL Customer Dim. 2 M rows Sales Facts 290 M rows Items Dim. 400 K rows * TPC-DS is the de-facto industry standard benchmark for measuring the performance of decision support solutions including, but not limited to, Big Data systems. vs. Sales Facts 290 M rows Items Dim. 400 K rows Customer Dim. 2 M rows Denodo 6.0 Architecture Performance Comparison – Logical Data Warehouse vs. Physical Data Warehouse
  • 20. 20 Denodo 6.0 Architecture Query Description Returned Rows Time Netezza Time Denodo (Federated Oracle, Netezza & SQL Server) Optimization Technique (automatically selected) Total sales by customer 1,99 M 20.9 sec. 21.4 sec. Full aggregation push-down Total sales by customer and year between 2000 and 2004 5,51 M 52.3 sec. 59.0 sec Full aggregation push-down Total sales by item brand 31,35 K 4.7 sec. 5.0 sec. Partial aggregation push-down Total sales by item where sale price less than current list price 17,05 K 3.5 sec. 5.2 sec On the fly data movement Performance Comparison – Logical Data Warehouse vs. Physical Data Warehouse
  • 21. Performance The “Move Processing to the Data” Paradigm
  • 22. 22 Move Processing to the Data Process the data where it resides Process the data locally where it resides DV System combines partial results Minimizes network traffic Leverages specialized data sources
  • 23. 23 Move Processing to the Data: Example 1 Obtain Total Sales By Product (Naive Strategy) Naive Strategy: 350M rows moved through the network
  • 24. 24 Move Processing to the Data: Example 1 Obtain Total Sales By Product (Move Processing to the Data) Denodo Strategy: 30k rows moved through the network
  • 25. 25 Move Processing to the Data: Example 1 (Alternative) Obtain Total Sales By Product (Dimension Table Replicated) Denodo Strategy: Dimension Table Replicated Denodo Strategy: 20k rows moved through the network
  • 26. 26 Move Processing to the Data: Example 2 Execution Strategy: Full aggregation pushdown not possible Two possible techniques: - On-the-fly data movement - Partial aggregation pushdown Maximum Sales Discount By Product in The last year Sales Discount: list_price (Product) – sale_price (Sales)
  • 27. 27 Move Processing to the Data: Example 2 Maximum Sales Discount By Product in the last year: On-the-fly Data Movement Move Products Data to a Temp table in the DW : 20K rows moved through the network + 10K rows inserted in the DW Execute full query on the DW: 10k rows through the network
  • 28. 28 Move Processing to the Data: Example 2 Maximum Sales Discount By Product in the last year: Partial aggregation Pushdown Products DB: 10K rows through the network Data Warehouse: #rows through the network = 10K * average #sale_prices_per_product
  • 29. Performance Cost-based Optimization in Logical Architectures
  • 30. 30 How to Choose the Best Execution Plan? Cost-Based Optimization in Data Virtualization Data statistics to estimate size of intermediate result sets Data Source Indexes (and other physical structures) Execution Model of data sources: e.g. Parallel Databases VS Hadoop clusters VS Relational Databases Features of data sources (e.g. number of processing cores in parallel database or Hadoop Cluster) Data Transfer rate Must take into account:
  • 31. Q&A Find more details at: datavirtualization.blog http://www.datavirtualizationblog.com/myths-in-data- virtualization-performance/
  • 32. Thank you! © Copyright Denodo Technologies. All rights reserved Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any for or by any means, electronic or mechanical, including photocopying and microfilm, without prior the written authorization from Denodo Technologies. O C T O B E R 1 8 , 2 0 1 6 S A N F R A N C I S C O B A Y A R E A , C A #DenodoDataFest