Data lakes often fail because they are accessible only to highly skilled data scientists and not to business users. But BI tools have been able to access data warehouses for years, so what gives?
In this talk, we’ll discuss:
- Why existing BI tools are architected well for data warehouses, but not data lakes.
- The pros and cons of each architecture.
- Why every organization should have two BI standards: one for data warehouses and one for data lakes.
A Tale of Two BI Standards
1. Arcadia Data. Proprietary and Confidential
A Tale of Two BI Standards
Data Warehouses and Data Lakes
Randy Lea, Chief Solutions Officer
Arcadia Data
March 8, 2018
2. Arcadia Data. Proprietary and Confidential
Quick Background
25+ years in Enterprise Analytics
§ 10+ years Sales, Sales Management, Business Consulting
§ 10+ years Product Marketing, Product Management, Business Development
Companies
§ Teradata
§ Aster Data
§ Think Big
§ Arcadia Data
3. Arcadia Data. Proprietary and Confidential
Why have Many Big Data Initiatives Failed?
Anyone Remember the 3 V’s?
Volume
Variety
Velocity
6. Arcadia Data. Proprietary and Confidential
Wikipedia – Data Lake
A data lake is a method of storing data within a system or repository, in its natural format, that facilitates the collocation of data in various schemata and structural forms, usually object blobs or files. The idea of data lake is to have a single store of all data in the enterprise ranging from raw data (which implies exact copy of source system data) to transformed data which is used for various tasks including reporting, visualization, analytics and machine learning. The data lake includes structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and even binary data (images, audio, video) thus creating a centralized data store accommodating all forms of data.
7. Arcadia Data. Proprietary and Confidential
Wikipedia – Data Swamp
A data swamp is a deteriorated data lake, that is inaccessible to its intended users and provides little value.
Gartner - Strategic Planning Assumptions
§ Through 2018, 90% of deployed data lakes will be rendered useless, as they're overwhelmed with information assets captured for uncertain use cases.
Derive Value From Data Lakes Using Analytics Design Patterns
§ Published: 26 September 2017 ID: G00332852
§ Analyst(s): Svetlana Sicular, Joao Tapadinhas, Cindi Howson
8. Arcadia Data. Proprietary and Confidential
Why have Most Data Lake Initiatives Failed?
Anyone Know the 4th V?
Value
9. Arcadia Data. Proprietary and Confidential
#1 Challenge – Determining Business Value
[Bar chart, axis: Percent (%), 0 to 60. Challenges charted: Integrating multiple data sources; Defining our strategy; Funding; Risk and governance; Skills and capabilities; Determining value; Integrating with existing infrastructure; Leadership or Org. issues; Other; Understanding what is “big data”; Infrastructure/architecture. Bar values: 52, 42, 33, 31, 28, 27, 26, 26, 19, 12, 5.]
Source: Gartner Big Data Adoption Survey, September 2016. 2016 Survey (n = 184)
10. Arcadia Data. Proprietary and Confidential
Business User Challenges Accessing the Data Lake
Remember MapReduce?
§ Procedural Code, Java, etc.
What’s Inside the Lake?
§ “Found a Tool Called Erwin”
What’s the #1 Access API for Hadoop?
§ I’m Not Sure but Bet it is SQL
SQL is the Language of Business
11. Arcadia Data. Proprietary and Confidential
“Data” and “Platforms” Have Changed – Why Haven’t BI Tools?
Data
From: rows and columns; batch; smaller data volumes; limited # sources; mainly internal
To: rows and columns and multi-structured; batch and interactive and real-time; small and large volumes; many sources; internal and external
Platforms
From: tables; schema on write; proprietary hardware; ETL; data warehouses
To: tables and docs, search indexes, events; schema on write and schema on read; commodity hardware; ETL and ELT and ELDT; data warehouses and data lakes
BI Tools
From: SQL queries; extracts; cubes; BI servers; small/med scale
To: Why haven’t BI tools evolved?
12. Arcadia Data. Proprietary and Confidential
Why Not Use Any BI Tool? Architecture Built for a Purpose
Would you use Water Skis to Ski Down a Mountain?
Then Why Would you use a Data Warehouse Tool on a Data Lake?
13. Arcadia Data. Proprietary and Confidential
Companies are now Choosing Two BI Standards for Their Enterprise
Data Warehouse (Database): BI Standard for Data Warehouse (Database)
Data Lake (Hadoop/Cloud): BI Standard for Data Lake (Hadoop/Cloud)
14. Arcadia Data. Proprietary and Confidential
Data Warehouse BI Architecture
BI Server
Data Warehouse (Database)
Analytic Process: Load Data; Secure Data; Semantic Layer; Optimize Physical
Big Data Requirements: Native Connection; Semi-Structured Data; BI/SQL Aware; Parallel Execution
15. Arcadia Data. Proprietary and Confidential
Data Lake BI Architecture – Native BI for Big Data
BI Server
Data Warehouse (Database)
Data Lake (Hadoop/Cloud)
Analytic Process: Load Data; Secure Data; Semantic Layer; Optimize Physical
Big Data Requirements: Native Connection; Semi-Structured Data; BI/SQL Aware; Parallel Execution
Arcadia Data was built from inception to run natively within data lakes
16. Arcadia Data. Proprietary and Confidential
The Result: Faster BI Analytics and Higher User Concurrency
[Chart: Query 1 Performance Testing - Heavy Query. Completion Time (seconds, axis 0 to 1400) versus # of Concurrent Jobs (1, 2, 5, 10, 15, 30). Series: Arcadia, Hive, Impala, Spark.]
Customer Benchmark of a Legacy BI Tool Accelerated by Arcadia Data On a MapR Data Lake
17. Arcadia Data. Proprietary and Confidential
Data Lake BI Architecture – Big Data is More than Data at Rest
Visual Analytics: Browser
Streams/Topics: Real-Time Data
Data Warehouse (Database)
Data Lake (Hadoop/Cloud), ADLS
Arcadia Data was built from inception to run natively within data lakes
18. Arcadia Data. Proprietary and Confidential
BI for Data Lakes must be Architected for Scale and Performance
Data Warehouse BI Architecture
Components: Browser; BI Server; Data Lake Cluster (DataNodes)
• BI Server can’t scale
• Significant data movement, modeling, security management
“Big Data” BI Architecture
Components: Browser; Edge Node BI Server; Data Lake Cluster (DataNodes)
• Edge node BI server only scales via long planning
• Performance optimizations require heavy IT intervention
• Only passing SQL with no semantic information (e.g., filters)
Native Hadoop Data Lake BI Architecture
Components: Browser; Data Lake Cluster (DataNode + Arcadia)
• Scales linearly with DataNodes while retaining agility
• Semantic model is “pushed down” and distributed
• Highly optimized “based on usage” physical model
• No data movement; single security model
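The difference between these architectures comes down to where filtering and aggregation run. The Python sketch below is an illustration of the general push-down idea only, not Arcadia Data's implementation: the `PARTITIONS` list is hypothetical sample data standing in for rows stored on DataNodes, and both function names are made up for this example. It compares how many records have to move when a central BI server pulls raw rows versus when each node filters and pre-aggregates locally.

```python
from collections import Counter

# Hypothetical data: each inner list stands in for rows held on one DataNode.
# Each row is (region, amount).
PARTITIONS = [
    [("east", 10), ("west", 5), ("east", 7)],
    [("west", 3), ("north", 8)],
    [("east", 1), ("north", 2), ("west", 4)],
]


def central_aggregate(partitions, wanted):
    """BI-server style: ship every row to one place, then filter and sum."""
    moved = [row for part in partitions for row in part]
    totals = Counter()
    for region, amount in moved:
        if region in wanted:
            totals[region] += amount
    return dict(totals), len(moved)


def pushed_down_aggregate(partitions, wanted):
    """Native style: each node filters and sums; only partial results move."""
    totals = Counter()
    moved = 0
    for part in partitions:
        partial = Counter()
        for region, amount in part:
            if region in wanted:
                partial[region] += amount
        moved += len(partial)   # one record per region per node
        totals.update(partial)  # Counter.update adds the partial sums
    return dict(totals), moved
```

Both functions return the same totals; the second value in each result is the number of records that crossed from the nodes to the coordinator, which is the quantity the "no data movement" bullet is concerned with.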
19. Arcadia Data. Proprietary and Confidential
Data Lake BI/Visual Analytics Market
Data Warehouse (Database)
Data Lake (Hadoop/Cloud)
Arcadia Data was built from inception to run natively within data lakes
20. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Data Warehouse Load, Model and Go “Build it and they will Come”
Stages: Land and Secure Data; Model Data; Semantic and Visual/Analytic Discovery; Production
RDBMS Data Warehouse Platform
It is not Just About System Architecture
It is also About the Analytic Process Improvement
21. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Data Warehouse Load, Model and Go “Build it and they will Come”
Land and Secure Data: Extract and Load
- ETL servers
- ELT In-database
Model Data: Transform
- Put into Tables
- Star-Schema or denormalized 3NF
Semantic and Visual/Analytic Discovery: Discovery and Reports
- Build Semantic Layer
- Design Report Layout
Production: Productionize
- Optimize Physical Schema
Weeks and Months in Most Companies
Weeks and Often Discovery Only Run Once
Optimize in Database or BI Tool or Both?
23. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Data Warehouse Load, Model and Go “Build it and they will Come”
Stages: Land and Secure Data; Model Data; Semantic and Visual/Analytic Discovery; Production
RDBMS Data Warehouse Platform
Data Warehouse (RDBMS), DBA:
- Load and Secure
- Transform: Star-Schema or 3NF
- Build Semantic Layer
- Productionize: Optimize Physical
Data Warehouse BI Server, Analytics Team:
- Extract and Secure
- Transform: Cubes or Aggregates
- Build Semantic Layer
- Discovery and Reports
- Productionize: Optimize Physical
Time to Value Delayed: Weeks and Months
25. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Data Lake Load, Model and Go “Build it and they will Come”
Stages: Land and Secure Data; Model Data; Semantic and Visual/Analytic Discovery; Production
RDBMS Data Warehouse Platform
Data Lake (Hadoop/Cloud), Hadoop/Cloud Admin:
- Load and Secure
- Transform: Star-Schema or 3NF
- Build Semantic Layer
- Productionize: Optimize Physical
Data Warehouse BI Server, Analytics Team:
- Extract and Secure
- Transform: Cubes or Aggregates
- Build Semantic Layer
- Discovery and Reports
- Productionize: Optimize Physical
Data Warehouse BI Tools Treat Hadoop/Cloud Just Like any Other Database
Time to Value Delayed: Weeks and Months
26. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Data Lake Load and Go “Discover to Production”
Stages: Land and Secure Data; Model Data; Semantic and Visual/Analytic Discovery; Production
RDBMS Data Warehouse Platform
Data Lake (Hadoop/Cloud) with Native BI for Data Lakes; Hadoop/Cloud Admin and Analytics Team:
- Load and Secure
- Transform: Star-Schema or 3NF
- Build Semantic Layer
- Productionize: Optimize Physical
ELDT
27. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Data Lake Load and Go “Discover to Production”
Stages: Land and Secure Data; Model Data; Semantic and Visual/Analytic Discovery; Production
RDBMS Data Warehouse Platform
Data Lake (Hadoop/Cloud) with Native BI for Data Lakes; Hadoop/Cloud Admin and Analytics Team:
- Load and Secure
- Transform: Star-Schema or 3NF
- Build Semantic Layer
- Productionize: Optimize Physical
Extract, Load, “Discover”, Transform
Model Based on Usage
28. Arcadia Data. Proprietary and Confidential
Time to Value and Production – Architecture and Analytic Process
Data Lake Load and Go “Discover to Production”
RDBMS Data Warehouse Platform
Data Lake (Hadoop/Cloud) with Native BI for Data Lakes; Hadoop/Cloud Admin and Analytics Team:
- Land and Secure Data: Load and Secure
- Semantic and Visual/Analytic Discovery: Build Semantic Layer
- Model Data: Transform, Star-Schema or 3NF
- Production: Productionize, Optimize Physical
$100,000 in Business Value in 30 Days or We Pick Up and Go Home
Time to Value: In Days
30. Arcadia Data. Proprietary and Confidential
Business Analysts Can Enrich Data with Their Own Table Joins
AGILITY
31. Arcadia Data. Proprietary and Confidential
Drag and Drop Dashboards and Applications – No Coding
§ Intuitive and Visual UI that Anyone Can Use
§ Accessed via web-browser
§ Easy to compose visuals, dashboards and apps via drag and drop
§ Get recommendations via machine-assisted insights
§ Benefits
§ Unlocks big data analytics for business users and analysts
§ Promotes agility and reduces time to insight
§ Enables business self-sufficiency and relieves burden on IT
DIVERSITY
32. Arcadia Data. Proprietary and Confidential
Instant Visuals – AI-Based Visualization Recommendations
Select data fields, then one click… shows which visuals best represent your data.
Screens shown: Visualization Builder; Recommended Visualizations
AGILITY
33. Arcadia Data. Proprietary and Confidential
Smart Acceleration Leverages What Is Learned during Data Discovery
Query acceleration for scale, performance, and concurrency
Arcadia Enterprise makes recommendations – build these with a click.
• Fast query responses
• Minimal modeling
• Live acceleration (no downtime)
Diagram labels: Hadoop Cluster; Ad hoc queries; All Granular Data; Analytical Views; Accelerated application queries
SCALE
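The idea of using what is learned during discovery to decide what to accelerate can be sketched in a few lines. The Python below counts which combinations of group-by columns appear most often in a query log and proposes those as candidate pre-aggregated views. The log format, the `recommend_views` helper, and the `min_hits` threshold are all assumptions made for illustration; the slide does not describe how Arcadia Enterprise actually derives its recommendations.

```python
from collections import Counter

# Hypothetical query log: (table, group-by columns) for each ad hoc query.
QUERY_LOG = [
    ("sales", ("region", "month")),
    ("sales", ("month", "region")),
    ("sales", ("product",)),
    ("sales", ("region", "month")),
    ("clicks", ("campaign",)),
]


def recommend_views(log, min_hits=2):
    """Return (table, dims, hits) for dimension sets seen at least min_hits times.

    Column order is ignored, so (region, month) and (month, region)
    count toward the same candidate view.
    """
    usage = Counter((table, tuple(sorted(dims))) for table, dims in log)
    return [
        (table, dims, hits)
        for (table, dims), hits in usage.most_common()
        if hits >= min_hits
    ]
```

With the sample log, only the `sales` queries grouped by month and region repeat often enough to be proposed; one-off queries keep running against the granular data.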
34. Arcadia Data. Proprietary and Confidential
Business Access to All the Data
§ Easy to Combine Extremely Diverse Data
§ Blend streaming, complex and unstructured data with traditional enterprise content
§ Add new data from local and remote file systems including cloud storage
§ Interact with all data via business friendly semantic layer
§ Benefits
§ Build rich multi-source dashboards
§ Unlock valuable insights from sources previously considered out of reach
§ Masks database complexity and technical terminology from business users
Diagram labels: Business Friendly Semantic Layer; Data Hub / Data Lake; Data Warehouse / Data Mart; Data Storage & NoSQL; Structured Data; Unstructured Data; Streaming Data; Relational Data; Self-Service Data; Raw Data; JSON; Delimited
DIVERSITY
35. Arcadia Data. Proprietary and Confidential
Visual Data Linkage between Multiple Data Sources
Streaming Visual from Kafka
Visual from RDBMS
Visuals from Hadoop
36. Arcadia Data. Proprietary and Confidential
Arcadia Enterprise Handles the Complexity for You
Understands Complex Structures: No ETL Needed to Flatten Data
Supports Modern ARRAY, STRUCT, MAP Complex Types and Nested Schemas
Easy Self-Service UI: Simple Drag and Drop Experience
Translates Complex Structure into Intuitive Field Browser
Powerful Native SQL: No Flattening at Query Time
Generates Native SQL for Complex Types
SELECT c.name, sum(i.amount)
FROM customers c, c.orders.items i
GROUP BY 1
AGILITY
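The query on this slide joins each customer to the items nested inside its orders and sums the item amounts per customer, without a separate flattening step. As a rough analogy in Python, the sketch below runs the same aggregation directly over nested records. The sample `customers` data and the `total_amount_by_customer` helper are hypothetical, and the field names are simply borrowed from the query.

```python
from collections import defaultdict

# Hypothetical nested records shaped like the slide's customers table:
# each customer holds an array of orders, each order an array of items.
customers = [
    {"name": "Ann", "orders": [
        {"items": [{"amount": 10.0}, {"amount": 5.0}]},
        {"items": [{"amount": 2.5}]},
    ]},
    {"name": "Bob", "orders": [{"items": [{"amount": 7.0}]}]},
    {"name": "Cy", "orders": []},
]


def total_amount_by_customer(rows):
    """Sum item amounts per customer by walking the nested arrays in place."""
    totals = defaultdict(float)
    for customer in rows:
        for order in customer["orders"]:
            for item in order["items"]:
                totals[customer["name"]] += item["amount"]
    return dict(totals)
```

As with the join in the query, a customer with no items contributes no row to the result, so `Cy` does not appear in the output.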
37. Arcadia Data. Proprietary and Confidential
Unlock Unstructured Data not Reachable by Legacy BI
Source-aware search box appears in Visual Designer
AGILITY
38. Arcadia Data. Proprietary and Confidential
Customer Value of Arcadia Data
INNOVATION
REDUCE RISK
Customers shown: Ad tech; Government; Fortune 50 CPG Company
§ Cybersecurity app to capture investigative workflows, real-time incident response, and guided data exploration
§ Developed a new SaaS self-service analytics platform to give their customers better marketing attribution
§ BI standard for data lake. Gives global brand managers digital campaign intelligence across 100+ brands
§ Improve patient outcomes on 10+ million members by predicting and controlling re-admission risk.
§ Turn IoT data from enterprise data servers into meaningful lifecycle analytics data service
§ BI standard for data lake. Reduce credit card default risk across retailers.
39. Arcadia Data. Proprietary and Confidential
Arcadia Data was Built from Inception to Run Natively within Data Lakes