Data integration is paramount. This presentation compares three integration paradigms (client-side tools, traditional data warehouses, and data virtualization in the form of the logical data warehouse) and positions data virtualization as an integral part of any future-proof IT infrastructure.
This presentation is part of the Fast Data Strategy Conference; you can watch the video here: goo.gl/1q94Ka.
Three Key Trends
1. Reduce corporate data silos to gain efficiency and productivity
2. Towards a common data backbone for operational and informational use
3. Enterprises going with bimodal IT in their modernization efforts
Trend I – Consolidation
• Organizational structures create specialized data and application silos
• The proliferation of silos has inhibited access to, and sharing of, data across the organization
• Consolidating and opening up these silos (while retaining ownership and control) will promote efficiency and productivity
Trend II – Common Data Backbone
• Access to data via a logical layer for a common and consistent view of data assets
  • Example: customer data
  • All analytics, reports, processes and applications (web, mobile, desktop) should see the same customer data
• Is this a data lake?
  • In reality there will be more than one data lake (separate or refined)
Trend III – Bimodal IT
• Bimodal IT has two IT ‘flavors’:
  • Type 1 – focused on stability and efficiency (traditional IT)
  • Type 2 – experimental and agile, focused on time to market (TTM) and rapid application evolution; aligned with the business
• Some have compared this to the distinction between Systems of Record (‘SoR’) and Systems of Engagement (‘SoE’)
• The two need to live side by side and interact
• New apps still need data from ‘SoR’
What Does This Mean?
• A data access layer is needed to ‘open up’ data silos
  • while retaining local ownership and control of the data
• The access layer must provide access to all data sources and support different modes of access
  • Reporting/analytics, real-time application access (mobile/web and ‘traditional’), etc.
• New technologies will be an important part of the information infrastructure
  • Hadoop ecosystem, NoSQL, streaming data, “Data Lakes”
• The traditional IT infrastructure is not going away soon
  • ‘Systems of Record’ are still needed
• The new and the old need to work together
  • Newer systems still need to interact with ‘Systems of Record’
How does this affect the ‘Information Architecture’?
Logical Data Warehouse
Definition:
“The Logical Data Warehouse (LDW) is a new data management architecture for analytics combining the strengths of traditional repository warehouses with alternative data management and access strategy.”
“The LDW is an evolution and augmentation of DW practices, not a replacement”
“A repository-only style DW contains a single ontology/taxonomy, whereas in the LDW a semantic layer can contain many combination of use cases, many business definitions of the same information”
“The LDW permits an IT organization to make a large number of datasets available … via query tools and applications”
Source: Gartner Hype Cycle for Enterprise Information Management, 2012.
Architecture of the Logical Data Warehouse
[Figure: Data sources (traditional enterprise data from enterprise applications and the data warehouse; big data such as sensor data, machine data (logs), social data, clickstream data, internet data, image and video, and unstructured enterprise content; cloud applications) are loaded via ETL, CDC, Sqoop and streaming ingestion (Flume, Kafka, …) into data stores: EDW, ODS, analytical appliances, in-memory databases (SAP Hana, …), cloud DW (Redshift, …), NoSQL, and a Hadoop big data platform (HDFS, YARN / workload management, MapReduce, Tez, Hive, Spark, Drill, Impala, Storm, HBase, Solr, Hunk). A data integration/semantic layer provides batch and real-time (on-demand/streaming) data access to consumers: real-time decision management, alerts, scorecards, dashboards, reporting, data discovery, self-service, search, predictive analytics, statistical analytics (R), text analytics and data mining. Metadata management, data governance and data security span the architecture.]
Three Integration/Semantic Layer Alternatives
1. Application/BI tool as the data integration/semantic layer (the tool sits over the EDW and ODS)
2. EDW as the data integration/semantic layer (the EDW sits over the ODS)
3. Data virtualization as the data integration/semantic layer (the virtualization layer sits over the EDW and ODS)
Application/BI Tool as the Data Integration Layer
[Figure: Application/BI tool as the data integration/semantic layer over the EDW and ODS]
• Integration is delegated to end-user tools and applications
  • e.g. BI tools with ‘data blending’
• Results in duplication of effort: integration is defined many times in different tools
  • What is the impact of a change in a data schema?
• End-user tools are not intended to be integration middleware
  • Integration is not their primary purpose or expertise
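To make the duplication problem concrete, here is a minimal sketch assuming two hypothetical end-user tools that each blend the same EDW and ODS customer extracts with their own embedded logic. All data, column names and function names (`dashboard_tool_blend`, `reporting_tool_blend`) are illustrative assumptions, not modelled on any real BI product.

```python
"""Sketch: the same integration logic re-implemented inside two end-user tools.

All names and data are hypothetical; no real BI product API is used.
"""

# Hypothetical extracts: the EDW holds customer master data, while the ODS
# holds operational status keyed by a differently named column.
EDW_CUSTOMERS = [
    {"cust_id": 1, "name": "Acme Corp", "region": "EMEA"},
    {"cust_id": 2, "name": "Globex", "region": "AMER"},
]
ODS_STATUS = [
    {"customer_no": 1, "status": "ACTIVE"},
    {"customer_no": 2, "status": "SUSPENDED"},
]


def dashboard_tool_blend():
    """Tool A: maps ODS customer_no to EDW cust_id and keeps every customer."""
    status = {row["customer_no"]: row["status"] for row in ODS_STATUS}
    return [
        {**c, "status": status.get(c["cust_id"], "UNKNOWN")}
        for c in EDW_CUSTOMERS
    ]


def reporting_tool_blend():
    """Tool B: re-implements the same join, but with its own hidden rule:
    customers that are not ACTIVE are silently dropped."""
    status = {row["customer_no"]: row["status"] for row in ODS_STATUS}
    return [
        {**c, "status": status[c["cust_id"]]}
        for c in EDW_CUSTOMERS
        if status.get(c["cust_id"]) == "ACTIVE"
    ]


if __name__ == "__main__":
    # Two tools, two different answers to "how many customers do we have?"
    print(len(dashboard_tool_blend()))  # 2
    print(len(reporting_tool_blend()))  # 1
```

Because the join condition and the `customer_no` to `cust_id` mapping live inside each tool, a schema change in the ODS (for example renaming `customer_no`) has to be found and fixed separately in every tool that blends this data.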
EDW as the Data Integration Layer
[Figure: EDW as the data integration/semantic layer over the ODS]
• Access to ‘other’ data (query federation) via the EDW
  • Teradata QueryGrid, IBM Fluid Query, SAP Smart Data Access, etc.
• Often coupled with traditional ETL replication of data into the EDW
• The EDW is the ‘center of the data universe’
  • Provides the data integration and semantic layer
• Appears attractive to organizations heavily invested in an EDW
• More than one EDW? EDW costs?
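The federation pattern above can be illustrated in miniature with SQLite's `ATTACH DATABASE`, used here purely as a stand-in for EDW federation features such as Teradata QueryGrid or SAP Smart Data Access; it is not how those products work internally. The file name `ods.db` and the table names are hypothetical. The point is that a single SQL statement issued to the "EDW" reaches data that physically lives in another store.

```python
"""Sketch: an 'EDW' engine federating a query to a separate 'ODS' database.

SQLite ATTACH DATABASE is only a stand-in for EDW federation features;
file and table names are hypothetical.
"""
import os
import sqlite3
import tempfile


def build_ods(path):
    """Create a stand-alone 'ODS' database file holding operational status."""
    ods = sqlite3.connect(path)
    try:
        ods.execute("CREATE TABLE status (customer_no INTEGER, status TEXT)")
        ods.executemany("INSERT INTO status VALUES (?, ?)",
                        [(1, "ACTIVE"), (2, "SUSPENDED")])
        ods.commit()
    finally:
        ods.close()


def federated_customer_status(ods_path):
    """Run one SQL statement in the 'EDW' that joins a local and a remote table."""
    edw = sqlite3.connect(":memory:")
    try:
        edw.execute("CREATE TABLE customers (cust_id INTEGER, name TEXT)")
        edw.executemany("INSERT INTO customers VALUES (?, ?)",
                        [(1, "Acme Corp"), (2, "Globex")])
        edw.commit()  # SQLite refuses to ATTACH inside an open transaction
        # The EDW becomes the 'center of the data universe': the ODS is reached through it.
        edw.execute("ATTACH DATABASE ? AS ods", (ods_path,))
        return edw.execute(
            "SELECT c.name, s.status FROM customers c "
            "JOIN ods.status s ON s.customer_no = c.cust_id "
            "ORDER BY c.cust_id"
        ).fetchall()
    finally:
        edw.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ods_path = os.path.join(tmp, "ods.db")
        build_ods(ods_path)
        print(federated_customer_status(ods_path))
        # [('Acme Corp', 'ACTIVE'), ('Globex', 'SUSPENDED')]
```

Note that every consumer must now go through the EDW engine to see the combined data, which is exactly the coupling (and cost) question the slide raises.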
Data Virtualization as the Data Integration Layer
[Figure: Data virtualization as the data integration/semantic layer over the EDW and ODS]
• Move the data integration and semantic layer to an independent data virtualization platform
  • Purpose-built for supporting data access across multiple heterogeneous data sources
• A separate layer provides semantic models for the underlying data
  • Physical-to-logical mapping
• Enforces common and consistent security and governance policies
• Gartner’s recommended approach
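The following is a minimal sketch of the ideas on this slide, assuming a toy in-process "virtualization layer": one logical `customer_view` federates a relational EDW table and a JSON-like ODS feed, maps physical column names to logical ones, and applies a single row-level security rule for every consumer. All names (`customer_view`, `ALLOWED_REGIONS`, `dim_cust`, the roles) are hypothetical; this is not the API of Denodo or any other data virtualization product, and a real platform would also push filters down to sources, cache and optimize queries.

```python
"""Sketch: a toy data virtualization layer exposing one logical 'customer' view.

All names are hypothetical; this is not any product's API.
"""
import sqlite3

# --- Physical sources (heterogeneous) --------------------------------------
# A relational EDW table with cryptic physical column names.
edw = sqlite3.connect(":memory:")
edw.execute("CREATE TABLE dim_cust (cust_id INTEGER, cust_nm TEXT, rgn_cd TEXT)")
edw.executemany("INSERT INTO dim_cust VALUES (?, ?, ?)",
                [(1, "Acme Corp", "EMEA"), (2, "Globex", "AMER")])
edw.commit()

# An ODS exposed as JSON-like records (e.g. returned by a web service).
ods_records = [{"customerNo": 1, "state": "ACTIVE"},
               {"customerNo": 2, "state": "SUSPENDED"}]


def fetch_edw():
    """Physical-to-logical mapping for the EDW source."""
    for cust_id, name, region in edw.execute(
            "SELECT cust_id, cust_nm, rgn_cd FROM dim_cust ORDER BY cust_id"):
        yield {"customer_id": cust_id, "name": name, "region": region}


def fetch_ods():
    """Physical-to-logical mapping for the ODS source."""
    for r in ods_records:
        yield {"customer_id": r["customerNo"], "status": r["state"]}


# --- Common security policy, defined once in the layer ---------------------
ALLOWED_REGIONS = {"emea_analyst": {"EMEA"}, "admin": {"EMEA", "AMER"}}


def customer_view(role):
    """Federate both sources and apply the same row-level rule for every consumer."""
    status = {r["customer_id"]: r["status"] for r in fetch_ods()}
    allowed = ALLOWED_REGIONS.get(role, set())  # unknown roles see nothing
    return [
        {**c, "status": status.get(c["customer_id"], "UNKNOWN")}
        for c in fetch_edw()
        if c["region"] in allowed
    ]


if __name__ == "__main__":
    print(customer_view("emea_analyst"))
    # [{'customer_id': 1, 'name': 'Acme Corp', 'region': 'EMEA', 'status': 'ACTIVE'}]
```

Because the mappings and the policy live in one place, a dashboard, a report and a mobile app calling `customer_view` all see the same customer data, and a schema change in a source is absorbed by editing one mapping function rather than every consumer.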
Architecture of the Logical Data Warehouse
[Figure: Data sources (traditional enterprise data from enterprise applications and the data warehouse; big data such as sensor data, machine data (logs), social data, clickstream data, internet data, image and video, and unstructured enterprise content; cloud applications) are loaded via ETL, CDC, Sqoop and streaming ingestion (Flume, Kafka, …) into data stores: EDW, ODS, analytical appliances, in-memory databases (SAP Hana, …), cloud DW (Redshift, …), NoSQL, and a Hadoop big data platform (HDFS, YARN / workload management, MapReduce, Tez, Hive, Spark, Drill, Impala, Storm, HBase, Solr, Hunk). Data virtualization forms the integration layer, providing data abstraction, data transformation, data federation, data caching, data services, data search & discovery, governance, security and optimization, with batch and real-time (on-demand/streaming) data access. Consumers: real-time decision management, alerts, scorecards, dashboards, reporting, data discovery, self-service, search, predictive analytics, statistical analytics (R), text analytics and data mining.]
Summary
1. The three trends will change your ‘information architecture’
2. The Logical Data Warehouse (LDW) is a key architectural pattern for addressing many of the challenges of the new information architecture
3. The LDW requires a data integration/semantic layer
4. Data virtualization is the recommended approach for this critical layer