A brief description of what Data Virtualisation is and how it can be used to support business intelligence applications and development. Originally presented at the ETIS Conference in Riga, Latvia, in October 2013.
1. An Introduction to Data Virtualization
in Business Intelligence
David M Walker
Data Management & Warehousing
http://datamgmt.com
18 October 2013
2. What Is Data Virtualization?
• Wikipedia:
“Data virtualization is [..] an application to retrieve
and manipulate data without requiring technical
details about the data, such as how it is formatted
or where it is physically located.”
• Or more simply:
A solution that sits in front of multiple data
sources and allows them to be treated as a single
SQL database
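As a minimal sketch of the "single SQL database" idea, assuming Python's built-in `sqlite3` module stands in for a DV platform: two separate SQLite files play the role of independent source systems, and one connection attaches both and exposes a view that joins across them. The file names, tables and the `customer_revenue` view are hypothetical; a commercial DV platform adds query optimisation, security and far more connector types.

```python
import os
import sqlite3
import tempfile


def build_sources(folder):
    """Create two stand-alone SQLite files standing in for separate source systems."""
    crm_path = os.path.join(folder, "crm.db")          # hypothetical CRM system
    billing_path = os.path.join(folder, "billing.db")  # hypothetical Billing system

    crm = sqlite3.connect(crm_path)
    crm.executescript("""
        CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    """)
    crm.close()

    billing = sqlite3.connect(billing_path)
    billing.executescript("""
        CREATE TABLE invoice (inv_id INTEGER PRIMARY KEY, cust_ref INTEGER, amount REAL);
        INSERT INTO invoice VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.5);
    """)
    billing.close()
    return crm_path, billing_path


def virtual_layer(crm_path, billing_path):
    """Return one connection through which both sources can be queried as one database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ? AS crm", (crm_path,))
    conn.execute("ATTACH DATABASE ? AS billing", (billing_path,))
    # A TEMP view may reference attached databases; no data is copied out of the sources.
    conn.execute("""
        CREATE TEMP VIEW customer_revenue AS
        SELECT c.name AS customer, SUM(i.amount) AS revenue
        FROM crm.customer AS c
        JOIN billing.invoice AS i ON i.cust_ref = c.cust_id
        GROUP BY c.name
    """)
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        dv = virtual_layer(*build_sources(folder))
        print(dv.execute("SELECT * FROM customer_revenue ORDER BY customer").fetchall())
        dv.close()
```

Run as a script, this prints `[('Acme', 350.0), ('Globex', 75.5)]`: the user writes one SQL query and never needs to know that customers and invoices live in different files.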
6. Advanced Features:
Creating a Canonical Data Model
[Diagram: the Finance System, CRM System, Billing System, Website and other systems sit behind the Data Virtualization Platform, where their data is mapped to conform to a Canonical Model; the user sees the system as a single CDM and not as multiple sources.]
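A canonical data model can be sketched as a view that renames, re-keys and recodes each source into one agreed shape. In this minimal sketch Python's `sqlite3` attaches two in-memory databases as stand-ins for the CRM and Billing systems; the table names, columns and the choice of upper-case ISO country codes as the canonical format are all hypothetical.

```python
import sqlite3


def build_platform():
    """One connection with two attached in-memory databases standing in for CRM and Billing."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS billing")
    # Each source keeps its own names, keys and code conventions (hypothetical schemas).
    conn.executescript("""
        CREATE TABLE crm.contact (contact_id INTEGER, full_name TEXT, country_iso TEXT);
        INSERT INTO crm.contact VALUES (1, 'Acme Ltd', 'LV'), (2, 'Globex', 'GB');
        CREATE TABLE billing.account (acct_no TEXT, acct_name TEXT, country TEXT);
        INSERT INTO billing.account VALUES ('B-7', 'Initech', 'lv');
    """)
    # Canonical model: one customer entity with agreed column names and code formats.
    conn.execute("""
        CREATE TEMP VIEW customer AS
        SELECT 'CRM' AS source_system,
               CAST(contact_id AS TEXT) AS source_key,
               full_name AS customer_name,
               country_iso AS country_code
        FROM crm.contact
        UNION ALL
        SELECT 'BILLING', acct_no, acct_name, UPPER(country)
        FROM billing.account
    """)
    return conn


if __name__ == "__main__":
    cdm = build_platform()
    for row in cdm.execute(
        "SELECT source_system, customer_name, country_code FROM customer ORDER BY customer_name"
    ):
        print(row)
```

Reporting tools then query only `customer`; if a source system changes, only its branch of the view needs remapping.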
7. But it’s not a Silver Bullet
• Can be slow
– Depends on how much data has to be fetched from the remote systems to the DV platform; platforms try to be smart about reducing this
• Can impact performance on underlying systems
– Having lots of BI users query resource-sensitive OLTP systems is not a good idea
• Requires Resources
– Another set of servers, technologies, etc. to manage, but this cost is often offset by the reduction in complexity elsewhere.
• Not a replacement – it is an additional tool
– You will still need ETL and Messaging
8. BI Use Case:
Agile Data Mart Design
• Access data warehouse data quickly and easily
• Design the data mart you think you want
• Test it with real data and your actual reporting tool
• Also possible with data warehouse design
[Diagram: alternative data mart designs, A or B, prototyped on the Data Virtualization Platform over the Data Warehouse]
9. BI Use Case:
Virtual Data Marts
• Big Tin appliance with lots of horsepower?
• Don't want to duplicate data in the appliance and consume disk space for a data mart, but want the star schema for ease of use?
[Diagram: Data Virtualization Platform over the Data Warehouse]
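A minimal sketch of a virtual data mart, assuming a small normalised warehouse schema invented for illustration: the star schema (`dim_product`, `fact_sales`) is defined purely as SQL views, so it uses no extra disk space and always reflects the warehouse. SQLite via Python's `sqlite3` stands in for the appliance here; with a real appliance, the DV platform would typically push such queries down to it.

```python
import sqlite3


def build_warehouse():
    """Normalised warehouse tables plus a star schema defined only as views."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical normalised warehouse tables
        CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT,
                              category_id INTEGER REFERENCES category(category_id));
        CREATE TABLE sales_txn (txn_id INTEGER PRIMARY KEY, txn_date TEXT,
                                product_id INTEGER REFERENCES product(product_id),
                                qty INTEGER, unit_price REAL);
        INSERT INTO category VALUES (1, 'Hardware'), (2, 'Software');
        INSERT INTO product VALUES (1, 'Server', 1), (2, 'Licence', 2), (3, 'Disk', 1);
        INSERT INTO sales_txn VALUES
            (1, '2013-10-01', 1, 2, 1000.0),
            (2, '2013-10-01', 2, 5, 200.0),
            (3, '2013-10-02', 3, 10, 50.0);

        -- Virtual star schema: views only, so no second copy of the data is stored
        CREATE VIEW dim_product AS
            SELECT p.product_id, p.product_name, c.category_name
            FROM product AS p JOIN category AS c USING (category_id);
        CREATE VIEW fact_sales AS
            SELECT txn_date AS date_key, product_id, qty, qty * unit_price AS sales_amount
            FROM sales_txn;
    """)
    return conn


STAR_QUERY = """
    SELECT d.category_name, SUM(f.sales_amount)
    FROM fact_sales AS f JOIN dim_product AS d USING (product_id)
    GROUP BY d.category_name
    ORDER BY d.category_name
"""

if __name__ == "__main__":
    print(build_warehouse().execute(STAR_QUERY).fetchall())
```

Users get the familiar fact/dimension query shape, while the only stored data remains the warehouse tables.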
10. BI Use Case:
Data Mart Extensions
• Existing (physical) data mart
• New data source that needs to be incorporated quickly
• Create a virtual copy of the existing data mart and the new data source
• Integrate them into an updated data mart design
[Diagram: Data Virtualization Platform over the Data Mart and the New Data Source]
11. BI Use Case:
Agile Set-Based ELT Design
• If your normal ETL style is a series of set-based SQL queries built on top of each other, then you can quickly prototype the ETL before moving it into your normal ETL engine to persist and execute (normally for performance)
[Diagram: Data Virtualization Platform over multiple sources]
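The prototyping loop on this slide can be sketched as a chain of SQL views, one per set-based step, followed by a single step that materialises the agreed result. The `raw_orders` data, the step names and `persist()` are hypothetical, and SQLite's `CREATE TABLE ... AS SELECT` merely stands in for moving the logic into your normal ETL engine.

```python
import sqlite3


def load_raw(conn):
    """Hypothetical raw source data with the usual inconsistencies."""
    conn.executescript("""
        CREATE TABLE raw_orders (region TEXT, amount REAL);
        INSERT INTO raw_orders VALUES
            (' north', 10.0), ('North ', 5.0), ('south', 7.5), ('south', NULL);
    """)


def prototype_steps(conn):
    """Each step is a view over the previous one, so any step can be edited and re-queried at once."""
    conn.executescript("""
        CREATE VIEW step1_clean AS
            SELECT UPPER(TRIM(region)) AS region, amount
            FROM raw_orders
            WHERE amount IS NOT NULL;
        CREATE VIEW step2_by_region AS
            SELECT region, SUM(amount) AS total_amount, COUNT(*) AS order_count
            FROM step1_clean
            GROUP BY region;
    """)


def persist(conn, view_name, table_name):
    """Materialise an agreed step, standing in for handing the logic to the ETL engine."""
    # Identifiers are fixed names from this sketch, never user input.
    conn.execute(f"CREATE TABLE {table_name} AS SELECT * FROM {view_name}")
    conn.commit()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    load_raw(db)
    prototype_steps(db)
    print(db.execute("SELECT * FROM step2_by_region ORDER BY region").fetchall())
    persist(db, "step2_by_region", "orders_by_region")
```

While the views are being refined nothing is stored; only the final, agreed step is persisted.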
12. BI Use Case:
Big Data Integration
• DV platform connects to Big Data sources
• Data sources are mapped into the DV platform
• User accesses them via standard tools (SQL, RESTful interfaces, etc.)
[Diagram: SQL-based tools use a SQL interface to the Data Virtualization Platform, which reaches the Big Data sources through a MapReduce (etc.) interface]
13. BI Use Case:
Source System Analysis
• Apply your data quality and data profiling tools to all your data sources
• Look for relationships across systems
• Remove accessibility limitations by enabling caching, so that you are not hitting the source system but still have fresh data
[Diagram: Data Quality & Profiling Tools connect through the Data Virtualization Platform to multiple sources]
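DV products implement caching in their own ways; the idea on the slide (serve repeat reads locally, refresh from the source after a while) can be sketched as a small time-to-live cache in Python. `TTLCache`, `fetch` and `ttl_seconds` are hypothetical names, and the TTL is the knob that trades data freshness against load on the source system.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Serve repeated reads from a local copy for ttl_seconds, then refresh from the source."""

    def __init__(self, fetch: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch            # hypothetical call that queries the source system
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so the behaviour can be tested
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]           # still fresh: the source system is not touched
        value = self._fetch(key)       # missing or stale: one round trip to the source
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    calls = []

    def slow_source(query):
        calls.append(query)
        return f"result of {query}"

    cache = TTLCache(slow_source, ttl_seconds=300)
    cache.get("SELECT COUNT(*) FROM orders")
    cache.get("SELECT COUNT(*) FROM orders")
    print(len(calls))  # the second read was served from the cache
```

Profiling tools can then scan the same data repeatedly while the source sees at most one query per key per TTL window.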
14. BI Use Case:
Data Masking
• Currently building two versions of a data mart, one with sensitive data and one without?
• Instead, build one and use Role-Based Access Control (RBAC) to restrict what an individual can see
[Diagram: Data Virtualization Platform and a single Physical Data Mart]
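One way to picture this, assuming a hypothetical role-to-view mapping rather than any vendor's actual RBAC feature: a single physical table, a full view and a masked view, with the caller's role deciding which view a query is routed to. SQLite has no built-in roles, so `ROLE_VIEWS` and `fetch_customers()` play that part in Python; the card numbers are dummy values.

```python
import sqlite3

# Hypothetical role-to-view mapping; a DV platform would hold this in its own security model.
ROLE_VIEWS = {"analyst": "customer_masked", "compliance": "customer_full"}


def build_mart():
    """One physical table, exposed through a full view and a masked view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, card_no TEXT, balance REAL);
        INSERT INTO customer VALUES (1, 'Acme', '4111111111111111', 120.0),
                                    (2, 'Globex', '5500000000000004', 80.0);
        CREATE VIEW customer_full AS
            SELECT cust_id, name, card_no, balance FROM customer;
        -- Same rows, sensitive column masked: only the last four digits are kept
        CREATE VIEW customer_masked AS
            SELECT cust_id, name, '************' || substr(card_no, -4) AS card_no, balance
            FROM customer;
    """)
    return conn


def fetch_customers(conn, role):
    """Route the query to the view allowed for this role, or refuse."""
    view = ROLE_VIEWS.get(role)
    if view is None:
        raise PermissionError(f"role {role!r} has no access")
    # The view name comes from the fixed mapping above, never from user input.
    return conn.execute(f"SELECT cust_id, name, card_no FROM {view} ORDER BY cust_id").fetchall()


if __name__ == "__main__":
    mart = build_mart()
    print(fetch_customers(mart, "analyst"))
    print(fetch_customers(mart, "compliance"))
```

Only one copy of the data mart exists; what differs per person is the view their role resolves to.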
15. BI Use Cases
• Some examples
– The usefulness of each example depends on the organization
• Generally an enabler for more agility
– Quicker prototyping and integration
• Will not solve all your problems
– And has a cost associated with it (license & hardware)
16. Vendors: What The Analysts Say
• Forrester Wave: Data Virtualization, Q1 2012
– Informatica
– IBM
– Denodo
  • EU (Spanish) origins
– Composite
  • Now part of Cisco
  • Was OEM’d by Informatica
– Microsoft
– SAP
– And others
• Gartner
– No Magic Quadrant; instead includes Data Virtualization in Data Integration
17. Vendors: Product Positioning
Stand Alone
• Players
– Cisco (Composite)
– Denodo
• Selection
– Popular where IBM/Informatica are not already embedded
Integrated
• Players
– IBM
– Informatica
• Selection
– Popular with organisations that already have the vendor's ETL tool
18. THANK YOU (Latvian: PALDIES)