This presentation covers the Data Integration and Interoperability (DII) section of DMBOK2, focusing on data virtualization as a key data integration tool.
2. DII – The new kid on the block
Data Integration and Interoperability (DII) describes processes
related to the movement and consolidation of data within
and between data stores, applications and organizations.
Why we’re geeking out for DII
1. SOA/Microservices are becoming more popular.
2. Integration of structured and unstructured data
3. Deliver value faster… avoid ROGUE users
4. Although it’s not new, DII in DMBOK provides clear
guidelines to organizations aiming to become more efficient
through IT.
5. History
Data virtualization has existed since Bill Inmon popularized the data warehouse in the 1990s. But virtual models back then were not very popular due to the lack of computing power available (or accessible).
Today, changes in data types and business expectations on information velocity have made virtualization a more popular concept.
Did you know?
The last time Bill Inmon wrote about data virtualization, he compared it to a frustrating game of whack-a-mole, where no matter how much you hit the mole… it keeps coming back!
http://www.b-eye-network.com/view/9956
8. Why data virtualization?
Fast and Easy
• Rapid data integration, which enables a faster time to solution
• Integrations and changes are easy (no need to update extractions, tables, or data marts)
Integrate more!
• Opportunity to integrate structured and unstructured data
Cheaper and more secure
• Less expensive to maintain
• No need to replicate data
• Reduces the overhead of managing data integration systems (easier + faster = fewer required resources)
Agile
• Enables iterative development with quick deliverables (note: a very important one, since in most cases users don’t know what they want, which leads to many iterations)
• Developers focus on the business instead of the mechanics of data manipulation (why? because data virtualization tools connect automatically to many data sources)
10. Use Cases
Data Warehouse augmentation
Problem
• Bringing new data sources into a data warehouse takes a significant amount of effort, even more so if the sources include unstructured data.
Fix
• Data virtualization can be applied to augment
existing data warehouse with virtual views that
incorporate unstructured data.
Support ETL process
Problem
• It is sometimes too complicated to access web services data, extract it, and make it part of the ETL, especially if you need to develop access methods for external or new types of data.
Fix
• Data virtualization tools have access methods which can be used to easily extract data from web services, pre-process it, and have it ready for the ETL process.
Data Warehouse Federation/Canonical
Problem
• Some organizations have multiple separate data
warehouses which may take too much effort to
integrate.
Fix
• Data virtualization allows you to quickly generate federated views of all these data warehouses and integrate their data for different services. Individual warehouses continue to operate with no interruptions. (The same applies to DWH migrations!)
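The federation pattern can be sketched in miniature with SQLite: two separate databases stand in for two warehouses, and a single view unions them into one canonical table while each source stays untouched. Database and table names below are illustrative assumptions.

```python
import sqlite3

# Hypothetical sketch: two independent "warehouses" attached to one session,
# the way a virtualization layer would connect to multiple sources.
hub = sqlite3.connect(":memory:")
hub.execute("ATTACH DATABASE ':memory:' AS dwh_eu")
hub.execute("ATTACH DATABASE ':memory:' AS dwh_us")

hub.execute("CREATE TABLE dwh_eu.orders (order_id INTEGER, amount REAL)")
hub.execute("CREATE TABLE dwh_us.orders (order_id INTEGER, amount REAL)")
hub.execute("INSERT INTO dwh_eu.orders VALUES (1, 10.0)")
hub.execute("INSERT INTO dwh_us.orders VALUES (2, 20.0)")

# Federated (canonical) view: consumers query one logical table while
# each warehouse keeps operating with no interruption.
# (TEMP view, since SQLite permanent views cannot span attached databases.)
hub.execute("""
    CREATE TEMP VIEW all_orders AS
    SELECT 'EU' AS region, order_id, amount FROM dwh_eu.orders
    UNION ALL
    SELECT 'US' AS region, order_id, amount FROM dwh_us.orders
""")

rows = hub.execute(
    "SELECT region, order_id, amount FROM all_orders ORDER BY order_id"
).fetchall()
print(rows)  # [('EU', 1, 10.0), ('US', 2, 20.0)]
```

In a migration scenario, the same canonical view lets consumers keep querying while data moves from the old warehouse to the new one underneath.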
11. Use Cases
Data Warehouse prototyping
Problem
• Organizations are moving to agile development, where iterations and short-term sprints are key to delivering value on a weekly or bi-weekly basis.
Fix
• When data prototypes are built fast and validated by users, the result is a proven product that can then be materialized, saving time and therefore money.
Data Mashups
Problem
• Web mashups are enabled by APIs and most
corporate data sources do not have accessible APIs
to support this mashup process.
Fix
• Data virtualization tools are enablers of mashups, since they use the same protocols and data delivery formats as APIs.
Master Data on Steroids – Past, present and future data
Problem
• Master Data Hubs traditionally only hold identity and
descriptive information, but transactional data is
usually not stored in MDHs.
Fix
• With data virtualization, you could build a canonical layer that takes data from the MDH and other sources and enriches master data with summarized transactional data (e.g. value of a customer over time, purchasing forecasts, etc.).
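A minimal sketch of that enrichment, again using SQLite as the stand-in virtual layer: master records (identity and descriptive attributes only) are joined to a separate transactions source, and a view adds an aggregated "value over time" column. Table and column names are hypothetical.

```python
import sqlite3

# Hypothetical sources: a master data hub table and a transactional table
# that would normally live in different systems.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mdh_customers (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO mdh_customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 100.0), (1, 250.0), (2, 75.0)],
)

# Canonical layer: master attributes enriched with summarized
# transactional data (here, a simple lifetime value per customer).
conn.execute("""
    CREATE VIEW enriched_customers AS
    SELECT m.customer_id,
           m.name,
           COALESCE(SUM(t.amount), 0) AS lifetime_value
    FROM mdh_customers m
    LEFT JOIN transactions t ON t.customer_id = m.customer_id
    GROUP BY m.customer_id, m.name
""")

rows = conn.execute(
    "SELECT * FROM enriched_customers ORDER BY customer_id"
).fetchall()
print(rows)  # [(1, 'Acme', 350.0), (2, 'Globex', 75.0)]
```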
12. So is ETL going away?
This does not mean ETL is no longer needed; it’s more about identifying when ETL is not enough and using virtualization to enhance data integration: when ETL is too slow, when data sources are difficult to access, or when data types are challenging.
Maybe in the future it will be the other way around, and we’ll turn to ETL for cases when data virtualization is not enough, for instance when you need to perform highly complex transformations that could impact performance in a virtual database.
Today, It is common to virtualize in development and materialize in production.
Misconceptions
1. A VDB DOES NOT replace a DWH. A VDB enhances the DWH by:
• Combining structured and unstructured data into a single data layer