Watch full webinar here: https://bit.ly/2Y0vudM
What is Data Virtualization and why do I care? In this webinar we aim to help you understand not only what Data Virtualization is, but also why it is a critical component of any organization's data fabric and how it fits in. We show how data virtualization liberates and empowers your business users, from data discovery and data wrangling to the generation of reusable reporting objects and data services. Digital transformation demands that we empower all consumers of data within the organization, and it demands agility. Data Virtualization gives you meaningful access to information that can be shared by a myriad of consumers.
Register to attend this session to learn:
- What is Data Virtualization?
- Why do I need Data Virtualization in my organization?
- How do I implement Data Virtualization in my enterprise?
Agenda
1. Data challenges in the 21st century
2. What is Data Virtualization?
3. Benefits of Data Virtualization
4. How Data Virtualization Works
5. Key takeaways
6. Q&A
2020's Data Facts
Rising volume of data
▪ 90% of the data have been produced in the past 2 years
▪ 40 zettabytes of data by the end of 2020 (5,200 GB per person on earth)
▪ In 2020, every person will generate 1.7 MB of data per second
▪ It would take a person 181 million years to download all of that data
Rising business challenges with data
▪ Poor data quality costs a business between $9M and $14M a year
▪ Bad data is estimated to cost the US alone $3 trillion a year
▪ 97% of organizations are investing in AI & Big Data
▪ 93% have a multi-cloud & hybrid strategy
▪ Data scientists waste 75% of their time looking for data
Sources: 2020, Capgemini, IBM, EDC …
21st Century Data Challenges
New sources of data:
• Social Media
• Mobile Devices
• Increased Internet commerce/transactions
• Networked devices/sensors
New types:
• Images
• Streamed data
• Video/audio
• Parquet
New repositories:
• Data Lake
• Snowflake
• Queues
Increased demand:
• Citizen analysts
• Customer demands
• AI/ML
• Predictive Analytics
• Data Science
Growing regulatory concerns:
• PPI
• Reporting needs (AML/KYC, HEDIS, etc.)
• GDPR
• HR, Privacy, Tax
Increasingly complex:
• SaaS/PaaS
• Cloud-based data
• Governance challenges
• More apps
More Data, More Insight
Time & Cost
How do I handle data?
Gartner: The Evolution of Analytical Environments
This Is a Second Major Cycle of Analytical Consolidation
[Figure: the analytical architecture of each era (operational applications, IoT and other new data, cubes, data warehouses, data lakes, marts, ODS, staging/ingest), plotted against time from the 1980s to the 2010s]
1980s, Pre-EDW: Fragmented/nonexistent analysis
› Multiple sources
› Multiple structured sources
1990s, EDW: Unified analysis
› Consolidated data
› "Collect the data"
› Single server, multiple nodes
› More analysis than any one server can provide
2000s, Post-EDW: Fragmented analysis
› "Collect the data" (into different repositories)
› New data types, processing, requirements
› Uncoordinated views
2010s, LDW: Unified analysis
› Logically consolidated view of all data
› "Connect and collect"
› Multiple servers, of multiple nodes
› More analysis than any one system can provide
©2018 Gartner, Inc.
Source: "Gartner Market Guide for Data Virtualization," November 16, 2018
Data virtualization can be used to create virtualized and integrated views of data in-memory rather than executing data movement and physically storing integrated views in a target data structure. It provides a layer of abstraction above the physical implementation of data, to simplify query logic.
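To make the definition concrete, here is a minimal Python sketch. It assumes two toy sources: an in-memory SQLite database and a CSV string standing in for a file. The `VirtualView` class and all table and column names are hypothetical illustrations, not a Denodo API. The view stores only a recipe. Each time it is queried, rows are read from the sources and joined in memory, and nothing is copied into a target structure.

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]

# Source 1 (hypothetical): a relational database; in-memory SQLite stands in for a real DBMS.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Source 2 (hypothetical): a flat file; a CSV string stands in for a file on disk.
ORDERS_CSV = "customer_id,amount\n1,100\n1,50\n2,75\n"


class VirtualView:
    """Stores only a recipe for producing rows; nothing is materialized."""

    def __init__(self, recipe: Callable[[], Iterator[Row]]) -> None:
        self._recipe = recipe

    def query(self) -> List[Row]:
        # The sources are read every time the view is queried.
        return list(self._recipe())


def customer_orders() -> Iterator[Row]:
    """Join customers (SQLite) with orders (CSV) in memory."""
    names = dict(crm.execute("SELECT id, name FROM customers"))
    for order in csv.DictReader(io.StringIO(ORDERS_CSV)):
        cid = int(order["customer_id"])
        yield {"customer": names[cid], "amount": int(order["amount"])}


view = VirtualView(customer_orders)
print(view.query())
```

The view keeps no copy of the data. If a row in `customers` changes (for example, through an `UPDATE`), the change appears the next time `view.query()` runs, with no reload step.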
What is Data Virtualization?
1. Connect to disparate data sources
2. Combine related data into views
3. Consume in business applications
DATA CONSUMERS (analytical and operational): Enterprise Applications, Reporting, BI, Portals, ESB, Mobile, Web, Users
DISPARATE DATA SOURCES (more structured to less structured): Databases & Warehouses, Cloud/SaaS Applications, Big Data, NoSQL, Web, XML, Excel, PDF, Word...
CONNECT: SQL, MDX; Web Services; Big Data APIs; Web Automation and Indexing
COMBINE: Discover, Transform, Prepare, Improve Quality, Integrate; Normalized views of disparate data
CONSUME (PUBLISH): Share, Deliver, Publish, Govern, Collaborate; Multiple Protocols, Formats; Query, Search, Browse; Request/Reply, Event Driven; Secure Delivery
Modern Data Virtualization
Data Virtualization enhanced with data management, automation and AI
• Delivers data more quickly than direct queries
• Leverages AI to accelerate performance and enhance the user experience
• An active data catalog to explore and govern data in real time
• Empowers data scientists with integrated data science notebooks
• Flexible support for hybrid and multi-cloud architectures
• Employs automation to speed cloud deployment and management
• Leverages SSO and fine-grained permissions to secure data assets
Six Essential Capabilities of Data Virtualization
1. Data abstraction
2. Zero replication, zero relocation
3. Real-time information
4. Self-service data services
5. Centralized metadata, security & governance
6. Location-agnostic architecture for multi-cloud, hybrid acceleration
1. Data abstraction
Abstracts access to disparate data sources.
Acts as a single virtual repository.
Abstracts data complexities like location, format, and protocols.
…hides data complexity for ease of data access by business
"Enterprise architects must revise their data architecture to meet the demand for fast data."
- Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research
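Hiding location, format and protocol behind one access layer can be pictured as a common adapter interface that every source implements. The Python sketch below is only an illustration of that idea and does not show how the Denodo Platform implements its adapters. `SourceAdapter`, `SqliteAdapter`, `CsvAdapter` and `JsonAdapter` are hypothetical names.

```python
import csv
import io
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List

Row = Dict[str, object]


class SourceAdapter(ABC):
    """Uniform interface: every source yields rows as plain dicts."""

    @abstractmethod
    def rows(self) -> Iterator[Row]:
        ...


class SqliteAdapter(SourceAdapter):
    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn, self.table = conn, table

    def rows(self) -> Iterator[Row]:
        # The table name is assumed to be trusted configuration, not user input.
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        cols = [d[0] for d in cur.description]
        for values in cur:
            yield dict(zip(cols, values))


class CsvAdapter(SourceAdapter):
    def __init__(self, text: str) -> None:
        self.text = text

    def rows(self) -> Iterator[Row]:
        yield from csv.DictReader(io.StringIO(self.text))


class JsonAdapter(SourceAdapter):
    def __init__(self, text: str) -> None:
        self.text = text

    def rows(self) -> Iterator[Row]:
        yield from json.loads(self.text)


def all_names(sources: List[SourceAdapter]) -> List[str]:
    # Consumer code sees one interface, whatever the source's location or format.
    return [str(row["name"]) for src in sources for row in src.rows()]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE people (name TEXT)")
db.execute("INSERT INTO people VALUES ('Ada')")
sources = [SqliteAdapter(db, "people"), CsvAdapter("name\nGrace\n"), JsonAdapter('[{"name": "Linus"}]')]
print(all_names(sources))  # ['Ada', 'Grace', 'Linus']
```

To add a new kind of source, you write one more adapter class. The consuming code stays unchanged.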
2. Zero replication, zero relocation
Leaves the data at its source; extracts only what is needed, on demand.
Diminishes the need for effort-intensive ETL processes.
Eliminates unnecessary data redundancy.
…reduces development time and overall TCO
"The Denodo Platform enables us to build and deliver data services, to our internal and external consumers, within a day instead of the 1-2 weeks it would take with ETL."
- Manager, Enervus
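Extracting "only what is needed" usually means pushing filters down to the source rather than copying whole tables. This Python sketch assumes a single SQLite table as the source. `fetch_orders` is a hypothetical helper. In a real virtualization engine, the optimizer decides on pushdown automatically.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical source system holding 1,000 orders; the data stays here.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "EMEA" if i % 4 == 0 else "AMER", float(i)) for i in range(1000)],
)


def fetch_orders(region: Optional[str] = None) -> List[Tuple[int, str, float]]:
    """Fetch orders on demand; when a region is given, push the filter to the source."""
    if region is None:
        return source.execute("SELECT id, region, amount FROM orders").fetchall()
    # The WHERE clause runs inside the source, so only matching rows are transferred.
    return source.execute(
        "SELECT id, region, amount FROM orders WHERE region = ?", (region,)
    ).fetchall()


# Without pushdown: transfer every row, then filter in the virtual layer.
naive = [r for r in fetch_orders() if r[1] == "EMEA"]
# With pushdown: the source returns only the matching rows.
pushed = fetch_orders("EMEA")
print(len(fetch_orders()), len(pushed))  # 1000 250
```

Both approaches return the same rows. With pushdown, only a quarter of the data leaves the source.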
3. Real-time information
Provisions data in real time to consumers.
Creates real-time logical views of data across many data sources.
Supports transformations and quality functions without the latency, redundancy, and rigidity of legacy approaches.
…enables timely decision-making
"Denodo's data fabric design relies on data virtualization to provide integrated data quickly to business users to effect faster outcomes."
- Gartner Magic Quadrant for Data Integration Tools, 18 August 2020
4. Self-service data services
Facilitates access to all data, both internal and external.
Enables creation of universal semantic models reflecting business taxonomy.
Connects data silos to provide the best available information to drive business decisions.
…enables information discovery and self-service
"Impressively quick turnaround time to 'unlock' data from additional silos and from legacy systems. Few vendors (if any) can compete with Denodo's support of the RESTful/OData standard, both to provide data (northbound) and to access data from the sources (southbound)."
- Business Analyst, Swiss Re
5. Centralized metadata, security & governance
Abstracts data source security models and enables single-point security and governance.
Extends single-point control across cloud and on-premises architectures.
Provides multiple forms of metadata (technical, business, operational) to facilitate understanding of data.
…simplifies data security, privacy, audit
"Our Denodo rollout was one of the easiest and most successful rollouts of critical enterprise software I have seen. It was successful in handling our initial security use case immediately, and has since shown a strong ability to cover additional use cases, in particular acting as a Data Abstraction Layer via its web service functionality."
- Enterprise Architect, Asurion
6. Location-agnostic architecture for multi-cloud, hybrid acceleration
Optimizes costs by migrating data, applications, and analytics workloads to the cloud without impacting the business.
Enables creation of a hub architecture to support integration of data across mixed workloads.
Provides end-to-end management of migrations/promotions and continuous delivery processes.
…enables cloud adoption
Reference Architecture
IT: Flexible Source Architecture
Business: Flexible Tool Choice
Business can now make faster & more sophisticated decisions, as all data is accessible from any tool of choice.
IT can now move at its own cadence and speed without affecting the business.
Benefits of Using Data Virtualization
Easier & faster access to trusted data
• For Business Users
  • Simplicity: Users don't need to navigate the complexity of the architecture. Where is the data (on-prem, cloud, multi-cloud)? How do they access it? Which location has priority?
  • Agility: All data is securely delivered from a single (virtual) system.
  • Accessibility: Data is accessible through a variety of interfaces (SQL, REST, OData, GraphQL) and in a web-based Data Catalog, regardless of its original format and location.
  • Common Semantic Layer: All users see the same definitions and data, providing data consistency.
  • Governed Self-Service: Users can use their own tools (BYOT) to access and query data that is governed, secure, and trusted.
Benefits of Using Data Virtualization
Faster, cheaper, simpler, easier to secure and govern
• For IT
  • Abstraction: Decouples storage and processing engines from the delivery of data.
  • Flexibility: Allows IT to change technologies and move data without service interruptions.
  • Security: Centralized governance and security controls for all data assets.
  • Governance: The data accessed by users can be governed, secured, and managed, so that users access known, trusted, and approved data sets.
  • Accelerated Delivery: Because data does not need to be replicated to a staging area or data mart before use, the data users need can be delivered significantly faster (up to 90% faster).
Data Virtualization use cases
From Data Storage & Management, to Data Consumers, going through Data Governance & Security
Data Consumption:
• Decision (real time)
• Single View (Customer 360)
• Agile BI (self-service)
• Data Science (ML & AI)
• Apps (mobile & web)
• Mergers & Acquisitions
• Data Marketplace
• Compliance (IFRS17, GRC)
• Data Security
• APIfication (& SQLification)
Data Governance, Manipulation & Access:
• Unified Data Layer
• Agility & Simplicity
• Real-time Delivery
• Data Abstraction
• Zero Replication
• Data Governance
• Sophisticated Optimizations
Data Storage & Management:
• Logical Data Warehouse/Lake
• Big Data Fabric
• Hybrid Data Fabric
• Data Integration
• Data Migration
• Refactoring & Replatforming
Data consumers: Sales, HR, Executive, Marketing, Apps/API, Data Science, AI/ML
Denodo Platform 8.0 Architecture
CONSUMERS
• SQL: BI tools (9) and data science tools (10)
• DATA AS A SERVICE (11): RESTful / OData, GraphQL / GeoJSON
• DATA CATALOG (8): Discover - Explore - Document
LOGICAL DATA FABRIC: DATA VIRTUALIZATION
• CONNECT to disparate data in any location, format or latency
• COMBINE related data into views with a universal semantic model
• CONSUME using BI & data science tools, data catalog, and APIs
• Built-in capabilities: Data Governance (2), Hybrid/Multi-Cloud (3), Self-Service (4), Security (5), AI/ML Recommendations (6), Query Optimization (7)
SOURCES (1): 150+ data adapters for traditional DB & DW, cloud stores, Hadoop & NoSQL, OLAP, files, apps, streaming, and SaaS
Denodo Platform: How Does Virtualization Work?
CONSUMERS: SQL, BI tools, data science tools; Data as a Service (RESTful / OData, GraphQL / GeoJSON); Data Catalog (Discover - Explore - Document)
LOGICAL DATA FABRIC
• CONSUME: consumer-facing views such as a Customer 360 view and a virtual data mart view
• COMBINE: unified views and derived views, built with operators marked J, U, A and S (join, union, aggregation, selection), plus transformation & cleansing
• CONNECT (abstraction): base views over the sources
SOURCES (150+ data adapters): Traditional DB & DW, cloud stores, Hadoop & NoSQL, OLAP, files, apps, streaming, SaaS
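The layering can be mimicked with ordinary functions. Each view is defined only in terms of the views below it and is recomputed when queried. In this minimal Python sketch, hard-coded rows stand in for the sources. The view names are hypothetical, and reading J, U and A as join, union and aggregation is an assumption made for the illustration.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


# Base views: one per source, exposing rows in a common shape (abstraction).
def base_crm_customers() -> List[Row]:
    return [{"customer_id": 1, "name": "Acme"}, {"customer_id": 2, "name": "Globex"}]


def base_web_orders() -> List[Row]:
    return [{"customer_id": 1, "amount": 100.0}, {"customer_id": 2, "amount": 40.0}]


def base_store_orders() -> List[Row]:
    return [{"customer_id": 1, "amount": 25.0}]


# Union (U): combine order rows from two channels into one unified view.
def unified_orders() -> List[Row]:
    return base_web_orders() + base_store_orders()


# Aggregation (A): total spend per customer over the unified view.
def spend_by_customer() -> Dict[object, float]:
    totals: Dict[object, float] = defaultdict(float)
    for row in unified_orders():
        totals[row["customer_id"]] += float(row["amount"])
    return totals


# Join (J): attach spend to each customer to build a Customer 360-style view.
def customer_360() -> List[Row]:
    totals = spend_by_customer()
    return [
        {**c, "total_spend": totals.get(c["customer_id"], 0.0)}
        for c in base_crm_customers()
    ]


print(customer_360())
# [{'customer_id': 1, 'name': 'Acme', 'total_spend': 125.0},
#  {'customer_id': 2, 'name': 'Globex', 'total_spend': 40.0}]
```

Because each layer only calls the layer below it, you can swap a base view for another source without touching `customer_360`.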
Reduce Complexity, Time and Money
Data architectures are getting more complex, more diverse, and more distributed.
Traditional data integration and management approaches are too expensive, slow, and complex.
Multiple Use Case Support: Enables a wide range of use cases, from self-service analytics and data services to centralized data governance and compliance, and serves as an innovation platform.
Abstraction: Helps present unified, business-consumable informational assets, with separation of storage and access.
Governance: Governed access across silos of data.
Agility: Faster provisioning of business-consumable data sets; rapid prototyping capabilities and empowerment of business users.
Sources (1)
DETAILS
• 150+ data adapters
• Relational, parallel, multidimensional, and in-memory databases
• Cloud data warehouses
• NoSQL databases and Hadoop ecosystem
• SOAP/REST web services, SaaS applications
• Enterprise systems, web and file systems
• JMS queues and streaming technologies
• New adapters for Databricks Delta, Azure Synapse, Google BigQuery and cloud data storage
BENEFITS
• Agile connectivity to new data sources within minutes
• Broad range of source connectivity options
• Rapid integration of new sources
• Faster time to market
SUMMARY
Industry's broadest range of source connectivity options
USED BY
Data Engineers and Integrators
"All available connectors are included within the Denodo Platform's cost."
- Gartner Magic Quadrant for Data Integration Tools, 2017
Data Governance (2)
DETAILS
• Metadata repository with multiple visualization options
• Discover, introspect and transform source metadata
• Refresh or propagate source metadata when it changes
• Data lineage, change impact and dependency analysis
• Ability to integrate with third-party governance tools and catalogs
BENEFITS
• Delivery of consistent, curated and contextual data to users
• Controlled data virtualization and enterprise data services capabilities
SUMMARY
Comprehensive data and metadata governance
USED BY
Data Engineers and Integrators
Data Stewards and Analysts
"Using Data Virtualization to Harden Business Users' Self-Authored BI Applications"
- Forrester: Divide (BI Governance From Data Governance) And Conquer, 2017
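Lineage and change-impact analysis come down to walking a dependency graph of views. You walk it upstream to find what a view is built from, and downstream to find what a change could break. This Python sketch uses breadth-first search over a hypothetical graph. The view and source names are invented for illustration and do not come from the Denodo metadata repository.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency graph: each view maps to the views/sources it reads.
DEPENDS_ON: Dict[str, List[str]] = {
    "bv_crm_customers": ["crm_db"],
    "bv_web_orders": ["orders_csv"],
    "dv_customer_orders": ["bv_crm_customers", "bv_web_orders"],
    "customer_360": ["dv_customer_orders"],
    "sales_dashboard": ["bv_web_orders"],
}


def _walk(start: List[str], edges: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search collecting every node reachable from `start`."""
    seen: Set[str] = set()
    queue = deque(start)
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(edges.get(node, []))
    return seen


def lineage(view: str) -> Set[str]:
    """Everything upstream of a view (what it is built from)."""
    return _walk(DEPENDS_ON.get(view, []), DEPENDS_ON)


def impact(changed: str) -> Set[str]:
    """Everything downstream of a changed source or view (what may break)."""
    used_by: Dict[str, List[str]] = {}
    for view, inputs in DEPENDS_ON.items():
        for src in inputs:
            used_by.setdefault(src, []).append(view)
    return _walk(used_by.get(changed, []), used_by)


print(sorted(impact("orders_csv")))
# ['bv_web_orders', 'customer_360', 'dv_customer_orders', 'sales_dashboard']
```

The `seen` set keeps the walk from visiting a shared view twice. Without it, diamond-shaped dependencies would produce duplicate work.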
Hybrid/Multi-Cloud (3)
DETAILS
• Ready to use and available on the AWS, Azure, and Google Cloud marketplaces, and as a Docker container
• Automated installation, configuration, deployment and upgrade of clusters in hybrid and multi-cloud environments
• A wide range of capacity options, including flexible rent-by-the-hour options
• Centralized metadata management across multiple locations
• One Denodo instance can be a source for another Denodo instance, enabling convenient, layered, regional architectures
• Orchestrate, audit, monitor, and govern
BENEFITS
• Deploy at any location: on-premises, multi-cloud, and edge. Multi-location architecture for maximum flexibility.
• Minimize expensive data movement and maximize local processing.
• Optimize costs by migrating data, applications, and analytics workloads to the cloud without impacting the business.
• Fully integrated diagnostic and monitoring tool with Solution Manager, making it easy to manage clusters
SUMMARY
Multi-location architecture for multi-cloud, hybrid, and edge scenarios with automated infrastructure management capability
USED BY
Cloud-first Enterprises
Self-Service (4)
DETAILS
• Linked data services for self-service data discovery, browsing and exploration
• Users can drill down in data views to examine the data itself
• Denodo data catalog for self-service global data search, relevant to the user
• Users can find, share and reuse all datasets available through the data virtualization layer
BENEFITS
• Easy for business users to create a catalog of business views and classify them according to business categories
• Business users and LOB executives become less dependent on the IT organization
• Denodo data catalog empowers a community of analysts and decision makers by creating a digital marketplace
SUMMARY
Easy data exploration and discovery by business users in a self-service manner
USED BY
Data Engineers and Integrators
Application & API Integrators
"Finding the right data quickly is essential in the age of self-service analytics."
- Dave Wells, Eckerson Group, 2017
Security (5)
DETAILS
• Role-based access control to data services, sources, and enterprise tools
• Single sign-on using Kerberos; security delegation; SAML and OAuth support
• Row- and column-level fine-grained authorization
• User authentication using LDAP and Active Directory
• Data encryption, masking, tokenization and redaction for data privacy
BENEFITS
• Easy to enforce security and policies in one central place, the data virtualization layer
• Consistent security model for all sources and all applications
• Secures both data and metadata
SUMMARY
All-encompassing unified security layer for data delivery
USED BY
Data Engineers and Integrators
Application & API Integrators
"Denodo's data fabric solution integrates key data management components, including data integration, data ingestion, data transformation, data governance and security, to support new and emerging use cases including customer 360, real-time and on-demand analytics, IoT analytics, and self-service analytics."
- The Forrester Wave™: Enterprise Data Fabric, Q2 2020
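Row- and column-level authorization can be pictured as two filters applied in the virtualization layer before results reach any consumer. Here is a minimal Python sketch. The roles, policies and masking rule are hypothetical. A real deployment would define such policies centrally in the platform rather than in application code.

```python
from typing import Dict, List, Set

Row = Dict[str, object]

# Hypothetical policies: regions each role may see, and columns each role sees masked.
ROW_POLICY: Dict[str, Set[str]] = {"emea_analyst": {"EMEA"}, "global_admin": {"EMEA", "AMER"}}
MASKED_COLUMNS: Dict[str, Set[str]] = {"emea_analyst": {"email"}, "global_admin": set()}

CUSTOMERS: List[Row] = [
    {"name": "Acme", "region": "EMEA", "email": "ops@acme.example"},
    {"name": "Globex", "region": "AMER", "email": "it@globex.example"},
]


def mask(value: object) -> str:
    # Keep the first character and hide the rest.
    text = str(value)
    return text[:1] + "*" * (len(text) - 1)


def secure_query(role: str, rows: List[Row]) -> List[Row]:
    """Apply row-level filtering, then column-level masking, for one role."""
    allowed = ROW_POLICY.get(role, set())  # unknown roles see no rows
    masked = MASKED_COLUMNS.get(role, set())
    result = []
    for row in rows:
        if row["region"] not in allowed:
            continue
        result.append({k: mask(v) if k in masked else v for k, v in row.items()})
    return result


print(secure_query("emea_analyst", CUSTOMERS))
```

Both checks run in one place. Every consumer, whether a BI tool, an API client or a notebook, receives the same filtered and masked result.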
AI/ML Recommendation (6)
DETAILS
• ML process based on past-activity metadata to automate fabric management activities
• Denodo uses ML to automatically propose and choose the best summaries for faster processing
• ML process to predict workload peaks for Denodo in the cloud and auto-scale accordingly
• ML-based recommendations of similar datasets, and of datasets of interest to similar users
BENEFITS
• Significant reduction in data search and discovery time
• Accelerate advanced analytics and data science
• Cost reduction in Denodo cloud usage
SUMMARY
Automate data fabric management and processes using ML
USED BY
Citizen Analysts and Integrators
"Denodo's AI/ML capabilities, as well as automation, continue to enhance its capabilities across data fabric components!"
- The Forrester Wave™: Enterprise Data Fabric, Q2 2020
Query Optimization (7)
DETAILS
• Dynamic Query Optimizer for best-in-class optimization
• Smart query acceleration using Summaries for complex analytical scenarios
• Offers partial and full aggregation pushdown
• Native integration with existing MPP and in-memory systems for query acceleration
• Widest range of caching configuration options: partial cache (for frequently used reports) or full cache (for data-intensive analytical applications)
BENEFITS
• Significant reduction in query execution time
• Significant reduction of network traffic
• Exploits the processing power of data sources to maximize local processing
• Cost reduction in cloud use cases
• "Full cache mode" avoids accessing data sources
SUMMARY
Unparalleled performance through query optimization and caching
USED BY
Data Engineers and Integrators
"Caching mechanisms range from proprietary file structures to standard relational database management system (RDBMS) tables, and also include caching of data in-memory. These caching mechanisms enhance the performance of data virtualization. Full and partial refreshing of caching can be triggered by schedules, events or rules."
- Adopt Data Virtualization to Improve Agility and Bimodal Traits in Your Aging Data Integration, 2017
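Partial aggregation pushdown is easiest to see with a fact table in one source and a dimension in another. Instead of moving every fact row, the engine can push a GROUP BY on the join key down to the fact source. It then joins the much smaller result and finishes the aggregation itself. This Python sketch, with SQLite and a dict as stand-in sources, illustrates the general technique, not the Denodo optimizer itself. All names are hypothetical.

```python
import sqlite3
from collections import defaultdict
from typing import Dict

# Source A (hypothetical): a large fact table of sales; SQLite stands in for the source DBMS.
sales_db = sqlite3.connect(":memory:")
sales_db.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
sales_db.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(i % 3 + 1, 10.0) for i in range(300)],  # 300 rows spread over 3 customers
)

# Source B (hypothetical): a small dimension held in a different system (a dict here).
CUSTOMER_REGION: Dict[int, str] = {1: "EMEA", 2: "EMEA", 3: "AMER"}


def total_by_region_naive() -> Dict[str, float]:
    # Transfers all 300 sales rows, then joins and aggregates in the virtual layer.
    rows = sales_db.execute("SELECT customer_id, amount FROM sales").fetchall()
    totals: Dict[str, float] = defaultdict(float)
    for cid, amount in rows:
        totals[CUSTOMER_REGION[cid]] += amount
    return dict(totals)


def total_by_region_pushdown() -> Dict[str, float]:
    # The source pre-aggregates by the join key, so only 3 rows are transferred.
    partials = sales_db.execute(
        "SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id"
    ).fetchall()
    totals: Dict[str, float] = defaultdict(float)
    for cid, subtotal in partials:
        totals[CUSTOMER_REGION[cid]] += subtotal  # finish the aggregation after the join
    return dict(totals)


print(total_by_region_naive() == total_by_region_pushdown())  # True
```

The rewrite is valid because SUM is decomposable: a sum of per-customer subtotals equals the overall sum. The same holds for COUNT, MIN and MAX. AVG has to be pushed down as a separate SUM and COUNT and then recombined.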
Data Catalog (8)
DETAILS
• Google-like search capability for data and metadata
• Business-friendly new UI geared to roles such as data stewards, data analysts, and citizen analysts
• Ability to create business categories or tags
• Graphical representation of lineage and relationships
• Usage-based metadata: the who, when, what, why, and how of data consumption
• Enhanced collaboration features through user warnings and comments
• Machine learning-powered personalized recommendations for data sets
BENEFITS
• Data at the speed of business
• Users can easily search for data or metadata
• Facilitates sharing and collaboration
• Enhanced user experience with smart ranking of search results
SUMMARY
The only data virtualization solution that seamlessly integrates a data catalog with data delivery.
USED BY
Data Stewards and Analysts
Citizen Integrators
Citizen Analysts
"Through 2022, over 80% of data lake projects will fail to deliver value as finding, inventorying and curating data will prove to be the biggest inhibitor to analytics and data science success."
- Gartner, Augmented Data Catalogs: Now an Enterprise Must-Have for Data and Analytics Leaders, September 12, 2019
Data Consumers (9)
DETAILS
• Multiprotocol support including JDBC, ODBC, OData, GraphQL and GeoJSON
• SOAP and RESTful web services
• Output in XML, JSON and HTML for human and machine consumption
• Portal widgets supporting major portals
• Native Denodo connectors for major BI tools such as Tableau, MicroStrategy, Cognos, and Power BI
BENEFITS
• Empower business users with relevant and contextual data at their fingertips
• Single source of truth across multiple BI tools
• Consistent security and governance across all consuming applications
• Reduced API call overhead through GraphQL
SUMMARY
Consistent view of information across any consuming application; rationalize and integrate multiple BI platforms.
USED BY
Citizen Integrators
Citizen Analysts
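GraphQL reduces API call overhead because a client can ask, in one request, for exactly the fields it needs, including nested ones. The sketch below builds such a request with the Python standard library. The endpoint URL, the `customer_360` field and its `region` argument are hypothetical. The real schema depends on which views are published.

```python
import json
import urllib.request

# Hypothetical GraphQL endpoint exposed by the data service layer.
GRAPHQL_URL = "https://dv.example.com/graphql"

# One request fetches a customer's name and nested orders; field names are hypothetical.
QUERY = """
query CustomerWithOrders($region: String) {
  customer_360(region: $region) {
    name
    orders { order_id amount }
  }
}
"""


def build_request(region: str) -> urllib.request.Request:
    """Package the query and its variables as a standard GraphQL-over-HTTP POST."""
    body = json.dumps({"query": QUERY, "variables": {"region": region}}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("EMEA")
# urllib.request.urlopen(req) would send it; not called here because the URL is a placeholder.
print(req.get_method(), json.loads(req.data)["variables"])
```

With a REST-style interface, the same data might take one call for customers and another for each customer's orders.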
Data Science Tools (10)
DETAILS
• Data scientists can combine queries, scripts, graphics and text to create narratives
• The Denodo Data Science Tool is built on Apache Zeppelin
• Denodo users can create, save, and share their own notebooks with fellow users
• Fully integrated with Denodo's security system and SSO capabilities
BENEFITS
• Helps data scientists save time in finding data for analytics and model building
• Data scientists can easily share their findings with peers using the notebook dashboard
• Contextualizing data science models and their consumption is easier through the Denodo layer
SUMMARY
Data science notebook that is fully integrated with Denodo's security system and SSO capabilities
USED BY
Data Scientists and Data Engineers
Data as a Service (11)
DETAILS
• Expose reusable data services supporting multiple protocols (JDBC, ODBC, ADO.NET, REST and SOAP/XML web services, OData, GraphQL)
• Easily extend or specialize data services for specific use cases
• Full metadata introspection support
• OpenAPI (Swagger) support
• Data lineage support
BENEFITS
• Data at the speed of business
• Users can easily search for data or metadata
• Facilitates sharing and collaboration
SUMMARY
Create reusable, extensible data services for all types of consumers
USED BY
Data Engineers and Integrators
Application and API Integrators
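Because published data services use standard protocols, a client needs nothing vendor-specific to query them. The Python sketch below builds an OData request URL from the standard system query options `$select`, `$filter` and `$top`. The endpoint URL and view name are hypothetical, so check your own deployment for the actual service path.

```python
from typing import List
from urllib.parse import quote, urlencode

# Hypothetical OData endpoint for a published view.
BASE_URL = "https://dv.example.com/odata/customer_360"


def odata_url(base: str, select: List[str], filter_expr: str, top: int) -> str:
    """Build a URL using the standard OData system query options."""
    params = {"$select": ",".join(select), "$filter": filter_expr, "$top": str(top)}
    # Keep '$', ',' and quotes readable; encode spaces as %20 rather than '+'.
    return base + "?" + urlencode(params, quote_via=quote, safe="$,'")


url = odata_url(BASE_URL, ["name", "region"], "region eq 'EMEA'", 10)
print(url)
# https://dv.example.com/odata/customer_360?$select=name,region&$filter=region%20eq%20'EMEA'&$top=10
```

Passing `quote_via=quote` produces `%20` for spaces. Some servers reject the `+` that the default `quote_plus` would emit inside a `$filter` expression.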