Watch here: https://bit.ly/2NGQD7R
In an era increasingly dominated by advancements in cloud computing, AI, and advanced analytics, it may come as a shock that many organizations still rely on data architectures built before the turn of the century. But that scenario is rapidly changing with the increasing adoption of real-time data virtualization, a paradigm shift in how organizations access, integrate, and provision the data required to meet business goals.
As data analytics and data-driven intelligence take centre stage in today’s digital economy, logical data integration across the widest variety of data sources, with proper security and governance structures in place, has become mission-critical.
Attend this session to learn:
- How you can meet cloud and data science challenges with data virtualization
- Why data virtualization is increasingly finding enterprise-wide adoption
- How customers are reducing costs and improving ROI with data virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
1. Modernizing Data Architecture Using Data Virtualization
Multipurpose Data Lake and Data Virtualization-enabled Data Fabric
Chris Day, Director Sales Engineering, APAC
2. 2
• Competition from a low-cost vendor
• Lower the price, affecting margins?
• Or, maintain high price, but differentiate in other ways?
3. 3
Large Heavy Equipment Manufacturer
Self-service / Predictive Analytics – IoT Integration
Benefits
• Improved asset performance and proactive maintenance
• Increased revenue from sale of services and parts
• Reduced warranty costs of parts failure
4. 4
Current Requirements in Data Management
1. Faster & more accurate decision making
▪ Significant increase in business speed & complexity of requirements
2. Regulations, enterprise-wide governance & data security
▪ Thousands of new regulations worldwide: tax, finance, privacy, HR, environmental, GDPR, etc.
3. IT cost reduction
▪ Huge data growth with associated storage and operational costs
5. 5
Challenges: Fragmentation of the Data Landscape
[Diagram labels: ETL, Data Warehouse, Kafka, Physical Data Lake, ML/AI, SQL interface, Streaming Analytics, Distributed Storage, Files (IT Storage and Processing); Bus. Tools, Ent. Apps, Portals, Mobile…; a separate Gov/Sec block on each data source and a separate Bus. Logic block on each consuming tool]
IT has to implement Gov. & Sec. at every data source
Bus. adds Data Logic in every report, tool, etc.
7. 7
Quiz
Where is the data for your data lake located?
1. ‘In the cloud’
2. On-premise
3. Both ‘in the cloud’ and on-premise
4. We don’t have a data lake
Quiz number 1
10. 10
What are Data Lakes?
• A storage repository that holds a vast amount of raw data in its native format.
• Hadoop and its ecosystem provided the foundation: vast storage and processing muscle.
• Advanced analytic tools and mining software take in raw data from data lakes and transform it into useful insight.
11. 11
Data Lakes – A Data Scientist’s Playground
• Data scientists saw Hadoop as their personal supercomputer.
• Data lakes helped democratise access to storage and computing with off-the-shelf hardware.
• Hadoop-based solutions became the standard to bring modern analytics to any corporation.
12. 12
Data Lakes – Not a Perfect World
Physical Nature
• Based on Replication
• Require data to be copied to its physical storage
• Extends development cycles and costs
• Not all data is suitable for replication
• Real time needs: Cloud and SaaS APIs
• Large volumes: existing EDW
• Laws and restrictions
Single Purpose
• Usage of the data lake is often monopolised
• New silo of data, requires additional skills
• Governance, security & quality may differ from what users expect (e.g. from an EDW)
13. 13
Multi‐purpose data lakes are data delivery environments developed to support a broad range of users, from traditional self‐service BI users (e.g. finance, marketing, human resource, transport) to sophisticated data scientists.
Multi‐purpose data lakes allow a broader and deeper use of the data lake investment without minimizing the potential value for data science and without making it an inflexible environment.
Rick Van der Lans, R20 Consultancy
14. 14
The Multipurpose Data Lake with Data Virtualization
“A multi-purpose data lake can become an organization’s universal data delivery system”
Architecting the Multi-Purpose Data Lake with Data Virtualization, Rick Van der Lans, April 2018
15. 15
Denodo’s Coronavirus Data Portal
[Diagram labels: File; Denodo Express COVID-19 Edition; Data Catalog; Data Portal; JDBC, ODBC, API, GraphQL, GeoJSON; Sandbox (three instances)]
17. 17
The Multipurpose Data Lake with Data Virtualization
Logical Nature
• Replication is an option, not a necessity
• Broaden data access, shorten development times, better insights
• Tight integration with big data systems. Fast execution with large data volumes
Multi-purpose
• Curated access for non-technical users
• Better governance and access control
• Better ROI for the investment of the lake
18. 18
The Virtual Data Lake – Access to all Data Sources
Single access to all data assets, internal & external, including:
▪ Physical Data Lake (usually based on SQL-on-Hadoop systems)
▪ Other databases (EDW, ODS, applications, etc.)
▪ SaaS APIs (Salesforce, Google, social media, etc.)
▪ Files (local, S3, Azure, etc.)
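The idea of a single access point over several physical sources can be sketched in a few lines of Python. This is only an illustration, not Denodo's API: two attached SQLite databases stand in for a physical data lake and an EDW, a small CSV string stands in for a file source, and every table, column, and view name is hypothetical. The consumer queries one view and never sees where each piece of data lives.

```python
import csv
import io
import sqlite3

# One connection plays the role of the virtual layer (illustrative stand-in).
conn = sqlite3.connect(":memory:")

# Two independent in-memory databases stand in for separate physical sources.
conn.execute("ATTACH DATABASE ':memory:' AS lake")  # "physical data lake"
conn.execute("ATTACH DATABASE ':memory:' AS edw")   # "enterprise data warehouse"

conn.execute("CREATE TABLE lake.sales (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO lake.sales VALUES (?, ?)",
                 [(1, 10.0), (1, 5.0), (2, 7.5)])

conn.execute("CREATE TABLE edw.customer (customer_id INTEGER, zip TEXT)")
conn.executemany("INSERT INTO edw.customer VALUES (?, ?)",
                 [(1, "94301"), (2, "10001")])

# A file source. A real virtualization layer would read it on demand;
# it is copied into a temp table here only to keep the sketch short.
CSV_TEXT = "zip,region\n94301,West\n10001,East\n"
conn.execute("CREATE TEMP TABLE zip_region (zip TEXT, region TEXT)")
conn.executemany("INSERT INTO zip_region VALUES (?, ?)",
                 [(r["zip"], r["region"]) for r in csv.DictReader(io.StringIO(CSV_TEXT))])

# The "virtual view": one logical object that spans all three sources.
# SQLite allows cross-database references only from TEMP views.
conn.execute("""
    CREATE TEMP VIEW sales_by_region AS
    SELECT z.region AS region, SUM(s.amount) AS total
    FROM lake.sales AS s
    JOIN edw.customer AS c ON c.customer_id = s.customer_id
    JOIN zip_region AS z ON z.zip = c.zip
    GROUP BY z.region
""")

result = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region").fetchall()
print(result)
```

The design point is that consumers depend only on `sales_by_region`; any of the underlying sources could be moved or replaced without changing the consumer's query.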
19. 19
The Virtual Data Lake – Using the Lake Processing Engine
The Denodo optimizer integrates natively with MPP systems to provide one extra key capability: Query Acceleration
Denodo can move processing, on demand, during execution:
• Parallel power for calculations in the virtual layer
• Avoids slow processing on disk for large data volumes
20. 20
The Logical Data Lake – Putting the Pieces Together
[Diagram: Sales (300 million rows) joined with Customer (2M rows), then Group by ZIP; with push-down, Sales is first grouped by customer ID into 2M rows (sales by customer) before the join and Group by ZIP]
System      Execution Time   Optimization Techniques
Others      ~ 10 min         Basic
No MPP      43 sec           Aggregation push-down
With MPP    11 sec           Aggregation push-down + MPP integration (Impala 8 nodes)
With MPP Integration
1. Partial Aggregation push down
Maximizes source processing
Reduces network traffic
2. Integrated with Cost Based Optimizer
Based on data volume estimation and the cost of these particular operations, the CBO can decide to move all or part of the execution tree to the MPP
3. On-demand data transfer
For SQL-on-Hadoop systems, Denodo automatically generates and uploads Parquet files
4. Integration with local and pre-cached data
The engine detects when data is cached or is a native table in the MPP
5. Fast parallel execution
Support for Spark, Presto and Impala for fast analytical processing in inexpensive Hadoop-based solutions
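The partial aggregation push-down described on this slide can be illustrated with a toy example. The sketch below is not Denodo's optimizer; it uses two SQLite connections as stand-ins for the sales and customer sources, with invented data and hypothetical function names. It compares a naive plan (pull every sales row, then join and group by ZIP) with a push-down plan (ask the sales source for one pre-aggregated row per customer, then finish the aggregation by ZIP), and counts how many rows cross the "network" in each case.

```python
import sqlite3

# Two separate connections stand in for two independent data sources.
sales_src = sqlite3.connect(":memory:")
cust_src = sqlite3.connect(":memory:")

sales_src.execute("CREATE TABLE sales (customer_id INTEGER, amount INTEGER)")
sales_src.executemany("INSERT INTO sales VALUES (?, ?)",
                      [(1, 10), (1, 20), (2, 5), (3, 7), (3, 3), (3, 1)])

cust_src.execute("CREATE TABLE customer (customer_id INTEGER, zip TEXT)")
cust_src.executemany("INSERT INTO customer VALUES (?, ?)",
                     [(1, "94301"), (2, "94301"), (3, "10001")])


def customer_zips():
    """Fetch the customer-to-ZIP mapping from the customer source."""
    return dict(cust_src.execute("SELECT customer_id, zip FROM customer"))


def group_by_zip(rows, zips):
    """Join (customer_id, value) rows to ZIP codes and sum per ZIP."""
    totals = {}
    for customer_id, value in rows:
        zip_code = zips[customer_id]
        totals[zip_code] = totals.get(zip_code, 0) + value
    return totals


def naive_plan():
    """Move every sales row to the virtual layer, then join and group."""
    rows = sales_src.execute("SELECT customer_id, amount FROM sales").fetchall()
    return group_by_zip(rows, customer_zips()), len(rows)


def pushdown_plan():
    """Let the sales source pre-aggregate per customer; finish per ZIP here."""
    rows = sales_src.execute(
        "SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id"
    ).fetchall()
    return group_by_zip(rows, customer_zips()), len(rows)


naive_totals, naive_rows_moved = naive_plan()
pushed_totals, pushed_rows_moved = pushdown_plan()
print(naive_totals == pushed_totals, naive_rows_moved, pushed_rows_moved)
```

Both plans return the same totals because SUM can be computed in two stages; the push-down plan simply transfers one row per customer instead of one row per sale, which is the same reduction the slide shows from 300 million sales rows to 2M rows.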
21. 21
The Forrester Wave, Enterprise Data Fabric, Q2, 2020
Data fabric focuses on automating the process integration, transformation, preparation, curation, security, governance, and orchestration to enable analytics and insights quickly for business success.
24. 24
Big Data Fabric – Data Abstraction Layer
• Abstracts access to disparate data sources
• Acts as a single repository (virtual)
• Makes data available in real-time to consumers
25. 25
BI and Analytics Reference Architecture
IT: Flexible Source Architecture
Business: Flexible Tool Choice
IT can now move at slower speed w/o affecting business
Business can now make faster & more sophisticated decisions as all data is accessible by any tool of choice
Cloud DW (Snowflake, etc.)
26. 26
BI and Analytics Reference Architecture
Data-as-a-Service
IT Semantic – where stored & processed
Bus. Semantic – how consumed & used
27. 27
Data Fabric – Use Cases
Data Warehouse Offloading
IoT Integration
29. 29
Customer Case Study - Asurion
• 290 million consumers
• Annual revenues (FY 2016) $5.8 B
• Over 17,000 employees
• 49 Offices, 18 Countries
• Insurance & Warranties on digital devices
BUSINESS NEED
• Reduce time to create new services and products from months to weeks.
• Meet strict restrictions on migrating data out of countries of origin.
• Centralize companywide security management around a single point of control.
THE CHALLENGE:
Expand their data architecture to cope with global growth, while exceeding the expectations of the customers.
30. 30
Asurion – Digital Transformation
SOLUTION:
• Asurion developed a hybrid data layer across the cloud & on-premise data.
• A single point of access to the data ensuring security compliance.
• Removed complexities of data access from the consumers, enabling better integration & improved analytics
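A single point of access makes it possible to state security rules once rather than at every source. The following sketch shows the general pattern only; it is not Asurion's implementation or any Denodo interface, and the roles, sources, columns, and policy fields are all hypothetical. One policy table drives both row filtering (data stays restricted to permitted countries) and column masking for every source reached through the layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Access rules for one role, defined once for all sources."""
    allowed_countries: frozenset
    masked_columns: frozenset


# Hypothetical roles and rules, kept in one place (the single point of control).
POLICIES = {
    "analyst_us": Policy(frozenset({"US"}), frozenset({"email"})),
    "global_admin": Policy(frozenset({"US", "JP"}), frozenset()),
}

# Hypothetical sources: one cloud system and one on-premise system.
SOURCES = {
    "cloud_crm": [
        {"country": "US", "customer": "Ann", "email": "ann@example.com"},
    ],
    "onprem_claims": [
        {"country": "JP", "customer": "Kenji", "email": "kenji@example.com"},
    ],
}


def query(role, source):
    """Return rows from a source with the role's policy applied centrally."""
    policy = POLICIES[role]
    visible = []
    for row in SOURCES[source]:
        # Row-level rule: drop rows from countries the role may not see.
        if row["country"] not in policy.allowed_countries:
            continue
        # Column-level rule: mask sensitive columns for this role.
        visible.append({
            column: ("***" if column in policy.masked_columns else value)
            for column, value in row.items()
        })
    return visible


print(query("analyst_us", "cloud_crm"))
print(query("analyst_us", "onprem_claims"))
print(query("global_admin", "onprem_claims"))
```

Because the sources themselves carry no access logic in this sketch, adding a source or changing a rule touches only `SOURCES` or `POLICIES`, not each consuming report or tool.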
32. 33
Current Requirements in Data Management
1. Faster & more accurate decision making
▪ Data Virtualization – Single platform for all enterprise data
2. Regulations, enterprise-wide governance & data security
▪ Data Virtualization – Unified metadata management for
governance and security
3. IT cost reduction
▪ Data Virtualization – Minimise data management infrastructure
33. KEY TAKEAWAYS
Data Virtualization:
1. Enables multi-use data lake reducing costs & increasing collaboration
2. Unifies disparate data sources in real-time
3. Supports self-service & data discovery
4. Centralises governance & security of enterprise data assets
34. 35
Next Steps
Access Denodo Platform in the Cloud!
Take a Test Drive today!
https://www.denodo.com/TestDrive
GET STARTED TODAY
36. 37
Useful Links
• Data Virtualization for Dummies - Learn how to put data virtualization to work in your organisation: integrate all data sources, deliver big data solutions that work, take the pain out of cloud adoption and drive digital transformation.
• Data Virtualization: The Modern Data Integration Solution - Data virtualization is a modern data integration approach that is already meeting today’s data integration challenges, providing the foundation for data integration in the future. Download this whitepaper to learn more about the fundamental challenge for organizations today, why traditional solutions fall short and why data virtualization is the core solution.
37. 38
Denodo
The Leader in Data Virtualization
DENODO OFFICES, CUSTOMERS, PARTNERS
Palo Alto, CA.
Global presence throughout North America, EMEA, APAC, and Latin America.
LEADERSHIP
▪ Longest continuous focus on data virtualization – since 1999
▪ Leader in 2018 Forrester Wave – Big Data Fabric
▪ Winner of numerous awards
CUSTOMERS
~800 customers, including many F500 and G2000 companies across every major industry, have gained significant business agility and ROI.
FINANCIALS
Backed by a $4B+ private equity firm.
50+% annual growth; Profitable.