Watch here: https://bit.ly/2Npt82U
If you have data, you are engaged in data management—be sure to do it effectively.
As organizations are assessing how COVID-19 has impacted their operations, new possibilities and uncharted routes are becoming the norm for many businesses. While exploring and implementing different deployment and operational models, the question of data management naturally surfaces while considering how these changes impact your data. Is this the right time to focus on data management? The reality is that if you have data, you are engaged in data management and so the real question is, are you doing it well?
Join Brice Giesbrecht from Caserta and Mitesh Shah from Denodo to explore data management challenges and solutions facing data-driven organizations.
Best Practices in the Cloud for Data Management (US)
12. Best Practices for Data Management in the Cloud
June 2020
Mitesh Shah – Senior Cloud Product Manager
13. Current Challenges in Data Management
1. End Users: faster & more accurate decision making
• Significant increase in business speed & complexity of requirements
2. Regulations: enterprise-wide governance & data security
• Thousands of new regulations worldwide: tax, finance, privacy, HR, environmental, etc.
3. IT: cost reduction & multi-cloud skills
• Huge data growth with associated storage and operational costs. Keeping up with dynamic cloud services.
14. Why is this a Problem?
IT Department / Business
“You’re too slow, too expensive, and never deliver what I want.”
“You can’t make up your mind, keep adding features, and never see the big picture.”
Casual User: “Just forget it.”
Power User: “Just give me a data dump.”
BU Leader: “We’ll do it ourselves.”
“I’d rather be doing something else than taking your order.”
“You’ll come crawling back to us soon.”
15. Hybrid Cloud Data Management – Simplifying Data Integration
Manually access different systems
IT responds with point-to-point data integration
Takes too long to get answers to business users
Diagram labels: Marketing, Sales, Executive, Support; Database, Apps, Warehouse, Cloud, Big Data, Documents, Apps, NoSQL
Businesses are reporting that integrating data from silos to support real-time insights has become a nightmare, especially when supporting large and complex data sets
Big Data Fabric 2.0 Drives Data Democratization, May 9, 2019
16. The Solution – A Data Abstraction Layer
Abstracts access to disparate data sources
Acts as a single repository (virtual)
Makes data available in real time to consumers
Diagram labels: DATA ABSTRACTION LAYER, DATA VIRTUALIZATION PLATFORM
“Enterprise architects are finding that traditional data architectures are failing to meet new business requirements, especially around data integration for streaming analytics and real-time analytics.”
The Forrester Wave: Enterprise Data Virtualization, Jan 12, 2018
17. Source: Gartner Market Guide for Data Virtualization, November 16, 2018
Data virtualization can be used to create virtualized and integrated views of data in-memory rather than executing data movement and physically storing integrated views in a target data structure. It provides a layer of abstraction above the physical implementation of data, to simplify query logic.
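The definition above can be illustrated with a minimal sketch. It assumes two hypothetical source systems (a CRM and a billing database, both stood in for by in-memory SQLite databases); the table names, the data, and the function `customer_revenue` are invented for illustration and are not part of any Denodo API. The integrated view is computed from the sources each time it is queried, and no combined copy is stored.

```python
import sqlite3

# Two independent "source systems" (hypothetical): a CRM and a billing database.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Acme"), (2, "Globex")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany("INSERT INTO invoices VALUES (?, ?)",
                    [(1, 100.0), (1, 50.0), (2, 75.0)])


def customer_revenue():
    """Virtual view: built from both sources on every call, never persisted."""
    # Ask the billing source for per-customer totals.
    totals = dict(billing.execute(
        "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id"))
    # Combine with customer names from the CRM source, in memory.
    customers = crm.execute("SELECT id, name FROM customers ORDER BY id")
    return [(name, totals.get(cid, 0.0)) for cid, name in customers]


result = customer_revenue()

# A change in a source is visible on the next query, with no reload step.
billing.execute("INSERT INTO invoices VALUES (2, 25.0)")
refreshed = customer_revenue()
```

Consumers only call `customer_revenue()`; where the two tables live and how they are joined stays hidden behind that single name, which is the abstraction the definition describes.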
18. How Does It Work?
Diagram: layered data virtualization architecture
Data Source Layer: Base Views (abstraction)
Transformation & Cleansing: Derived Views
Business Layer: Unified Views
Application Layer: Business View, Data Mart View
Access: JDBC/ODBC/ADO.Net, SOAP / REST WS
Platform services: Development Lifecycle Mgmt, Monitoring & Audit, Governance, Security, Development Tools and SDK, Scheduled Tasks, Data Caching, Query Optimizer
19. Gartner – Logical Data Warehouse
“Adopt the Logical Data Warehouse Architecture to Meet Your Modern Analytical Needs”, Henry Cook, Gartner, April 2018
Diagram label: DATA VIRTUALIZATION
20. Market Guide for Data Virtualization, Gartner, November 16, 2018
“Through 2022, 60% of all organizations will implement data virtualization as one key delivery style in their data integration architecture.”
21. Denodo Data Virtualization - Use Case Categories
Customer Centricity
APIs/Services
Data as a Service
Microservices/containers
API Data Services
Application Migration
Machine Learning
Data Catalog
Metadata management
Universal data access
Governance, Risk, Compliance (GRC)
Data Masking/Data Privacy
Auditing/data lineage
Hybrid/Multicloud Integration
Cloud Data Analytics
Cloud IT Modernization
22. The Data Lake as the Repository of All Data
Cost, Governance, TTM (time-to-market), Security
• Huge up-front investment: questionable ROI
• Large recurrent maintenance costs
• Risk of inconsistencies
• Loss of capabilities
Efficient use of the data lake to accelerate insights comes at the cost of price, time-to-market and governance
An environment with multiple purpose-specific systems slows down TTM and jeopardizes security and governance
• Higher Complexity
• Risk of Inconsistencies
• Loss of Security
Diagram label: Delivery Zone
23. Best Practices – Data Virtualization / Management in the Cloud
Architecture:
• Location of data sources
• Scaling – Auto / Clustering
• Load balancing / High Availability
Sizing:
• Data volume (size)
• Concurrency (queries)
• Infrastructure choices
• Cloud Burst workloads
Performance:
• Query pushdown
• Caching
• Networking (VPC)
Data Sources:
• SaaS applications (SFDC, ServiceNow)
• Special connectors (AWS Redshift, Snowflake, Spark SQL, Object Storage, Kafka streaming)
• REST and OData connectors
24. Avoid Expensive Cloud Data Movement
• Public cloud providers charge every time you move data from their cloud storage to your on-premises storage.
• Moving data to cloud – no charge
• Moving data from cloud – $$$
System | Execution Time | Data Transferred | Optimization Technique
Denodo | 9 sec. | 4 M | Aggregation push-down
Other Federators | 125 sec. | 302 M | None: full scan
Denodo Can Reduce Cloud Egress Costs
1. Keep the Denodo server close to the cloud data sources to minimize data movement. Think cloud native.
2. In hybrid scenarios, caching your active data on-premises helps avoid cloud data egress fees.
3. If the center of data gravity is in the cloud, Denodo can push the query down for maximal filtering and return minimal rows to save costs. Denodo in the cloud can act as a source to minimize further data movement.
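In the comparison on this slide, aggregation push-down moves 4 M instead of 302 M (roughly 75 times less data) and finishes in 9 seconds instead of 125 (roughly 14 times faster). The sketch below shows why, under stated assumptions: an in-memory SQLite database stands in for a remote cloud source, and the `sales` table, its data, and both function names are hypothetical. Every row returned by the source counts as data that would have to leave the cloud.

```python
import sqlite3

# Hypothetical "cloud" source: 1,000 sales rows split across two regions.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA" if i % 2 else "AMER", i) for i in range(1000)],
)


def full_scan_totals():
    """No push-down: pull every row across the network, aggregate locally."""
    fetched = source.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in fetched:
        totals[region] = totals.get(region, 0) + amount
    return totals, len(fetched)


def pushdown_totals():
    """Aggregation push-down: the source aggregates, only the result moves."""
    fetched = source.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
    return dict(fetched), len(fetched)


scan_totals, scan_rows = full_scan_totals()
push_totals, push_rows = pushdown_totals()
```

Both functions return the same totals, but the full scan transfers 1,000 rows while the pushed-down query transfers 2, one per region. With egress billed by volume, the second form is the one that keeps costs down.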
25. How Denodo Complements Logical Data Lake in Cloud
Denodo Architecture for Logical Data Lake
Denodo does not replace data warehouses, data lakes, ETLs...
Denodo enables the use of all of them together, plus other data sources:
• In a logical data warehouse
• In a logical data lake
The two are very similar; the only difference is the main objective
There are also use cases where Denodo can be used as a data source in an ETL flow
26. Hybrid Data Fabric – Migration to Cloud
Scenario where the Hybrid Data Hub is useful during Cloud migration.
• Databases and applications can be gradually migrated to the cloud
• The DV layer absorbs the changes
• Migration is transparent for end users
Diagram labels: Active Directory, Data Center, Cloud
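A minimal sketch of how a virtualization layer can absorb a migration, assuming a hypothetical catalog that maps a logical view name to whichever physical source currently holds the data. The names `catalog`, `query_view`, and `order_total` are illustrative only, and both sources are in-memory SQLite databases standing in for the data center and the cloud.

```python
import sqlite3


def make_source():
    """Create a stand-in database holding the same orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, 10.0), (2, 32.5)])
    return conn


on_prem = make_source()   # hypothetical data center database
cloud = make_source()     # hypothetical migrated copy in the cloud

# Logical view name -> (location, connection). Only this mapping changes.
catalog = {"orders": ("data-center", on_prem)}


def query_view(view, sql):
    """Resolve the logical view to its current source and run the query."""
    _location, conn = catalog[view]
    return conn.execute(sql).fetchall()


def order_total():
    """Consumer code: knows the view name, not where the data lives."""
    return query_view("orders", "SELECT SUM(total) FROM orders")[0][0]


before = order_total()
catalog["orders"] = ("cloud", cloud)   # migration: repoint the view
after = order_total()
```

`order_total` is unchanged before and after the switch and returns the same value, which is what "transparent for end users" means in practice: only the mapping inside the layer is edited.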
27. DV abstracting SaaS APIs
Enables traditional reporting tools to work with any kind of SaaS API
SQL-to-SaaS: DV abstracts the SaaS API (usually REST services) as part of a relational model
Real-time access: avoids replication of cloud data back into the data center
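The SQL-to-SaaS idea can be sketched as follows. The JSON payload is a hypothetical stand-in for a SaaS REST response, and the field names, the `accounts` table, and the function `as_relation` are assumptions made for illustration. A real virtualization layer would call the API at query time; here a throwaway in-memory SQLite table plays that role so the example stays self-contained.

```python
import json
import sqlite3

# Hypothetical JSON payload, standing in for a SaaS REST API response.
payload = json.loads("""
{"records": [
  {"Id": "001", "Name": "Acme",    "Owner": {"Name": "Ann"}, "AnnualRevenue": 1200},
  {"Id": "002", "Name": "Globex",  "Owner": {"Name": "Bob"}, "AnnualRevenue": 800},
  {"Id": "003", "Name": "Initech", "Owner": {"Name": "Ann"}, "AnnualRevenue": 500}
]}
""")


def as_relation(conn, records):
    """Flatten nested JSON records into rows of a relational table."""
    conn.execute(
        "CREATE TABLE accounts (id TEXT, name TEXT, owner TEXT, revenue INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?, ?)",
        [(r["Id"], r["Name"], r["Owner"]["Name"], r["AnnualRevenue"])
         for r in records],
    )


conn = sqlite3.connect(":memory:")
as_relation(conn, payload["records"])

# A reporting tool can now use plain SQL instead of the REST API.
by_owner = conn.execute(
    "SELECT owner, SUM(revenue) FROM accounts GROUP BY owner ORDER BY owner"
).fetchall()
```

The nested `Owner.Name` field becomes an ordinary `owner` column, so a tool that only speaks SQL can group and aggregate the SaaS data without knowing anything about the API.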
28. Key Takeaways – Data Management in the Cloud
FIRST Takeaway: Data Virtualization is a key technology when building a modern data architecture
SECOND Takeaway: It provides flexibility and agility and reduces the time to deliver data to the business by up to 10X
THIRD Takeaway: Data Virtualization hides the complexity of a constantly changing data infrastructure from the users
FOURTH Takeaway: In doing so, it allows you to introduce new technologies, formats, protocols, etc. without causing user disruption