Watch full webinar here: https://bit.ly/3dudL6u
It's not a question of if you move to the cloud, but when. Most organisations are well underway with migrating applications and data to the cloud. In fact, most organisations - whether they realise it or not - have a multi-cloud strategy. Single, hybrid, or multi-cloud: the potential benefits are huge - flexibility, agility, cost savings, on-demand scaling, and more. However, the challenges can be just as large and daunting. A poorly managed migration to the cloud can leave users frustrated at their inability to get to the data that they need, and IT scrambling to cobble together a solution.
In this session, we will look at the challenges facing data management teams as they migrate to cloud and multi-cloud architectures. We will show how the Denodo Platform can:
- Reduce the risk and minimise the disruption of migrating to the cloud.
- Make it easier and quicker for users to find the data that they need - wherever it is located.
- Provide a uniform security layer that spans hybrid and multi-cloud environments.
5. Data Fabric Definition
A data fabric is an architecture pattern that informs and automates the design, integration, and deployment of data objects regardless of deployment platforms and architectural approaches.
It utilizes continuous analytics and AI/ML over all metadata assets to provide actionable insights and recommendations on data management and integration design and deployment patterns.
This results in faster, informed and, in some cases, completely automated data access and sharing.
6. Pictorial View of a Data Fabric – from Gartner
[Diagram: a "data fabric net" spanning business entities (compounds, customers, products, claims) and weaving together data sources – RDBMS/OLTP, flat files, legacy, and third-party systems – with traditional analytics/BI (a data warehouse fed by ETL into marts), data lakes, cloud data stores, and apps and document repositories (XML, JSON, PDF, DOC, web).]
7. Data Fabric Definition
"Dynamically orchestrating disparate data sources intelligently and securely in a self-service manner and leveraging various data platforms to deliver integrated and trusted data to support various applications, analytics, and use cases"
- Forrester Research, June 2020
8. Forrester Data Fabric Architecture
[Diagram: Forrester's layered data fabric architecture, with AI/ML applied at every layer. From bottom to top: data sources (on-premises and cloud); data ingestion/streaming (ingestion, streaming, and data movement); data processing/persistence on data platforms such as Hadoop, NoSQL, Spark, data lakes, and EDW/BDW; data orchestration (transformation, integration, and cleansing); data discovery (data modeling, preparation, curation, and a graph engine); global data access (a global distributed platform – in-memory, embedded, self-service, and APIs); and data management (metadata/catalog, data security, data governance, data processing, data quality, data lineage, and policies).]
9. The Logical Data Fabric Architecture
[Diagram: the same layered Forrester architecture as the previous slide – data sources through ingestion, processing/persistence, orchestration, discovery, global data access, and data management, with AI/ML at each layer – realized as a logical data fabric.]
10. 10
• Data Abstraction: decoupling
applications/data usage from data
sources
• Data Integration without replication
or relocation of physical data
• Easy Access to Any Data, high
performant and real-time/ right-
time
• Data Catalog for self-service data
services and easy discovery
• Unified metadata, security &
governance across all data assets
• Data Delivery in any format with
intelligent query optimization that
leverages new and existing
physical data platforms
A logical data layer – a “logical data fabric” – that provides high-performant, real-time, and secure
access to integrated business views of disparate data across the enterprise
Data Virtualization: Logical Data Fabric
11. Stages of a Cloud Journey
1. On-Premise: All systems are on-premise, using traditional databases, etc. – maybe an on-premise Hadoop cluster. Lots of ETL pipelines. Using Denodo for an integrated view of data.
2. Transition to Cloud: System modernization initiatives move applications and data to the Cloud. For critical systems, this migration is typically a phased approach over a period of months (or years).
3. Hybrid: Systems are now on-premise and in the Cloud – initially hosted by the preferred Cloud provider. The data is balanced across the different environments, although the bulk of the data is initially on-premise. ETL-style data movement is often used to move data from on-premise systems to Cloud-based analytical systems. The systems are more complex, and users need to be able to find and access data from on-premise and Cloud locations.
4. Single Cloud: Systems have moved to the Cloud (although some systems are still on-premise and cannot be moved to the Cloud). The 'center of gravity' for data is solidly in the Cloud. More processing and data integration occurs in the Cloud. Data is moved from on-premise systems to the Cloud using ETL. User data access is predominantly from Cloud systems. (Note: most organizations skip this stage and go straight to multi-Cloud.)
5. Multi-Cloud: In reality, this is a hybrid/multi-Cloud environment, with systems in multiple Clouds (AWS, Azure, GCP, Salesforce, etc.) and a few legacy systems still on-premise. The environment is even more complex, as workloads can move between Cloud providers to take advantage of new capabilities, cost optimization, etc. Users still need to find and access data in this environment.
12. 12
Cloud Migrations Options
• Re-Host – ‘Lift and Shift’ – Take existing data and copy it to Cloud “as is” into same
database
• Good for smaller data sets or data sets with low importance
• Re-Platform – Relocate to new database running on Cloud – everything else stays
the same
• e.g. move from Oracle 12g to Snowflake
• Re-Factor/Re-Architect – Move to a different database *and* change the data
schema
• e.g. move from Oracle to Redshift and re-factor data model, partitioning, etc.
14. Cloud Migration Using Data Virtualization
• Large or critical Cloud migrations are risky
  • A 'Big Bang' approach is not advised
• A phased approach is recommended
  • Select a data set to migrate and copy it to the Cloud
  • Test and tune data access, then go live
  • Repeat for the next data set, and so on
• Use Denodo as an abstraction layer during the migration process
  • Isolates users from the shift of data
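The phased cut-over above can be sketched in a few lines. This is a minimal illustration of the pattern, not Denodo's actual API; the names (`DatasetRouter`, `cut_over`, the source labels) are hypothetical:

```python
# Hypothetical sketch: an abstraction layer maps each logical dataset to its
# current physical location, so moving a dataset to the cloud is a single
# routing change that is invisible to consumers.

class DatasetRouter:
    """Maps logical dataset names to their active physical source."""

    def __init__(self):
        self._active = {}  # dataset name -> source label

    def register(self, dataset, source):
        self._active[dataset] = source

    def cut_over(self, dataset, new_source):
        # Called once the copied data set has been tested and tuned.
        old = self._active[dataset]
        self._active[dataset] = new_source
        return old

    def resolve(self, dataset):
        # Consumers always resolve through the router,
        # never hard-code a physical source.
        return self._active[dataset]


router = DatasetRouter()
router.register("sales", "onprem-oracle")

# Phase 1: copy "sales" to the cloud, test it, then go live with one flip.
previous = router.cut_over("sales", "cloud-snowflake")
```

Because every consumer resolves through the abstraction layer, each data set can be migrated, tested, and switched independently, which is what makes the phased approach low-risk.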
15. 15
Hybrid Data Integration with a Logical Data Fabric
Common access point for both on-premise
and cloud sources
• Access to all sources as a single
schema
with no replication: Virtual data lake
• Enables combination of data
across
sources, regardless of nature and
location
• Allows definition of common
semantic
model
• Single security model and single
Active
Directory
Data Center
Cloud
16. Multi-Cloud Integration with Logical Data Fabric
[Diagram: a logical data fabric spanning a US East availability zone (Amazon RDS, Aurora), an EMEA availability zone, and an on-prem data center.]
17. BHP Builds a Logical Data Fabric Using Data Virtualization
BHP is among the world's top producers of major commodities, including iron ore, coal, and copper. They have a global presence, with operations and offices across Australia, Asia, the UK, Canada, the USA, and Central and South America.
BHP wanted to manage business risk by integrating data systems across multiple geographies, but this was a time-consuming and expensive operation. BHP's global application landscape provided limited and restricted reusability of existing data platforms, which led to:
• Repeated engineering effort to access the same data sources for different data solutions
• Long lead times to ingest or load data before a data solution could be developed
• Project-centric data repositories created to provide a consolidated set of data for a specific purpose, increasing total cost of ownership, complexity, and variability in data interpretation
18. Reference Architecture
[Diagram: data sources (application data stores, SaaS/cloud applications, application interfaces, manual data sources) and enterprise & regional data stores feed a Data Virtualization Platform – self-service data catalogue, query optimisation, query development, data federation, data discovery, an abstraction/semantic layer, and a security layer (Kerberos delegation + encryption in transit + extensive auditing). Consumers include analytics, self-service business intelligence, and transactional applications ("bring your own tool"). Key qualities: secure; faster – connect to data stores or direct to source and get access to the right data, fast; self-service; flexible protocols. Built using Denodo technology.]
19. Query Federation to Local Data Sources
Every Data Virtualization cluster is connected to local data sources and is the access point for local consumer apps such as BI and analytics tools. Each Data Virtualization cluster has visibility of the datasets available from all other clusters, and requests this data from its peer cluster as required by end users.
[Diagram: peer clusters in Brisbane, Perth, Santiago, Houston, and a cloud tenancy, each serving local analytics tools from a local data lake or data marts.]
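The peer-federation idea on this slide can be sketched as follows. This is an illustrative model only, not Denodo's implementation; the cluster names come from the diagram, while the dataset names and the `Cluster` class are hypothetical:

```python
# Sketch: each cluster serves its local datasets directly and forwards
# requests for remote datasets to the peer cluster that holds them.

class Cluster:
    def __init__(self, name, local_data):
        self.name = name
        self.local_data = local_data  # dataset name -> rows
        self.peers = []               # other Cluster objects

    def query(self, dataset):
        if dataset in self.local_data:
            return self.local_data[dataset]
        for peer in self.peers:
            if dataset in peer.local_data:
                # In a real deployment this would be a remote call
                # between Data Virtualization clusters.
                return peer.query(dataset)
        raise KeyError(f"dataset {dataset!r} not found in any cluster")


perth = Cluster("Perth", {"mine_ops": [{"site": "Pilbara", "tonnes": 120}]})
houston = Cluster("Houston", {"shipping": [{"port": "Galveston", "loads": 7}]})
perth.peers = [houston]
houston.peers = [perth]

# A Perth analytics tool can transparently reach Houston's data.
rows = perth.query("shipping")
```

Local consumers always talk to their nearest cluster; the fabric handles locating and fetching remote datasets on their behalf.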
20. 20
1. Cloud architectures – both hybrid and multi-Cloud – are complex beasts
▪ A Logical Data Fabric using Data Virtualization can simplify the
architecture andmake
it easier for users to find and access the data that theyneed
2. In a multi-Cloud architecture, the Data Fabric should also be distributed –
providing global access to data coupled with local control
3. A Data Fabric also provides a unified security layer for data access
▪ A single place to enforce data access control – allowing users to access
the data that
they need rather than data based on organizationalsilos
Conclusions
22. Demo Scenario
• Tim – Manager (South Region)
  • Access to Southern Region employee data
  • Unnecessary data hidden or masked (e.g. monthly salary, bonus rate, DOB & email address)
  • No access to Northern Region data at all
• Mary – Manager (North Region)
  • Access to Northern Region staff data
  • Unnecessary data hidden or masked (e.g. monthly salary, bonus rate, DOB & email address)
  • No access to Southern Region data at all
• Jane – Data Analyst (Corporate)
  • Access to all de-identified employee data
  • PII data hidden
  • Access to data in all locations (North & South)
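The row- and column-level rules in this scenario can be sketched as below. This is a minimal, hypothetical model of the policy logic; in practice these rules would be defined declaratively in the virtualization layer's security policies, and the column names and sample rows are invented for illustration:

```python
# Sketch: row-level filtering by region plus column-level masking by role.

SENSITIVE = {"monthly_salary", "bonus_rate", "dob", "email"}  # hidden from managers
PII = {"name", "dob", "email"}                                # hidden from analysts

EMPLOYEES = [
    {"name": "A", "region": "South", "monthly_salary": 5000,
     "bonus_rate": 0.1, "dob": "1990-01-01", "email": "a@example.com"},
    {"name": "B", "region": "North", "monthly_salary": 6000,
     "bonus_rate": 0.2, "dob": "1985-05-05", "email": "b@example.com"},
]

def view_for(role, rows):
    if role in ("manager_south", "manager_north"):
        region = "South" if role == "manager_south" else "North"
        # Row-level rule: managers see only their own region.
        visible = [r for r in rows if r["region"] == region]
        # Column-level rule: salary, bonus, DOB, and email are hidden.
        return [{k: v for k, v in r.items() if k not in SENSITIVE}
                for r in visible]
    if role == "analyst":
        # Analysts see all regions, but de-identified (PII removed).
        return [{k: v for k, v in r.items() if k not in PII} for r in rows]
    return []  # default deny

south_view = view_for("manager_south", EMPLOYEES)  # one row, no salary column
analyst_view = view_for("analyst", EMPLOYEES)      # both rows, no names
```

The point of the demo is that these rules live in one place: the same logical view enforces them regardless of which source, region, or cloud the rows come from.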
24. Demonstrate
1. Connect to disparate data
2. Harvest the metadata
3. Build a logical view
4. Standardize data
5. Apply business rules
6. Apply data security
7. Publish data for re-use
8. Single access point
9. Data discovery
10. 3rd party tool access
25. Differences in Tables
South (Oracle) vs. North (Snowflake):
• Different table names
• Different field names
• Different values / reference data
26. Standardised Views
South (Oracle) and North (Snowflake) presented with:
• Same naming convention
• Same field names
• Standardised values
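The standardisation step above can be sketched as a simple mapping layer. This is an illustrative sketch only; the table names, field names, and region codes are invented, and a real deployment would express these mappings as derived views in the virtualization layer:

```python
# Sketch: two regional sources expose different table names, field names,
# and code values; a mapping layer presents them as one standardised view.

# South (Oracle): hypothetical table EMP with fields EMP_NAME, REGION_CD
south_rows = [{"EMP_NAME": "Tim", "REGION_CD": "S"}]
# North (Snowflake): hypothetical table STAFF with fields FULL_NAME, REGION
north_rows = [{"FULL_NAME": "Mary", "REGION": "NORTH"}]

# Per-source field renames onto one naming convention.
FIELD_MAP = {
    "south": {"EMP_NAME": "employee_name", "REGION_CD": "region"},
    "north": {"FULL_NAME": "employee_name", "REGION": "region"},
}
# Reference-value standardisation onto one value domain.
VALUE_MAP = {"region": {"S": "South", "NORTH": "North"}}

def standardize(source, rows):
    out = []
    for row in rows:
        std = {FIELD_MAP[source][k]: v for k, v in row.items()}
        for field, mapping in VALUE_MAP.items():
            if std.get(field) in mapping:
                std[field] = mapping[std[field]]
        out.append(std)
    return out

# Union of both sources under the same field names and standardised values.
unified = standardize("south", south_rows) + standardize("north", north_rows)
```

Consumers then query `unified` (the standardised view) and never need to know which naming convention or code set each regional source uses.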