The Fast Data Strategy Houston Roadshow focused on the next industrial revolution on the horizon, driven by the application of big data, IoT and cloud technologies.
• Denodo’s innovative customer, Anadarko, elaborated on how data virtualization serves as the key component in their prescriptive and predictive analytics initiatives, driven by multi-structured data ranging from customer data to equipment data.
• Denodo’s session, Unleashing the Power of Data, described the complexity of the modern data ecosystem and how to overcome challenges and successfully harness insights.
• Our partner Noah Consulting, an expert analytics solutions provider in the energy industry, explained how your peers are innovating with new business models and reducing costs in areas such as asset management and operations by leveraging data virtualization and prescriptive and predictive analytics.
For more information on upcoming roadshows near you, follow this link: https://goo.gl/WBDHiE
Fast Data Strategy Houston Roadshow Presentation
1.
Harnessing Your Hybrid Data Ecosystem
Unleashing the power of data with data virtualization.
Lakshmi Randall, Twitter: @LakshmiLJ
Director of Product Marketing
July 2017
2.
Multi-Platform Architecture: Reality of Modern Enterprise
[Diagram: the modern enterprise has diverse data architectures (cloud, data lakes, DW, data hub, distributed), diverse ingestion and integration needs (batch, real-time, continuous, right-time, on-demand), diverse governance and metadata needs (local and centralized metadata, metadata exchange, local and centralized governance), and diverse skillsets.]
3.
Hybrid Data Ecosystem
A hybrid data ecosystem (HDE) comprises multifarious data, processes and technologies that enable enterprises to optimally harness insights.
Hybrid characteristics: legacy & modern; multi-platform; distributed architectures; batch & real-time; structured & unstructured; cloud & on-premises; open source & commercial.
Diverse data: domain-specific views; disparate data sources.
4. Most data warehouses are now multi-platform hybrid architectures.
Source: 2014 TDWI report “Evolving Data Warehouse Architectures.” Based on 538 respondents.
[Chart: DW architectures on a spectrum from a monolithic EDW to a multi-platform data warehouse environment (DWE). Categories: central monolithic EDW with no other data platforms; central EDW with a few additional data platforms; central EDW with many additional data platforms; many workload-specific data platforms with a non-central EDW; no true EDW, but many workload-specific data platforms instead; other. Shares shown: 15%, 15%, 16%, 37%, 15%, and 2% other.]
Multi-platform hybrid is the new norm. The monolith was the norm in the ’90s; it is now rare.
5.
Benefits and Challenges of HDEs
BENEFITS
• Enables business goals
• Flexibility to support data diversity
• Cost optimization opportunities
• Supports prototyping of new business models
• Multiple Systems of Insight
CHALLENGES
• Data Ownership
• Integration and Unification
• Data Quality Risks
• Skillset Scarcity
• Optimization Issues
• Multiple data models
• Lack of Holistic View
• Multiple Local Architectures
6.
Harnessing Insights from HDEs
Achieving technical cohesion and business value in a multi-platform environment
Costs of complexity:
“Just Because It’s Difficult To Quantify, Doesn’t Mean It’s Zero! (But That’s How It’s Often Treated!)”
– The Hands-on Group
One or more architectures or layers must unify the disparate systems and data assets of the HDE in order to understand and mask the HDE’s complexity.
7.
The Solution – A Data Abstraction Layer
• Abstracts access to disparate data sources
• Acts as a single repository (virtual)
• Makes data available to consumers in real-time
“Enterprise architects must revise their data architecture to meet the demand for fast data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
8.
Five Essential Capabilities of Data Virtualization
1. Data abstraction
2. Zero replication, zero relocation
3. Real-time information
4. Self-service data services
5. Centralized metadata, security & governance
9.
1. Data abstraction
Abstracts access to disparate data sources.
Acts as a single virtual repository.
Abstracts data complexities such as location, format and protocols.
…hides data complexity for ease of data access by the business
“Enterprise architects must revise their data architecture to meet the demand for fast data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research
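To make the abstraction concrete, here is a minimal Python sketch (not Denodo's API) of a virtual view over three sources with different formats and field names. The sources, the `from_csv`/`from_json`/`from_db` adapters and the `VirtualView` class are hypothetical; each adapter maps its source onto one canonical row shape, so consumers see a single repository.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]

# Hypothetical raw sources, each with its own format and field names.
CSV_SOURCE = "well_id,operator\nW1,Acme\nW2,Beta\n"
JSON_SOURCE = '[{"id": "W3", "op": "Gamma"}]'
DB_ROWS = [("W4", "Delta")]  # e.g. tuples returned by a database driver


def from_csv() -> Iterator[Row]:
    for rec in csv.DictReader(io.StringIO(CSV_SOURCE)):
        yield {"well_id": rec["well_id"], "operator": rec["operator"]}


def from_json() -> Iterator[Row]:
    for rec in json.loads(JSON_SOURCE):
        # Map source-specific names onto the canonical model.
        yield {"well_id": rec["id"], "operator": rec["op"]}


def from_db() -> Iterator[Row]:
    for well_id, operator in DB_ROWS:
        yield {"well_id": well_id, "operator": operator}


class VirtualView:
    """One logical 'table' that hides each source's location and format."""

    def __init__(self, adapters: List[Callable[[], Iterator[Row]]]) -> None:
        self.adapters = adapters

    def rows(self) -> Iterator[Row]:
        for adapter in self.adapters:
            yield from adapter()  # sources are read on demand, never copied


wells = VirtualView([from_csv, from_json, from_db])
print([r["well_id"] for r in wells.rows()])  # ['W1', 'W2', 'W3', 'W4']
```

In a real platform the adapters would be connectors and the view would be queried through SQL or web services; the sketch only illustrates the mapping onto one model.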
10.
2. Zero replication, zero relocation
Leaves the data at its source; extracts only what is needed, on demand.
Diminishes the need for effort-intensive ETL processes.
Supports transformations and quality functions without the latency, redundancy, and rigidity of legacy approaches.
…reduces development time and overall TCO
“The Denodo Platform enables us to build and deliver data services, to our internal and external consumers, within a day instead of the 1–2 weeks it would take with ETL.”
– Manager, DrillingInfo
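“Extracts only what is needed, on demand” usually means pushing the projection and the filter down to the source. The sketch below assumes a hypothetical `production` table, with an in-memory SQLite database standing in for the remote system; only the requested columns and rows leave the source, and nothing is staged in an intermediate store.

```python
import sqlite3
from typing import List, Sequence, Tuple

ALLOWED_COLUMNS = {"well_id", "month", "oil_bbl", "gas_mcf", "comments"}


def make_source() -> sqlite3.Connection:
    """In-memory stand-in for a remote source system (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE production (well_id TEXT, month TEXT, "
                 "oil_bbl REAL, gas_mcf REAL, comments TEXT)")
    conn.executemany(
        "INSERT INTO production VALUES (?, ?, ?, ?, ?)",
        [("W1", "2017-05", 1200.0, 3400.0, "free text"),
         ("W1", "2017-06", 1150.0, 3300.0, "free text"),
         ("W2", "2017-06", 800.0, 2100.0, "free text")],
    )
    return conn


def fetch_on_demand(conn: sqlite3.Connection, columns: Sequence[str],
                    well_id: str) -> List[Tuple]:
    """Push the projection (columns) and the filter (well_id) down to the
    source, so only the requested data leaves it."""
    unknown = set(columns) - ALLOWED_COLUMNS
    if unknown or not columns:
        raise ValueError(f"invalid column list: {sorted(unknown)}")
    # Column names are whitelisted above; the filter value is a bound parameter.
    sql = (f"SELECT {', '.join(columns)} FROM production "
           "WHERE well_id = ? ORDER BY month")
    return conn.execute(sql, (well_id,)).fetchall()


src = make_source()
print(fetch_on_demand(src, ["month", "oil_bbl"], "W1"))
# [('2017-05', 1200.0), ('2017-06', 1150.0)]
```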
11.
3. Real-time information
Provisions data in real-time to consumers.
Creates real-time logical views of data across many data sources.
Supports transformations and quality functions without the latency, redundancy, and rigidity of legacy approaches.
…enables timely decision-making
“Data virtualization integrates disparate data sources in real time or near-real time to meet demands for analytics and transactional data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
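A logical view is evaluated when it is queried, so changes at a source show up on the next query without a reload. In this minimal sketch two in-memory SQLite databases stand in for hypothetical CRM and billing systems, and `customer_balance_view` combines them at query time.

```python
import sqlite3

# Two independent, hypothetical source systems (in-memory for the sketch).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Beta")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany("INSERT INTO invoices VALUES (?, ?)",
                    [(1, 100.0), (1, 50.0), (2, 75.0)])


def customer_balance_view():
    """Logical view evaluated at query time against both live sources,
    so there is no copied result that can go stale."""
    names = dict(crm.execute("SELECT id, name FROM customers"))
    totals = billing.execute(
        "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id")
    return sorted((names[cid], total) for cid, total in totals if cid in names)


print(customer_balance_view())  # [('Acme', 150.0), ('Beta', 75.0)]
billing.execute("INSERT INTO invoices VALUES (2, 25.0)")  # change at the source
print(customer_balance_view())  # [('Acme', 150.0), ('Beta', 100.0)]
```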
12.
4. Self-service data services
Facilitates access to all data, both internal and external.
Enables creation of universal semantic models reflecting business taxonomy.
Connects data silos to provide the best available information to drive business decisions.
…enables information discovery and self-service
“Impressively quick turnaround time to ‘unlock’ data from additional silos and from legacy systems. Few vendors (if any) can compete with Denodo's support of the RESTful/OData standard, both to provide data (northbound) and to access data from the sources (southbound).”
– Business Analyst, Swiss Re
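For the northbound side, an OData consumer typically shapes its request with standard system query options such as `$select`, `$filter` and `$top`. The helper below only builds such a URL; the base address, entity set and property names are hypothetical, and whether a given endpoint supports each option depends on its configuration.

```python
from typing import Optional, Sequence
from urllib.parse import quote, urlencode


def odata_query_url(base: str, entity_set: str,
                    select: Optional[Sequence[str]] = None,
                    filter_expr: Optional[str] = None,
                    top: Optional[int] = None) -> str:
    """Build an OData query URL from standard system query options."""
    params = {}
    if select:
        params["$select"] = ",".join(select)
    if filter_expr:
        params["$filter"] = filter_expr
    if top is not None:
        params["$top"] = str(top)
    url = f"{base.rstrip('/')}/{entity_set}"
    if params:
        # Keep '$', ',' and quotes readable; percent-encode spaces as %20.
        url += "?" + urlencode(params, safe="$,'", quote_via=quote)
    return url


print(odata_query_url("https://dv.example.com/odata/wells", "WellHeader",
                      select=["WellId", "Operator"],
                      filter_expr="Status eq 'Active'", top=10))
# https://dv.example.com/odata/wells/WellHeader?$select=WellId,Operator&$filter=Status%20eq%20'Active'&$top=10
```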
13.
5. Centralized metadata, security & governance
Abstracts data source security models and enables single-point security and governance.
Extends single-point control across cloud and on-premises architectures.
Provides multiple forms of metadata (technical, business, operational) to facilitate understanding of data.
…simplifies data security, privacy, audit
“Our Denodo rollout was one of the easiest and most successful rollouts of critical enterprise software I have seen. It was successful in handling our initial security use case immediately, and has since shown a strong ability to cover additional use cases, in particular acting as a Data Abstraction Layer via its web service functionality.”
– Enterprise Architect, Asurion
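Single-point security means the rules live in the virtual layer rather than in each source. A minimal sketch, assuming a hypothetical role-to-policy table with a row filter (allowed regions) and column masking; real platforms integrate this with directory services and with the sources' own permissions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

Row = Dict[str, str]


@dataclass
class Policy:
    allowed_regions: Set[str]                               # row-level filter
    masked_columns: Set[str] = field(default_factory=set)   # column masking


# Hypothetical policies, defined once in the virtual layer for every source.
POLICIES: Dict[str, Policy] = {
    "analyst_us": Policy(allowed_regions={"US"}, masked_columns={"ssn"}),
    "auditor": Policy(allowed_regions={"US", "EU"}),
}


def apply_policy(role: str, rows: List[Row]) -> List[Row]:
    """Apply the same row filter and masking whichever source the rows came from."""
    policy = POLICIES.get(role)
    if policy is None:
        raise PermissionError(f"no policy defined for role {role!r}")
    return [
        {k: ("****" if k in policy.masked_columns else v) for k, v in row.items()}
        for row in rows
        if row["region"] in policy.allowed_regions
    ]


rows = [{"name": "A. Smith", "ssn": "123-45-6789", "region": "US"},
        {"name": "B. Jones", "ssn": "987-65-4321", "region": "EU"}]
print(apply_policy("analyst_us", rows))
# [{'name': 'A. Smith', 'ssn': '****', 'region': 'US'}]
```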
14.
Definition
“Data virtualization technology can be used to create virtualized and integrated views of data in memory (rather than executing data movement and physically storing integrated views in a target data structure), and provides a layer of abstraction above the physical implementation of data.”
– Source: Gartner, “Market Guide for Data Virtualization,” 2016
16.
The Role of Data Virtualization in HDEs
• Enable an Integrated Data Ecosystem
• Improve Business Agility & Productivity
• Provide Virtualized Views of the HDE
• Access data in place instead of replicating and consolidating it, where appropriate
• Centralize Metadata and Governance Policies for an HDE
• Optimize and Manage Data Access to an HDE
• Minimize Skillset Challenges in an HDE
• Provision business-ecosystem-specific views from HDEs
17.
HDE: Three Perspectives
HDE comprises multifarious data, processes and technologies that enable enterprises to optimally harness insights.
[Diagram: the HDE viewed from three perspectives. Business perspective: integrated supply chain, multi-channel marketing, financial risk, quality control. Enterprise perspective: common data models, data reuse. Technical perspective: disparate data sources (databases & warehouses, cloud/SaaS applications, big data, NoSQL, web, XML, Excel, PDF, Word, …). Also shown: local & centralized governance, shared metadata, data ownership, domain-specific views, and the hybrid characteristics (legacy & modern, multi-platform, distributed architectures, batch & real-time, structured & unstructured, cloud & on-premises, open source & commercial).]
19.
“The Denodo Platform will provide 350% ROI over 5 years and break even within 1.5 years of our initial project and will continue to deliver additional savings every year. Further, we plan to leverage the platform in our data lake project.”
– Chuck DeVries, VP Architecture and Development, Vizient
20.
Risk Data Ecosystem (RDE) – Business Perspective
Risk Systems Integration using Data Virtualization
Risk areas: financial (credit, liquidity, ...), market and operations
The RDE delivers aggregation and internal reporting of risk data that is more timely, accurate, comprehensive and granular:
• Highly automated aggregation of risk data by business line, region, asset type, industry and legal entity
• An adaptable and flexible process for ad hoc requests
• Higher standards for reporting practices: reports are accurate, reconciled and validated, and tailored to the audience and context
Virtual risk views across the bank
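The aggregation point above amounts to grouping exposure by whichever dimensions a request names. A minimal sketch over hypothetical records (field names and values invented), as a virtual risk view might return them after unioning several risk systems:

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

# Hypothetical exposure records returned by a virtual risk view.
EXPOSURES = [
    {"business_line": "Retail", "region": "EMEA", "asset_type": "Loan", "exposure": 120.0},
    {"business_line": "Retail", "region": "APAC", "asset_type": "Loan", "exposure": 80.0},
    {"business_line": "Markets", "region": "EMEA", "asset_type": "Bond", "exposure": 200.0},
]


def aggregate(records: Iterable[Mapping], *dims: str) -> Dict[Tuple, float]:
    """Sum exposure along any combination of dimensions, so an ad hoc
    request is a new argument list rather than a new pipeline."""
    totals: Dict[Tuple, float] = defaultdict(float)
    for rec in records:
        totals[tuple(rec[d] for d in dims)] += rec["exposure"]
    return dict(totals)


print(aggregate(EXPOSURES, "business_line"))
# {('Retail',): 200.0, ('Markets',): 200.0}
print(aggregate(EXPOSURES, "region", "asset_type"))
# {('EMEA', 'Loan'): 120.0, ('APAC', 'Loan'): 80.0, ('EMEA', 'Bond'): 200.0}
```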
21.
Data Marketplace – Enterprise Perspective
[Diagram, from top to bottom:
• Business solutions (BI, CPM and reporting; portal & dashboards; applications) access information-as-a-service through the enterprise data marketplace.
• Enterprise data service registry: standard metadata and enterprise data services (scheduling & delivery, reusable data services, virtual operational data stores, virtual data marts, usage stats, metadata).
• Data virtualization: the virtual data layer, an abstract layer for data services.
• Disparate data, any source and any format: RDBMS, NoSQL, big data, web services, packaged apps, files.]
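The registry layer of the diagram can be pictured as a catalogue of named, reusable services with usage statistics. This is a toy sketch with hypothetical `DataService` and `ServiceRegistry` classes, not a description of Denodo's registry; each service's `fetch` callable stands in for a call into the virtual layer.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataService:
    name: str
    description: str
    owner: str
    fetch: Callable[[], List[dict]]  # stands in for a call into the virtual layer


class ServiceRegistry:
    """Toy enterprise data service registry: publish, discover, reuse."""

    def __init__(self) -> None:
        self._services: Dict[str, DataService] = {}
        self.usage: Counter = Counter()

    def publish(self, service: DataService) -> None:
        if service.name in self._services:
            raise ValueError(f"service {service.name!r} is already registered")
        self._services[service.name] = service

    def find(self, keyword: str) -> List[str]:
        kw = keyword.lower()
        return sorted(s.name for s in self._services.values()
                      if kw in s.name.lower() or kw in s.description.lower())

    def call(self, name: str) -> List[dict]:
        self.usage[name] += 1  # usage stats for the marketplace
        return self._services[name].fetch()


registry = ServiceRegistry()
registry.publish(DataService("customer_360", "Complete view of customer",
                             "CRM team", lambda: [{"id": 1, "name": "Acme"}]))
print(registry.find("customer"))        # ['customer_360']
print(registry.call("customer_360"))    # [{'id': 1, 'name': 'Acme'}]
print(registry.usage["customer_360"])   # 1
```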
23. Company confidential – do not forward or distribute
Denodo ‘Solution’ Categories
• Customer Centricity / MDM: Complete View of Customer
• Data Services: Data as a Service; Data Marketplace; Data Services; Application and Data Migration
• Cloud Solutions: Cloud Modernization; Cloud Analytics; Hybrid Data Fabric
• Data Governance: GRC; GDPR; Data Privacy / Masking
• BI and Analytics: Self-Service Analytics; Logical Data Warehouse; Enterprise Data Fabric
• Big Data: Logical Data Lake; Data Warehouse Offloading; IoT Analytics
24.
Denodo ‘Solution’ Categories
Customer Centricity / MDM
• Complete View of Customer: Customer Service Unified Desktop; Unified Desktop for Contact Center; Customer Self-Service Portal; Single Customer View for Back Office Automation
25.
Denodo ‘Solution’ Categories
Data Governance
• GRC: Data Retention for Regulatory Compliance; Risk Reporting for Basel III Compliance; Single View of Risk
• GDPR: Data Privacy and Protection
• Data Privacy / Masking: Data Privacy in a Hybrid Environment; De-identifying Patient Data according to HIPAA Safe Harbor Rules
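To illustrate the HIPAA use case, here is a partial sketch of Safe Harbor-style masking applied in a view before data reaches consumers. It covers only a few of the 18 Safe Harbor identifiers (dropping direct identifiers, keeping only the year of dates, truncating ZIP codes to three digits, grouping ages over 89 as 90+). The field names are hypothetical, and `RESTRICTED_ZIP3` is a placeholder for the Census-based list of three-digit prefixes covering 20,000 or fewer people, which must be replaced with 000. It is not a compliance tool.

```python
from datetime import date
from typing import Any, Dict, FrozenSet

# Direct identifiers dropped outright (only a few of the 18 Safe Harbor identifiers).
DROP_FIELDS = {"name", "ssn", "phone", "email", "mrn"}
# Placeholder: must hold the Census-based 3-digit ZIP prefixes whose combined
# population is 20,000 or fewer; this single entry is only for illustration.
RESTRICTED_ZIP3: FrozenSet[str] = frozenset({"036"})


def deidentify(record: Dict[str, Any]) -> Dict[str, Any]:
    """Partial Safe Harbor-style masking, applied in a view before delivery."""
    out = {k: v for k, v in record.items() if k not in DROP_FIELDS}
    if "zip" in out:
        zip3 = str(out["zip"])[:3]
        out["zip"] = "000" if zip3 in RESTRICTED_ZIP3 else zip3
    if "admit_date" in out:
        out["admit_date"] = out["admit_date"].year  # keep the year only
    if "age" in out and out["age"] > 89:
        out["age"] = "90+"  # ages over 89 collapse into one category
    return out


patient = {"name": "Jane Doe", "ssn": "123-45-6789", "zip": "77002",
           "admit_date": date(2017, 3, 14), "age": 93, "diagnosis": "J45"}
print(deidentify(patient))
# {'zip': '770', 'admit_date': 2017, 'age': '90+', 'diagnosis': 'J45'}
```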
26.
Denodo ‘Solution’ Categories
Data Services
• Data as a Service: Data Services for Drug Discovery; Unified Data Services Layer; Enterprise Data Service Layer
• Data Marketplace: Data Access Marketplace; Liquidity Management Dashboard
• Data Services: Cable Set Top Box Transaction Management; RESTful Web Services API for Development Teams
• Application and Data Migration: Migration Abstraction Layer; Mergers and Acquisitions
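A migration abstraction layer works because consumers bind to a stable view rather than to a system. In this minimal sketch (all classes hypothetical), `CustomerView` is repointed from a legacy CRM to its replacement, and consumers receive the same canonical rows before and after the cut-over.

```python
from typing import Callable, Dict, List, Protocol

Row = Dict[str, str]


class CustomerSource(Protocol):
    def customers(self) -> List[Row]: ...


class LegacyCrm:
    """Hypothetical legacy system with its own column names."""

    def customers(self) -> List[Row]:
        return [{"CUST_ID": "001", "CUST_NM": "Acme"}]


class NewCrm:
    """Hypothetical replacement system."""

    def customers(self) -> List[Row]:
        return [{"id": "001", "name": "Acme"}]


def legacy_to_canonical(r: Row) -> Row:
    return {"id": r["CUST_ID"], "name": r["CUST_NM"]}


def new_to_canonical(r: Row) -> Row:
    return {"id": r["id"], "name": r["name"]}


class CustomerView:
    """Stable virtual view: consumers query it and never see which system backs it."""

    def __init__(self, source: CustomerSource, mapper: Callable[[Row], Row]) -> None:
        self._source, self._mapper = source, mapper

    def repoint(self, source: CustomerSource, mapper: Callable[[Row], Row]) -> None:
        # The cut-over happens once here, not in every consuming application.
        self._source, self._mapper = source, mapper

    def rows(self) -> List[Row]:
        return [self._mapper(r) for r in self._source.customers()]


view = CustomerView(LegacyCrm(), legacy_to_canonical)
before = view.rows()
view.repoint(NewCrm(), new_to_canonical)
assert view.rows() == before  # same shape and data after the migration
```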
27.
Denodo ‘Solution’ Categories
BI and Analytics
• Self-Service Analytics: Self-Service Discovery; Self-Service Exploration; Self-Service Collaboration
• Logical Data Warehouse: Inventory-Sales Reconciliation Reports; Logical Data Warehouse; Agile Reporting using Logical Data Warehouse
• Enterprise Data Fabric: Single View of Supply Chain; Secure Data Services Layer
28.
Denodo ‘Solution’ Categories
Big Data
• Logical Data Lake: Single View for Customer Analytics
• Data Warehouse Offloading: Cost Reduction
• IoT Analytics: Contextual Data for Advanced Analytics
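Data warehouse offloading is often implemented by splitting one logical table across stores by date, with the virtual layer sending each query only to the stores that hold the requested range. The sketch below assumes a hypothetical cut-off date and two stores named `archive` and `warehouse`; it only plans the sub-queries and does not execute them.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical cut-off: data from this date on stays in the warehouse;
# older history has been offloaded to cheaper storage (e.g. Hadoop).
CUTOFF = date(2016, 1, 1)


def plan_query(start: date, end: date) -> List[Tuple[str, date, date]]:
    """Split an inclusive date range into per-store sub-queries, skipping
    any store that holds none of the requested data."""
    if start > end:
        raise ValueError("start must not be after end")
    plan = []
    if start < CUTOFF:
        plan.append(("archive", start, min(end, CUTOFF - timedelta(days=1))))
    if end >= CUTOFF:
        plan.append(("warehouse", max(start, CUTOFF), end))
    return plan


print(plan_query(date(2017, 1, 1), date(2017, 6, 30)))
# [('warehouse', datetime.date(2017, 1, 1), datetime.date(2017, 6, 30))]
print(plan_query(date(2015, 7, 1), date(2016, 6, 30)))
# [('archive', datetime.date(2015, 7, 1), datetime.date(2015, 12, 31)),
#  ('warehouse', datetime.date(2016, 1, 1), datetime.date(2016, 6, 30))]
```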
29.
Denodo ‘Solution’ Categories
Cloud Solutions
• Cloud Modernization: Application Modernization; Cloud Migration
• Cloud Analytics: Analytics in the Cloud; Web/Cloud/Semi-Structured Data Integration
• Hybrid Data Fabric: Single View of Customer for Distributor Portal; Automation of Service Interaction for Retail Partner Customers
30.
Going Forward: What’s cooking in the virtualization space
Web-based Information Self-Service
• Advanced data catalog enables a centralized “data marketplace”
• Keyword-based search
• Collaboration (tags, comments, annotations, requests for access, etc.)
Next-gen “Fabric” Execution Engine
• Tighter integration with in-memory engines and data grids to move processing from the virtual layer to specialized execution engines
Holistic Operations Console
• Common operations web console to orchestrate monitoring, notifications, diagnosis, auditing, migration, license management, etc.
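Keyword search over a data catalog can be pictured as an inverted index over view names, descriptions and user tags. A toy sketch with a hypothetical `DataCatalog` class; a real catalog would add ranking, permissions and richer metadata.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class DataCatalog:
    """Toy keyword index over view names, descriptions and user tags."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[str]] = defaultdict(set)

    def add_view(self, view: str, description: str) -> None:
        for tok in tokenize(view) + tokenize(description):
            self._index[tok].add(view)

    def tag(self, view: str, tag: str) -> None:
        # Collaboration: user-supplied tags become searchable as well.
        for tok in tokenize(tag):
            self._index[tok].add(view)

    def search(self, query: str) -> List[str]:
        """Return the views that match every keyword in the query."""
        hits = [self._index.get(tok, set()) for tok in tokenize(query)]
        return sorted(set.intersection(*hits)) if hits else []


catalog = DataCatalog()
catalog.add_view("well_header", "Well locations and operator per well")
catalog.add_view("monthly_production", "Oil and gas production volumes by well")
catalog.tag("monthly_production", "reservoir engineering")
print(catalog.search("well production"))  # ['monthly_production']
```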
31.
Summary
• HDEs are inevitable in modern enterprises. Embrace the diversity.
• Ensure your HDE evolution is driven by business goals.
• Virtualize data; don’t migrate or consolidate it.
• Leverage data virtualization to understand, access, unify, govern, and model your data in an HDE.
34. Who We Are
One of the world's largest independent exploration and production companies.
Committed to health, safety and the environment.
Over 4,000 employees worldwide.
Committed to its core values: Integrity and Trust, Servant Leadership, People and Passion, Commercial Focus, Open Communication.
An integral part of the communities where we live, work and operate.
Recognized among the World's Most Innovative Companies by Forbes in 2012.
35. Business Need
Data - Access to Critical Information to Support Business Processes
Better – access to complete information
More – access to related information
Faster – access in real-time
Common Catalog – for the enterprise
36. Challenge
Data is Siloed Across Disparate Systems
Manually access different systems
Addressed with point-to-point data integration
Takes too long to get answers to users
Inadequate security on source systems
37. Challenge
Friction between Business and IT
IT is too slow. Takes too long to build solutions.
Wrong data – obsolete or stale
Lack of adequate enterprise data repositories (DW / data mart / data lake)
39. Solution
Data Abstraction Layer
Abstracts access to disparate data sources
Acts as a single repository (virtual)
Makes data available in real-time to consumers
Integration with AD – Security
40. Data Virtualization – Our Journey
Projects and Timeline
Pilot Project: Jul 2016 – Sep 2016
Full Implementation in Business Unit: Oct 2016 – Feb 2017
Governance Implementation: Jun 2017 – Oct 2017
42. Data Virtualization
Use Case #1 – Industry Subscription Data in the Cloud
Problem
Provide consistent and up-to-date access to purchased industry subscription data for data mining:
• Multiple vendors
• Multiple data types (well locations, oil and gas production volumes, M&A activity)
• Multiple access protocols (Azure SQL Database, hosted XML files, external JSON/REST web services)
Honor internal and external security requirements and ensure adequate performance/cost:
• Prevent sharing of usernames and passwords
• Leverage (internal) enterprise security infrastructure
• Provide metrics/audit on usage
• Limit access as specified in agreements
• Avoid the time and cost of standing up additional databases
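The security and cost requirements above could be enforced in a virtual layer along these lines. This is a minimal sketch under stated assumptions: group membership comes from the enterprise directory, `AGREEMENTS` is a hypothetical encoding of the vendor terms, and the logger and counter stand in for real audit and metrics stores.

```python
import logging
from collections import Counter
from typing import Dict, List, Set

logging.basicConfig(level=logging.INFO, format="%(message)s")
audit_log = logging.getLogger("dv.audit")

# Hypothetical terms transcribed from a vendor subscription agreement.
AGREEMENTS: Dict[str, Dict] = {
    "vendor_a": {"groups": {"geoscience", "reservoir"}, "max_rows": 1000},
}
usage: Counter = Counter()


def query_subscription(user: str, groups: Set[str], vendor: str,
                       rows: List[dict]) -> List[dict]:
    """Gate vendor data on directory-group membership and audit every call.

    `groups` is assumed to come from the enterprise directory, so end users
    authenticate as themselves and never see the vendor's shared credentials.
    """
    terms = AGREEMENTS.get(vendor)
    if terms is None or not (groups & terms["groups"]):
        audit_log.info("DENY user=%s vendor=%s", user, vendor)
        raise PermissionError(f"{user} is not entitled to {vendor} data")
    result = rows[: terms["max_rows"]]      # row limit from the agreement
    usage[(user, vendor)] += len(result)    # usage metrics
    audit_log.info("ALLOW user=%s vendor=%s rows=%d", user, vendor, len(result))
    return result


data = [{"well_id": f"W{i}"} for i in range(1500)]
print(len(query_subscription("jdoe", {"geoscience"}, "vendor_a", data)))  # 1000
```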
Solution
43. Data Virtualization
Use Case #2 – Logical Data Mart for Key Business Unit
Problem
Significant organizational changes due to market conditions surfaced several point solutions driving critical business processes.
Reduce unnecessary copies from corporate data stores into local stores that stagnate quickly and are difficult to support (e.g. multiple, duplicated mini data marts in Excel and Access).
Need the ability to combine augmented or rapidly changing business-unit-specific data with corporate data.
Solution
Leveraged the newly formed data and analytics team in the business unit(s) to provide centralized support.
Partnered with corporate teams to develop a managed data delivery environment (tools + process).
Built a logical data mart (i.e. a virtual database) to combine BU-specific and corporate data.
44. Data Virtualization
Use Case #3 – Streamline Well Summary and Production Data Retrieval
Problem
Needed to combine multiple data types (well header, production volumes, well spacing, forecast) from disparate systems.
Updating the data set relied on many manual steps, which made the process time-consuming.
Reports ran very slowly.
Using Spotfire for integration prevented the reports from being run by other reporting tools.
Solution
Integrated data from the disparate data sources into a few views.
Was able to integrate Excel workbooks into the solution.
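The “few views” in the solution can be pictured as one view that joins well header, production and forecast data at query time. In this sketch the well IDs, names and figures are invented, and a CSV string stands in for an Excel workbook (reading real `.xlsx` files would need a library such as openpyxl).

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical extracts; in practice each would be read live from its own system.
WELL_HEADER = [{"well_id": "W1", "name": "Smith 1H", "spacing_ac": 640},
               {"well_id": "W2", "name": "Jones 2H", "spacing_ac": 320}]
PRODUCTION = [("W1", 1200.0), ("W1", 1150.0), ("W2", 800.0)]  # (well_id, monthly oil bbl)
FORECAST_SHEET_AS_CSV = "well_id,forecast_bbl\nW1,2500\n"     # stand-in for an Excel sheet


def well_summary() -> List[Dict]:
    """One view over three sources: header + cumulative production + forecast."""
    cum = defaultdict(float)
    for well_id, oil in PRODUCTION:
        cum[well_id] += oil
    forecast = {r["well_id"]: float(r["forecast_bbl"])
                for r in csv.DictReader(io.StringIO(FORECAST_SHEET_AS_CSV))}
    return [{**h,
             "cum_oil_bbl": cum.get(h["well_id"], 0.0),
             "forecast_bbl": forecast.get(h["well_id"])}  # None if no forecast row
            for h in WELL_HEADER]


for row in well_summary():
    print(row)
# {'well_id': 'W1', 'name': 'Smith 1H', 'spacing_ac': 640, 'cum_oil_bbl': 2350.0, 'forecast_bbl': 2500.0}
# {'well_id': 'W2', 'name': 'Jones 2H', 'spacing_ac': 320, 'cum_oil_bbl': 800.0, 'forecast_bbl': None}
```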
46. Data Virtualization
Our Observations
More access to data
• Ability to expose data in multiple ways (ODBC, JDBC, OData)
• Combine data in new ways from different sources
• Ability to access non-traditional data sources (e.g. SharePoint, web services, multi-dimensional)
• Make the data sources all look like they reside in the same database
Better access to data
• Pick data from the best sources to incorporate into a mash-up view
• Find source-of-record information in a central, documented location
• Access by going directly to the source (instead of a copy)
47. How are views to data created and accessed?
Producers
View Designers
• Techs and power users will be trained by IT and/or through a train-the-trainer approach
• Techs and power users from each asset or functional area will build virtual views
• Views will merge asset-specific virtual views with global asset views
• Techs are hands-on daily with individual assets, giving them a deeper understanding of what each asset needs
IT
• Train business users on use and best practices
• Build global virtual views that can be used
Consumers
Data Catalog (TBD)
• Web-based tool for viewing metadata
• Ability to request access and connection info
Applications
• Toad™ Data Point
Protocols
• ODBC
• JDBC
48. Recommendations - Base Cases for Use
Combine data from multiple sources in real-time
Source systems are highly available
Access different types of data: structured (DBMS), semi-structured (XLS), unstructured (PDF, Web), web services
Simple data cleansing and less complex transformations
49. Key Discoveries
May not duplicate all functionality in every client tool (Spotfire, Excel, Access)
You are only as fast as your slowest data source
Pass-through security is difficult
You can connect to almost anything (whether or not you should)
Change management is a challenge with the current version, 6.0
Involve source system owners in the early stages
ETL may be the best solution in some cases
Integration with SAP BW is possible, but performance is a challenge
Not intended for real-time aggregation over large data sources
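“You are only as fast as your slowest data source” follows from how federated queries work: even when sub-queries run in parallel, the combined result waits for the last one. A minimal sketch with hypothetical source names, where `asyncio.sleep` stands in for network and query time:

```python
import asyncio
import time


async def query_source(name: str, latency_s: float) -> str:
    """Hypothetical source; asyncio.sleep stands in for network + query time."""
    await asyncio.sleep(latency_s)
    return name


async def federated_query() -> list:
    # Sub-queries run concurrently, but the combined result is not ready
    # until the slowest source has answered.
    return await asyncio.gather(
        query_source("warehouse", 0.05),
        query_source("web_service", 0.30),
        query_source("sharepoint", 0.25),
    )


start = time.perf_counter()
results = asyncio.run(federated_query())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # roughly 0.30 s, set by the slowest source
```

Run one after another, the three sub-queries would take about 0.6 s; run concurrently, the total tracks the slowest source. Caching or materializing that source's data are common mitigations.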
50. What’s next?
Business Unit Roll-out
• Rolling out governance
• Implementing metrics by the end of Sep 2017
• Data catalog
Enterprise Roll-out
• Planned for Oct 2017
52. Infosys-Noah Consulting
[Diagram: Industry Experience, Information Management, Operations Focused, Domain Expertise.]
Average 25+ years of industry experience in Information Management disciplines
Library of project accelerators honed to meet specific industry needs
Extensive experience providing solutions to the largest and most complex companies in the world
Specific Information Management Strategy and Implementation Methodologies
Industry thought leadership in MDM, Data Quality, Metadata, Data Virtualization
53. Why Consider Data Virtualization?
Improve business agility
Reduce latency
Provide high quality, in context data
End user self-service
Ease of change
Enable enterprise / cross-BU data integration
Access immovable data
Lower TCO
Leveraging DV to unlock information to accelerate and improve business performance.
54. Data Virtualization use cases
1. Reporting & Analytics: This is the primary use case, using DV to create a virtual data warehouse for reporting and analytics.
2. Extending EDW: Using DV to extend/upgrade existing EDWs would be a good way to expand on the value case of DV.
3. Virtual Data Marts: DV and ETL can work together to create virtual data marts on top of the existing/extended EDW platform. This use case is relevant when there is an existing trusted EDW system.
4. Registry MDM: DV can be used to create golden records on the fly, implementing a registry MDM. Note that MDM match-and-merge logic can sometimes be fairly complex and may not always be possible to implement using DV.
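A registry-style golden record can be assembled at query time from already-matched source records using survivorship rules. The sketch below assumes matching is already done (the hard part the note above warns about) and uses two hypothetical rules: a trusted-source order for `phone`, and most-recently-updated wins otherwise. Records and field names are invented.

```python
from typing import Dict, List, Optional

# Hypothetical records for one customer, already matched on customer_id.
RECORDS: List[Dict[str, Optional[str]]] = [
    {"source": "crm", "customer_id": "C1", "email": "acme@old.example",
     "phone": None, "updated": "2016-11-02"},
    {"source": "billing", "customer_id": "C1", "email": "ap@acme.example",
     "phone": "555-0100", "updated": "2017-05-20"},
    {"source": "erp", "customer_id": "C1", "email": None,
     "phone": "555-0199", "updated": "2015-01-15"},
]
# Hypothetical survivorship rule: for phone, trust sources in this order.
SOURCE_PRIORITY = {"phone": ["erp", "billing", "crm"]}


def golden_record(records, attributes: List[str]) -> Dict[str, Optional[str]]:
    """Assemble a golden record at query time; nothing is persisted."""
    golden = {"customer_id": records[0]["customer_id"]}
    for attr in attributes:
        candidates = [r for r in records if r.get(attr) is not None]
        if not candidates:
            golden[attr] = None
            continue
        order = SOURCE_PRIORITY.get(attr)
        if order:
            # Unlisted sources rank after all listed ones.
            best = min(candidates, key=lambda r: order.index(r["source"])
                       if r["source"] in order else len(order))
        else:
            # Default rule: the most recently updated value wins
            # (ISO-8601 date strings compare correctly as text).
            best = max(candidates, key=lambda r: r["updated"])
        golden[attr] = best[attr]
    return golden


print(golden_record(RECORDS, ["email", "phone"]))
# {'customer_id': 'C1', 'email': 'ap@acme.example', 'phone': '555-0199'}
```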
56. Data virtualization doesn’t solve data quality issues
Apply strong data governance to support your DV approach
• Manage data quality at the source
• Data dictionary and definitions
• Data stewardship
• DV standards: library and naming standards; virtualization layers
Image from the Data Management Body of Knowledge (DMBOK), published by DAMA International (Data Management Association)
57. Where is my data?
• Know what the authoritative source is for each attribute
• Understand the data lifecycle and how data can change across systems of record (SORs)
• Understand the quality of information within each SOR
• Ensure data standards are applied consistently across each SOR for shared data types
• Use a data catalog for easy access to information and metadata about DV views
58. Stick to your DV principles
Establish guiding principles on when to use DV versus other methods for data exchange
Understand your non-functional requirements (latency, lineage, performance)
Performance considerations
59. Adding complexity to the equation
Do not use DV for transformations beyond:
• routing data
• re-representing objects (e.g., renaming to a standard model)
• data augmentation (e.g., derived metrics)
Minimize the urge to apply complex transformations or calculations
[Diagram: Extract, Transform and Load steps feeding data to analytics, business applications and reporting.]
60. General guiding principles
[Diagram: a data virtualization layer between producers (app data stores, files, services, data warehouses, applications) and consumers (BI, reports, applications, portals, other).]
* Source – Data Virtualization book
• Use appliances to supercharge performance
• Do not store any material business data
• Perform aggregations in the data layer
• Perform transformations in the ETL layer
• Run data quality tools against the repository to validate/qualify data
• Pass through insert/update/delete (I/U/D) queries to the transaction source
• Modeling should account for reuse and growth
• Organization to support continuous expansion
61. Any Questions?
Thank you
James Soos
Associate Partner – E&P Practice Lead
Mobile: 936-499-8441
James.Soos@noah-consulting.com
James.Soos@infoys.com
Editor's Notes
Well header data, production data, drilling data, transactional data, well survey data, drilling schedule data. Common catalog for data.
Business feels that IT is too slow or takes too long.
Superuser to the rescue: different versions of the same solution. The downturn caused key personnel to leave, and we lost support for those solutions.