2. 2
Data, Data, Everywhere…
And not a drop to read!
Organizations are awash with data, but…
▪ What data is available?
▪ What’s its structure?
▪ How good is it?
▪ How to access the data?
Data Services Marketplace
▪ Provides a mechanism for end users and developers to easily find and access data for reports, applications, analytics, etc.
3. 3
Data Services Marketplace
[Diagram: Enterprise Apps; access protocols: SQL (JDBC/ODBC), RESTful Web Services, SOAP, JMS, etc.; Data Services Layer: Virtual Data Marts, Virtual ODS, Reusable Data Services; Enterprise Data Service Registry: Metadata, Scheduling & Delivery, Usage Stats; data sources: Operational Systems, Analytical Systems, Big Data, External/SaaS Systems]
A single place where consumers of data (developers or end users) can search, find, and access the data that is available to them, as a service
4. 4
Enterprise Data Service Registry
Catalog of data available to consumers
Metadata for data ‘services’
▪ Format and structure of data, description of data and attributes
▪ Data lineage information – where does the data come from?
Access permissions for data services
▪ Enforcing privacy and security policies
Monitoring and auditing of data usage
▪ Monitoring and managing QoS/SLA
▪ Knowing who is accessing data, when, and how
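The registry responsibilities listed above (catalog, metadata, lineage, permissions, usage auditing) can be sketched in a few lines of Python. This is an illustration only: `ServiceEntry`, `Registry`, and the sample service names are hypothetical and do not correspond to any product API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ServiceEntry:
    """One catalog entry: metadata, lineage, and access permissions."""
    name: str
    description: str
    attributes: dict[str, str]   # attribute name -> data type
    lineage: list[str]           # upstream sources the data comes from
    allowed_roles: set[str]      # roles permitted to use the service


@dataclass
class Registry:
    entries: dict[str, ServiceEntry] = field(default_factory=dict)
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def register(self, entry: ServiceEntry) -> None:
        self.entries[entry.name] = entry

    def search(self, keyword: str, role: str) -> list[str]:
        """Names of services matching the keyword that the role may access."""
        kw = keyword.lower()
        return sorted(
            e.name for e in self.entries.values()
            if role in e.allowed_roles
            and (kw in e.name.lower() or kw in e.description.lower())
        )

    def record_access(self, user: str, service: str) -> None:
        """Audit trail: who accessed which service, and when (UTC)."""
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((user, service, stamp))


# Hypothetical sample catalog.
registry = Registry()
registry.register(ServiceEntry(
    name="customer_profile",
    description="Customer master data",
    attributes={"customer_id": "int", "name": "str"},
    lineage=["crm.customers", "mdm.customer_golden"],
    allowed_roles={"employee", "corporate"},
))
registry.register(ServiceEntry(
    name="customer_compensation",
    description="Customer rebate and compensation figures",
    attributes={"customer_id": "int", "rebate": "float"},
    lineage=["erp.rebates"],
    allowed_roles={"corporate"},
))
```

Searching with a role only returns the services that role is entitled to, so the catalog doubles as the first enforcement point for access permissions.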
5. 5
Virtual Data Services Layer
A data access layer that abstracts underlying data sources and exposes them as
discrete services to form a ‘data API’
▪ Different users and developers across the enterprise access data in a secure and managed
fashion and share a common data ‘model’
▪ Secure and managed access to data across the enterprise
▪ Consistency of data
▪ Hides complexity, format, and location of actual data sources
▪ Supports many consumption protocols and patterns
Single data access layer for all development teams to avoid ‘hunting down and
interpreting data differently by project’
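As a rough illustration of a ‘data API’ that hides the format and location of the underlying sources, the sketch below publishes one logical service over two differently shaped sources. All names (`DataServicesLayer`, `orders_from_csv_like`, `orders_from_api_like`) and the sample data are assumptions made for this example.

```python
from typing import Callable, Iterable

Row = dict[str, object]


class DataServicesLayer:
    """Maps service names to source-specific fetch functions."""

    def __init__(self) -> None:
        self._services: dict[str, Callable[[], Iterable[Row]]] = {}

    def publish(self, name: str, fetch: Callable[[], Iterable[Row]]) -> None:
        self._services[name] = fetch

    def query(self, name: str, **filters: object) -> list[Row]:
        """Return rows of a service whose fields equal all given filters."""
        rows = self._services[name]()
        return [r for r in rows
                if all(r.get(k) == v for k, v in filters.items())]


def orders_from_csv_like() -> Iterable[Row]:
    """Hypothetical file-style source: comma-separated text lines."""
    for line in ["1,EMEA,120.0", "2,APAC,75.5"]:
        oid, region, amount = line.split(",")
        yield {"order_id": int(oid), "region": region, "amount": float(amount)}


def orders_from_api_like() -> Iterable[Row]:
    """Hypothetical SaaS-style source: records with different field names."""
    for item in [{"id": 3, "reg": "EMEA", "amt": 40.0}]:
        yield {"order_id": item["id"], "region": item["reg"], "amount": item["amt"]}


def all_orders() -> Iterable[Row]:
    """Common model: both sources mapped to the same row shape."""
    yield from orders_from_csv_like()
    yield from orders_from_api_like()


layer = DataServicesLayer()
layer.publish("orders", all_orders)
emea_orders = layer.query("orders", region="EMEA")
```

Consumers only see the `orders` service and its common row shape; swapping or retiring a source changes the fetch functions, not the consumers.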
6. 6
Implementing Data Services
Different Technologies
Data services can be implemented using a number of
different technologies:
▪ ESB/SOA
▪ ETL
▪ MDM
▪ Data Virtualization
Typically it will be one or more of the above
7. 7
The Foundation for the Data Services Marketplace
Data Services with Data Virtualization
[Diagram: Enterprise Apps; access protocols: SQL (JDBC/ODBC), RESTful Web Services, SOAP, JMS, etc.; data sources: Operational Systems, Analytical Systems, Big Data, External/SaaS Systems]
Optimized for data services
▪ Configuration and not coding
▪ Rapid development and time-to-value
Supports multiple delivery styles
▪ Real-time/right-time, batch/file, etc.
▪ Multiple protocols: SQL (JDBC/ODBC), Web Services (REST/SOAP), …
Complements other technologies
▪ MDM exposed as services through data virtualization
▪ Combined with an ESB for process flows
8. 8
Benefits & Challenges of Data Services
Benefits
▪ Agility: rapid development, service reuse, quicker time-to-value
▪ Data Integration: combine data to provide data ‘as needed’, not ‘as stored’; aligned with logical data models
▪ Data Quality: data consistency, common ‘model’
▪ Single Point of Interaction: users don’t need direct access to data sources; better management and security
Challenges
▪ Security: How secure is the data? How is access controlled?
▪ Privacy: How is personal information protected? How can you audit access compliance?
▪ Performance/QoS: Does the data services layer ‘get in the way’? How does it impact performance and QoS/SLAs?
▪ Data Governance and Veracity: How do you know that the data is ‘good’?
10. 10
Security
▪ How secure is the data? How is access controlled?
Authentication
▪ Pass-through authentication
▪ Standard JDBC/ODBC security
▪ Kerberos and Windows SSO
▪ OAuth, SPNEGO
▪ Web Service security
▪ LDAP, Active Directory
Role-Based Authentication
▪ Guest, employee, corporate
Permissions
▪ Schema-wide permissions
▪ Data-specific permissions (row level, column level, masking)
▪ Policy-based security
Data in motion
▪ SSL/TLS
Encrypted data at rest
▪ Cache
▪ Swap
11. 11
Security in Denodo
Securing data
Data in Motion: secure channels
▪ Using SSL/TLS
▪ Client-to-Denodo and Denodo-to-source
▪ Available for all protocols (JDBC, ODBC, ADO.NET and WS)
▪ WS security: Basic, Digest, SPNEGO (Kerberos), integration with LDAP
Data at Rest: secure storage
▪ Cache: held in a third-party database, which can apply its own encryption mechanism
▪ Swapping to disk: serialized data is stored temporarily in a configurable folder that can be encrypted by the OS
Encryption/Decryption
▪ Support for custom decryption for files and web services
▪ Transparent integration with RDBMS encryption
Authentication
▪ Native and LDAP/Active Directory based
▪ Support for Kerberos and Windows SSO
Authorization
▪ Virtual Database
▪ View
▪ Row and Column level authorization
▪ Masking
▪ Custom policies for specific security constraints and integration with external policy servers
Roles
▪ Integration with LDAP/AD groups
▪ Role hierarchies supported
Pass-through session credentials
▪ Leverage existing source privileges
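Row-level authorization, column-level authorization, and masking can be pictured as three filters applied per role. The following is a generic sketch, not Denodo configuration: the `ROLE_POLICY` table, the role names, and the sample rows are all hypothetical.

```python
# role -> (visible columns, masked columns, row filter)
ROLE_POLICY = {
    "hr": (
        {"name", "office", "ssn", "salary"},
        set(),
        lambda row: True,
    ),
    "manager": (
        {"name", "office", "ssn"},            # column level: no salary
        {"ssn"},                              # masking: ssn partly hidden
        lambda row: row["office"] == "London",  # row level: London only
    ),
}


def mask(value: str) -> str:
    """Keep the last four characters and replace the rest with '*'."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def apply_policy(rows: list[dict], role: str) -> list[dict]:
    """Apply row filtering, column filtering, and masking for a role."""
    visible, masked, row_filter = ROLE_POLICY[role]
    result = []
    for row in rows:
        if not row_filter(row):
            continue
        out = {}
        for col, val in row.items():
            if col not in visible:
                continue
            out[col] = mask(str(val)) if col in masked else val
        result.append(out)
    return result


# Hypothetical sample data.
EMPLOYEES = [
    {"name": "Ana", "office": "London", "ssn": "123456789", "salary": 90000},
    {"name": "Bo", "office": "Madrid", "ssn": "987654321", "salary": 85000},
]

manager_view = apply_policy(EMPLOYEES, "manager")
```

Here `manager_view` contains only the London row, without the salary column and with the identifier masked except for its last four characters.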
12. 12
Advanced Selective Data Masking
Privacy
▪ How is personal information protected?
▪ How can you audit access compliance?
13. 13
Custom Policies
Interception of queries before they are executed
[Diagram: data-consuming users and apps send a query; the custom policy checks whether its conditions are satisfied and then accepts the query, adds filters, or rejects it]
Security: applies custom security policies
▪ If the person accessing the data has the role ‘Supervisor’ and the location is ‘London’, then show compensation information for employees in the London office only.
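The Supervisor/London rule can be sketched as a function that sees each query before execution and either passes it through, adds a filter, or rejects it. This is a plain Python illustration; `Query`, `Session`, and `compensation_policy` are hypothetical names, not a product's custom-policy API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Query:
    view: str
    filters: dict[str, str] = field(default_factory=dict)


@dataclass
class Session:
    user: str
    role: str
    location: str


def compensation_policy(session: Session, query: Query) -> Optional[Query]:
    """Intercept a query: return it (possibly with added filters) or None to reject."""
    if query.view != "compensation":
        return query  # policy does not apply: accept unchanged
    if session.role != "Supervisor":
        return None   # reject: only supervisors may read compensation data
    # Accept, but force a filter so supervisors see their own office only.
    return Query(query.view, {**query.filters, "office": session.location})


accepted = compensation_policy(
    Session("sam", "Supervisor", "London"), Query("compensation"))
rejected = compensation_policy(
    Session("eve", "Analyst", "London"), Query("compensation"))
```

Because the office filter is written last into the filter dictionary, a supervisor cannot widen the result by supplying a different `office` value in the original query.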
14. 14
Rule Based Resource Restriction
Performance/QoS
▪ Does the data services layer ‘get in the way’? How does it impact performance and QoS/SLAs?
Rules classify sessions into groups
▪ By user, role, application, IP, time of day, etc.
▪ E.g. connections from application ‘app1’ coming from users with role ‘reporting’ are assigned to a group
Apply restrictions for each group
▪ Change priority, change concurrency settings, change max timeouts, etc.
Enforcement: rejects or filters queries by specified criteria such as user priority, cost, time of day, etc.
▪ If the production batch window runs from 3 am to 6 am, there is increased load on the production servers during that time
▪ All queries on these servers can be blocked during the window to prevent a process from failing
[Diagram: custom policy accepts, adds filters to, or rejects queries from data-consuming users and apps]
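The batch-window example amounts to a time-of-day admission check. A minimal sketch, assuming a hypothetical server name and treating the window as 03:00 inclusive to 06:00 exclusive:

```python
from datetime import time

BATCH_START = time(3, 0)
BATCH_END = time(6, 0)
PROTECTED_SERVERS = {"prod-db-1"}  # hypothetical production server name


def admit(server: str, now: time) -> bool:
    """Return False for queries hitting a protected server during the batch window."""
    in_window = BATCH_START <= now < BATCH_END
    return not (server in PROTECTED_SERVERS and in_window)


blocked = admit("prod-db-1", time(4, 30))       # inside the window
allowed_later = admit("prod-db-1", time(6, 0))  # window has just closed
allowed_other = admit("reporting-db", time(4, 30))  # server not protected
```

A real policy would take the current time from the server clock and might queue the query instead of rejecting it; both choices are outside this sketch.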
15. 15
Controlled Resource Allocation
Resource Manager
1. Define a rule that is triggered for “app1” and users with the role “reporting”
2. For requests that match the rule, if CPU usage is greater than 85%, apply the following:
• Reduce thread priority
• Reduce the number of concurrent requests
• Limit the number of queued queries
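The two steps above can be sketched as a rule match followed by a threshold check. The `Limits` fields and the numeric values are invented for illustration; only the application name, the role, and the 85% threshold come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    thread_priority: int
    max_concurrent: int
    max_queued: int


# Hypothetical values: normal operation versus the restricted plan.
DEFAULT = Limits(thread_priority=5, max_concurrent=20, max_queued=100)
RESTRICTED = Limits(thread_priority=2, max_concurrent=5, max_queued=10)
CPU_THRESHOLD = 0.85


def matches_rule(application: str, roles: set[str]) -> bool:
    """Step 1: the rule fires for 'app1' sessions whose user has role 'reporting'."""
    return application == "app1" and "reporting" in roles


def limits_for(application: str, roles: set[str], cpu_usage: float) -> Limits:
    """Step 2: apply the restricted plan only when the rule matches and CPU > 85%."""
    if matches_rule(application, roles) and cpu_usage > CPU_THRESHOLD:
        return RESTRICTED
    return DEFAULT
```

For example, `limits_for("app1", {"reporting"}, 0.90)` returns the restricted plan, while the same session at 50% CPU keeps the default limits.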
16. 16
Performance Features
Data Provisioning Layer
Selective Materialization
▪ Intelligent caching of only the most relevant and most frequently used information
Streaming & Pagination
▪ Operate on data in streaming mode for a low memory footprint
▪ Paginate responses to control the size of datasets
Parallelism
▪ Parallel access to disparate sources to minimize latency
▪ NESTED JOINs for concurrent access to sources with restricted query capabilities
Optimized Resource Management
▪ Smart allocation of resources to handle high concurrency
▪ Throttling to control and mitigate source impact
▪ Resource plans based on rules
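Two of these ideas, parallel access to several sources and paginated delivery, are easy to show with the Python standard library. The sketch below is generic and says nothing about how any particular platform implements them; the source functions are stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator


def fetch_parallel(sources: list[Callable[[], list[int]]]) -> list[int]:
    """Query all sources concurrently; results are concatenated in source order.

    Assumes at least one source (ThreadPoolExecutor needs max_workers >= 1).
    """
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        results = pool.map(lambda fetch: fetch(), sources)
        return [row for rows in results for row in rows]


def paginate(rows: Iterable[int], page_size: int) -> Iterator[list[int]]:
    """Yield fixed-size pages lazily, so only one page is built at a time."""
    it = iter(rows)
    while page := list(islice(it, page_size)):
        yield page


# Stand-in sources; real ones would be database or web service calls.
sources = [lambda: [1, 2, 3], lambda: [4, 5]]
rows = fetch_parallel(sources)
pages = list(paginate(rows, page_size=2))
```

With parallel fetches, total latency tends toward that of the slowest source rather than the sum of all sources, which is the point of the ‘Parallelism’ item.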
17. 17
Quality of Service in Real Scenarios
Data Provisioning Layer
Multinational insurance & reinsurance company
▪ Average response time of 80-100 ms
▪ 200+ concurrent queries
▪ 2 nodes, 4 cores each
Global semiconductor chip manufacturer
▪ Enterprise-wide data access layer
▪ ~50 data sources
▪ 90+ published data services
▪ Response times under 120 ms, well within internal SLAs (200-300 ms)
▪ 128+ cores in production
18. 18
Data Governance and Veracity
▪ How do you know that the data is ‘good’?
Data Lineage: understand the “source of truth” and the transformations of every piece of data in the model
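Data lineage can be modeled as a graph from each derived view to the views or tables it is built from; finding the “source of truth” is then a walk to the leaves. A minimal sketch with hypothetical view and table names, assuming the graph has no cycles:

```python
# Each derived view maps to the views or source tables it is built from.
LINEAGE = {
    "sales_report": ["sales_by_region"],
    "sales_by_region": ["orders", "regions"],
    "orders": ["erp.orders_table"],
    "regions": ["crm.region_table"],
}


def trace_sources(view: str, lineage: dict[str, list[str]]) -> set[str]:
    """Return the base sources (nodes with no recorded parents) feeding a view."""
    parents = lineage.get(view)
    if not parents:
        return {view}  # nothing upstream recorded: this is a base source
    found: set[str] = set()
    for parent in parents:
        found |= trace_sources(parent, lineage)
    return found


report_sources = trace_sources("sales_report", LINEAGE)
```

Here `report_sources` is the set of the two base tables, which is the kind of answer a lineage view gives when someone asks where a report's numbers come from.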
21. 21
Drillinginfo
Leading SaaS and data analytics company for energy exploration decision support
▪ Helps the oil and gas industry achieve better, faster results
▪ Headquartered in Austin, Texas, with more than 400 employees on 5 continents
▪ Serves 3,000+ companies globally
Business Need
Business growth is driving the need to develop new tools and models.
▪ Rapid time-to-market is crucial
▪ The conventional Enterprise Data Warehouse fed by ETL was not fast enough for the data needs of the development team
▪ Needed a cost-effective solution to reduce time to value
22. 22
Drillinginfo
Solution
▪ Raw data in the Data Warehouse and refined data in
MDM are virtually connected
▪ Data Virtualization Layer combines the views and exposes
them as RESTful services to the Analytics and Decision
Support applications internally.
▪ Provided search indices for external clients that were
building their own apps based on these services
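The pattern described here, combining a raw view with a refined view and exposing the result as a service payload, can be sketched as a key-based join followed by JSON serialization. The field names, the sample records, and the payload shape are invented for illustration and are not taken from the case study.

```python
import json


def combine(raw_rows: list[dict], refined_rows: list[dict], key: str) -> list[dict]:
    """Join raw warehouse rows with refined master-data rows on a shared key."""
    refined_by_key = {r[key]: r for r in refined_rows}
    return [
        {**raw, **refined_by_key[raw[key]]}
        for raw in raw_rows
        if raw[key] in refined_by_key
    ]


def to_rest_payload(rows: list[dict]) -> str:
    """Serialize rows as the JSON body a RESTful service might return."""
    return json.dumps({"count": len(rows), "items": rows}, sort_keys=True)


# Hypothetical sample data standing in for the warehouse and MDM views.
raw = [{"well_id": 1, "depth_ft": 9500}, {"well_id": 2, "depth_ft": 7200}]
refined = [{"well_id": 1, "operator": "Acme Energy"}]

combined = combine(raw, refined, key="well_id")
payload = to_rest_payload(combined)
```

In a virtualized setup the join is evaluated at query time against the live sources, so no intermediate copy of the combined data has to be built by an ETL job.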
Benefits
▪ 24 services built so far around 11 core line-of-business entities.
▪ Turnaround time cut to hours. Previously the ETL process took 2-3 days to finish, plus 2 more days to build the data interface.
▪ Just 1 developer now manages the entire virtualization process.
▪ Time and resources saved, achieving the primary benefit of rapid time-to-market for their products.
23. 23
“As a data and business intelligence provider, one of our biggest challenges is the need to rapidly sell the data that we acquire. The Denodo Platform enables us to build and deliver data services to our internal and external consumers within 3–4 hours instead of the 1–2 weeks that it would take with ETL.”
- Jay Heydt, Manager, Drillinginfo
24. 24
Insurance Provider
Offers life insurance, disability income insurance and retirement programs to individuals
▪ 8,800 employees
▪ $82B in assets under management
▪ $7+ revenue
Business Need
▪ Business units needed access to a wide variety of disparate data sources
▪ Decouple app-to-app communication through an abstraction layer so that enhancing and retiring applications becomes more flexible and easier
▪ Understand all data assets existing in the organization, and how and where data is consumed
▪ Create a data dictionary for the entire organization
25. 25
Insurance Provider
Solution
▪ Centralized enterprise data services marketplace using the Denodo Platform, containing all reusable data assets
▪ Data services marketplace accommodating all consumers,
allowing them to search and request access to data
▪ Standardized data access and delivery patterns
▪ Self-service portal, dashboards and reporting systems for
business consumers
Benefits
▪ Centrally governed certified data services marketplace
helps with data security, audit and better access control
▪ Data-as-a-Service (DaaS) to applications and users,
making data consumption flexible and easy
▪ Business users shielded from underlying data complexity and empowered with information so that they can further business goals
▪ Consistency and standardization of information across
the organization