Today’s enterprises have an unprecedented variety of data stores to choose from, because no single data store fits every workload. Putting data stores in place to support a modern, data-reliant enterprise can therefore lead to confusion and chaos.
Enterprises need databases for caching, operational processing, data warehousing, master data, ERP, analytics, graph data, data lakes, time series data, and numerous other specific purposes.
While vendor offerings have exploded in recent years, in due time frameworks will integrate components into what amounts to, for practical purposes, a single offering for multiple workloads, perhaps even for the enterprise.
A multi-model database is a database that can store, manage, and query data in multiple models, such as relational, document-oriented, key-value, graph (triplestore), and column store.
An enterprise will find reduced overhead and other synergies from choosing a single vendor for these workloads.
This session will explore the multi-model option and some criteria that decision makers should evaluate when choosing a multi-model solution.
17. William McKnight
President, McKnight Consulting Group
• Frequent keynote speaker and trainer internationally
• Consulted to Pfizer, Scotiabank, Fidelity, TD
Ameritrade, Teva Pharmaceuticals, Verizon, and many
other Global 1000 companies
• Hundreds of articles, blogs and white papers in
publication
• Focused on delivering business value and solving
business problems utilizing proven, streamlined
approaches to information management
• Former Database Engineer, Fortune 50 Information
Technology executive and Ernst & Young Entrepreneur
of the Year Finalist
• Owner/consultant: Research, Data Strategy and
Implementation consulting firm
[Book cover: William McKnight, Information Management: Strategies for Gaining a Competitive Advantage with Data (The Savvy Manager’s Guide)]
18. McKnight Consulting Group Offerings
Strategy
§ Trusted Advisor
§ Action Plans
§ Roadmaps
§ Tool Selections
§ Program Management
Training
§ Classes
§ Workshops
Implementation
§ Data/Data Warehousing/Business
Intelligence/Analytics
§ Big Data
§ Master Data Management
§ Governance/Quality
20. Decisions, Decisions, Decisions
• Enterprises face an unprecedented variety of data store
choices to meet the needs of their varied workloads
• Enterprises have many needs for databases, including
cache, operational, data warehouse, master data, ERP,
analytical, graph data, data lake, and time series data
• While vendor offerings have exploded in recent
years, in due time frameworks will integrate
components into what amounts to a single offering
for multiple workloads, perhaps even for the
enterprise
• But what if price-performant offerings for adjacent
workloads in an enterprise have materialized?
21. Many Data Types
• Web Crawlers
• Open Linked Data
• JSON
• XML
• Documents
• Binary
• Graph
• Log Files
22. Why NoSQL for Operational Big Data
More data model flexibility
– Web Services as a data model
– No “schema first” requirement; load first
Faster time to insight from data acquisition
Relaxed ACID
– Eventual consistency
– Willing to trade consistency for availability
– ACID would crush things like storing clicks on Google
Low upfront software and development costs
Programmers love the freedoms
Fault-tolerant redundancy
Linear Scaling to “webscale”
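The “eventual consistency” trade-off above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Replica` class, the integer version numbers, and the last-writer-wins merge rule are all hypothetical and do not describe any particular product.

```python
class Replica:
    """One copy of a key-value data set; keeps (version, value) per key."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Assumed merge rule: last writer wins, i.e. keep the highest version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Background replication: pull every entry the other replica holds.
        for key, (version, value) in other.data.items():
            self.write(key, value, version)


a, b = Replica(), Replica()
a.write("clicks:home", 1, version=1)  # the write is accepted by replica a only
stale = b.read("clicks:home")         # replica b has not seen the write yet
b.sync_from(a)                        # replicas converge later
fresh = b.read("clicks:home")
```

Both replicas stay available for reads and writes the whole time; the cost is the window in which `b` returns a stale answer.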
23. DFS Block Placement
• Placement policy:
– A copy is written to the node creating the file (write affinity)
– A second copy is written to a data node within the same rack (to minimize cross-rack network traffic)
– A third copy is written to a data node in a different rack (to tolerate switch failures)
• Objectives: load balancing, fast access, fault tolerance
[Figure: three copies each of Block 1, Block 2, and Block 3 distributed across Node 1 through Node 5]
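The three-copy placement policy on this slide can be sketched as a small function. This is a minimal illustration, not the code of any real distributed file system: the `place_replicas` name, the rack and node names, and the random choice among eligible nodes are assumptions.

```python
import random


def place_replicas(writer, racks, rng=random):
    """Pick three nodes for one block, following the slide's policy.

    racks maps a rack name to the list of node names in that rack
    (a hypothetical topology supplied by the caller).
    """
    rack_of = {node: rack for rack, nodes in racks.items() for node in nodes}
    local_rack = rack_of[writer]
    same_rack = [n for n in racks[local_rack] if n != writer]
    other_racks = [n for rack, nodes in racks.items()
                   if rack != local_rack for n in nodes]
    if not same_rack or not other_racks:
        raise ValueError("need a second local node and at least one remote node")
    return [
        writer,                   # copy 1: the node creating the file
        rng.choice(same_rack),    # copy 2: another node in the same rack
        rng.choice(other_racks),  # copy 3: a node in a different rack
    ]


racks = {"rack-a": ["node1", "node2", "node3"], "rack-b": ["node4", "node5"]}
replicas = place_replicas("node1", racks, random.Random(0))
```

The second copy stays inside the rack to keep replication traffic local, while the third copy is what survives the loss of a whole rack or its switch.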
24. Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
[Figure: example property graph. Nodes: PERSON (name: “Dan”, born: May 29, 1970, twitter: “@dan”), PERSON (name: “Ann”, born: Dec 5, 1975), CAR (brand: “Volvo”, model: “V70”). Relationships: DRIVES, OWNS, LIVES WITH, friends; one relationship carries the property since: Jan 10, 2011.]
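The node and relationship components described on this slide can be sketched with two small data classes. The property values come from the slide's example; which nodes each relationship connects, and which relationship carries the `since` property, are assumptions made for illustration, as are the `Node`, `Relationship`, and `neighbors` names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                      # nodes can be labeled
    properties: dict = field(default_factory=dict)  # name-value properties


@dataclass
class Relationship:
    start: Node                                     # direction: start -> end
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node("PERSON", {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node("PERSON", {"name": "Ann", "born": "Dec 5, 1975"})
car = Node("CAR", {"brand": "Volvo", "model": "V70"})

relationships = [
    # Endpoints and the placement of "since" are assumed, not taken from the slide.
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
    Relationship(ann, "OWNS", car),
    Relationship(dan, "LIVES WITH", ann),
]


def neighbors(node, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


driven_models = [n.properties["model"] for n in neighbors(dan, "DRIVES")]
```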
25. Semantic Graph
• RDF Triple Store
– Semantic databases only work with RDF
• Target market is users of third-party
data in RDF (all Linked open data)
– Working across data sets
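An RDF triple store holds statements of the form (subject, predicate, object). The following is a minimal in-memory sketch; the `ex:` names and the `match` helper are hypothetical, and a real triple store would be queried with SPARQL rather than a Python function.

```python
# Each statement is one (subject, predicate, object) triple.
triples = {
    ("ex:Dan", "ex:knows", "ex:Ann"),
    ("ex:Dan", "ex:drives", "ex:Volvo_V70"),
    ("ex:Ann", "ex:owns", "ex:Volvo_V70"),
}


def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything any data set says about the car, regardless of predicate.
about_car = match(o="ex:Volvo_V70")
```

Because every data set uses the same three-part shape, triples from different sources can be merged into one set and queried together, which is what makes working across data sets practical.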
27. Data Types and NoSQL Data Models
Data Type                     Data Model
CSV, TSV or web logs          Column, Document
Documents                     Document
JSON                          Document
Metadata catalog              Column, Document
Keyed images and documents    Key-Value
RDF, Linked data              Graph
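The mapping in this table can be expressed as a simple lookup. The entries are taken directly from the slide; the `candidate_models` function name and the case-insensitive matching are assumptions added for illustration.

```python
# Data type -> candidate NoSQL data models, as listed on the slide.
MODELS_BY_DATA_TYPE = {
    "csv, tsv or web logs": ["Column", "Document"],
    "documents": ["Document"],
    "json": ["Document"],
    "metadata catalog": ["Column", "Document"],
    "keyed images and documents": ["Key-Value"],
    "rdf, linked data": ["Graph"],
}


def candidate_models(data_type):
    """Return the data models the slide suggests, or [] if the type is unlisted."""
    return MODELS_BY_DATA_TYPE.get(data_type.strip().lower(), [])


json_models = candidate_models("JSON")
```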
28. Key-Value Stores
What are they?
• NoSQL’s OLTP equivalent
• Extremely simple
• Key-“blob” pairs, that’s it
• Associative array data model
• Retrieve value given a key
– All access is by a key
(key,value)
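The associative-array model above can be sketched in a few lines. The `KeyValueStore` class and the session key are hypothetical; the point is that the store treats the value as an opaque blob and every access goes through the key.

```python
import json


class KeyValueStore:
    """Associative array: the store never looks inside the blob."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, blob):
        self._blobs[key] = blob

    def get(self, key):
        return self._blobs.get(key)


store = KeyValueStore()
session = {"user": "dan", "cart": ["sku-1", "sku-2"]}

# The application serializes the whole object; one put stores everything.
store.put("session:42", json.dumps(session).encode("utf-8"))

# One get returns everything; interpreting the bytes is the application's job.
restored = json.loads(store.get("session:42").decode("utf-8"))
```

There is no way to ask this store for “all sessions with sku-1 in the cart”; that kind of query on the value is what document-oriented databases add.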
30. Key-Value Stores
Good for:
• Any single object of unstructured data
• Storing BLOBs
• Fast writes
• Web app cache
• Session Information – get all session information in a
single put/get
• User profile data
• Massive multi-player on-line gaming
• Shopping carts (up until the payment transaction)
• Geo-localized processing
• Speed when you can’t be down
(key,value)
31. A multi-model database is a single, integrated
database that can store, manage and query data in
multiple models such as relational, document,
graph, key-value, column-store, cache. It is the
opposite approach to Polyglot Persistence – the
use of multiple databases in a workload.
32. Document-oriented Databases
What are they?
• Key-Value Stores with added capabilities
– Ability to nest sub-documents
• JSON/XML data models
• With Tree-Like Structure
• Encapsulated document objects
• Groups data together more naturally and
logically
33. Document-oriented Databases
Technical Characteristics:
• Store all data together
– Example: Order document contains all line items
• Documents are self-describing hierarchical tree
structures
• Unlike Key-Value Stores, the value part of the field
can be queried
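The characteristics on this slide can be illustrated with plain Python dictionaries standing in for documents. The field names and the `find_orders_with_sku` helper are hypothetical; a real document database would expose its own query language for this.

```python
# Each order document stores its line items together with the order itself.
orders = [
    {"_id": 1, "customer": "Dan",
     "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
    {"_id": 2, "customer": "Ann",
     "items": [{"sku": "B7", "qty": 4}]},
]


def find_orders_with_sku(documents, sku):
    """Query on a field nested inside the value, not on the key."""
    return [doc["_id"] for doc in documents
            if any(item["sku"] == sku for item in doc["items"])]


orders_with_b7 = find_orders_with_sku(orders, "B7")
```

Reading one order needs no join, because the line items travel with the document; the nested `items` list is the self-describing tree structure the slide refers to.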
34. Document-oriented Databases
Good for:
• Semi-structured data
• Web pages
• Web traffic/E-Commerce
• Web analytics
• Log files
• User actions/behaviors
• Content Management Systems
• Full text
• Uncertain data
• Extending object-oriented approaches
• Event logging
• JSON/XML data
36. Multiple NoSQL Solutions Working Together
You could use
• Key-Value Store for Shopping Cart and
Session Data
• Document or Column Store for Consuming
Completed Orders
• RDBMS for inventory (small, not served real-
time), financials
• Graph Store for Customer Relationships for
Marketing
37. Column Stores
What are they?
• Data model:
– A big table, with column families
– Map-reduce for querying/processing
• Schema-lite
• No single point of failure
• Operational simplicity
• Closest NoSQL implementation to RDBMS
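The “big table, with column families” data model can be sketched as nested dictionaries: row key, then column family, then column name. The `put` and `get_family` helpers and the sensor data are hypothetical and ignore distribution, compression, and persistence.

```python
# row key -> column family -> column name -> value
table = {}


def put(row_key, family, column, value):
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value


def get_family(row_key, family):
    """Read one column family of one row; missing data yields an empty dict."""
    return dict(table.get(row_key, {}).get(family, {}))


# Rows need not share the same columns ("schema-lite").
put("sensor-17", "reading", "2024-01-01T00:00", 21.5)
put("sensor-17", "reading", "2024-01-01T00:05", 21.7)
put("sensor-17", "meta", "location", "roof")
put("sensor-18", "meta", "location", "lobby")

readings = get_family("sensor-17", "reading")
```

Using timestamps as column names inside one row is a common way to lay out time series data in this model, since new readings simply become new columns.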
38. Column Stores
Good for:
• Large amounts of data
• Data that needs compression
• Event logging
• Content Management Systems
• Data model supports semi-structured
data
• Naturally indexed (columns)
• Good at scaling out horizontally
• Time Series data
– Weather data
– Location data
– Sensor data
40. What to Look for in Multi-Model 1/2
• Excellent implementation of multiple
models
• Single copy of data
• Model change propagation
• Works in microservices world
• Submillisecond response time
41. What to Look for in Multi-Model 2/2
• Globally distributed multi-region
deployments
• Cross-model data processing language
and optimizer
• Edge-capable database
• JSON flattening without data explosion
• Universal indices
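The slide does not define “data explosion”. One plausible reading, assumed here, is the row multiplication that occurs when a JSON document with several independent arrays is flattened into a single relational table. The document and both functions below are hypothetical illustrations of that reading.

```python
from itertools import product

doc = {"id": 1, "tags": ["a", "b", "c"], "items": ["x", "y"]}


def flatten_cross_product(d):
    """Naive flattening: one wide table, so independent arrays multiply."""
    return [{"id": d["id"], "tag": t, "item": i}
            for t, i in product(d["tags"], d["items"])]


def flatten_per_array(d):
    """Alternative: one narrow table per array, so row counts only add up."""
    return {
        "tags": [{"id": d["id"], "tag": t} for t in d["tags"]],
        "items": [{"id": d["id"], "item": i} for i in d["items"]],
    }


exploded = flatten_cross_product(doc)
compact = flatten_per_array(doc)
exploded_rows = len(exploded)
compact_rows = sum(len(rows) for rows in compact.values())
```

With three tags and two items the wide table needs 3 × 2 rows while the per-array layout needs 3 + 2; the gap widens quickly as arrays grow or more arrays are added.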
42. Emerging Technologies
• Use of artificial
intelligence (AI)
• Integration with data
catalog platforms
• Robust user
experience
• Multi-cloud/native
application