Ten to fifteen years ago, we picked between a few major SQL databases. Now our apps have a variety of needs, and there is an overwhelming selection of database platforms. There are 5 main database families. In this talk we’ll survey all 5: Relational (SQL), Key/Value (NoSQL), Columnar (NoSQL), Document (NoSQL), and Graph (NoSQL). We’ll cover what scenarios each family handles well. In addition, we’ll discuss the most popular members of each family. So, the next time you need to pick a database, you’ll know which one - or ones - is the best fit.
2. About Me – Kristin Ferrier
18+ Years in IT
Principal Consultant at Ferrier Solutions
Full stack web developer with specialty and passion for data
Twitter: @SQLEnergy
Techlahoma Slack: @EnergyDev
GitHub: @EnergyDev
3. 10-15 Years Ago
Which database should I use?
There are four to pick from: Oracle, SQL Server, Postgres, MySQL.
5. Overview
Difference between SQL and NoSQL databases
ACID – Something you may want your database to have
5 Main Database Families
Great scenarios for each of these families
7. SQL Databases
SQL databases are also known as Relational Databases
They store data in tables (in math terms called relations)
Use SQL to interact with the data
ID Product Price
3 Gryffindor Scarf 9.99
4 Ravenclaw Scarf 9.99
5 Hufflepuff Scarf 9.99
6 Slytherin Scarf 9.99
8. NoSQL
Varying opinions on the definition
Purpose-built databases for specific data models with an emphasis on flexible schemas
and new demands of modern applications
The data models include document, graph, key-value, and columnar
Non-relational databases (i.e. any database that isn’t relational)
11. Atomicity
Transaction must execute completely or not at all
Such atomicity must be guaranteed in all situations, including power failures and
hardware crashes
Transfer $1,000 from Account A to Account B:
1. Account A: Withdraw $1,000
2. Account B: Deposit $1,000
This entire “unit of work” succeeds or fails. No intermediate state.
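A minimal sketch of the "unit of work" idea, using Python's built-in sqlite3 module as a stand-in for whichever relational database you use. The `accounts` table, the `transfer` helper, and the account named "Z" are assumptions made up for this illustration; they are not from the talk.

```python
import sqlite3

def transfer(conn, source, target, amount):
    """Withdraw and deposit as one unit of work: both steps apply, or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
            updated = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            ).rowcount
            if updated == 0:
                # The deposit had nowhere to go, so abandon the whole transfer.
                raise ValueError(f"unknown account: {target}")
        return True
    except ValueError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 3000), ("B", 2000)])
conn.commit()

ok = transfer(conn, "A", "B", 1000)      # both steps succeed
failed = transfer(conn, "A", "Z", 1000)  # deposit fails, withdrawal is rolled back
final_balances = balances(conn)
```

After the failed transfer, Account A has not lost the $1,000 that was withdrawn in the first step, because the rollback undid it.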
12. Consistency
Once a transaction has been committed, the data must conform to the
database schema.
Example: The database has a foreign key constraint requiring every deposit or withdrawal to reference a bank account that exists in the Accounts table. Thus, the system would only commit deposits or withdrawals corresponding to an account in the Accounts table.
Accounts table: 123, 456, 789
Deposit $250 into Account 123: Yes
Deposit $250 into Account 999: No. Account doesn’t exist
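The foreign key example can be sketched with Python's built-in sqlite3 module. The table and column names here (`accounts`, `deposits`, `account_id`) and the `deposit` helper are assumptions for illustration. Note that SQLite only enforces foreign keys once the `foreign_keys` pragma is switched on; other relational databases typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE deposits ("
    " account_id INTEGER NOT NULL REFERENCES accounts(account_id),"
    " amount INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO accounts VALUES (?)", [(123,), (456,), (789,)])
conn.commit()

def deposit(account_id, amount):
    """Try to record a deposit; the database rejects unknown accounts."""
    try:
        with conn:
            conn.execute("INSERT INTO deposits VALUES (?, ?)", (account_id, amount))
        return "Yes"
    except sqlite3.IntegrityError:
        return "No. Account doesn't exist"

result_123 = deposit(123, 250)
result_999 = deposit(999, 250)
committed = conn.execute("SELECT COUNT(*) FROM deposits").fetchone()[0]
```

Only the deposit into Account 123 is committed; the one for Account 999 never reaches the table.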
13. Isolation
Concurrent transactions must leave the database in the same state as if they had been executed sequentially
Let’s look at an example where April is transferring $1,000 from Account A to Account B, while
Taylor is withdrawing $500 from Account B. Below is the expected result.
                         Account A   Account B
Starting balance         $3,000      $2,000
April’s transfer         -$1,000     +$1,000
Balance after transfer   $2,000      $3,000
Taylor’s withdrawal                  -$500
Final balance                        $2,500
14. Isolation (No isolation, account loses money)
April: Transfer $1,000 from Account A to Account B
  Read Account A as $3,000
  Withdraw $1,000 from Account A
  Read Account B as $2,000
  Add $1,000 to Account B
  Account B has $3,000
Taylor (at the same time): Withdraw $500 from Account B
  Read Account B as $2,000
  Withdraw $500 from Account B
  Account B has $1,500
We LOST $1,000!
15. Isolation (With isolation, we don’t lose the money)
April: Transfer $1,000 from Account A to Account B
  Read Account A as $3,000
  Pull $1,000 from Account A
  Read Account B as $2,000
  Add $1,000 to Account B
  Account B has $3,000
Taylor (at the same time): Withdraw $500 from Account B
  Read Account B (sorry, please, wait)
  We’re waiting
  We’re waiting
  Account B has $3,000
  Withdraw $500 from Account B
  Account B has $2,500
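The two Account B scenarios can be replayed in a few lines of Python. This is only a sketch: the first function hard-codes the bad interleaving step by step, and the second uses a `threading.Lock` as a stand-in for whatever mechanism a real database uses to isolate transactions (locking is one option, not the only one). The function names are made up for this example.

```python
import threading

def run_without_isolation():
    """Replay the interleaving where both transactions read before either writes."""
    balance_b = 2000
    april_read = balance_b          # April reads Account B as $2,000
    taylor_read = balance_b         # Taylor also reads Account B as $2,000
    balance_b = april_read + 1000   # April writes $3,000
    balance_b = taylor_read - 500   # Taylor overwrites it with $1,500
    return balance_b

def run_with_isolation():
    """Each read-then-write runs under a lock, so the second one has to wait."""
    account_b = {"balance": 2000}
    lock = threading.Lock()

    def change(delta):
        with lock:  # "sorry, please, wait" happens here
            current = account_b["balance"]
            account_b["balance"] = current + delta

    threads = [threading.Thread(target=change, args=(delta,)) for delta in (1000, -500)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account_b["balance"]

lost_update_balance = run_without_isolation()
isolated_balance = run_with_isolation()
```

Whichever thread gets the lock first, the isolated version ends at $2,500, the same as running the two transactions one after the other.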
16. Durability
Once a transaction has completed execution, the updates to the database are
persisted in a way that is recoverable upon system failure.
Example: April’s $1,000 transfer, once completed, could be recovered in case of
hardware failure.
Account A -> $1,000 -> Account B
HW crash: We still know about the transfer
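A rough sketch of durability using Python's built-in sqlite3 module with a database file on disk. Closing the connection and opening a new one is only a stand-in for a crash and restart (it does not simulate power loss), and the `transfers` table is an assumption made up for this example.

```python
import os
import sqlite3
import tempfile

# A throwaway file on disk, so the data outlives the connection.
db_path = os.path.join(tempfile.mkdtemp(), "bank.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE transfers (source TEXT, target TEXT, amount INTEGER)")
conn.execute("INSERT INTO transfers VALUES ('A', 'B', 1000)")
conn.commit()  # once commit returns, the transfer has been written to the file
conn.close()   # stand-in for the process going away

# A brand-new connection still knows about the transfer.
reopened = sqlite3.connect(db_path)
recovered = reopened.execute(
    "SELECT source, target, amount FROM transfers"
).fetchall()
reopened.close()
```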
21. Relational – Data model
Data resides in tables containing rows and columns
ProductID (PK) ProductName CurrentPrice SubcategoryID (FK)
1 Red Headphones 49.99 1
2 Blue Headphones 59.99 1
3 Gryffindor Scarf 9.99 2
4 Ravenclaw Scarf 9.99 2
5 Hufflepuff Scarf 9.99 2
6 Slytherin Scarf 9.99 2
7 Developer Dragon T-shirt (M) 19.99 2
22. Relational – Data model
Each column has a data type, like INT or VARCHAR(50)
At any one time, every row in a table has the same fixed set of columns
Table structure must be defined prior to adding data
23. Relational – Data model
Data is accessed using SQL
SELECT ProductName, CurrentPrice
FROM Products
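To try the query above without installing a database server, here is a small sketch using Python's built-in sqlite3 module. The column types are assumptions, and SQLite is more lenient about types than the four databases named earlier, so treat this as an approximation of the "define the structure first" rule rather than a faithful copy of any one product. The "Mystery Item" row is a made-up example of data that does not fit the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The table structure is defined before any data is added.
conn.execute(
    "CREATE TABLE Products ("
    " ProductID INTEGER PRIMARY KEY,"
    " ProductName VARCHAR(50) NOT NULL,"
    " CurrentPrice NUMERIC NOT NULL,"
    " SubcategoryID INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [
        (1, "Red Headphones", 49.99, 1),
        (2, "Blue Headphones", 59.99, 1),
        (3, "Gryffindor Scarf", 9.99, 2),
    ],
)

rows = conn.execute(
    "SELECT ProductName, CurrentPrice FROM Products ORDER BY ProductID"
).fetchall()

# A row that does not match the table's columns is rejected.
try:
    conn.execute("INSERT INTO Products VALUES (?, ?)", (8, "Mystery Item"))
    rejected = False
except sqlite3.OperationalError:
    rejected = True
```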
24. Data lives in tables
Customers
Benefits
Orders
Categories
OrderDetails
Products
25. That can be tied together
Customers
Benefits
Orders
Categories
OrderDetails
Products
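Tying tables together usually means a JOIN on a shared key. The sketch below uses Python's built-in sqlite3 module; the columns and the sample customers and orders are assumptions invented for the illustration, since the slides only name the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (
        OrderID INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES Customers(CustomerID)
    );
    INSERT INTO Customers VALUES (1, 'April'), (2, 'Taylor');
    INSERT INTO Orders VALUES (100, 1), (101, 1), (102, 2);
    """
)

# Orders are tied to Customers through the shared CustomerID column.
orders_per_customer = conn.execute(
    """
    SELECT c.Name, COUNT(o.OrderID)
    FROM Customers AS c
    JOIN Orders AS o ON o.CustomerID = c.CustomerID
    GROUP BY c.Name
    ORDER BY c.Name
    """
).fetchall()
```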
28. Relational - Overview
Design schema with types ahead of time
Can query the data with lots of flexibility
ACID-compliant
29. Relational – Popular for
Online Transaction Processing (OLTP)
Order entry
Financial transactions
Retail sales
Data warehousing
Great for many things – Some people recommend using relational databases
unless expecting to reach 500 GB or more fairly quickly
30. Relational – Not great for
Large volume (Petabytes) of data
Rapidly ingesting large volumes of data with unknown structures
Scaling out / Horizontal scaling
31. Relational – What if I want more?
Postgres – JSON data type (9.2), JSONB data type (9.4)
SQL Server – JSON functionality (2016), R (2016), Python (2017)
MySQL – JSON data type (5.7.8)
Oracle – JSON functionality (12.1.0.2)
34. Document – Data model
Data resides in documents, such as JSON or XML documents
{
"EpisodeTitle": "And the Crown of King Arthur",
"Director": "Dean Devlin",
"FranchiseName": "The Librarians",
"Characters": [
{
"CharacterName": "Cassandra",
"Actor": "Lindy Booth"
},
{
"CharacterName": "Ezekiel",
"Actor": "John Harlan Kim"
},
{
"CharacterName": "Jake",
"Actor": "Christian Kane"
}
]
}
35. Document – Data model
Each document corresponds to a document key
Can store nested objects within a document
Typically store all data for a single object in a single document
Typically can query by document key or data within the document
DocumentKey: “7E8ABGED92A”
{
"EpisodeTitle": "And the Crown of King Arthur",
"Director": "Dean Devlin",
"FranchiseName": "The Librarians",
"Characters": [
{
"CharacterName": "Cassandra",
"Actor": "Lindy Booth"
},
{
"CharacterName": "Ezekiel",
"Actor": "John Harlan Kim"
},
{
"CharacterName": "Jake",
"Actor": "Christian Kane"
}
]
}
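To make the two access paths concrete (by document key, or by data inside the document), here is a toy in-memory sketch in plain Python. It is not how MongoDB or Couchbase work internally; the `documents` dictionary, both helper functions, and the second document with its key are assumptions invented for the illustration.

```python
# A toy "collection": document key -> nested document.
documents = {
    "7E8ABGED92A": {
        "EpisodeTitle": "And the Crown of King Arthur",
        "Director": "Dean Devlin",
        "FranchiseName": "The Librarians",
        "Characters": [
            {"CharacterName": "Cassandra", "Actor": "Lindy Booth"},
            {"CharacterName": "Jake", "Actor": "Christian Kane"},
        ],
    },
    # Hypothetical second document with a different shape: no schema is enforced.
    "HYPOTHETICAL-KEY-2": {"EpisodeTitle": "Untitled", "RuntimeMinutes": 42},
}

def get_by_key(key):
    """Look a document up directly by its document key."""
    return documents.get(key)

def find_keys_by_character(name):
    """Query by data nested inside the documents."""
    return [
        key
        for key, doc in documents.items()
        if any(c.get("CharacterName") == name for c in doc.get("Characters", []))
    ]

director = get_by_key("7E8ABGED92A")["Director"]
cassandra_docs = find_keys_by_character("Cassandra")
```

The second document has no "Characters" field at all, and the query simply skips it.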
36. Document
Schemaless, nested data documents
You don’t need to know the structure ahead of time
ACID-compliance depends upon the provider and might be complete, partial, or
none
37. Document – Data access
Data manipulation and querying depend upon the specific provider
MongoDB: JavaScript
Couchbase: N1QL (SQL for JSON) for querying
Other options for insert/update/delete
38. MongoDB – Example
Insert
db.quotes.insertMany([
{ Franchise: "Star Wars", Character: "Yoda", QuoteText: "Do. Or do not. There is no try."},
{ Franchise: "The Librarians", Character: "Cassandra", QuoteText: "Mathemagics. I like it.."}
]);
Retrieve
db.quotes.find();
42. Document – Real Examples
eBay – Stored metadata for every item for sale on eBay using MongoDB
Gap – Many supply chain systems run against MongoDB
Marriott – Entire reservation system runs on Couchbase
Viber – Mobile devices with always-available messaging using Couchbase
43. Document – Not great for
Large-scale batch analytics
Highly interconnected data sets
46. Key-Value – Data model
Data resides in Key-Value pairs where the Key must be unique
Key Value
Key1 25
ab928019281019210 “Carmen Electron”
ae0918384901-01102 <QuoteText>Mathemagics. I like it..</QuoteText>
user:jackson123:name “Jackson Peterson”
librarians:characters {"Characters": [
{ "CharacterName": "Cassandra", "Actor": "Lindy Booth" },
{ "CharacterName": "Ezekiel", "Actor": "John Harlan Kim" },
{ "CharacterName": "Jake", "Actor": "Christian Kane" }
]}
47. Key-Value – Data Model
Highly flexible in what you can store
Don’t need to know the data structure ahead of time
Can store various kinds of simple data like integers and strings and more complex
data like JSON or XML with very high levels of nesting.
When designing, strive for a system that will know the key when querying
48. Key-Value – Data Access
Typically you query the database by the key and not by the data corresponding to the key
In the simplest Key-Value systems the value is opaque to the database
Some providers offer additional capabilities, such as range queries or other extended functionality
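A toy sketch of this access pattern in plain Python. The `KeyValueStore` class and its method names are made up for the illustration and do not mirror any particular product's API; `scan_prefix` stands in for the kind of extended capability some providers add.

```python
class KeyValueStore:
    """Toy key-value store: values are opaque bytes the store never inspects."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value  # keys are unique, so a repeat put overwrites

    def get(self, key):
        return self._data.get(key)

    def scan_prefix(self, prefix):
        """Extended capability: list the keys that share a prefix."""
        return sorted(key for key in self._data if key.startswith(prefix))


store = KeyValueStore()
store.put("Key1", b"25")
store.put("user:jackson123:name", b"Jackson Peterson")
store.put("user:jackson123:theme", b"dark")  # hypothetical extra key

name = store.get("user:jackson123:name")
jackson_keys = store.scan_prefix("user:jackson123:")
```

Notice there is no way to ask "which key holds Jackson Peterson?" without already knowing the key, which is why key design matters so much.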
49. Key-Value – Often strong with
Horizontal scaling
Speed
Handling data with unknown structure
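One reason key-value stores scale out well is that the key alone can decide which server holds the data. Below is a minimal sketch of that idea using simple hash-modulo placement. The server names are hypothetical, and real systems typically use more elaborate schemes (consistent hashing, for example) so that adding a server does not move most keys.

```python
import hashlib

SERVERS = ["server-0", "server-1", "server-2"]  # hypothetical server names

def server_for(key):
    """Pick a server from the key alone, so any client finds the same one."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SERVERS[int.from_bytes(digest[:8], "big") % len(SERVERS)]

keys = ["Key1", "user:jackson123:name", "librarians:characters"]
placement = {key: server_for(key) for key in keys}
```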
50. Key-Value – Popular For
Messaging and chat
User and session data
bet365, Hibernum, Riot Games, and Rovio have used Riak KV to store session data for gamers and players
Virgin America and Flywheel have used Riak KV to store passenger information and session data
51. Key-Value – Not great for
Flexible querying
Querying capabilities are limited
Data warehousing with aggregations of numbers
Complex data query needs
54. Columnar – Data model
Data resides in a keyspace (table in HBase) that contains column families
Row key “primary”
  Column family “color”: “red”: “#FF0000”, “yellow”: “#FFFF00”, “blue”: “#0000FF”
  Column family “shape”: “rectangle”: “4”, “triangle”: “3”
Row key “secondary”
  Column family “color”: “purple”: “#A020F0”, “orange”: “#FFA500”, “green”: “#008000”
  Column family “shape”: “triangle”: 3
Row key “rainbow”
  Column family “color”: “red”: “#FF0000”, “orange”: “#FFA500”, “yellow”: “#FFFF00”, “green”: “#008000”, “blue”: “#0000FF”, “indigo”: “#4b0082”, “violet”: “#EE82EE”
  Column family “shape”: “icosagon”: “20”
55. Columnar – Data model
Terminology varies even between HBase and Cassandra. We’ll look at HBase.
A column family may contain multiple columns
For each row, the column family may have different columns
Columns don’t need to be known at time of column family creation
56. Columnar – Data model
For “primary” the columns are
color:red
color:yellow
color:blue
shape:rectangle
shape:triangle
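The row key, column family, column layout can be modeled as nested dictionaries. This is only a mental model in plain Python, not how HBase stores data; the `visualizations` variable and the helper functions are names made up for the sketch, and all values are kept as strings for simplicity.

```python
# row key -> column family -> column (qualifier) -> value
visualizations = {
    "primary": {
        "color": {"red": "#FF0000", "yellow": "#FFFF00", "blue": "#0000FF"},
        "shape": {"rectangle": "4", "triangle": "3"},
    },
    "secondary": {
        "color": {"purple": "#A020F0", "orange": "#FFA500", "green": "#008000"},
        "shape": {"triangle": "3"},
    },
}

def columns_for(row_key):
    """List a row's columns in family:qualifier form."""
    row = visualizations[row_key]
    return [
        f"{family}:{qualifier}"
        for family, columns in row.items()
        for qualifier in columns
    ]

def get_cell(row_key, column):
    """Fetch one value by row key and family:qualifier, or None if absent."""
    family, qualifier = column.split(":", 1)
    return visualizations[row_key][family].get(qualifier)

primary_columns = columns_for("primary")
missing_cell = get_cell("secondary", "shape:rectangle")
```

Each row carries its own set of columns inside a family, so "secondary" simply has no shape:rectangle cell.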
57. Columnar – Data model
HBase shell command to create an empty version of this “table”
hbase> create 'visualizations', 'color', 'shape'
58. Columnar – Strong at
Fast retrieval of columns of data
Scaling “out” / horizontal scaling
59. Columnar – Popular For
Analytics, like data warehousing, on large amounts of data
Internet of Things (IoT)
60. Columnar – Real Examples
Twitter - people search capability
Facebook Messenger (previously)
61. Columnar – Not great for
Online Transaction Processing (OLTP)
  Inserting, updating, or deleting an entire row is relatively slow
Systems with small amounts of data
  Low gigabytes or smaller
Systems where you don’t understand your query needs upfront, as the design tends to be focused on meeting those query needs
64. Graph – Data model
Data resides in Graphs containing Nodes and Relationships
Nodes: Jesse, Jeff, Jamie
Relationship: Friends With
65. Graph – Data model
Nodes (vertices) – Entities like people, accounts, products
Nodes can contain multiple pieces of data for an entity
Relationships (edges) – Represent relationships between nodes
Relationships can contain additional data about a relationship
Node: Jamie (Joined: 2016, Province: OK)
Relationship: Friends With (As of: 2017)
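A property graph can be pictured as nodes with properties plus relationships with their own properties. The sketch below is plain Python, not Neo4j. The slides do not say which person Jamie's "Friends With" relationship points to, so linking Jamie to Jesse is an assumption, as are the `nodes`, `relationships`, and `neighbors` names.

```python
# Nodes: name -> properties
nodes = {
    "Jamie": {"Joined": 2016, "Province": "OK"},
    "Jesse": {},
    "Jeff": {},
}

# Relationships: (start node, type, end node, properties)
relationships = [
    ("Jamie", "FRIENDS_WITH", "Jesse", {"As of": 2017}),  # assumed pairing
]

def neighbors(name, rel_type):
    """Nodes connected to `name` by `rel_type`, ignoring direction."""
    found = set()
    for start, kind, end, _properties in relationships:
        if kind != rel_type:
            continue
        if start == name:
            found.add(end)
        elif end == name:
            found.add(start)
    return found

jamie_friends = neighbors("Jamie", "FRIENDS_WITH")
jeff_friends = neighbors("Jeff", "FRIENDS_WITH")
```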
68. Neo4j – Answering questions with Cypher
Who are the friends of Supergirl?
MATCH (n)-[:FRIENDS_WITH]-(m) WHERE n.name="Supergirl" RETURN n, m;
Who are the enemies of Supergirl?
MATCH (n)-[:ENEMY_OF]-(m) WHERE n.name="Supergirl" RETURN n, m;
70. Kinds of Questions?
Who are the friends of friends of Supergirl?
Who are within 2 degrees of Supergirl?
Who are friends of people who worked with Supergirl?
Who are friends of enemies of Supergirl?
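Questions like "friends of friends" and "within 2 degrees" are graph traversals. A graph database does this for you; the plain-Python sketch below only shows the underlying idea with a breadth-first search. Apart from Supergirl, the people ("Friend A" through "Friend D") and their friendships are hypothetical data made up for the example.

```python
from collections import deque

# Hypothetical friendships, treated as undirected.
FRIENDSHIPS = [
    ("Supergirl", "Friend A"),
    ("Supergirl", "Friend B"),
    ("Friend A", "Friend C"),
    ("Friend C", "Friend D"),
]

def distances_from(start, max_degrees):
    """Breadth-first search: map each person within max_degrees to their distance."""
    adjacency = {}
    for a, b in FRIENDSHIPS:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_degrees:
            continue  # do not look past the requested number of degrees
        for friend in adjacency.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)

    del distance[start]
    return distance

within_two = distances_from("Supergirl", 2)
friends_of_friends = {person for person, hops in within_two.items() if hops == 2}
```

"Friend D" is three hops away, so it is left out of the 2-degree result.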
71. Graph – Popular For
Applications that work with highly connected datasets
Social networking
Recommendation engines
Fraud detection
Knowledge graphs
Asset Management
72. Graph – Real Examples
Walmart – Online real-time recommendations using Neo4j
Monsanto – Analysis of plant genetics using Neo4j
73. Graph – Not great for
Large scale analytics
Example: Not great at aggregating numbers
Online Transaction Processing (OLTP)
74. CAP Theorem – Important Too
In Distributed DB Systems
You can have 2 of these: Consistency, Availability, Partition Tolerance
Possible pairs: CA, CP, AP
75. Where to go from here?
https://www.db-fiddle.com/ – Playground for multiple SQL databases
http://www.sqlfiddle.com – Playground for multiple SQL databases
https://neo4j.com/sandbox-v2/ – Neo4j sandbox
https://docs.mongodb.com/manual/tutorial/query-documents/ – MongoDB documentation with an online MongoDB Web Shell
https://university.mongodb.com/courses/catalog – MongoDB Courses
https://www.couchbase.com/get-started – Couchbase Get Started, including interactive tutorial for N1QL
Seven Databases in Seven Weeks – Book by Eric Redmond and Jim Wilson
76. Q&A and Thank You
Q&A
Catch up with me later
Twitter @SQLEnergy
Techlahoma Slack @EnergyDev