When evaluating developers, I routinely meet very talented people with a decade or more of database experience who nonetheless can't design even the simplest of schemas. This presentation is based on my popular blog post of the same name: http://blogs.perl.org/users/ovid/2013/07/how-to-fake-database-design.html
1. How to Fake a Database Design
How do I spell “normalization”?
OSCON 2014
Curtis "Ovid" Poe
http://allaroundtheworld.fr/
Copyright 2014, http://www.allaroundtheworld.fr/
March 18, 2022
2. Good Database Schemas
• Generally normalized
• Denormalized only as necessary
• No duplicate data
3. Typical Developer Schemas
• A steaming pile of ones and zeros
• … with a “family friendly” background
Source: http://commons.wikimedia.org/wiki/File:Spaghetti-prepared.jpg
4. Database Normalization
• Remove redundancy
• Create logical relations
• Decompose data into atomic elements
5. Only Covering 3NF
1. Remove repeating groups of data
2. Remove partial key dependencies
3. Remove data unrelated to key
6. How to Feel Stupid
“It is shown that if a relation schema is in third normal form and every key is simple, then it is in projection-join normal form (sometimes called fifth normal form), the ultimate normal form with respect to projections and joins.”
C. J. Date and Ronald Fagin, “Simple Conditions for Guaranteeing Higher Normal Forms in Relational Databases”
http://commons.wikimedia.org/wiki/File:%22I_should_have_gone_to_the_pro_station%22_-_NARA_-_514564.tif
7. ‘Nuff of that – Let’s Get Started
I’m going to discuss “how”, not “why”,
because I only have 50 minutes.
8. Faking a Database Design
• Forget everything you know about Excel
• Focus on nouns (sort of)
• Duplicate data is a design flaw
9. Real-World Problem
• Client wanted a rewrite of recipes site
• They sent us their Access (!) database
• Main objects:
– customers
– recipes
– orders
10. Our “DBA” Said This Was OK
11. Our “DBA” also lost his job shortly thereafter
12. Back to the plot …
• Customers
• Orders
• Recipes
22. Searching
SELECT recipe_id, name FROM recipes
WHERE
ingredient1 IN ( 'fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine', 'fetuchinney',
'fetuchinni', 'fetucine', 'fetucini', 'fetucinni')
OR
ingredient2 IN ( 'fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine', 'fetuchinney',
'fetuchinni', 'fetucine', 'fetucini', 'fetucinni')
OR
ingredient3 IN ( 'fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine', 'fetuchinney',
'fetuchinni', 'fetucine', 'fetucini', 'fetucinni')
OR
ingredient4 IN ( 'fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine', 'fetuchinney',
'fetuchinni', 'fetucine', 'fetucini', 'fetucinni')
OR
ingredient5 IN ( 'fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine', 'fetuchinney',
'fetuchinni', 'fetucine', 'fetucini', 'fetucinni')
OR
ingredient6 IN ( 'fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine', 'fetuchinney',
'fetuchinni', 'fetucine', 'fetucini', 'fetucinni')
OR
ingredient7 IN ( 'fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine', 'fetuchinney',
'fetuchinni', 'fetucine', 'fetucini', 'fetucinni')
OR
ingredient8 IN ( 'fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine', 'fetuchinney',
'fetuchinni', 'fetucine', 'fetucini', 'fetucinni');
23. It’s “fettuccine”, in case you were wondering
24. Searching
SELECT recipe_id, name FROM recipes
WHERE ingredient1 = 'fettuccine'
OR ingredient2 = 'fettuccine'
OR ingredient3 = 'fettuccine'
OR ingredient4 = 'fettuccine'
OR ingredient5 = 'fettuccine'
OR ingredient6 = 'fettuccine'
OR ingredient7 = 'fettuccine'
OR ingredient8 = 'fettuccine';
26. Rule #3
1. Nouns == tables
2. Another table’s ID must have a FK constraint
3. Lists of things get their own table
28. Searching
SELECT r.recipe_id, r.name
FROM recipes r
JOIN recipe_ingredients ri ON ri.recipe_id = r.recipe_id
JOIN ingredients i ON i.ingredient_id = ri.ingredient_id
WHERE i.name = 'fettuccine';
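To try this search end to end, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the sample rows, and the UNIQUE constraints are assumptions made for illustration; the deck only names the three tables. Selected columns are qualified with their table alias because both recipes and ingredients have a name column.

```python
import sqlite3

# Hypothetical minimal schema; types and sample data are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recipes (
    recipe_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE ingredients (
    ingredient_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL UNIQUE  -- one row (and one spelling) per ingredient
);
CREATE TABLE recipe_ingredients (
    recipe_id     INTEGER NOT NULL REFERENCES recipes (recipe_id),
    ingredient_id INTEGER NOT NULL REFERENCES ingredients (ingredient_id),
    UNIQUE (recipe_id, ingredient_id)
);
INSERT INTO recipes VALUES (1, 'Fettuccine Alfredo'), (2, 'Chili');
INSERT INTO ingredients VALUES (1, 'fettuccine'), (2, 'cream'), (3, 'beans');
INSERT INTO recipe_ingredients VALUES (1, 1), (1, 2), (2, 3);
""")

# The same three-table join as on the slide.
fettuccine_recipes = conn.execute("""
    SELECT r.recipe_id, r.name
      FROM recipes r
      JOIN recipe_ingredients ri ON ri.recipe_id    = r.recipe_id
      JOIN ingredients i         ON i.ingredient_id = ri.ingredient_id
     WHERE i.name = 'fettuccine'
""").fetchall()
print(fettuccine_recipes)
```

Because each ingredient is stored once, a misspelling is fixed by updating a single row in ingredients rather than hunting through eight columns.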
31. Rule #4
1. Nouns == tables
2. Another table’s ID must have a FK constraint
3. Lists of things get their own table
4. Many-to-many == lookup table (with FKs)
32. So How Do We Order Recipes?
34. How Many of Which Ingredient?
35. Our simple “customers”, “orders”, and “recipes” database has grown to seven tables.
And it will keep growing.
36. So Far
• Every noun has its own table (*)
• Lookup tables join related tables
• And generally have some kind of unique constraint
• Other tables’ IDs have foreign key constraints
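A small sketch of what those constraints buy you, again using Python's sqlite3. The table layout is an assumption modeled on the orders_recipes lookup table used elsewhere in the deck, and try_insert is a hypothetical helper. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; other databases enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FKs are off by default
conn.executescript("""
CREATE TABLE orders  (order_id  INTEGER PRIMARY KEY);
CREATE TABLE recipes (recipe_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders_recipes (
    order_id  INTEGER NOT NULL REFERENCES orders (order_id),
    recipe_id INTEGER NOT NULL REFERENCES recipes (recipe_id),
    UNIQUE (order_id, recipe_id)  -- assumption: a recipe appears once per order
);
INSERT INTO orders  VALUES (1);
INSERT INTO recipes VALUES (10, 'Chili');
INSERT INTO orders_recipes VALUES (1, 10);
""")

def try_insert(order_id, recipe_id):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders_recipes VALUES (?, ?)",
                (order_id, recipe_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

dangling_ok = try_insert(1, 999)   # recipe 999 does not exist: FK violation
duplicate_ok = try_insert(1, 10)   # pair already present: UNIQUE violation
print(dangling_ok, duplicate_ok)
```

Both inserts are rejected by the database itself, so no application bug can leave an order pointing at a recipe that does not exist.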
37. Database Tips
• We’ve covered the main rules
• They only cover structure
• Now to dive deeper
38. Equality ≠ Identity
• No duplication == not duplicating identity
• Are identical twins the same person?
• Are two guys named “John” the same guy?
• This is important and easy to get wrong
• For example …
39. How do you get the total of an order?
• Assume each recipe has a price
• Store total in the order? (hint: no)
• Store price on the recipe? (hint: yes)
• Is that enough?
41. Calculating the Order Total?
SELECT o.order_id, sum(r.price)
FROM orders o
JOIN orders_recipes orr
ON orr.order_id = o.order_id
JOIN recipes r
ON r.recipe_id = orr.recipe_id
GROUP BY o.order_id
42. What if the price changes?
44. Calculating the Order Total
SELECT o.order_id, sum(orr.price)
FROM orders o
JOIN orders_recipes orr
ON orr.order_id = o.order_id
GROUP BY o.order_id
45. Equality is not Identity
• Order item price isn’t item price
• What if the item price changes?
• What if you give a discount on the order item?
• A subtle, but common bug
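The bug is easy to reproduce. The sketch below (Python's sqlite3; integer prices in cents and the sample data are assumptions) copies each recipe's price onto the order line when the order is placed, then changes one recipe price afterwards and computes the total both ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumption: prices are stored as integer cents.
CREATE TABLE recipes (recipe_id INTEGER PRIMARY KEY, price INTEGER NOT NULL);
CREATE TABLE orders  (order_id  INTEGER PRIMARY KEY);
CREATE TABLE orders_recipes (
    order_id  INTEGER NOT NULL REFERENCES orders (order_id),
    recipe_id INTEGER NOT NULL REFERENCES recipes (recipe_id),
    price     INTEGER NOT NULL  -- the price charged on this order
);
INSERT INTO recipes VALUES (1, 500), (2, 750);
INSERT INTO orders  VALUES (1);

-- Placing the order copies the current recipe prices onto the order lines.
INSERT INTO orders_recipes (order_id, recipe_id, price)
SELECT 1, recipe_id, price FROM recipes;

-- Later, one recipe gets more expensive.
UPDATE recipes SET price = 900 WHERE recipe_id = 1;
""")

# Total from the order lines: what the customer was actually charged.
total_from_order_lines = conn.execute("""
    SELECT sum(orr.price)
      FROM orders_recipes orr
     WHERE orr.order_id = 1
""").fetchone()[0]

# Total recomputed from today's recipe prices: silently rewrites history.
total_from_recipes = conn.execute("""
    SELECT sum(r.price)
      FROM orders_recipes orr
      JOIN recipes r ON r.recipe_id = orr.recipe_id
     WHERE orr.order_id = 1
""").fetchone()[0]

print(total_from_order_lines, total_from_recipes)
```

The two prices started out equal, but they were never the same fact: one is what the recipe costs now, the other is what this customer paid.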
46. Rule #5
1. Nouns == tables
2. Another table’s ID must have a FK constraint
3. Lists of things get their own table
4. Many-to-many == lookup table (with FKs)
5. Watch for equal values that aren’t identical
47. Naming
• Names are important
• Identical columns should have identical names
• Names should hint at use
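One practical payoff of giving identical columns identical names, sketched with Python's sqlite3: standard SQL's JOIN ... USING can only be written when the column has the same name on both sides, so a join between unrelated IDs cannot be expressed by accident. The schema and sample data here are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders  (order_id  INTEGER PRIMARY KEY);
CREATE TABLE recipes (recipe_id INTEGER PRIMARY KEY, price INTEGER NOT NULL);
CREATE TABLE orders_recipes (
    order_id  INTEGER NOT NULL REFERENCES orders (order_id),
    recipe_id INTEGER NOT NULL REFERENCES recipes (recipe_id)
);
INSERT INTO orders  VALUES (1), (2);
INSERT INTO recipes VALUES (1, 500), (2, 750);
INSERT INTO orders_recipes VALUES (1, 1), (1, 2), (2, 2);
""")

# USING (order_id) only works because the column is called order_id
# in both orders and orders_recipes; likewise for recipe_id.
order_totals = conn.execute("""
    SELECT order_id, sum(price)
      FROM orders
      JOIN orders_recipes USING (order_id)
      JOIN recipes        USING (recipe_id)
     GROUP BY order_id
     ORDER BY order_id
""").fetchall()
print(order_totals)
```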
48. Bad Naming
SELECT name, 'too cold'
FROM areas
WHERE temperature < 32;
50. ID Names
SELECT o.id, sum(r.price)
FROM orders o
JOIN orders_recipes orr
ON orr.order_id = o.id
JOIN recipes r
ON r.id = o.id
GROUP BY o.id
51. ID Names
SELECT o.id, sum(r.price)
FROM orders o
JOIN orders_recipes orr
ON orr.order_id = o.id
JOIN recipes r
ON r.id = o.id -- bug: joins a recipe ID to an order ID
GROUP BY o.id
52. Conceptually Similar to …
SELECT name
FROM customer
WHERE id > weight;
53. ID Names
SELECT thread.*
FROM email thread
JOIN email selected ON selected.id = thread.id
JOIN character recipient ON recipient.id = thread.recipient_id
JOIN station_area sa ON sa.id = recipient.id
JOIN station st ON st.id = sa.id
JOIN star origin ON origin.id = thread.id
JOIN star destination ON destination.id = st.id
LEFT JOIN route
ON ( route.from_id = origin.id AND route.to_id = destination.id )
WHERE selected.id = ?
AND ( thread.sender_id = ?
OR ( thread.recipient_id = ?
AND ( origin.id = destination.id
OR ( route.distance IS NOT NULL
AND
now() >= thread.datesent
+ ( route.distance * interval '30 seconds' )
))))
ORDER BY datesent ASC, thread.parent_id ASC NULLS FIRST
54. Rule #6
1. Nouns == tables
2. Another table’s ID must have a FK constraint
3. Lists of things get their own table
4. Many-to-many == lookup table (with FKs)
5. Watch for equal values that aren’t identical
6. Name columns as descriptively as possible
55. Summary
• Nouns == tables (*)
• FK constraints
• Proper naming is important
• Your DBAs will thank you
• Your apps will be more robust
56. ?
http://www.slideshare.net/ovid/
57. Bonus Slides!
Super-duper important stuff I wasn’t sure I had time to cover because it’s going to make your head hurt.
58. Avoid NULL Values
• Every column should have a type
• NULLs, by definition, are unknown values
• Thus, their type is unknown
• But … every column should have a type?
59. Our employees Table
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
name CHARACTER VARYING(255) NOT NULL,
salary MONEY NULL
);
60. Giving Bonuses
• $1,000 bonus to all employees
• … if they make less than $40,000/year
61. Get Employees For Bonus
SELECT employee_id, name
FROM employees
WHERE salary < 40000;
62. Bad SQL
• Won’t return anyone with a NULL salary
• Why is the salary NULL?
– What if it’s confidential?
– What if they’re a contractor and in that table?
– What if they’re an unpaid slave intern?
– What if it’s unknown when the data was entered?
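A quick sketch of the problem using Python's sqlite3. The three employees are invented sample data, and salary is stored as a plain integer because SQLite has no MONEY type. The employee with a NULL salary appears in neither the "less than 40000" result nor its negation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    salary      INTEGER NULL  -- assumption: whole dollars instead of MONEY
);
INSERT INTO employees (name, salary) VALUES
    ('Alice', 35000),
    ('Bob',   55000),
    ('Carol', NULL);
""")

def names(where_clause):
    """Return employee names matching a fixed, trusted WHERE clause."""
    sql = f"SELECT name FROM employees WHERE {where_clause} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

bonus_names = names("salary < 40000")             # gets the bonus
not_bonus_names = names("NOT (salary < 40000)")   # explicitly does not
unknown_names = names("salary IS NULL")           # falls through both
print(bonus_names, not_bonus_names, unknown_names)
```

Carol is only found by asking for NULL explicitly, and even then the query cannot say which of the reasons listed above applies to her.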
63. NULLs tell you nothing
suppliers table
supplier_id  city
s1           'London'
parts table
part_id  city
p1       NULL
Example via “Database In Depth” by C.J. Date
64. NULLs tell you nothing
parts table
part_id  city
p1       NULL
SELECT part_id
FROM parts; -- returns p1
SELECT part_id
FROM parts
WHERE city = city; -- returns no rows: NULL = NULL is not true
Example via “Database In Depth” by C.J. Date
65. NULLs tell you nothing
suppliers table
supplier_id  city
s1           'London'
parts table
part_id  city
p1       NULL
Example via “Database In Depth” by C.J. Date
SELECT s.supplier_id, p.part_id
FROM suppliers s, parts p
WHERE p.city <> s.city -- can’t compare NULL
OR p.city <> 'Paris'; -- can’t compare NULL
66. NULLs tell you lies
Example via “Database In Depth” by C.J. Date
SELECT s.supplier_id, p.part_id
FROM suppliers s, parts p
WHERE p.city <> s.city -- can’t compare NULL
OR p.city <> 'Paris'; -- can’t compare NULL
• We get no rows because we can’t compare a NULL city
• The unknown city is Paris or it isn't.
• If it’s Paris, the first condition is true
• If it’s not Paris, the second condition is true
• Thus, the WHERE clause must be true, but it’s not
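The suppliers and parts example can be checked directly. This is a sketch using Python's sqlite3, with TEXT columns as an assumption; the rows and queries are the ones from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (supplier_id TEXT PRIMARY KEY, city TEXT);
CREATE TABLE parts     (part_id     TEXT PRIMARY KEY, city TEXT);
INSERT INTO suppliers VALUES ('s1', 'London');
INSERT INTO parts     VALUES ('p1', NULL);
""")

# Without a WHERE clause the part is there.
all_parts = conn.execute("SELECT part_id FROM parts").fetchall()

# "city = city" looks like a tautology, but NULL = NULL is unknown, not true.
same_city = conn.execute(
    "SELECT part_id FROM parts WHERE city = city"
).fetchall()

# Logically one of the two conditions must hold, yet both evaluate to unknown.
pairs = conn.execute("""
    SELECT s.supplier_id, p.part_id
      FROM suppliers s, parts p
     WHERE p.city <> s.city
        OR p.city <> 'Paris'
""").fetchall()

print(all_parts, same_city, pairs)
```

Only the first query returns a row; the other two come back empty even though, in the real world, each condition must be true for p1.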
67. Rule #7
1. Nouns == tables
2. Another table’s ID must have an FK constraint
3. Lists of things get their own table
4. Many-to-many == lookup table (with FKs)
5. Watch for equal values that aren’t identical
6. Name columns as descriptively as possible
7. Avoid NULL columns like the plague
Editor's Notes
Duplicate data means identity, not equality!
Any guesses as to what was in ingredient8?
Note that ‘address’ and ‘directions’ aren’t separate tables. Great point for discussion. (Suprêmes de volaille aux champignons === chicken parisienne)
FKs prevent crap data.
How many of you have worked on databases with crap data?
Well-designed databases can make it hard to add crap data.
Even if you *knew* you would never need more than 8 ingredients,
what do you do when you find out that macaroni, barbecue, or fettucinne
are routinely misspelled?