This is an introduction to relational and non-relational databases and how their performance affects scaling a web application.
This is a recording of a guest lecture I gave at the University of Texas School of Information.
In this talk I address the technologies and tools Gowalla (gowalla.com) uses, including Memcache, Redis, and Cassandra.
Find more on my blog:
http://schneems.com
10. The Web is Data
• Username => String
• Birthday => Int/Int/Int
• Blog Post => Text
• Image => Binary file/blob
Data needs to be stored to be useful
12. Gowalla Database
• PostgreSQL
  • Relational (RDBMS)
  • Open Source
  • Competitor to MySQL
  • ACID compliant
• Running on a Dedicated Managed Server
13. Need for Speed
• Throughput:
  • The number of operations per minute that can be performed
• Pure Speed:
  • How long an individual operation takes
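The two measures above can be taken from the same timing run. The following is a minimal sketch using Ruby's standard `Benchmark` module; the `measure` helper and the `sleep` call standing in for a database query are assumptions made for illustration, not part of the talk.

```ruby
require "benchmark"

# Hypothetical helper: runs the block `count` times and reports both metrics.
def measure(count)
  elapsed = Benchmark.realtime { count.times { yield } }
  {
    seconds_per_operation: elapsed / count,       # pure speed of one operation
    operations_per_minute: count / elapsed * 60   # throughput
  }
end

stats = measure(200) { sleep(0.001) } # sleep stands in for one database query
puts format("pure speed: %.4f seconds per operation", stats[:seconds_per_operation])
puts format("throughput: %.0f operations per minute", stats[:operations_per_minute])
```

With a single client running operations one after another, throughput is simply the inverse of the time per operation. The two only diverge once operations run in parallel, which is what the scaling techniques that follow are about.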
14. Potential Problems
• Hardware
  • Slow network
  • Slow hard drive
  • Insufficient CPU
  • Insufficient RAM
• Software
  • Too many reads
  • Too many writes
15. Scaling Up versus Out
• Scale Up:
  • More CPU, bigger HD, more RAM, etc.
• Scale Out:
  • More machines
  • More machines
  • More machines
  • ...
16. Scale Up
• Bigger, faster machine
  • More RAM
  • More CPU
  • Bigger Ethernet bus
  • ...
• Moore's Law
• Diminishing returns
17. Scale Out
• Forget Moore's Law...
• Add more nodes
  • Master/Slave Database
  • Sharding
18. Master/Slave
[Diagram: writes go to the Master DB, which copies data to four Slave DBs; reads come from the Slave DBs]
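The routing in the diagram above can be sketched in a few lines of Ruby. `Node` and `ReplicatedDatabase` are hypothetical in-memory stand-ins, and copying to the slaves happens immediately here, whereas real replication usually lags behind the master.

```ruby
# Hypothetical in-memory stand-in for one database server.
class Node
  attr_reader :name, :rows

  def initialize(name)
    @name = name
    @rows = {}
  end
end

# Sends every write to the master and spreads reads across the slaves.
class ReplicatedDatabase
  def initialize(master, slaves)
    @master = master
    @slaves = slaves
    @next_slave = 0
  end

  def write(key, value)
    @master.rows[key] = value
    # Assumption: copy right away to keep the sketch simple.
    @slaves.each { |slave| slave.rows[key] = value }
  end

  def read(key)
    slave = @slaves[@next_slave % @slaves.size] # round-robin over the slaves
    @next_slave += 1
    [slave.name, slave.rows[key]]
  end
end

db = ReplicatedDatabase.new(Node.new("master"), [Node.new("slave-1"), Node.new("slave-2")])
db.write("user:3", "schneems")
p db.read("user:3") # served by slave-1
p db.read("user:3") # served by slave-2
```

Adding slaves adds read capacity, but every write still passes through the single master.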
19. Master & Slave +/-
• Pro
  • Increased read speed
  • Takes read load off of master
  • Allows us to join across all tables
• Con
  • Doesn't buy increased write throughput
  • Single point of failure in master node
20. Sharding
[Diagram: writes and reads are routed to one of four shards: Users in USA, Users in Europe, Users in Asia, Users in Africa]
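A minimal sketch of the region-based sharding in the diagram above, assuming each shard is just an in-memory Hash. The `ShardedUsers` class and the sample users are hypothetical.

```ruby
# Each region gets its own independent store (a Hash stands in for a database).
class ShardedUsers
  REGIONS = %w[USA Europe Asia Africa].freeze

  def initialize
    @shards = REGIONS.map { |region| [region, {}] }.to_h
  end

  # Writes touch only the shard that owns the user's region.
  def save(user)
    shard_for(user[:region])[user[:id]] = user
  end

  # Reads need the region to know which shard to ask.
  def find(region, id)
    shard_for(region)[id]
  end

  # A query across regions has to visit every shard and merge in application code.
  def all
    @shards.values.flat_map(&:values)
  end

  private

  def shard_for(region)
    @shards.fetch(region)
  end
end

users = ShardedUsers.new
users.save({ id: 1, name: "schneems", region: "USA" })
users.save({ id: 2, name: "amelie", region: "Europe" })

p users.find("USA", 1)
p users.all.map { |user| user[:name] }
```

The `all` method hints at the cost: anything that spans shards is stitched together by the application rather than by the database.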
21. Sharding +/-
• Pro
  • Increased write & read throughput
  • No single point of failure
  • Individual features can fail
• Con
  • Cannot join queries between shards
22. What is a Database?
• Relational Database Management System (RDBMS)
• Stores data using a schema
• A.C.I.D. compliant
  • Atomic
  • Consistent
  • Isolated
  • Durable
23. RDBMS
• Relational
  • Matches data on common characteristics in data
  • Enables “Join” & “Union” queries
  • Makes data modular
24. Relational +/-
• Pros
  • Data is modular
  • Highly flexible data layout
• Cons
  • Getting desired data can be tricky
  • Over-modularization leads to many join queries
  • Trades off performance for searchability
25. Schema Storage
• Blueprint for data storage
• Break data into tables/columns/rows
• Give data types to your data
  • Integer
  • String
  • Text
  • Boolean
  • ...
26. Schema +/-
• Pros
  • Regularizes our data
  • Helps keep data consistent
  • Converts to programming “types” easily
• Cons
  • Must separately manage schema
  • Adding columns & indexes to existing large tables can be painful & slow
27. ACID
• Properties that guarantee database transactions are processed reliably
  • Atomic
  • Consistent
  • Isolated
  • Durable
28. ACID
• Atomic
  • Any database transaction is all or nothing.
  • If one part of the transaction fails, it all fails.
“An Incomplete Transaction Cannot Exist”
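The all-or-nothing rule can be illustrated with a toy example. This sketch assumes a hypothetical `Accounts` class that keeps balances in a Hash and applies changes to a copy first; a real database achieves atomicity very differently, for example with a write-ahead log.

```ruby
# Hypothetical in-memory "table" of account balances.
class Accounts
  def initialize(balances)
    @balances = balances
  end

  def [](name)
    @balances[name]
  end

  # Runs the block against a copy. The copy replaces the real data only if
  # the block finishes without raising.
  def transaction
    working_copy = @balances.dup
    yield working_copy
    @balances = working_copy
    true
  rescue StandardError
    false # nothing was written: the original balances are untouched
  end
end

accounts = Accounts.new({ "alice" => 100, "bob" => 0 })

ok = accounts.transaction do |balances|
  balances["alice"] -= 150
  raise "insufficient funds" if balances["alice"] < 0
  balances["bob"] += 150
end

p ok                # false
p accounts["alice"] # 100
```

The debit from "alice" had already happened inside the block when the failure occurred, yet no trace of it survives because the transaction as a whole was discarded.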
29. ACID
• Consistent
  • Any transaction will take the database from one consistent state to another
“Only Consistent data is allowed to be written”
30. ACID
• Isolated
  • No transaction should be able to interfere with another transaction
“The same field cannot be updated by two sources at the exact same time”
a = 0
a += 1  }
a += 2  }  a = ??
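The question above has a surprising answer when the two updates run at the same time without isolation: each one reads `a`, adds to it, and writes it back, so one update can overwrite the other and leave `a` at 1 or 2 instead of 3. The sketch below uses Ruby threads and a `Mutex` as a stand-in for the isolation a database provides; it illustrates the idea and is not how a database implements it.

```ruby
a = 0
lock = Mutex.new

threads = [1, 2].map do |amount|
  Thread.new do
    # Only one thread at a time may run the read-modify-write inside the lock.
    lock.synchronize { a += amount }
  end
end
threads.each(&:join)

puts a # 3
```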
32. What is a Database?
• RDBMS
  • Relational
  • Flexible
  • Has a schema
  • Most likely ACID compliant
  • Typically fast under low load or when optimized
33. What is SQL?
• Structured Query Language
• The language databases speak
• Based on relational algebra
• Insert
• Query
• Update
• Delete
“SELECT Company, Country FROM Customers WHERE Country = 'USA'”
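To make the query above concrete, here is roughly what it asks for, written as plain Ruby over an in-memory array. The `customers` rows are made-up sample data, and a real database would answer the query itself rather than hand every row to the application.

```ruby
# Hypothetical sample rows standing in for the Customers table.
customers = [
  { company: "Acme", country: "USA", contact: "a@example.com" },
  { company: "Globex", country: "Germany", contact: "g@example.com" },
  { company: "Initech", country: "USA", contact: "i@example.com" }
]

rows = customers
  .select { |customer| customer[:country] == "USA" }     # WHERE Country = 'USA'
  .map { |customer| customer.slice(:company, :country) } # SELECT Company, Country

rows.each { |row| puts "#{row[:company]}, #{row[:country]}" }
```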
34. Why people <3 SQL
• Relational algebra is powerful
• SQL is proven
  • Well understood
  • Well documented
35. Why people </3 SQL
• Relational algebra is hard
• Different databases support different SQL syntax
• Yet another programming language to learn
36. SQL != Database
• SQL is used to talk to an RDBMS (database)
• SQL is not an RDBMS
39. Types of NoSQL
• Distributed Systems
• Document Store
• Graph Database
• Key-Value Store
• Eventually Consistent Systems
Mix and Match
40. Key Value Stores
• Non-relational
• Typically no schema
• Map one key (a string) to a value (some object)
Example: Redis
41. Key Value Example
require "redis"
redis = Redis.new
redis.set("foo", "bar")
redis.get("foo")
>> "bar"
42. Key Value Example
redis = Redis.new
redis.set("foo", "bar")   # key: "foo", value: "bar"
redis.get("foo")          # key: "foo"
>> "bar"                  # value
43. Key Value
• Like a database that can only ever use the primary key (id)
YES
select * from users where id = '3';
NO
select * from users where name = 'schneems';
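The YES/NO contrast above can be sketched with a plain Ruby Hash standing in for the key-value store. The key names and sample users are assumptions made for illustration.

```ruby
# A Ruby Hash standing in for a key-value store (hypothetical data).
store = {
  "user:3" => { name: "schneems" },
  "user:4" => { name: "amelie" }
}

# YES: lookup by key is a single direct fetch.
p store["user:3"]

# NO: the store cannot search by value, so the application would have to
# scan every entry itself...
key, _user = store.find { |_key, user| user[:name] == "schneems" }
p key

# ...or maintain its own secondary index as an extra key.
store["name:schneems"] = "user:3"
p store[store["name:schneems"]]
```

Keeping such an index correct whenever a name changes becomes the application's job, which is part of what an RDBMS does for you.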
44. NoSQL @ Gowalla
• Redis (key-value store)
  • Store “Likes” & analytics
• Memcache (key-value store)
  • Cache database results
• Cassandra (eventually consistent, with-schema, key-value store)
  • Store “feeds” or “timelines”
• Solr (search index)
45. Memcache
• Key-Value Store
• Open Source
• Distributed
• In memory (RAM) only
  • Fast, but volatile
• Not ACID
• Memory object caching system
47. Memcache
• Can store whole objects
memcache = Memcache.new
# .first returns a single User record rather than a query result set
user = User.where(:username => "schneems").first
memcache.set("user:3", user)
user_from_cache = memcache.get("user:3")
user_from_cache == user
>> true
user_from_cache.username
>> "schneems"
48. Memcache @ Gowalla
• Cache common queries
  • Decreases load on DB (Postgres)
  • Enables higher throughput from DB
  • Faster response than DB
  • Users see quicker page load time
49. What to Cache?
• Objects that change infrequently
  • Users
  • Spots (places)
  • etc.
• Expensive(ish) SQL queries
  • Friend ids for users
  • User ids for people visiting spots
  • etc.
52. Memcache <3's DB
• We use them together
• If memcache doesn't have a value
  • Fetch from the database
  • Set the key from the database
• Hard
  • Cache invalidation :(
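The fetch-then-set flow above, plus the invalidation step that makes it hard, can be sketched as follows. This assumes plain Hashes in place of both memcache and the database; `UserRepository` and its methods are hypothetical names, not Gowalla's code.

```ruby
# Hypothetical stand-ins: one Hash plays memcache, another plays the database.
class UserRepository
  attr_reader :db_reads

  def initialize(database)
    @database = database
    @cache = {}
    @db_reads = 0
  end

  def find(id)
    key = "user:#{id}"
    return @cache[key] if @cache.key?(key) # cache hit

    @db_reads += 1
    user = @database[id] # cache miss: fetch from the database
    @cache[key] = user   # set the key from the database result
    user
  end

  def update(id, attributes)
    @database[id] = @database[id].merge(attributes)
    @cache.delete("user:#{id}") # invalidate so the next read is fresh
  end
end

repo = UserRepository.new({ 3 => { username: "schneems" } })
repo.find(3)
repo.find(3)
puts repo.db_reads # 1
repo.update(3, { username: "Schneems" })
p repo.find(3)
puts repo.db_reads # 2
```

Forgetting the `delete` in `update` would leave the old username in the cache indefinitely. Every code path that changes the data has to remember to invalidate, and that bookkeeping is the hard part.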
53. Redis
• Key-Value Store
• Open Source
• Not distributed (yet)
• Extremely quick
• “Data structure server”
54. Redis Example, again
redis = Redis.new
redis.set("foo", "bar")
redis.get("foo")
>> "bar"
55. Redis - Has Data Types
• Strings
• Hashes
• Lists
• Sets
• Sorted Sets
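For readers who know Ruby better than Redis, the types above behave roughly like the Ruby structures below. This is an analogy only: the code uses Ruby's own classes with made-up data, not the Redis API, and it runs without a Redis server.

```ruby
require "set"

string     = "bar"                                # String: one value per key
hash       = { "username" => "schneems" }         # Hash: field => value pairs
list       = ["checkin:1", "checkin:2"]           # List: ordered, duplicates allowed
set        = Set.new                              # Set: unique members
sorted_set = { "schneems" => 42, "amelie" => 57 } # Sorted Set: member => score

# A "like" from the same user twice is stored once.
set << "user:3" << "user:4" << "user:3"
puts set.size # 2

# Sorted sets are read back ordered by score.
ranking = sorted_set.sort_by { |_member, score| score }.map(&:first)
p ranking
```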
66. NoSQL vs. RDBMS
• No Magic Bullet
• Use Both!!!
• Model data in a datastore you understand
• Switch to another when/if you need to
• Understand Your Options