Roberto "frank" Franchini presents Redis at Codemotion Techmeetup Torino: a data structure server whose keys can contain strings, hashes, lists, sets, sorted sets, bitmaps and hyperloglogs.
2. whoami(1)
15 years of experience, proud to be a programmer
Writes software for information extraction, NLP, opinion mining (@scale) and a lot of other buzzwords
Implements scalable architectures
Member of the JUG-Torino coordination team
ro.franchini@gmail.com github.com/robfrank
twitter.com/robfrankie linkedin.com/in/robfrank
http://www.celi.it http://www.blogmeter.it
3. Agenda
What is it?
Main features
Caching
Counters
Scripting
How we use it
4. From the site
Redis is an open source, BSD licensed, advanced
key-value cache and store. It is often referred to
as a data structure server since keys can contain
strings, hashes, lists, sets, sorted sets, bitmaps
and hyperloglogs.
5. Who uses it
Twitter
GitHub
YouPorn
Pinterest
Groupon
...
6. Ecosystem
Clients in every known language
Articles, books, presentations
Featured on High Scalability every other day
7. Architecture
Single-threaded server
Yes: single-threaded server
Remember that when you need to scale
A single Linux server can handle 500k req/s
8. Main features
In memory K/V store
But with durable persistence
Master-slave async replication
Transactions
Pub/Sub
Server-side Lua scripting
9. Main features
Keys with TTL
LRU eviction
Keys can contain strings, hashes, lists, sets, sorted sets,
bitmaps and hyperloglogs
Redis Cluster in progress (3.0.0-rc1)
10. K/V store
Key-value (KV) stores use the associative array (also
known as a map or dictionary) as their fundamental data
model. In this model, data is represented as a collection of
key-value pairs, such that each possible key appears at
most once in the collection. (wikipedia)
11. K/V store
[Diagram: a key maps to a value of one of several types: String/blobs/bitmaps (“plain text”), HashTable: Objects (name rob, surname frank), Linked lists, Sets]
12. Persistence
Configurable, two flavors
RDB: point-in-time snapshots, perfect for backup
AOF: append-only log of write operations, replayed at startup
Use AOF + RDB for rock solid persistence
Automatic cache warm-up at startup!!
RAM only: switch off persistence
13. Common use cases
Cache
Queue
Session replication
In memory indexes
Centralized ID generation
14. Basics
SET user:1 frank
GET user:1 → frank
EXISTS user:2 → 1
EXPIRE user:1 3600
INCR count:1
GET count:1 → 1
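The following is a minimal in-memory sketch in Python, not the real server, that makes the counter semantics above concrete. Redis keeps counters as string values, INCR on a missing key starts from 0, and INCR on a value that is not an integer returns an error. The class `MiniStore` and its methods are hypothetical names used only for illustration.

```python
import re


class MiniStore:
    """Hypothetical in-memory model of a few Redis string commands."""

    _INTEGER = re.compile(r"-?[0-9]+")

    def __init__(self):
        self.data = {}  # every value is kept as a str, like a Redis string

    def set(self, key, value):
        self.data[key] = str(value)
        return "OK"

    def get(self, key):
        return self.data.get(key)  # None plays the role of a nil reply

    def exists(self, key):
        return 1 if key in self.data else 0

    def incrby(self, key, amount):
        current = self.data.get(key, "0")  # a missing key counts as 0
        if not self._INTEGER.fullmatch(current):
            raise ValueError("ERR value is not an integer or out of range")
        value = int(current) + amount
        self.data[key] = str(value)  # the result is stored back as a string
        return value

    def incr(self, key):
        return self.incrby(key, 1)


store = MiniStore()
print(store.incr("count:1"))       # 1
print(repr(store.get("count:1")))  # '1'
```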
15. Basics
KEYS user:* → user:1, user:2
MSET user:1 frank user:2 coder
MGET user:1 user:2 → frank, coder
HMSET userdetail:3 name rob surname frank
HGETALL userdetail:3 → name, rob, surname, frank
16. Transactions
MULTI
INCR counter:1
INCR counter:2
EXEC
> 1
> 1
WATCH counter:3
val = GET counter:3
val = val + 1
MULTI
SET counter:3 $val
EXEC → nil if counter:3 was modified after WATCH: retry from WATCH
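The WATCH example above is optimistic locking, or check-and-set. This is a hedged, in-memory Python model of it. Every write bumps a per-key version. EXEC applies the queued writes only if the watched key's version is unchanged, and the caller retries otherwise. `VersionedStore`, `Transaction`, `increment_with_cas` and the `interfere` hook, which simulates a second client, are all hypothetical. A real client such as redis-py exposes the same pattern through a pipeline's `watch()`, `multi()` and `execute()`, and raises `WatchError` when EXEC aborts.

```python
class VersionedStore:
    """Hypothetical model: every write bumps a per-key version number."""

    def __init__(self):
        self.data = {}
        self.versions = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value
        self.versions[key] = self.versions.get(key, 0) + 1


class Transaction:
    """Models WATCH ... MULTI ... EXEC: queued writes run only if watched keys are untouched."""

    def __init__(self, store):
        self.store = store
        self.watched = {}  # key -> version seen at WATCH time
        self.queued = []   # commands queued after MULTI

    def watch(self, key):
        self.watched[key] = self.store.versions.get(key, 0)

    def set(self, key, value):
        self.queued.append((key, value))

    def execute(self):
        for key, version in self.watched.items():
            if self.store.versions.get(key, 0) != version:
                return None  # EXEC aborts with a nil reply
        for key, value in self.queued:
            self.store.set(key, value)
        return ["OK"] * len(self.queued)


def increment_with_cas(store, key, interfere=None):
    """Optimistic increment: retry until EXEC succeeds; returns the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        tx = Transaction(store)
        tx.watch(key)                   # WATCH counter:3
        val = int(store.get(key) or 0)  # val = GET counter:3
        if interfere:
            interfere(attempts)         # another client may write here
        tx.set(key, str(val + 1))       # MULTI / SET counter:3 $val
        if tx.execute() is not None:    # EXEC
            return attempts
```

If another client writes between WATCH and EXEC, the first EXEC returns nil. The loop then re-reads the fresh value, so no increment is lost.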
17. Atomic counters
Operators for key increment
INCR counter:1
GET counter:1 → 1
INCRBY counter:1 9
GET counter:1 → 10
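The increment operators are atomic because the server runs one command at a time, so the read-modify-write never interleaves with another client. A client that does GET and then SET can lose updates. The Python sketch below models the single-threaded server with a lock and runs four threads of 1000 increments each. `SerialServer` and `hammer` are hypothetical names, and the lock is only a stand-in for Redis's event loop.

```python
import threading


class SerialServer:
    """Hypothetical model of a single-threaded server: one command runs at a time."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for the single event loop
        self.data = {}

    def incrby(self, key, amount=1):
        # The whole read-modify-write is one "command", so no update is lost.
        with self._lock:
            value = int(self.data.get(key, "0")) + amount
            self.data[key] = str(value)
            return value


def hammer(server, key, times):
    for _ in range(times):
        server.incrby(key)


server = SerialServer()
workers = [threading.Thread(target=hammer, args=(server, "counter:1", 1000)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(server.data["counter:1"])  # 4000
```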
18. Lua scripting
Server-side Lua scripting
A “sort of” stored procedure
Scripts are sandboxed
Atomic execution ← bear in mind: a slow script blocks every other client
20. Caching: server level
Configure Redis as a cache
maxmemory 1024mb
maxmemory-policy allkeys-lru
any key can be evicted, chosen by an
approximated LRU algorithm
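Redis approximates LRU by sampling a few keys, a number set by `maxmemory-samples`, and evicting the least recently used key among them. It does not keep an exact recency list. The Python sketch below illustrates that idea under simplifying assumptions. The limit is counted in keys rather than bytes, a logical clock replaces real access times, and refinements in newer Redis versions, such as a pool of eviction candidates, are ignored. `SampledLRUCache` is a hypothetical name.

```python
import random


class SampledLRUCache:
    """Hypothetical sketch of approximated LRU eviction in the style of allkeys-lru."""

    def __init__(self, max_keys, samples=5, seed=None):
        self.max_keys = max_keys  # stands in for maxmemory, counted in keys here
        self.samples = samples    # plays the role of maxmemory-samples
        self.rng = random.Random(seed)
        self.clock = 0            # logical time instead of wall-clock time
        self.data = {}
        self.last_access = {}

    def _touch(self, key):
        self.clock += 1
        self.last_access[key] = self.clock

    def get(self, key):
        if key in self.data:
            self._touch(key)
        return self.data.get(key)

    def set(self, key, value):
        if key not in self.data and len(self.data) >= self.max_keys:
            self._evict()
        self.data[key] = value
        self._touch(key)

    def _evict(self):
        # Sample a few keys and drop the least recently used one among them.
        k = min(self.samples, len(self.data))
        candidates = self.rng.sample(list(self.data), k)
        victim = min(candidates, key=self.last_access.__getitem__)
        del self.data[victim]
        del self.last_access[victim]
```

With `samples` at least as large as the number of keys, the sketch degenerates into exact LRU. Smaller samples trade some accuracy for constant work per eviction.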
21. Caching: TTL on key
Set a timeout on a key
SET doc:1 "mydoc.txt"
EXPIRE doc:1 10
Or
SETEX doc:1 10 "mydoc.txt"
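The Python sketch below models the TTL commands above with an injectable clock. It covers only lazy expiration, where an expired key is removed the next time it is read. Real Redis also deletes expired keys actively in the background. It also shows that a plain SET discards any previous timeout. `TTLStore` and its method names are hypothetical.

```python
import time


class TTLStore:
    """Hypothetical model of SET, EXPIRE and SETEX with lazy expiration only."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock    # injectable so the example can fake the passage of time
        self.data = {}
        self.expires_at = {}  # key -> absolute deadline in seconds

    def set(self, key, value):
        self.data[key] = value
        self.expires_at.pop(key, None)  # a plain SET discards any previous TTL

    def expire(self, key, seconds):
        self._purge_if_expired(key)
        if key not in self.data:
            return 0  # like EXPIRE on a missing key
        self.expires_at[key] = self.clock() + seconds
        return 1

    def setex(self, key, seconds, value):
        # SETEX sets the value and the timeout in one step.
        self.set(key, value)
        self.expires_at[key] = self.clock() + seconds

    def get(self, key):
        self._purge_if_expired(key)
        return self.data.get(key)

    def _purge_if_expired(self, key):
        deadline = self.expires_at.get(key)
        if deadline is not None and self.clock() >= deadline:
            # Lazy expiration: the key is deleted when it is touched after its deadline.
            del self.data[key]
            del self.expires_at[key]
```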
24. Duplicate detection
Real time stream of documents from
the Internet
20% to 50% of documents are duplicated
DUPLICATES ARE EVIL
And customers don’t pay for that :(
28. Documents
Each kind of document has its own natural id
twitter: status id
facebook: post id
forum: URL
blog: URL
We don’t want these IDs inside our system
29. Duplicate and id generation
[Diagram: Producers → Duplicate detector / ID generation → Analysis → Storage, with flows labelled 1M, 2M, 3M and 5M]
30. Map external keys to internal UID
Generate an ID for each document
IDs are generated using daily named counters:
INCR day:20141028 → 12576
INCR day:20141010 → 23412576
Cache generated ID
tw_1234578688 → day:20141028;12576
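The Python sketch below shows one way to build internal IDs from daily named counters. Each day gets its own counter key, which is incremented like `INCR day:YYYYMMDD` and starts again from 1. The result is combined with the date into an ID of the form `YYYYMMDD:n`, matching the `20141028:...` IDs on the next slides. The helper `make_global_id` and the exact ID format are assumptions for illustration, and a dict stands in for Redis.

```python
from datetime import date


def make_global_id(counters, day):
    """Hypothetical helper: INCR a per-day counter and build a 'YYYYMMDD:n' id."""
    stamp = day.strftime("%Y%m%d")
    key = "day:" + stamp
    counters[key] = counters.get(key, 0) + 1  # models INCR day:YYYYMMDD
    return f"{stamp}:{counters[key]}"


counters = {}
print(make_global_id(counters, date(2014, 10, 28)))  # 20141028:1
print(make_global_id(counters, date(2014, 10, 28)))  # 20141028:2
print(make_global_id(counters, date(2014, 10, 29)))  # 20141029:1
```

Because the date is part of the ID, numbers from different days never collide even though each day's counter restarts.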
31. Map external keys to internal UID
Documents are internally stored on different storage
systems with their generated id
globalId → 20141028:3456789
32. Operations
Natural Keys are cached with TTL
Documents out of time are parked in a staging area
Duplicated documents are usually dropped
33. LRU cache, counters and Lua
Lua scripts are executed atomically
We wrote a simple script to:
return previous mapped id
or generate id and store key and id in cache
EVALSHA "sha" 2 20141028 tw_1234566 → 20141028:123
GET tw_1234566 → 20141028:123
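The slides do not show the Lua script itself, so what follows is only a Python model of the logic they describe, not the script. The model looks up the natural key. If a mapping is still cached, it returns the previous ID. Otherwise it increments the daily counter, caches the new mapping with a TTL and returns the new ID. In Redis the script runs atomically, and here a lock stands in for that. `DedupIdCache`, `get_or_create`, the TTL value and the extra "duplicate" flag in the return value are hypothetical additions for illustration.

```python
import threading
import time


class DedupIdCache:
    """Hypothetical model of the get-or-generate logic described on slide 33."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.counters = {}             # "day:YYYYMMDD" -> last number issued
        self.mapping = {}              # natural key -> (global id, expiry time)
        self._lock = threading.Lock()  # stands in for atomic script execution

    def get_or_create(self, day, natural_key):
        """Return (global_id, is_duplicate) for a natural key such as 'tw_1234566'."""
        with self._lock:
            entry = self.mapping.get(natural_key)
            if entry is not None and self.clock() < entry[1]:
                return entry[0], True  # seen before: return the previously mapped id
            counter_key = "day:" + day
            self.counters[counter_key] = self.counters.get(counter_key, 0) + 1
            global_id = f"{day}:{self.counters[counter_key]}"
            # Cache the natural key with a TTL, as the natural keys are on slide 32.
            self.mapping[natural_key] = (global_id, self.clock() + self.ttl)
            return global_id, False
```

Doing the lookup and the generation in one atomic step matters. If two producers submit the same tweet at the same moment, they both get the same ID instead of racing to create two.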