This is a talk that I gave on July 20, 2012 at the Southern California Python Interest Group meetup at Cross Campus, with food and drinks provided by Graph Effect.
1. Redis and Python
by Josiah Carlson
@dr_josiah
dr-josiah.blogspot.com
bit.ly/redis-in-action
2. Redis and Python:
It's PB & J time
by Josiah Carlson
@dr_josiah
dr-josiah.blogspot.com
bit.ly/redis-in-action
3. What will be covered
• Who am I?
• What is Redis?
• Why Redis with Python?
• Cool stuff you can do by combining them
4. Who am I?
• A Python user for 12+ years
• Former python-dev bike-shedder
• Former maintainer of Python async sockets libraries
• Author of a few small open-source projects
o rpqueue, parse-crontab, async_http, timezone-utils, PyPE
• Worked at some cool places you've never heard of (Networks In Motion, Ad.ly)
• Cool places you have (Google)
• And cool places you will (ChowNow)
• Heavy user of Redis
• Author of upcoming Redis in Action
5. What is Redis?
• In-memory database/data structure server
o Limited to main memory; vm and diskstore defunct
• Persistence via snapshot or append-only file
• Support for master/slave replication (multiple slaves and slave chaining supported)
o No master-master, don't even try
o Client-side sharding
o Cluster is in-progress
• Five data structures + publish/subscribe
o Strings, Lists, Sets, Hashes, Sorted Sets (ZSETs)
• Server-side scripting with Lua in Redis 2.6
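The Lua scripting bullet can be sketched with redis-py's eval; this is a hypothetical example, not from the talk, assuming a Redis 2.6+ server and a redis.Redis instance passed in as conn (key and function names are invented):

```python
# Hypothetical sketch of server-side Lua scripting (Redis 2.6+).
# Assumes `conn` is a redis.Redis instance from the redis-py client.

# Atomically increment a counter but cap it at a maximum, in one round trip.
CAPPED_INCR = """
local v = redis.call('INCR', KEYS[1])
if v > tonumber(ARGV[1]) then
    redis.call('SET', KEYS[1], ARGV[1])
    v = tonumber(ARGV[1])
end
return v
"""

def capped_incr(conn, key, maximum):
    # EVAL runs the script atomically on the server: 1 key, then the args.
    return conn.eval(CAPPED_INCR, 1, key, maximum)
```

Because the script runs inside the (single-threaded) server, no other client can see the counter between the INCR and the cap.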
6. What is Redis? (compared to other databases/caches)
• Memcached
o in-memory, no persistence, counters, strings, very fast, multi-threaded
• Redis
o in-memory, optionally persisted, data structures, very fast, server-side scripting, single-threaded
• MongoDB
o on-disk, speed inversely related to data integrity, bson, master/slave, sharding, multi-master, server-side mapreduce, database-level locking
• Riak
o on-disk, pluggable data stores, multi-master sharding, RESTful API, server-side map-reduce, (Erlang + C)
• MySQL/PostgreSQL
o on-disk/in-memory, pluggable data stores, master/slave, sharding, stored procedures, ...
7. What is Redis? (Strings)
• Really scalars of a few different types
o Character strings
 concatenate values to the end
 get/set individual bits
 get/set byte ranges
o Integers (platform long int)
 increment/decrement
 auto "casting"
o Floats (IEEE 754 FP Double)
 increment/decrement
 auto "casting"
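A minimal sketch of the scalar commands above, assuming a redis.Redis connection passed in as conn; the key names are invented for illustration:

```python
# Hypothetical sketch of Redis string/scalar commands via redis-py.
# `conn` is assumed to be a redis.Redis instance; key names are made up.
def string_examples(conn):
    conn.append('page:title', ' - Redis')   # concatenate to the end
    conn.setbit('visited:today', 1001, 1)   # set an individual bit
    conn.getrange('page:title', 0, 4)       # fetch a byte range
    conn.incr('hits')                       # integer increment
    conn.incrbyfloat('price', 0.50)         # float increment (Redis 2.6+)

# The auto "casting": Redis stores bytes, but commands like INCR treat
# them as a number whenever the stored bytes parse as one.
value = b'41'
incremented = int(value) + 1   # what INCR would return for this stored value
print(incremented)
```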
8. What is Redis? (Lists)
• Doubly-linked list of character strings
o Push/pop from both ends
o [Blocking] pop from multiple lists
o [Blocking] pop from one list, push on another
o Get/set/search for item in a list
o Sortable
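The list operations above are the building blocks of a simple task queue; this is a hypothetical sketch (queue names invented), assuming a redis.Redis instance passed in as conn:

```python
# Hypothetical task-queue sketch using Redis list commands via redis-py.
# `conn` is assumed to be a redis.Redis instance; queue names are made up.
def produce(conn, task):
    conn.rpush('queue:tasks', task)       # push onto the right end

def consume(conn):
    # Blocking pop from multiple lists: returns (list_name, item), or
    # None after the timeout.
    return conn.blpop(['queue:high', 'queue:tasks'], timeout=5)

def move_to_working(conn):
    # Pop from one list, push onto another, as a single atomic command.
    return conn.rpoplpush('queue:tasks', 'queue:working')

# The same double-ended semantics with a plain Python deque:
from collections import deque
q = deque()
q.append('a')                      # push onto the right end
q.appendleft('b')                  # push onto the left end
popped = (q.pop(), q.popleft())    # pop from both ends
print(popped)
```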
9. What is Redis? (Sets)
• Unique unordered sequence of character strings
o Backed by a hash table
o Add, remove, check membership, pop, random pop
o Set intersection, union, difference
o Sortable
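A hedged sketch of the set commands above, using a shared-friends lookup as the example; the key names are invented, and conn is assumed to be a redis.Redis instance:

```python
# Hypothetical shared-friends sketch using Redis set commands via redis-py.
# `conn` is assumed to be a redis.Redis instance; key names are made up.
def common_friends(conn, user_a, user_b):
    # SINTER computes the intersection server-side, so the full member
    # lists never have to be shipped to the client.
    return conn.sinter('friends:' + user_a, 'friends:' + user_b)

# The same semantics with Python's built-in sets:
friends_a = {'carol', 'dave', 'erin'}
friends_b = {'dave', 'erin', 'frank'}
both = friends_a & friends_b       # SINTER
either = friends_a | friends_b     # SUNION
only_a = friends_a - friends_b     # SDIFF
print(sorted(both), sorted(only_a))
```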
10. What is Redis? (Hashes)
• Key-value mapping inside a key
o Get/Set/Delete single/multiple
o Increment values by ints/floats
o Bulk fetch of Keys/Values/Both
o Sort-of like a small version of Redis that only supports strings/ints/floats
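The hash commands above suit per-entity counters; this is a hypothetical sketch (key and field names invented), assuming a redis.Redis instance passed in as conn:

```python
# Hypothetical per-user page-view counter using Redis hash commands.
# `conn` is assumed to be a redis.Redis instance; names are made up.
def record_view(conn, user_id, page):
    conn.hincrby('views:' + user_id, page, 1)   # increment a single field

def view_counts(conn, user_id):
    return conn.hgetall('views:' + user_id)     # bulk fetch fields + values

# A hash behaves like a small dict stored inside one Redis key:
views = {}
for page in ['/', '/about', '/']:
    views[page] = views.get(page, 0) + 1        # what HINCRBY does
print(views)
```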
11. What is Redis? (Sorted Sets - ZSETs)
• Like a Hash, with 'members' and 'scores', scores limited to float values
o Get, set, delete, increment
o Can be accessed by the sorted order of the (score, member) pair
 By score
 By index
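A leaderboard is the classic use of the sorted-order access above; this is a hypothetical sketch, assuming redis-py 3.x (where zincrby takes the amount before the member) and a redis.Redis instance passed in as conn, with invented key and member names:

```python
# Hypothetical leaderboard sketch using Redis sorted-set commands.
# Assumes redis-py 3.x signatures; `conn` is a redis.Redis instance.
def add_points(conn, player, points):
    conn.zincrby('leaderboard', points, player)  # increment a member's score

def top_three(conn):
    # Access by index in sorted (score, member) order, highest first.
    return conn.zrevrange('leaderboard', 0, 2, withscores=True)

def scores_between(conn, low, high):
    return conn.zrangebyscore('leaderboard', low, high)  # access by score

# The same ordering in plain Python: sort members by their score.
scores = {'alice': 30.0, 'bob': 50.0, 'carol': 40.0}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```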
12. What is Redis? (Publish/Subscribe)
• Readers subscribe to "channels" (exact strings or patterns)
• Writers publish to channels, broadcasting to all subscribers
• Messages are transient
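A hedged sketch of the publish/subscribe mechanics above, assuming a redis.Redis instance passed in as conn; the channel names are invented:

```python
# Hypothetical publish/subscribe sketch via redis-py.
# `conn` is assumed to be a redis.Redis instance; channels are made up.
def announce(conn, channel, message):
    # Returns the number of subscribers that received the message; with
    # no listeners the message is simply dropped (messages are transient).
    return conn.publish(channel, message)

def listener(conn):
    pubsub = conn.pubsub()
    pubsub.subscribe('news.python')   # exact channel name
    pubsub.psubscribe('news.*')       # glob-style pattern
    for message in pubsub.listen():   # blocks, yielding messages
        yield message

# Pattern channels use glob-style matching, much like Python's fnmatch:
import fnmatch
matched = fnmatch.fnmatch('news.redis', 'news.*')
print(matched)
```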
13. Why Redis with Python?
• The power of Python lies in:
o Reasonably sane syntax/semantics
o Easy manipulation of data and data structures
o Large and growing community
• Redis also has:
o Reasonably sane syntax/semantics
o Easy manipulation of data and data structures
o Medium-sized and growing community
o Available as a remote server
 Like a remote IPython, only for data
 So useful, people have asked for a library version
14. Per-hour and Per-day hit counters
from itertools import imap
import redis

def process_lines(prefix, logfile):
    conn = redis.Redis()
    for log in imap(parse_line, open(logfile, 'rb')):
        time = log.timestamp.isoformat()
        hour = time.partition(':')[0]
        day = time.partition('T')[0]
        conn.zincrby(prefix + hour, log.path)
        conn.zincrby(prefix + day, log.path)
        conn.expire(prefix + hour, 7*86400)
        conn.expire(prefix + day, 30*86400)
15. Per-hour and Per-day hit counters (with pipelines for speed)
from itertools import imap
import redis

def process_lines(prefix, logfile):
    pipe = redis.Redis().pipeline(False)
    for i, log in enumerate(imap(parse_line, open(logfile, 'rb'))):
        time = log.timestamp.isoformat()
        hour = time.partition(':')[0]
        day = time.partition('T')[0]
        pipe.zincrby(prefix + hour, log.path)
        pipe.zincrby(prefix + day, log.path)
        pipe.expire(prefix + hour, 7*86400)
        pipe.expire(prefix + day, 30*86400)
        if not i % 1000:
            pipe.execute()
    pipe.execute()