Open Source Redis is not only the fastest NoSQL database but also the most popular among the new wave of databases running in containers. This talk introduces the data structures used to speed up applications and solve the everyday use cases that are driving Redis' popularity.
Who?
Me: Adi Foulger, Pro Geek @ Redis Labs, the open source home and provider of enterprise Redis
About Redis Labs:
• 5300+ paying customers, 35k+ free customers
• 118k Redis databases managed
• Withstood several datacenter outages with no loss of data for customers who chose HA options
• Two main products: Redis Cloud, Redis Labs Enterprise Cluster
What?
Redis is an open source (BSD licensed),
in-memory data structure store,
used as a database, cache and message broker
and much more!
REmote DIctionary Server
• Data Structure Database
• An overly simplistic definition:
CREATE TABLE redis (
k VARCHAR(512MB) NOT NULL,
v VARCHAR(512MB),
PRIMARY KEY (k)
);
• 8ish data types, 180+ commands, blazing fast
• Created by @antirez (a.k.a Salvatore Sanfilippo)
• v1.0 August 9th, 2009 … v3.2 May 6th, 2016
• Source: https://github.com/antirez/redis
• Website: http://redis.io
Redis: A Data Structure Database
Data structures are used by developers like "Lego" building blocks, saving them much coding effort and time:
Strings, Hashes, Lists, Sets, Sorted Sets, Bitmaps, HyperLogLogs, Geospatial indexes
Why? Because It Is Fun!
• Simplicity: rich functionality, great flexibility
• Performance: easily serves 100Ks of ops/sec
• Lightweight: ~2MB footprint
• Production proven (name dropping)
Redis Makes You Think
• about how data is stored
• about how data is accessed
• about efficiency
• about performance
• about the network
• …
• Redis is a database construction kit
• Beware of Maslow's "golden hammer"/law of the instrument: "If all you have is a hammer, everything looks like a nail"
Key Points About Key Names
• Key names are "limited" to 512MB (as are the values, btw)
• To conserve RAM & CPU, try to avoid using unnecessarily_longish_names_for_your_redis_keys because they are more expensive to store and compare (unlike an RDBMS's column names, key names are saved for each key-value pair)
• On the other hand, don't be too stringent (e.g. 'u:<uid>:r')
• Although not mandatory, the convention is to use colons (':') to separate the parts of the key's name
• Your schema is your keys' names so keep them in order
STRINGs
• Are the most basic data type
• Are binary-safe
• Are used for storing:
Strings (duh) – APPEND, GETRANGE, SETRANGE, STRLEN
Integers – INCR, INCRBY, DECR, DECRBY
Floats – INCRBYFLOAT
Bits – SETBIT, GETBIT, BITPOS, BITCOUNT, BITOP
http://xkcd.com/171/
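As a sketch of how these commands behave, here is a tiny in-memory stand-in (a hypothetical `MiniStrings` class, not the real redis-py client) that mimics the semantics of SET, APPEND, INCRBY and STRLEN:

```python
# Hypothetical stand-in mimicking Redis STRING command semantics.
class MiniStrings:
    def __init__(self):
        self.store = {}

    def set(self, k, v):
        self.store[k] = str(v)

    def append(self, k, v):
        # APPEND: concatenate to the existing value (empty string if missing),
        # returning the new length, as the real command does
        self.store[k] = self.store.get(k, '') + v
        return len(self.store[k])

    def incrby(self, k, n=1):
        # INCRBY: interpret the stored string as an integer and add n;
        # a missing key starts from 0
        val = int(self.store.get(k, '0')) + n
        self.store[k] = str(val)
        return val

    def strlen(self, k):
        return len(self.store.get(k, ''))

r = MiniStrings()
r.set('greeting', 'Hello')
print(r.append('greeting', ', Redis'))   # -> 12
print(r.incrby('hits', 5))               # -> 5
```

Note how the same value can be treated as text or as a number depending on the command, which is exactly what makes STRINGs usable as counters.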
Pattern: Caching Calls to the DB
• Motivation: quick responses, reduce load on DBMS
• How: keep the statement's results using the Redis STRING data type
def get_results(sql):
    key = hashlib.md5(sql.encode()).hexdigest()
    result = redis.get(key)
    if result is None:
        result = db.execute(sql)
        redis.set(key, result)
        # or use redis.setex to set a TTL for the key
    return result
The HASH Data Type
• Acts as a Redis-within-Redis: contains key-value pairs
• Have their own commands: HINCRBY, HINCRBYFLOAT, HLEN,
HKEYS, HVALS…
• Usually used for aggregation, i.e. keeping related data together
for easy fetching/updating (remember that Redis is not a
relational database). Example:
Using separate keys Using hash aggregation
user:1:id 1 user:1 id 1
user:1:fname Foo fname Foo
user:1:lname Bar lname Bar
user:1:email foo@acme.com email foo@acme.com
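To make the comparison concrete, here is a sketch of the two layouts using plain Python dicts as a stand-in for the keyspace (with redis-py, the hash layout would be written with `hset` and fetched with a single `hgetall`):

```python
# Layout 1: one STRING key per field -> four GET round trips per user
separate = {
    'user:1:id': '1',
    'user:1:fname': 'Foo',
    'user:1:lname': 'Bar',
    'user:1:email': 'foo@acme.com',
}

# Layout 2: a single HASH key aggregating all fields -> one HGETALL
aggregated = {
    'user:1': {'id': '1', 'fname': 'Foo', 'lname': 'Bar',
               'email': 'foo@acme.com'},
}

# Reassembling the user from separate keys needs string surgery...
user_via_strings = {k.rsplit(':', 1)[-1]: v for k, v in separate.items()}
# ...while the hash layout hands back the whole record at once
user_via_hash = aggregated['user:1']
print(user_via_strings == user_via_hash)   # -> True
```

Both layouts hold the same data; the hash keeps related fields under one key name, which is the aggregation benefit the table above illustrates.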
Pattern: Avoiding Calls to the DB
• Motivation: server-side storage and sharing of transient data that doesn't need a
full-fledged RDBMS, e.g. sessions and shopping carts
• How: depending on the case, use STRING or HASH to store data in Redis
def add_to_cart(session, product, quantity):
    if quantity > 0:
        redis.hset('cart:' + session, product, quantity)
    else:
        redis.hdel('cart:' + session, product)
    redis.expire('cart:' + session, CART_TIMEOUT)

def get_cart_contents(session):
    return redis.hgetall('cart:' + session)
Pattern: Counting Things
• Motivation: statistics, real-time analytics, dashboards, throttling
• How #1: use the *INCR commands
• How #2: use a little bit of BIT*
def user_log_login(uid):
    joined = redis.hget('user:' + uid, 'joined')
    d0 = datetime.strptime(joined, '%Y-%m-%d').date()
    d1 = date.today()
    delta = (d1 - d0).days
    redis.setbit('user:' + uid + ':logins', delta, 1)

def user_logins_count(uid):
    return redis.bitcount('user:' + uid + ':logins', 0, -1)
De-normalization
• Non-relational: no foreign keys, no referential integrity constraints
• Thus, data normalization isn't practical (mostly)
• Be prepared to have duplicated data, e.g.:
> HSET user:1 country Mordor
> HSET user:2 country Mordor
…
• Tradeoff:
• Processing Complexity ↔ Data Volume
LISTs
• Lists of strings sorted by insertion order
• Usually have a head and a tail
• Top n, bottom n and constant-length list operations, as well as passing items from one list to another, are extremely popular and extremely fast
Pattern: Lists of Items
• Motivation: keeping track of a sequence, e.g. last viewed profiles
• How: use Redis' LIST data type
def view_product(uid, product):
    redis.lpush('user:' + uid + ':viewed', product)
    redis.ltrim('user:' + uid + ':viewed', 0, 9)
…
def get_last_viewed_products(uid):
    return redis.lrange('user:' + uid + ':viewed', 0, -1)
Pattern: Queues
• Motivation: a producer-consumer use case, asynchronous job
management, e.g. processing photo uploads
def enqueue(queue, item):
    redis.lpush(queue, item)

def dequeue(queue):
    return redis.rpop(queue)
    # or use brpop for a blocking pop
SETs
• Unordered collections of strings
• ADD, REMOVE or TEST for membership in O(1)
• Unions, intersections and differences are computed very, very fast
Pattern: Searching
• Motivation: finding keys in the database, for example all the users
• How #1: use a LIST to store key names
• How #2: the *SCAN commands
def do_something_with_all_users():
    cursor = 0
    while True:
        cursor, data = redis.scan(cursor, match='user:*')
        do_something(data)
        if cursor == 0:
            break
Pattern: Indexing
• Motivation: Redis doesn't have indices; you need to maintain them yourself
• How: the SET data type (a collection of unordered unique
members)
def update_country_idx(country, uid):
    redis.sadd('country:' + country, uid)

def get_users_in_country(country):
    return redis.smembers('country:' + country)
Pattern: Relationships
• Motivation: Redis doesn't have foreign keys; you need to maintain them yourself
> SADD user:1:friends 3 4 5 // Foo is social and makes friends
> SCARD user:1:friends // How many friends does Foo have?
> SINTER user:1:friends user:2:friends // Common friends
> SDIFF user:1:friends user:2:friends // Exclusive friends
> SUNION user:1:friends user:2:friends // All the friends
ZSETs (Sorted Sets)
Are just like SETs:
• Members are unique
• ZADD, ZCARD, ZINCRBY, …
ZSET members have a score that's used for sorting
• ZCOUNT, ZRANGE, ZRANGEBYSCORE
When the scores are identical, members are sorted alphabetically
Lexicographical ranges are also supported:
• ZLEXCOUNT, ZRANGEBYLEX
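A minimal sketch of the ordering rule above, using a plain dict as a stand-in for one sorted set (not the real client; `zadd` and `zrange` here only mimic the command semantics, with `zadd` taking the score before the member as ZADD does):

```python
zset = {}   # member -> score; stand-in for a single sorted set

def zadd(score, member):
    # ZADD: adding an existing member just updates its score
    zset[member] = score

def zrange(start, stop):
    # ZRANGE: ascending by score; equal scores fall back to
    # lexicographic ordering of the members
    ordered = sorted(zset, key=lambda m: (zset[m], m))
    stop = len(ordered) if stop == -1 else stop + 1
    return ordered[start:stop]

zadd(10, 'alice')
zadd(5, 'bob')
zadd(10, 'carol')
print(zrange(0, -1))   # -> ['bob', 'alice', 'carol']
```

'alice' and 'carol' share score 10, so they come back in alphabetical order, which is the tie-breaking rule the slide describes.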
Pattern: Sorting
Motivation: anything that needs to be sorted
How: ZSETs
> ZADD friends_count 3 1 1 2 999 3    (scores are friends counts, members are uids)
> ZREVRANGE friends_count 0 -1
1) "3"
2) "1"
3) "2"
The SORT Command
• A command that sorts LISTs, SETs and SORTED SETs
• SORT's syntax is the most complex (comparatively), but SQLers should feel right at home with it:
• SORT key [BY pattern] [LIMIT offset count]
[GET pattern [GET pattern ...]]
[ASC|DESC] [ALPHA]
[STORE destination]
• SORT is also expensive in terms of complexity O(N+M*log(M))
• BTW, SORT is perhaps the only ad-hoc-like command in Redis
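As an illustration of what `SORT key BY pattern GET pattern` computes, here is a Python sketch over a hypothetical stand-in keyspace (the key names `mylist`, `weight_*` and `data_*` are made up for the example; this is not the real command, just its logic):

```python
# Stand-in keyspace: a list plus per-element weight and data keys
keyspace = {
    'mylist': ['1', '3', '2'],
    'weight_1': 30, 'weight_2': 10, 'weight_3': 20,
    'data_1': 'a', 'data_2': 'b', 'data_3': 'c',
}

def sort_by_get(list_key, by, get):
    # BY: order the list's elements by an external key, substituting
    # '*' in the pattern with each element (like SQL's ORDER BY a join)
    items = sorted(keyspace[list_key],
                   key=lambda e: keyspace[by.replace('*', e)])
    # GET: for each sorted element, return the value of another
    # patterned key instead of the element itself (like a SELECT list)
    return [keyspace[get.replace('*', e)] for e in items]

print(sort_by_get('mylist', 'weight_*', 'data_*'))   # -> ['b', 'c', 'a']
```

Element '2' has the lowest weight (10), so its data comes first, which is why SORT feels like a tiny ORDER BY + JOIN for SQL users.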
Pattern: Counting Unique Items
• How #1: SADD items and SCARD for the count
• Problem: more unique items means more RAM
• How #2: the HyperLogLog data structure
> PFADD counter item1 item2 item3 …
HLL is a probabilistic data structure that counts (PFCOUNT) unique items
Sacrifices accuracy: standard error of 0.81%
Gains: constant complexity and memory – 12KB per counter
Bonus: HLLs are merge-able with PFMERGE
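To give a feel for why the memory cost is constant, here is a toy probabilistic counter in the spirit of HyperLogLog (vastly simplified: real HLL uses 16384 6-bit registers in 12KB plus a harmonic-mean estimator, both of which this sketch omits):

```python
import hashlib

M = 64                       # toy number of registers (real HLL: 16384)
registers = [0] * M          # fixed memory, no matter how many items arrive

def pf_add(item):
    # Hash the item to a 64-bit integer (deterministic, like PFADD)
    h = int.from_bytes(hashlib.md5(item.encode()).digest()[:8], 'big')
    idx = h % M              # low bits pick a register
    rest = h // M
    # rank = position of the first 1-bit in the remaining bits;
    # long runs of leading zeros hint at many distinct items
    rank = 1
    while rest > 0 and rest % 2 == 0:
        rank += 1
        rest //= 2
    registers[idx] = max(registers[idx], rank)

for i in range(1000):
    pf_add('item-%d' % i)

print(len(registers))        # -> 64 : memory stayed constant
```

Adding 1,000 items (or a billion) never grows the structure; only the per-register maxima change, which is the constant-memory tradeoff the slide describes.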
Is Redis ACID? (mostly) Yes!
• Redis is (mostly) single threaded, hence every operation is
o Atomic
o Isolated
• WATCH/MULTI/EXEC allow something like transactions (no rollbacks)
• Server-side Lua scripts ("stored procedures") also behave like
transactions
• Durability is configurable and is a tradeoff between efficiency and
safety
• Consistency can be strengthened using the WAIT command
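A sketch of the optimistic check-and-set pattern behind WATCH/MULTI/EXEC, simulated against a plain dict (a stand-in, not the real client): EXEC aborts when the watched key has changed since WATCH, and the caller retries.

```python
store = {'balance': 100}
version = {'balance': 0}     # bumped on every write; stands in for WATCH's dirty flag

def watch(key):
    # WATCH: remember the key's state at the start of the transaction
    return version[key]

def exec_multi(key, seen, commands):
    # EXEC: abort (return None) if the watched key changed after WATCH,
    # otherwise run all queued commands atomically
    if version[key] != seen:
        return None
    for op in commands:
        op()
    version[key] += 1
    return True

def withdraw(amount):
    while True:                          # optimistic retry loop
        seen = watch('balance')
        balance = store['balance']
        if balance < amount:
            return False                 # business check between WATCH and EXEC
        queued = [lambda b=balance - amount: store.update(balance=b)]
        if exec_multi('balance', seen, queued):
            return True

print(withdraw(30), store['balance'])    # -> True 70
```

There is no rollback, matching the slide: a failed EXEC simply means nothing ran, so the loop re-reads and tries again.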
Wait, There's More!
• There are additional commands that we didn't cover
• Expiration and eviction policies
• Publish/Subscribe
• Data persistency and durability
• Server-side scripting with Lua
• Master-Slave(s) replication
• High Availability
• Clustering
Before
• Redis is ubiquitous for fast data, fits lots of cases (Swiss™ Army
knife)
• Some use cases need special care
• Open source has its own agenda
After
• Core still fits lots of cases
• Module extensions for special cases
• A new community-driven ecosystem
• “Give power to users to go faster”
What are they?
• Dynamically (server-)loaded libraries
• Future-compatible
• Will be (mostly) written in C
• (Almost) As fast as the core
• Planned for public release Q3 2016
What can I do with them?
• Process: where the data is at
• Compose: call core & other modules
• Extend: new structures, commands
It’s got layers!
• Operational: admin, memory, disk,
replication, arguments, replies…
• High-level: client-like access to core and
modules’ commands
• Low-level: (almost) native access to core data structures' memory
Spark Operation w/o Redis
[Diagram: Data Source → (1) Read to RDD → (2) Deserialization → (3) Processing → (4) Serialization → (5) Write to RDD → (6) Data Sink, feeding Analytics & BI]
Spark Operation with Redis
[Diagram: Data Source → Spark-Redis connector → (1) read filtered/sorted data → Processing → (2) write filtered/sorted data → Serving Layer → Analytics & BI, with Spark SQL & Data Frame on top]
Accelerating Spark Time-Series with Redis
Redis is faster by up to 100 times compared to HDFS and over 45 times compared to Tachyon or Spark process memory
Hadoop is Turbo Charged with Redis
Goal:
• Accelerate Hadoop operation by orders of magnitude:
  ‒ Phase 1 – use Redis as a caching solution for HBase (Hadoop's default database) and HDFS (Hadoop Distributed File System)
  ‒ Phase 2 – completely replace HBase with Redis
Milestone:
• Demonstrated Hadoop acceleration by HBase caching with Redis
Real-life scenario: ~500% acceleration
AOF rewrites until the disk is about to be full, then take a snapshot and start over. Automatically – no complex configuration needed.
Spark, the new in-memory distributed data processing framework, represents the next generation of big data analytics tools. Spark isn't a database; whenever it needs to process data, it has to read it unfiltered/unordered from the source, translate it into its internal data structures – RDDs (resilient distributed datasets) or Tungsten – and execute multiple deserialization and serialization steps before processing it. The same applies when Spark needs to save the data to a data sink.
With Redis' data structures exposed to Spark, its operations are tremendously simplified and accelerated. With Redis Labs' Spark-Redis connector, Spark can read only the data it needs for its processing directly from Redis, avoiding copying, serializing and deserializing it. The same offloading applies when Spark needs to store its processing results back in Redis. And when Redis is used as a serving layer for Spark, it can offload and accelerate Spark even further: whenever an SQL query arrives at Spark (via Spark SQL), it is first examined against the data in Redis, and if the data is found there, it is read directly from Redis and the entire Spark processing cycle is avoided.
The acceleration provided by Redis is easily demonstrated by this benchmark, performed on time-series data. When Redis sorted sets are used to store time-series data (in this case, stock prices for 1,024 stocks over the last 30 years), Spark queries execute 100 times faster compared to Spark using HDFS and 45 times faster compared to Spark using Tachyon or just in-process memory.