Overclock your Data with Redis
2
Who?
Me: Adi Foulger, Pro Geek @ Redis Labs, the open source home and
provider of enterprise Redis
About Redis Labs:
• 5300+ paying customers, 35k+ free customers
• 118k Redis databases managed
• Withstood several datacenter outages with no loss of data for customers who chose HA
options
• Two main products: Redis Cloud, Redis Labs Enterprise Cluster
3
What?
Redis is an open source (BSD licensed),
in-memory data structure store,
used as database, cache and message broker
and much more!
Redis… How does it work?
5
REmote DIctionary Server
• Data Structure Database
• An overly simplistic definition:
CREATE TABLE redis (
    k VARCHAR(512MB) NOT NULL,
    v VARCHAR(512MB),
    PRIMARY KEY (k)
);
• 8ish data types, 180+ commands, blazing fast
• Created by @antirez (a.k.a Salvatore Sanfilippo)
• v1.0 August 9th, 2009 … v3.2 May 6th, 2016
• Source: https://github.com/antirez/redis
• Website: http://redis.io
6
Data structures are used by developers like “Lego” building
blocks, saving them much coding effort and time
Redis : A Data Structure Database
Strings Hashes Lists Sets
Sorted Sets Bitmaps
Hyper-
LogLogs
Geospatial
indexes
7
Why? Because It Is Fun!
• Simplicity  rich functionality, great flexibility
• Performance  easily serves 100K’s of ops/sec
• Lightweight  ~ 2MB footprint
• Production proven (name dropping)
8
• about how data is stored
• about how data is accessed
• about efficiency
• about performance
• about the network
• …
• Redis is a database construction kit
• Beware of Maslow's "Golden Hammer"/
Law of the Instrument:
"If all you have is a hammer, everything
looks like a nail"
Redis Makes You Think
Okay… so how do I use it?
10
Key Points About Key Names
• Key names are "limited" to 512MB (also the values btw)
• To conserve RAM & CPU, try to avoid using
unnecessarily_longish_names_for_your_redis_keys because they are
more expensive to store and compare (unlike an RDBMS's column
names, key names are saved for each key-value pair)
• On the other hand, don't be too stringent (e.g. 'u:<uid>:r')
• Although not mandatory, the convention is to use colons (':') to
separate the parts of the key's name
• Your schema is your keys' names so keep them in order
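Since the key names are the schema, it can help to build them in one place. A tiny sketch (the `make_key` helper is hypothetical, not part of Redis or redis-py) of the colon convention:

```python
def make_key(*parts):
    """Join key parts with the conventional ':' separator, e.g. user:1:friends."""
    return ':'.join(str(p) for p in parts)

# Readable but not wasteful: 'user:1:friends' rather than 'u:1:f'
# or an unnecessarily long sentence-like name.
print(make_key('user', 1, 'friends'))  # → user:1:friends
```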
11
STRINGs
• Are the most basic data type
• Are binary-safe
• Are used for storing:
Strings (duh) – APPEND, GETRANGE, SETRANGE, STRLEN
Integers – INCR, INCRBY, DECR, DECRBY
Floats – INCRBYFLOAT
Bits – SETBIT, GETBIT, BITPOS, BITCOUNT, BITOP
http://xkcd.com/171/
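The counter and append semantics above can be illustrated with a minimal in-memory stand-in (so the snippet runs without a Redis server; `MiniRedis` is a teaching sketch, not real redis-py):

```python
class MiniRedis:
    """Tiny dict-backed stand-in mimicking a few STRING commands."""
    def __init__(self):
        self.store = {}

    def get(self, k):
        return self.store.get(k)

    def incr(self, k, by=1):
        # Like INCR/INCRBY: a missing key starts at 0, the value stays a string
        self.store[k] = str(int(self.store.get(k, '0')) + by)
        return int(self.store[k])

    def append(self, k, v):
        # Like APPEND: returns the new length of the string
        self.store[k] = self.store.get(k, '') + v
        return len(self.store[k])

r = MiniRedis()
r.incr('pageviews')        # → 1 (key created on first increment)
r.incr('pageviews', 41)    # → 42
r.append('log', 'hello ')
r.append('log', 'world')   # → 11 (new string length)
```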
12
Pattern: Caching Calls to the DB
• Motivation: quick responses, reduce load on DBMS
• How: keep the statement's results using the Redis STRING data type
import hashlib

def get_results(sql):
    key = hashlib.md5(sql.encode()).digest()
    result = redis.get(key)
    if result is None:
        result = db.execute(sql)
        redis.set(key, result)
        # or use redis.setex to set a TTL for the key
    return result
13
The HASH Data Type
• Acts as a Redis-within-Redis  contains key-value pairs
• Hashes have their own commands: HINCRBY, HINCRBYFLOAT, HLEN,
HKEYS, HVALS…
• Usually used for aggregation, i.e. keeping related data together
for easy fetching/updating (remember that Redis is not a
relational database). Example:
Using separate keys             Using hash aggregation
user:1:id     1                user:1  id     1
user:1:fname  Foo                      fname  Foo
user:1:lname  Bar                      lname  Bar
user:1:email  foo@acme.com             email  foo@acme.com
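The aggregated side maps to HSET/HGETALL; a dict-of-dicts stand-in (runs without a server, hypothetical helpers) shows why one hash fetch replaces four key fetches:

```python
hashes = {}  # stand-in for Redis HASH storage

def hset(key, field, value):
    """Like HSET: set one field inside the hash stored at key."""
    hashes.setdefault(key, {})[field] = value

def hgetall(key):
    """Like HGETALL: fetch every field of the hash in one round trip."""
    return hashes.get(key, {})

# One aggregated hash per user instead of four separate STRING keys
for field, value in [('id', '1'), ('fname', 'Foo'),
                     ('lname', 'Bar'), ('email', 'foo@acme.com')]:
    hset('user:1', field, value)

print(hgetall('user:1'))  # all related fields together
```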
14
Pattern: Avoiding Calls to the DB
• Motivation: server-side storage and sharing of transient data that doesn't need a
full-fledged RDBMS, e.g. sessions and shopping carts
• How: depending on the case, use STRING or HASH to store data in Redis
def add_to_cart(uid, product, quantity):
    if quantity > 0:
        redis.hset('cart:' + uid, product, quantity)
    else:
        redis.hdel('cart:' + uid, product)
    redis.expire('cart:' + uid, CART_TIMEOUT)

def get_cart_contents(uid):
    return redis.hgetall('cart:' + uid)
15
Pattern: Counting Things
• Motivation: statistics, real-time analytics, dashboards, throttling
• How #1: use the *INCR commands
• How #2: use a little bit of BIT*
from datetime import datetime, date

def user_log_login(uid):
    joined = redis.hget('user:' + uid, 'joined')
    d0 = datetime.strptime(joined, '%Y-%m-%d').date()
    d1 = date.today()
    delta = (d1 - d0).days  # bit offset = days since joining
    redis.setbit('user:' + uid + ':logins', delta, 1)

def user_logins_count(uid):
    return redis.bitcount('user:' + uid + ':logins', 0, -1)
16
De-normalization
• Non relational  no foreign keys, no
referential integrity constraints
• Thus, data normalization isn't practical (mostly)
• Be prepared to have duplicated data, e.g.:
> HSET user:1 country Mordor
> HSET user:2 country Mordor
…
• Tradeoff:
• Processing Complexity ↔ Data Volume
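The cost of that tradeoff: updates must touch every duplicated copy. A sketch with an in-memory stand-in (the `rename_country` helper is hypothetical):

```python
users = {
    'user:1': {'country': 'Mordor'},
    'user:2': {'country': 'Mordor'},
    'user:3': {'country': 'Shire'},
}

def rename_country(old, new):
    """Walk every duplicated copy of the field; with real Redis you'd
    SCAN the keys or keep an index SET of affected users."""
    changed = 0
    for fields in users.values():
        if fields.get('country') == old:
            fields['country'] = new
            changed += 1
    return changed

rename_country('Mordor', 'Gondor')  # touches both duplicated copies
```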
17
LISTs
• Lists of strings sorted by insertion
order
• Usually have a head and a tail
• Top n, bottom n, constant length list
operations as well as passing items
from one list to another are
extremely popular and extremely
fast
18
Pattern: Lists of Items
• Motivation: keeping track of a sequence, e.g. last viewed profiles
• How: use Redis' LIST data type
def view_product(uid, product):
    redis.lpush('user:' + uid + ':viewed', product)
    redis.ltrim('user:' + uid + ':viewed', 0, 9)  # keep only the 10 most recent
…
def get_last_viewed_products(uid):
    return redis.lrange('user:' + uid + ':viewed', 0, -1)
19
Pattern: Queues
• Motivation: a producer-consumer use case, asynchronous job
management, e.g. processing photo uploads
def enqueue(queue, item):
    redis.lpush(queue, item)

def dequeue(queue):
    return redis.rpop(queue)  # or use brpop for blocking pop
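One caveat: if a consumer crashes after RPOP, the item is lost. Redis's RPOPLPUSH (BRPOPLPUSH in blocking form) atomically moves the item to a per-worker "processing" list instead. A sketch of those semantics with in-memory deques, so it runs without a server:

```python
from collections import deque

queues = {'jobs': deque(), 'jobs:processing': deque()}

def lpush(q, item):
    queues[q].appendleft(item)

def rpoplpush(src, dst):
    """Like RPOPLPUSH: atomically pop from the tail of src, push to the head of dst."""
    if not queues[src]:
        return None
    item = queues[src].pop()
    queues[dst].appendleft(item)
    return item

lpush('jobs', 'photo1')
lpush('jobs', 'photo2')
job = rpoplpush('jobs', 'jobs:processing')  # → 'photo1' (FIFO order preserved)
# After the worker finishes, LREM removes the item from the processing list;
# if the worker dies first, the item survives in 'jobs:processing' for recovery.
queues['jobs:processing'].remove(job)
```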
20
SETs
• Unordered collection of strings
• ADD, REMOVE or TEST for
membership – O(1)
• Unions, intersections,
differences computed very very
fast
21
Pattern: Searching
• Motivation: finding keys in the database, for example all the users
• How #1: use a LIST to store key names
• How #2: the *SCAN commands
def do_something_with_all_users():
    first = True
    cursor = 0
    while cursor != 0 or first:
        first = False
        cursor, data = redis.scan(cursor, match='user:*')
        do_something(data)
22
Pattern: Indexing
• Motivation: Redis doesn't have indices, you need to maintain
them
• How: the SET data type (a collection of unordered unique
members)
def update_country_idx(country, uid):
    redis.sadd('country:' + country, uid)

def get_users_in_country(country):
    return redis.smembers('country:' + country)
23
Pattern: Relationships
• Motivation: Redis doesn't have foreign keys, you need to maintain
them
> SADD user:1:friends 3 4 5 // Foo is social and makes friends
> SCARD user:1:friends // How many friends does Foo have?
> SINTER user:1:friends user:2:friends // Common friends
> SDIFF user:1:friends user:2:friends // Exclusive friends
> SUNION user:1:friends user:2:friends // All the friends
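The same set algebra, mirrored with plain Python sets (an in-memory stand-in so the snippet runs without a Redis server) to show what each command returns:

```python
friends = {
    'user:1:friends': {'3', '4', '5'},
    'user:2:friends': {'4', '5', '6'},
}

count = len(friends['user:1:friends'])                             # SCARD  → 3
common = friends['user:1:friends'] & friends['user:2:friends']     # SINTER → {'4', '5'}
exclusive = friends['user:1:friends'] - friends['user:2:friends']  # SDIFF  → {'3'}
everyone = friends['user:1:friends'] | friends['user:2:friends']   # SUNION → {'3','4','5','6'}
```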
24
ZSETs (Sorted Sets)
Are just like SETs:
• Members are unique
• ZADD, ZCARD, ZINCRBY, …
ZSET members have a score that's used for sorting
• ZCOUNT, ZRANGE, ZRANGEBYSCORE
When the scores are identical, members are sorted alphabetically
Lexicographical ranges are also supported:
• ZLEXCOUNT, ZRANGEBYLEX
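The score-then-lexicographic ordering can be sketched in plain Python (no server needed; `zrange` here is a hypothetical helper mimicking ZRANGE's inclusive, negative-index ranges):

```python
# member → score, as in a ZSET (members unique, scores drive the ordering)
zset = {'carol': 120.0, 'alice': 99.5, 'bob': 99.5, 'dave': 7.0}

def zrange(z, start, stop):
    """Like ZRANGE: ascending by score, ties broken lexicographically by member."""
    ordered = sorted(z.items(), key=lambda kv: (kv[1], kv[0]))
    members = [m for m, _ in ordered]
    # Redis ranges are inclusive and support negative end indexes
    stop = len(members) + stop + 1 if stop < 0 else stop + 1
    return members[start:stop]

zrange(zset, 0, -1)  # → ['dave', 'alice', 'bob', 'carol'] (alice/bob tie on 99.5)
```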
25
Pattern: Sorting
Motivation: anything that needs to be sorted
How: ZSETs
> ZADD friends_count 3 1 1 2 999 3   (scores = friends counts, members = uids)
> ZREVRANGE friends_count 0 -1
3
1
2
26
The SORT Command
• A command that sorts LISTs, SETs and SORTED SETs
• SORT's syntax is the most complex (comparatively) but SQLers
should feel right at home with it:
• SORT key [BY pattern] [LIMIT offset count]
[GET pattern [GET pattern ...]]
[ASC|DESC] [ALPHA]
[STORE destination]
• SORT is also expensive in terms of complexity  O(N+M*log(M))
• BTW, SORT is perhaps the only ad-hoc-like command in Redis
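The BY clause sorts one key's elements by values stored under other keys; a stand-in sketch of `SORT mylist BY weight_*` (hypothetical `sort_by` helper; real SORT needs ALPHA for non-numeric values):

```python
store = {
    'mylist': ['a', 'b', 'c'],
    'weight_a': '3', 'weight_b': '1', 'weight_c': '2',
}

def sort_by(list_key, pattern):
    """Like SORT key BY pattern: '*' in the pattern is replaced by each element,
    and the value found at that key is used as the sort weight."""
    return sorted(store[list_key],
                  key=lambda e: float(store[pattern.replace('*', e)]))

sort_by('mylist', 'weight_*')  # → ['b', 'c', 'a'] (weights 1, 2, 3)
```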
27
Pattern: Counting Unique Items
• How #1: SADD items and SCARD for the count
• Problem: more unique items  more RAM 
• How #2: the HyperLogLog data structure
> PFADD counter item1 item2 item3 …
HLL is a probabilistic data structure that counts (PFCOUNT) unique
items
Sacrifices accuracy: standard error of 0.81%
Gains: constant complexity and memory – 12KB per counter
Bonus: HLLs are merge-able with PFMERGE
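The two approaches side by side; for the HLL half, an exact Python set stands in (real PFCOUNT is approximate, with the ~0.81% standard error noted above, but uses a fixed 12KB):

```python
# How #1: exact counting with a SET (SADD + SCARD) — memory grows with cardinality
visitors = set()
for ip in ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.3']:
    visitors.add(ip)       # SADD visitors <ip>
exact = len(visitors)      # SCARD → 3 (the duplicate is counted once)

# How #2 (semantics only): PFADD/PFCOUNT would return roughly the same number
# from a fixed 12KB structure, and PFMERGE can combine counters, e.g. per-day
# counters merged into a monthly one.
```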
28
Is Redis ACID? (mostly) Yes!
• Redis is (mostly) single threaded, hence every operation is
o Atomic
o Isolated
• WATCH/MULTI/EXEC allow something like transactions (no rollbacks)
• Server-side Lua scripts ("stored procedures") also behave like
transactions
• Durability is configurable and is a tradeoff between efficiency and
safety
• Stronger replication consistency can be requested with the WAIT command
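WATCH/MULTI/EXEC is optimistic locking: EXEC aborts if a watched key changed since WATCH. The check-and-set semantics, simulated with a version-stamped dict so it runs without a server (helper names are hypothetical):

```python
store = {'balance': ('100', 0)}  # key → (value, version); version bumps on every write

def watch_version(key):
    """Like WATCH: remember the key's state at transaction start."""
    return store[key][1]

def exec_if_unchanged(key, watched_version, new_value):
    """Like MULTI/EXEC after WATCH: apply the write only if no one wrote in between."""
    value, version = store[key]
    if version != watched_version:
        return False  # EXEC returns nil → caller retries the whole transaction
    store[key] = (new_value, version + 1)
    return True

v = watch_version('balance')
# ... another client sneaks in a write before our EXEC:
store['balance'] = ('90', store['balance'][1] + 1)
exec_if_unchanged('balance', v, '50')  # → False: aborted, nothing was written
```

Note there are no rollbacks: the transaction either runs whole or not at all, which is why the retry loop lives in the client.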
30
Wait, There's More!
• There are additional commands that we didn't cover 
• Expiration and eviction policies
• Publish/Subscribe
• Data persistency and durability
• Server-side scripting with Lua
• Master-Slave(s) replication
• High Availability
• Clustering
Modules, huh?
33
Before
• Redis is ubiquitous for fast data, fits lots of cases (Swiss™ Army
knife)
• Some use cases need special care
• Open source has its own agenda
34
After
• Core still fits lots of cases
• Module extensions for special cases
• A new community-driven ecosystem
• “Give power to users to go faster”
35
What are they?
• Dynamically (server-)loaded libraries
• Future-compatible
• Will be (mostly) written in C
• (Almost) As fast as the core
• Planned for public release Q3 2016
36
What can I do with them?
• Process: where the data is at
• Compose: call core & other modules
• Extend: new structures, commands
37
It’s got layers!
• Operational: admin, memory, disk,
replication, arguments, replies…
• High-level: client-like access to core and
modules’ commands
• Low-level: (almost) native access to core
data structures' memory
38
They make everything faster!
39
Make your own!
40
Make your own!
41
Make your own!
42
Learn More
• https://www.youtube.com/watch?v=EglSYFodaqw
Redis in your Big Data
44
Spark Operation w/o Redis
Data Source → (1) Read to RDD → (2) Deserialization → (3) Processing →
(4) Serialization → (5) Write to RDD → (6) Analytics & BI → Data Sink
45
Spark Operation with Redis
Data Source → (1) Processing (Spark SQL & Data Frame, reading filtered/sorted
data through the Spark-Redis connector) → (2) Serving Layer (writing
filtered/sorted data) → Analytics & BI
46
Accelerating Spark Time-Series with Redis
Redis is faster by up to 100 times compared to HDFS
and over 45 times compared to Tachyon or Spark
47
Goal:
• Accelerate Hadoop operation by orders of magnitude:
‒ Phase 1 – use Redis as a caching solution for HBase (Hadoop’s default database)
and HDFS (Hadoop Distributed File System)
‒ Phase 2 – completely replace HBase with Redis
Hadoop is Turbo Charged with Redis
Milestone:
• Demonstrated Hadoop acceleration by
HBase caching with Redis
Real-life scenario:
~500% acceleration
Thank you
Big Data Day LA 2016/ NoSQL track - Using Redis Data Structures to Make Your App Blazing Fast, Adi Foulger, Solutions Architect, Redis Labs

  • 1. Overclock your Data with Redis
  • 2. 2 Who? Me: Adi Foulger, Pro Geek @ Redis Labs, the open source home and provider of enterprise Redis About Redis Labs: • 5300+ paying customers, 35k+ free customers • 118k Redis databases managed • Withstood several datacenter outages with no loss of data for customers who chose HA options • Two main products: Redis Cloud, Redis Labs Enterprise Cluster
  • 3. 3 What? Redis is an open source (BSD licensed), in-memory data structure store, used as database, cache and message broker and much more!
  • 4. Redis… How does it work?
  • 5. 5 REmote DIctionary Server • Data Structure Database • An overly simplistic definition : CREATE TABLE redis ( k VARCHAR(512MB) NOT NULL, v VARCHAR(512MB), PRIMARY KEY (k) ); • 8ish data types, 180+ commands, blazing fast • Created by @antirez (a.k.a Salvatore Sanfilippo) • v1.0 August 9th, 2009 … v3.2 May 6th, 2016 • Source: https://github.com/antirez/redis • Website: http://redis.io
  • 6. 6 Data structures are used by developers like “Lego” building blocks, saving them much coding effort and time Redis : A Data Structure Database Strings Hashes Lists Sets Sorted Sets Bitmaps Hyper- LogLogs Geospatial indexes
  • 7. 7 Why? Because It Is Fun! • Simplicity  rich functionality, great flexibility • Performance  easily serves 100K’s of ops/sec • Lightweight  ~ 2MB footprint • Production proven (name dropping)
  • 8. 8 • about how data is stored • about how data is accessed • about efficiency • about performance • about the network • … • Redis is a database construction kit • Beware of Maslow's "Golden" Gavel/Law of Instrument: "If all you have is a hammer, everything looks like a nail" Redis Makes You Think
  • 9. Okay… so how do I use it?
  • 10. 10 Key Points About Key Names • Key names are "limited" to 512MB (also the values btw) • To conserve RAM & CPU, try avoid using unnecessarily_longish_names_for_your_redis_keys because they are more expensive to store and compare (unlike an RDBMS's column names, key names are saved for each key-value pair) • On the other hand, don't be too stringent (e.g 'u:<uid>:r') • Although not mandatory, the convention is to use colons (':') to separate the parts of the key's name • Your schema is your keys' names so keep them in order
  • 11. 11 STRINGs • Are the most basic data type • Are binary-safe • Is used for storing: Strings (duh) – APPEND, GETRANGE, SETRANGE, STRLEN Integers – INCR, INCRBY, DECR, DECRBY Floats – INCRBYFLOAT Bits – SETBIT, GETBIT, BITPOS, BITCOUNT, BITOP http://xkcd.com/171/
  • 12. 12 Pattern: Caching Calls to the DB • Motivation: quick responses, reduce load on DBMS • How: keep the statement's results using the Redis STRING data type def get_results(sql): hash = md5.new(sql).digest() result = redis.get(hash) if result is None: result = db.execute(sql) redis.set(hash, result) # or use redis.setex to set a TTL for the key return result
  • 13. 13 The HASH Data Type • Acts as a Redis-within-Redis  contains key-value pairs • Have their own commands: HINCRBY, HINCRBYFLOAT, HLEN, HKEYS, HVALS… • Usually used for aggregation, i.e. keeping related data together for easy fetching/updating (remember that Redis is not a relational database). Example: Using separate keys Using hash aggregation user:1:id  1 user:1 id  1 user:1:fname  Foo fname  Foo user:1:lname  Bar lname  Bar user:1:email  foo@acme.com email  foo@acme.com
  • 14. 14 Pattern: Avoiding Calls to the DB • Motivation: server-side storage and sharing of transient data that doesn't need a full-fledged RDBMS, e.g. sessions and shopping carts • How: depending on the case, use STRING or HASH to store data in Redis def add_to_cart(session, product, quantity): if quantity > 0: redis.hset('cart:' + userId, product, quantity) else: redis.hrem('cart:' + userId, product) redis.expire('cart:' + userId,Cart_Timeout) def get_cart_contents(session): return redis.hgetall('cart:' + userId)
  • 15. 15 Pattern: Counting Things • Motivation: statistics, real-time analytics, dashboards, throttling • How #1: use the *INCR commands • How #2: use a little bit of BIT* def user_log_login(uid): joined = redis.hget('user:' + uid, 'joined') d0 = datetime.strptime(joined, '%Y-%m-$d') d1 = datetime.date.today() delta = d1 – d0 redis.setbit('user:' + uid + ':logins', delta, 1) def user_logins_count(uid): return redis.bitcount( 'user:' + uid + ':logins', 0, -1)
• 16. 16 De-normalization • Non-relational → no foreign keys, no referential integrity constraints • Thus, data normalization isn't practical (mostly) • Be prepared to have duplicated data, e.g.: > HSET user:1 country Mordor > HSET user:2 country Mordor … • Tradeoff: Processing Complexity ↔ Data Volume
• 17. 17 LISTs • Lists of strings, sorted by insertion order • Have a head and a tail • Top n, bottom n and constant-length list operations, as well as passing items from one list to another, are extremely popular and extremely fast
• 18. 18 Pattern: Lists of Items • Motivation: keeping track of a sequence, e.g. last viewed profiles • How: use Redis' LIST data type
def view_product(uid, product):
    redis.lpush('user:' + uid + ':viewed', product)
    redis.ltrim('user:' + uid + ':viewed', 0, 9)

def get_last_viewed_products(uid):
    return redis.lrange('user:' + uid + ':viewed', 0, -1)
• 19. 19 Pattern: Queues • Motivation: a producer-consumer use case, asynchronous job management, e.g. processing photo uploads
def enqueue(queue, item):
    redis.lpush(queue, item)

def dequeue(queue):
    return redis.rpop(queue)  # or use brpop for blocking pop
• 20. 20 SETs • Unordered collections of strings • ADD, REMOVE or TEST for membership – O(1) • Unions, intersections and differences are computed very fast
• 21. 21 Pattern: Searching • Motivation: finding keys in the database, for example all the users • How #1: use a LIST to store key names • How #2: the *SCAN commands
def do_something_with_all_users():
    cursor = 0
    while True:
        cursor, data = redis.scan(cursor, match='user:*')
        do_something(data)
        if cursor == 0:
            break
• 22. 22 Pattern: Indexing • Motivation: Redis doesn't have indices, so you need to maintain them yourself • How: the SET data type (a collection of unordered unique members)
def update_country_idx(country, uid):
    redis.sadd('country:' + country, uid)

def get_users_in_country(country):
    return redis.smembers('country:' + country)
• 23. 23 Pattern: Relationships • Motivation: Redis doesn't have foreign keys, you need to maintain them yourself
> SADD user:1:friends 3 4 5              // Foo is social and makes friends
> SCARD user:1:friends                   // How many friends does Foo have?
> SINTER user:1:friends user:2:friends   // Common friends
> SDIFF user:1:friends user:2:friends    // Exclusive friends
> SUNION user:1:friends user:2:friends   // All the friends
  • 24. 24 ZSETs (Sorted Sets) Are just like SETs: • Members are unique • ZADD, ZCARD, ZINCRBY, … ZSET members have a score that's used for sorting • ZCOUNT, ZRANGE, ZRANGEBYSCORE When the scores are identical, members are sorted alphabetically Lexicographical ranges are also supported: • ZLEXCOUNT, ZRANGEBYLEX
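The score-first ordering can be sketched in plain Python. This is a toy model of ZREVRANGE (highest score first, ties broken in reverse lexicographical order), not a Redis client.

```python
# Toy model of ZREVRANGE ordering: members sorted by descending score,
# ties broken reverse-lexicographically. Not a real client.
def zrevrange(zset, start, stop):
    # zset is a dict of member -> score
    ordered = sorted(zset.items(), key=lambda kv: (kv[1], kv[0]),
                     reverse=True)
    members = [member for member, score in ordered]
    stop = len(members) if stop == -1 else stop + 1  # -1 means "to the end"
    return members[start:stop]


# uids as members, friend counts as scores
scores = {'1': 3, '2': 1, '3': 999}
top = zrevrange(scores, 0, -1)  # highest-scored member first
```

A real ZSET maintains this ordering incrementally on every ZADD, which is what makes range queries cheap.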
• 25. 25 Pattern: Sorting • Motivation: anything that needs to be sorted • How: ZSETs (members are uids, scores are friends counts)
> ZADD friends_count 3 1 1 2 999 3
> ZREVRANGE friends_count 0 -1
3
1
2
• 26. 26 The SORT Command • A command that sorts LISTs, SETs and SORTED SETs • SORT's syntax is the most complex (comparatively), but SQLers should feel right at home with it: • SORT key [BY pattern] [LIMIT offset count] [GET pattern [GET pattern ...]] [ASC|DESC] [ALPHA] [STORE destination] • SORT is also expensive in terms of complexity – O(N+M*log(M)) • BTW, SORT is perhaps the only ad-hoc-like command in Redis
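A hedged redis-py sketch of the BY/GET/LIMIT clauses: `r` is assumed to be a connected redis-py client, and all key names here (`user_ids`, `weight_*`, `user:*->name`) are made up for illustration.

```python
# SORT a list of user ids by an external per-id weight key, and return
# each user's name instead of the raw id. `r` is an assumed connected
# redis-py client; the key names are illustrative only.
def top_users_by_weight(r, limit=10):
    return r.sort('user_ids',
                  by='weight_*',        # BY pattern: order ids by weight_<id>
                  get='user:*->name',   # GET pattern: fetch hash field 'name'
                  desc=True,            # DESC
                  start=0, num=limit)   # LIMIT 0 <limit>
```

This mirrors the SQL mindset: BY is the ORDER BY column, GET is the SELECT list, and LIMIT paginates.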
• 27. 27 Pattern: Counting Unique Items • How #1: SADD items and SCARD for the count • Problem: more unique items → more RAM • How #2: the HyperLogLog data structure > PFADD counter item1 item2 item3 … HLL is a probabilistic data structure that counts (PFCOUNT) unique items Sacrifices accuracy: standard error of 0.81% Gains: constant complexity and memory – 12KB per counter Bonus: HLLs are merge-able with PFMERGE
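A hedged redis-py sketch of unique-visitor counting with HyperLogLogs: `r` is an assumed connected redis-py client, and the key names are illustrative.

```python
# Count unique visitors per page with HyperLogLogs, then merge the
# per-page counters for a site-wide estimate. `r` is an assumed
# connected redis-py client; key names are illustrative only.
def track_visit(r, page, visitor_id):
    r.pfadd('visits:' + page, visitor_id)   # PFADD: register one item

def unique_visits(r, page):
    return r.pfcount('visits:' + page)      # PFCOUNT: ~0.81% std error

def unique_visits_total(r, pages):
    keys = ['visits:' + p for p in pages]
    r.pfmerge('visits:all', *keys)          # PFMERGE: union of counters
    return r.pfcount('visits:all')
```

Each counter stays at roughly 12KB no matter how many distinct visitors it sees, which is the whole point versus a SET.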
• 28. 28 Is Redis ACID? (mostly) Yes! • Redis is (mostly) single threaded, hence every operation is o Atomic o Isolated • WATCH/MULTI/EXEC allow something like transactions (no rollbacks) • Server-side Lua scripts ("stored procedures") also behave like transactions • Durability is configurable and is a tradeoff between efficiency and safety • Replication consistency can be tightened with the WAIT command
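The WATCH/MULTI/EXEC pattern can be sketched with redis-py's pipeline. This is a hedged sketch: `r` is an assumed connected redis-py client and the key name is illustrative.

```python
# Optimistic "check-and-set" with WATCH/MULTI/EXEC: atomically double a
# counter only if nobody changed it in between. `r` is an assumed
# connected redis-py client; the key name is illustrative only.
def double_counter(r, key):
    from redis import WatchError  # redis-py's optimistic-lock exception

    with r.pipeline() as pipe:
        while True:
            try:
                pipe.watch(key)            # WATCH: EXEC aborts if key changes
                current = int(pipe.get(key) or 0)
                pipe.multi()               # MULTI: start queuing commands
                pipe.set(key, current * 2)
                pipe.execute()             # EXEC: queued commands run atomically
                return current * 2
            except WatchError:
                continue                   # someone touched the key; retry
```

Note there is no rollback: if EXEC is aborted by WATCH, nothing ran, and the loop simply retries with the fresh value.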
• 29. 30 Wait, There's More! • There are additional commands that we didn't cover • Expiration and eviction policies • Publish/Subscribe • Data persistence and durability • Server-side scripting with Lua • Master-Slave(s) replication • High Availability • Clustering
  • 31. 33 Before • Redis is ubiquitous for fast data, fits lots of cases (Swiss™ Army knife) • Some use cases need special care • Open source has its own agenda
  • 32. 34 After • Core still fits lots of cases • Module extensions for special cases • A new community-driven ecosystem • “Give power to users to go faster”
  • 33. 35 What are they? • Dynamically (server-)loaded libraries • Future-compatible • Will be (mostly) written in C • (Almost) As fast as the core • Planned for public release Q3 2016
  • 34. 36 What can I do with them? • Process: where the data is at • Compose: call core & other modules • Extend: new structures, commands
  • 35. 37 It’s got layers! • Operational: admin, memory, disk, replication, arguments, replies… • High-level: client-like access to core and modules’ commands • Low-level: (almost) native access to core data structures memory
  • 41. Redis in your Big Data
• 42. 44 Spark Operation w/o Redis (steps 1-6): Data Source → Read to RDD → Deserialization → Processing → Serialization → Write to RDD → Data Sink → Analytics & BI
• 43. 45 Spark Operation with Redis: Data Source → Spark-Redis connector reads filtered/sorted data → Processing (Spark SQL & Data Frame) → Spark-Redis connector writes filtered/sorted data → Redis as Serving Layer → Analytics & BI
• 44. 46 Accelerating Spark Time-Series with Redis • Spark with Redis runs time-series queries up to 100 times faster than with HDFS, and over 45 times faster than with Tachyon or Spark's process memory
• 45. 47 Hadoop is Turbo Charged with Redis Goal: • Accelerate Hadoop operation by orders of magnitude: ‒ Phase 1 – use Redis as a caching solution for HBase (Hadoop's default database) and HDFS (the Hadoop Distributed File System) ‒ Phase 2 – completely replace HBase with Redis Milestone: • Demonstrated Hadoop acceleration by HBase caching with Redis Real-life scenario: ~500% acceleration

Editor's Notes

  1. AOF rewrites until the disk is about to be full, then takes a snapshot and starts over. Automatically – no complex configuration needed.
  2. Spark, the new in-memory distributed data processing framework, represents the next generation of big data analytics tools. Spark isn't a database; whenever it needs to process data it has to read it, unfiltered and unordered, from the source, translate it to its internal data structures, RDDs (resilient distributed datasets) or Tungsten, and execute multiple deserialization and serialization steps before processing it. The same applies when Spark needs to save the data to a data sink.
  3. With Redis' data structures exposed to Spark, its operations are tremendously simplified and accelerated. With Redis Labs' Spark-Redis connector, Spark can read only the data it needs for its processing directly from Redis, avoiding copying, serializing and deserializing the data. The same offloading applies when Spark needs to store its processing results back in Redis. And when Redis is used as a serving layer for Spark, it can offload and accelerate Spark even further: whenever an SQL query arrives at Spark (via Spark SQL), it is first examined against the data in Redis, and if the data is found there, it is read directly from Redis and the entire Spark processing cycle is avoided.
  4. The acceleration provided by Redis is easily demonstrated by this benchmark on time-series data. When Redis sorted sets are used to store time-series data (in this case, stock prices for 1,024 stocks over the last 30 years), Spark queries execute 100 times faster than Spark using HDFS and 45 times faster than Spark using Tachyon or just in-process memory.