A High-Level Pass
Through Redis Analytics*
by Josiah Carlson www.dr-josiah.com
@dr_josiah bit.ly/redis-in-action
Agenda
● Quick overview of Redis
● Monthly unique return/churn
○ too much memory method
○ reasonable memory method
○ very low memory method
● Visitor action sequence analytics
○ sequence method
○ low-memory method
● Geographic notifications with partitioning*
Quick Redis overview
● Remote key -> data structure server
○ Strings/integers/bitmaps
○ Lists of strings
○ Sets of unique string members
○ Hashes of key -> value
○ Sorted sets (ZSETs) mapping of member -> score
● Supports
○ Persistence
○ Replication
○ Publish/subscribe
○ Server-side Lua scripting (like a stored procedure)
○ Client-side sharding (server-side in progress)
Monthly unique return/churn
Problem:
● Say that you have millions of monthly visitors
● Need to know monthly churn, expected
~50%
● Don't want to waste too much memory
Monthly unique return/churn
Too much memory:
● Generate UUIDs for users, store in cookie
● Use a HASH mapping from UUIDs to int ids
● Use a HASH mapping from int ids to UUIDs
● Create a ZSET of short ids to timestamp
● Use per-month bitmaps for churn calculation
● Recycle int ids based on old timestamps,
discarding UUIDs and resetting bits
Monthly unique return/churn
Drawbacks:
● Memory use based on size of HASHes and
ZSET (up to about 400 bytes/unique user)
● Second HASH can be thrown away
● The other HASH, ZSET, and bitmaps can be
thrown away and replaced by a "this month"
and "last month" SET (about 120 bytes/user)
● With 63-bit integer UUIDs and sharding
techniques, about 16 bytes/user
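
A minimal sketch of the "this month" / "last month" SET variant. Plain Python sets stand in for the two Redis SETs (in Redis the same steps would roughly map to SADD, SINTER and SCARD). The function names and sample ids are made up for illustration.

```python
# Stand-ins for two Redis SETs; each member is one visitor id.
last_month = set()
this_month = set()

def record_visit(month_set, visitor_id):
    """Roughly SADD month_key visitor_id."""
    month_set.add(visitor_id)

def churn(last, this):
    """Fraction of last month's visitors who did not come back this month."""
    if not last:
        return 0.0
    returning = len(last & this)   # roughly SINTER, then count the result
    return 1.0 - returning / len(last)

for vid in (1, 2, 3, 4):
    record_visit(last_month, vid)
for vid in (3, 4, 5):
    record_visit(this_month, vid)

# Visitors 3 and 4 returned, 1 and 2 did not.
print(churn(last_month, this_month))
```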
Monthly unique return/churn
Reasonable memory solution:
● Store per-month id in a signed cookie (lower 32 bits are the
unique id for the month, next 8 bits are the month)
● One month of bitmap
● If this month cookie, do nothing
● If last month cookie and bit isn't set for that id, mark the
bitmap, generate a new cookie, increment unique and
returning counts
● If last month cookie and bit is set, generate a new
cookie
● If old cookie or no cookie, generate a new cookie,
increment unique count
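
The cookie rules above can be sketched as follows. This is an illustration under assumptions, not the talk's implementation: the signing key, the cookie text format, and the class name are hypothetical, months are numbered with a running counter (wrap-around of the 8-bit field is ignored), and a bytearray stands in for the single Redis bitmap.

```python
import hashlib
import hmac

SECRET = b"change-me"   # hypothetical signing key

def pack(month, uid):
    """Lower 32 bits: per-month id; next 8 bits: month number."""
    return ((month & 0xFF) << 32) | (uid & 0xFFFFFFFF)

def unpack(value):
    return (value >> 32) & 0xFF, value & 0xFFFFFFFF

def sign(value):
    mac = hmac.new(SECRET, str(value).encode(), hashlib.sha256).hexdigest()
    return f"{value}.{mac}"

def verify(cookie):
    """Return the packed value, or None if the cookie is missing or invalid."""
    try:
        raw, mac = cookie.split(".")
        value = int(raw)
    except (AttributeError, ValueError):
        return None
    expected = sign(value).split(".")[1]
    ok = hmac.compare_digest(mac.encode(), expected.encode())
    return value if ok else None

class MonthlyTracker:
    def __init__(self, month):
        self.month = month
        self.next_id = 0
        self.unique = 0
        self.returning = 0
        self.bitmap = bytearray()   # stand-in for one month of Redis bitmap

    def _new_cookie(self):
        cookie = sign(pack(self.month, self.next_id))
        self.next_id += 1
        return cookie

    def _test_and_set(self, uid):
        """Like SETBIT: set the bit and return its previous value."""
        byte, bit = divmod(uid, 8)
        if byte >= len(self.bitmap):
            self.bitmap.extend(bytes(byte + 1 - len(self.bitmap)))
        old = (self.bitmap[byte] >> bit) & 1
        self.bitmap[byte] |= 1 << bit
        return old

    def visit(self, cookie):
        """Apply the rules to one request; return the cookie to send back."""
        value = verify(cookie)
        if value is not None:
            month, uid = unpack(value)
            if month == self.month:
                return cookie                     # this month: do nothing
            if month == self.month - 1:
                if not self._test_and_set(uid):   # first return this month
                    self.unique += 1
                    self.returning += 1
                return self._new_cookie()         # bit already set: new cookie only
        self.unique += 1                          # old, invalid, or no cookie
        return self._new_cookie()
```

Replaying a last-month cookie a second time finds its bit already set, so the counts stay unchanged.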
Monthly unique return/churn
Drawbacks:
● Memory use based on unique monthly
counts, ~1 bit per user (not bad)
● If you push to hundreds of millions/billions of
users, you should shard your bitmaps to
minimize realloc cost on bitmap updates
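
One way to shard a large bitmap is to split the id space into fixed-size chunks, each stored under its own key. A small sketch; the shard size and the key naming are assumptions chosen for illustration.

```python
SHARD_BITS = 1 << 20   # assumed shard size: 2**20 bits = 128 KiB per key

def shard_location(prefix, uid):
    """Map a global id to (key, bit offset) within a fixed-size bitmap shard."""
    shard, offset = divmod(uid, SHARD_BITS)
    return f"{prefix}:{shard}", offset

# The returned key and offset would be the arguments to SETBIT / GETBIT.
print(shard_location("returning:2013-05", 5))
print(shard_location("returning:2013-05", 3 * SHARD_BITS + 7))
```

Each key then grows to at most one shard's worth of bits, so no single update has to extend a very large string.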
Monthly unique return/churn
Very low memory method:
● Store per-month id in a signed cookie
● If this month cookie, do nothing
● If last month cookie, generate a new cookie
for the client, increment unique and returning
counts
● If old cookie or no cookie, generate a new
cookie, increment unique count
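
With no bitmap, the only server-side state is two counters. A sketch assuming the cookie has already been verified and reduced to its month number (None if absent or invalid); the class name is hypothetical and the integers stand in for Redis counters updated with INCR.

```python
class CounterOnlyTracker:
    def __init__(self, month):
        self.month = month
        self.unique = 0      # stand-in for a Redis counter
        self.returning = 0   # stand-in for a Redis counter

    def visit(self, cookie_month):
        """Return the month to store in the visitor's cookie."""
        if cookie_month == self.month:
            return cookie_month          # this month: do nothing
        if cookie_month == self.month - 1:
            self.returning += 1          # last month: a returning visitor
        self.unique += 1                 # last month, older, or no cookie
        return self.month                # caller issues a new signed cookie
```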
Monthly unique return/churn
Drawback:
● If someone sends you duplicate cookies,
hard to detect (keep "recently replaced"
cache, 5-10 minutes worth is likely good
enough)
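
A sketch of a "recently replaced" cache with a 10-minute window. The class is hypothetical and keeps its entries in a Python dict; in Redis, a key per replaced cookie written with SET using the NX and EX options could play a similar role.

```python
import time

class RecentlyReplaced:
    """Remember replaced cookie ids for a short window."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.expires = {}   # cookie id -> expiry time

    def first_time(self, cookie_id):
        """True if this cookie was not replaced within the window; records it."""
        now = self.clock()
        # Drop expired entries so the cache stays small.
        self.expires = {k: t for k, t in self.expires.items() if t > now}
        if cookie_id in self.expires:
            return False
        self.expires[cookie_id] = now + self.ttl
        return True
```

Counts would only be incremented when `first_time` returns True for the incoming last-month cookie.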
Tangent on ZSETs
This slide is a filler so that I can talk about one
of my favorite "get rid of ZSETs" tricks, which
results in significant memory savings for a fairly
large subset of problems
Visitor action sequences
Problem:
● How are my funnels performing?
● These suck: (slide image not recovered)
Visitor action sequences
Sequence method:
● Each user gets a LIST
● All users are recorded in a ZSET with a score based on
time
● Each action/page RPUSHes the action/page to the LIST
● Clean-up/analyze old sequences based on timestamps
in the ZSET
Drawbacks:
● Memory use can be high for active users
● More detailed events can use more memory
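
A sketch of the sequence method with in-memory stand-ins: a dict of Python lists for the per-user LISTs and a dict of timestamps for the ZSET. The class, the `completed_funnel` helper, and the action names are hypothetical.

```python
from collections import defaultdict

class SequenceTracker:
    def __init__(self):
        self.actions = defaultdict(list)   # user -> LIST of actions
        self.last_seen = {}                # ZSET stand-in: user -> timestamp

    def record(self, user, action, now):
        self.actions[user].append(action)  # roughly RPUSH
        self.last_seen[user] = now         # roughly ZADD

    def harvest(self, cutoff):
        """Remove and return the sequences of users last seen before `cutoff`."""
        idle = [u for u, ts in self.last_seen.items() if ts < cutoff]
        done = {}
        for user in idle:
            done[user] = self.actions.pop(user)
            del self.last_seen[user]
        return done

def completed_funnel(sequence, funnel):
    """True if the funnel steps occur in order (not necessarily adjacent)."""
    steps = iter(sequence)
    return all(step in steps for step in funnel)
```

Harvested sequences can then be checked against each funnel, for example `completed_funnel(seq, ["home", "cart", "checkout"])`.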
Visitor action sequences
Low memory method:
● Each user gets a bitmap (limit your unique events)
● All actions are mapped to an index in the bitmap
● When a user performs the action/visits the page, set the
bit and update the ZSET
● Clean up/analyze old bitmaps based on timestamps in
the ZSET
Drawbacks:
● No more strict sequence analysis possible
● Memory use is dominated by ZSET storage
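
The bitmap variant, sketched with a Python int per user as the bitmap. The action-to-index table and the names are assumptions for illustration. Only "did this happen" survives, not the order.

```python
# Hypothetical event table: every tracked action gets a fixed bit index.
ACTIONS = {"home": 0, "search": 1, "cart": 2, "checkout": 3}

class BitmapTracker:
    def __init__(self):
        self.bits = {}        # user -> int used as a small bitmap
        self.last_seen = {}   # ZSET stand-in: user -> timestamp

    def record(self, user, action, now):
        # Roughly SETBIT on the user's bitmap, then ZADD on the shared ZSET.
        self.bits[user] = self.bits.get(user, 0) | (1 << ACTIONS[action])
        self.last_seen[user] = now

    def did_all(self, user, actions):
        """True if the user performed every listed action, in any order."""
        mask = 0
        for action in actions:
            mask |= 1 << ACTIONS[action]
        return (self.bits.get(user, 0) & mask) == mask
```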
Geo Notifications
Problem:
● Want to send events to nearby users
● Don't want users to be notified too often
● Reduce radius of results as notifications rise
● Increase radius of results as notifications fall
● Allow for history to be received on connect
Geo Notifications
● Consider the world as a recursively-divided series of
blocks (highest level as 1x1 degree)
● Clients subscribe to all block levels that their user is in
or is interested in
● When writing an event at point (lat,lon):
○ Add the event id to the ZSET for each block level, down to as deep a partition as you would ever
expect to need
○ Trim the ZSETs along the way based on your desired history
○ Check the resulting size of the ZSETs to determine the highest-level
block that is under your limit
○ Publish the event to a channel based on that level
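
A sketch of the write path, under several assumptions that the slides do not specify: each level halves the block side, history is trimmed by age, and the depth, history window, and limit values are arbitrary. A dict of dicts stands in for the per-block ZSETs, and the key format is made up.

```python
import math

MAX_DEPTH = 4    # assumed deepest partition level
HISTORY = 3600   # assumed seconds of history kept per block
LIMIT = 3        # assumed max events in a block before narrowing

blocks = {}      # block key -> {event_id: timestamp}, a ZSET stand-in

def block_key(lat, lon, level):
    """Level 0 is a 1x1 degree block; each deeper level halves the side."""
    scale = 2 ** level
    return f"geo:{level}:{math.floor(lat * scale)}:{math.floor(lon * scale)}"

def channels_for(lat, lon):
    """Every block level a client at this point would subscribe to."""
    return [block_key(lat, lon, level) for level in range(MAX_DEPTH + 1)]

def write_event(event_id, lat, lon, now):
    """Store the event at every level; return the channel to publish on."""
    channel = None
    for key in channels_for(lat, lon):
        zset = blocks.setdefault(key, {})
        zset[event_id] = now                                  # roughly ZADD
        stale = [e for e, ts in zset.items() if ts < now - HISTORY]
        for old in stale:
            del zset[old]                                     # roughly ZREMRANGEBYSCORE
        if channel is None and len(zset) <= LIMIT:            # roughly ZCARD
            channel = key                                     # largest block within limit
    # If every level is over the limit, fall back to the deepest block.
    return channel or block_key(lat, lon, MAX_DEPTH)
```

A quiet area publishes on the 1x1 degree channel; once that block fills up, events move to smaller blocks, which shrinks the effective radius.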
Geo Notifications
Drawbacks:
● Event id/timestamp information is duplicated
● Large histories may use significant memory
(ZSETs can be replaced by LISTs with
minimal changes)
● Old data in unvisited blocks isn't cleaned
out (can add expiration)
Other questions?
Thank you
@dr_josiah www.dr-josiah.com
bit.ly/redis-in-action

Josiah carlson 2013-05-16 - redis analytics
