Streaming Logs and Processing View Counts using Redis Cluster
Seandon Mooy
(Imgur)
When you browse Imgur, you'll notice that each post shows its number of views. Imgur processes over 3 billion views per month, and we power the view count feature with Redis. In this talk, we cover our current architecture for streaming logs and processing view counts with Redis Cluster, as well as some of the alternatives we explored and why we chose Redis.
14. Challenges with HBase
Roughly 5% of all requests through Thrift were failing... So many tunables!
We optimized timeouts, added circuit breakers, etc.
A trickle of working requests during an outage means circuit breakers are hard to design...
“HBase down == Imgur down”
Downtime == sadtime :(
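Why does a trickle of successes make circuit breakers hard? A toy simulation shows it. This is a hypothetical illustration, not Imgur's code: the class names, thresholds, and the "9 of every 10 calls fail" outage pattern are all assumptions. A breaker that opens after N consecutive failures never opens, because each occasional success resets its counter; a breaker that tracks the failure rate over a sliding window does open.

```python
from collections import deque


class ConsecutiveFailureBreaker:
    """Opens after `threshold` failures in a row; any success resets the count."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.failures = 0
        self.open = False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0  # one lucky success wipes out the failure streak
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.open = True


class FailureRateBreaker:
    """Opens when the failure ratio over the last `window` calls reaches `max_rate`."""

    def __init__(self, window: int, max_rate: float) -> None:
        self.results = deque(maxlen=window)  # True = success, False = failure
        self.max_rate = max_rate
        self.open = False

    def record(self, ok: bool) -> None:
        self.results.append(ok)
        if len(self.results) == self.results.maxlen:
            failure_rate = self.results.count(False) / len(self.results)
            if failure_rate >= self.max_rate:
                self.open = True


# Hypothetical outage: 9 of every 10 calls fail, but every 10th still succeeds.
outage = [i % 10 == 9 for i in range(100)]

consecutive = ConsecutiveFailureBreaker(threshold=10)
windowed = FailureRateBreaker(window=20, max_rate=0.5)
for ok in outage:
    consecutive.record(ok)
    windowed.record(ok)

print(consecutive.open, windowed.open)  # False True
```

Production breakers also need a half-open state that periodically probes for recovery; that part is omitted to keep the contrast visible.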
20. ViewCount V2 - Real time with less complexity!
Fastly -> TCP syslog stream -> Ingest service -> Redis 3.2 cluster!
Ingest service: parses syslog lines, reports metrics via statsd
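The ingest step can be sketched as three small pieces: reassembling lines from a raw TCP byte stream, pulling an image id out of each line, and emitting a statsd counter. This is a minimal sketch under stated assumptions, not Imgur's service: the log-line format, the regular expression, the metric name `ingest.views`, and newline-delimited framing are all hypothetical. The `<name>:<value>|c` counter format is the standard plain-text statsd protocol.

```python
import re
from typing import Iterable, Iterator, Optional

# Hypothetical log format: each line ends with "<METHOD> /<id>.<ext> <status>".
# The real Fastly log format used by Imgur is not given in the talk.
IMAGE_REQUEST = re.compile(r"\b(?:GET|HEAD) /([A-Za-z0-9]+)\.[a-z0-9]+ (\d{3})$")


def split_lines(chunks: Iterable[bytes]) -> Iterator[str]:
    """Reassemble newline-delimited records from arbitrary TCP chunks.

    TCP is a byte stream, not a message protocol: one read may hold half a line
    or several lines, so the trailing partial line is kept until more data arrives.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            yield line.decode("utf-8", errors="replace").rstrip("\r")


def parse_view(line: str) -> Optional[str]:
    """Return the image id for a successful (200) image request, else None."""
    match = IMAGE_REQUEST.search(line)
    if match and match.group(2) == "200":
        return match.group(1)
    return None


def statsd_counter(name: str, value: int = 1) -> str:
    """Format a counter increment in the plain-text statsd protocol."""
    return f"{name}:{value}|c"


# Usage: the first record is split across two TCP reads; the last one is incomplete.
chunks = [
    b"<134>cache-sjc1 fastly: GET /abc123.jpg 2",
    b"00\n<134>cache-sjc1 fastly: GET /xyz789.png 404\n<134>cache-",
]
views = [v for v in map(parse_view, split_lines(chunks)) if v]
print(views)                                       # ['abc123']
print([statsd_counter("ingest.views") for _ in views])  # ['ingest.views:1|c']
```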
22. ViewCount V2 - Real time with less complexity!
Fastly
Ingest service
HBase Backfill service
Internet
API service
27. Things to be aware of:
1. Redis Cluster shard maps: redirections, etc.
Monitor redirections, and gracefully restart workers after shard moves.
2. AOF can slow down or fail large “redis-trib.rb” operations.
Make sure to disable it before and re-enable it after!
3. Not all legacy systems support Redis Cluster, and those that do...
might not support it well (PHP-FPM)!
4. Over-memory-capacity behavior?
Previously we would hard-crash; now we'd LRU-evict old 1-view images.
Neither is good, but for us, one is much less painful.
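Shard maps and redirections come from how Redis Cluster routes keys: every key belongs to one of 16384 hash slots, computed as CRC16 (XMODEM variant) of the key modulo 16384, where only the part inside the first non-empty `{...}` is hashed if present. A node that does not own a slot replies `MOVED <slot> <host>:<port>` (update your slot map) or, during a slot migration, `ASK <slot> <host>:<port>` (retry once on that node after `ASKING`, without updating the map). The sketch below implements the slot formula from the Redis Cluster specification and a parser for those replies; the function names are our own, and counting parsed `MOVED` replies is just one hypothetical way to "monitor redirections".

```python
from typing import Optional, Tuple


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant (poly 0x1021, init 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: bytes) -> int:
    """Map a key to one of 16384 slots, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only what is between the braces
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


def parse_redirect(error: str) -> Optional[Tuple[str, int, str, int]]:
    """Parse 'MOVED <slot> <host>:<port>' or 'ASK ...' into (kind, slot, host, port)."""
    parts = error.lstrip("-").split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None
    host, _, port = parts[2].rpartition(":")
    return parts[0], int(parts[1]), host, int(port)


print(hash_slot(b"123456789"))  # 12739 (CRC16 check value 0x31C3)
print(hash_slot(b"{user1000}.following") == hash_slot(b"{user1000}.followers"))  # True
print(parse_redirect("MOVED 3999 127.0.0.1:6381"))  # ('MOVED', 3999, '127.0.0.1', 6381)
```

A worker that suddenly sees many `MOVED` replies after a `redis-trib.rb` reshard is holding a stale slot map; refreshing the map, or restarting the worker gracefully as above, clears it.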
28. ViewCount V3?
Approaching the point of minimal gains per man-hour, but what else might be fun?
1. Moving PHP7 off the NodeJS API and directly onto Redis Cluster
Downsides: dealing with shard maps is complex in a stateless, process-per-request environment!
2. Using Redis 3.2's BITFIELD or hashes (HSET) to save on key storage costs
Downsides: complicates the system and gives up what limits our “hit-by-a-bus” risk today: keys are just hashes, values are just counts!
3. Dealing with the nature of TCP streams (TCP is not HTTP!)
One connection to rule them all! Node's Cluster module helps, but perhaps Rust or Golang?
Downsides: vertical scaling is non-obvious on EC2
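Item 2 trades clarity for memory. One common form of the hash trick, sketched here under our own assumptions (the bucket count, the `views:<n>` key names, and CRC32 as the bucketing function are all hypothetical), groups many per-image counters into fewer Redis hashes, which Redis can encode compactly as long as each hash stays within its `hash-max-ziplist-entries` / `hash-max-ziplist-value` limits. An in-memory class stands in for `HINCRBY` / `HGET` so the example runs without a server; with redis-py the calls would be `r.hincrby(key, field, 1)` and `r.hget(key, field)`.

```python
import zlib
from typing import Dict, Optional, Tuple

NUM_BUCKETS = 65536  # hypothetical; chosen so each hash stays small


def bucket_for(image_id: str) -> Tuple[str, str]:
    """Map an image id to a (hash key, field) pair. Key names are hypothetical."""
    bucket = zlib.crc32(image_id.encode()) % NUM_BUCKETS
    return f"views:{bucket}", image_id


class FakeHashStore:
    """In-memory stand-in for HINCRBY/HGET so the sketch runs without Redis."""

    def __init__(self) -> None:
        self.hashes: Dict[str, Dict[str, int]] = {}

    def hincrby(self, key: str, field: str, amount: int = 1) -> int:
        fields = self.hashes.setdefault(key, {})
        fields[field] = fields.get(field, 0) + amount
        return fields[field]

    def hget(self, key: str, field: str) -> Optional[int]:
        return self.hashes.get(key, {}).get(field)


store = FakeHashStore()
for image_id in ["abc123", "abc123", "xyz789"]:
    store.hincrby(*bucket_for(image_id))

print(store.hget(*bucket_for("abc123")))  # 2
```

The slide's downside shows up immediately: a view count no longer lives at an obvious key, so anyone debugging it has to know the bucketing function.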
32. ViewCount V2 - Results:
Redis is:
Faster: Imgur response time decreased by ~50 ms
Cheaper: EC2 costs reduced by 75%
Simpler: no Java, no MapReduce, no ZooKeeper, no third parties, just INCR + GET!
More fun: I got to talk at RedisConf17!
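"Just INCR + GET" describes a one-key-per-image data model. The sketch below shows it with a tiny in-memory stand-in that mimics Redis string semantics (INCR treats a missing key as 0 and stores the result as a string; GET returns nothing for a missing key). The `views:<id>` key scheme and helper names are hypothetical; with redis-py the calls would be `r.incr(key)` and `r.get(key)`.

```python
from typing import Dict, Optional


class FakeRedis:
    """Minimal stand-in mimicking Redis string semantics for INCR and GET."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}

    def incr(self, key: str) -> int:
        # INCR treats a missing key as 0 and stores the new value as a string.
        value = int(self.data.get(key, b"0")) + 1
        self.data[key] = str(value).encode()
        return value

    def get(self, key: str) -> Optional[bytes]:
        return self.data.get(key)


def record_view(r: FakeRedis, image_id: str) -> int:
    """Ingest side: one INCR per parsed view (hypothetical key scheme)."""
    return r.incr(f"views:{image_id}")


def view_count(r: FakeRedis, image_id: str) -> int:
    """API side: one GET per lookup; a missing key means zero views."""
    raw = r.get(f"views:{image_id}")
    return int(raw) if raw is not None else 0


r = FakeRedis()
for _ in range(3):
    record_view(r, "abc123")
print(view_count(r, "abc123"), view_count(r, "nope"))  # 3 0
```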