4. How we view caches
Globally available
Eventually-consistent
Ephemeral storage mechanism
Tunable replication
As an optimization for online services or
As primary storage for bulk computation (recommendations, predictions, etc.)
5. EVCache Use @ Netflix
70+ distinct EVCache clusters
Used by nearly 200 applications
Data replicated over 3 AWS regions
Over 1 Million replications per second
65+ Billion objects
30+ Million ops/second (1.8 Trillion+ per day)
160+ Terabytes of data stored
Clusters from 3 to hundreds of instances
12000+ memcached instances of varying size
10. Why Optimize for AWS
● Instances disappear
● Zones disappear
● Regions can disappear (Chaos Kong)
● These do happen (and we test all the time)
● Network can be lossy
○ Throttling
○ Dropped packets
● Customer requests move between regions
11. How we Optimized for AWS
● Multiple copies of data per region
● Clients are local replica aware
● Writes to all local replicas by the client
● Reads are local and retry on other copies
● Replication across regions with a custom replication system
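
The write and read behavior listed above can be sketched roughly as follows. This is a toy model, not the EVCache client API: the `ZoneAwareCache` class, the zone names, and the use of plain dictionaries as replicas are all assumptions made for illustration.

```python
class ZoneAwareCache:
    """Toy client: one in-memory dict stands in for each zone's replica."""

    def __init__(self, replicas, local_zone):
        self.replicas = replicas      # zone name -> dict (hypothetical replica)
        self.local_zone = local_zone
        self.down = set()             # zones simulated as unreachable

    def set(self, key, value):
        # Write to every reachable replica in the region; return the count.
        written = 0
        for zone, store in self.replicas.items():
            if zone not in self.down:
                store[key] = value
                written += 1
        return written

    def get(self, key):
        # Try the local zone first, then retry on the other copies.
        others = [z for z in self.replicas if z != self.local_zone]
        for zone in [self.local_zone] + others:
            if zone in self.down:
                continue
            if key in self.replicas[zone]:
                return self.replicas[zone][key]
        return None


replicas = {"zone-a": {}, "zone-b": {}, "zone-c": {}}
client = ZoneAwareCache(replicas, local_zone="zone-a")
copies_written = client.set("user:42", "profile-blob")
client.down.add("zone-a")             # simulate losing the local zone
value_after_zone_loss = client.get("user:42")
```

Because every write lands on all local replicas, losing one zone in this sketch costs an extra hop on reads rather than a miss.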
12. Reading
[Diagram: Zones A, B, and C. Each zone shows a Client Application containing a Client Library and an EVCache Client.]
13. Writing
[Diagram: Zones A, B, and C. Each zone shows a Client Application containing a Client Library and an EVCache Client.]
14. Use Case: Fronting Services
[Diagram: a Client Application whose Client Library contains an EVCache Client and a Service Client, shown with rows of nodes labeled S and C.]
15. Use Case: As the Data Store
[Diagram: Offline / Nearline Computation (Offline Services) alongside an Online Client Application with a Client Library and an EVCache Client (Online Services).]
16. Use Case: Transient Data Store
[Diagram: three Online Client Applications, each with a Client Library and an EVCache Client.]
19. Cross-Region Replication
Why Replicate?
Maintain duplicate caches in each region
Invalidate stale cache entries in other regions' caches
What do we replicate?
set
delete
Where do we replicate?
One or more other regions, depending on application requirements
20. Cross-Region Replication
● Replicate delete and invalidations on set
○ Usually used for caches with persistent store
○ Entry fetched from persistent store on next cache miss (demand-fill)
● Replicate set
○ Used for offline/nearline computation
○ Commonly no persistent store
○ Duplicate cache in multiple regions
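
The two replication modes described on this slide can be sketched as a small mapping from a local operation to the operation applied in the other region. The `Policy` enum, `remote_op`, and `apply_remote` are hypothetical names for illustration only, and the remote cache is modeled as a dictionary.

```python
from enum import Enum


class Policy(Enum):
    INVALIDATE = "invalidate"        # cache sits in front of a persistent store
    REPLICATE_SET = "replicate_set"  # cache holds the only copy of the data


def remote_op(policy, op, key, value=None):
    """Return the (op, key, value) message to apply in the other region."""
    if op == "delete":
        return ("delete", key, None)
    if op == "set":
        if policy is Policy.INVALIDATE:
            # Do not ship the value: drop the remote entry so the next miss
            # there demand-fills from the persistent store.
            return ("delete", key, None)
        return ("set", key, value)
    raise ValueError(f"unsupported operation: {op}")


def apply_remote(cache, message):
    """Apply a replicated message to the remote region's cache (a dict here)."""
    op, key, value = message
    if op == "set":
        cache[key] = value
    else:
        cache.pop(key, None)


region_b = {"movie:7": "old-rating"}
apply_remote(region_b, remote_op(Policy.INVALIDATE, "set", "movie:7", "new-rating"))
invalidated = "movie:7" not in region_b
apply_remote(region_b, remote_op(Policy.REPLICATE_SET, "set", "recs:42", ["a", "b"]))
```

In this sketch the invalidate mode never carries a value across regions, which suits caches backed by a persistent store; the replicate-set mode carries the value because there may be nowhere else to fetch it from.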
21. Cross-Region Replication
[Diagram: Region A and Region B. Component labels in each region: Application, Client, EVCache, Replication, Repl Writer, Kafka. Step labels: 1 set or delete; 2 send metadata; 3 poll msg; 6 set or delete; 7 read.]
23. Cross-Region Replication
Choices for Underlying Message System
● AWS SQS
○ Message queueing service
○ Reliable and fast (but with occasional spikes in latency)
○ No guaranteed ordering of messages
○ Messages are processed at-least-once and removed from queue
○ Forward a message to multiple queues to process multiple times
○ Cost based on messages, bandwidth, etc.
● Apache Kafka
○ Open-source publish-subscribe system
○ Reliable and fast enough
○ Messages are ordered within partition
○ Allows processing of same message by different applications
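
Why per-partition ordering matters for replication can be shown with a small simulation: if every message for a given cache key is routed to the same partition, the operations on that key are consumed in the order they were produced. This is an in-memory sketch, not Kafka client code; `NUM_PARTITIONS`, `partition_for`, and the use of CRC32 as the hash are assumptions (CRC32 is only a stand-in, not Kafka's own partitioner).

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key):
    # Deterministic hash so the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def publish(log, op, key):
    # Each partition is an append-only list, so order within it is preserved.
    log[partition_for(key)].append((op, key))


def ops_for(log, key):
    # Read back the operations for one key from its single partition.
    return [op for op, k in log[partition_for(key)] if k == key]


log = defaultdict(list)
events = [("set", "a"), ("set", "b"), ("delete", "a"), ("set", "a")]
for op, key in events:
    publish(log, op, key)

ops_for_a = ops_for(log, "a")
```

With a queue that gives no ordering guarantee, the final `set` and the earlier `delete` for key `a` could be applied in either order in the remote region, leaving the two caches in different states.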
44. When to Use Caches
● Predictable response time with varying loads
● Improve throughput
● Reduce server costs
● Store results of idempotent computations
● Fallbacks when service is not responding
● Sharing data across multiple disparate services
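
The "fallback when service is not responding" case from the list above might look like the following sketch. `get_with_fallback`, the stand-in service functions, and the dictionary cache are illustrative assumptions, not part of any EVCache or Netflix API.

```python
def get_with_fallback(key, call_service, cache):
    """Call the service; on failure, serve the last cached value if any."""
    try:
        value = call_service(key)
    except (ConnectionError, TimeoutError):
        # Service is not responding: fall back to whatever the cache holds.
        return cache.get(key)
    cache[key] = value  # remember the latest good response
    return value


def healthy(key):
    # Stand-in for a service call that succeeds.
    return f"fresh:{key}"


def unresponsive(key):
    # Stand-in for a service call that times out.
    raise TimeoutError("service not responding")


cache = {}
first = get_with_fallback("title:1", healthy, cache)
second = get_with_fallback("title:1", unresponsive, cache)
missing = get_with_fallback("title:2", unresponsive, cache)
```

The fallback value may be stale, which is usually an acceptable trade for availability when the cache is treated as eventually consistent; a key that was never cached still yields nothing.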
45. Know Your Limits
There’s probably a limitation in your infrastructure that you don’t know about
CPU & Memory are easy, network is hard
Cascading failures
Editor's notes
duplicate caches in multiple regions - cross region users and offline compute data
invalidations when there is persistent store/db with real value
eventually consistent
general data flow
A small percentage of users have consecutive requests that switch from one region to another