2. Announcements
• A Special Thanks to
  • The OSCON organizers (e.g. Shirley Bailes)
• Other Speakers from Netflix @ OSCON
  • Adrian Cockcroft – Keynote, OSCON Data, Tuesday
  • Daniel Jacobson – API, OSCON, Wednesday
  • Matt McCarthy / Kim Trott – WebKit, OSCON, Friday
@r39132 2
4. Big Data is Boring!
• Our single largest stored data set is our Tomcat logs
  • Hundreds of petabytes
  • Stored on S3
  • Never read or accessed after writing
• Why do we have so much data?
  • It is cumbersome to delete data from S3
• This is Big Data … and boring!
7. Motivation
• Circa late 2008, Netflix had a single data center
  • A single point of failure (a.k.a. SPOF)
  • Approaching limits on cooling, power, space, and traffic capacity
• Alternatives
  • Build more data centers
  • Outsource the majority of our capacity planning and scale-out
    • Allows us to focus on core competencies
12. Device Experience
iPhone
8 Screens of the iPhone App
(From Upper Left to Lower Right):
• Login Screen
• Home Screen
• Genres
• Queue (… loading…)
• Queue
• Video Detail Page
• Video Playback Starting
• Video in Progress
13. Device Experience
iPhone
These Talk to API
• Home Screen
• Genres
• Queue (… loading…)
• Queue
• Video Detail Page
These Talk to NCCP
• Video Playback Starting
• Video in Progress
14. Device Experience
iPhone
Playback is a multi-step process:
Step 1: Authenticate & authorize the user and device (a major over-simplification)
Step 2: Stream the video bits till your ISP cries “mother”
16. The AWS Experience
• We use the following services:
  • Compute (w/ auto-scaling)
    • EC2, ELB, CloudWatch, ASG
  • Queueing
    • SQS; starting to use SES and SNS
  • Persistence
    • SDB & S3 (and minimal EBS)
18. ELB Primer
• An Elastic Load Balancer (ELB) routes traffic to your EC2 instances
  • e.g.: api-apiproxy-frontend-11111111.us-east-1.elb.amazonaws.com
• Netflix maps a CNAME to this ELB
  • e.g.: api.netflix.com (just a guess!)
• Netflix then registers EC2 instances with this ELB, so that the ELB can load-balance traffic across the EC2 instances
• The ELB periodically polls attached EC2 instances on their HTTP port to ensure the instances are healthy. If they are not, no traffic is sent to them
19. ELB Primer : Request Flow
• Client DNS lookups
  • Netflix CNAME → ELB DNS name
  • ELB DNS name → IP address of an ELB node
• Client connects to the ELB node
• The ELB node round-robins to one of your servers
• The response is sent back to the ELB and passed back to the client
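The routing step above can be sketched as a toy model. This is a hypothetical simplification, not ELB's actual implementation: here "health" is a per-instance flag, whereas a real ELB node learns it from the periodic HTTP health checks described on the previous slide.

```python
from itertools import cycle

class ElbNode:
    """Toy model of one ELB node: round-robins across healthy instances."""

    def __init__(self, instances):
        # instances: list of (instance_id, healthy_flag) pairs
        self.health = dict(instances)
        self.ring = cycle(sorted(self.health))

    def route(self):
        # Walk the ring, skipping unhealthy instances; give up after one lap.
        for _ in range(len(self.health)):
            inst = next(self.ring)
            if self.health[inst]:
                return inst
        raise RuntimeError("no healthy instances registered")

elb = ElbNode([("i-01", True), ("i-02", False), ("i-03", True)])
print([elb.route() for _ in range(4)])  # ['i-01', 'i-03', 'i-01', 'i-03']
```

Note how the unhealthy instance i-02 simply receives no traffic, matching the health-check behavior from the previous slide.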
20. ELB Primer : Auto Scaling
• Taking this a bit further:
  • We have CloudWatch monitor EC2 instance CPU
  • We set up a CloudWatch alarm on CPU limits
  • We associate this CloudWatch alarm with an Auto Scaling policy
    • e.g. if CPU > 60% persists for 5 minutes, run policy z (add 3 nodes/zone)
    • e.g. if CPU < 30% persists for 5 minutes, run policy a (remove 1 node/zone)
• Supported metrics include:
  • CPU
  • Disk Read Ops or Disk Write Ops
  • Disk Read Bytes or Disk Write Bytes
  • Network In (bytes) or Network Out (bytes)
21. Event Flow
NCCP → CloudWatch (Alarms) → Auto-Scaling Service (Policies)
• NCCP instances publish data (standard system or custom metrics) to CloudWatch
• CloudWatch alarms trigger ASG policies
• The Auto-Scaling Service adds or removes EC2 instances
22. NCCP Rules
Rule | Description
Scale-Up Event | Average CPU > 60% for 5 minutes
Scale-Down Event | Average CPU < 30% for 5 minutes
Cool-Down Period | 10 minutes
Auto-Scale Alerts | DLAutoScaleEvents
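The rules above can be expressed as a small evaluation function. This is an illustrative re-implementation, not Amazon's actual alarm evaluator; the +3/-1 node deltas are borrowed from the example policies on the Auto Scaling slide.

```python
def scaling_decision(cpu_samples, minutes_since_last_action,
                     up_threshold=60.0, down_threshold=30.0,
                     window=5, cooldown=10):
    """Evaluate the NCCP rules on per-minute average-CPU samples.

    Returns the change in nodes per zone: +3, -1, or 0.
    """
    if minutes_since_last_action < cooldown:
        return 0  # still inside the 10-minute cool-down period
    recent = cpu_samples[-window:]
    if len(recent) < window:
        return 0  # not enough data to cover the 5-minute window
    if all(c > up_threshold for c in recent):
        return +3  # scale-up event: average CPU > 60% for 5 minutes
    if all(c < down_threshold for c in recent):
        return -1  # scale-down event: average CPU < 30% for 5 minutes
    return 0

print(scaling_decision([70, 75, 80, 82, 90], minutes_since_last_action=15))  # 3
print(scaling_decision([70, 75, 80, 82, 90], minutes_since_last_action=5))   # 0 (cooling down)
print(scaling_decision([10, 12, 9, 14, 11], minutes_since_last_action=15))   # -1
```

The cool-down check runs first: without it, a burst of alarms would stack scaling actions before the new capacity has any effect on CPU.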
26. Queuing: SQS
• SQS
  • API for Queue Management
    • CreateQueue
    • ListQueues
    • DeleteQueue
  • API for Message Management
    • SendMessage (up to 64 KB in size)
    • ReceiveMessage (up to 10 messages in a batch)
    • DeleteMessage (a.k.a. ACK message)
    • SetVisibilityTimeout – after this timeout, a message becomes visible to other ReceiveMessage calls
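The visibility-timeout semantics are the subtle part of that API, and can be sketched with a minimal in-memory model. This is a toy, not the real SQS client: a received message is hidden for the timeout and, if never deleted (ACKed), becomes receivable again.

```python
class ToyQueue:
    """Minimal model of SQS visibility-timeout semantics."""

    def __init__(self, visibility_timeout=30):
        self.timeout = visibility_timeout
        self.messages = {}         # id -> body
        self.invisible_until = {}  # id -> clock time when it reappears

    def send(self, msg_id, body):
        self.messages[msg_id] = body

    def receive(self, now):
        for msg_id, body in self.messages.items():
            if self.invisible_until.get(msg_id, 0) <= now:
                self.invisible_until[msg_id] = now + self.timeout
                return msg_id, body
        return None  # nothing visible right now

    def delete(self, msg_id):
        self.messages.pop(msg_id, None)  # a.k.a. ACK

q = ToyQueue(visibility_timeout=30)
q.send("m1", "hello")
print(q.receive(now=0))   # ('m1', 'hello')
print(q.receive(now=10))  # None: m1 is invisible until t=30
print(q.receive(now=31))  # ('m1', 'hello') again: it was never deleted
```

A consumer that crashes before calling delete therefore never loses the message; it just reappears for another worker after the timeout.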
27. Queuing: SQS
• We are happy with SQS
  • Our previous DC-based WebLogic messaging infrastructure did not scale
    • If the message queue grew too large, the message producer needed to drop messages (or store them on local disk)
    • If the producer tried to force the message onto the WebLogic queue, GC pauses would cripple WebLogic
  • SQS has worked well even with >100M-message backlogs
    • As long as you can work through the backlog before any message exceeds 4 days on the queue
28. Messaging Services
• SQS Wish List
  • API for Message Management
    • SendMessage
      • Support batch sends
    • ReceiveMessage
      • Record metrics in CloudWatch on the following events:
        • Empty receive count when the queue is not empty
        • Visibility-timeout expiration count
    • DeleteMessage
      • Support batch deletes
32. Pick a Data Store in the Cloud
During our cloud migration, our initial requirements were:
✓ Hosted
✓ Managed distribution model
✓ Works in AWS
✓ AP from CAP
✓ Handles a majority of use-cases accessing high-growth, high-traffic data
  ✓ Specifically, key access by customer id, movie id, or both
33. Pick a Data Store in the Cloud
• We picked SimpleDB and S3
  • SimpleDB was targeted as the AP equivalent of our RDBMS databases in our data center
  • S3 was used for data sets where item or row data exceeded SimpleDB limits and could be looked up purely by a single key (i.e. no secondary indices or complex query semantics required)
    • Video encodes
    • Streaming-device activity logs (i.e. CLOBs, BLOBs, etc.)
    • Compressed (old) rental history
35. Technology Overview : SimpleDB
Terminology
SimpleDB | Hash Table | Relational Databases
Domain | Hash Table | Table
Item | Entry | Row
Item Name | Key | Mandatory Primary Key
Attribute | Part of the Entry Value | Column
36. Technology Overview : SimpleDB
Soccer Players
Key | Value
ab12ocs12v9 | First Name = Harold, Last Name = Kewell, Nickname = Wizard of Oz, Teams = Leeds United, Liverpool, Galatasaray
b24h3b3403b | First Name = Pavel, Last Name = Nedved, Nickname = Czech Cannon, Teams = Lazio, Juventus
cc89c9dc892 | First Name = Cristiano, Last Name = Ronaldo, Teams = Sporting, Manchester United, Real Madrid

SimpleDB’s salient characteristics:
• SimpleDB offers a range of consistency options
• SimpleDB domains are sparse and schema-less
• The Key and all Attributes are indexed
• Each item must have a unique Key
• An item contains a set of Attributes
• Each Attribute has a name
• Each Attribute has a set of values
• All data is stored as UTF-8 character strings (i.e. no support for types such as numbers or dates)
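Because everything is a UTF-8 string, comparisons in queries are lexicographic, not numeric. A common workaround for string-only stores like SimpleDB is to zero-pad numbers so string order matches numeric order; the width chosen below is an assumption for illustration.

```python
WIDTH = 10  # must be wide enough for the largest value you will ever store

def encode_count(n):
    """Zero-pad a non-negative integer so that lexicographic string
    comparison matches numeric comparison."""
    return str(n).zfill(WIDTH)

# Plain strings sort incorrectly: '9' > '10' lexicographically...
assert sorted(["9", "10"]) == ["10", "9"]
# ...but zero-padded strings sort numerically.
assert sorted([encode_count(9), encode_count(10)]) == [encode_count(9), encode_count(10)]
print(encode_count(42))  # 0000000042
```

Negative numbers need an additional offset before padding so that they, too, sort correctly as strings.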
37. Technology Overview : SimpleDB
What does the API look like?
• Manage Domains
  • CreateDomain
  • DeleteDomain
  • ListDomains
  • DomainMetadata
• Access Data
  • Retrieving data
    • GetAttributes – returns a single item
    • Select – returns multiple items using a SQL-like syntax
  • Writing data
    • PutAttributes – put a single item
    • BatchPutAttributes – put multiple items
  • Removing data
    • DeleteAttributes – delete a single item
    • BatchDeleteAttributes – delete multiple items
38. Technology Overview : SimpleDB
• Options available on reads and writes
  • Consistent Read
    • Reads the most recently committed write
    • May have lower throughput / higher latency / lower availability
  • Conditional Put/Delete
    • i.e. optimistic locking
    • Useful if you want to build a consistent multi-master data store – you will still require your own anti-entropy
    • We do not use this currently, so we don’t know how it performs
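The optimistic-locking pattern behind conditional put can be sketched with an in-memory model. This is not the real SimpleDB client, and the "version" attribute name is our own convention: the write succeeds only if the item's version still matches what the writer last read.

```python
class ToyDomain:
    """In-memory sketch of a conditional put (optimistic locking)."""

    def __init__(self):
        self.items = {}

    def conditional_put(self, key, attrs, expected_version):
        current = self.items.get(key, {})
        if current.get("version") != expected_version:
            return False  # someone else wrote first; caller must re-read and retry
        attrs = dict(attrs, version=(expected_version or 0) + 1)
        self.items[key] = attrs
        return True

d = ToyDomain()
assert d.conditional_put("cust:1", {"plan": "4-at-a-time"}, expected_version=None)
# A writer holding a stale version loses the race:
assert not d.conditional_put("cust:1", {"plan": "1-at-a-time"}, expected_version=None)
# Re-read, then retry with the current version:
assert d.conditional_put("cust:1", {"plan": "1-at-a-time"}, expected_version=1)
```

The failed write is the "best effort" part the later slides mention: the loser must detect the conflict and retry, and fixer jobs clean up anything the retries miss.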
40. Translate RDBMS Concepts to Key-Value Store Concepts
• Relational databases are known for relations
• First, a quick refresher on normal forms
41. Normalization
NF1 : All occurrences of a record type must contain the same number of fields
-- variable repeating fields and groups are not allowed
NF2 : Second normal form is violated when a non-key field is a fact about a
subset of a key
Violated here
Part Warehouse Quantity Warehouse-
Address
Fixed here
Part Warehouse Quantity Warehouse Warehouse-
Address
@r39132 41
42. Normalization
• Issues
  • Wastes storage
    • The warehouse address is repeated for every Part-WH pair
  • Update performance suffers
    • If the address of a warehouse changes, I must update every part in that warehouse – i.e. many rows
  • Data inconsistencies are possible
    • I can update the warehouse address for one Part-WH pair and miss parts for the same WH (a.k.a. an update anomaly)
  • Data loss is possible
    • An empty warehouse does not have a row, so the address will be lost (a.k.a. a deletion anomaly)
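Both anomalies are easy to demonstrate on the slide's Part/Warehouse table; the concrete part names and addresses below are made up for illustration.

```python
# The denormalized table: one row per Part-Warehouse pair, with the
# warehouse address repeated on every row (violates 2NF).
rows = [
    {"part": "bolt", "wh": "WH1", "qty": 100, "wh_addr": "1 Elm St"},
    {"part": "nut",  "wh": "WH1", "qty": 200, "wh_addr": "1 Elm St"},
    {"part": "bolt", "wh": "WH2", "qty": 50,  "wh_addr": "9 Oak Ave"},
]

# Update anomaly: changing WH1's address but missing one row leaves
# two different addresses on record for the same warehouse.
rows[0]["wh_addr"] = "2 Elm St"
addrs = {r["wh_addr"] for r in rows if r["wh"] == "WH1"}
print(sorted(addrs))  # ['1 Elm St', '2 Elm St'] -- inconsistent

# Deletion anomaly: removing WH2's last part loses its address entirely.
rows = [r for r in rows if r["wh"] != "WH2"]
print(any(r["wh"] == "WH2" for r in rows))  # False -- the address is gone
```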
43. Normalization
• RDBMS → KV store migrations can’t simply accept denormalization!
  • Especially many-to-many and many-to-one entity relationships
• Instead, pick your data-set candidates carefully!
  • Keep relational data in the RDBMS
  • Move key-lookups to KV stores
• Luckily for Netflix, most web-scale data is accessed by customer, video, or both
  • i.e. key lookups that do not violate 2NF or 3NF
44. Translate RDBMS Concepts to Key-Value Store Concepts
• Aside from relations, relational databases typically offer the following:
  • Transactions
  • Locks
  • Sequences
  • Triggers
  • Clocks
  • A structured query language (i.e. SQL)
  • Database server-side coding constructs (e.g. PL/SQL)
  • Constraints
45. Translate RDBMS Concepts to Key-Value Store Concepts
• Partial or no SQL support (e.g. no joins, group-bys, etc.)
  • BEST PRACTICE
    • Carry these out in the application layer for smallish data
• No relations between domains
  • BEST PRACTICE
    • Compose relations in the application layer
• No transactions
  • BEST PRACTICE
    • SimpleDB : Conditional Put/Delete (best effort) w/ fixer jobs
    • Cassandra : batch mutate + the same column timestamp for all writes
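"Compose relations in the application layer" means doing the join yourself with key lookups. A minimal sketch, with illustrative field names rather than Netflix's actual schema:

```python
def client_side_join(customers, rentals):
    """Attach the owning customer record to each rental.

    The KV store gives us no joins, so we do per-key lookups
    and stitch the results together in application code.
    """
    joined = []
    for r in rentals:
        cust = customers.get(r["customer_id"])  # key lookup, not a SQL join
        if cust is not None:
            joined.append({**r, "customer_name": cust["name"]})
    return joined

customers = {"c1": {"name": "Ada"}, "c2": {"name": "Linus"}}
rentals = [{"movie_id": "m9", "customer_id": "c1"},
           {"movie_id": "m3", "customer_id": "c2"}]
print(client_side_join(customers, rentals))
```

This works well exactly when the data is keyed the way the slides recommend (by customer, video, or both); it degrades badly for large fan-outs, which is why relational data should stay in the RDBMS.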
46. Translate RDBMS Concepts to Key-Value Store Concepts
• No schema – this is non-obvious: a query for a misspelled attribute name will not fail with an error
  • BEST PRACTICE
    • Implement a schema validator in a common data-access layer
• No sequences
  • BEST PRACTICE
    • Sequences are often used as primary keys
      • In this case, use a naturally occurring unique key
      • If no naturally occurring unique key exists, use a UUID
    • Sequences are also often used for ordering
      • Use a distributed sequence generator or rely on client timestamps
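A schema validator of the kind recommended above can be very small. This is a sketch, and the attribute names in VIDEO_SCHEMA are made up; the point is that the data-access layer rejects the misspelling that the schema-less store would silently accept.

```python
VIDEO_SCHEMA = {"video_id", "title", "rating", "release_year"}

def validate_attrs(attrs, schema=VIDEO_SCHEMA):
    """Raise if any attribute name is not declared in the schema."""
    unknown = set(attrs) - schema
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return attrs

validate_attrs({"video_id": "v1", "title": "Up"})      # passes
try:
    validate_attrs({"video_id": "v1", "tilte": "Up"})  # typo caught
except ValueError as e:
    print(e)  # unknown attributes: ['tilte']
```

Putting this in the one shared data-access layer (rather than each caller) is what makes the check enforceable.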
47. Translate RDBMS Concepts to Key-Value Store Concepts
• No clock operations, PL/SQL, or triggers
  • BEST PRACTICE
    • Clocks : instead, rely on client-generated clocks and run NTP. If using clocks to determine order, be aware that this is problematic over long distances.
    • PL/SQL, Triggers : do without
• No constraints. Specifically:
  • No uniqueness constraints
  • No foreign-key or referential constraints
  • No integrity constraints
  • BEST PRACTICE
    • Applications must implement this functionality
54. Earlier Persistence Requirements
With the cloud migration behind us and global expansion in front of us:
✓ Hosted
✓ Managed distribution model
✓ Works in AWS – we can make it work in AWS
✓ AP from CAP
✓ Handles a majority of use-cases accessing high-growth, high-traffic data
  ✓ Specifically, key access by customer id, movie id, or both
55. Persistence Requirements Revisited
Requirements | SDB | S3 | Cassandra
Auto-Sharding | No | Yes | Yes
Auto-Failover & Failback | Yes | Yes | Yes
Fast | Yes | No | TBD
HA Writes/Reads | No | No | TBD
Cross-Region | No | No | Yes
Exportable for Backup and Recovery | No | Yes | Yes
Works in AWS | Yes | Yes | Yes
Hosted | Yes | Yes | No
Open Source | No | No | Yes
56. Netflix Wants You
Cloud Systems
• Cassandra
• Netflix Platform
• Simian Army (e.g. Chaos Monkey)
API & Discovery Engineering
• Video Discovery
NCCP a.k.a. Streaming Server
• Video Playback
Partner Product Development
• PS3, Android / WebKit, etc.