9. Culprit: B+Tree index
• Good at sequential inserts
  • e.g. ObjectId, sequence number, timestamp
• Poor at random inserts
  • e.g. indexes on randomly-distributed data
10. Sequential vs. Random insert
[Diagram: two B+Trees receiving 12 inserts each. The sequential tree gets keys in order (1, 2, 3, …, 12), so only its rightmost pages are hot; the random tree gets keys like 55, 75, 78, 1, 99, 36, …, so pages all over the tree are touched.]
• Sequential insert ➔ small working set ➔ fits in RAM ➔ sequential I/O (bandwidth-bound)
• Random insert ➔ large working set ➔ cannot fit in RAM ➔ random I/O (IOPS-bound)
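The working-set difference can be illustrated with a toy model of a B+Tree: treat each run of `PAGE_SIZE` consecutive key values as one leaf page and count how many distinct pages the most recent inserts touch. The page size and window here are made-up parameters, not MongoDB internals.

```python
import random

PAGE_SIZE = 100   # keys per leaf page (toy model, not a real B+Tree constant)
WINDOW = 1_000    # how many recent inserts we treat as the "working set"

def working_set_pages(keys, window=WINDOW):
    """Count distinct leaf pages touched by the last `window` inserts."""
    recent = keys[-window:]
    return len({k // PAGE_SIZE for k in recent})

N = 100_000
sequential = list(range(N))                        # monotonic keys (ObjectId-like)
rng = random.Random(42)
randomized = [rng.randrange(N) for _ in range(N)]  # randomly-distributed keys

seq_pages = working_set_pages(sequential)
rnd_pages = working_set_pages(randomized)
print(seq_pages, rnd_pages)  # sequential inserts touch far fewer pages
```

Sequential inserts concentrate on the rightmost pages of the tree, so the hot set stays tiny and cacheable; random inserts scatter across hundreds of pages, which is what turns the workload IOPS-bound once the tree outgrows RAM.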
35. But what if...
[Diagram: instead of one B+Tree with a large working set, the data is partitioned by month into 201208_*, 201209_*, 201210_* — only the current month's B+Tree stays hot.]
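One way to realize the monthly partitioning sketched above is to route each write to a per-month collection. This is only a sketch of the naming scheme; `collection_for` and the `events` base name are hypothetical, and in practice you would pass the result to your driver's collection accessor.

```python
from datetime import datetime, timezone

def collection_for(ts: datetime, base: str = "events") -> str:
    """Route a write to a per-month collection (hypothetical helper).

    Each month gets its own collection and thus its own B+Tree
    indexes; only the current month's tree stays hot, so the
    active working set remains small enough to fit in RAM.
    """
    return f"{base}_{ts:%Y%m}"

print(collection_for(datetime(2012, 8, 15, tzinfo=timezone.utc)))  # events_201208
```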
36. Sequential + hash key
• Can you predict the data growth rate?
• The balancer is not clever enough
  • It only considers the # of chunks
• Migration is slow during heavy writes
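The "sequential + hash" approach combines a time-ordered prefix with a hash bucket. A minimal sketch of what such a compound key value could look like, assuming a hypothetical `shard_key` helper and MD5 as the bucketing hash (MongoDB itself would declare a compound key via `shardCollection` rather than compute it client-side):

```python
import hashlib

def shard_key(month: str, doc_id: str, buckets: int = 256) -> tuple:
    """Build a (sequential prefix, hash bucket) compound key value.

    The month prefix keeps inserts clustered in time, while the
    hash bucket spreads each month's writes across `buckets` ranges.
    Hypothetical helper for illustration only.
    """
    bucket = int(hashlib.md5(doc_id.encode()).hexdigest(), 16) % buckets
    return (month, bucket)

print(shard_key("201210", "user-42"))
```

The slide's caveats still apply: the balancer only counts chunks, so it cannot tell that all current writes land in the newest month's ranges, and migrating those hot chunks during heavy writes is slow.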
38. Low-cardinality hash key
• Shard key range: A ~ D
  • e.g. A~Z, 00~FF
• Alleviates the B+Tree problem
  • Sequential access on a fixed # of parts, each with its own local B+Tree region
[Diagram: inserts A A A B B B C C C clustered by key value within the tree.]
40. Low-cardinality hash key
• Limits the # of possible chunks
  • e.g. 00 ~ FF ➔ at most 256 chunks
• A chunk can grow past 64 MB (the default max chunk size) with no remaining key value to split on
• Balancing becomes difficult
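The chunk-count ceiling follows directly from the key's cardinality: a two-hex-digit prefix has only 256 possible values, so no matter how many documents exist, there can never be more than 256 chunks. A small sketch (the `low_card_prefix` helper and MD5 choice are illustrative assumptions):

```python
import hashlib

def low_card_prefix(doc_id: str) -> str:
    """Two-hex-digit hash prefix: only 256 possible values (00..FF)."""
    return hashlib.md5(doc_id.encode()).hexdigest()[:2]

# Even 100,000 documents collapse onto at most 256 shard-key values,
# so chunk splitting runs out of split points at 256 chunks.
prefixes = {low_card_prefix(f"doc-{i}") for i in range(100_000)}
print(len(prefixes))  # bounded by 256
```

Once every chunk holds a single key value, a chunk that outgrows the maximum size cannot split further, which is why balancing degrades.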
44. Lessons learned
• Know the performance impact of secondary indexes
• Choose the right shard key
• Test with large data sets
• Linear scalability is hard
  • If you really need it, consider HBase or Cassandra
  • SSDs help with random, IOPS-bound I/O