10. Three (of many) Approaches
• Fan out on Read
• Fan out on Write
• Fan out on Write with Bucketing
11. Fan out on read
// Shard on "from"
db.shardCollection( "mongodbdays.inbox", { from: 1 } )
// Make sure we have an index to handle inbox reads
db.inbox.ensureIndex( { to: 1, sent: 1 } )
msg = {
from: "Joe",
to: [ "Bob", "Jane" ],
sent: new Date(),
message: "Hi!",
}
// Send a message
db.inbox.save( msg )
// Read my inbox
db.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )
12. Fan out on read – I/O
Send
Message
Shard 1 Shard 2 Shard 3
13. Fan out on read – I/O
Shard 1 Shard 2 Shard 3
Read Inbox
Send
Message
14. Considerations
• Write: One document per message sent
• Read: Find all messages with my own name in
the recipient field
• Read: Requires scatter-gather on sharded
cluster
• A lot of random I/O on a shard to find everything
15. Fan out on write
// Shard on “recipient” and “sent”
db.shardCollection( "mongodbdays.inbox", { ”recipient”: 1, ”sent”: 1 } )
msg = {
from: "Joe",
to: [ "Bob", "Jane" ],
sent: new Date(),
message: "Hi!",
}
// Send a message
for ( recipient in msg.to ) {
msg.recipient = msg.to[recipient]
db.inbox.save( msg );
}
// Read my inbox
db.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )
16. Fan out on write – I/O
Send
Message
Shard 1 Shard 2 Shard 3
17. Fan out on write – I/O
Read Inbox
Send
Message
Shard 1 Shard 2 Shard 3
18. Considerations
• Write: One document per recipient
• Read: Find all of the messages with me as the
recipient
• Can shard on recipient, so inbox reads hit one
shard
• But still lots of random I/O on the shard
19. Fan out on write with buckets
// Shard on "owner / sequence"
db.shardCollection( "mongodbdays.inbox",
{ owner: 1, sequence: 1 } )
db.shardCollection( "mongodbdays.users", { user_name: 1 } )
msg = {
from: "Joe",
to: [ "Bob", "Jane" ],
sent: new Date(),
message: "Hi!",
}
21. Fan out on write with buckets
• Each “inbox” document is an array of messages
• Append a message onto “inbox” of recipient
• Bucket inboxes so there’s not too many
messages per document
• Can shard on recipient, so inbox reads hit one
shard
• 1 or 2 documents to read the whole inbox
22. Fan out on write with buckets – I/O
Send
Message
Shard 1 Shard 2 Shard 3
23. Fan out on write with buckets – I/O
Shard 1 Shard 2 Shard 3
Read Inbox
Send
Message
26. Design Goals
• Need to retain a limited amount of history e.g.
– Hours, Days, Weeks
– May be legislative requirement (e.g. HIPPA, SOX,
DPA)
• Need to query efficiently by
– match
– ranges
27. 3 (of many) approaches
• Bucket by Number of messages
• Fixed size array
• Bucket by date + TTL collections
28. Bucket by number of messages
db.inbox.find()
{ owner: "Joe", sequence: 25,
messages: [
{ from: "Joe",
to: [ "Bob", "Jane" ],
sent: ISODate("2013-03-01T09:59:42.689Z"),
message: "Hi!"
},
…
] }
// Query with a date range
db.inbox.find ({owner: "friend1",
messages: {
$elemMatch: {sent:{$gte: ISODate("…") }}}})
// Remove elements based on a date
db.inbox.update({owner: "friend1" },
{ $pull: { messages: {
sent: { $gte: ISODate("…") } } } } )
29. Considerations
• Shrinking documents, space can be reclaimed
with
– db.runCommand ( { compact: '<collection>' } )
• Removing the document after the last element in
the array as been removed
– { "_id" : …, "messages" : [ ], "owner" :
"friend1", "sequence" : 0 }
32. TTL Collections
// messages: one doc per user per day
db.inbox.findOne()
{
_id: 1,
to: "Joe",
sequence: ISODate("2013-02-04T00:00:00.392Z"),
messages: [ ]
}
// Auto expires data after 31536000 seconds = 1 year
db.messages.ensureIndex( { sequence: 1 },
{ expireAfterSeconds: 31536000 } )
34. Design Goal
• Application needs to stored a variable number of
attributes e.g.
– User defined Form
– Meta Data tags
• Queries needed
– Equality
– Range based
• Need to be efficient, regardless of the number of
attributes
35. 2 (of many) Approaches
• Attributes as Embedded Document
• Attributes as Objects in an Array
36. Attributes as a sub-document
db.files.insert( { _id: "local.0",
attr: { type: "text", size: 64,
created: ISODate("..." } } )
db.files.insert( { _id: "local.1",
attr: { type: "text", size: 128} } )
db.files.insert( { _id: "mongod",
attr: { type: "binary", size: 256,
created: ISODate("...") } } )
// Need to create an index for each item in the sub-document
db.files.ensureIndex( { "attr.type": 1 } )
db.files.find( { "attr.type": "text"} )
// Can perform range queries
db.files.ensureIndex( { "attr.size": 1 } )
db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )
37. Considerations
• Each attribute needs an Index
• Each time you extend, you add an index
• Lots and lots of indexes
46. Considerations
• Lookup by shard key is routed to 1 shard
• Lookup by other identifier is scatter gathered
across all shards
• Secondary keys cannot have a unique index
49. Considerations
• Lookup to Identities is a routed query
• Lookup to Users is a routed query
• Unique indexes available
• Must do two queries per lookup
51. Summary
• Multiple ways to model a domain problem
• Understand the key uses cases of your app
• Balance between ease of query vs. ease of write
• Reduce random I/O where possible for better
performance