Schema Design - Real world use case

#MongoDBDays

Schema Design
Real World Use Case
Matias Cascallares
Consulting Engineer, MongoDB
matias.cascallares@mongodb.com

Agenda
• Why is schema design important
• A real world use case
– Social Inbox
– History
• Conclusions

Why is Schema Design important?
• Largest factor for a performant system
• Schema design with MongoDB is different
– RDBMS: "What answers do I have?"
– MongoDB: "What question will I have?"

Design Goals
• Efficiently send new messages to recipients
• Efficiently read inbox

3 Approaches (there are more)
• Fan out on Read
• Fan out on Write
• Fan out on Write with Bucketing

Fan out on read
// Shard on "from"
sh.shardCollection( "mongodbdays.inbox", { from: 1 } )
// Make sure we have an index to handle inbox reads
db.inbox.ensureIndex( { to: 1, sent: 1 } )
msg = {
    from: "Matias",
    to: [ "Bob", "Jane" ],
    sent: new Date(),
    message: "Hi!"
}
// Send a message
db.inbox.save( msg )
// Read my inbox
db.inbox.find( { to: "Matias" } ).sort( { sent: -1 } )


Fan out on read – IO
[Diagram: Send Message, shown against Shard 1, Shard 2, Shard 3]

Fan out on read – IO
[Diagram: Read Inbox, shown against Shard 1, Shard 2, Shard 3]

Considerations
• Write: one document per message sent
• Reading my inbox means finding all messages with my own name in the recipient field
• Read: requires scatter-gather on sharded cluster
• Then a lot of random IO on a shard to find everything
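
The considerations above can be sketched with a small in-memory model. This is only an illustration, not MongoDB code: the three arrays stand in for shards, and `shardFor`, `makeCluster`, `send` and `readInbox` are hypothetical helpers. The toy hash is an assumption; real clusters route by ranged or hashed shard keys.

```javascript
// In-memory stand-in for a cluster sharded on "from" (all names are hypothetical).
const SHARD_COUNT = 3;

// Toy routing function: sum of character codes modulo the shard count.
function shardFor(key) {
  let sum = 0;
  for (const ch of key) sum += ch.charCodeAt(0);
  return sum % SHARD_COUNT;
}

function makeCluster() {
  return Array.from({ length: SHARD_COUNT }, () => []);
}

// Fan out on read: one document per message, routed by sender.
function send(cluster, msg) {
  cluster[shardFor(msg.from)].push(msg);
  return 1; // documents written
}

// "to" is not the shard key, so an inbox read has to ask every shard.
function readInbox(cluster, user) {
  let shardsQueried = 0;
  const found = [];
  for (const shard of cluster) {
    shardsQueried += 1;
    for (const msg of shard) {
      if (msg.to.includes(user)) found.push(msg);
    }
  }
  found.sort((a, b) => b.sent - a.sent); // newest first
  return { messages: found, shardsQueried };
}
```

The write is cheap (one document), but `readInbox` always reports `shardsQueried === SHARD_COUNT`, which is the scatter-gather cost named above.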

Fan out on write
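// Shard on "recipient" and "sent"
sh.shardCollection( "mongodbdays.inbox", { recipient: 1, sent: 1 } )
msg = {
    from: "Matias",
    to: [ "Bob", "Jane" ],
    sent: new Date(),
    message: "Hi!"
}

// Send a message: one copy per recipient
// (forEach iterates the names; for...in would iterate the array indexes)
msg.to.forEach( function( recipient ) {
    // Build a fresh document each time so every copy gets its own _id
    db.inbox.save( {
        from: msg.from,
        to: msg.to,
        sent: msg.sent,
        message: msg.message,
        recipient: recipient
    } );
} );
// Read my inbox
db.inbox.find( { recipient: "Matias" } ).sort( { sent: -1 } )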


Fan out on write – IO
[Diagram: Send Message, shown against Shard 1, Shard 2, Shard 3]

Fan out on write – IO
[Diagram: Read Inbox, shown against Shard 1, Shard 2, Shard 3]

Considerations
• Write: one document per recipient
• Reading my inbox is just finding all of the messages with me as the recipient
• Can shard on recipient, so inbox reads hit one shard
• But still lots of random IO on the shard
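
A minimal in-memory sketch of the trade-off listed above, again with hypothetical helpers (`shardFor`, `makeCluster`, `send`, `readInbox`) and a toy hash that is an assumption, not how MongoDB routes documents. It shows the write cost growing with the number of recipients while the read touches a single shard.

```javascript
// In-memory stand-in for a cluster sharded on "recipient" (all names are hypothetical).
const SHARD_COUNT = 3;

// Toy routing function: sum of character codes modulo the shard count.
function shardFor(key) {
  let sum = 0;
  for (const ch of key) sum += ch.charCodeAt(0);
  return sum % SHARD_COUNT;
}

function makeCluster() {
  return Array.from({ length: SHARD_COUNT }, () => []);
}

// Fan out on write: one copy of the message per recipient, routed by recipient.
function send(cluster, msg) {
  let written = 0;
  for (const recipient of msg.to) {
    const copy = { ...msg, recipient }; // fresh object per recipient
    cluster[shardFor(recipient)].push(copy);
    written += 1;
  }
  return written; // documents written
}

// The recipient is the routing key, so only one shard is consulted.
function readInbox(cluster, user) {
  const messages = cluster[shardFor(user)]
    .filter((m) => m.recipient === user)
    .sort((a, b) => b.sent - a.sent); // newest first
  return { messages, shardsQueried: 1 };
}
```

Within that one shard the copies are still separate documents, which is where the random IO in the last bullet comes from.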

Fan out on write with buckets
// Shard on "owner / sequence"
sh.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )
sh.shardCollection( "mongodbdays.users", { user_name: 1 } )

msg = {
    from: "Matias",
    to: [ "Bob", "Jane" ],
    sent: new Date(),
    message: "Hi!"
}

// Send a message
// (forEach iterates the names; for...in would iterate the array indexes)
msg.to.forEach( function( recipient ) {
    // Atomically bump the recipient's message counter and read the new value
    var count = db.users.findAndModify({
        query: { user_name: recipient },
        update: { $inc: { msg_count: 1 } },
        upsert: true,
        new: true }).msg_count;
    // Start a new bucket every 50 messages
    var sequence = Math.floor(count / 50);
    db.inbox.update(
        { owner: recipient, sequence: sequence },
        { $push: { messages: msg } },
        { upsert: true }
    );
} );
// Read my inbox
db.inbox.find( { owner: "Matias" } ).sort( { sequence: -1 } ).limit( 2 )


• Each “inbox” document is an array of messages
• Append a message onto “inbox” of recipient
• Bucket inboxes so there are not too many messages per document
• Can shard on recipient, so inbox reads hit one shard
• 1 or 2 documents to read the whole inbox
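
The bucketing arithmetic can be tried without a database. The sketch below is an in-memory imitation, assuming a bucket size of 50 as in the slide; `makeStore`, `deliver` and `readInbox` are hypothetical names, and the Maps stand in for the `users` and `inbox` collections.

```javascript
const BUCKET_SIZE = 50; // same constant as the slide's Math.floor(count / 50)

function makeStore() {
  return { counts: new Map(), buckets: new Map() };
}

// Imitates the send loop body for one recipient.
function deliver(store, recipient, msg) {
  // Like findAndModify with $inc and new: true, the counter is read after the increment.
  const count = (store.counts.get(recipient) || 0) + 1;
  store.counts.set(recipient, count);

  const sequence = Math.floor(count / BUCKET_SIZE);
  const key = `${recipient}/${sequence}`;
  if (!store.buckets.has(key)) {
    // Like the upsert: create the bucket on first use.
    store.buckets.set(key, { owner: recipient, sequence, messages: [] });
  }
  store.buckets.get(key).messages.push(msg); // like $push
  return sequence;
}

// Imitates find({ owner }).sort({ sequence: -1 }).limit(2).
function readInbox(store, owner, limit = 2) {
  return [...store.buckets.values()]
    .filter((b) => b.owner === owner)
    .sort((a, b) => b.sequence - a.sequence)
    .slice(0, limit);
}
```

Because the counter is read after the increment, it starts at 1, so with this arithmetic bucket 0 fills with 49 messages and every later bucket with 50.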

Fan out on write with buckets - IO
[Diagram: Send Message, shown against Shard 1, Shard 2, Shard 3]

Fan out on write with buckets - IO
[Diagram: Read Inbox, shown against Shard 1, Shard 2, Shard 3]

Design Goals
Need to retain a limited amount of history, e.g.
– Number of items
– Hours, Days, Weeks
– May be a legislative requirement (e.g. HIPAA, SOX, DPA)

Need to query efficiently by
– match
– ranges

3 Approaches (there are more)
• Bucket by number of messages
• Fixed size array
• Bucket by date + TTL Collections

Bucket by number of messages
db.inbox.find()
{ owner: "Matias", sequence: 25,
  messages: [
    { from: "Matias",
      to: [ "Bob", "Jane" ],
      sent: ISODate("2013-03-01T09:59:42.689Z"),
      message: "Hi!"
    },
    …
  ]}
// Query with a date range
db.inbox.find({ owner: "Matias",
                messages: {
                  $elemMatch: { sent: { $gt: ISODate("…") } } } })
// Remove elements based on a date
// (multi: true so every bucket of this owner is pruned, not only the first match)
db.inbox.update({ owner: "Matias" },
                { $pull: { messages: {
                    sent: { $lt: ISODate("…") } } } },
                { multi: true })

Considerations
• Shrinking documents: space can be reclaimed with
– db.runCommand ( { compact: '<collection>' } )
• Remove the document after the last element in the array has been removed
– { "_id" : …, "messages" : [ ], "owner" : "Bob", "sequence" : 0 }
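
A rough in-memory sketch of the clean-up described above: drop old messages from each bucket, then discard buckets left empty. `pruneOlderThan` is a hypothetical helper working on plain arrays; it imitates the effect of the `$pull` followed by removing empty documents, and is not a MongoDB call.

```javascript
// buckets: array of { owner, sequence, messages: [{ sent: Date, ... }] }
// Returns a new array; the input is not modified.
function pruneOlderThan(buckets, owner, cutoff) {
  const kept = [];
  for (const bucket of buckets) {
    if (bucket.owner !== owner) {
      kept.push(bucket); // other users' buckets are untouched
      continue;
    }
    // Same effect as $pull with { sent: { $lt: cutoff } }: keep sent >= cutoff.
    const messages = bucket.messages.filter((m) => m.sent >= cutoff);
    if (messages.length > 0) {
      kept.push({ ...bucket, messages });
    }
    // A bucket with no messages left is dropped instead of kept as an empty document.
  }
  return kept;
}
```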

Maintain the latest – Fixed size array
msg = {
    from: "Your Boss",
    to: [ "Bob" ],
    sent: new Date(),
    message: "CALL ME NOW!"
}

// 2.4 introduces $each, $sort and $slice modifiers for $push
db.messages.update(
    { _id: 1 },
    { $push: { messages: { $each: [ msg ],
                           $sort: { sent: 1 },
                           $slice: -50
                         }
             }
    }
)
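
To see what the three modifiers do together, here is a plain JavaScript imitation on an ordinary array. `pushCapped` is a hypothetical helper, not part of any MongoDB API: it appends, sorts ascending by `sent`, and keeps only the last `maxSize` elements.

```javascript
// Imitates $push with $each, $sort: { sent: 1 } and $slice: -maxSize.
function pushCapped(messages, newMessages, maxSize) {
  const combined = messages.concat(newMessages); // $each: append every new item
  combined.sort((a, b) => a.sent - b.sent);      // $sort: oldest first
  // $slice with a negative value keeps the tail; a size of 0 empties the array.
  return maxSize > 0 ? combined.slice(-maxSize) : [];
}
```

Sorting before slicing is what makes the oldest messages fall off, even when a message arrives out of order.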


Considerations
• Need to compute the size of the array based on retention period
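
One possible way to do that computation, assuming a roughly steady message rate per user (an assumption; bursty traffic would need headroom). `arraySizeFor` is a hypothetical helper.

```javascript
// Estimated number of array slots needed to cover a retention period.
function arraySizeFor(messagesPerDay, retentionDays) {
  return Math.ceil(messagesPerDay * retentionDays);
}
```

For example, 5 messages a day kept for 10 days gives 50, the value used with `$slice: -50` in the slide.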

TTL Collections
// One document per user per day
db.inbox.findOne()
{
    _id: 1,
    to: "Joe",
    sequence: ISODate("2013-02-04T00:00:00.392Z"),
    messages: [ ]
}
// Auto expires data after 31536000 seconds = 1 year
db.inbox.ensureIndex( { sequence: 1 },
                      { expireAfterSeconds: 31536000 } )
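
A small sketch of the date arithmetic behind this approach. `dayBucket` and `isExpired` are hypothetical helpers; truncating to midnight UTC is an assumption about how the per-day `sequence` value would be derived. `isExpired` only mirrors the TTL rule (indexed date plus `expireAfterSeconds` has passed); in MongoDB the actual deletion is done by a periodic background task, so documents can outlive their expiry time slightly.

```javascript
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60; // 31536000, as in the slide

// Truncate a timestamp to the start of its UTC day, to use as the bucket key.
function dayBucket(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

// True once the bucket date plus the expiry interval has been reached.
function isExpired(sequence, now, expireAfterSeconds = ONE_YEAR_SECONDS) {
  return now.getTime() - sequence.getTime() >= expireAfterSeconds * 1000;
}
```

Because expiry is measured from the bucket's date, a message sent late in the day is kept for slightly less than a full year after it was sent.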


Summary
• Multiple ways to model a domain problem
• Understand the key use cases of your app
• Balance between ease of query vs. ease of write
• Random IO should be avoided
• Scatter/gather should be avoided

Thank You