8. MongoDB Philosophy
• Reduce transactional semantics for performance
• Non-relational is the best way to scale horizontally
mongodb.org
9. MongoDB Features
• JSON-style documents
• Index on any attribute
• Rich queries
• In-place update
• Auto-sharding
• Map / Reduce
• GridFS to store files
• Server-side JavaScript
• Capped collections
• Full-text search (coming soon)
10. MongoDB's flexible data structure, ability to index & query data, and auto-sharding make it a strong tool that adapts well to change. It also helps reduce complexity compared to a traditional RDBMS.
11. Why does MongoDB reduce complexity?
• Get rid of migrations
• Get rid of relationships (most of them)
• Reduce number of database requests
• JSON (client, server, and database)
12. Get rid of migrations
• No create table
• No alter column
• No add column
• No change column
13. Get rid of relationships
• Many one-to-one and one-to-many relationships are not necessary
• User :has_one :setting
• User :has_many :addresses
• User :has_many :roles
• Post :has_many :tags
14. Reduce number of database requests
• Pre-joined
• Rich queries
• Atomic, in-place updates
16. Adapt to changes
• Changes in schema
• Changes in data & algorithms
• Changes for performance & scaling
17. Changes in schema
• In modern apps, the schema changes quite often (weekly, monthly ...)
• ALTER TABLE is expensive in an RDBMS
• Dynamic-schema documents make those changes seamless
18. Changes in data & algorithms
• Atomic, in-place updates are very powerful for modifying data
  $inc, $set, $unset, $push, $pop, $rename, $bit
• Rich queries and aggregators
  $in, $all, $exists, $size, $type, regexp
  count(), size(), distinct(), min(), max()
• Map/Reduce
19. Changes for performance & scaling
• Very fast & ready to scale =>
• Don’t have to use additional tools (memcached ...)
• Don’t have to change platforms
20. Case Studies
• Store crawled info as embedded documents
• Product listing
• Find unique slug
• Voting
21. Store crawled info as embedded documents
• Data from 3rd party sources
• Sources and data formats may change in the future
22. Store crawled info as embedded documents
product = {
  "_id" : ObjectId("4d8ace4b0dc3e43231bb930d"),
  "name" : "Product ABC",
  "amazon" : {
    "asin" : ...,
    "price" : ...,
    ....
  }
};
25. Store crawled info as embedded documents
def Product.find_by_asin(asin)
  # Dot notation reaches into the embedded "amazon" document
  Product.where('amazon.asin' => asin).first
end
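To make the dot-notation lookup concrete, here is a minimal plain-Ruby sketch that mimics it on in-memory hashes. No database is involved; `SAMPLE_PRODUCTS`, `dig_path` and the ASIN values are made up for illustration, and only the document shape follows the slides.

```ruby
# Made-up sample documents; only the shape (a nested "amazon" document)
# follows the slides.
SAMPLE_PRODUCTS = [
  { 'name' => 'Product ABC', 'amazon' => { 'asin' => 'ASIN-1', 'price' => 10 } },
  { 'name' => 'Product XYZ', 'amazon' => { 'asin' => 'ASIN-2', 'price' => 20 } },
  { 'name' => 'No Amazon data yet' }
].freeze

# Hypothetical helper: resolve a dotted path such as "amazon.asin"
# against a nested Hash. Returns nil when a key along the path is missing.
def dig_path(doc, path)
  doc.dig(*path.split('.'))
end

# In-memory stand-in for Product.where('amazon.asin' => asin).first
def find_by_asin(products, asin)
  products.find { |product| dig_path(product, 'amazon.asin') == asin }
end

match = find_by_asin(SAMPLE_PRODUCTS, 'ASIN-2')
puts match['name'] # prints "Product XYZ"
```

A product without an "amazon" document is simply skipped, which is the same reason adding a new source later needs no migration.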
26. Product listing
• A product can be listed in multiple categories in certain months
27. Product listing
• Need an extra table to express which product is listed in which category and in which month
product_id  category_id  month
1           2            2011-03
1           2            2011-04
SQL
28. Product listing
• To query products listed in category 2 and month '2011-04'
Product.joins(:listings).where('category_id = ? AND month = ?', 2, '2011-04')
SQL
33. Find unique slug
• book1 = #<Book id: .., title => “Ruby”, ... >
• book2 = #<Book id: .., title => “Ruby”, ... >
• book2.uniq_slug => /books/ruby-1
• Need n queries to find a unique slug
def uniq_slug
  slug = original_slug = title.to_slug
  counter = 0
  # One query per candidate slug until a free one is found
  while (self.class.where(:slug => slug).count > 0)
    counter += 1
    slug = "#{original_slug}-#{counter}"
  end
  slug
end
SQL
34. Find unique slug
• Need one query using regexp matching
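def find_uniq_slug
  original_slug = title.to_slug
  # Matches the original slug and its numbered variants (ruby, ruby-1, ...)
  slug_pattern = /^#{original_slug}(-\d+)?$/
  # Caution: the sort is lexicographic, so "ruby-9" sorts after "ruby-10"
  book = self.class.where(:slug => slug_pattern).
    order(:slug.desc).limit(1).first
  if book
    # The bare slug has no counter suffix, so it counts as 0
    max_counter = book.slug[/-(\d+)$/, 1].to_i
    "#{original_slug}-#{max_counter + 1}"
  else
    original_slug
  end
end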
db.books.ensureIndex({"slug" : -1 })
Mongo
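The "find the max counter, add one" step can be checked without a database. The sketch below is a plain-Ruby illustration, not the code from the talk: `next_uniq_slug` is a hypothetical helper that takes the slugs already in use as an array and compares counters numerically, which also avoids the string-ordering problem once counters reach two digits.

```ruby
# Hypothetical helper: given the base slug and the slugs already taken,
# return the next free slug.
def next_uniq_slug(original_slug, existing_slugs)
  # Matches "ruby" and "ruby-<n>", but not "ruby-on-rails"
  pattern = /\A#{Regexp.escape(original_slug)}(?:-(\d+))?\z/

  # Collect the counter of every matching slug; the bare slug counts as 0
  # because its capture group is nil and nil.to_i == 0.
  counters = existing_slugs.map do |slug|
    match = pattern.match(slug)
    match && match[1].to_i
  end.compact

  counters.empty? ? original_slug : "#{original_slug}-#{counters.max + 1}"
end

puts next_uniq_slug('ruby', [])                 # prints "ruby"
puts next_uniq_slug('ruby', ['ruby'])           # prints "ruby-1"
puts next_uniq_slug('ruby', %w[ruby ruby-9 ruby-10]) # prints "ruby-11"
```

In a real application the array would come from the single regexp query, so the request count stays at one.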
35. Voting
• A user can vote on each post only once
• Up / down votes have different points
• Cache votes_count and votes_point in the post for sorting and querying
• Post.max(:votes_point)
• Post.order_by(:votes_count.desc)
38. Voting
def vote(user_id, post_id, value)
  # Validate (request 1)
  not_voted = Vote.where(:user_id => user_id,
                         :post_id => post_id).count == 0
  if not_voted
    # Create a new vote (request 2)
    Vote.create(
      :user_id => user_id,
      :post_id => post_id,
      :value => value
    )
    # Get post (request 3)
    post = Post.find(post_id)
    # Update votes_point & votes_count (request 4)
    post.votes_point += POINT[value]
    post.votes_count += 1
    post.save
  end
end
SQL: 4 requests
40. Voting
def unvote(user_id, post_id)
  # Get current vote (request 1)
  vote = Vote.where(:user_id => user_id,
                    :post_id => post_id).first
  # Check if voted
  if vote
    # Destroy vote (request 2)
    vote.destroy
    # Get post (request 3)
    post = Post.find(post_id)
    # Update votes_point & votes_count (request 4)
    post.votes_point -= POINT[vote.value]
    post.votes_count -= 1
    post.save
  end
end
SQL: 4 requests
41. Voting
• Embed votes data in the post
• Use arrays to store who voted up and who voted down
post = {
  "_id" : ObjectId("4d8ace4b0dc3e43231bb930d"),
  "title" : "Post ABC",
  ....
  "votes" : {
    "up" : [ user_id_1 ],
    "down" : [ user_id_2 ],
    "count" : 2,
    "point" : -1
  }
};
Mongo
43. Voting
def vote(user_id, post_id, value)
  # Find the post with post_id that was not up voted or down voted by user_id
  query = {
    '_id' => post_id,
    'votes.up' => { '$ne' => user_id },
    'votes.down' => { '$ne' => user_id }
  }
  # Push user_id to votes.up if voting up or to votes.down if voting down,
  # and update votes.point and votes.count
  update = {
    '$push' => {
      (value == :up ? 'votes.up' : 'votes.down') => user_id
    },
    '$inc' => {
      'votes.point' => POINT[value],
      'votes.count' => +1
    }
  }
  # Validate, update and get result
  post = Post.collection.find_and_modify(
    :query => query,
    :update => update,
    :new => true # return the post after updating votes data
  )
end
Mongo: one request
45. Voting
# value (:up or :down) is the vote being removed. The original slide used
# it without defining it, so passing it in as a parameter is an assumption.
def unvote(user_id, post_id, value)
  # Find the post with post_id that was up voted or down voted by user_id
  query = {
    '_id' => post_id,
    # $or takes an array of conditions
    '$or' => [{ 'votes.up' => user_id }, { 'votes.down' => user_id }]
  }
  # Pull user_id from both votes.up and votes.down,
  # and update votes.point and votes.count
  update = {
    '$pull' => {
      'votes.up' => user_id,
      'votes.down' => user_id
    },
    '$inc' => {
      'votes.point' => -POINT[value],
      'votes.count' => -1
    }
  }
  # Validate, update and get result
  post = Post.collection.find_and_modify(
    :query => query,
    :update => update,
    :new => true # return the post after updating votes data
  )
end
Mongo: one request
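To see the logic that the query and update documents encode, here is a plain-Ruby model of the embedded votes document. It is only a sketch under stated assumptions: it runs on an in-memory Hash, so it shows the "match, then modify" rules but not the atomicity, which in the real case comes from the database doing both in one operation. The `POINT` values are the +2 / -1 example mentioned in the talk.

```ruby
# Example point values: +2 for an up vote, -1 for a down vote.
POINT = { up: 2, down: -1 }.freeze

# Returns the updated post, or nil when the user has already voted
# (the same guard as the '$ne' conditions in the query document).
def vote(post, user_id, value)
  votes = post['votes']
  return nil if votes['up'].include?(user_id) || votes['down'].include?(user_id)

  votes[value.to_s] << user_id          # what '$push' does
  votes['point'] += POINT.fetch(value)  # what '$inc' does
  votes['count'] += 1
  post
end

# Returns the updated post, or nil when the user has not voted.
def unvote(post, user_id)
  votes = post['votes']
  side = %w[up down].find { |key| votes[key].include?(user_id) }
  return nil unless side

  votes[side].delete(user_id)                 # what '$pull' does
  votes['point'] -= POINT.fetch(side.to_sym)  # '$inc' with a negated value
  votes['count'] -= 1
  post
end

post = { 'title' => 'Post ABC',
         'votes' => { 'up' => [], 'down' => [], 'count' => 0, 'point' => 0 } }
vote(post, 1, :up)
vote(post, 1, :down) # ignored: user 1 has already voted
vote(post, 2, :down)
puts post['votes']['count'] # prints 2
puts post['votes']['point'] # prints 1
```

Unlike the database version, this model looks up which array holds the user before adjusting the points, so the caller does not need to know the original vote value.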
46. Voting
• For a complete solution:
• gem install voteable_mongoid
• visit https://github.com/vinova/voteable_mongoid
Hi everyone. It’s my pleasure to be here today. I’m going to talk about MongoDB, one of the most popular NoSQL databases.
Hi, my name is Alex. I’m a co-founder at Vinova. We are a Ruby on Rails and mobile app development shop in Singapore. We’ve been doing Rails for 5 years. We are growing and looking for projects. If you need our expertise, feel free to contact us.
I love SQL. I’ve done a lot of projects using MySQL, PostgreSQL ... I just found a better tool.
What’s MongoDB? MongoDB is an open-source, document-oriented database that wants to be the best database for web apps (not for everything).
Document-oriented is like this. Think of a document as a Hash in Ruby or an Object in JavaScript. You can store anything in a document: ids, strings, numbers, arrays, and other documents (embedded documents).
In a relational database, we have tables and rows. In MongoDB, we have collections and documents. You can think of collections as tables and documents as rows.
MongoDB tries to be as fast and scalable as key/value stores without losing functionality.
MongoDB has a lot of great features: a rich query interface, and atomic, in-place updates.
My experience shows that ...
Why does Mongo reduce complexity?
Because by using MongoDB we can get rid of migrations.
Get rid of relationships. For data that isn’t shared among objects, or that is small enough, we just store it as nested documents or arrays. So many 1-1 and 1-n relationships are not really necessary.
MongoDB helps reduce the number of database requests because we have already pre-joined our data by storing 1-1 and 1-n relational data as arrays or nested documents.
Because Mongo knows JSON, we don’t have to convert data to JSON format. We can pull JSON from Mongo and push it to the client as it is.
Atomic, in-place updates are very powerful for modifying data. I’ll show you in one of the case studies later.
Feed MongoDB enough hardware resources to keep it running fast. When you need to scale your DB to multiple boxes, you just do it. If your target is to build the next Google or Facebook, you may need Hadoop, HBase, Hive or Cassandra. For most use cases, I think MongoDB is good enough for scaling.
A common use case I have met is storing crawled information from various third-party websites. Later we may want to add more sources, and they may change their data formats in the future.
Normally, when using SQL, I have to create an additional table for each source. With MongoDB, I just push the object itself as an embedded document like this.
Then later, for any change in data structure, like adding a new field
or adding a new source, I just push it to the product object. No migration, no new table to create.
And I can query that information later using dot notation.
Another problem that can make use of both MongoDB documents and the ability to index everything is product listing. I built an online catalogue application to show products, and a product can be listed in multiple categories in certain months.
In SQL I need an extra table to express which product is listed in which category and in which month. The listings table is not really a join table, since product_id and category_id can be duplicated.
To query products listed in a specific category and month, I need to join the products table with the listings table and run the query.
When using MongoDB we don’t need a listings table. We store listings as an array of value pairs [category_id, month].
We can index the listings array to speed up the query.
Instead of [category_id, month] pairs, we can store listings as an array of objects, so that people know explicitly which value is the category id and which is the month. But it requires more storage to store the field names. I don’t recommend that for a simple data structure like listings.
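As a rough illustration of the listings-as-pairs idea, the plain-Ruby sketch below stores each product's listings as an array of [category_id, month] pairs and filters on one pair. It runs on in-memory hashes only; `CATALOG`, `listed_in` and the sample data are made up for this example and are not the query from the talk.

```ruby
# Made-up sample documents: each product embeds its listings as
# [category_id, month] pairs instead of using a separate listings table.
CATALOG = [
  { 'name' => 'Product 1', 'listings' => [[2, '2011-03'], [2, '2011-04']] },
  { 'name' => 'Product 2', 'listings' => [[3, '2011-04']] },
  { 'name' => 'Product 3' } # not listed anywhere yet
].freeze

# Hypothetical helper: products listed in the given category and month.
def listed_in(products, category_id, month)
  products.select do |product|
    product.fetch('listings', []).include?([category_id, month])
  end
end

puts listed_in(CATALOG, 2, '2011-04').map { |product| product['name'] }.join(', ')
# prints "Product 1"
```

The pair is matched as a whole, so a product listed in category 2 in one month and in category 3 in another does not match (2, other month) by accident.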
Another example that shows the power of Mongo queries is finding a unique slug. We have many books with the same title “Ruby” but different categories. In SQL we need n queries to find a unique slug for each of them. The algorithm is simple: initialize the slug from the book’s title and set the counter to zero. Check whether the slug is already in use; if it is, increase the counter, modify the slug, and continue until we find a unique one.
In Mongo, we don’t have to write the while loop, thanks to regular expression matching. First we initialize the original slug and a slug pattern that matches the original slug and its variants. We use regular expression matching to find the variant with the max counter value. If one is found, we extract the max counter value and increase it by one to create the unique slug. If the original slug and its variants are not in use, we return the original slug. And don’t forget to index the slug field to speed up your query.
The last case study is voting. By solving this problem in both SQL and Mongo, I will show you how flexible and powerful Mongo is at avoiding join tables and reducing the number of database requests. The problem is like this. In a forum, a user can vote for each post only once. Each vote can be an up vote or a down vote. Up votes and down votes have different vote points: +2 for an up vote and -1 for a down vote, for example. We need to cache votes_count and votes_point in the post so that we can query and sort by votes_count and votes_point later.
In SQL, we need a join table to store vote data.
Here is the algorithm to do voting in SQL. Check that the user has not voted on the post. Create the vote. Retrieve the post to get votes_point and votes_count. Update votes_point and votes_count, and save the updated values to the database.
As you see, we need four database requests to do a vote in SQL.
Same for unvote.
When using Mongo, we can avoid the join table by storing votes as an embedded document in the post object itself. The votes.up array stores the ids of users who gave up votes, and the votes.down array stores the ids of users who gave down votes. votes.count and votes.point are there for querying and ordering purposes.
Here is the voting algorithm in Mongo. Given a post_id and a user_id, the query part finds the post and makes sure the user has not voted on the post yet. The update part puts the user id into the votes.up or votes.down array, depending on the vote value, and updates votes.point and votes.count.
By using Mongo’s find_and_modify operator, I can query the post, do validation, update votes and return the updated data in just ONE database request.
Same for unvote.
I extracted the voting solution from one of our projects and released it as a gem. You can install it and check the source code on GitHub. Comments and contributions are welcome.
In summary, MongoDB is flexible, powerful and fun. Flexible: because it is schema-less and document-oriented. Powerful: because Mongo is fast, scalable, and has rich queries. Fun: because you don’t have to think inside the SQL box (tables, columns, joins ...).
In case you want to know more about MongoDB, there are some selected slides in the references section covering MongoDB, schema design, indexing and query optimization.