This document discusses using MongoDB for content management systems. It provides an overview of sample CMS applications and considerations for schema design in MongoDB. It also covers querying and indexing data, replication for high availability, and scaling MongoDB horizontally for large datasets. Specific topics covered include embedding documents, indexing tags and slugs, building custom RSS feeds, and reading from secondary nodes.
2. Agenda
• Sample Content Management System (CMS) Application
• Schema Design Considerations
• Viewing the Final Product
• Building Feeds and Querying Data
• Replication, Failover, and Scaling
• Further Resources
4. CMS Application Overview
• Business news service
• Hundreds of stories per day
• Millions of website visitors per month
• Comments
• Related stories
• Tags
9. Sample Relational DB Structure
story      author      tag    comment
id         id          id     id
headline   first_name  name   storyid
copy       last_name   …      name
authorid   title              email
slug       …                  comment_text
…                             …

related_story     link_story_tag
id                id
storyid           storyid
related_storyid   tagid
…                 …
10. Sample Relational DB Structure
• Number of queries per page load?
• Caching layers add complexity
• Tables may grow to millions of rows
• Joins become slower over time as the database grows
• Schema changes
• Scaling database to handle more reads
11. MongoDB Schema Design
• “Schemaless”, but schema design is still important
• JSON documents
• Design for the use case and work backwards
• Do not use a relational model in MongoDB
• No joins or multi-document transactions (later MongoDB releases added $lookup and multi-document transactions), so most related information should be contained in the same document
• Updates to a single document are atomic, which serves as the equivalent of a transaction
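As a minimal sketch of the embedding approach, the relational tables from the earlier slide could collapse into a single story document. The field names mirror those tables; the sample values, the `date` and `related_stories` fields, and the helper `buildAddCommentUpdate` are hypothetical and not taken from the original deck.

```javascript
// Hypothetical story document: author, tags, comments, and related
// stories are embedded instead of living in separate tables.
const story = {
  headline: "Apple Reports Second Quarter Earnings",
  copy: "Story body text goes here.",
  slug: "apple-reports-second-quarter-earnings",
  author: { first_name: "Jane", last_name: "Doe", title: "Business Reporter" },
  tags: ["Earnings", "AAPL"],
  date: new Date("2013-04-23T00:00:00Z"), // made-up sample value
  comments: [
    { name: "Reader One", email: "reader@example.com", comment_text: "Great story!" }
  ],
  related_stories: ["apple-to-announce-earnings"] // slugs of related stories (assumed)
};

// Build the filter and update documents for appending one comment.
// Because the update touches a single document, MongoDB applies it atomically.
function buildAddCommentUpdate(slug, comment) {
  return {
    filter: { slug: slug },
    update: { $push: { comments: comment } }
  };
}

const addComment = buildAddCommentUpdate(story.slug, {
  name: "Reader Two",
  email: "reader2@example.com",
  comment_text: "Thanks for the summary."
});

// In the mongo shell this could be applied with:
//   db.cms.updateOne(addComment.filter, addComment.update);
console.log(JSON.stringify(addComment, null, 2));
```

With this shape, one read by slug returns everything needed to render the story page, which is the point of keeping related information in the same document.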
20. MongoDB Indexes for CMS
// Index on story slug (createIndex replaces the deprecated ensureIndex)
db.cms.createIndex( { slug : 1 });
// Index on story tags
db.cms.createIndex( { tags: 1 });
21. Querying MongoDB
// All Story information
db.cms.find( { slug : "apple-reports-second-quarter-earnings" });
// All Stories for a given tag
db.cms.find( { tags: "Earnings" });
23. Query Tags and Sort by Date
// Very simple to gather specific information for a feed
db.cms.find( { tags: { $in : ["Earnings", "AAPL"] } }).sort( { date : -1 });
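To make the feed query above more concrete, here is a small sketch in plain JavaScript that builds the same filter and sort, and also imitates in memory which stories they would select. `buildFeedQuery`, `matchesAnyTag`, `simulateFeed`, and the sample stories are hypothetical names and data for illustration only; on a real deployment the server does the matching and sorting.

```javascript
// Build the filter and sort documents used by the feed query.
function buildFeedQuery(tags) {
  return {
    filter: { tags: { $in: tags } },
    sort: { date: -1 }
  };
}

// In-memory imitation of $in on an array field: a story matches when
// at least one of its tags appears in the requested list.
function matchesAnyTag(story, tags) {
  return story.tags.some(function (tag) {
    return tags.indexOf(tag) !== -1;
  });
}

// Imitate find(filter).sort({ date: -1 }): matching stories, newest first.
// filter() returns a new array, so the input is not reordered.
function simulateFeed(stories, tags) {
  return stories
    .filter(function (s) { return matchesAnyTag(s, tags); })
    .sort(function (a, b) { return b.date - a.date; });
}

// Made-up sample data.
const sampleStories = [
  { slug: "a", tags: ["Earnings"], date: new Date("2013-04-01") },
  { slug: "b", tags: ["Markets"], date: new Date("2013-04-02") },
  { slug: "c", tags: ["AAPL", "Earnings"], date: new Date("2013-04-03") }
];

const feed = simulateFeed(sampleStories, ["Earnings", "AAPL"]);
console.log(feed.map(function (s) { return s.slug; })); // newest first: c, then a
```

In the mongo shell, the built objects would be passed along as `db.cms.find(q.filter).sort(q.sort)`. A compound index such as `{ tags: 1, date: -1 }` may help this query avoid an in-memory sort, though that depends on the data and is worth checking with `explain()`.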
25. Replication
• Extremely easy to set up
• A replica node trails the primary node and maintains a copy of the primary database
• Useful for disaster recovery, failover, backups, and specific workloads such as analytics
• When the primary goes down, a secondary automatically becomes the new primary
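One way an application can direct reads to secondary nodes is through the connection string. The sketch below assembles a standard MongoDB URI with the `replicaSet` and `readPreference` options; the host names, the replica set name `rs0`, the database name `cms`, and the helper `buildReplicaSetUri` are assumptions for illustration.

```javascript
// Assemble a MongoDB connection URI that lists every replica set member
// and asks the driver to prefer secondaries for reads.
function buildReplicaSetUri(hosts, database, replicaSetName, readPreference) {
  const options =
    "replicaSet=" + encodeURIComponent(replicaSetName) +
    "&readPreference=" + encodeURIComponent(readPreference);
  return "mongodb://" + hosts.join(",") + "/" + database + "?" + options;
}

const uri = buildReplicaSetUri(
  ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
  "cms",               // hypothetical database name
  "rs0",               // hypothetical replica set name
  "secondaryPreferred" // read from a secondary when one is available
);

console.log(uri);
```

Because a secondary can trail the primary, reads routed this way may be slightly stale; that is usually acceptable for feeds and analytics but less so for data a visitor has just written, such as a new comment.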
28. Scaling Horizontally
• Important to keep the working data set in RAM
• When the working data set exceeds RAM, it is easy to add machines and segment the data across them (sharding)
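The idea of segmenting data across machines can be illustrated with a toy sketch. This is not MongoDB's actual hashing or chunk balancing; `toyHash`, `shardFor`, and `segment` are hypothetical helpers, and using the story slug as the key is an assumption made only for the example.

```javascript
// Toy 32-bit string hash (illustrative only, not what MongoDB uses).
function toyHash(key) {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return h;
}

// Map a key to one of shardCount machines.
function shardFor(key, shardCount) {
  return toyHash(key) % shardCount;
}

// Split stories into one bucket per machine, keyed on the slug.
function segment(stories, shardCount) {
  const shards = [];
  for (let i = 0; i < shardCount; i++) {
    shards.push([]);
  }
  stories.forEach(function (story) {
    shards[shardFor(story.slug, shardCount)].push(story);
  });
  return shards;
}

// Made-up sample slugs.
const stories = [
  { slug: "apple-reports-second-quarter-earnings" },
  { slug: "markets-close-higher" },
  { slug: "fed-holds-rates-steady" },
  { slug: "tech-stocks-rally" }
];

const shards = segment(stories, 2);
shards.forEach(function (bucket, i) {
  console.log("shard " + i + ": " + bucket.length + " stories");
});
```

Each machine then holds only part of the collection, so its share of the working set is more likely to fit in RAM. In MongoDB itself the equivalent setup is done with the `sh.enableSharding()` and `sh.shardCollection()` shell helpers and a chosen shard key, and the server handles the distribution.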
30. Additional Resources
• Use Case Tutorials: http://docs.mongodb.org/manual/use-cases/
• What others are doing: http://www.10gen.com/use-case/content-management
• This presentation & video recording: https://www.10gen.com/presentations/webinar