2. Emerging NoSQL Space
The beginning: RDBMS
Last 10 years: RDBMS, Data Warehouse
Today: RDBMS, Data Warehouse, NoSQL
3. Qualities of NoSQL Workloads
Flexible data models
• Lists, Nested Objects
• Sparse schemas
• Semi-structured data
• Agile Development
High Throughput
• Lots of reads
• Lots of writes
Large Data Sizes
• Aggregate data size
• Number of objects
Low Latency
• Both reads and writes
• Millisecond latency
Cloud Computing
• Run anywhere
• No assumptions about hardware
• No / Few Knobs
Commodity Hardware
• Ethernet
• Local disks
4. MongoDB was designed for this
Flexible data models
• JSON based object model
• Dynamic schemas
High Throughput
• Replica Sets to scale reads
• Sharding to scale writes
Large Data Sizes
• 1000's of shards in a single DB
• Partitioning of data
Low Latency
• In-memory cache
• Scale-out working set
Cloud Computing
• Scale-out to overcome hardware limitations
Commodity Hardware
• Designed for "typical" OS and local file system
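The idea of partitioning data across shards can be sketched in a few lines. This is a minimal illustration in plain Python, not MongoDB's actual mechanism: the function names `shard_for` and `partition`, the `user_id` key, and the choice of an MD5-based hash are all assumptions made for the example.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index with a stable hash (illustrative only)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def partition(docs, key_field, num_shards):
    """Split documents into num_shards lists according to their key field."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(str(doc[key_field]), num_shards)].append(doc)
    return shards

# Hypothetical data set: 1000 user documents spread over 4 shards.
docs = [{"user_id": i, "name": f"user{i}"} for i in range(1000)]
shards = partition(docs, "user_id", 4)
print([len(s) for s in shards])
```

Because the same key always hashes to the same shard, a write for a given user touches one server only, which is what lets write traffic scale with the number of shards.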
7. High Volume Data Feeds
Machine Generated Data
• More machines, more sensors, more data
• Variably structured
Stock Market Data
• High frequency trading
Social Media Firehose
• Multiple sources of data
• Each changes their format constantly
8. High Volume Data Feed
• Flexible document model can adapt to changes in sensor format
• Asynchronous writes
• Write to memory with periodic disk flush
• Scale writes over multiple shards
[Diagram: multiple Data Sources writing into the database]
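"Write to memory with periodic disk flush" can be illustrated with a small buffered writer. This is a sketch under stated assumptions, not how MongoDB's storage layer is implemented: the class name `BufferedWriter`, the JSON-lines file, and the count-based flush trigger (instead of a timer) are all hypothetical choices for the example.

```python
import json
import os
import tempfile

class BufferedWriter:
    """Accept writes into memory and flush them to disk in batches."""

    def __init__(self, path, flush_every=100):
        self.path = path
        self.flush_every = flush_every  # flush after this many buffered docs
        self.buffer = []

    def write(self, doc):
        # The caller returns immediately; the disk is only touched on flush.
        self.buffer.append(doc)
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        with open(self.path, "a", encoding="utf-8") as f:
            for doc in self.buffer:
                f.write(json.dumps(doc) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the batch onto disk
        self.buffer.clear()

# Documents with differing fields, as a variably structured feed would send.
path = os.path.join(tempfile.mkdtemp(), "feed.jsonl")
writer = BufferedWriter(path, flush_every=100)
for i in range(250):
    doc = {"sensor": i % 3, "value": i}
    if i % 2 == 0:
        doc["unit"] = "celsius"  # only some sources send this field
    writer.write(doc)
pending = len(writer.buffer)  # writes still held in memory
writer.flush()
```

The trade-off is the usual one: batching makes each write cheap, but anything still in the buffer is lost if the process dies before the next flush.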
9. Operational Intelligence
Ad Targeting
• Large volume of state about users
• Very strict latency requirements
Real time dashboards
• Expose report data to millions of customers
• Report on large volumes of data
• Reports that update in real time
Social Media Monitoring
• What are people talking about?
10. Operational Intelligence
• Low latency reads
• Parallelize queries across replicas and shards
• In database aggregation
• Flexible schema adapts to changing input data
• Can use same cluster to collect, store, and report on data
[Diagram labels: API, Dashboards]
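Parallelizing a query across shards and aggregating the results follows a scatter-gather pattern. The sketch below imitates it in plain Python with in-memory lists standing in for shards; the names `count_by_topic` and `scatter_gather` and the `topic` field are assumptions for illustration, not MongoDB APIs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_by_topic(shard):
    """Per-shard partial aggregate: how many documents mention each topic."""
    return Counter(doc["topic"] for doc in shard)

def scatter_gather(shards):
    """Run the partial aggregate on every shard in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(count_by_topic, shards))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Hypothetical social media posts, already split across three shards.
shards = [
    [{"topic": "mongodb"}, {"topic": "nosql"}],
    [{"topic": "mongodb"}],
    [{"topic": "sharding"}, {"topic": "mongodb"}],
]
totals = scatter_gather(shards)
print(totals.most_common())
```

Each shard only scans its own data, and only the small partial counts travel back to be merged, which is why this style of aggregation suits dashboards over large volumes.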
11. Behavioral Profiles
• Rich profiles collecting multiple complex actions
• Scale out to support high throughput of activities tracked
• Dynamic schemas make it easy to track vendor specific attributes
• Indexing and querying to support matching, frequency capping
Steps: 1 See Ad, 2 See Ad, 3 Click, 4 Convert
{ cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: 'ad1', time: 123 },
        { impression: 'ad2', time: 232 },
        { click: 'ad2', time: 235 },
        { add_to_cart: 'laptop',
          sku: 'asdf23f',
          time: 254 },
        { purchase: 'laptop', time: 354 }
      ]
    }
  }
}
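Frequency capping over a profile like the one on this slide amounts to counting recent impressions before serving an ad again. The following is a minimal sketch in plain Python over an in-memory copy of the slide's document; the helper names `impressions_since` and `may_show`, and the window and cap values, are hypothetical.

```python
profile = {
    "cookie_id": "1234512413243",
    "advertiser": {
        "apple": {
            "actions": [
                {"impression": "ad1", "time": 123},
                {"impression": "ad2", "time": 232},
                {"click": "ad2", "time": 235},
                {"add_to_cart": "laptop", "sku": "asdf23f", "time": 254},
                {"purchase": "laptop", "time": 354},
            ]
        }
    },
}

def impressions_since(profile, advertiser, ad, since):
    """Count impressions of `ad` for `advertiser` at or after time `since`."""
    actions = profile.get("advertiser", {}).get(advertiser, {}).get("actions", [])
    return sum(
        1 for a in actions
        if a.get("impression") == ad and a["time"] >= since
    )

def may_show(profile, advertiser, ad, now, window, cap):
    """Allow the ad only if fewer than `cap` impressions fall in the window."""
    return impressions_since(profile, advertiser, ad, now - window) < cap

# ad2 was shown once in the last 100 time units, so a cap of 1 blocks it.
print(may_show(profile, "apple", "ad2", now=300, window=100, cap=1))
# ad1 was last shown before the window started, so it may be shown again.
print(may_show(profile, "apple", "ad1", now=300, window=100, cap=1))
```

Because every action for a cookie lives in one document, the check needs a single lookup by `cookie_id` rather than a join across event tables.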
12. Product Data Management
E-Commerce Product Catalog
• Diverse product portfolio
• Complex querying and filtering
Flash Sales
• Scale for short bursts of high-volume traffic
• Scalable but consistent view of inventory
13. Content Management
News Site
• Comments and user generated content
• Personalization of content, layout
Multi-Device rendering
• Generate layout on the fly for each device that connects
• No need to cache static pages
Sharing
• Store large objects
• Simple modeling of metadata
14. Content Management
• Flexible data model for similar, but different objects
• Geo spatial indexing for location based searches
• GridFS for large object storage
• Horizontal scalability for large data sets
{ camera: "Nikon d4",
  location: [ -122.418333, 37.775 ]
}
{ camera: "Canon 5d mkII",
  people: [ "Jim", "Carol" ],
  taken_on: ISODate("2012-03-07T18:32:35.002Z")
}
{ origin: "facebook.com/photos/xwdf23fsdf",
  license: "Creative Commons CC0",
  size: {
    dimensions: [ 124, 52 ],
    units: "pixels"
  }
}
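A location based search over documents like these can be sketched with a great-circle distance check. This is a plain Python linear scan, assuming `location` is stored as [longitude, latitude] as in the first document; it stands in for what a geospatial index would answer far more efficiently, and the names `haversine_km` and `near` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two [longitude, latitude] points."""
    lng1, lat1 = a
    lng2, lat2 = b
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def near(photos, center, max_km):
    """Return photos with a location within max_km of center, nearest first."""
    hits = [
        (haversine_km(p["location"], center), p)
        for p in photos
        if "location" in p  # documents without a location are simply skipped
    ]
    hits.sort(key=lambda pair: pair[0])
    return [p for dist, p in hits if dist <= max_km]

# The three differently shaped photo documents from the slide.
photos = [
    {"camera": "Nikon d4", "location": [-122.418333, 37.775]},
    {"camera": "Canon 5d mkII", "people": ["Jim", "Carol"]},
    {"origin": "facebook.com/photos/xwdf23fsdf", "license": "Creative Commons CC0"},
]
nearby = near(photos, [-122.42, 37.77], max_km=5)
print([p["camera"] for p in nearby])
```

Note how the flexible data model shows up in the code: photos that lack a `location` field need no special schema handling, they just never match.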
15. User Data Management
Video Games
• User state and session management
Social Graphs
• Scale out to large graphs
• Easy to search and process
Identity Management
• Authentication, Authorization, and Accounting
16. Social Graphs
• Native support for Arrays makes it easy to store connections inside user profile
• Sharding partitions user profiles across available servers
• Documents enable disk locality of all profile data for a user
[Diagram label: Social Graph]
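Storing connections as an array inside each user profile makes simple graph questions a matter of reading a few documents. The sketch below uses plain Python dictionaries in place of stored profiles; the `friends` field name, the sample users, and the `friends_of_friends` helper are assumptions for illustration.

```python
# Each profile embeds its connections as an array, as the slide describes.
users = {
    "alice": {"_id": "alice", "friends": ["bob", "carol"]},
    "bob":   {"_id": "bob",   "friends": ["alice", "dave"]},
    "carol": {"_id": "carol", "friends": ["alice", "dave", "erin"]},
    "dave":  {"_id": "dave",  "friends": ["bob", "carol"]},
    "erin":  {"_id": "erin",  "friends": ["carol"]},
}

def friends_of_friends(users, user_id):
    """Second-degree connections, excluding the user and direct friends."""
    direct = set(users[user_id]["friends"])
    second = set()
    for friend_id in direct:
        # One profile read per direct friend yields all of their connections.
        second.update(users.get(friend_id, {}).get("friends", []))
    return second - direct - {user_id}

print(sorted(friends_of_friends(users, "alice")))
```

With profiles sharded by user id, each of those reads goes to exactly one server, and everything about that user comes back in a single document.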
18. Good fits for MongoDB
Application Characteristic: Why MongoDB might be a good fit
Variable data in objects: Dynamic schema and JSON data model enable flexible data storage without sparse tables or complex joins
Low Latency Access: Memory Mapped storage engine caches documents in RAM, enabling in-memory performance. Data locality of documents can significantly improve latency over join based approaches
High write or read throughput: Sharding + Replication lets you scale read and write traffic across multiple servers
Large number of objects to store: Sharding lets you split objects across multiple servers
Cloud based deployment: Sharding and replication let you work around hardware limitations in clouds.