The document discusses different approaches for designing schemas to store data from multiple feeds like network traffic, tweets, and Facebook posts in MongoDB. It analyzes storing the raw data in individual collections for each feed, a single raw collection, and semi-structured collections. Other approaches discussed are using time series or purpose modeling, with examples of fan-on-write and fan-on-read purpose models. The key takeaway is that the schema design should be tailored to the functional and logical usage of the data.
7. Things to consider
• Data Handling
– Processing
– Storage
• Which schema?
• Data types to use?
• Visualization
– Access to data
– Use Data
• Usage
– Enrichment
– Actualization / Updates
– Format Changes
8. How we can use our day-to-day data to
experiment with different "bigdata" options
And all for fun … if you're that kind of person
9. Feeds
• Machine Data
– Implementation: scapy Sniffer
– All out/inbound traffic for the last hours
• Twitter Feed
– Implementation: TwitterAPI
– All tweets that match a set of terms
• Facebook Posts
– Implementation: facebook-sdk
– All my personal posts
13. Different Approaches
• Raw Data Collection
– Individual Feed Collections
– Global Feed Collections
• Base Structured Documents
• Time Series Model
• Purpose Modeling
– Read Oriented
– Write Oriented
15. Raw Collections
• Positive
– Simple approach
– Fast to develop
– Direct model to service
– Simple direct queries
• Not So Much
– Hard to maintain
– More logic on the app layer
– Dependency on 3rd-party model
– More complicated to merge results
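The "more complicated to merge results" point can be illustrated with a minimal Python sketch. It assumes each per-feed collection has already been queried into a list of dicts, and that each feed carries its own timestamp field and format: `created_at` for tweets, `created_time` for Facebook posts, and an epoch-seconds `time` field for sniffed packets. These field names and formats are assumptions for illustration, not taken from the slides; the helper names are hypothetical.

```python
from datetime import datetime, timezone


def tweet_time(doc):
    # Assumed tweet timestamp format, e.g. "Thu Nov 05 10:00:00 +0000 2015".
    return datetime.strptime(doc["created_at"], "%a %b %d %H:%M:%S %z %Y")


def post_time(doc):
    # Assumed post timestamp format, e.g. "2015-11-05T09:30:00+0000".
    return datetime.strptime(doc["created_time"], "%Y-%m-%dT%H:%M:%S%z")


def packet_time(doc):
    # Assumed: the sniffer stored the capture time as epoch seconds.
    return datetime.fromtimestamp(doc["time"], tz=timezone.utc)


def merge_feeds(tweets, posts, packets):
    """Return (timestamp, feed, document) tuples ordered by time.

    Each feed needs its own timestamp parser, which is the extra
    application-layer logic the raw-collection approach implies.
    """
    tagged = (
        [(tweet_time(d), "twitter", d) for d in tweets]
        + [(post_time(d), "facebook", d) for d in posts]
        + [(packet_time(d), "network", d) for d in packets]
    )
    # Sort on the timestamp only; the documents themselves are not comparable.
    return sorted(tagged, key=lambda item: item[0])
```

Every new feed would need another parser and another branch in `merge_feeds`, which is one way the dependency on each third-party model shows up in the application.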
16. Single Raw Collection
db.raw.find()
{
"_id": ObjectId("55fe4d194cc75f0157a8c8b4"),
"contributors": null,
"truncated": false,
"text": "We compared #python vs #nodejs - see results: http://t.co/WVeOGWMR5V",
...
}
{
"_id": ObjectId("55fc4fa44cc75f4fa21b2de0"),
"picture": "https://fbcdn-photos-b-a.akamaihd.net/hphotos..."
...
}
{
"_id": ObjectId("55fc4faf4cc75f4fa21b2f64"),
"src": "00:11:32:34:9a:b7",
"ip": {
...
}
}
17. Single Raw
• Positive
– Single access point
– Same development speed
– Faster access to result set
• Not So Much
– Even harder to maintain
– Loading data requires codecs to be done well
– More complicated to filter results
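Why filtering gets more complicated in a single raw collection can be sketched in Python as follows. With no shared field naming the source, the application has to guess each document's feed from the fields it happens to contain, much like querying on field existence (MongoDB's `$exists` operator). The marker fields below are taken from the sample documents on the previous slide; treating them as reliable discriminators is an assumption, since nothing guarantees that, for example, every Facebook post carries a `picture`.

```python
# Marker field per feed, based on the sample documents shown earlier.
# Assumption: these fields are present in every document of that feed.
FEED_MARKERS = {
    "twitter": "text",
    "facebook": "picture",
    "network": "src",
}


def guess_feed(doc):
    """Guess which feed a raw document came from by checking marker fields."""
    for feed, field in FEED_MARKERS.items():
        if field in doc:
            return feed
    return "unknown"


def filter_by_feed(docs, feed):
    """Return only the documents that look like they belong to `feed`."""
    return [d for d in docs if guess_feed(d) == feed]
```

The guess breaks as soon as two feeds share a field name or a third-party API changes its payload, which is why this approach is listed as even harder to maintain.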
19. Semi-structured Single Collection
• Positive
– Single access point
– Common structure to all data
– Faster access to result set
– Single "shardable" collection
• Not So Much
– Needs modeling
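A minimal Python sketch of the "common structure" idea: each raw document is wrapped in a small envelope before it is stored, so every query can rely on the same few fields regardless of feed. The envelope field names (`feed`, `ts`, `raw`) and the helpers `wrap` and `find` are hypothetical, chosen only for illustration; the slides do not specify the actual structure.

```python
from datetime import datetime, timezone


def wrap(feed, created, raw):
    """Wrap a raw feed document in a shared envelope (hypothetical fields)."""
    return {"feed": feed, "ts": created, "raw": raw}


def find(docs, feed=None, since=None):
    """Filter wrapped documents using only the common fields.

    `feed` restricts results to one source; `since` keeps documents
    whose timestamp is at or after the given datetime.
    """
    result = []
    for doc in docs:
        if feed is not None and doc["feed"] != feed:
            continue
        if since is not None and doc["ts"] < since:
            continue
        result.append(doc)
    # A shared timestamp field makes cross-feed ordering trivial.
    return sorted(result, key=lambda d: d["ts"])
```

The modeling cost is paid once at write time, when each feed's payload is mapped onto the envelope; after that, filtering and ordering no longer depend on any third-party document layout.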
28. Takeaway
• A good schema is crucial to the performance of your
system
– Functional
– Logical
• Different usage of data will shape your Schema
• Storage Engines will also be important
– Different storage engines perform differently depending on the
workload
29. MongoDB Days 2015
5 November 2015, London
https://www.mongodb.com/events/mongodb-days-uk