26. Schema Free
“Your data schema is a direct corollary of how you view your business’ direction and tech goals. When you pivot, especially if it’s a significant one, your data may no longer make sense in the context of that change. Give yourself room to breathe. A schema-less data model is MUCH easier to adapt to rapidly changing requirements than a highly structured, rigidly enforced schema.”
from: http://www.cleverkoala.com/2010/08/why-your-startup-should-be-using-mongodb/
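To make this concrete, a minimal mongo shell sketch (the products collection and its fields are made up for illustration): documents in one collection do not have to share a structure, so a pivot does not force a migration.

    db.products.insert({ name: "RaspberryPi", price: 35 })
    // after a pivot, new fields simply appear on new documents
    db.products.insert({ name: "T-Shirt", sizes: ["S", "M", "L"], color: "blue" })
    // no ALTER TABLE, no migration script; old and new documents coexist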
40. Pre-Aggregation
• Problem:
– you require up-to-the-minute data, or up-to-the-second if possible
– queries over ranges of data (by time) must be as fast as possible
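A common way to implement this pattern in the mongo shell, sketched under assumed names (a stats collection with one bucket document per hour): counters are incremented at write time, so a time-range read becomes a cheap lookup over a few small pre-computed documents.

    // one document per hour, counters per minute (the _id scheme is an assumption)
    db.stats.update(
      { _id: "2012-10-16T14" },
      { $inc: { total: 1, "perMinute.35": 1 } },
      { upsert: true }
    )
    // a range query by time now touches only the pre-aggregated buckets
    db.stats.find({ _id: { $gte: "2012-10-16T00", $lte: "2012-10-16T23" } })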
59. Operational Intelligence
• Next best activity for support/call center
• interpret user sessions
• e.g. “RaspberryPi – strong interest”
• expecting about 2,000 events per second
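A sketch of what such an event stream could look like in the mongo shell (collection and field names are assumptions, not the project’s actual model):

    db.events.insert({
      session: "s-10234",          // hypothetical session id
      ts: new Date(),
      type: "productView",
      product: "RaspberryPi"
    })
    // interpreting a session: replay its events in order
    db.events.find({ session: "s-10234" }).sort({ ts: 1 })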
74. Big Data Project
• started as prototype, in production now ;-)
• “beyond agile”
• going from
– fetch all, calculate in service layer
– use MongoDB MapReduce on single node
– use MongoDB MapReduce on 5 shards
– use MongoDB MapReduce on 24 shards (2 hi1.4xlarge instances)
– use EMR (around 10 m2.4xlarge instances)
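For reference, a minimal mapReduce call of the kind used in such a pipeline (collection and field names are assumptions): the same call runs unchanged on a single node or on a sharded cluster, which is what made this progression possible.

    db.measurements.mapReduce(
      function() { emit(this.polygonId, this.value); },    // map: key by polygon
      function(key, values) { return Array.sum(values); }, // reduce: sum per polygon
      { out: "metrics_per_polygon" }
    )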
77. Big Data Project
• why not use the Aggregation Framework?
– we started with 2.0.6; the Aggregation Framework only shipped with 2.2
– we would have had to change the data model
– MapReduce seemed the way to go (data size)
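For comparison, the same per-polygon sum as an Aggregation Framework pipeline (names assumed as above). Note that until 2.6 the pipeline result had to fit into a single 16MB document, which matters at this data size:

    db.measurements.aggregate([
      { $group: { _id: "$polygonId", total: { $sum: "$value" } } }
    ])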
78. Big Data Project
• Numbers
– data comes in weekly increments
– 2TB raw data
– 14GB / week (into MongoDB)
– data grows in direct proportion to polygon count
– currently 1 replica set of 3 m2.4xlarge instances
83. Big Data Project
• more polygons -> more data
– key length can become an issue
• using polygons to display cell metrics
• tried different types of visualizations
84. Big Data Project
• key-size per doc: 1.8KB
– bad: { very_descriptive_long_key : "yay" }
– good: { v : "yay" }
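The effect is easy to verify in the mongo shell, since key names are stored verbatim in every BSON document:

    Object.bsonsize({ very_descriptive_long_key: "yay" })  // 40 bytes
    Object.bsonsize({ v: "yay" })                          // 16 bytes
    // the key name alone accounts for the difference; multiplied over
    // millions of documents this is real disk and RAM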
85. Big Data Project
• chart: storage per year by polygon count
– 100,000 polygons: 62 GB/year
– 500,000 polygons: 308 GB/year
87. Big Data Project
• 308GB of EBS storage => $332 per year
– backups / snapshots not considered
88. Big Data Project
• Future Plans
– new Use Case
– expecting about 1TB of data / week
89. Conclusion
• rapidly changing business needs
• ease of collecting huge amounts of data
• infrastructure as code
• MongoDB provides flexibility
90. Comments?
• @comsysto
• #MongoMunich2012
• http://blog.comsysto.com
• Don’t forget the hallway track
• Mongo User Group Munich
– http://www.meetup.com/Muenchen-MongoDB-User-Group/