This document summarizes MongoDB use cases at French telecommunications company SFR. MongoDB is used for customer data storage and retrieval, targeted online advertising, and an online products catalog. Key points include storing 14 million customer documents, processing 80 million event logs in 3 months, and hosting a catalog with up to 10,000 product documents with good performance. Challenges involved adjusting Java driver defaults and learning non-relational data modeling.
10. User profile service (UPS)
Web services (SOAP or JSON)
Get the profile of SFR clients
Data are aggregated from many backends of the information system
Context
11. Java 1.6, mongo driver 2.6.5, replica set + sharding
Technical data : « local storage » collection
■ only 1 collection in a database
■ « last connection date » of web account
■ 14 million documents
■ read/writes by identifier of the web account (shard key)
Some functional data are coming : the « internautes » (internet users)
collection (6 million)…
Data in UPS
12. My choice : read on slave and write (without acknowledgement)
on master
« local storage » collection needs to be readable immediately
after write
-> not really compatible with asynchronous replication and
reads on slave
-> use of memcached (like for most data in UPS) as a
cache for reads (let replication happens)
Implementation in MongoDB
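The choice above maps onto the legacy 2.x Java driver roughly as follows — a sketch, not the actual UPS code; host and database names are invented:

```java
import com.mongodb.DB;
import com.mongodb.Mongo;
import com.mongodb.WriteConcern;

Mongo mongo = new Mongo("mongo-host");    // invented host name
DB db = mongo.getDB("ups");               // invented database name
db.slaveOk();                             // allow reads to go to secondaries
db.setWriteConcern(WriteConcern.NORMAL);  // fire-and-forget writes: no server acknowledgement
```

With `WriteConcern.NORMAL` in the 2.x driver, the write is not confirmed by the server, which is why the slide pairs it with memcached for read-your-own-writes while replication catches up.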
13. 2 GB of data and 2 GB of indexes for 14 million documents
(from « db.stats(); »)
Inserts/updates : 600 k per day / communication exceptions
: 6 k per day
Average insert/update time : 56 ms
Some figures
14. Default values of the Java mongo driver are inappropriate :
unlimited connect timeout, unlimited read timeout, wait 120
seconds to get a connection from pool !
Can’t make an « AND » query on the same field
before mongo 2.0
Is it a good choice to read on slave / write on master ?
Replication time ? Is it a real use case ?
To replace by :
force acknowledge on writes and read on slave ?
OR
don’t acknowledge writes and read on master ?
Problems & pending question
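The defaults called out above can be overridden when building the Mongo instance. A sketch against the legacy 2.x driver, where MongoOptions exposed these settings as public fields; the 5-second values are illustrative, not a recommendation from the talk:

```java
import com.mongodb.Mongo;
import com.mongodb.MongoOptions;
import com.mongodb.ServerAddress;

MongoOptions options = new MongoOptions();
options.connectTimeout = 5000;  // ms; default 0 = unlimited connect timeout
options.socketTimeout = 5000;   // ms; default 0 = unlimited read timeout
options.maxWaitTime = 5000;     // ms; default 120 000 = 2 min wait for a pooled connection
Mongo mongo = new Mongo(new ServerAddress("mongo-host", 27017), options);
```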
24. Java 1.6
Spring Data for MongoDB 1.0.0
(uses mongo driver 2.7.1)
Read/Write on master
No Sharding
WriteConcern.NORMAL
The D.U.N.C.E. principle : everything by default
26. Capped collections :
db.createCollection("mycoll", {capped: true, size:100000})
Old log data automatically ages out (FIFO, in insertion order)
No risk of filling up a disk
no need to write log archival / deletion scripts
Good performance for a high number of writes compared to
reads
Event Logging
27. Map Reduce <- we are bad at this
Cron Job 1 -> server-side log aggregation by minute
and by ad
Aggregated logs persisted in a dedicated collection
Cron Job 2 consolidates aggregated logs by hour, every
day
Cron Job 3 consolidates aggregated logs by day, every
week
Log Analysis
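The cron-job rollups above can be sketched in plain Java — an illustrative stand-in for the actual jobs, with invented method and key names: count events per (minute, ad), then consolidate minute buckets into hour buckets.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not SFR's actual code) of the log rollups:
// raw ad events are counted per (minute, ad), then minute buckets
// are consolidated into hour buckets.
public class LogRollup {

    // Bucket an event timestamp (ms since epoch) into a "minute|adId" key.
    static String minuteKey(long epochMs, String adId) {
        return (epochMs / 60000L) + "|" + adId;
    }

    // Cron job 1: aggregate raw events by minute and by ad.
    static Map<String, Integer> aggregateByMinute(long[] timestamps, String[] adIds) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (int i = 0; i < timestamps.length; i++) {
            String key = minuteKey(timestamps[i], adIds[i]);
            Integer c = counts.get(key);
            counts.put(key, c == null ? 1 : c + 1);
        }
        return counts;
    }

    // Cron job 2: consolidate minute buckets into hour buckets.
    static Map<String, Integer> consolidateByHour(Map<String, Integer> perMinute) {
        Map<String, Integer> perHour = new HashMap<String, Integer>();
        for (Map.Entry<String, Integer> e : perMinute.entrySet()) {
            String[] parts = e.getKey().split("\\|");
            String hourKey = (Long.parseLong(parts[0]) / 60) + "|" + parts[1];
            Integer c = perHour.get(hourKey);
            perHour.put(hourKey, c == null ? e.getValue() : c + e.getValue());
        }
        return perHour;
    }
}
```

The daily and weekly consolidations (cron jobs 2 and 3) are the same fold at coarser granularity, with each level persisted to its own collection.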
32. Main collection (visitors' web browsing history) :
36 million documents and growing
Avg. document size : 430 bytes
80 million events processed in less than 3 months
Per second : 60 reads, 50 writes (60 finds, 30 updates, 20 inserts)
Some Data
33. It works! :)
Default properties are good enough even for a high-traffic
website (for now...)
Conclusion
35. Good morning!
David Rault, web architect @ SFR
In charge of MarketPlace project
@squat80 http://fr.linkedin.com/pub/david-rault/37/722/963
36. ● Products classified by categories
● Categories determine products features
● Multiple sellers
○ can create new products (based on EAN/MPN)
■ can modify the products they created
■ can only refer to products created by other
sellers
○ publish offers (product id + price)
● Order management is out-of-scope
○ delegated to existing order-management system
● Still in development
Context
37. ● Schema-less: products are structured
documents
○ Different properties depending on product category
(TVs, phone protections, wires, ...)
○ No JOIN required - documents load in a single call
○ New categories will come : no migration required
● Searching capabilities
○ Empowers navigating through the store
○ Complex-queries on products features
● Performance
○ Our Ops forbid intensive writes into Oracle DB (!)
Why Mongo ?
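As a hypothetical illustration of the category-driven structure (all field names and values invented), products from different categories sit in the same collection with different properties and still load in a single call:

```
// a TV
{ "ean": "4001234567890", "category": "TVs",
  "brand": "SomeBrand", "screen_size": 40, "resolution": "1080p" }

// a phone protection
{ "ean": "4009876543210", "category": "phone-protections",
  "brand": "OtherBrand", "color": "black", "material": "silicone" }
```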
38. Java 7 - Tomcat 7
Direct use of Java driver (2.7.2)
Replica set (2 replicas + 1 arbiter)
Sharding enabled
Writes are replica-safe
Technical choices
39. ● WS for creation/update of products and
offers
● Triggers (scheduled) to consolidate data
○ for each product : valid offers on a 2-day window
are aggregated into the product
○ for each category : product counts and pseudo-
enumerated field values (e.g. list of brands) are
aggregated into the category
● "Live streaming" into Google Search
Appliance
○ feed for both internal keyword searches & portal-
wide searches (within *.sfr.fr sites)
"Back-office" Design
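The « valid offers on a 2-day window » consolidation can be sketched in plain Java — illustrative only, since the actual trigger code is not shown in the talk:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the 2-day offer window: given offer
// timestamps (ms since epoch) and the consolidation instant,
// keep only offers published within the last two days.
public class OfferWindow {
    static final long TWO_DAYS_MS = 2L * 24 * 60 * 60 * 1000;

    // Return indices of offers still valid at 'now'.
    static List<Integer> validOffers(long[] offerTimestamps, long now) {
        List<Integer> valid = new ArrayList<Integer>();
        for (int i = 0; i < offerTimestamps.length; i++) {
            long age = now - offerTimestamps[i];
            if (age >= 0 && age <= TWO_DAYS_MS) {
                valid.add(i);
            }
        }
        return valid;
    }
}
```

In the actual design the surviving offers would then be embedded into the product document, so the front office reads one document per product.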
40. ● Straightforward queries
○ mostly READs
○ by product id, by category
○ filtering (min/max price, by brand, by color, ...)
■ filters are category-specific
● Customer-activity tracking
○ build knowledge base for future features:
■ recommendation engine
○ products viewed, previous orders, wish-list, etc.
○ both for identified and anonymous visitors
"Front-office" design
41. ● Need to unlearn 10+ years EXP in
relational design/development
○ Think "document", not relation
○ No magical (a.k.a ORM) framework
● bye bye Hibernate ;)
○ Some surprises/confusion with the query syntax
■ No "$and" in versions <2.0, didn't manage some
queries (though it worked in mongo shell)
● "min_price > a and min_price > b" with the Java driver
■ Query operators appear at varying positions
● { "$lt": { "some_field": some_value }}
● { "some_field": { "$in" : some_values }}
How is it going ?
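On the two confusions above, a sketch with the 2.x Java driver (field names and values are illustrative): comparison operators like « $lt » and « $in » nest under the field they test, and a range on one field goes into a single sub-document. Two clauses with the *same* operator on the same field — the « min_price > a and min_price > b » case — cannot be expressed that way, since keys in one sub-document collapse; that is what « $and » (mongo 2.0+) is for.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import java.util.Arrays;

// a < min_price < b : both operators nest under the same field
DBObject range = new BasicDBObject("min_price",
        new BasicDBObject("$gt", 10).append("$lt", 100));

// $in also nests under the field it filters
DBObject byBrand = new BasicDBObject("brand",
        new BasicDBObject("$in", Arrays.asList("BrandA", "BrandB")));

// min_price > a AND min_price > b needs $and (mongo 2.0+),
// which sits at the top level and takes a list of clauses
DBObject both = new BasicDBObject("$and", Arrays.asList(
        new BasicDBObject("min_price", new BasicDBObject("$gt", 10)),
        new BasicDBObject("min_price", new BasicDBObject("$gt", 20))));
```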
42. ● Good performance
○ although a relatively low number of documents
(~5,000-10,000)
● Fast development cycle
○ Only a few hours to have the first prototype
running
○ with Google's help and a couple of hours, built a
micro full-text indexing search feature
● Mongo Shell is my friend
○ as well as Google & MongoDB.org
○ at last, a developer-friendly (command-line) tool
● bye bye sqlplus ;)
How is it still going ?