This document summarizes MongoDB use cases at French telecommunications company SFR. MongoDB is used for customer data storage and retrieval, targeted online advertising, and an online products catalog. Key points include storing 14 million customer documents, processing 80 million event logs in 3 months, and hosting a catalog with up to 10,000 product documents with good performance. Challenges involved adjusting Java driver defaults and learning non-relational data modeling.
10. User profile service (UPS)
Web services (SOAP or JSON)
Get the profile of SFR clients
Data are aggregated from many backends of the information system
Context
11. Java 1.6, mongo driver 2.6.5, replica set + sharding
Technical data : « local storage » collection
■ only 1 collection in a database
■ « last connection date » of web account
■ 14 million documents
■ read/writes by identifier of the web account (shard key)
Some functional data are coming : the « internautes » (internet users)
collection (6 million)…
Data in UPS
12. My choice : read on slave and write (without acknowledgement)
on master
« local storage » collection needs to be readable immediately
after write
-> not really compatible with asynchronous replication and
reads on slave
-> use of memcached (like for most data in UPS) as a
cache for reads (let replication happens)
Implementation in MongoDB
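The choice above maps onto the legacy 2.x Java driver roughly as follows — a sketch, not the actual UPS code; host and database names are invented:

```java
import com.mongodb.DB;
import com.mongodb.Mongo;
import com.mongodb.WriteConcern;

Mongo mongo = new Mongo("mongo-host");    // invented host name
DB db = mongo.getDB("ups");               // invented database name
db.slaveOk();                             // allow reads to go to secondaries
db.setWriteConcern(WriteConcern.NORMAL);  // fire-and-forget writes: no server acknowledgement
```

With `WriteConcern.NORMAL` in the 2.x driver, the write is not confirmed by the server, which is why the slide pairs it with memcached for read-your-own-writes while replication catches up.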
13. 2 GB of data and 2 GB of indexes for 14 million documents
(from « db.stats(); »)
Inserts/updates : 600 k per day / communication exceptions
: 6 k per day
Average insert/update time : 56 ms
Some figures
14. Default values of the Java mongo driver are inappropriate :
unlimited connect timeout, unlimited read timeout, wait 120
seconds to get a connection from pool !
Can’t make an « AND » query on the same field
before mongo 2.0
Is it a good choice to read on slave / write on master ?
Replication time ? Is it a real use case ?
To replace by :
force acknowledge on writes and read on slave ?
OR
don’t acknowledge writes and read on master ?
Problems & pending question
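The defaults called out above can be overridden when building the Mongo instance. A sketch against the legacy 2.x driver, where MongoOptions exposed these settings as public fields; the 5-second values are illustrative, not a recommendation from the talk:

```java
import com.mongodb.Mongo;
import com.mongodb.MongoOptions;
import com.mongodb.ServerAddress;

MongoOptions options = new MongoOptions();
options.connectTimeout = 5000;  // ms; default 0 = unlimited connect timeout
options.socketTimeout = 5000;   // ms; default 0 = unlimited read timeout
options.maxWaitTime = 5000;     // ms; default 120 000 = 2 min wait for a pooled connection
Mongo mongo = new Mongo(new ServerAddress("mongo-host", 27017), options);
```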
24. Java 1.6
Spring Data for MongoDB 1.0.0
(uses mongo driver 2.7.1)
Read/Write on master
No Sharding
WriteConcern.NORMAL
The D.U.N.C.E. principle : everything by default
26. Capped collections :
db.createCollection("mycoll", {capped: true, size:100000})
Old log data automatically ages out (FIFO, in insertion order)
No risk of filling up a disk
no need to write log archival / deletion scripts
Good performance for a high number of writes compared to
reads
Event Logging
27. Map Reduce <- we are bad at this
Cron Job 1 -> server-side log aggregation by minute
and by ad
Aggregated logs persisted in a dedicated collection
Cron Job 2 consolidates aggregated logs by hour, every
day
Cron Job 3 consolidates aggregated logs by day, every
week
Log Analysis
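The cron-job rollups above can be sketched in plain Java — an illustrative stand-in for the actual jobs, with invented method and key names: count events per (minute, ad), then consolidate minute buckets into hour buckets.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not SFR's actual code) of the log rollups:
// raw ad events are counted per (minute, ad), then minute buckets
// are consolidated into hour buckets.
public class LogRollup {

    // Bucket an event timestamp (ms since epoch) into a "minute|adId" key.
    static String minuteKey(long epochMs, String adId) {
        return (epochMs / 60000L) + "|" + adId;
    }

    // Cron job 1: aggregate raw events by minute and by ad.
    static Map<String, Integer> aggregateByMinute(long[] timestamps, String[] adIds) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (int i = 0; i < timestamps.length; i++) {
            String key = minuteKey(timestamps[i], adIds[i]);
            Integer c = counts.get(key);
            counts.put(key, c == null ? 1 : c + 1);
        }
        return counts;
    }

    // Cron job 2: consolidate minute buckets into hour buckets.
    static Map<String, Integer> consolidateByHour(Map<String, Integer> perMinute) {
        Map<String, Integer> perHour = new HashMap<String, Integer>();
        for (Map.Entry<String, Integer> e : perMinute.entrySet()) {
            String[] parts = e.getKey().split("\\|");
            String hourKey = (Long.parseLong(parts[0]) / 60) + "|" + parts[1];
            Integer c = perHour.get(hourKey);
            perHour.put(hourKey, c == null ? e.getValue() : c + e.getValue());
        }
        return perHour;
    }
}
```

The daily and weekly consolidations (cron jobs 2 and 3) are the same fold at coarser granularity, with each level persisted to its own collection.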
32. Main collection (visitors' web browsing history) :
36 million documents and growing
Avg. document size : 430 bytes
80 million events processed in less than 3 months
Per second : 60 reads, 50 writes (60 finds, 30 updates, 20 inserts)
Some Data
33. It works! :)
Default properties are good enough even for a high-traffic
website (for now...)
Conclusion
35. Good morning!
David Rault, web architect @ SFR
In charge of MarketPlace project
@squat80 http://fr.linkedin.com/pub/david-rault/37/722/963
36. ● Products classified by categories
● Categories determine products features
● Multiple sellers
○ can create new products (based on EAN/MPN)
■ can modify the products they created
■ can only refer to products created by other
sellers
○ publish offers (product id + price)
● Order management is out-of-scope
○ delegated to existing order-management system
● Still in development
Context
37. ● Schema-less: products are structured
documents
○ Different properties depending on product category
(TVs, phone protections, wires, ...)
○ No JOIN required - documents load in a single call
○ New categories will come : no migration required
● Searching capabilities
○ Empowers navigating through the store
○ Complex-queries on products features
● Performance
○ Our Ops forbid intensive writes into Oracle DB (!)
Why Mongo ?
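As a hypothetical illustration of the category-driven structure (all field names and values invented), products from different categories sit in the same collection with different properties and still load in a single call:

```
// a TV
{ "ean": "4001234567890", "category": "TVs",
  "brand": "SomeBrand", "screen_size": 40, "resolution": "1080p" }

// a phone protection
{ "ean": "4009876543210", "category": "phone-protections",
  "brand": "OtherBrand", "color": "black", "material": "silicone" }
```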
38. Java 7 - Tomcat 7
Direct use of Java driver (2.7.2)
Replica set (2 replicas + 1 arbiter)
Sharding enabled
Writes are replica-safe
Technical choices
39. ● WS for creation/update of products and
offers
● Triggers (scheduled) to consolidate data
○ for each product : valid offers on a 2-day window
are aggregated into the product
○ for each category : product counts and pseudo-
enumerated field values (e.g. list of brands) are
aggregated into the category
● "Live streaming" into Google Search
Appliance
○ feed for both internal keyword searches & portal-
wide searches (within *.sfr.fr sites)
"Back-office" Design
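The « valid offers on a 2-day window » consolidation can be sketched in plain Java — illustrative only, since the actual trigger code is not shown in the talk:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the 2-day offer window: given offer
// timestamps (ms since epoch) and the consolidation instant,
// keep only offers published within the last two days.
public class OfferWindow {
    static final long TWO_DAYS_MS = 2L * 24 * 60 * 60 * 1000;

    // Return indices of offers still valid at 'now'.
    static List<Integer> validOffers(long[] offerTimestamps, long now) {
        List<Integer> valid = new ArrayList<Integer>();
        for (int i = 0; i < offerTimestamps.length; i++) {
            long age = now - offerTimestamps[i];
            if (age >= 0 && age <= TWO_DAYS_MS) {
                valid.add(i);
            }
        }
        return valid;
    }
}
```

In the actual design the surviving offers would then be embedded into the product document, so the front office reads one document per product.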
40. ● Straightforward queries
○ mostly READs
○ by product id, by category
○ filtering (min/max price, by brand, by color, ...)
■ filters are category-specific
● Customer-activity tracking
○ build knowledge base for future features:
■ recommendation engine
○ products viewed, previous orders, wish-list, etc.
○ both for identified and anonymous visitors
"Front-office" design
41. ● Need to unlearn 10+ years EXP in
relational design/development
○ Think "document", not relation
○ No magical (a.k.a ORM) framework
● bye bye Hibernate ;)
○ Some surprises/confusion with the query syntax
■ No "$and" in versions <2.0, didn't manage some
queries (though it worked in mongo shell)
● "min_price > a and min_price > b" with the Java driver
■ Query operators appear at varying positions
● { "$lt": { "some_field": some_value }}
● { "some_field": { "$in" : some_values }}
How is it going ?
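On the two confusions above, a sketch with the 2.x Java driver (field names and values are illustrative): comparison operators like « $lt » and « $in » nest under the field they test, and a range on one field goes into a single sub-document. Two clauses with the *same* operator on the same field — the « min_price > a and min_price > b » case — cannot be expressed that way, since keys in one sub-document collapse; that is what « $and » (mongo 2.0+) is for.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import java.util.Arrays;

// a < min_price < b : both operators nest under the same field
DBObject range = new BasicDBObject("min_price",
        new BasicDBObject("$gt", 10).append("$lt", 100));

// $in also nests under the field it filters
DBObject byBrand = new BasicDBObject("brand",
        new BasicDBObject("$in", Arrays.asList("BrandA", "BrandB")));

// min_price > a AND min_price > b needs $and (mongo 2.0+),
// which sits at the top level and takes a list of clauses
DBObject both = new BasicDBObject("$and", Arrays.asList(
        new BasicDBObject("min_price", new BasicDBObject("$gt", 10)),
        new BasicDBObject("min_price", new BasicDBObject("$gt", 20))));
```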
42. ● Good performance
○ although a relatively low number of documents
(~5,000-10,000)
● Fast development cycle
○ Only a few hours to have the first prototype
running
○ with Google's help and a couple of hours, built a
micro full-text indexing search feature
● Mongo Shell is my friend
○ as well as Google & MongoDB.org
○ at last, a developer-friendly (command-line) tool
● bye bye sqlplus ;)
How is it still going ?