MongoDB @ SFR
sfr.fr
Welcome




Antoine Raith, technical team leader @ SFR
Apache, Tomcat, JEE

1 mutualised platform

30 physical application servers

150 Tomcat deployed




Web development at Internet Direction
22M pageviews per day

4.5M only on homepage

8M customer authentications per day




We NEED to scale!

What are we facing?
Increase our scalability

Avoid schema/table/column dependency

Closer to the developer team than to the sysadmin or DBA team




NoSQL?
Scalable
Complex queries
Schema-less
Easy deployment and monitoring
Open-Source


Why MongoDB ?
[Live project] customers data

[Live project] sfr.fr targeted ads

[Development project] Products catalog




Our projects based on MongoDB
MongoDB @ SFR
Customer Data
Hello!




Jérôme Leleu, web architect @ SFR
In charge of SSO and user profile service
User profile service (UPS)

Web services (SOAP or JSON)

Get the profile of SFR clients

Data is aggregated from many back ends of the information
system




Context
Java 1.6, mongo driver 2.6.5, replica set + sharding

Technical data: « local storage » collection
■   only 1 collection in the database
■   « last connection date » of the web account
■   14 million documents
■   reads/writes by web account identifier (shard key)



Some functional data is coming: « internautes » collection
(6 million)…




Data in UPS
My choice: read on slave and write (without acknowledgement)
on master

The « local storage » collection needs to be readable immediately
after a write

-> not really compatible with asynchronous replication and
reads on slave

-> use of memcached (as for most data in UPS) as a
cache for reads (to let replication happen)




Implementation in MongoDB
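
The read-after-write problem described on this slide can be sketched as a small simulation. This is a minimal illustration in plain JavaScript, not the UPS code: `primary`, `secondary`, and `cache` are in-memory Maps standing in for the MongoDB master, a lagging slave, and memcached, and every name and value below is hypothetical.

```javascript
// Hypothetical stand-ins: Maps replace the MongoDB master, a slave that
// lags behind, and memcached. None of these names come from the UPS code.
class ProfileStore {
  constructor() {
    this.primary = new Map();   // receives every write
    this.secondary = new Map(); // only updated when replicate() runs
    this.cache = new Map();     // write-through cache used for reads
  }

  // Write to the master and to the cache; replication happens later.
  write(accountId, lastConnection) {
    this.primary.set(accountId, lastConnection);
    this.cache.set(accountId, lastConnection);
  }

  // Read from the cache first, then fall back to the (possibly stale) slave.
  read(accountId) {
    if (this.cache.has(accountId)) {
      return this.cache.get(accountId);
    }
    return this.secondary.get(accountId);
  }

  // Simulate asynchronous replication catching up.
  replicate() {
    for (const [id, value] of this.primary) {
      this.secondary.set(id, value);
    }
  }
}

const store = new ProfileStore();
store.write("account-42", "2012-03-01T10:00:00Z");
const immediate = store.read("account-42");           // served by the cache
const staleSlave = store.secondary.get("account-42"); // not replicated yet
store.replicate();
const replicated = store.secondary.get("account-42");
```

The cache hides the replication lag from readers; the cost is that cache entries need an expiry or invalidation policy, which this sketch leaves out.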
2 GB of data and 2 GB of indexes for 14 million documents
(from « db.stats(); »)

Inserts / updates: 600k per day / communication exceptions:
6k per day
Average insert/update time: 56 ms




Some figures
Default values of the Java mongo driver are inappropriate:
unlimited connect timeout, unlimited read timeout, 120-second
wait to get a connection from the pool!

Can't make an « AND » query on the same field
before mongo 2.0

Is it a good choice to read on slave / write on master?
Replication time? Is it a real use case?
Should it be replaced by:
forced acknowledgement of writes and reads on slave?
OR
unacknowledged writes and reads on master?



Problems & pending questions
Mongo @ SFR
Targeted ads application
Hi!




Matthieu Blanc
Web architect @ Degetel, contractor for SFR
Context
Present targeted ads to www.sfr.fr web visitors
Based on :

●   Their profile
●   Their web browsing history
●   Date/Time of the day
●   etc.
Ex: A web visitor views a smartphone @ www.sfr.fr
A smartphone ad is shown when he goes back to the
homepage
Ex: A web visitor comes to www.sfr.fr from a search
engine
An ad related to his search is shown
Problem
Need to keep each web visitor's browsing history

Need to track every:
● Ad view
● Click
● Conversion

MongoDB to the rescue!
image from http://www.flickr.com/photos/cayusa/




The D.U.N.C.E. principle : everything by default
Java 1.6
Spring Data for MongoDB 1.0.0
(uses mongo driver 2.7.1)
Read/Write on master
No sharding
WriteConcern.NORMAL




Case Study




Event Logging with MongoDB
Capped collections:

db.createCollection("mycoll", {capped: true, size: 100000})


Old log data automatically ages out (the oldest documents are overwritten first)
No risk of filling up a disk

No need to write log archival / deletion scripts

Good performance for a high number of writes compared to
reads




Event Logging
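
The ageing-out behaviour of a capped collection can be modelled in a few lines. This is a simplified sketch in plain JavaScript, not MongoDB itself: a real capped collection is bounded by its size in bytes, while the hypothetical `CappedLog` class below is bounded by a document count to keep the example short.

```javascript
// Simplified model of a capped collection: a real one is limited by size in
// bytes; this sketch caps by document count only.
class CappedLog {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    // Once full, the oldest document is dropped to make room.
    if (this.docs.length > this.maxDocs) {
      this.docs.shift();
    }
  }

  // Documents come back in insertion order, oldest first.
  find() {
    return this.docs.slice();
  }
}

const log = new CappedLog(3);
["view", "click", "view", "conversion"].forEach((type, i) => {
  log.insert({ seq: i, type });
});
const kept = log.find().map((d) => d.seq);
console.log(kept); // the first event (seq 0) has aged out
```

Because space is reclaimed on insert, no separate purge job is needed, which is the point made on the slide.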
Map Reduce <- we are bad at this

  Cron job 1 -> server-side aggregation of logs by minute
  and by ad

  Aggregated logs are persisted in a dedicated collection

  Cron job 2 consolidates aggregated logs by hour, every
  day

  Cron job 3 consolidates aggregated logs by day, every
  week




Log Analysis
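
The successive roll-ups described above (by minute, then by hour, then by day) can be sketched with one bucketing function. This is an illustration in plain JavaScript under stated assumptions: the event shape `{ adId, type, ts }`, the `count` field, and the function names are hypothetical, and the real jobs ran against MongoDB collections rather than in-memory arrays.

```javascript
// Hypothetical event shape: { adId, type, ts } with ts in milliseconds.
// Truncate a timestamp to the start of its bucket (minute, hour, ...).
function bucketStart(ts, bucketMs) {
  return Math.floor(ts / bucketMs) * bucketMs;
}

// Count events per (bucket, adId, type). Rows may already carry a count,
// so the same function can roll minute rows up into hour or day rows.
function aggregate(rows, bucketMs) {
  const totals = new Map();
  for (const row of rows) {
    const bucket = bucketStart(row.ts, bucketMs);
    const key = `${bucket}|${row.adId}|${row.type}`;
    const current = totals.get(key) ||
      { ts: bucket, adId: row.adId, type: row.type, count: 0 };
    current.count += row.count === undefined ? 1 : row.count;
    totals.set(key, current);
  }
  return Array.from(totals.values());
}

const MINUTE = 60 * 1000;
const HOUR = 60 * MINUTE;

const events = [
  { adId: "ad-1", type: "view", ts: 0 },
  { adId: "ad-1", type: "view", ts: 30 * 1000 },
  { adId: "ad-1", type: "click", ts: 45 * 1000 },
  { adId: "ad-1", type: "view", ts: 61 * 1000 },
];

const byMinute = aggregate(events, MINUTE); // rows the first job would store
const byHour = aggregate(byMinute, HOUR);   // rows the hourly roll-up would store
```

Each roll-up reads only the previous level's rows, so the raw events can be left to age out of the capped collection.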
Event Logging
The Result
Main collection (visitors' web browsing history):
36 million documents and growing
Avg. document size: 430 bytes

80 million events processed in less than 3 months

Per second: 60 reads and 50 writes (60 finds, 30 updates, 20
inserts)




Some Data
It works! :)


Default properties are good enough even for a high-traffic
website (for now...)




Conclusion
Mongo @ SFR
Products catalog
Good morning!




David Rault, web architect @ SFR
In charge of MarketPlace project
@squat80       http://fr.linkedin.com/pub/david-rault/37/722/963
●   Products classified by categories
●   Categories determine product features
●   Multiple sellers
    ○   can create new products (based on EAN/MPN)
        ■   can modify the products they created
        ■   can only refer to products created by other
            sellers
    ○   publish offers (product id + price)
●   Order management is out of scope
    ○   delegated to the existing order-management system
●   Still in development

Context
●   Schema-less: products are structured
    documents
    ○   Different properties depending on product category
        (TVs, phone protections, wires, ...)
    ○   No JOIN required - documents load in a single call
    ○   New categories will come : no migration required
●   Search capabilities
    ○   Powers navigation through the store
    ○   Complex queries on product features
●   Performance
    ○   Our Ops forbid intensive writes into Oracle DB (!)


Why Mongo ?
Java 7 - Tomcat 7

Direct use of Java driver (2.7.2)

Replica set (2 replicas + 1 arbiter)

Sharding enabled

Writes are replica-safe



Technical choices
●   Web services (WS) for creation/update of products and
    offers
●   Triggers (scheduled) to consolidate data
    ○   for each product: valid offers in a 2-day window
        are aggregated into the product
    ○   for each category: product counts and pseudo-enumerated
        field values (e.g. list of brands) are
        aggregated into the product
●   "Live streaming" into Google Search
    Appliance
    ○   feed for both internal keyword searches & portal-wide
        searches (within *.sfr.fr sites)
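
The per-product consolidation step can be sketched as follows. This is a minimal illustration in plain JavaScript, not the MarketPlace code: the offer shape, the `updatedAt` field, and the `offer_count` / `max_price` names are assumptions, as is the reading of "valid" as "updated within the last 2 days". Only `min_price` appears elsewhere in these slides.

```javascript
const TWO_DAYS_MS = 2 * 24 * 60 * 60 * 1000;

// Hypothetical shapes: offers are { productId, sellerId, price, updatedAt }.
// Returns the summary fields that a scheduled job could store on the product.
function consolidateOffers(productId, offers, now) {
  const valid = offers.filter(
    (o) => o.productId === productId && now - o.updatedAt <= TWO_DAYS_MS
  );
  if (valid.length === 0) {
    return { offer_count: 0, min_price: null, max_price: null };
  }
  const prices = valid.map((o) => o.price);
  return {
    offer_count: valid.length,
    min_price: Math.min(...prices),
    max_price: Math.max(...prices),
  };
}

const now = 10 * 24 * 60 * 60 * 1000; // arbitrary "current time" for the example
const offers = [
  { productId: "p1", sellerId: "s1", price: 499, updatedAt: now - 1000 },
  { productId: "p1", sellerId: "s2", price: 459, updatedAt: now - TWO_DAYS_MS - 1 }, // too old
  { productId: "p1", sellerId: "s3", price: 479, updatedAt: now - 3600 * 1000 },
  { productId: "p2", sellerId: "s1", price: 19, updatedAt: now },
];
const summary = consolidateOffers("p1", offers, now);
```

Storing the summary on the product document keeps front-office reads to a single document fetch, at the price of data that is only as fresh as the last scheduled run.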

"Back-office" Design
●   Straightforward queries
    ○   mostly READs
    ○   by product id, by category
    ○   filtering (min/max price, by brand, by color, ...)
        ■   filters are category-specific
●   Customer-activity tracking
    ○   build a knowledge base for future features:
        ■   recommendation engine
    ○   products viewed, previous orders, wish list, etc.
    ○   both for identified and anonymous visitors
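
A filtering query of the kind listed above might be assembled like this. The sketch is plain JavaScript that only builds a MongoDB-style query document; the field names (`category`, `brand`, `color`, `min_price`) and the `buildProductQuery` helper are assumptions made for illustration.

```javascript
// Builds a MongoDB-style query document from optional front-office filters.
// Field names (category, brand, color, min_price) are assumptions.
function buildProductQuery(category, filters = {}) {
  const query = { category };
  if (filters.brands && filters.brands.length > 0) {
    query.brand = { $in: filters.brands };
  }
  if (filters.color) {
    query.color = filters.color;
  }
  // Price bounds are optional; only the ones provided are added.
  const price = {};
  if (filters.minPrice !== undefined) price.$gte = filters.minPrice;
  if (filters.maxPrice !== undefined) price.$lte = filters.maxPrice;
  if (Object.keys(price).length > 0) {
    query.min_price = price;
  }
  return query;
}

const query = buildProductQuery("tv", {
  brands: ["BrandA", "BrandB"],
  minPrice: 200,
  maxPrice: 800,
});
console.log(JSON.stringify(query));
```

Since filters are category-specific, a caller would pass only the filters that exist for the chosen category; absent filters add nothing to the query.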




"Front-office" design
●   Need to unlearn 10+ years of experience in
    relational design/development
    ○   Think "document", not relation
    ○   No magical (a.k.a. ORM) framework
        ●   bye bye Hibernate ;)
    ○   Some surprises/confusion with the query syntax
        ■   No "$and" in versions <2.0: couldn't get some
            queries to work with the Java driver (though they
            worked in the mongo shell)
            ●   "min_price > a and min_price > b"
        ■   Operators appear at varying positions
            ●   { "$or": [ { "field_a": value_a }, { "field_b": value_b } ] }
            ●   { "some_field": { "$in" : some_values }}




How is it going ?
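
One plausible cause of the same-field "AND" surprise (an assumption, the slides do not confirm it) is that a query document behaves like a map: setting the same key twice keeps only the last condition. The sketch below uses plain JavaScript objects and a range on `min_price` with illustrative bounds.

```javascript
// A query document is a map, so setting the same field twice keeps only
// the last condition. This mirrors calling put() twice on a Java map.
const overwritten = {};
overwritten.min_price = { $gt: 100 };
overwritten.min_price = { $lt: 500 }; // replaces the $gt condition

// Works on every version: both operators in one sub-document.
const combined = { min_price: { $gt: 100, $lt: 500 } };

// Available from MongoDB 2.0: explicit $and over separate clauses.
const withAnd = {
  $and: [{ min_price: { $gt: 100 } }, { min_price: { $lt: 500 } }],
};

console.log(JSON.stringify(overwritten));
console.log(JSON.stringify(combined));
console.log(JSON.stringify(withAnd));
```

The combined form covers most range filters without `$and`; the explicit operator is only needed when two clauses cannot be merged into one sub-document.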
●   Good performance
    ○   Although with a relatively low number of documents
        (~5,000-10,000 documents)
●   Fast development cycle
    ○   Only a few hours to get the first prototype
        running
    ○   With Google's help and a couple of hours, built a
        micro full-text indexing search feature
●   Mongo shell is my friend
    ○   as well as Google & MongoDB.org
    ○   at last, a developer-friendly (command-line) tool
        ●   bye bye sqlplus ;)



How is it still going ?
"borrowed" from Geek and Poke http://geekandpoke.typepad.com/




       Thank You!
