2014-03-12 (FR): Schema design and app architecture (2)
1. Tugdual Grall (@tgrall)
Alain Hélaïli (@AlainHelaili)
#MongoDBBasics @MongoDB
Building an application with MongoDB
Schema design and application architecture
2. Agenda
• Working with documents
• Application features
• Schema design
• 'myCMS' architecture and code examples
• Q&A
5. Example document
{
  '_id': ObjectId(..),
  'title': 'Schema design in MongoDB',
  'author': 'mattbates',
  'text': 'Data in MongoDB has a flexible schema..',
  'date': ISODate(..),
  'tags': ['MongoDB', 'schema'],
  'comments': [ { 'text': 'Really useful..', 'ts': ISODate(..) } ]
}
6. 'myCMS' features
• Different types of articles and categories.
• Users can register, log in/log out, and edit their profile.
• Users can post articles and comment on those articles.
• Usage statistics (publications, views, interactions) are collected and analyzed for the site and for the back office (analytics).
13. Modeling comments (1)
• Two collections: articles and comments
• A reference (i.e. a foreign key) links them
• BUT... N+1 queries to retrieve articles and their comments
{
  '_id': ObjectId(..),
  'title': 'Schema design in MongoDB',
  'author': 'mattbates',
  'date': ISODate(..),
  'tags': ['MongoDB', 'schema'],
  'section': 'schema',
  'slug': 'schema-design-in-mongodb',
  'comments': [ ObjectId(..), … ]
}
{
  '_id': ObjectId(..),
  'article_id': 1,
  'text': 'A great article, helped me understand schema design',
  'date': ISODate(..),
  'author': 'johnsmith'
}
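One common way around the N+1 pattern while keeping two collections is to load the articles first, then fetch all of their comments with a single `$in` query: two round trips instead of N+1. A minimal sketch, assuming documents are already plain Python dicts (as PyMongo returns them); the function names are hypothetical, and the filter would be passed to something like `db['comments'].find(...)`.

```python
from collections import defaultdict


def comments_filter(articles):
    """Build one filter that matches the comments of every article at once."""
    return {"article_id": {"$in": [article["_id"] for article in articles]}}


def attach_comments(articles, comments):
    """Group comments by article_id and attach them to their article, in memory."""
    by_article = defaultdict(list)
    for comment in comments:
        by_article[comment["article_id"]].append(comment)
    # dict(article, comments=...) copies the article and replaces its 'comments'
    # field (a list of references) with the matching comment documents
    return [dict(article, comments=by_article.get(article["_id"], []))
            for article in articles]
```

The join happens in the application, which is the price of referencing: MongoDB 2.4 has no server-side join, so the application issues the second query itself.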
14. Modeling comments (2)
• A single articles collection, with comments embedded in the article documents
• Pros
• Single query, read-optimized design
• Locality (disk, shard)
• Cons
• Unbounded comments array; document size keeps growing (reminder: 16MB limit)
{
  '_id': ObjectId(..),
  'title': 'Schema design in MongoDB',
  'author': 'mattbates',
  'date': ISODate(..),
  'tags': ['MongoDB', 'schema'],
  …
  'comments': [
    {
      'text': 'A great article, helped me understand schema design',
      'date': ISODate(..),
      'author': 'johnsmith'
    },
    …
  ]
}
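To put the 16MB limit in perspective, a quick back-of-the-envelope calculation helps. This sketch assumes an average comment size and a base article size (both illustrative numbers, not measurements) and ignores per-element BSON overhead, so it gives an upper bound.

```python
MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024  # 16MB limit on a single BSON document


def max_embedded_comments(base_doc_bytes, avg_comment_bytes):
    """Rough upper bound on how many comments can be embedded before the limit.

    Ignores per-element BSON overhead (array indexes stored as keys, type
    bytes), so the real number is somewhat lower.
    """
    if avg_comment_bytes <= 0:
        raise ValueError("avg_comment_bytes must be positive")
    return max(0, (MAX_BSON_DOCUMENT_SIZE - base_doc_bytes) // avg_comment_bytes)
```

With a 1 KB article and 300-byte comments, `max_embedded_comments(1024, 300)` gives 55,920: a hard ceiling that a popular article could in principle reach, and well before that the growing document gets expensive to read and rewrite.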
15. Modeling comments (3)
• Another option: a hybrid of (1) and (2), embedding the top x comments (e.g. by date or popularity) in the article document
• Fixed-size comments array (MongoDB 2.4 feature)
• All other comments overflow into a 'comments' collection, in batches (buckets)
• Pros
– More stable document size, so fewer document moves
– A single query for most accesses
– Full comment history available through querying/aggregation
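Reading the full history then means walking the overflow buckets. A minimal in-memory sketch, assuming each bucket has the `article_id`, `page` and `comments` fields used by the `add_comment` code later in the deck; `comment_history` is a hypothetical helper. Against a live database, the buckets would come from a query such as `db['comments'].find({'article_id': ...}).sort('page', -1)`.

```python
def comment_history(buckets, article_id):
    """Flatten an article's overflow buckets into one list, newest comment first."""
    pages = sorted(
        (bucket for bucket in buckets if bucket["article_id"] == article_id),
        key=lambda bucket: bucket["page"],
        reverse=True,  # highest page number holds the newest comments
    )
    history = []
    for bucket in pages:
        # comments are $push-ed in arrival order, so reverse each bucket
        history.extend(reversed(bucket["comments"]))
    return history
```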
16. Modeling comments (3)
{
  '_id': ObjectId(..),
  'title': 'Schema design in MongoDB',
  'author': 'mattbates',
  'date': ISODate(..),
  'tags': ['MongoDB', 'schema'],
  …
  'comments_count': 45,
  'comments_pages': 1,
  'comments': [
    {
      'text': 'A great article, helped me understand schema design',
      'date': ISODate(..),
      'author': 'johnsmith'
    },
    …
  ]
}
Added comment counter
• Eliminates counting at read time
Fixed-size comments array
• 10 most recent
• Sorted by date on insertion
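The "10 most recent, sorted on insertion" behavior comes from combining the `$each`, `$sort` and `$slice` modifiers of `$push`: MongoDB appends the new items, sorts the whole array, then keeps only the slice. The sketch below emulates those semantics in plain Python to make the order of operations explicit; `push_each_sort_slice` is a hypothetical name and only the negative-`$slice` form used in this deck is handled.

```python
def push_each_sort_slice(array, new_items, sort_field, slice_n):
    """Emulate {'$push': {field: {'$each': new_items,
                                  '$sort': {sort_field: 1},
                                  '$slice': slice_n}}}.

    Only a negative slice is handled: keep the last |slice_n| elements.
    """
    if slice_n >= 0:
        raise ValueError("this sketch only handles a negative $slice")
    combined = list(array) + list(new_items)               # $push/$each
    combined.sort(key=lambda item: item[sort_field])       # $sort ascending
    return combined[slice_n:]                              # $slice: keep the tail
```

Because the sort is ascending by date and the slice is negative, an old comment inserted late sorts to the front and is trimmed, so the array always holds the 10 newest comments.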
18. Modeling interactions
• Interactions
– Article views
– Comments
– (Social media sharing)
• Requirements
– Time series
– Pre-aggregations to prepare for analytics
19. Modeling interactions
• One document per article per day ('bucketing')
• A daily counter, plus a per-hour sub-document for interactions
• Bounded size (24 hours)
• A single query returns data ready to be graphed
{
  '_id': ObjectId(..),
  'article_id': ObjectId(..),
  'section': 'schema',
  'date': ISODate(..),
  'daily': { 'views': 45, 'comments': 150 },
  'hours': {
    0 : { 'views': 10 },
    1 : { 'views': 2 },
    …
    23 : { 'comments': 14, 'views': 10 }
  }
}
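"Ready to be graphed" means one bucket document maps directly onto a 24-point series. A minimal sketch, assuming the bucket layout shown above; `hourly_series` is a hypothetical helper. Hours with no activity are simply absent from the sub-document, so they are filled with 0.

```python
def hourly_series(bucket, interaction_type):
    """Turn a day bucket's 'hours' sub-document into 24 points, filling gaps with 0."""
    hours = bucket.get("hours", {})
    series = []
    for hour in range(24):
        # BSON field names are strings, so stored keys come back as "0".."23";
        # int keys are accepted too, matching the shell-style example above
        slot = hours.get(str(hour), hours.get(hour, {}))
        series.append(slot.get(interaction_type, 0))
    return series
```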
20. JSON and RESTful API
Diagram: a client side (e.g. AngularJS) exchanges JSON with a Python web app over HTTP(S) REST; the web app talks to the database through the PyMongo driver, which speaks BSON.
Real applications are not built in a shell, so let's build a RESTful API.
Examples to follow: a Python RESTful API using the Flask microframework
21. myCMS REST endpoints
Method  URI                                  Action
GET     /articles                            Retrieve all articles
GET     /articles-by-tag/[tag]               Retrieve all articles by tag
GET     /articles/[article_id]               Retrieve a specific article by article_id
POST    /articles                            Add a new article
GET     /articles/[article_id]/comments      Retrieve all comments of an article by article_id
POST    /articles/[article_id]/comments      Add a new comment to an article
POST    /users                               Register a user
GET     /users/[username]                    Retrieve a user's profile
PUT     /users/[username]                    Update a user's profile
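In Flask each row of the endpoint table above becomes an `@app.route`. To make the table's matching rules explicit independently of any framework, here is a plain-Python sketch of a route resolver; the handler names are hypothetical, and `[param]` segments are modeled as "any run of non-slash characters".

```python
import re

# (method, URI pattern, hypothetical handler name), one entry per table row
ROUTES = [
    ("GET", r"/articles", "list_articles"),
    ("GET", r"/articles-by-tag/(?P<tag>[^/]+)", "list_articles_by_tag"),
    ("GET", r"/articles/(?P<article_id>[^/]+)", "get_article"),
    ("POST", r"/articles", "add_article"),
    ("GET", r"/articles/(?P<article_id>[^/]+)/comments", "list_comments"),
    ("POST", r"/articles/(?P<article_id>[^/]+)/comments", "add_comment"),
    ("POST", r"/users", "register_user"),
    ("GET", r"/users/(?P<username>[^/]+)", "get_user"),
    ("PUT", r"/users/(?P<username>[^/]+)", "update_user"),
]


def resolve(method, path):
    """Return (handler_name, path_params) for a request, or (None, {})."""
    for route_method, pattern, handler in ROUTES:
        match = re.fullmatch(pattern, path)
        if route_method == method and match:
            return handler, match.groupdict()
    return None, {}
```

Because `[^/]+` never crosses a slash and `fullmatch` anchors both ends, `/articles/<id>` and `/articles/<id>/comments` cannot shadow each other, whatever their order in the list.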
22. Getting started with the skeleton code
$ git clone http://www.github.com/mattbates/mycms-mongodb
$ cd mycms-mongodb
$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
($ deactivate)
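23.
import json

from bson import json_util
from flask import jsonify

# app (a Flask instance) and db (a PyMongo database) are assumed to be
# defined elsewhere in the skeleton code

@app.route('/cms/api/v1.0/articles', methods=['GET'])
def get_articles():
    """Retrieves all articles in the collection
    sorted by date (newest first)
    """
    # query all articles and return a cursor sorted by date, newest first
    cur = db['articles'].find().sort('date', -1)
    # iterate the cursor and collect the docs into a list
    articles = [article for article in cur]
    # json_util.default encodes ObjectId and datetime values; note that the
    # list is embedded as a JSON-encoded string inside the response
    return jsonify({'articles': json.dumps(articles, default=json_util.default)})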
RESTful API methods in Python + Flask
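Calling `json.dumps` inside `jsonify` encodes the data twice, which is why the `articles` value in the API response below is a string rather than an array. A sketch of the single-encoding alternative, using only the standard library: `extended_json_default` is a hypothetical stand-in that mimics the `{"$date": <milliseconds>}` shape seen in the response, and it assumes naive datetimes are UTC (PyMongo's default). It deliberately does not handle `ObjectId`; in the real app, `bson.json_util` covers both.

```python
import calendar
import datetime
import json


def extended_json_default(obj):
    """json.dumps fallback: encode datetimes as {"$date": <ms since epoch>}.

    Assumes naive datetimes are UTC. ObjectId is not handled in this sketch.
    """
    if isinstance(obj, datetime.datetime):
        millis = calendar.timegm(obj.utctimetuple()) * 1000 + obj.microsecond // 1000
        return {"$date": millis}
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


articles = [{"title": "Schema design in MongoDB",
             "date": datetime.datetime(2014, 2, 1, 22, 22, 27, 408000)}]
# Serialize once, so "articles" is a real JSON array rather than a string.
body = json.dumps({"articles": articles}, default=extended_json_default)
```

In Flask, such a `body` would be returned as a response with the `application/json` mimetype instead of going through `jsonify` a second time.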
24.
from bson import ObjectId
from pymongo import ReturnDocument

@app.route('/cms/api/v1.0/articles/<string:article_id>/comments', methods=['POST'])
def add_comment(article_id):
    """Adds a comment to the specified article and a
    bucket, as well as updating a view counter
    """
    …
    # push the comment to the latest bucket and $inc the count;
    # upsert creates the bucket if it does not exist yet
    # (find_one_and_update replaces find_and_modify, removed in PyMongo 4)
    page = db['comments'].find_one_and_update(
        {'article_id': ObjectId(article_id),
         'page': comments_pages},
        {'$inc': {'count': 1},
         '$push': {'comments': comment}},
        projection={'count': 1},
        upsert=True,
        return_document=ReturnDocument.AFTER)
RESTful API methods in Python + Flask
25.
    # $inc the page count if bucket size (100) is exceeded; matching on the
    # current comments_pages value means only one writer bumps it
    if page['count'] > 100:
        db['articles'].update_one(
            {'_id': ObjectId(article_id),
             'comments_pages': article['comments_pages']},
            {'$inc': {'comments_pages': 1}})
    # let's also add to the article itself
    # most recent 10 comments only
    res = db['articles'].update_one(
        {'_id': ObjectId(article_id)},
        {'$push': {'comments': {'$each': [comment],
                                '$sort': {'date': 1},
                                '$slice': -10}},
         '$inc': {'comments_count': 1}})
    …
RESTful API methods in Python + Flask
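The bucketing flow of `add_comment` (upsert into the current page, then advance the page once a bucket holds more than 100 comments) can be simulated in memory to see where each comment lands. A sketch only: `buckets` and `article` are hypothetical dict stand-ins for the `comments` and `articles` collections, and it runs sequentially, so it does not model concurrent writers.

```python
BUCKET_SIZE = 100  # comments per overflow bucket, as in the slides


def add_comment_to_buckets(buckets, article, comment):
    """In-memory stand-in for the upsert + $push + $inc on the 'comments' collection.

    `buckets` maps (article_id, page) to a bucket dict; `article` holds the
    current 'comments_pages'. Returns the bucket's count after the insert.
    """
    key = (article["_id"], article["comments_pages"])
    bucket = buckets.setdefault(key, {"count": 0, "comments": []})  # upsert
    bucket["comments"].append(comment)                              # $push
    bucket["count"] += 1                                            # $inc
    if bucket["count"] > BUCKET_SIZE:
        # bucket size exceeded: later comments go to a fresh page
        article["comments_pages"] += 1
    return bucket["count"]
```

Note that with the `> 100` test, the comment that triggers the page change is still stored in the old bucket, so a full bucket ends up holding 101 comments.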
26.
import datetime

from bson import ObjectId
from pymongo import WriteConcern

def add_interaction(article_id, interaction_type):
    """Record the interaction (view/comment) for the
    specified article into the daily bucket and
    update an hourly counter; interaction_type is the
    counter name, e.g. 'views' or 'comments'
    """
    ts = datetime.datetime.utcnow()
    # note the unacknowledged w=0 write concern for performance
    interactions = db['interactions'].with_options(write_concern=WriteConcern(w=0))
    # $inc daily and hourly counters in the day/article stats bucket
    interactions.update_one(
        {'article_id': ObjectId(article_id),
         'date': datetime.datetime(ts.year, ts.month, ts.day)},
        {'$inc': {
            'daily.{}'.format(interaction_type): 1,
            'hours.{}.{}'.format(ts.hour, interaction_type): 1
        }},
        upsert=True)
RESTful API methods in Python + Flask
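The bucket key and the `$inc` field paths are pure functions of the timestamp and the interaction type, so they can be built and checked without a database. A minimal sketch, assuming the `daily.<type>` and `hours.<hour>.<type>` layout of the bucketed document on the interactions slide; `interaction_update` is a hypothetical helper, and `article_id` is passed through unchanged here (the real code wraps it in `ObjectId`). The resulting pair would be handed to `update_one(query, update, upsert=True)`.

```python
import datetime


def interaction_update(article_id, interaction_type, ts):
    """Build the (filter, update) pair recording one interaction in a day bucket."""
    day = datetime.datetime(ts.year, ts.month, ts.day)  # midnight = bucket key
    query = {"article_id": article_id, "date": day}
    update = {"$inc": {
        "daily." + interaction_type: 1,                         # daily total
        "hours.{}.{}".format(ts.hour, interaction_type): 1,     # hourly slot
    }}
    return query, update
```

With `upsert=True`, the first interaction of the day creates the bucket, and `$inc` creates any missing counter at 1, so no pre-allocation step is needed.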
27. $ curl -i http://localhost:5000/cms/api/v1.0/articles
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 20
Server: Werkzeug/0.9.4 Python/2.7.6
Date: Sat, 01 Feb 2014 09:52:57 GMT
{
  "articles": "[{\"author\": \"mattbates\", \"title\": \"Schema design in MongoDB\", \"text\": \"Data in MongoDB has a flexible schema..\", \"tags\": [\"MongoDB\", \"schema\"], \"date\": {\"$date\": 1391293347408}, \"_id\": {\"$oid\": \"52ed73a30bd031362b3c6bb3\"}}]"
}
Testing the API – retrieve articles
28. $ curl -H "Content-Type: application/json" -X POST \
    -d '{"text":"An interesting article and a great read."}' \
    http://localhost:5000/cms/api/v1.0/articles/52ed73a30bd031362b3c6bb3/comments
{
  "comment": "{\"date\": {\"$date\": 1391639269724}, \"text\": \"An interesting article and a great read.\"}"
}
Testing the API – comment on an article
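The same comment request can be issued from Python's standard library, which is handy for scripted tests of the API. This sketch only builds the request (sending it requires the local dev server from the slides to be running); `comment_request` and `BASE_URL` are hypothetical names, and the URL mirrors the curl command above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000/cms/api/v1.0"  # local dev server from the slides


def comment_request(article_id, text):
    """Build (but do not send) the POST that the curl command above issues."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/articles/{article_id}/comments",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send it against a running server:
# with urllib.request.urlopen(comment_request("52ed73a30bd031362b3c6bb3", "Nice")) as resp:
#     print(json.load(resp))
```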
29. Schema iteration
New feature in the backlog?
Documents have a dynamic schema, so we just iterate on the object schema.
>>> user = {'username': 'matt',
...         'first': 'Matt',
...         'last': 'Bates',
...         'preferences': {'opt_out': True}}
>>> db['users'].replace_one({'username': user['username']}, user, upsert=True)
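Iterating the schema this way means old and new documents coexist in the same collection, so reading code should tolerate a missing field rather than assume every document has it. A minimal sketch, assuming that users saved before the feature simply lack `preferences` and should default to not opting out; `wants_opt_out` is a hypothetical helper.

```python
def wants_opt_out(user):
    """Read the new field defensively: older user documents lack 'preferences'."""
    return user.get("preferences", {}).get("opt_out", False)


old_user = {"username": "johnsmith", "first": "John", "last": "Smith"}
new_user = {"username": "matt", "first": "Matt", "last": "Bates",
            "preferences": {"opt_out": True}}
```

No migration is needed up front; documents can be upgraded lazily the next time they are saved.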
33. Summary
• Documents have a flexible schema and can embed rich, complex data structures
• Different strategies to ensure performance
• Schema design is driven by access patterns, not by storage patterns
• References for more flexibility
• Keep horizontal scaling in mind (shard key)
34. Further reading
• 'myCMS' skeleton source code: http://www.github.com/mattbates/mycms-mongodb
• Use case - metadata and asset management: http://docs.mongodb.org/ecosystem/use-cases/metadata-and-asset-management/
• Use case - storing comments: http://docs.mongodb.org/ecosystem/use-cases/storing-comments/
35. Next session: 26 March
• Interacting with the database
• Query language (find & update)
• Interactions between the application and the database
• Code examples
In the filing cabinet model, the patient's x-rays, checkups, and allergies are stored in separate drawers and pulled together (like an RDBMS). In the file folder model, we store all of the patient information in a single folder (like MongoDB).
Priority: a floating point number between 0 and 1000. The highest-priority member that is up to date wins; up to date means within 10 seconds of the primary. If a higher-priority member catches up, it will force an election and win. Slave delay: lags behind the master by a configurable time delay, is automatically hidden from clients, and protects against operator errors (fat fingering) or an application corrupting data.
Large-scale operation can be combined with high performance on commodity hardware through horizontal scaling. Build: a document-oriented database maps naturally to object-oriented languages. Scale: MongoDB presents a clear path to scalability that isn't ops-intensive, and provides the same interface for a sharded cluster as for a single instance.
Cardinality: can your data be broken down enough? Query isolation: can queries be targeted to a specific shard? Reliability: what is the impact of shard outages? A good shard key can optimize routing, minimize (unnecessary) traffic, and allow the best scaling.