moma-django is a MongoDB manager for Django. It provides native Django ORM support for MongoDB documents, including the query API and the admin interface. It was developed as part of two commercial products and released as open source. In the talk we will review the motivation behind its development and its features, and go through two or three examples of how to use them: migrating an existing model, advanced queries and the admin interface. If time permits we will discuss unit testing and South migrations.
Please find the video at: http://www.youtube.com/watch?v=cxQKTDLjb-w
Also check out: https://twitter.com/gadioren and www.ITculate.io
2. Django + MongoDB:
building a custom ORM layer
3. Who are we?
Company: Cloudoscope.com
What we do:
– Cloudoscope’s product enables IT vendors to automate the presales process by collecting and analyzing prospect IT performance
– Previous product - Lucidel: B2C marketing analytics based on
website data
– Data intensive projects / sites, NoSQL, analytics focus
(as a way of funding)
Gadi Oren:
@gadioren,
gadioren
4. Why moma-django?
Certain problems can be addressed well with NoSQL
The team wants to experiment with a NoSQL database
HOWEVER:
A lot of code needs to be rewritten
The team has to learn a new API
Some of the tools and procedures no longer work and have to be replaced
– Admin interface
– Unit testing environment
Some of the data needs to be somewhat de-normalized*
5. Why moma-django? (our example)
Needed a very efficient way of processing timeseries
The timeseries were constantly growing
We required very detailed search/slice/dice capabilities to
find the timeseries to be processed
Some of the data was optional (e.g. demographics
information was never complete)
Document size, content and structure varied widely
However, we have a small distributed team and we did not
want to create a massive project
We started experimenting with a stub Manager, working in small
iterations and adding functionality as needed over nine
months
6. Other packages
PyMongo – a dependency for moma-django
MongoEngine – somewhat similar concepts in terms of
models
Non-relational versions of Django
7. “Native” - advantages
Django packages and plugins (e.g. Admin functionality)
Using similar code conventions
Easier to bring in new team members
Use the same unit testing frameworks and CI tooling (e.g. Jenkins)
Simple experimentation and migration path
8. Let’s make it interactive
Questions Anyone??? (Example Application)
Small question asking application
Allows voting and adding images
Implemented as a Django application over MongoDB, using
moma-django
Register and login at http://momadjango.org
Ask away!
9. Migrating an existing model
# Existing Django models
class TstBook(models.Model):
    name = models.CharField(max_length=64)
    publish_date = MongoDateTimeField()
    author = models.ForeignKey('testing.TstAuthor')

    class Meta:
        unique_together = ['name', 'author']

class TstAuthor(models.Model):
    first_name = models.CharField(max_length=32)
    last_name = models.CharField(max_length=32)

# The same models migrated to moma-django: subclass MongoModel instead of models.Model
class TstBook(MongoModel):
    name = models.CharField(max_length=64)
    publish_date = MongoDateTimeField()
    author = models.ForeignKey('testing.TstAuthor')

    class Meta:
        unique_together = ['name', 'author']

class TstAuthor(MongoModel):
    first_name = models.CharField(max_length=32)
    last_name = models.CharField(max_length=32)

# Connect the moma-django handler to the post_syncdb signal
models.signals.post_syncdb.connect(post_syncdb_mongo_handler)
11. Migrating an existing model (2)
Syncdb:
Add objects
>>> TstBook(name="Good night half moon", publish_date=datetime.datetime(2014, 2, 20),
...         author=TstAuthor.objects.get(first_name="Gadi")).save()
12. Migrating an existing model (3)
Breaching uniqueness: try to save the same object again:
13. Migrating an existing model (4)
In Mongo: content, indexes
class Meta:
    unique_together = ['name', 'author']
Admin
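As a rough illustration of how a unique_together declaration could end up as a unique compound index in MongoDB, the sketch below builds the arguments one might pass to PyMongo's create_index. The helper name unique_index_spec is hypothetical, and the slides do not show how moma-django itself builds the index or how it names the foreign-key field inside the document.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def unique_index_spec(unique_together):
    """Build (keys, options) for a unique compound index.

    Hypothetical helper for illustration; not moma-django's own code.
    """
    keys = [(field, ASCENDING) for field in unique_together]
    options = {"unique": True}
    return keys, options


keys, options = unique_index_spec(["name", "author"])
# With PyMongo this could be applied as:
#     collection.create_index(keys, **options)
```

With such an index in place, MongoDB itself rejects a second document with the same name and author, which matches the uniqueness breach demonstrated on the previous slide.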
14. New field types
MongoIDField – Internal. Used to hold the MongoDB object
ID
MongoDateTimeField – Used for Datetime
ValuesField – Used to represent a list of objects of any type
StringListField – Used for a list of strings
DictionaryField – Used as a dictionary
Current limitation: nested structures have limited support
19. Queries - by the structure of documents
>>> # How many documents in the DB?
>>> UniqueVisit.objects.all().count()
20
>>> # For how many documents in the DB do we have age information?
>>> UniqueVisit.objects.filter(demographics__age__exists="true").count()
7
>>> # For how many documents in the DB do we have gender information?
>>> UniqueVisit.objects.filter(demographics__gender__exists="true").count()
3
>>> # For how many documents in the DB do we have gender and age information?
>>> UniqueVisit.objects.filter(demographics__age__exists="true",
...                            demographics__gender__exists="true").count()
1
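To make the idea behind these lookups concrete, here is a minimal sketch of how a Django-style keyword such as demographics__age__exists could be translated into a MongoDB filter document. The function lookup_to_mongo and its small operator table are assumptions made for illustration; this is not moma-django's actual translation code, which the slides do not show.

```python
def lookup_to_mongo(**lookups):
    """Translate Django-style keyword lookups into a MongoDB filter dict.

    Illustrative sketch only: supports a handful of operators.
    """
    operators = {
        "exists": "$exists",
        "gt": "$gt",
        "gte": "$gte",
        "lt": "$lt",
        "lte": "$lte",
    }
    query = {}
    for key, value in lookups.items():
        parts = key.split("__")
        if parts[-1] in operators:
            op = operators[parts.pop()]
            if op == "$exists":
                # The slide passes the string "true"; normalise it to a bool.
                value = str(value).lower() == "true"
            # Remaining parts form the dotted path into the nested document.
            query.setdefault(".".join(parts), {})[op] = value
        else:
            # No operator suffix: plain equality on the dotted path.
            query[".".join(parts)] = value
    return query


print(lookup_to_mongo(demographics__age__exists="true"))
# prints {'demographics.age': {'$exists': True}}
```

The double underscore plays the same role as in regular Django lookups, except that here it walks into the nested document rather than across a join.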
20. Manipulating documents payload
# Model
class Question(MongoModel):
    user = models.ForeignKey(User)
    date = MongoDateTimeField(db_index=True)
    question = models.CharField(max_length=256)
    docs = DictionaryField(models.CharField())
    image = DictionaryField(models.TextField())
    audio = DictionaryField()
    other = DictionaryField()
    vote_ids = ValuesField(models.IntegerField())

    def __unicode__(self):
        return u'%s[%s %s]' % (self.question, self.date, self.user)

    class Meta:
        unique_together = ['user', 'question']

# Store an image: get the image from the "POST" upload form (snippet)
docfile = request.FILES['docfile']
question_id = form.cleaned_data['question_id']
docfile_name = docfile.name
docfile_name_changed = _replace_dots(docfile.name)
question = Question.objects.get(id=question_id)

# Store meta-data
question.docs.update({docfile_name_changed: docfile.content_type})
question.image.update({
    docfile_name_changed + '_url': '/static/display/s_' + docfile_name,
    docfile_name_changed + '_name': docfile_name,
    docfile_name_changed + '_content_type': docfile.content_type,
})

# Store the actual image binary block (small scale implementation)
file_read = docfile.file.read()  # Note – this is a naïve implementation!
file_data = base64.b64encode(file_read)
question.image.update({docfile_name_changed + '_data': file_data})
question.save()
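The snippet above calls a helper named _replace_dots that the slides do not show. Dots in keys clash with MongoDB's dot notation for nested fields, so the file name has to be made safe before it is used as a dictionary key. The following is a minimal sketch under the assumption that the helper simply swaps dots for underscores; replace_dots, store_image and load_image are hypothetical names, and a plain dict stands in for the DictionaryField.

```python
import base64


def replace_dots(name, replacement="_"):
    """Make a file name usable as a MongoDB key by removing dots (assumed behaviour)."""
    return name.replace(".", replacement)


def store_image(image, filename, content_type, raw_bytes):
    """Store file metadata and base64-encoded content in a dict-like field."""
    key = replace_dots(filename)
    image[key + "_name"] = filename
    image[key + "_content_type"] = content_type
    # base64 keeps the binary payload as plain ASCII text inside the document.
    image[key + "_data"] = base64.b64encode(raw_bytes).decode("ascii")
    return key


def load_image(image, key):
    """Decode the stored payload back into the original bytes."""
    return base64.b64decode(image[key + "_data"])


image = {}  # stand-in for question.image
key = store_image(image, "cat.png", "image/png", b"\x89PNG")
original = load_image(image, key)
```

As the slide itself notes, this approach is naïve: base64 inflates the payload by roughly a third, and everything lives inside a single document, so it only suits small files.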
22. So – what’s next?
GitHub: https://github.com/gadio/moma-django
If you want to contribute – please contact (forking is also an
option)
Contact: gadi.oren.1 at gmail.com or
gadi at Cloudoscope.com
24. South
Dealing with apps with mixed models: getting South to disregard
the Mongo model
from south.modelsinspector import add_introspection_rules

# Enabling South for the non conventional mongo model
add_introspection_rules(
    [
        (
            (MongoIdField, MongoDateTimeField, DictionaryField),
            [],
            {
                "max_length": ["max_length", {"default": None}],
            },
        ),
    ],
    ["^moma_django.fields.*"])
25. Unit testing
The model name is defined in settings.py
During a unit testing run, a new MongoDB schema is created, named
MONGO_COLLECTION prefixed with “test_” (e.g.
test_momaexample)
MONGO_HOST = 'localhost'
MONGO_PORT = 27017
MONGO_COLLECTION = 'momaexample'
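The naming rule for the test schema can be sketched in a few lines. The function mongo_test_name is a hypothetical helper written for illustration, and skipping names that already carry the prefix is an added assumption; the slides only state that the configured name is prefixed with "test_".

```python
MONGO_COLLECTION = 'momaexample'


def mongo_test_name(name, prefix="test_"):
    """Return the name used during a test run (illustrative sketch)."""
    # Assumption: avoid double-prefixing if the name is already a test name.
    if name.startswith(prefix):
        return name
    return prefix + name


print(mongo_test_name(MONGO_COLLECTION))
# prints test_momaexample
```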