5. 2013-07-05 Europython Florence mongopersist 5/37
choose your database, choose your future
pro / con
ZODB
  + very transparent
  + object store
  - only Python/native
  - no query language
  - no 3rd-party tools
  - no default indexing
RDBMS/ORM
  + ad-hoc SQL queries
  + indexes, tools, etc.
  - BIG impedance mismatch
  - limited transparency
  - strict schema
mongoDB
  + document store
  + ad-hoc queries
  + indexes, tools, etc.
  - small impedance mismatch
  - limited/no transparency
6. 2013-07-05 Europython Florence mongopersist 6/37
persistence
state that outlives the process
- get the object
- modify
- (finish the transaction)
16. 2013-07-05 Europython Florence mongopersist 16/37
collection sharing
class Person(persistent.Persistent):
    _p_mongo_collection = 'person'
    name = u''
    ...

class Employee(Person):
    _p_mongo_collection = 'person'
    salary = 0
    ...
mongopersist will automatically notice these cases and store the Python type as part of the document
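The shared-collection mechanism can be sketched in plain Python. Dicts stand in for Mongo documents; the `_py_persistent_type` field name matches the one mongopersist writes, but the dump/load helpers below are illustrative, not the library's real code:

```python
# Sketch: one collection holding two types, dispatched on a stored type name.

class Person:
    def __init__(self, name):
        self.name = name

class Employee(Person):
    def __init__(self, name, salary):
        super().__init__(name)
        self.salary = salary

CLASSES = {'%s.%s' % (c.__module__, c.__name__): c for c in (Person, Employee)}

def dump(obj):
    doc = dict(vars(obj))
    # Store the Python type so the loader can pick the right class later.
    doc['_py_persistent_type'] = '%s.%s' % (
        type(obj).__module__, type(obj).__name__)
    return doc

def load(doc):
    cls = CLASSES[doc['_py_persistent_type']]
    obj = cls.__new__(cls)          # bypass __init__, unpickling-style
    obj.__dict__.update(
        {k: v for k, v in doc.items() if k != '_py_persistent_type'})
    return obj

person_collection = [dump(Person(u'Roy')), dump(Employee(u'Stephan', 50000))]
objs = [load(d) for d in person_collection]
print([type(o).__name__ for o in objs])   # ['Person', 'Employee']
```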
17. 2013-07-05 Europython Florence mongopersist 17/37
sub-object / sub-document
class Car(persistent.Persistent):
    _p_mongo_sub_object = True

    def __init__(self, year, make, model):
        self.year = year
        self.make = make
        self.model = model
>>> dm.root['stephan'].car = Car('2005', 'Ford', 'Explorer')
>>> dumpCollection('__main__.Person')
[{...
u'car': {u'_py_persistent_type': u'__main__.Car',
u'make': u'Ford',
u'model': u'Explorer',
u'year': u'2005'}, ...}]
18. 2013-07-05 Europython Florence mongopersist 18/37
beware of non-Persistent objects
class Phone(object):
    def __init__(self, country, area, number):
        ...
>>> stephan.phone = Phone('+1', '978', '394-5124')
>>> dumpCollection('__main__.Person')
[{...
u'phone': {u'_py_type': u'__main__.Phone',
u'area': u'978',
u'country': u'+1',
u'number': u'394-5124'}, ...}]
>>> stephan.phone.number = '555-1234'
>>> transaction.commit()
Changes are not saved, because Phone does not subclass Persistent
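Why the change is lost: a minimal stand-in (no mongopersist involved) for the change tracking that `persistent.Persistent` provides and a plain `object` subclass lacks. A data manager only re-dumps objects that flag themselves as changed:

```python
class TrackedPhone:
    """Hypothetical stand-in for a persistent.Persistent subclass:
    every attribute write raises a 'changed' flag that a data manager
    can inspect at commit time."""

    def __init__(self, country, area, number):
        self.__dict__['_p_changed'] = False
        self.country, self.area, self.number = country, area, number

    def __setattr__(self, name, value):
        self.__dict__[name] = value
        self.__dict__['_p_changed'] = True

phone = TrackedPhone('+1', '978', '394-5124')
phone.__dict__['_p_changed'] = False   # pretend it was just loaded
phone.number = '555-1234'              # mutation is now visible to the dumper
print(phone._p_changed)                # True -> would be re-dumped on commit
# A plain object subclass has no such flag, so the commit skips it.
```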
21. 2013-07-05 Europython Florence mongopersist 21/37
Optimistic Data Dumping
The process of dumping data during a transaction,
under the assumption that the transaction will
succeed.
object modifications
...
object modifications
...
automatic/implicit flush
query
22. 2013-07-05 Europython Florence mongopersist 22/37
Optimistic Data Dumping
>>> stephan.foobar = 42
...code...
>>> roy.foobar = 88
...code...
>>> dm.get_collection_from_object(
... roy).count({'foobar': 88})
1
ALL query methods are wrapped to call flush first
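The flush-before-query wrapping can be pictured like this (a hand-rolled sketch with a list of dicts standing in for a Mongo collection, not mongopersist's actual classes):

```python
class FakeCollection:
    """Stand-in for a pymongo collection backed by a list of dicts."""
    def __init__(self):
        self.docs = []

    def count(self, query):
        return sum(1 for d in self.docs
                   if all(d.get(k) == v for k, v in query.items()))

class FlushingDataManager:
    """Pending object changes are dumped before any query runs, so
    queries always see this transaction's own modifications."""
    def __init__(self, collection):
        self.collection = collection
        self.pending = []                  # modified objects awaiting dump

    def flush(self):
        self.collection.docs.extend(self.pending)
        self.pending = []

    def count(self, query):
        self.flush()                       # the crucial wrapping step
        return self.collection.count(query)

dm = FlushingDataManager(FakeCollection())
dm.pending.append({'name': u'roy', 'foobar': 88})
print(dm.count({'foobar': 88}))            # 1 -- uncommitted change is visible
```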
25. 2013-07-05 Europython Florence mongopersist 25/37
querying mongoDB
datamanager.get_collection(dbname, collname)
datamanager.get_collection_from_object(obj)
- find, find_one, count, etc.
extra methods, which return objects:
- find_objects()
- find_one_object()
ALL query methods are wrapped to call flush first
datamanager.load(dbref)
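What `datamanager.load(dbref)` amounts to: a DBRef carries database, collection and `_id`, which is all that is needed to fetch and rebuild the object. A toy version (the store layout and the `Person` class here are illustrative; the real loader picks the class from the stored document):

```python
from collections import namedtuple

DBRef = namedtuple('DBRef', 'database collection id')

# Toy two-level store: {database: {collection: {_id: document}}}
STORE = {
    'ep2013': {
        'person': {1: {'_id': 1, 'name': u'stephan'}},
    },
}

class Person:
    pass

def load(dbref):
    doc = STORE[dbref.database][dbref.collection][dbref.id]
    obj = Person.__new__(Person)   # illustrative: real code derives the class
    obj.__dict__.update(doc)
    return obj

p = load(DBRef('ep2013', 'person', 1))
print(p.name)                      # stephan
```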
26. 2013-07-05 Europython Florence mongopersist 26/37
object caching
DB access and object instantiation are quite slow
● class lookup cache: dbref → Python class lookup
● object cache: dbref → object (within a transaction)
● document cache: dbref → document (avoids a DB trip)
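All three caches boil down to memoization keyed by DBRef. For example, the document cache in miniature (illustrative only; mongopersist's internals differ):

```python
class CachingLoader:
    """Caches fetched documents by dbref so repeated loads within a
    transaction skip the database round trip."""

    def __init__(self, fetch):
        self.fetch = fetch            # function: dbref -> document
        self.document_cache = {}
        self.trips = 0

    def load_document(self, dbref):
        if dbref not in self.document_cache:
            self.trips += 1           # only cache misses hit the DB
            self.document_cache[dbref] = self.fetch(dbref)
        return self.document_cache[dbref]

loader = CachingLoader(lambda dbref: {'_id': dbref, 'name': u'stephan'})
loader.load_document('person:1')
loader.load_document('person:1')
print(loader.trips)                   # 1 -- second load came from the cache
```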
27. 2013-07-05 Europython Florence mongopersist 27/37
query logging, incl. traceback
LoggingDecorator: logs the calls to
insert, update, remove, save, find, find_one,
find_and_modify, count,
including args and kwargs,
with an optional traceback.
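The decorator's shape, as a generic sketch (not mongopersist's actual `LoggingDecorator` code; a dummy collection stands in for pymongo's):

```python
import functools
import logging
import traceback

def logged(method, with_traceback=True):
    """Wrap a collection method to log its name, args and kwargs,
    optionally with the current Python stack (which Mongo's own
    query log can never show)."""
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        logging.info('%s args=%r kwargs=%r', method.__name__, args, kwargs)
        if with_traceback:
            logging.debug(''.join(traceback.format_stack()))
        return method(*args, **kwargs)
    return wrapper

class Collection:
    """Dummy stand-in for a pymongo collection."""
    def find(self, query):
        return [{'name': u'stephan'}]

coll = Collection()
coll.find = logged(coll.find, with_traceback=False)
result = coll.find({'name': u'stephan'})
```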
28. 2013-07-05 Europython Florence mongopersist 28/37
containers and collections
class People(MongoCollectionMapping):
    __mongo_collection__ = '__main__.Person'
    __mongo_mapping_key__ = 'name'
● Mapping/dict API for a Mongo collection.
● Specify the collection to use for the mapping.
● Specify the attribute that represents the dictionary key.
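The idea behind the mapping can be sketched without mongopersist at all: expose a collection (here a plain list of dicts) through the dict API, keyed by one document attribute. Class and attribute names below are illustrative:

```python
class CollectionMapping:
    """Sketch of the MongoCollectionMapping idea: a dict-like view of
    a collection, keyed by one attribute of the stored documents."""
    mapping_key = 'name'                 # cf. __mongo_mapping_key__

    def __init__(self, docs):
        self.docs = docs                 # stand-in for the Mongo collection

    def __getitem__(self, key):
        for doc in self.docs:            # the real thing issues a find_one
            if doc.get(self.mapping_key) == key:
                return doc
        raise KeyError(key)

    def keys(self):
        return [doc[self.mapping_key] for doc in self.docs]

people = CollectionMapping([{'name': u'stephan'}, {'name': u'roy'}])
print(people['roy'])                     # {'name': 'roy'}
print(people.keys())                     # ['stephan', 'roy']
```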
35. 2013-07-05 Europython Florence mongopersist 35/37
databases
pro / con
ZODB
  + very transparent
  + object store
  - only Python/native
  - no query language
  - no 3rd-party tools
  - no default indexing
RDBMS/ORM
  + ad-hoc SQL queries
  + indexes, tools, etc.
  - BIG impedance mismatch
  - limited transparency
  - strict schema
mongoDB
  + document store
  + ad-hoc queries
  + indexes, tools, etc.
  - small impedance mismatch
  - limited/no transparency
Speaker notes
A few sentences about mongoDB, databases and persistence; then we'll dive into the features of mongopersist.
Ask around who knows mongoDB, pg/mysql, ZODB, ZTK, Pyramid. Authors and background: I first had the idea of using a KV store as a ZODB backend, because of RelStorage and its memcached caching. But building ACID transactions on top of an arbitrary KV store is a pain. Then MongoDB came along; Stephan had the idea to implement on top of Persistent and simply skip ZODB. I've been the co-pilot since the beginning.
Most of us know it, but it has grown tons of features recently. Its document store is great, and the indexing and query features keep improving. The pain is having no ACID transactions: you have to embrace eventual consistency. Some like it, some hate it; lately I have read complaints about it. We like it for its schemaless documents and easy querying, NOT really for scaling.
Tools and databases have their strengths and weaknesses; you have to evaluate and pick the right tool for the job. Sometimes it turns out later that the supposedly best tool isn't the best after all. And sometimes it's not just the tool but its usage, e.g. it matters whether a sub-object is stored as a sub-document or in a separate collection. Optimize on demand. The goal of mongopersist is to address the cons of mongoDB by effectively reducing the impedance mismatch and being almost as fully transparent as ZODB.
State that outlives the process that created it. The optimal case is when my objects just keep their state without any extra calls: I get my object from somewhere, manipulate it, and I'm done (OK, I need to commit the transaction). Otherwise we do this all the time, just with additional circles: query, deserialize, modify, serialize, write. Also, when directly manipulating documents / raw data, there are no objects in sight, or I need to (de)serialize manually.
Let's have a look at a handmade example. It is a VERY simple example; imagine traversal etc. Here we skip de/serialization, but at the same time lose the OO paradigm. At the beginning of a project I want simplicity, and to optimize later on demand.
Then: what we can do using mongopersist.
A quick view of the class we'll be using in the examples. Note the friends {} and visited () attributes. IMPORTANT: subclass Persistent, PersistentDict, PersistentList.
As good as it gets: we need some connection setup code. Either app startup code or request setup code will do that, so the DB/datamanager is more or less given for a single request.
`dm.root` is a mapping from names to mongoDB DBRef objects that are automatically resolved to objects when accessed. That sounds complicated, but in reality it does a find() on the object name. Let's create an object and add it to the datamanager root, which is our persistence root. transaction.commit, abort.
DumpCollection is our helper
Another sample class; note the _p_mongo_collection attribute. It means the instances will get stored in the specified collection.
A sub-object which becomes a document in a different collection.
It sometimes makes sense to store multiple types of (similar) objects in the same collection. mongopersist will automatically notice these cases and store the Python type as part of the document.
The ``_p_mongo_sub_object`` attribute marks a type of object as being just part of another document. This is a design decision, which can avoid multiple queries, give more consistency, ...
In this case Phone does NOT have Persistent as its superclass. Sub-objects get dumped, but later changes do not; that's why Persistent is needed. mongopersist will return list and dict types converted to PersistentList and PersistentDict, respectively. This makes life easier.
Look, no declaration needed.
Object, list of objects. Note the possibility of recursion in Person. mongopersist silently changes basic mutable types to their persistent implementations. Circular references: object trees might be a problem when inserting.
The process of dumping data during a transaction under the assumption that the transaction will succeed. CRUD and query don't match? Like when you need to store changes first, but a query needs to see those changes. mongopersist keeps the original state of the objects, so we can revert in case of a problem; the database might temporarily be in an inconsistent state.
Here is the code example. We modify some objects; NO COMMIT! If mongopersist did not flush, the count would return who knows what.
There is no MVCC in mongoDB, so we use a serial number in the document. NoCheckConflictHandler: does absolutely nothing to resolve conflicts; the library default; last flush wins. SimpleSerialConflictHandler: detects conflicts by comparing serial numbers and always raises a ``ConflictError``. ResolvingSerialConflictHandler: detects conflicts by comparing serial numbers and allows objects to resolve conflicts by calling their ``_p_resolveConflict()`` method.
Some objects might not naturally serialize well and create a very ugly Mongo entry, making querying a pain. Thus we allow custom serializers to be registered, which can encode/decode different types of objects. Register those in serialize.SERIALIZERS; there's already one for datetime.date, which serializes nicely to an ordinal number.
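The date-to-ordinal idea the notes mention can be sketched like this. The method names and the `SERIALIZERS` list shape are made up for illustration, not mongopersist's real serializer interface:

```python
import datetime

class DateSerializer:
    """Illustrative custom serializer: dumps datetime.date to a
    queryable ordinal number instead of an opaque blob."""
    def can_handle(self, obj):
        return isinstance(obj, datetime.date)

    def dump(self, obj):
        return {'_py_custom': 'date', 'ordinal': obj.toordinal()}

    def load(self, doc):
        return datetime.date.fromordinal(doc['ordinal'])

SERIALIZERS = [DateSerializer()]         # cf. serialize.SERIALIZERS

s = SERIALIZERS[0]
doc = s.dump(datetime.date(2013, 7, 5))
# The ordinal is a plain int, so range queries like {'ordinal': {'$gt': n}}
# become possible on the stored document.
print(s.load(doc) == datetime.date(2013, 7, 5))   # True
```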
datamanager.get_collection(): find_objects, find_one_object. The datamanager can load any object by DBRef; that means if you have the database, collection and _id, it's straightforward to get the object.
DB access and creating objects in Python are quite slow, so there are some mechanisms in place to improve on this: a dbref → Python class lookup cache; an object cache, so the same object instance is returned on access; and a document cache, so retrieved documents can be cached and a trip to the DB avoided on object lookup.
LoggingDecorator: LOGGED_METHODS = ['insert', 'update', 'remove', 'save', 'find_and_modify', 'find_one', 'find', 'count']. It logs the calls to those methods, incl. args and kwargs, optionally with a traceback (added by default). mongoDB has its own query logging, but it definitely won't log Python tracebacks → __traceback_info__.
mapping.MongoCollectionMapping has a dict-ish interface: it subclasses UserDict.DictMixin, which provides all the methods of a dict. With __mongo_collection__ you specify the collection NAME within the DB. By default the "key" attribute of the contained object is used as the key; override with __mongo_mapping_key__.
Explain (not just list): events, Contained, Container, __name__, __parent__; those are the basics of the tree structure that you usually build in a ZTK/ZODB app. ZODB containers can hold Mongo items, allowing a switch to mongo at any level. MongoContained works hard on __name__ + __parent__, which are not easily persisted. Don't be afraid of MongoContainer/MongoContained; they provide handy features. The find* methods constrain the scope to the objects contained in the actual container.
A usual webapp will be multithreaded, and each thread should have its own MongoDataManager; therefore mongopersist provides connection pooling. A bit more setup code, but with a webapp you'll need the pool anyway. Again, don't be afraid of the ZCA. It is also useful without ZTK: just copy the ZCA calls and it will work.
Annotations: IMongoAttributeAnnotatable. Zope annotations can store metadata about the object itself, like DublinCore data, permissions, etc. The default Zope IAnnotatable BTreeContainers don't serialize well to mongo, so we had to rewrite. It does not use __annotations__ but stores the keys directly as document attributes, which makes for nicer mongo documents. DublinCore: all sorts of metadata (created, modified, etc.) which get automatically updated by Zope events.
pickle? Quite handy: it persists any Python object (well, most, excluding those with external effects like open files), but its downside is the BLOB-ish storage, which makes the data unqueryable and inaccessible without Python; the source code must be available to unpickle.
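The BLOB-ish nature of pickle is easy to demonstrate: the serialized form is opaque bytes, so nothing server-side can filter on a field inside it.

```python
import pickle

record = {'name': u'stephan', 'visited': [u'Florence']}
blob = pickle.dumps(record)

# The blob is opaque bytes: no server-side query like
# {"visited": "Florence"} is possible against it, and nothing but
# Python (with the original classes importable) can read it back.
print(isinstance(blob, bytes), pickle.loads(blob) == record)   # True True
```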
ZODB is less widely known, so some details here: it's actually a key-value store, and was one before any KV store was hype (first commit Feb 1997). Features of the ZODB include transactions, history/undo, transparently pluggable storage, built-in caching, multiversion concurrency control (MVCC), and scalability across a network (using ZEO). Pro: very transparent (my favourite for MVPs), ACID, good for read-intensive apps. Con: the V in KV uses pickles, and you build your own indexes. The usual practice is to build the indexes within ZODB, even text indexes, though lately text indexes rather go into Lucene and friends.
RDBMS. Pro: data goes into a well-known SQL database, ACID. Con: very strict schema, object-table impedance mismatch. It starts with types: for example a list of strings, how do you put that into pgsql? This can be more or less avoided by keeping the ORM in mind and/or doing conversions. But do I want that? And the object and DB schemas have to be kept in sync. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
mongoDB. Pro: documents match objects, quite complex structures can be built, no strict schema. Con: no real ACID; you'd better think twice about how you store the data.