SlideShare ist ein Scribd-Unternehmen logo
1 von 38
Downloaden Sie, um offline zu lesen
Using MongoDB and Python
Michael Bright
Python User Group, Grenoble
23 Fevrier 2016
Databases
Databases
Relational Databases (RDBMS) “Les SGBDRs” (OLTP – Online Transaction Processing)
• Based on relational database model invented by E.F. Codd (IBM) in 1970
• SQL (SEQUEL): expressive query language
• Achieves performance increase through “vertical scaling” (higher performance node)
NoSQL (No SQL, or “Not just SQL”?) existed before RDBMS, resurged out of a need for more Scalability for
Web2.0
- Columnar, e.g. Cassandra, Hbase, Vertica
- Document Oriented, e.g. CouchDB, MongoDB, RethinkDB
- Graph, e.g. Neo4J
- Key-value, e.g. CouchDB, Dynamo, Riak, Redis, Oracle NoSQL, MUMPS
- Multi-model, e.g. Alchemy, ArangoDB
• Achieves performance increase through “horizontal scaling” (by adding nodes)
OLAP – Online Analytic Processing
Databases for scalability
The need for scalability requires distribution of processing making it more difficult
to satisfy all criteria, according to the CAP Theorem (2000)
- Consistency (Atomicity)
- Availability (A request will always receive a response)
- Partition Tolerance (if nodes become disconnected, the system continues to
function)
It is impossible to satisfy all three constraints (proved by MIT, 2002).
RDBMS - favour Consistency and Availability but cannot scale horizontally
NoSQL - favour Consistency and Partitioning or Availability and Partitioning
Relational vs. NoSQL
Relational databases provide
- Rich query language (SQL)
- Strong consistency
- Secondary indices
NoSQL provides
- Flexibility
- “dynamic schema” allows missing or extra fields, embedded structures
- Scalability
- Adding nodes allows to scale data size
- Performance
- Adding nodes allows to scale performance
MongoDB combines both sets of advantages
MongoDB
MongoDB – a Document-Oriented DB
Open source DB on github with commercial support from MongoDB Inc (was 10gen).
• Dynamic schema (schema-less) aids agile development
• A document is an associative array (possibly more than 1-level deep), e.g. JSON Object
• Uses BSON (binary JSON format) for serialization (easy to parse)
• Binaries downloadable for Linux(es), Windows, OSX, Solaris
• Drivers available in many languages
• Provides an aggregation framework
• Is the “M” in MEAN
MongoDB – Performance
• Written in C++
• It achieves high performance, especially on dynamic queries
• Allows primary & secondary indices (main tuning element)
• Is horizontally scalable across many nodes
• Extensive use of memory-mapped files
i.e. read-through write-through memory caching.
• Provides high availability with Master/slave replication
• Provides Sharding (horizontal partitioning of data across nodes)
• Horizontal partitioning : complete documents (or rows) are stored on a node
• Vertical partitioning: documents (or rows) are split across nodes
MongoDB – Architecture
MongoDB – Terminology
RDBMS MongoDB
Table, View ➜ Collection
Row ➜ Document
Index ➜ Index
Join ➜ Embedded Document
Foreign Key ➜ Reference
Partition ➜ Shard
MongoDB – Enterprise Products
There are supported products for the enterprise
• MongoDB Enterprise: for production workloads
• MongoDB Compass: Data and Schema visualization tool
• MongoDB Connector for BI:
Allows users to visualize their MongoDB Enterprise data using existing relational
business intelligence tools such as Tableau.
Foreign data wrapper with PostgreSQL to provide a relational SQL view on
MongoDB data
MongoDB v3.2: new features
• The default storage engine is now WiredTiger
• Following acquisition of WiredTiger, Inc.
• Introduced as an optional engine in MongoDB 3.0
• Has more granular concurrency control and native compression (lowering storage costs,
increasing h/w utilization, throughput and providing more predictable performance)
• Replication election enhancements
• Config servers as replica sets
• readConcern, and document validations.
• OpsManager 2.0
MongoDB v3.2: new aggregation features
• $sample, $lookup
• $indexStats
• $filter,$slice,$arrayElemAt, $isArray, $concatArrays
• Partial Indexes
• Document Validation
• 4 bit testing operators
• 2 new Accumulators for $group
• 10 new Arithmetic expression operators
MongoDB components
Components
mongod - The database daemon process.
mongos - Sharding controller.
mongo - The database shell (uses interactive javascript).
Utilities
mongodump - MongoDB dump tool - for backups, snapshots, etc.
mongorestore - MongoDB restore a dump
mongoexport - Export a single collection to test (JSON, CSV)
mongoimport - Import from JSON or CSV
mongofiles - Utility for putting and getting files from MongoDB GridFS
mongostat - Show performance statistics
MongoDB University (http://university.mongodb.com/)
MongoDB drivers exist for many languages
12 official drivers, plus many community languages
MongoDB docs: (https://docs.mongodb.org/manual/)
Using MongoDB
MongoDB shell
> mongo
MongoDB shell version: 3.2.3
connecting to: test
> show dbs
Money_UK 0.000GB
aggregation_example 0.000GB
local 0.000GB
test 0.000GB
> use Money_UK
switched to db Money_UK
> show collections
CARD
SAVING
MongoDB shell
> use test
switched to db test
> db.newcoll.insert({ 'field1': 'an example' })
WriteResult({ "nInserted" : 1 })
> db.newcoll.find().count()
1
> db.newcoll.find()
{ "_id" : ObjectId("56cc7446bb127b86163cf226"), "field1" : "an example" }
> db.newcoll.findOne()
{ "_id" : ObjectId("56cc7446bb127b86163cf226"), "field1" : "an example" }
> db.newcoll.find({'_id':ObjectId("56cc7446bb127b86163cf226")})
{ "_id" : ObjectId("56cc7446bb127b86163cf226"), "field1" : "an example" }
MongoDB shell
> var doc2={'field1': 'example', 'field2': 'example2'}
> db.newcoll.insert(doc2)
WriteResult({ "nInserted" : 1 })
> db.newcoll.find().count()
2
> db.newcoll.find().pretty()
{ "_id" : ObjectId("56cc7446bb127b86163cf226"), "field1" : "an example"
}
{
"_id" : ObjectId("56cc7711bb127b86163cf227"),
"field1" : "example",
"field2" : "example2"
}
MongoDB indexing
Indexing is the single biggest tunable performance factor.
Can index on a single field, e.g. on ascending or descending values:
db.newcoll.ensureIndex( { ‘field1’: 1 } ) // or -1: descending
Or subfields:
db.newcoll.ensureIndex( { ‘arr.subfield’: 1 } )
Or multiple fields:
db.newcoll.ensureIndex( { ‘arr.subfield’: 1, ‘field2’: -1 } )
MongoDB indexing - 2
It is also possible to provide hints to the indexer, on uniqueness, sparseness or to
request index creation as a background task, e.g.
db.newcoll.ensureIndex( { ‘field1’: 1 }, { ‘unique’: true} )
db.newcoll.ensureIndex( { ‘field1’: 1 , ‘field2’: 1}, { ‘sparse’: true} )
db.newcoll.ensureIndex( { ‘field1’: 1 }, { ‘background’: true} )
MongoDB searching
We can search based on a single field
db.newcoll.find( { ‘field1’: ‘any match’ } )
Or subfields:
db.newcoll.find( { ‘arr.subfield’: ‘sub field match’ } )
Or multiple fields:
db.newcoll.find( { ‘arr.subfield’: ‘sub field match’, ‘field2’: ‘match2’ } )
Available indices will be used if possible
MongoDB sorting
We can sort the results in ascending or descending order
db.newcoll.find().sort( { ‘field1’: 1 } ) // or -1: descending
or
db.newcoll.find().sort( { ‘field1’: 1, ‘field2’: -1 } )
Available indices will be used if possible
MongoDB projections
We can reduce the number of returned fields (the projection).
This may enable data access directly from the index.
In this example we limit the returned fields to ‘field1’ (and the ‘_id’ index).
db.newcoll.find({ ‘field1’: ‘example’ }, { ‘field1’: 1} )
In this example we limit the returned fields to ‘field1’ (without the ‘_id’ index).
db.newcoll.find({ ‘field1’: ‘example’ }, { ‘_id’: 0, ‘field1’: 1} )
We can also override the default index
db.newcoll.find({ ‘field1’: ‘example’ }, { ‘_id’: ‘field1’ } )
MongoDB explain plan
We can request explanation of how a query will be handled (showing possible index use)
> db.newcoll.find().explain()
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.newcoll",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [ ]
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : [ ]
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "MJBRIGHT7",
"port" : 27017,
"version" : "3.2.3",
"gitVersion" : "b326ba837cf6f49d65c2f85e1b70f6f31ece7937"
},
"ok" : 1
}
MongoDB special index types
• Geospatial Indexes (2d Sphere)
• Text Indexes
• TTL Collections (expireAfterSeconds)
• Hashed Indexes for sharding
MongoDB aggregation framework
Aggregation uses a pipeline of operations amongst
• $match
• $project
• $unwind
• $group
• $sort
MongoDB aggregation example
pipeline = [
{ "$project": { // Select fields of interest
'year': { "$dateToString": { 'format': "%Y", 'date': "$date" } },
'tags': 1, 'value': 1, }, },
{ "$match": { 'year': str(year) }},
{ "$group": {"_id": "$tags", "total": {"$sum": {"$abs": "$value"}}} },
{ "$sort": SON([("total", -1), ("_id", -1)]) }
]
cursor = db.collection.aggregate(pipeline)
MongoDB aggregation example
pipeline = [
{ "$project": {
'year': { "$dateToString": { 'format': "%Y", 'date': "$date" } },
'tags': 1, 'value': 1, }, },
{ "$match": { 'year': str(year) }}, // match on fields
{ "$group": {"_id": "$tags", "total": {"$sum": {"$abs": "$value"}}} },
{ "$sort": SON([("total", -1), ("_id", -1)]) }
]
cursor = db.collection.aggregate(pipeline)
MongoDB aggregation example
pipeline = [
{ "$project": {
'year': { "$dateToString": { 'format': "%Y", 'date': "$date" } },
'tags': 1, 'value': 1, }, },
{ "$match": { 'year': str(year) }},
{ "$group": {"_id": "$tags", "total": {"$sum": {"$abs": "$value"}}} }, // ‘reduce’
{ "$sort": SON([("total", -1), ("_id", -1)]) }
]
cursor = db.collection.aggregate(pipeline)
MongoDB aggregation example
pipeline = [
{ "$project": {
'year': { "$dateToString": { 'format': "%Y", 'date': "$date" } },
'tags': 1, 'value': 1, }, },
{ "$match": { 'year': str(year) }},
{ "$group": {"_id": "$tags", "total": {"$sum": {"$abs": "$value"}}} },
{ "$sort": SON([("total", -1), ("_id", -1)]) } // sort the results
]
cursor = db.collection.aggregate(pipeline)
PyMongo – ca y est, on arrive !
But first …
PyMongo – ca y est, on arrive !
… there are alternatives to the official PyMongo driver.
• Motor is a full-featured, non-blocking MongoDB driver for Python Tornado
applications.
• TxMongo is an asynchronous Twisted Python driver for MongoDB.
PyMongo – ca y est, on arrive !
… and other Python/MongoDB tools
ORM-like Layers
New users are recommended to begin with PyMongo.
Nevertheless other implementations are available providing higher abstraction.
Humongolus, MongoKit, Ming, MongoAlchemy, MongoEngine, Minimongo, Manga,
MotorEngine
Framework Tools
This section lists tools and adapters that have been designed to work with various Python
frameworks and libraries.
Django MongoDB Engine, Mango, Django MongoEngine, mongodb_beaker, Log4Mongo,
MongoLog, C5t, rod.recipe.mongodb, repoze-what-plugins-mongodb, Mongobox, Flask-
MongoAlchemy, Flask-MongoKit, Flask-PyMongo.
Questions?
Backup Slides

Weitere ähnliche Inhalte

Was ist angesagt?

Webinar: Getting Started with MongoDB - Back to Basics
Webinar: Getting Started with MongoDB - Back to BasicsWebinar: Getting Started with MongoDB - Back to Basics
Webinar: Getting Started with MongoDB - Back to BasicsMongoDB
 
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...DataStax
 
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Keshav Murthy
 
Mongodb introduction and_internal(simple)
Mongodb introduction and_internal(simple)Mongodb introduction and_internal(simple)
Mongodb introduction and_internal(simple)Kai Zhao
 
Cutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQueryCutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQueryWilliam Candillon
 
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...MongoDB
 
Json in Postgres - the Roadmap
 Json in Postgres - the Roadmap Json in Postgres - the Roadmap
Json in Postgres - the RoadmapEDB
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBNosh Petigara
 
Lecture 40 1
Lecture 40 1Lecture 40 1
Lecture 40 1patib5
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and PythonMike Bright
 
MongoDB, Hadoop and humongous data - MongoSV 2012
MongoDB, Hadoop and humongous data - MongoSV 2012MongoDB, Hadoop and humongous data - MongoSV 2012
MongoDB, Hadoop and humongous data - MongoSV 2012Steven Francia
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBantoinegirbal
 
MongoDB Best Practices for Developers
MongoDB Best Practices for DevelopersMongoDB Best Practices for Developers
MongoDB Best Practices for DevelopersMoshe Kaplan
 
Introduction to MongoDB and Hadoop
Introduction to MongoDB and HadoopIntroduction to MongoDB and Hadoop
Introduction to MongoDB and HadoopSteven Francia
 

Was ist angesagt? (20)

MongoDB
MongoDBMongoDB
MongoDB
 
Webinar: Getting Started with MongoDB - Back to Basics
Webinar: Getting Started with MongoDB - Back to BasicsWebinar: Getting Started with MongoDB - Back to Basics
Webinar: Getting Started with MongoDB - Back to Basics
 
Full metal mongo
Full metal mongoFull metal mongo
Full metal mongo
 
Mongodb
MongodbMongodb
Mongodb
 
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
 
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
 
Mongodb introduction and_internal(simple)
Mongodb introduction and_internal(simple)Mongodb introduction and_internal(simple)
Mongodb introduction and_internal(simple)
 
Cutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQueryCutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQuery
 
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
 
Json in Postgres - the Roadmap
 Json in Postgres - the Roadmap Json in Postgres - the Roadmap
Json in Postgres - the Roadmap
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Lecture 40 1
Lecture 40 1Lecture 40 1
Lecture 40 1
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and Python
 
MongoDB, Hadoop and humongous data - MongoSV 2012
MongoDB, Hadoop and humongous data - MongoSV 2012MongoDB, Hadoop and humongous data - MongoSV 2012
MongoDB, Hadoop and humongous data - MongoSV 2012
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
MongoDB Best Practices for Developers
MongoDB Best Practices for DevelopersMongoDB Best Practices for Developers
MongoDB Best Practices for Developers
 
Mongodb
MongodbMongodb
Mongodb
 
wtf is in Java/JDK/wtf7?
wtf is in Java/JDK/wtf7?wtf is in Java/JDK/wtf7?
wtf is in Java/JDK/wtf7?
 
Polyglot Persistence
Polyglot PersistencePolyglot Persistence
Polyglot Persistence
 
Introduction to MongoDB and Hadoop
Introduction to MongoDB and HadoopIntroduction to MongoDB and Hadoop
Introduction to MongoDB and Hadoop
 

Andere mochten auch

Comunicacion humana
Comunicacion humanaComunicacion humana
Comunicacion humanaAron Asir
 
2016 may-31 dockercon2016–cool-hackssubmission
2016 may-31 dockercon2016–cool-hackssubmission2016 may-31 dockercon2016–cool-hackssubmission
2016 may-31 dockercon2016–cool-hackssubmissionMichael Bright
 
Put oko Old School a #2 izdanje
Put oko Old School a #2 izdanjePut oko Old School a #2 izdanje
Put oko Old School a #2 izdanjeDavid Petrovic
 
Comunicacion humana
Comunicacion humanaComunicacion humana
Comunicacion humanaAron Asir
 
Replacing social media follow buttons with and email form is bad branding.
Replacing social media follow buttons with and email form is bad branding.Replacing social media follow buttons with and email form is bad branding.
Replacing social media follow buttons with and email form is bad branding.Erica MacDonald RN,BSN,MSN
 
Euro python2016 logistics
Euro python2016 logisticsEuro python2016 logistics
Euro python2016 logisticsMichael Bright
 
Las fuentes renovables
Las fuentes renovablesLas fuentes renovables
Las fuentes renovablesmanelait
 
TAG! You're It!
TAG! You're It!TAG! You're It!
TAG! You're It!bmellis
 
Container coneu2016 lab
Container coneu2016 labContainer coneu2016 lab
Container coneu2016 labMichael Bright
 
Put oko Old School a - Dillimore
Put oko Old School a - DillimorePut oko Old School a - Dillimore
Put oko Old School a - DillimoreDavid Petrovic
 
Comunicacion humana
Comunicacion humanaComunicacion humana
Comunicacion humanaAron Asir
 
JFC_itadakimasu出店説明会
JFC_itadakimasu出店説明会JFC_itadakimasu出店説明会
JFC_itadakimasu出店説明会Akihiko Koga
 
Presentation jupyter foreverythingelse
Presentation jupyter foreverythingelsePresentation jupyter foreverythingelse
Presentation jupyter foreverythingelseMichael Bright
 
2017 jan-18 meetup-functional_python
2017 jan-18 meetup-functional_python2017 jan-18 meetup-functional_python
2017 jan-18 meetup-functional_pythonMichael Bright
 
Put oko Old School-a #2 izdanje
Put oko Old School-a #2 izdanjePut oko Old School-a #2 izdanje
Put oko Old School-a #2 izdanjeDavid Petrovic
 

Andere mochten auch (20)

Comunicacion humana
Comunicacion humanaComunicacion humana
Comunicacion humana
 
2016 may-31 dockercon2016–cool-hackssubmission
2016 may-31 dockercon2016–cool-hackssubmission2016 may-31 dockercon2016–cool-hackssubmission
2016 may-31 dockercon2016–cool-hackssubmission
 
Resume
ResumeResume
Resume
 
Put oko Old School a #2 izdanje
Put oko Old School a #2 izdanjePut oko Old School a #2 izdanje
Put oko Old School a #2 izdanje
 
Comunicacion humana
Comunicacion humanaComunicacion humana
Comunicacion humana
 
Replacing social media follow buttons with and email form is bad branding.
Replacing social media follow buttons with and email form is bad branding.Replacing social media follow buttons with and email form is bad branding.
Replacing social media follow buttons with and email form is bad branding.
 
Euro python2016 logistics
Euro python2016 logisticsEuro python2016 logistics
Euro python2016 logistics
 
Las fuentes renovables
Las fuentes renovablesLas fuentes renovables
Las fuentes renovables
 
TAG! You're It!
TAG! You're It!TAG! You're It!
TAG! You're It!
 
FISHCANTSEEWATER
FISHCANTSEEWATERFISHCANTSEEWATER
FISHCANTSEEWATER
 
Container coneu2016 lab
Container coneu2016 labContainer coneu2016 lab
Container coneu2016 lab
 
Put oko Old School a - Dillimore
Put oko Old School a - DillimorePut oko Old School a - Dillimore
Put oko Old School a - Dillimore
 
Comunicacion humana
Comunicacion humanaComunicacion humana
Comunicacion humana
 
20150304 foodex
20150304 foodex20150304 foodex
20150304 foodex
 
JFC_itadakimasu出店説明会
JFC_itadakimasu出店説明会JFC_itadakimasu出店説明会
JFC_itadakimasu出店説明会
 
Akash sharma lo 2
Akash sharma lo 2Akash sharma lo 2
Akash sharma lo 2
 
Presentation jupyter foreverythingelse
Presentation jupyter foreverythingelsePresentation jupyter foreverythingelse
Presentation jupyter foreverythingelse
 
Glow Tech Lighting
Glow Tech LightingGlow Tech Lighting
Glow Tech Lighting
 
2017 jan-18 meetup-functional_python
2017 jan-18 meetup-functional_python2017 jan-18 meetup-functional_python
2017 jan-18 meetup-functional_python
 
Put oko Old School-a #2 izdanje
Put oko Old School-a #2 izdanjePut oko Old School-a #2 izdanje
Put oko Old School-a #2 izdanje
 

Ähnlich wie Using MongoDB and Python for data analysis

Mongodb intro
Mongodb introMongodb intro
Mongodb introchristkv
 
MongoDB at FrozenRails
MongoDB at FrozenRailsMongoDB at FrozenRails
MongoDB at FrozenRailsMike Dirolf
 
Barcelona MUG MongoDB + Hadoop Presentation
Barcelona MUG MongoDB + Hadoop PresentationBarcelona MUG MongoDB + Hadoop Presentation
Barcelona MUG MongoDB + Hadoop PresentationNorberto Leite
 
MongoDB using Grails plugin by puneet behl
MongoDB using Grails plugin by puneet behlMongoDB using Grails plugin by puneet behl
MongoDB using Grails plugin by puneet behlTO THE NEW | Technology
 
10gen MongoDB Video Presentation at WebGeek DevCup
10gen MongoDB Video Presentation at WebGeek DevCup10gen MongoDB Video Presentation at WebGeek DevCup
10gen MongoDB Video Presentation at WebGeek DevCupWebGeek Philippines
 
MongoDB Internals
MongoDB InternalsMongoDB Internals
MongoDB InternalsSiraj Memon
 
Using MongoDB For BigData in 20 Minutes
Using MongoDB For BigData in 20 MinutesUsing MongoDB For BigData in 20 Minutes
Using MongoDB For BigData in 20 MinutesAndrás Fehér
 
Webinar: Was ist neu in MongoDB 2.4
Webinar: Was ist neu in MongoDB 2.4Webinar: Was ist neu in MongoDB 2.4
Webinar: Was ist neu in MongoDB 2.4MongoDB
 
Mongo Bb - NoSQL tutorial
Mongo Bb - NoSQL tutorialMongo Bb - NoSQL tutorial
Mongo Bb - NoSQL tutorialMohan Rathour
 
MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)Uwe Printz
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBRaghunath A
 
DBVersity MongoDB Online Training Presentations
DBVersity MongoDB Online Training PresentationsDBVersity MongoDB Online Training Presentations
DBVersity MongoDB Online Training PresentationsSrinivas Mutyala
 
Fast querying indexing for performance (4)
Fast querying   indexing for performance (4)Fast querying   indexing for performance (4)
Fast querying indexing for performance (4)MongoDB
 
Indexing Strategies to Help You Scale
Indexing Strategies to Help You ScaleIndexing Strategies to Help You Scale
Indexing Strategies to Help You ScaleMongoDB
 
How to use MongoDB with CakePHP
How to use MongoDB with CakePHPHow to use MongoDB with CakePHP
How to use MongoDB with CakePHPichikaway
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDBElieHannouch
 

Ähnlich wie Using MongoDB and Python for data analysis (20)

Mongodb intro
Mongodb introMongodb intro
Mongodb intro
 
MongoDB_ppt.pptx
MongoDB_ppt.pptxMongoDB_ppt.pptx
MongoDB_ppt.pptx
 
Einführung in MongoDB
Einführung in MongoDBEinführung in MongoDB
Einführung in MongoDB
 
MongoDB at FrozenRails
MongoDB at FrozenRailsMongoDB at FrozenRails
MongoDB at FrozenRails
 
Barcelona MUG MongoDB + Hadoop Presentation
Barcelona MUG MongoDB + Hadoop PresentationBarcelona MUG MongoDB + Hadoop Presentation
Barcelona MUG MongoDB + Hadoop Presentation
 
MongoDB using Grails plugin by puneet behl
MongoDB using Grails plugin by puneet behlMongoDB using Grails plugin by puneet behl
MongoDB using Grails plugin by puneet behl
 
10gen MongoDB Video Presentation at WebGeek DevCup
10gen MongoDB Video Presentation at WebGeek DevCup10gen MongoDB Video Presentation at WebGeek DevCup
10gen MongoDB Video Presentation at WebGeek DevCup
 
MongoDB Internals
MongoDB InternalsMongoDB Internals
MongoDB Internals
 
Using MongoDB For BigData in 20 Minutes
Using MongoDB For BigData in 20 MinutesUsing MongoDB For BigData in 20 Minutes
Using MongoDB For BigData in 20 Minutes
 
Webinar: Was ist neu in MongoDB 2.4
Webinar: Was ist neu in MongoDB 2.4Webinar: Was ist neu in MongoDB 2.4
Webinar: Was ist neu in MongoDB 2.4
 
Mongo Bb - NoSQL tutorial
Mongo Bb - NoSQL tutorialMongo Bb - NoSQL tutorial
Mongo Bb - NoSQL tutorial
 
MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
DBVersity MongoDB Online Training Presentations
DBVersity MongoDB Online Training PresentationsDBVersity MongoDB Online Training Presentations
DBVersity MongoDB Online Training Presentations
 
Fast querying indexing for performance (4)
Fast querying   indexing for performance (4)Fast querying   indexing for performance (4)
Fast querying indexing for performance (4)
 
Indexing Strategies to Help You Scale
Indexing Strategies to Help You ScaleIndexing Strategies to Help You Scale
Indexing Strategies to Help You Scale
 
How to use MongoDB with CakePHP
How to use MongoDB with CakePHPHow to use MongoDB with CakePHP
How to use MongoDB with CakePHP
 
Couchbas for dummies
Couchbas for dummiesCouchbas for dummies
Couchbas for dummies
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDB
 
Drop acid
Drop acidDrop acid
Drop acid
 

Mehr von Michael Bright

Lightning talk unikernels
Lightning talk unikernelsLightning talk unikernels
Lightning talk unikernelsMichael Bright
 
2017 feb-10 snowcamp.io-unikernels
2017 feb-10 snowcamp.io-unikernels2017 feb-10 snowcamp.io-unikernels
2017 feb-10 snowcamp.io-unikernelsMichael Bright
 
2017 jan-29 devconf.cz-unikernels
2017 jan-29 devconf.cz-unikernels2017 jan-29 devconf.cz-unikernels
2017 jan-29 devconf.cz-unikernelsMichael Bright
 
2017 jan-19 meetup-unikernels
2017 jan-19 meetup-unikernels2017 jan-19 meetup-unikernels
2017 jan-19 meetup-unikernelsMichael Bright
 
2016 nov-16 grenoble-floss_tmux
2016 nov-16 grenoble-floss_tmux2016 nov-16 grenoble-floss_tmux
2016 nov-16 grenoble-floss_tmuxMichael Bright
 
2015 oct-17 pyconfr-pau_i_python_vers_jupyter
2015 oct-17 pyconfr-pau_i_python_vers_jupyter2015 oct-17 pyconfr-pau_i_python_vers_jupyter
2015 oct-17 pyconfr-pau_i_python_vers_jupyterMichael Bright
 
Container Con Europe 2016 - Container Orchestration: Which Conductor?
Container Con Europe 2016 - Container Orchestration: Which Conductor?Container Con Europe 2016 - Container Orchestration: Which Conductor?
Container Con Europe 2016 - Container Orchestration: Which Conductor?Michael Bright
 

Mehr von Michael Bright (7)

Lightning talk unikernels
Lightning talk unikernelsLightning talk unikernels
Lightning talk unikernels
 
2017 feb-10 snowcamp.io-unikernels
2017 feb-10 snowcamp.io-unikernels2017 feb-10 snowcamp.io-unikernels
2017 feb-10 snowcamp.io-unikernels
 
2017 jan-29 devconf.cz-unikernels
2017 jan-29 devconf.cz-unikernels2017 jan-29 devconf.cz-unikernels
2017 jan-29 devconf.cz-unikernels
 
2017 jan-19 meetup-unikernels
2017 jan-19 meetup-unikernels2017 jan-19 meetup-unikernels
2017 jan-19 meetup-unikernels
 
2016 nov-16 grenoble-floss_tmux
2016 nov-16 grenoble-floss_tmux2016 nov-16 grenoble-floss_tmux
2016 nov-16 grenoble-floss_tmux
 
2015 oct-17 pyconfr-pau_i_python_vers_jupyter
2015 oct-17 pyconfr-pau_i_python_vers_jupyter2015 oct-17 pyconfr-pau_i_python_vers_jupyter
2015 oct-17 pyconfr-pau_i_python_vers_jupyter
 
Container Con Europe 2016 - Container Orchestration: Which Conductor?
Container Con Europe 2016 - Container Orchestration: Which Conductor?Container Con Europe 2016 - Container Orchestration: Which Conductor?
Container Con Europe 2016 - Container Orchestration: Which Conductor?
 

Kürzlich hochgeladen

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 

Kürzlich hochgeladen (20)

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 

Using MongoDB and Python for data analysis

  • 1. Using MongoDB and Python Michael Bright Python User Group, Grenoble 23 Fevrier 2016
  • 3. Databases Relational Databases (RDBMS) “Les SGBDRs” (OLTP – Online Transaction Processing) • Based on relational database model invented by E.F. Codd (IBM) in 1970 • SQL (SEQUEL): expressive query language • Achieves performance increase through “vertical scaling” (higher performance node) NoSQL (No SQL, or “Not just SQL”?) existed before RDBMS, resurged out of a need for more Scalability for Web2.0 - Columnar, e.g. Cassandra, Hbase, Vertica - Document Oriented, e.g. CouchDB, MongoDB, RethinkDB - Graph, e.g. Neo4J - Key-value, e.g. CouchDB, Dynamo, Riak, Redis, Oracle NoSQL, MUMPS - Multi-model, e.g. Alchemy, ArangoDB • Achieves performance increase through “horizontal scaling” (by adding nodes) OLAP – Online Analytic Processing
  • 4. Databases for scalability The need for scalability requires distribution of processing making it more difficult to satisfy all criteria, according to the CAP Theorem (2000) - Consistency (Atomicity) - Availability (A request will always receive a response) - Partition Tolerance (if nodes become disconnected, the system continues to function) It is impossible to satisfy all three constraints (proved by MIT, 2002). RDBMS - favour Consistency and Availability but cannot scale horizontally NoSQL - favour Consistency and Partitioning or Availability and Partitioning
  • 5. Relational vs. NoSQL Relational databases provide - Rich query language (SQL) - Strong consistency - Secondary indices NoSQL provides - Flexibility - “dynamic schema” allows missing or extra fields, embedded structures - Scalability - Adding nodes allows to scale data size - Performance - Adding nodes allows to scale performance MongoDB combines both sets of advantages
  • 7. MongoDB – a Document-Oriented DB Open source DB on github with commercial support from MongoDB Inc (was 10gen). • Dynamic schema (schema-less) aids agile development • A document is an associative array (possibly more than 1-level deep), e.g. JSON Object • Uses BSON (binary JSON format) for serialization (easy to parse) • Binaries downloadable for Linux(es), Windows, OSX, Solaris • Drivers available in many languages • Provides an aggregation framework • Is the “M” in MEAN
  • 8. MongoDB – Performance • Written in C++ • It achieves high performance, especially on dynamic queries • Allows primary & secondary indices (main tuning element) • Is horizontally scalable across many nodes • Extensive use of memory-mapped files i.e. read-through write-through memory caching. • Provides high availability with Master/slave replication • Provides Sharding (horizontal partitioning of data across nodes) • Horizontal partitioning : complete documents (or rows) are stored on a node • Vertical partitioning: documents (or rows) are split across nodes
  • 10. MongoDB – Terminology RDBMS MongoDB Table, View ➜ Collection Row ➜ Document Index ➜ Index Join ➜ Embedded Document Foreign Key ➜ Reference Partition ➜ Shard
  • 11. MongoDB – Enterprise Products There are supported products for the enterprise • MongoDB Enterprise: for production workloads • MongoDB Compass: Data and Schema visualization tool • MongoDB Connector for BI: Allows users to visualize their MongoDB Enterprise data using existing relational business intelligence tools such as Tableau. Foreign data wrapper with PostgreSQL to provide a relational SQL view on MongoDB data
  • 12. MongoDB v3.2: new features • The default storage engine is now WiredTiger • Following acquisition of WiredTiger, Inc. • Introduced as an optional engine in MongoDB 3.0 • Has more granular concurrency control and native compression (lowering storage costs, increasing h/w utilization, throughput and providing more predictable performance) • Replication election enhancements • Config servers as replica sets • readConcern, and document validations. • OpsManager 2.0
  • 13. MongoDB v3.2: new aggregation features • $sample, $lookup • $indexStats • $filter,$slice,$arrayElemAt, $isArray, $concatArrays • Partial Indexes • Document Validation • 4 bit testing operators • 2 new Accumulators for $group • 10 new Arithmetic expression operators
  • 14. MongoDB components Components mongod - The database daemon process. mongos - Sharding controller. mongo - The database shell (uses interactive javascript). Utilities mongodump - MongoDB dump tool - for backups, snapshots, etc. mongorestore - MongoDB restore a dump mongoexport - Export a single collection to test (JSON, CSV) mongoimport - Import from JSON or CSV mongofiles - Utility for putting and getting files from MongoDB GridFS mongostat - Show performance statistics
  • 16. MongoDB drivers exist for many languages 12 official drivers, plus many community languages
  • 19. MongoDB shell > mongo MongoDB shell version: 3.2.3 connecting to: test > show dbs Money_UK 0.000GB aggregation_example 0.000GB local 0.000GB test 0.000GB > use Money_UK switched to db Money_UK > show collections CARD SAVING
  • 20. MongoDB shell > use test switched to db test > db.newcoll.insert({ 'field1': 'an example' }) WriteResult({ "nInserted" : 1 }) > db.newcoll.find().count() 1 > db.newcoll.find() { "_id" : ObjectId("56cc7446bb127b86163cf226"), "field1" : "an example" } > db.newcoll.findOne() { "_id" : ObjectId("56cc7446bb127b86163cf226"), "field1" : "an example" } > db.newcoll.find({'_id':ObjectId("56cc7446bb127b86163cf226")}) { "_id" : ObjectId("56cc7446bb127b86163cf226"), "field1" : "an example" }
  • 21. MongoDB shell > var doc2={'field1': 'example', 'field2': 'example2'} > db.newcoll.insert(doc2) WriteResult({ "nInserted" : 1 }) > db.newcoll.find().count() 2 > db.newcoll.find().pretty() { "_id" : ObjectId("56cc7446bb127b86163cf226"), "field1" : "an example" } { "_id" : ObjectId("56cc7711bb127b86163cf227"), "field1" : "example", "field2" : "example2" }
  • 22. MongoDB indexing Indexing is the single biggest tunable performance factor. Can index on a single field, e.g. on ascending or descending values: db.newcoll.ensureIndex( { ‘field1’: 1 } ) // or -1: descending Or subfields: db.newcoll.ensureIndex( { ‘arr.subfield’: 1 } ) Or multiple fields: db.newcoll.ensureIndex( { ‘arr.subfield’: 1, ‘field2’: -1 } )
  • 23. MongoDB indexing - 2 It is also possible to provide hints to the indexer, on uniqueness, sparseness or to request index creation as a background task, e.g. db.newcoll.ensureIndex( { ‘field1’: 1 }, { ‘unique’: true} ) db.newcoll.ensureIndex( { ‘field1’: 1 , ‘field2’: 1}, { ‘sparse’: true} ) db.newcoll.ensureIndex( { ‘field1’: 1 }, { ‘background’: true} )
  • 24. MongoDB searching We can search based on a single field db.newcoll.find( { ‘field1’: ‘any match’ } ) Or subfields: db.newcoll.find( { ‘arr.subfield’: ‘sub field match’ } ) Or multiple fields: db.newcoll.find( { ‘arr.subfield’: ‘sub field match’, ‘field2’: ‘match2’ } ) Available indices will be used if possible
  • 25. MongoDB sorting We can sort the results in ascending or descending order db.newcoll.find().sort( { ‘field1’: 1 } ) // or -1: descending or db.newcoll.find().sort( { ‘field1’: 1, ‘field2’: -1 } ) Available indices will be used if possible
  • 26. MongoDB projections We can reduce the number of returned fields (the projection). This may enable data access directly from the index. In this example we limit the returned fields to ‘field1’ (and the ‘_id’ index). db.newcoll.find({ ‘field1’: ‘example’ }, { ‘field1’: 1} ) In this example we limit the returned fields to ‘field1’ (without the ‘_id’ index). db.newcoll.find({ ‘field1’: ‘example’ }, { ‘_id’: 0, ‘field1’: 1} ) We can also override the default index db.newcoll.find({ ‘field1’: ‘example’ }, { ‘_id’: ‘field1’ } )
  • 27. MongoDB explain plan We can request explanation of how a query will be handled (showing possible index use) > db.newcoll.find().explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "test.newcoll", "indexFilterSet" : false, "parsedQuery" : { "$and" : [ ] }, "winningPlan" : { "stage" : "COLLSCAN", "filter" : { "$and" : [ ] }, "direction" : "forward" }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "MJBRIGHT7", "port" : 27017, "version" : "3.2.3", "gitVersion" : "b326ba837cf6f49d65c2f85e1b70f6f31ece7937" }, "ok" : 1 }
  • 28. MongoDB special index types • Geospatial Indexes (2d Sphere) • Text Indexes • TTL Collections (expireAfterSeconds) • Hashed Indexes for sharding
  • 29. MongoDB aggregation framework Aggregation uses a pipeline of operations amongst • $match • $project • $unwind • $group • $sort
  • 30. MongoDB aggregation example pipeline = [ { "$project": { // Select fields of interest 'year': { "$dateToString": { 'format': "%Y", 'date': "$date" } }, 'tags': 1, 'value': 1, }, }, { "$match": { 'year': str(year) }}, { "$group": {"_id": "$tags", "total": {"$sum": {"$abs": "$value"}}} }, { "$sort": SON([("total", -1), ("_id", -1)]) } ] cursor = db.collection.aggregate(pipeline)
  • 31. MongoDB aggregation example pipeline = [ { "$project": { 'year': { "$dateToString": { 'format': "%Y", 'date': "$date" } }, 'tags': 1, 'value': 1, }, }, { "$match": { 'year': str(year) }}, // match on fields { "$group": {"_id": "$tags", "total": {"$sum": {"$abs": "$value"}}} }, { "$sort": SON([("total", -1), ("_id", -1)]) } ] cursor = db.collection.aggregate(pipeline)
  • 32. MongoDB aggregation example pipeline = [ { "$project": { 'year': { "$dateToString": { 'format': "%Y", 'date': "$date" } }, 'tags': 1, 'value': 1, }, }, { "$match": { 'year': str(year) }}, { "$group": {"_id": "$tags", "total": {"$sum": {"$abs": "$value"}}} }, // ‘reduce’ { "$sort": SON([("total", -1), ("_id", -1)]) } ] cursor = db.collection.aggregate(pipeline)
  • 33. MongoDB aggregation example pipeline = [ { "$project": { 'year': { "$dateToString": { 'format': "%Y", 'date': "$date" } }, 'tags': 1, 'value': 1, }, }, { "$match": { 'year': str(year) }}, { "$group": {"_id": "$tags", "total": {"$sum": {"$abs": "$value"}}} }, { "$sort": SON([("total", -1), ("_id", -1)]) } // sort the results ] cursor = db.collection.aggregate(pipeline)
  • 34. PyMongo – ca y est, on arrive ! But first …
  • 35. PyMongo – ca y est, on arrive ! … there are alternatives to the official PyMongo driver. • Motor is a full-featured, non-blocking MongoDB driver for Python Tornado applications. • TxMongo is an asynchronous Twisted Python driver for MongoDB.
  • 36. PyMongo – ca y est, on arrive ! … and other Python/MongoDB tools ORM-like Layers New users are recommended to begin with PyMongo. Nevertheless other implementations are available providing higher abstraction. Humongolus, MongoKit, Ming, MongoAlchemy, MongoEngine, Minimongo, Manga, MotorEngine Framework Tools This section lists tools and adapters that have been designed to work with various Python frameworks and libraries. Django MongoDB Engine, Mango, Django MongoEngine, mongodb_beaker, Log4Mongo, MongoLog, C5t, rod.recipe.mongodb, repoze-what-plugins-mongodb, Mongobox, Flask- MongoAlchemy, Flask-MongoKit, Flask-PyMongo.