MongoDB Introduction and Data Modelling
- 1. © 2011 Xpanxion all rights reserved
GLOBAL SOFTWARE ENGINEERING EXCELLENCE
MongoDB
<Version 5.1>
17 April 2013
Internal
- Sachin Bhosale
- 2.
The Evolution of Databases
1990: RDBMS
2000: RDBMS, OLAP/BI
2010: RDBMS, NoSQL, OLAP/BI, Hadoop
(Diagram rows: Operational Data, Data Warehouse)
- 3.
Big Data
"Big Data" describes data sets so large and complex they are impractical
to manage with traditional software tools. Big Data relates to data
creation, storage, retrieval and analysis that is remarkable in terms
of volume, velocity, and variety.
Volume - A typical PC might have had 10 gigabytes of storage in 2000.
Today, Facebook ingests 500 terabytes of new data every day.
Velocity - Clickstreams and ad impressions capture user behavior at
millions of events per second; high-frequency stock trading algorithms
reflect market changes within microseconds.
Variety - Big Data isn't just numbers, dates, and strings. It
is also geospatial data, 3D data, audio and video, and unstructured
text, including log files and social media.
- 4.
Big Data Technologies
                 Operational         Analytical
Latency          10 ms - 100 ms      1 min - 100 min
Concurrency      1000 - 100,000      1 - 10
Access Pattern   Writes and Reads    Reads
Queries          Selective           Unselective
Data Scope       Operational         Retrospective
End User         Customer            Data Scientist
Technology       NoSQL               MapReduce, MPP Database
- 5.
Relational Database Challenges
Data Types
• Unstructured data
• Semi-structured data
• Polymorphic data
Volume of Data
• Petabytes of data
• Trillions of records
• Tens of millions of queries per second
Agile Development
• Iterative
• Short development cycles
• New workloads
New Architectures
• Horizontal scaling
• Commodity servers
• Cloud computing
- 6.
NoSQL Categories
Key-value: Redis
Column family: Cassandra
Document: MongoDB
Graph: Neo4j
- 8.
What is MongoDB?
MongoDB is a ___________ database
Document
Open source
High performance
Horizontally scalable
Full featured
- 9.
Document Database
Not for .PDF & .DOC files
A document is essentially an associative array
Document == JSON object
Document == PHP Array
Document == Python Dictionary
Document == Ruby Hash
etc
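As a rough illustration of the equivalence above, the sketch below builds a document as a Python dictionary and round-trips it through JSON. The field names and values are made up for illustration. MongoDB itself stores documents as BSON, which carries extra types (dates, ObjectIds) that plain JSON lacks, so this shows only the shape of a document, not its storage format.

```python
import json

# A document is a set of key/value pairs; values may be nested
# documents or arrays. All names and values here are illustrative.
student = {
    "_id": 1,
    "name": "Sachin",
    "score": 110,
    "tags": ["mongodb", "nosql"],
    "address": {"city": "Example City"},
}

# Serializing to JSON and parsing it back yields an equal dictionary.
as_json = json.dumps(student)
restored = json.loads(as_json)

print(restored == student)
print(restored["address"]["city"])
```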
- 10.
Open Source
MongoDB is an open source project
On GitHub
Licensed under the AGPL
Commercial licenses available
Started & sponsored by 10gen
- 11.
High Performance
Written in C++
Extensive use of memory-mapped files
i.e. read-through write-through memory caching.
Runs nearly everywhere
Data serialized as BSON (fast parsing)
Full support for primary & secondary indexes
Document model = less work
- 13.
Full Featured
Ad Hoc queries
Real time aggregation
Rich query capabilities
Traditionally consistent
Geospatial features
Support for most programming languages
JavaScript, Python, Ruby, PHP, Perl, Java, Scala, C#, C, C++
Flexible schema
- 14.
MongoDB Installation
Get the MongoDB distributions by platform and version from
http://www.mongodb.org/downloads
MongoDB requires a data folder to store its files. The default location for
the MongoDB data directory is C:\data\db (Windows) or /data/db (Linux)
Running MongoDB
Windows
C:\mongodb\bin\mongod.exe --dbpath d:\test\data
Linux
./bin/mongod --dbpath /data/mongodb
- 15.
MongoDB Package Components - 1
Core Processes
mongod
mongos
mongo
Binary Import and Export Tools
mongodump
mongorestore
bsondump
mongooplog
- 16.
MongoDB Package Components - 2
Data Import and Export Tools
mongoimport
mongoexport
Diagnostic Tools
mongostat
mongotop
mongosniff
mongoperf
GridFS
mongofiles
- 17.
Mongo Shell
vars / functions / data structs + types
SpiderMonkey / V8
ObjectId("...")
new Date()
Object.bsonsize()
db["collection"].find/count/update
short-hand for collections
Doesn't require quoted keys
Don’t copy and paste too much
Embedded JavaScript interpreter
Global functions and objects
MongoDB driver exposed
JSON-like stuff
- 19.
Core MongoDB Operations (CRUD) - 1
CREATE
insert() - is the primary method to insert a document or documents
into a MongoDB collection
db.studs.insert({_id : 1, name : "Sachin", score : 110})
save() - performs an insert if the document to save does not contain
the _id field
db.studs.save({name : "Sachin", score : 110})
READ
find() - returns a cursor over the documents that match the query
db.collection.find( <query>, <projection> )
findOne() - selects a single document from a collection and returns
that document
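The insert/save/find semantics described above can be modelled in a few lines of plain Python. This is an in-memory stand-in written for illustration, not the MongoDB driver: the class name TinyCollection, the uuid-based _id generation, the equality-only query matching, and the inclusion-only projection are all simplifying assumptions.

```python
import uuid


class TinyCollection:
    """In-memory model of a collection (hypothetical, not a real driver API)."""

    def __init__(self):
        self._docs = {}  # maps _id -> document

    def insert(self, doc):
        # Adds a new document; a missing _id is generated, a duplicate is an error.
        doc = dict(doc)
        doc.setdefault("_id", uuid.uuid4().hex)  # stand-in for ObjectId
        if doc["_id"] in self._docs:
            raise KeyError("duplicate _id: %r" % (doc["_id"],))
        self._docs[doc["_id"]] = doc
        return doc["_id"]

    def save(self, doc):
        # Without _id: behaves like insert. With _id: replaces or inserts.
        doc = dict(doc)
        if "_id" not in doc:
            return self.insert(doc)
        self._docs[doc["_id"]] = doc
        return doc["_id"]

    def find(self, query=None, projection=None):
        # Equality-only matching; projection lists fields to keep (_id always kept).
        query = query or {}
        for doc in self._docs.values():
            if all(doc.get(k) == v for k, v in query.items()):
                if projection:
                    yield {k: v for k, v in doc.items()
                           if k == "_id" or k in projection}
                else:
                    yield dict(doc)

    def find_one(self, query=None, projection=None):
        # Returns the first matching document, or None.
        return next(self.find(query, projection), None)


studs = TinyCollection()
studs.insert({"_id": 1, "name": "Sachin", "score": 110})
studs.save({"_id": 1, "name": "Sachin", "score": 120})  # replaces _id 1
new_id = studs.save({"name": "Asha", "score": 95})      # no _id: inserts

print(studs.find_one({"_id": 1}))
print(list(studs.find({"name": "Asha"}, {"score": 1})))
```

The second student and the scores are invented sample data. The point of the sketch is the difference between insert(), which rejects a duplicate _id, and save(), which replaces the stored document when the _id already exists.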
- 20.
Core MongoDB Operations (CRUD) - 2
UPDATE
update() - updates a single document by default; with the multi
option, update() updates all documents that match the query
criteria in the collection
Update Operators
Fields - $inc, $rename, $set, $unset
Array - $addToSet, $pop, $pullAll, $pull, $push
save() - performs a special type of update(), depending on the _id field
of the specified document
Examples
db.bios.update( { _id: 3}, {$unset: {birth: 1 } }, { multi: true } )
db.bios.update( { _id: 1}, {$set: {'contribs.1': 'ALGOL 58' } } )
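To make the field operators and the dotted path 'contribs.1' more concrete, here is a small pure-Python sketch of how $set, $inc and $unset could be applied to a single document. The function names apply_update and _walk are hypothetical, the sample document is invented, and the behaviour is simplified: the path must already exist, and only these three operators are handled.

```python
import copy


def _walk(doc, dotted):
    """Return (container, key) for a dotted path such as 'contribs.1'."""
    parts = dotted.split(".")
    node = doc
    for part in parts[:-1]:
        node = node[int(part)] if isinstance(node, list) else node[part]
    last = parts[-1]
    return node, (int(last) if isinstance(node, list) else last)


def apply_update(doc, update):
    """Return a modified copy of doc; the input document is left untouched."""
    result = copy.deepcopy(doc)
    for path, value in update.get("$set", {}).items():
        container, key = _walk(result, path)
        container[key] = value
    for path, amount in update.get("$inc", {}).items():
        container, key = _walk(result, path)
        if isinstance(container, list):
            current = container[key]
        else:
            current = container.get(key, 0)  # a missing field starts at 0
        container[key] = current + amount
    for path in update.get("$unset", {}):
        container, key = _walk(result, path)
        if isinstance(container, list):
            container[key] = None  # array elements are nulled, not removed
        else:
            container.pop(key, None)
    return result


# Invented sample document for illustration.
bio = {"_id": 1, "birth": "1924", "contribs": ["Fortran", "ALGOL", "FP"], "views": 3}

updated = apply_update(bio, {
    "$set": {"contribs.1": "ALGOL 58"},  # index 1 of the contribs array
    "$unset": {"birth": 1},
    "$inc": {"views": 1},
})
print(updated)
```

In a dotted path, a numeric component addresses an array position, which is why 'contribs.1' changes the second element rather than a field named "1".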
- 21.
Core MongoDB Operations (CRUD) - 3
DELETE
remove() - deletes documents from a collection.
db.collection.remove( <query>, <justOne> )
Remove All documents
db.bios.remove()
Remove a single document that matches a condition
db.bios.remove( { turing: true }, 1 )
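The effect of the justOne argument can be sketched with a plain list of dictionaries. The helper below is a hypothetical model with equality-only matching, not the shell's remove(); it returns the documents that would remain instead of modifying a collection.

```python
def remove(docs, query=None, just_one=False):
    """Return the documents left after deleting those that match query."""
    query = query or {}  # an empty query matches every document
    kept, removed = [], 0
    for doc in docs:
        matches = all(doc.get(k) == v for k, v in query.items())
        if matches and not (just_one and removed >= 1):
            removed += 1  # this document is deleted
        else:
            kept.append(doc)
    return kept


# Invented sample data.
bios = [
    {"_id": 1, "turing": True},
    {"_id": 2, "turing": True},
    {"_id": 3, "turing": False},
]

print(remove(bios, {"turing": True}, just_one=True))  # deletes only the first match
print(remove(bios, {"turing": True}))                 # deletes every match
print(remove(bios))                                   # deletes everything
```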
- 22.
Data Modeling
Data in MongoDB has a flexible schema.
Collections do not enforce document structure.
Documents in the same collection do not need to have the same set of
fields or structure, and
common fields in a collection’s documents may hold different types of
data.
MongoDB does not support
Joins - across multiple collections
Transactions - across multiple documents
- 23.
Data Modeling Considerations
Inherent properties and requirements of the application objects and the
relationships between them
MongoDB data models must also reflect
how data will grow and change over time, and
the kinds of queries your application will perform
These considerations and requirements force developers to make a number of
multi-factored decisions:
normalization and de-normalization
indexing strategy
representation of data in arrays in BSON
- 24.
Data Modeling Decisions
Data modeling decisions involve determining how to structure the
documents to model the data effectively.
Embedding
To de-normalize data, store two related pieces of data in a single
document.
Referencing
To normalize data, store references between two documents to
indicate a relationship between the data represented in each
document.
Atomicity
MongoDB only provides atomic operations at the level of a single
document
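The two modelling choices above can be compared side by side with plain dictionaries. The blog-style field names (title, comments, author_id) and the helper resolve_author are assumptions made for this sketch; they do not come from the slides.

```python
# Embedding (de-normalized): the related data lives inside one document,
# so one read returns everything and one write changes it atomically.
post_embedded = {
    "_id": 1,
    "title": "Hello MongoDB",
    "comments": [
        {"author": "reader1", "text": "Nice post"},
        {"author": "reader2", "text": "Thanks"},
    ],
}

# Referencing (normalized): the post stores only the author's _id,
# and the author lives in a separate collection.
authors_by_id = {
    10: {"_id": 10, "name": "Sachin"},
}
post_referenced = {"_id": 2, "title": "Data Modelling", "author_id": 10}


def resolve_author(post, authors):
    """Follow the reference with a second lookup (an application-side join)."""
    return authors[post["author_id"]]


print(len(post_embedded["comments"]))
print(resolve_author(post_referenced, authors_by_id)["name"])
```

Embedding fits data that is read and written together, since a single-document write is atomic. Referencing avoids duplicating shared data but costs a second lookup that the application has to perform itself.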
- 25.
Aggregation
The aggregation framework provides a powerful and flexible set of tools
for many data aggregation tasks without having to use map-reduce
(db.collection.mapReduce()).
While map-reduce is powerful, it is often more difficult than necessary for
many simple aggregation tasks, such as totaling or averaging field values.
Pipeline Operators and Indexes
$match, $sort, $limit, $skip, $project, $unwind, $group
db.articles.aggregate(
    { $project : {
        author : 1,
        tags : 1
    } },
    { $unwind : "$tags" },
    { $group : {
        _id : { tags : "$tags" },
        authors : { $addToSet : "$author" }
    } }
)
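To show what each stage of the pipeline above does to the data, the sketch below re-implements $project, $unwind and $group with $addToSet as ordinary Python functions over a list of dictionaries. The function names and the sample articles are invented for illustration, and the output is sorted only to make it deterministic; this is a model of the stages, not how the server executes them.

```python
# Invented sample documents.
articles = [
    {"_id": 1, "author": "bob", "title": "First", "tags": ["fun", "good"]},
    {"_id": 2, "author": "dave", "title": "Second", "tags": ["fun"]},
]


def project(docs, fields):
    """Keep _id plus the listed fields, as an inclusion $project does."""
    return [{k: d[k] for k in ["_id", *fields] if k in d} for d in docs]


def unwind(docs, field):
    """Emit one copy of each document per element of its array field."""
    out = []
    for d in docs:
        for item in d.get(field, []):
            out.append({**d, field: item})
    return out


def group_add_to_set(docs, key_field, value_field, out_field):
    """Group by key_field and collect the distinct values of value_field."""
    groups = {}
    for d in docs:
        groups.setdefault(d[key_field], set()).add(d[value_field])
    return [{"_id": {key_field: k}, out_field: sorted(v)}
            for k, v in groups.items()]


stage1 = project(articles, ["author", "tags"])
stage2 = unwind(stage1, "tags")
result = group_add_to_set(stage2, "tags", "author", "authors")

for row in result:
    print(row)
```

Each stage takes the previous stage's output as its input, which is the core idea of the pipeline: the result here is one document per tag, listing the authors who used it.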
- 26.
Blog Project with MongoDB
Blogger with the following functionality
Signup
New Post
Login
Logout
It uses Python, the PyMongo driver, and MongoDB