1. MongoDB
● What is it ?
● Features
● Tools
● Use with Hadoop
● Hadoop Tools
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
2. MongoDB – What is it ?
● Document-oriented NoSQL database
● Stores data as BSON ( Binary JSON ) documents, with no fixed schema
● Released as open source / free
● Can be used as a distributed database
● Has load balancing
● Has replication
● Written in C++
● Server licensed under GNU AGPL v3 ( SSPL since October 2018 ), drivers under Apache 2.0
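To make the BSON bullet concrete, here is a minimal Python sketch of how a small document could be laid out as BSON bytes. It is a hand-written illustration that covers only strings, booleans, integers and nested documents. The function names are hypothetical, and a real application would rely on the BSON library shipped with its driver instead.

```python
import struct


def _cstring(text):
    # BSON element names are UTF-8 bytes terminated by a single zero byte.
    return text.encode("utf-8") + b"\x00"


def encode_element(name, value):
    """Encode one field as: type byte, element name, payload."""
    if isinstance(value, bool):  # test bool first, since bool is a subclass of int
        return b"\x08" + _cstring(name) + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + _cstring(name) + struct.pack("<i", value)  # int32
        return b"\x12" + _cstring(name) + struct.pack("<q", value)      # int64
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # String payload: little-endian length (including the zero byte), then bytes.
        return b"\x02" + _cstring(name) + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):
        return b"\x03" + _cstring(name) + encode_document(value)
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


def encode_document(doc):
    """Encode a dict as: total length, elements, trailing zero byte."""
    body = b"".join(encode_element(k, v) for k, v in doc.items())
    # The total length counts its own 4 bytes plus the trailing zero byte.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


encoded = encode_document({"hello": "world"})
```

Encoding {"hello": "world"} this way gives 22 bytes: a 4-byte length, the string element, and a closing zero byte.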
3. MongoDB – Features
● Queries
– By field
– By regular expression
– User defined JavaScript functions
– By range
● Indexes
– Primary and secondary
– Any document field
● Replication
– A primary ( master ) replicates to multiple secondaries ( slaves )
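The query types listed above are expressed in MongoDB as filter documents. The Python sketch below shows one filter per query type, assuming the PyMongo driver for the server call. The database, collection and field names are hypothetical, and the small matches helper is only a local illustration of what the filters mean, not how the server evaluates them.

```python
import re

# Hypothetical field names; each filter mirrors one query type from the list above.
by_field = {"city": "Wellington"}
by_regex = {"name": {"$regex": "^Sm", "$options": "i"}}
by_range = {"age": {"$gte": 18, "$lt": 65}}
# $where runs a JavaScript expression on the server for each document.
by_javascript = {"$where": "this.credits > this.debits"}


def matches(doc, flt):
    """Local illustration of the declarative filters (not the server's logic)."""
    for field, cond in flt.items():
        value = doc.get(field)
        if not isinstance(cond, dict):
            if value != cond:
                return False
            continue
        if "$regex" in cond:
            flags = re.IGNORECASE if "i" in cond.get("$options", "") else 0
            if not isinstance(value, str) or not re.search(cond["$regex"], value, flags):
                return False
        if "$gte" in cond and (value is None or value < cond["$gte"]):
            return False
        if "$lt" in cond and (value is None or value >= cond["$lt"]):
            return False
    return True


def find_working_age(uri="mongodb://localhost:27017"):
    """Run the range query against a server; needs PyMongo and a running mongod."""
    from pymongo import MongoClient  # third-party driver, imported only when used

    people = MongoClient(uri)["demo"]["people"]  # hypothetical database and collection
    people.create_index("age")                   # secondary index on the queried field
    return list(people.find(by_range))
```

Only find_working_age talks to a server. The filters themselves are plain dictionaries, so they can be built and checked without a database.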
4. MongoDB – Features
● Load balancing
– Data split across multiple shards
– DB scales using shards
– New machines can be added to running database
● Map reduce can be used for aggregation
● File storage via GridFS
– Load balanced file system
– File system with replication
– Functions available for file manipulation
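The sharding and map reduce bullets above can be pictured with a small pure-Python model. This is a sketch of the idea only, not MongoDB's API: it assumes hash-based placement on a shard key, and the function names and sample orders are hypothetical.

```python
import hashlib
from collections import defaultdict


def shard_for(key, shard_count):
    """Pick a shard from a stable hash of the shard key."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def distribute(docs, shard_key, shard_count):
    """Split documents across shards according to their shard key."""
    shards = [[] for _ in range(shard_count)]
    for doc in docs:
        shards[shard_for(doc[shard_key], shard_count)].append(doc)
    return shards


def map_order(doc):
    # Map step: emit one (key, value) pair per document.
    yield doc["customer"], doc["amount"]


def reduce_amounts(key, values):
    # Reduce step: combine all values emitted for one key.
    return sum(values)


def map_reduce(shards, map_fn, reduce_fn):
    """Run the map step over every shard, group by key, then reduce."""
    grouped = defaultdict(list)
    for shard in shards:
        for doc in shard:
            for key, value in map_fn(doc):
                grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


orders = [
    {"_id": 1, "customer": "ann", "amount": 20},
    {"_id": 2, "customer": "bob", "amount": 5},
    {"_id": 3, "customer": "ann", "amount": 15},
    {"_id": 4, "customer": "bob", "amount": 40},
]
shards = distribute(orders, "_id", 3)
totals = map_reduce(shards, map_order, reduce_amounts)
```

The totals per customer do not depend on which shard each order landed on, which is what lets the aggregation scale as shards are added.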
5. MongoDB – Tools
● mongo – a db access shell and admin tool
● mongostat – a status tool similar to vmstat
● mongotop – shows read / write time per collection, in the style of Unix top
● mongosniff – low level traffic sniffing
● mongoimport – import JSON, CSV or TSV data
● mongoexport – export a collection to JSON or CSV
● mongodump – dump database contents
● mongorestore – reload database dumps
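As an illustration of the dump and reload pair above, the Python sketch below builds the command lines for mongodump and mongorestore and runs them only if the tool is installed. The helper names, database name and paths are hypothetical, and the defaults assume a server on localhost port 27017.

```python
import shutil
import subprocess


def dump_command(db_name, out_dir, host="localhost", port=27017):
    """Command line that dumps one database into out_dir."""
    return ["mongodump", "--host", host, "--port", str(port),
            "--db", db_name, "--out", out_dir]


def restore_command(dump_dir, host="localhost", port=27017):
    """Command line that reloads everything found under dump_dir."""
    return ["mongorestore", "--host", host, "--port", str(port), dump_dir]


def run_if_installed(argv):
    """Run the tool if it is on the PATH; return its exit code, or None if missing."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, check=False).returncode
```

A backup and restore cycle would then be run_if_installed(dump_command("shop", "backup")) followed by run_if_installed(restore_command("backup")).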
6. MongoDB – With Hadoop
● Hadoop connector available from github
● Allows Hadoop I/O
● Compiles with SBT build tool
● Supports Hadoop
– 0.20/0.20.x
– 1.0/1.0.x
– 1.1/1.1.x
– 0.21/0.21.x
– CDH3
– CDH4
7. MongoDB – Attributes
The image on the left shows how Hadoop and its tools are used with
MongoDB via a connector. The image on the right shows MongoDB
attributes.
8. MongoDB – Hadoop Tools
● The Hadoop connector supports
– Map Reduce
– Pig
– Hadoop streaming
– Flume
– Hive
– Hive BSON file access
● The connector can read and write BSON files ( e.g. mongodump output ) stored on HDFS
9. MongoDB – Architecture
● A db server
– has many databases
● A database
– Has many collections
● A collection
– Has many documents
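The hierarchy above can be modelled in a few lines of plain Python. This is a sketch of the containment relationship only, not a MongoDB client, and the database, collection and field names are hypothetical.

```python
from collections import defaultdict

# server -> database name -> collection name -> list of documents
server = defaultdict(lambda: defaultdict(list))


def insert(db_name, coll_name, doc):
    """Add a document; databases and collections appear on first use."""
    server[db_name][coll_name].append(doc)


def summarize(srv):
    """Count the documents in each collection of each database."""
    return {db: {coll: len(docs) for coll, docs in colls.items()}
            for db, colls in srv.items()}


insert("shop", "orders", {"_id": 1, "item": "pen"})
insert("shop", "orders", {"_id": 2, "item": "ink", "qty": 3})  # extra field is allowed
insert("shop", "customers", {"_id": 1, "name": "Ann"})
insert("blog", "posts", {"_id": 1, "title": "Hello"})

summary = summarize(server)
```

With the PyMongo driver the same path is written client["shop"]["orders"], where client is a MongoClient connected to the db server.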
10. Contact Us
● Feel free to contact us at
– www.semtech-solutions.co.nz
– info@semtech-solutions.co.nz
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours you need to solve them