Leverage all of your MySQL knowledge and experience to get up to speed quickly with MongoDB.
Presented at Percona Live London 2013 with Robert Hodges of Continuent.
Use Your MySQL Knowledge to Become a MongoDB Guru
1. Use Your MySQL Knowledge to
Become a MongoDB Guru
Percona Live London 2013
Robert Hodges
CEO
Continuent
Tim Callaghan
VP/Engineering
Tokutek
®
Tuesday, November 12, 13
2. Our Companies
Robert Hodges
• CEO at Continuent
• Database nerd since 1982 starting with M204, RDBMS since
1990, NoSQL since 2012; designed Continuent Tungsten
• Continuent offers clustering and replication for MySQL and
other fine DBMS types
Tim Callaghan
• VP/Engineering at Tokutek
• Long time database consumer (Oracle) and producer (VoltDB,
Tokutek)
• Tokutek offers Fractal Tree indexes in MySQL (TokuDB) and
MongoDB (TokuMX)
3. MongoDB -- The New MySQL
One Bad Thing about
MongoDB
One Good Thing about
MongoDB
4. One Bad Thing about MongoDB
MySQL
> select * from table1 where column1 > column2;
> ... 5 row(s) returned
MongoDB
> db.collection1.find({$field1: {gt: $field2}});
> ReferenceError: $field2 is not defined
[current] MongoDB query language is
<field> <operator> <literal>
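One way around the missing field-to-field comparison is to fetch the documents and compare in the application. The sketch below uses an in-memory array as a stand-in for the result of a find() call; the data and variable names are hypothetical. The server-side $where operator, which takes a JavaScript expression such as "this.field1 > this.field2", is another route, at the cost of evaluating JavaScript for each document.

```ruby
# In-memory stand-in for the documents a find() would return (hypothetical data).
collection1 = [
  { "_id" => 1, "field1" => 10, "field2" => 3 },
  { "_id" => 2, "field1" => 2,  "field2" => 8 },
  { "_id" => 3, "field1" => 7,  "field2" => 7 }
]

# The comparison the query language cannot express, applied client-side.
matches = collection1.select { |doc| doc["field1"] > doc["field2"] }
puts matches.map { |doc| doc["_id"] }.inspect
```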
5. One Good Thing about MongoDB
Robert’s “ease of use”
demo
6. Today’s Question
How can you use your
MySQL knowledge to get
up to speed on MongoDB?
8. How Do I Find Things in MongoDB?
mongod server == mysqld
database == MySQL schema
collection == MySQL table
BSON document ~ Sort of like a MySQL row
key/value pair != MySQL column
9. How Do I Create a Table and Insert Data?
# Ruby Code
MongoClient.new("localhost").        # Connect
  db("mydb").                        # Use database
  collection("sample").              # Choose collection
  insert({"data" => "hello world"})  # Insert data to materialize database and collection
# Primary key generated automatically
10. How Do I Change the Schema?
# Ruby Code
MongoClient.new("localhost").
  db("mydb").
  collection("sample").
  insert({"data" => "hello again!",
          "author" => "robert"})  # Just add more data
11. How Do I Validate Schema?
rs0:PRIMARY> db.samples.find()
{ "_id" : 1, "data" : "hello world" }
{ "_id" : 2, "daata" : "bye world" }
{ "_id" : 3, "data" : 26.44 }
Software bugs?
rs0:PRIMARY> show databases
local   2.0771484375GB
mydb    7.9501953125GB
mydb1   0.203125GB      <-- Typo from an early run
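Since the server accepts all three documents shown above, any schema checking has to live in the application. A minimal sketch, assuming a hypothetical rule that every document has an integer "_id" and a string "data" and nothing else; the schema hash and schema_errors are illustrative names, not part of any MongoDB driver.

```ruby
# Hypothetical application-side rule: allowed fields and their expected types.
schema = { "_id" => Integer, "data" => String }

# Returns a list of problems; an empty list means the document passes.
def schema_errors(doc, schema)
  errors = []
  doc.each do |key, value|
    expected = schema[key]
    if expected.nil?
      errors << "unknown field #{key}"
    elsif !value.is_a?(expected)
      errors << "#{key} should be #{expected}"
    end
  end
  (schema.keys - doc.keys).each { |key| errors << "missing field #{key}" }
  errors
end

docs = [
  { "_id" => 1, "data" => "hello world" },
  { "_id" => 2, "daata" => "bye world" },
  { "_id" => 3, "data" => 26.44 }
]
docs.each { |doc| puts "#{doc["_id"]}: #{schema_errors(doc, schema).inspect}" }
```

Running the check before each insert would catch both the misspelled key and the numeric value.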
12. How Do I Remove Data? (Part 1)
Drop a database
rs0:PRIMARY> db.dropDatabase()
{ "dropped" : "mydb", "ok" : 1 }
Drop a collection
rs0:PRIMARY> db.samples.drop()
true
Drop a column?
rs0:PRIMARY> db.foo.update(
{ author: { $exists: true }},
{ $unset: { author: 1 } },
false, true )
13. How Do I Remove Data? (Part 2)
(Remove documents based on TTL index)
> db.samples.ensureIndex(
{"inserted": 1},
{"expireAfterSeconds": 60})
> db.samples.insert(
{"data": "hello world",
inserted: new Date()})
> db.samples.count()
1
...
> db.samples.count()
0
0
(Capped collections do same with space)
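The effect of expireAfterSeconds can be pictured as a periodic sweep that drops documents whose indexed date is too old. The sketch below is a toy model in plain Ruby, not the server's implementation; expire is a hypothetical helper, and the clock is passed in so the result is repeatable. On a real server the sweep runs as a background pass, so removal is not instantaneous.

```ruby
# Toy TTL sweep: keep only documents younger than the expiry limit.
def expire(docs, expire_after_seconds, now)
  docs.reject { |doc| now - doc["inserted"] > expire_after_seconds }
end

t0 = Time.at(0)  # fixed start time so the example is deterministic
samples = [{ "data" => "hello world", "inserted" => t0 }]

count_early = expire(samples, 60, t0 + 30).length   # 30 seconds after insert
count_late  = expire(samples, 60, t0 + 120).length  # 120 seconds after insert
puts count_early
puts count_late
```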
14. How Does MongoDB Do Joins?
It Doesn’t!
(It is your job to denormalize or do
application level joins. This includes
thinking about storage.)
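An application-level join usually means two reads followed by a lookup in memory. The sketch below uses plain arrays of hashes as stand-ins for two collections; the collection names and fields are hypothetical.

```ruby
# In-memory stand-ins for two collections; with a real driver these would be
# the results of two separate find() calls.
authors = [
  { "_id" => 1, "name" => "robert" },
  { "_id" => 2, "name" => "tim" }
]
posts = [
  { "_id" => 10, "author_id" => 1, "data" => "hello world" },
  { "_id" => 11, "author_id" => 2, "data" => "hello again!" },
  { "_id" => 12, "author_id" => 1, "data" => "bye world" }
]

# Application-level join: index one side by key, then look up per document.
authors_by_id = authors.each_with_object({}) { |a, h| h[a["_id"]] = a }
joined = posts.map do |post|
  author = authors_by_id[post["author_id"]]
  post.merge("author_name" => author && author["name"])
end

joined.each { |doc| puts doc.inspect }
```

The denormalized alternative is to store author_name inside each post when it is written, trading extra storage and update work for a single read.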
16. How is My Data Stored, Logically?
MongoDB storage is very similar to MyISAM
[Diagram: collection data (documents) stored separately from the _id index and the secondary index(es), etc.]
17. How is My Data Stored, Physically?
But it does look different in the file system.
MyISAM
<db>/<table>.frm
<db>/<table>.myd
<db>/<table>.myi
MongoDB
<db1>.ns
<db1>.0 .. <db1>.n
<db2>.ns
<db2>.0 .. <db2>.n
• start MongoDB with “--directoryperdb” to put each
database's files in its own folder
• pro-tip: do this to gain IOPS by database (each folder can
live on its own storage device)
18. How Much Memory Does It Use?
All of it!
19. How does MongoDB Manage Memory?
• MyISAM
– key_buffer_size determines index caching
– data is cached in Operating System buffers
• InnoDB
– innodb_buffer_pool_size determines index/data
caching
• MongoDB
– memory mapped files
– mongod grows to consume available RAM
– good : no knob
– bad : operating system is in charge of cache
– bad : available RAM may change over time
20. How Will It Perform for My Workload?
• It depends...
– Determine your “working set”
o The portion of your data that clients access most often
o db.runCommand( { serverStatus: 1, workingSet: 1 } )
– If working set <= RAM
o Performance generally very good
o Be careful in high-concurrency write use cases
– If working set > RAM
o Likely IO bound
o Sharding to the rescue!
21. How Can Schema Affect Working Set?
• Field names are stored with the document
– On disk and in memory
• Plan ahead, especially for large collections
BAD!
{ first_name: "Timothy",
  middle_initial: "M",
  last_name: "Callaghan",
  address_line_1: "555 Main Street",
  address_line_2: "Apt. 9" }
GOOD!
{ fn: "Timothy",
  mi: "M",
  ln: "Callaghan",
  al1: "555 Main Street",
  al2: "Apt. 9" }
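A rough way to see the cost is to add up the bytes spent on field names alone. The sketch below counts only the characters of the keys in the two documents above; the collection size of 100 million documents is a made-up figure for illustration, and the per-field overhead that BSON adds is ignored because it is the same in both layouts.

```ruby
long_doc = {
  "first_name" => "Timothy", "middle_initial" => "M", "last_name" => "Callaghan",
  "address_line_1" => "555 Main Street", "address_line_2" => "Apt. 9"
}
short_doc = {
  "fn" => "Timothy", "mi" => "M", "ln" => "Callaghan",
  "al1" => "555 Main Street", "al2" => "Apt. 9"
}

# Bytes used by the field names of one document.
def key_bytes(doc)
  doc.keys.map(&:bytesize).inject(0, :+)
end

document_count = 100_000_000  # hypothetical collection size
saved = (key_bytes(long_doc) - key_bytes(short_doc)) * document_count

puts "long names:  #{key_bytes(long_doc)} bytes per document"
puts "short names: #{key_bytes(short_doc)} bytes per document"
puts "saved over #{document_count} documents: #{saved} bytes"
```

Under these assumptions the short names save 49 bytes per document, or roughly 4.9 GB across the collection, both on disk and in the working set.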
23. How Does the Query Optimizer Work?
• MySQL
– Optimizer finds usable indexes for the query
– For each index, optimizer asks the storage engine
o What is the cardinality for the given keys?
o What is the estimated cost?
– The “best” plan is chosen and used for the query
• This occurs for every single query
24. How Does the Query Optimizer Work?
• MongoDB
– All candidate indexes run the query in parallel
o “candidate” meaning it contains useful keys
– As matching results are found they are placed in a
shared buffer
– When one of the parallel runs completes, all
others are stopped
– This “plan” is used for future executions of the
same query
o Until the collection has 1,000 writes, mongod restarts, or
there is an index change to the collection
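The race described above can be sketched as a toy model. Each candidate plan is reduced to the number of index entries it has to examine, the "parallel" runs are approximated by taking one step per plan in turn, and the winner is cached per query shape. All names and numbers here are hypothetical, and this is not mongod's actual code.

```ruby
# Toy model: each candidate plan is the list of index entries it must examine.
# Plans advance one entry at a time in turn; the first to finish wins.
def race_plans(candidates)
  positions = Hash.new(0)
  loop do
    candidates.each do |name, entries|
      positions[name] += 1
      return name if positions[name] >= entries.length
    end
  end
end

# Reuse the winning plan for later executions of the same query shape.
def choose_plan(cache, query_shape, candidates)
  cache[query_shape] ||= race_plans(candidates)
end

plan_cache = {}
candidates = {
  "index_on_author" => (1..500).to_a,  # must examine 500 entries
  "index_on_date"   => (1..40).to_a    # must examine only 40 entries
}
winner = choose_plan(plan_cache, "author+date", candidates)
puts winner
```

Clearing plan_cache would stand in for the invalidation events listed above (writes, a restart, or an index change).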
25. A Simple Yet Elegant Solution?
• No more wrestling with the optimizer
• Hints are supported ($hint)
– Force a particular index
– http://docs.mongodb.org/manual/reference/operator/meta/hint/
• Easier since MongoDB does not support joins
27. MySQL Transactions and Isolation
InnoDB creates MVCC view of data; locks updated rows, commits atomically
mysql> BEGIN;
...
mysql> INSERT INTO sample(data) VALUES ("Hello world!");
mysql> INSERT INTO sample(data) VALUES ("Goodbye world!");
...
mysql> COMMIT;
MyISAM locks table and commits each row immediately
28. How Does MongoDB Implement Locking?
# Update data ranges of documents to
# show effects of database lock.
@col.update(
  {key =>
    {"$gte" => first.to_s,
     "$lt" => last.to_s}
  },
  { "$set" =>
    { "data.x" => rand(@rows)}})  # Locks database

Test                                                   Total Requests/Sec
Single thread updating single collection               197
Two threads updating two collections, same DB          80 + 80 = 160
Four threads updating two collections, same DB         29+29+30+30 = 118
Two threads updating two collections, different DBs    190 + 179 = 369
29. How Does MongoDB Implement Isolation?
• MongoDB does not prevent threads from
seeing partially committed data
• Example: Index changes can result in “double
read” of data if query uses index while index
is changing
• Experiment: Construct a test to:
• Select from numeric index and count rows
• Simultaneously update index to shift lower
values past end of previous high value
30. How Does MongoDB Implement Isolation?
# Select values.
count = 0
@col.find("k1" =>
    {"$gte" => 120000}).
  each do |doc|
  count += 1
end
puts "Count=#{count}"
# Run update to increase.
@col.update(
  {"_id" =>
    {"$exists" => true}},
  {"$inc" =>
    {"k1" => increment}},
  {:multi => true})
Count=50000
Count=50000
Count=100000 <--Index shifts over tail
Count=50000
Count=50000
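The anomaly in the output above can be reproduced in a toy model without a server. The sketch below treats the index as a sorted list of [k1, _id] pairs and has the scan look up the next entry after its current position for every step; a multi-document update lands partway through. The data, the threshold, and index_scan are hypothetical, and the model is a simplification, not MongoDB's cursor implementation.

```ruby
# Ten documents with k1 = 1..10; a scan for k1 >= 6 should see 5 of them.
docs = (1..10).map { |id| { "_id" => id, "k1" => id } }
threshold = 6

# Returns the next index entry [k1, _id] strictly after the cursor position.
def index_scan(docs, threshold, after_key)
  docs.select { |d| d["k1"] >= threshold }
      .map { |d| [d["k1"], d["_id"]] }
      .sort
      .find { |entry| after_key.nil? || (entry <=> after_key) > 0 }
end

count = 0
cursor = nil
updated = false
while (entry = index_scan(docs, threshold, cursor))
  count += 1
  cursor = entry
  # A concurrent multi-update lands after the scan has read 3 documents and
  # shifts every k1 past the end of the previous high value.
  if count == 3 && !updated
    docs.each { |d| d["k1"] += 10 }
    updated = true
  end
end
puts "Count=#{count}"
```

The scan reports 13, which matches neither the state before the update (5 matches) nor the state after it (10 matches): the three documents read early are read again once they move ahead of the cursor.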
32. Review of MySQL Replication
[Diagram: a Master and a Slave, each with its own Binlog and Relay Log, in a master-master configuration for fast failover, annotated with "set global read_only=1;"]
33. How Does MongoDB Set Up Replication?
[Diagram: one PRIMARY and two SECONDARY members, connected by Replication and Heartbeat links]
35. How Do Applications Connect?
# Connect to MongoDB replica set.
client = MongoReplicaSetClient.new(
['mongodb1', 'mongodb2', 'mongodb3'])
# Access a collection and add data
db = client.db("xacts")
col = db.collection("data")
col.insert({"data" => "hello world"})
36. How Do You Read From a Slave?
# Connect to MongoDB replica set.
client = MongoReplicaSetClient.new(
['mongodb1', 'mongodb2', 'mongodb3'],
:slave_ok => true)
# Access a collection and select documents.
db = client.db("xacts")
col = db.collection("data")
col.find()
37. Where’s the Binlog?
Find last document in
the OpLog
rs0:PRIMARY> use local
rs0:PRIMARY> db.oplog.rs.find().
... sort({ts:-1}).limit(1)
{ "ts" : Timestamp(1383980308, 1),
"h" : NumberLong("9112507265624716453"),
"v" : 2, "op" : "i", "ns" : "xacts.data",
"o" : {
"_id" :
ObjectId("527ddd116244f28f4592f6a8"),
"data" : "hello world!"
}
}
38. How Do You Lock the DB to Back Up?
(= FLUSH TABLES WITH READ LOCK)
rs0:SECONDARY> db.fsyncLock()
{
    "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
    "seeAlso" : "http://dochub.mongodb.org/core/fsynccommand",
    "ok" : 1
}
...
(tar or rsync data)
...
(= UNLOCK TABLES)
rs0:SECONDARY> db.fsyncUnlock()
{ "ok" : 1, "info" : "unlock completed" }
39. How Do You Fail Over?
• Planned failover: update rs.config and save:
rs0:SECONDARY> cfg = rs.conf()
rs0:SECONDARY> cfg.members[0].priority = 1
rs0:SECONDARY> cfg.members[1].priority = 1
rs0:SECONDARY> cfg.members[2].priority = 2
rs0:SECONDARY> rs.reconfig(cfg)
• Unplanned failover: kill or stop mongod
41. How is Partitioning Like Sharding?
• MySQL partitioning breaks a table into <n>
tables
– “PARTITION” is actually a storage engine
• Tables can be partitioned by hash or range
– Hash = random distribution
– Range = user controlled distribution (date range)
• Helpful in “big data” use-cases
• Partitions can usually be dropped efficiently
– Unlike “delete from table1 where timeField < '12/31/2012';”
42. How Does Partitioning Help Queries?
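Partitioned big_table on dateCol by month.
Partitions: Aug-2013, Sep-2013, Oct-2013, Nov-2013
select * from big_table where column1 = 5;
  (no partition key in the query: every partition is checked)
select * from big_table where dateCol = '10/12/2013';
  (partition key in the query: only Oct-2013 is checked)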
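The pruning logic can be sketched in a few lines. The example below models monthly partitions as a hash keyed by [year, month] and reports which partitions a query has to scan; the rows, the helper partitions_to_scan, and the reading of '10/12/2013' as October 12 are all assumptions made for illustration.

```ruby
require "date"

# Hypothetical monthly partitions of big_table, keyed by [year, month].
partitions = {
  [2013, 8]  => [{ "dateCol" => Date.new(2013, 8, 3),   "column1" => 5 }],
  [2013, 9]  => [{ "dateCol" => Date.new(2013, 9, 20),  "column1" => 7 }],
  [2013, 10] => [{ "dateCol" => Date.new(2013, 10, 12), "column1" => 5 }],
  [2013, 11] => [{ "dateCol" => Date.new(2013, 11, 1),  "column1" => 2 }]
}

# Returns the partition keys a query must scan. Without a date predicate
# nothing can be pruned; with one, only the matching month is needed.
def partitions_to_scan(partitions, date = nil)
  return partitions.keys if date.nil?
  partitions.keys.select { |key| key == [date.year, date.month] }
end

unpruned = partitions_to_scan(partitions)
pruned = partitions_to_scan(partitions, Date.new(2013, 10, 12))
puts "column1 = 5 scans #{unpruned.length} partitions"
puts "dateCol = 2013-10-12 scans #{pruned.inspect}"
```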
43. Can I Finally Scale My Workload Horizontally?
• MySQL partitioning is helpful, but is still
constrained to a single machine
• MongoDB supports cross-server sharding
– huge plus: it’s “in the box”
– MySQL Fabric is bringing something, we’ll see
– Many other 3rd Party MySQL options exist
• Only shard the collections that require it
• Each MongoDB shard is a replica set (1
primary and 1+ secondaries)
45. How Does Sharding Help Queries?
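Sharded big_table on dateCol by month.
[Diagram: months Aug-2013, Sep-2013, Oct-2013, Nov-2013 spread across shard1, shard2, shard3, shard4, ...]
select * from big_table where column1 = 5;
  (no shard key in the query: every shard is queried)
select * from big_table where dateCol = '10/12/2013';
  (shard key in the query: only the shard holding Oct-2013 is queried)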
46. How Do I Pick a Shard Key?
• MongoDB shards on one or more fields
• Simple example
– “orders” collection (customerId and productId)
– 1: shard on customerId
o each order writes to a single shard
o reads by customer on single shard
o reads by product on entire cluster
– 2: shard on productId
o each order writes to several shards
o reads by customer on entire cluster
o reads by product on single shard
– 3: store everything twice and shard both ways
o worst case for writes
o best case for reads (either is single shard)
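The trade-off between options 1 and 2 comes down to whether a query carries the shard key. The sketch below is a toy router that hashes the shard key value to pick a shard and falls back to every shard when the key is absent; shard_for, shards_touched, the hash function, and the four-shard cluster are hypothetical and do not reflect how MongoDB assigns chunks.

```ruby
require "zlib"

# Toy placement rule: hash the shard key value onto one of the shards.
def shard_for(value, num_shards)
  Zlib.crc32(value.to_s) % num_shards
end

# Shards a query must visit: one if it includes the shard key,
# otherwise all of them (scatter-gather).
def shards_touched(query, shard_key, num_shards)
  if query.key?(shard_key)
    [shard_for(query[shard_key], num_shards)]
  else
    (0...num_shards).to_a
  end
end

num_shards = 4
# Option 1 from the list above: "orders" sharded on customerId.
by_customer = shards_touched({ "customerId" => 42 }, "customerId", num_shards)
by_product  = shards_touched({ "productId" => 7 }, "customerId", num_shards)
puts "reads by customer touch #{by_customer.length} shard(s)"
puts "reads by product touch #{by_product.length} shard(s)"
```

Swapping the shard key to "productId" reverses the two results, which is the situation described in option 2.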
48. How Secure is It?
• basic username/password
• by database
• roles
– read = read any collection
– readWrite = read/write any collection
– dbAdmin = create index, create collection, rename
collection, etc.
49. What About Advanced Security?
• Kerberos support in MongoDB Enterprise
Edition
• SSL is supported, but
– Note: The default distribution of MongoDB does
not contain support for SSL. To use SSL, you must
either build MongoDB locally passing the “--ssl”
option to scons or use MongoDB Enterprise.
50. What Else is There to Learn?
• Tools - mongostat, mongo[export/import], mongo[dump/restore]
• Aggregation Framework
  • Think SQL aggregate functionality
• Map/Reduce
52. Summary
We liked...
• Ease of install
• Ability to just “jump in”
Look [out] for...
• Query language (Tim says hang in there!)
• You have to think about storage and queries in advance
Highly Recommended Reading
• Karl Seguin’s “The Little MongoDB Book”
• http://openmymind.net/mongodb.pdf
• MongoDB’s “SQL to MongoDB Mapping Chart”
• http://docs.mongodb.org/manual/reference/sql-comparison/