Use Your MySQL Knowledge to
Become a MongoDB Guru
Percona Live London 2013

Robert Hodges
CEO
Continuent

Tim Callaghan
VP/Engineering
Tokutek

Tuesday, November 12, 13
Our Companies
Robert Hodges
• CEO at Continuent
• Database nerd since 1982 starting with M204, RDBMS since 1990, NoSQL since 2012; designed Continuent Tungsten
• Continuent offers clustering and replication for MySQL and other fine DBMS types

Tim Callaghan
• VP/Engineering at Tokutek
• Long-time database consumer (Oracle) and producer (VoltDB, Tokutek)
• Tokutek offers Fractal Tree indexes in MySQL (TokuDB) and MongoDB (TokuMX)

MongoDB -- The New MySQL

One Bad Thing about MongoDB
One Good Thing about MongoDB

One Bad Thing about MongoDB
MySQL
> select * from table1 where column1 > column2;
> ... 5 row(s) returned
MongoDB
> db.collection1.find({$field1: {gt: $field2}});
> ReferenceError: $field2 is not defined
[current] MongoDB query language is <field> <operator> <literal>

One Good Thing about MongoDB

Robert’s “ease of use” demo

Today’s Question

How can you use your MySQL knowledge to get up to speed on MongoDB?

Topic: Schema Design

How Do I Find Things in MongoDB?

mongod server    == mysqld
database         == MySQL schema
collection       == MySQL table
BSON document    ~  Sort of like a MySQL row
key/value pair   != MySQL column

How Do I Create a Table and Insert Data?
# Ruby Code (legacy 1.x "mongo" gem API)
require "mongo"
include Mongo

MongoClient.new("localhost").          # Connect
  db("mydb").                          # Use database
  collection("sample").                # Choose collection
  insert({"data" => "hello world"})    # Insert data to materialize database and collection

Primary key generated automatically

How Do I Change the Schema?

# Ruby Code
MongoClient.new("localhost").
  db("mydb").
  collection("sample").
  insert({"data" => "hello again!",
          "author" => "robert"})    # Just add more data

How Do I Validate Schema?

rs0:PRIMARY> db.samples.find()
{ "_id" : 1, "data" : "hello world" }
{ "_id" : 2, "daata" : "bye world" }
{ "_id" : 3, "data" : 26.44 }

Software bugs?

rs0:PRIMARY> show databases
local   2.0771484375GB
mydb    7.9501953125GB
mydb1   0.203125GB      <-- Typo from an early run

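Since the server stores a misspelled key or an unexpected type without complaint, the check has to live in the application. The following is a minimal sketch of such a check in plain Ruby, run against the three sample documents from the slide. `REQUIRED_FIELDS` and `validation_errors` are hypothetical names invented for this illustration; they are not part of any MongoDB driver.

```ruby
# Hypothetical application-side check; MongoDB itself accepts all three documents.
# Expected field names and the Ruby class each value should have (illustrative schema).
REQUIRED_FIELDS = { "data" => String }.freeze

# Returns a list of human-readable problems; an empty list means the document passes.
def validation_errors(doc)
  errors = []
  REQUIRED_FIELDS.each do |name, type|
    if !doc.key?(name)
      errors << "missing field #{name}"
    elsif !doc[name].is_a?(type)
      errors << "field #{name} should be #{type}, got #{doc[name].class}"
    end
  end
  # Flag unknown keys too, which catches typos such as "daata".
  unknown = doc.keys - REQUIRED_FIELDS.keys - ["_id"]
  unknown.each { |name| errors << "unexpected field #{name}" }
  errors
end

samples = [
  { "_id" => 1, "data" => "hello world" },
  { "_id" => 2, "daata" => "bye world" },
  { "_id" => 3, "data" => 26.44 }
]

samples.each do |doc|
  problems = validation_errors(doc)
  puts "_id #{doc["_id"]}: #{problems.empty? ? "ok" : problems.join("; ")}"
end
```

An application would call a check like this before every insert and refuse to write when the list is non-empty.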
How Do I Remove Data? (Part 1)

Drop a database

rs0:PRIMARY> db.dropDatabase()
{ "dropped" : "mydb", "ok" : 1 }
Drop a collection

rs0:PRIMARY> db.samples.drop()
true
Drop a column?

rs0:PRIMARY> db.foo.update(
    { author: { $exists: true }},
    { $unset: { author: 1 } },
    false, true )    // upsert = false, multi = true

How Do I Remove Data? (Part 2)

(Remove documents based on TTL index)

> db.samples.ensureIndex(
    {"inserted": 1},
    {"expireAfterSeconds": 60})
> db.samples.insert(
    {"data": "hello world",
     inserted: new Date()})
> db.samples.count()
1
...
> db.samples.count()
0

(Capped collections do the same with space)

How Does MongoDB Do Joins?

It Doesn’t!
(It is your job to denormalize or do application-level joins. This includes thinking about storage.)

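The two options named on this slide can be sketched in plain Ruby. The arrays below stand in for the results of two separate queries. The collection contents and the helper names `join_orders` and `embed_orders` are assumptions made up for this illustration.

```ruby
# Plain-Ruby stand-ins for two collections; in a real application these
# arrays would be the results of two separate find() calls.
customers = [
  { "_id" => 1, "name" => "Robert" },
  { "_id" => 2, "name" => "Tim" }
]
orders = [
  { "_id" => 100, "customer_id" => 1, "item" => "replication" },
  { "_id" => 101, "customer_id" => 2, "item" => "indexes" },
  { "_id" => 102, "customer_id" => 1, "item" => "clustering" }
]

# Application-level join: index one side by key, then look up each row of the other.
def join_orders(orders, customers)
  by_id = customers.each_with_object({}) { |c, h| h[c["_id"]] = c }
  orders.map do |o|
    customer = by_id[o["customer_id"]]
    o.merge("customer_name" => customer && customer["name"])
  end
end

# Denormalized alternative: embed the orders inside each customer document.
def embed_orders(orders, customers)
  grouped = orders.group_by { |o| o["customer_id"] }
  customers.map { |c| c.merge("orders" => grouped.fetch(c["_id"], [])) }
end

joined = join_orders(orders, customers)
embedded = embed_orders(orders, customers)
puts joined.map { |o| "#{o["item"]} -> #{o["customer_name"]}" }
puts embedded.map { |c| "#{c["name"]}: #{c["orders"].size} order(s)" }
```

The join costs an extra query and some memory on every read. Embedding moves that cost to write time and makes each customer document larger, which is the storage trade-off the slide mentions.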
Topic: Data Storage and Organization

How is My Data Stored, Logically?
MongoDB storage is very similar to MyISAM
[Diagram: collection data (documents), referenced by the _id index and by the secondary index(es)]

How is My Data Stored, Physically?
But it does look different in the file system.
MyISAM
<db>/<table>.frm
<db>/<table>.myd
<db>/<table>.myi

MongoDB
<db1>.ns
<db1>.0 .. <db1>.n
<db2>.ns
<db2>.0 .. <db2>.n

• Start MongoDB with “--directoryperdb” to put files in database folders
• Pro tip: do this to gain IOPS per database

How Much Memory Does It Use?

All of it!

How does MongoDB Manage Memory?
• MyISAM
  – key_buffer_size determines index caching
  – data is cached in operating system buffers
• InnoDB
  – innodb_buffer_pool_size determines index/data caching
• MongoDB
  – memory-mapped files
  – mongod grows to consume available RAM
  – good: no knob
  – bad: operating system is in charge of cache
  – bad: available RAM may change over time

How Will It Perform for My Workload?
• It depends...
  – Determine your “working set”
    o The portion of your data that clients access most often
    o db.runCommand( { serverStatus: 1, workingSet: 1 } )
  – If working set <= RAM
    o Performance generally very good
    o Be careful in high-concurrency write use cases
  – If working set > RAM
    o Likely IO bound
    o Sharding to the rescue!

How Can Schema Affect Working Set?
• Field names are stored with every document
  – On disk and in memory
• Plan ahead, especially for large collections

BAD!
{ first_name: "Timothy",
  middle_initial: "M",
  last_name: "Callaghan",
  address_line_1: "555 Main Street",
  address_line_2: "Apt. 9" }

GOOD!
{ fn: "Timothy",
  mi: "M",
  ln: "Callaghan",
  al1: "555 Main Street",
  al2: "Apt. 9" }

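The cost of long field names is easy to estimate. BSON stores each field name as a null-terminated string inside every document, so the overhead grows with the number of documents. The sketch below counts only the name bytes for the two layouts on this slide. The document count is an illustrative assumption, not a figure from the talk, and `name_bytes` is a helper made up for this example.

```ruby
# Rough estimate of the bytes spent on field names alone. BSON stores each
# name as a C string (name bytes plus a terminator) in every document, so the
# cost scales with the number of documents.
LONG_NAMES  = %w[first_name middle_initial last_name address_line_1 address_line_2].freeze
SHORT_NAMES = %w[fn mi ln al1 al2].freeze

# Bytes used by field names in a single document (one terminator byte per name).
def name_bytes(names)
  names.sum { |n| n.bytesize + 1 }
end

# The document count is an illustrative assumption, not a figure from the talk.
doc_count = 100_000_000
long_total  = name_bytes(LONG_NAMES)  * doc_count
short_total = name_bytes(SHORT_NAMES) * doc_count

puts "long names:  #{name_bytes(LONG_NAMES)} bytes/doc"
puts "short names: #{name_bytes(SHORT_NAMES)} bytes/doc"
puts "saved across #{doc_count} docs: #{(long_total - short_total) / 1_000_000} MB"
```

The saving applies in memory as well as on disk, so shorter names leave more of the working set in RAM.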
Topic: Query Optimization

How Does the Query Optimizer Work?
• MySQL
  – The optimizer finds usable indexes for the query
  – For each index, the optimizer asks the storage engine
    o What is the cardinality for the given keys?
    o What is the estimated cost?
  – The “best” plan is chosen and used for the query
• This occurs for every single query

How Does the Query Optimizer Work?
• MongoDB
  – The query is run in parallel, once per candidate index
    o “candidate” meaning the index contains useful keys
  – As matching results are found they are placed in a shared buffer
  – When one of the parallel runs completes, all others are stopped
  – The winning “plan” is used for future executions of the same query
    o Until the collection has 1,000 writes, mongod restarts, or there is an index change to the collection

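The race described on this slide can be mimicked with a toy model in plain Ruby. This is a sketch under simplifying assumptions, not MongoDB's implementation: each "index" is just a filter on one field, plans advance one document per turn, each plan keeps its own result list rather than a shared buffer, and the cache is keyed by the set of queried fields. `PlanRace` and all of its methods are hypothetical names.

```ruby
# Toy model of the plan race; class and method names are hypothetical and
# nothing here is MongoDB's actual implementation.
class PlanRace
  attr_reader :cache

  def initialize(docs, indexed_fields)
    @docs = docs
    @indexed_fields = indexed_fields
    @cache = {}   # query shape (sorted field names) => winning index field
    @writes = 0
  end

  def matches?(doc, query)
    query.all? { |k, v| doc[k] == v }
  end

  # Documents an index on +field+ would hand back for this query.
  def index_candidates(field, query)
    @docs.select { |d| d[field] == query[field] }
  end

  # Advance every candidate plan one document per turn; the first plan to
  # exhaust its candidates wins and the others are abandoned.
  def race(query)
    plans = (@indexed_fields & query.keys).map do |f|
      { field: f, todo: index_candidates(f, query), results: [] }
    end
    return nil if plans.empty?
    loop do
      plans.each do |plan|
        return plan if plan[:todo].empty?
        doc = plan[:todo].shift
        plan[:results] << doc if matches?(doc, query)
      end
    end
  end

  def find(query)
    shape = query.keys.sort
    field = @cache[shape]
    return index_candidates(field, query).select { |d| matches?(d, query) } if field
    winner = race(query)
    return @docs.select { |d| matches?(d, query) } if winner.nil? # no usable index: full scan
    @cache[shape] = winner[:field]
    winner[:results]
  end

  # Forget cached plans after 1,000 writes, as the slide describes.
  def record_write
    @writes += 1
    return if @writes < 1_000
    @cache.clear
    @writes = 0
  end
end

docs = (1..1000).map { |i| { "id" => i, "status" => (i.even? ? "open" : "closed"), "zip" => i % 100 } }
planner = PlanRace.new(docs, ["status", "zip"])
first  = planner.find({ "status" => "open", "zip" => 42 })
second = planner.find({ "status" => "open", "zip" => 42 })
puts "winning index: #{planner.cache[["status", "zip"]]}, #{first.size} match(es)"
```

In this data the `zip` filter yields far fewer candidates than `status`, so its plan finishes first and is reused for the second call without another race.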
A Simple Yet Elegant Solution?
• No more wrestling with the optimizer
• Hints are supported ($hint)
  – Force a particular index
  – http://docs.mongodb.org/manual/reference/operator/meta/hint/
• Easier since MongoDB does not support joins

Topic: Transactions

MySQL Transactions and Isolation
InnoDB creates an MVCC view of the data, locks updated rows, and commits atomically:

mysql> BEGIN;
...
mysql> INSERT INTO sample(data) VALUES ('Hello world!');
mysql> INSERT INTO sample(data) VALUES ('Goodbye world!');
...
mysql> COMMIT;

MyISAM locks the table and commits each row immediately.

How Does MongoDB Implement Locking?
# Update data ranges of documents to
# show effects of database lock.
@col.update(                        # Locks database
  {key =>
    {"$gte" => first.to_s,
     "$lt" => last.to_s}
  },
  { "$set" =>
    { "data.x" => rand(@rows)}})

Test                                                    Total Requests/Sec
Single thread updating single collection                197
Two threads updating two collections, same DB           80 + 80 = 160
Four threads updating two collections, same DB          29+29+30+30 = 118
Two threads updating two collections, different DBs     190 + 179 = 369

How Does MongoDB Implement Isolation?
• MongoDB does not prevent threads from seeing partially committed data
• Example: Index changes can result in a “double read” of data if a query uses the index while the index is changing
• Experiment: Construct a test to:
  • Select from a numeric index and count rows
  • Simultaneously update the index to shift lower values past the end of the previous high value

How Does MongoDB Implement Isolation?

# Select values.
count = 0
@col.find("k1" =>
    {"$gte" => 120000}).
  each do |doc|
    count += 1
  end
puts "Count=#{count}"

# Run update to increase.
@col.update(
  {"_id" =>
    {"$exists" => true}},
  {"$inc" =>
    {"k1" => increment}},
  {:multi => true})

Output:

Count=50000
Count=50000
Count=100000 <--Index shifts over tail
Count=50000
Count=50000

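The double read can be reproduced without a database. The sketch below is a toy model in plain Ruby, not the speakers' test harness: the "index" is a sorted list of [k1, id] pairs, and the cursor resumes from the last key it returned. Partway through the scan, a simulated multi-update shifts every k1 past the old maximum. `scan_ids` and `fresh_docs` are hypothetical helpers, and the sizes are scaled down from the slide.

```ruby
# Toy index scan: the cursor remembers the last index key it returned and,
# on each step, fetches the next entry with a larger key. That mirrors a scan
# that yields between documents while writers modify the index.
def scan_ids(docs, lower_bound, update_after: nil, increment: 0)
  seen = []
  last_key = nil
  loop do
    # Rebuild the "index" view on every step so concurrent changes are visible.
    index = docs.map { |d| [d[:k1], d[:id]] }.sort
    entry = index.find do |key|
      key[0] >= lower_bound && (last_key.nil? || (key <=> last_key) == 1)
    end
    break if entry.nil?
    seen << entry[1]
    last_key = entry
    # Simulated concurrent multi-update, applied once partway through the scan.
    if update_after && seen.size == update_after
      docs.each { |d| d[:k1] += increment }
    end
  end
  seen
end

# 100 documents with k1 values 1000..1099 (sizes scaled down from the slide).
def fresh_docs
  (0...100).map { |id| { id: id, k1: 1000 + id } }
end

quiet = scan_ids(fresh_docs, 1000)
racy  = scan_ids(fresh_docs, 1000, update_after: 50, increment: 100)
puts "Count=#{quiet.size}"
puts "Count=#{racy.size} (#{racy.size - racy.uniq.size} documents read twice)"
```

The documents already returned move ahead of the cursor when their key grows, so the scan meets them a second time. With snapshot isolation, as in InnoDB's MVCC view, the reader would not see the shifted entries.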
Topic: Replication and HA

Review of MySQL Replication

[Diagram: a Master and a Slave in a master-master configuration for fast failover. Each server has a Binlog and a Relay Log; one side is kept read-only with "set global read_only=1;"]

How Does MongoDB Set Up Replication?

[Diagram: a replica set with one PRIMARY and two SECONDARY members, connected by Replication and Heartbeat links]

Where Is The Replica Set Defined?
$ mongo localhost
...
rs0:PRIMARY> rs.config()
{
    "_id" : "rs0",
    "version" : 8,
    "members" : [
        {
            "_id" : 0,
            "host" : "mongodb1:27017"
        },
        {
            "_id" : 1,
            "host" : "mongodb2:27017"
        },
        {
            "_id" : 2,
            "host" : "mongodb3:27017"
        }
    ]
}

How Do Applications Connect?

# Connect to MongoDB replica set.
client = MongoReplicaSetClient.new(
['mongodb1', 'mongodb2', 'mongodb3'])
# Access a collection and add data
db = client.db("xacts")
col = db.collection("data")
col.insert({"data" => "hello world"})

How Do You Read From a Slave?

# Connect to MongoDB replica set.
client = MongoReplicaSetClient.new(
['mongodb1', 'mongodb2', 'mongodb3'],
:slave_ok => true)
# Access a collection and select documents.
db = client.db("xacts")
col = db.collection("data")
col.find()

Where’s the Binlog?
Find last document in the OpLog:

rs0:PRIMARY> use local
rs0:PRIMARY> db.oplog.rs.find().
... sort({ts:-1}).limit(1)
{ "ts" : Timestamp(1383980308, 1),
  "h" : NumberLong("9112507265624716453"),
  "v" : 2, "op" : "i", "ns" : "xacts.data",
  "o" : {
    "_id" : ObjectId("527ddd116244f28f4592f6a8"),
    "data" : "hello world!"
  }
}

How Do You Lock the DB to Back Up?
(= FLUSH TABLES WITH READ LOCK)
rs0:SECONDARY> db.fsyncLock()
{
    "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
    "seeAlso" : "http://dochub.mongodb.org/core/fsynccommand",
    "ok" : 1
}
...
(tar or rsync data)
...
(= UNLOCK TABLES)
rs0:SECONDARY> db.fsyncUnlock()
{ "ok" : 1, "info" : "unlock completed" }

How Do You Fail Over?

• Planned failover: update rs.config and save:

rs0:SECONDARY> cfg = rs.conf()
rs0:SECONDARY> cfg.members[0].priority = 1
rs0:SECONDARY> cfg.members[1].priority = 1
rs0:SECONDARY> cfg.members[2].priority = 2
rs0:SECONDARY> rs.reconfig(cfg)

• Unplanned failover: kill or stop mongod

Topic: Sharding

How is Partitioning Like Sharding?
• MySQL partitioning breaks a table into <n> tables
  – “PARTITION” is actually a storage engine
• Tables can be partitioned by hash or range
  – Hash = random distribution
  – Range = user-controlled distribution (date range)
• Helpful in “big data” use cases
• Partitions can usually be dropped efficiently
  – Unlike “delete from table1 where timeField < '12/31/2012';”

How Does Partitioning Help Queries?
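Partitioned big_table on dateCol by month.

select * from big_table where column1 = 5;
select * from big_table where dateCol = '10/12/2013';

[Diagram: partitions Aug-2013, Sep-2013, Oct-2013, Nov-2013. The query on column1 has to look in every partition; the query on dateCol only needs the Oct-2013 partition.]
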
Can I Finally Scale My Workload Horizontally?
• MySQL partitioning is helpful, but is still constrained to a single machine
• MongoDB supports cross-server sharding
  – huge plus: it’s “in the box”
  – MySQL Fabric is bringing something, we’ll see
  – Many other third-party MySQL options exist
• Only shard the collections that require it
• Each MongoDB shard is a replica set (1 primary and 1+ secondaries)

What Does MongoDB Sharding Look Like?

[Diagram: client app1 and client app2 connect through mongos routers (mongos1 ... mongosn) to shard1, shard2 ... shardn; each shard has a Master and a Slave]

How Does Sharding Help Queries?
Sharded big_table on dateCol by month.

select * from big_table where column1 = 5;
select * from big_table where dateCol = '10/12/2013';

[Diagram: shard1 through shard4 hold Aug-2013, Sep-2013, Oct-2013, and Nov-2013, with further shards on either side. The query on column1 has to visit every shard; the query on dateCol only needs the shard holding Oct-2013.]

How Do I Pick a Shard Key?
• MongoDB shards on one or more fields
• Simple example
  – “orders” collection (customerId and productId)
  – 1: shard on customerId
    o each order writes to a single shard
    o reads by customer on single shard
    o reads by product on entire cluster
  – 2: shard on productId
    o each order writes to several shards
    o reads by customer on entire cluster
    o reads by product on single shard
  – 3: store everything twice and shard both ways
    o worst case for writes
    o best case for reads (either is single shard)

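The trade-offs in this list follow from one rule: a query that names the shard key goes to one shard, and any other query goes to all of them. The sketch below shows that rule with a toy router in plain Ruby. It is an illustration under stated assumptions: real MongoDB assigns ranges of key values (chunks) to shards, and the modulo here is only a stand-in. `SHARD_COUNT`, `shard_for`, and `shards_for_query` are hypothetical names.

```ruby
SHARD_COUNT = 4 # illustrative cluster size

# Toy router: a document lives on the shard chosen by its shard-key value.
# Real MongoDB assigns ranges of key values (chunks) to shards; modulo is a
# stand-in that keeps the example short.
def shard_for(value)
  value % SHARD_COUNT
end

# Shards a query must visit: one if it filters on the shard key, all otherwise.
def shards_for_query(shard_key, query)
  if query.key?(shard_key)
    [shard_for(query[shard_key])]
  else
    (0...SHARD_COUNT).to_a
  end
end

# Option 1 from the slide: shard "orders" on customerId.
by_customer = shards_for_query("customerId", { "customerId" => 42 })
by_product  = shards_for_query("customerId", { "productId" => 7 })
puts "shard on customerId, read by customer: #{by_customer.size} shard(s)"
puts "shard on customerId, read by product:  #{by_product.size} shard(s)"

# Option 2: shard on productId; an order with several products fans out on write.
order = { "customerId" => 42, "productIds" => [7, 8, 9] }
write_shards = order["productIds"].map { |p| shard_for(p) }.uniq
puts "shard on productId, write one order:   #{write_shards.size} shard(s)"
```

Option 3 from the slide amounts to running both layouts at once: every write pays for two placements, and every read can pick the layout whose shard key it filters on.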
Topic: Security

How Secure is It?
• basic username/password
• by database
• roles
  – read = read any collection
  – readWrite = read/write any collection
  – dbAdmin = create index, create collection, rename collection, etc.

What About Advanced Security?
• Kerberos support in MongoDB Enterprise Edition
• SSL is supported, but
  – Note: The default distribution of MongoDB does not contain support for SSL. To use SSL, you must either build MongoDB locally passing the “--ssl” option to scons or use MongoDB Enterprise.

What Else is There to Learn?

• Tools - mongostat, mongo[export/import], mongo[dump/restore]
• Aggregation Framework
  • Think SQL aggregate functionality
• Map/Reduce

What Should You Do?

Summary
We liked...
• Ease of install
• Ability to just “jump in”
Look [out] for...
• Query language (Tim says hang in there!)
• You have to think about storage and queries in advance
Highly Recommended Reading
• Karl Seguin’s “The Little MongoDB Book”
• http://openmymind.net/mongodb.pdf
• MongoDB’s “SQL to MongoDB Mapping Chart”
• http://docs.mongodb.org/manual/reference/sql-comparison/

Questions?

Robert Hodges
CEO, Continuent
robert.hodges@continuent.com
@continuent

Tim Callaghan
VP/Engineering, Tokutek
tim@tokutek.com
@tmcallaghan

®

Tuesday, November 12, 13

Weitere ähnliche Inhalte

Was ist angesagt?

Create manula and automaticly database
Create manula and automaticly databaseCreate manula and automaticly database
Create manula and automaticly database
Anar Godjaev
 
Optimizing Slow Queries with Indexes and Creativity
Optimizing Slow Queries with Indexes and CreativityOptimizing Slow Queries with Indexes and Creativity
Optimizing Slow Queries with Indexes and Creativity
MongoDB
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data model
Patrick McFadin
 
The world's next top data model
The world's next top data modelThe world's next top data model
The world's next top data model
Patrick McFadin
 
Los Angeles R users group - Dec 14 2010 - Part 2
Los Angeles R users group - Dec 14 2010 - Part 2Los Angeles R users group - Dec 14 2010 - Part 2
Los Angeles R users group - Dec 14 2010 - Part 2
rusersla
 

Was ist angesagt? (20)

Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMX
 
MongoDB-SESSION03
MongoDB-SESSION03MongoDB-SESSION03
MongoDB-SESSION03
 
Create manula and automaticly database
Create manula and automaticly databaseCreate manula and automaticly database
Create manula and automaticly database
 
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
 
Tiered storage intro. By Robert Hodges, Altinity CEO
Tiered storage intro. By Robert Hodges, Altinity CEOTiered storage intro. By Robert Hodges, Altinity CEO
Tiered storage intro. By Robert Hodges, Altinity CEO
 
Installing postgres & postgis
Installing postgres & postgisInstalling postgres & postgis
Installing postgres & postgis
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
 
Optimizing Slow Queries with Indexes and Creativity
Optimizing Slow Queries with Indexes and CreativityOptimizing Slow Queries with Indexes and Creativity
Optimizing Slow Queries with Indexes and Creativity
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data model
 
Datacon LA - MySQL without the SQL - Oh my!
Datacon LA - MySQL without the SQL - Oh my! Datacon LA - MySQL without the SQL - Oh my!
Datacon LA - MySQL without the SQL - Oh my!
 
MariaDB and Clickhouse Percona Live 2019 talk
MariaDB and Clickhouse Percona Live 2019 talkMariaDB and Clickhouse Percona Live 2019 talk
MariaDB and Clickhouse Percona Live 2019 talk
 
Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015
 
How to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'rollHow to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'roll
 
The world's next top data model
The world's next top data modelThe world's next top data model
The world's next top data model
 
Los Angeles R users group - Dec 14 2010 - Part 2
Los Angeles R users group - Dec 14 2010 - Part 2Los Angeles R users group - Dec 14 2010 - Part 2
Los Angeles R users group - Dec 14 2010 - Part 2
 
15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance
 
DataStax: An Introduction to DataStax Enterprise Search
DataStax: An Introduction to DataStax Enterprise SearchDataStax: An Introduction to DataStax Enterprise Search
DataStax: An Introduction to DataStax Enterprise Search
 
Cassandra Community Webinar | In Case of Emergency Break Glass
Cassandra Community Webinar | In Case of Emergency Break GlassCassandra Community Webinar | In Case of Emergency Break Glass
Cassandra Community Webinar | In Case of Emergency Break Glass
 
Percona xtra db cluster(pxc) non blocking operations, what you need to know t...
Percona xtra db cluster(pxc) non blocking operations, what you need to know t...Percona xtra db cluster(pxc) non blocking operations, what you need to know t...
Percona xtra db cluster(pxc) non blocking operations, what you need to know t...
 
TimesTen in memory database Creation
TimesTen in memory database Creation TimesTen in memory database Creation
TimesTen in memory database Creation
 

Ähnlich wie Use Your MySQL Knowledge to Become a MongoDB Guru

171_74_216_Module_5-Non_relational_database_-mongodb.pptx
171_74_216_Module_5-Non_relational_database_-mongodb.pptx171_74_216_Module_5-Non_relational_database_-mongodb.pptx
171_74_216_Module_5-Non_relational_database_-mongodb.pptx
sukrithlal008
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative study
Guillaume Lefranc
 
ComputeFest 2012: Intro To R for Physical Sciences
ComputeFest 2012: Intro To R for Physical SciencesComputeFest 2012: Intro To R for Physical Sciences
ComputeFest 2012: Intro To R for Physical Sciences
alexstorer
 
Introduction tomongodb
Introduction tomongodbIntroduction tomongodb
Introduction tomongodb
Lee Theobald
 

Ähnlich wie Use Your MySQL Knowledge to Become a MongoDB Guru (20)

Fractal Tree Indexes : From Theory to Practice
Fractal Tree Indexes : From Theory to PracticeFractal Tree Indexes : From Theory to Practice
Fractal Tree Indexes : From Theory to Practice
 
Redis the better NoSQL
Redis the better NoSQLRedis the better NoSQL
Redis the better NoSQL
 
Let your DBAs get some REST(api)
Let your DBAs get some REST(api)Let your DBAs get some REST(api)
Let your DBAs get some REST(api)
 
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, HerokuPostgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
 
Mongodb workshop
Mongodb workshopMongodb workshop
Mongodb workshop
 
Scala in hulu's data platform
Scala in hulu's data platformScala in hulu's data platform
Scala in hulu's data platform
 
M|18 How Facebook Migrated to MyRocks
M|18 How Facebook Migrated to MyRocksM|18 How Facebook Migrated to MyRocks
M|18 How Facebook Migrated to MyRocks
 
The Basics of MongoDB
The Basics of MongoDBThe Basics of MongoDB
The Basics of MongoDB
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big Data
 
171_74_216_Module_5-Non_relational_database_-mongodb.pptx
171_74_216_Module_5-Non_relational_database_-mongodb.pptx171_74_216_Module_5-Non_relational_database_-mongodb.pptx
171_74_216_Module_5-Non_relational_database_-mongodb.pptx
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative study
 
ComputeFest 2012: Intro To R for Physical Sciences
ComputeFest 2012: Intro To R for Physical SciencesComputeFest 2012: Intro To R for Physical Sciences
ComputeFest 2012: Intro To R for Physical Sciences
 
Introduction tomongodb
Introduction tomongodbIntroduction tomongodb
Introduction tomongodb
 
MySQL 开发
MySQL 开发MySQL 开发
MySQL 开发
 
Gur1009
Gur1009Gur1009
Gur1009
 
NoSQL Infrastructure - Late 2013
NoSQL Infrastructure - Late 2013NoSQL Infrastructure - Late 2013
NoSQL Infrastructure - Late 2013
 
What's New in the PHP Driver
What's New in the PHP DriverWhat's New in the PHP Driver
What's New in the PHP Driver
 
NoSQL Infrastructure
NoSQL InfrastructureNoSQL Infrastructure
NoSQL Infrastructure
 
Making MySQL Agile-ish
Making MySQL Agile-ishMaking MySQL Agile-ish
Making MySQL Agile-ish
 
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
 

Mehr von Tim Callaghan

Mehr von Tim Callaghan (11)

Is It Fast? : Measuring MongoDB Performance
Is It Fast? : Measuring MongoDB PerformanceIs It Fast? : Measuring MongoDB Performance
Is It Fast? : Measuring MongoDB Performance
 
Benchmarking MongoDB for Fame and Fortune
Benchmarking MongoDB for Fame and FortuneBenchmarking MongoDB for Fame and Fortune
Benchmarking MongoDB for Fame and Fortune
 
So you want to be a software developer? (version 2.0)
So you want to be a software developer? (version 2.0)So you want to be a software developer? (version 2.0)
So you want to be a software developer? (version 2.0)
 
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedPerformance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons Learned
 
Introduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free ReplicationIntroduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free Replication
 
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra GuruUse Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
 
5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB
 
Get More Out of MySQL with TokuDB
Get More Out of MySQL with TokuDBGet More Out of MySQL with TokuDB
Get More Out of MySQL with TokuDB
 
Get More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXGet More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMX
 
Creating a Benchmarking Infrastructure That Just Works
Creating a Benchmarking Infrastructure That Just WorksCreating a Benchmarking Infrastructure That Just Works
Creating a Benchmarking Infrastructure That Just Works
 
VoltDB : A Technical Overview
VoltDB : A Technical OverviewVoltDB : A Technical Overview
VoltDB : A Technical Overview
 

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Use Your MySQL Knowledge to Become a MongoDB Guru

  • 1. Use Your MySQL Knowledge to Become a MongoDB Guru Percona Live London 2013 Robert Hodges CEO Continuent Tim Callaghan VP/Engineering Tokutek ® Tuesday, November 12, 13
  • 2. Our Companies Robert Hodges • CEO at Continuent • Database nerd since 1982 starting with M204, RDBMS since 1990, NoSQL since 2012; designed Continuent Tungsten • Continuent offers clustering and replication for MySQL and other fine DBMS types Tim Callaghan • VP/Engineering at Tokutek • Long time database consumer (Oracle) and producer (VoltDB, Tokutek) • Tokutek offers Fractal Tree indexes in MySQL (TokuDB) and MongoDB (TokuMX) ® Tuesday, November 12, 13
  • 3. MongoDB -- The New MySQL One Bad Thing about MongoDB One Good Thing about MongoDB ® Tuesday, November 12, 13
  • 4. One Bad Thing about MongoDB MySQL > select * from table1 where column1 > column2; > ... 5 row(s) returned MongoDB > db.collection1.find({$field1: {gt: $field2}}); > ReferenceError: $field2 is not defined [current] MongoDB query language is <field> <operator> <literal> ® Tuesday, November 12, 13
  • 5. One Good Thing about MongoDB Robert’s “ease of use” demo ® Tuesday, November 12, 13
  • 6. Today’s Question How can you use your MySQL knowledge to get up to speed on MongoDB? ® Tuesday, November 12, 13
  • 8. How Do I Find Things in MongoDB? mongod server == mysqld == MySQL schema == MySQL table ~ Sort of like a MySQL row != MySQL column database collection BSON document key/value pair key/value pair key/value pair BSON document... 8 ® Tuesday, November 12, 13
  • 9. How Do I Create a Table and Insert Data? Connect # Ruby Code MongoClient.new("localhost"). db("mydb"). Use database collection("sample"). insert({"data" => "hello world"}) Choose collection Insert data to materialize database and collection Primary key generated automatically 9 ® Tuesday, November 12, 13
  • 10. How Do I Change the Schema? # Ruby Code MongoClient.new("localhost"). db("mydb"). collection("sample"). insert({"data" => "hello again!", "author" => “robert”}) Just add more data 10 ® Tuesday, November 12, 13
  • 11. How Do I Validate Schema? rs0:PRIMARY> { "_id" : 1, { "_id" : 2, { "_id" : 3, db.samples.find() "data" : "hello world" } "daata" : "bye world” } "data" : 26.44 } Software bugs? rs0:PRIMARY> show databases local ! 2.0771484375GB mydb! 7.9501953125GB Typo from an mydb1! 0.203125GB early run 11 ® Tuesday, November 12, 13
  • 12. How Do I Remove Data? (Part 1)
      Drop a database:
        rs0:PRIMARY> db.dropDatabase()
        { "dropped" : "mydb", "ok" : 1 }
      Drop a collection:
        rs0:PRIMARY> db.samples.drop()
        true
      Drop a column?
        rs0:PRIMARY> db.foo.update(
                       { author: { $exists: true } },
                       { $unset: { author: 1 } },
                       false, true )    // upsert = false, multi = true
  • 13. How Do I Remove Data? (Part 2)
      Remove documents based on a TTL index:
        > db.samples.ensureIndex( {"inserted": 1}, {"expireAfterSeconds": 60} )
        > db.samples.insert( {"data": "hello world", inserted: new Date()} )
        > db.samples.count()
        1
        ...
        > db.samples.count()
        0
      Capped collections do the same based on space.
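The expiry rule behind a TTL index can be sketched in a few lines of plain JavaScript. The helper name `expired` and the sample data are assumptions made for illustration; in a real deployment mongod applies this rule in a background task that runs about once a minute, so removal is not instantaneous.

```javascript
// Hypothetical helper: which documents would a TTL pass remove at time `now`?
// Documents whose indexed field is missing or is not a Date never expire.
function expired(docs, field, expireAfterSeconds, now) {
  return docs.filter((d) =>
    d[field] instanceof Date &&
    now.getTime() - d[field].getTime() >= expireAfterSeconds * 1000);
}

const t0 = new Date(Date.UTC(2013, 10, 12, 12, 0, 0));
const samples = [
  { data: "hello world", inserted: t0 },
  { data: "no date field" },
];

const after30s = expired(samples, "inserted", 60, new Date(t0.getTime() + 30000));
const after90s = expired(samples, "inserted", 60, new Date(t0.getTime() + 90000));

console.log(after30s.length, after90s.length);
```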
  • 14. How Does MongoDB Do Joins?
      It doesn’t! It is your job to denormalize or do application-level joins. This includes thinking about storage.
  • 16. How is My Data Stored, Logically?
      MongoDB storage is very similar to MyISAM:
      • collection data (documents)
      • _id index
      • secondary index(es), etc.
  • 17. How is My Data Stored, Physically?
      But it does look different in the file system.
      MyISAM:  <db>/<table>.frm, <db>/<table>.MYD, <db>/<table>.MYI
      MongoDB: <db1>.ns, <db1>.0 .. <db1>.n, <db2>.ns, <db2>.0 .. <db2>.n
      • Start MongoDB with “--directoryperdb” to put files in per-database folders
      • Pro tip: do this to gain IOPS by database
  • 18. How Much Memory Does It Use? All of it!
  • 19. How Does MongoDB Manage Memory?
      • MyISAM
        – key_buffer_size determines index caching
        – data is cached in operating system buffers
      • InnoDB
        – innodb_buffer_pool_size determines index/data caching
      • MongoDB
        – memory-mapped files
        – mongod grows to consume available RAM
        – good: no knob
        – bad: operating system is in charge of cache
        – bad: available RAM may change over time
  • 20. How Will It Perform for My Workload?
      • It depends...
        – Determine your “working set”
          o The portion of your data that clients access most often
          o db.runCommand( { serverStatus: 1, workingSet: 1 } )
        – If working set <= RAM
          o Performance is generally very good
          o Be careful in high-concurrency write use cases
        – If working set > RAM
          o Likely I/O bound
          o Sharding to the rescue!
  • 21. How Can Schema Affect Working Set?
      • Field names are stored with the document
        – On disk and in memory
      • Plan ahead, especially for large collections
      BAD!
        { first_name: "Timothy", middle_initial: "M", last_name: "Callaghan", address_line_1: "555 Main Street", address_line_2: "Apt. 9" }
      GOOD!
        { fn: "Timothy", mi: "M", ln: "Callaghan", al1: "555 Main Street", al2: "Apt. 9" }
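The cost of long field names can be estimated directly from the BSON layout. The sketch below is a simplified calculator that handles only flat documents whose values are all strings; the function name `bsonSizeOfStringDoc` is made up for this example, and real collections add padding and index overhead that it ignores.

```javascript
// Size in bytes of a flat BSON document whose values are all strings.
// Layout per the BSON spec: int32 total length, elements, trailing 0x00.
// String element: type byte, field name as C string, int32 length,
// UTF-8 bytes, trailing 0x00.
function bsonSizeOfStringDoc(doc) {
  let size = 4 + 1; // document length prefix + terminator
  for (const [name, value] of Object.entries(doc)) {
    size += 1;                                    // element type byte
    size += Buffer.byteLength(name, "utf8") + 1;  // field name + NUL
    size += 4;                                    // string length prefix
    size += Buffer.byteLength(value, "utf8") + 1; // string bytes + NUL
  }
  return size;
}

const longNames = {
  first_name: "Timothy",
  middle_initial: "M",
  last_name: "Callaghan",
  address_line_1: "555 Main Street",
  address_line_2: "Apt. 9",
};
const shortNames = {
  fn: "Timothy",
  mi: "M",
  ln: "Callaghan",
  al1: "555 Main Street",
  al2: "Apt. 9",
};

const longSize = bsonSizeOfStringDoc(longNames);
const shortSize = bsonSizeOfStringDoc(shortNames);
const savedPerDoc = longSize - shortSize;

console.log(longSize, shortSize, savedPerDoc);
```

The saving per document equals the difference in total field-name length, so it scales linearly with the number of documents in the collection.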
  • 23. How Does the Query Optimizer Work?
      • MySQL
        – The optimizer finds usable indexes for the query
        – For each index, the optimizer asks the storage engine
          o What is the cardinality for the given keys?
          o What is the estimated cost?
        – The “best” plan is chosen and used for the query
      • This occurs for every single query
  • 24. How Does the Query Optimizer Work?
      • MongoDB
        – All candidate indexes run the query in parallel
          o “candidate” meaning it contains useful keys
        – As matching results are found, they are placed in a shared buffer
        – When one of the parallel runs completes, all others are stopped
        – This “plan” is used for future executions of the same query
          o Until the collection has 1,000 writes, mongod restarts, or there is an index change to the collection
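The race-then-cache behavior described on this slide can be modeled with a toy simulation. Everything here is an assumption made for illustration: the `racePlans` function, the `PlanCache` class, the index names, and the `workUnits` numbers are invented, and round-robin stepping stands in for running plans in parallel. It is a sketch of the idea, not mongod's optimizer.

```javascript
// Toy model of the plan race. Each candidate plan is described by how many
// work units it needs to finish. Illustration only.
function racePlans(candidates) {
  if (candidates.length === 0) return null;
  const progress = candidates.map(() => 0);
  for (;;) {
    // Advance every candidate one step, round-robin.
    for (let i = 0; i < candidates.length; i++) {
      progress[i] += 1;
      if (progress[i] >= candidates[i].workUnits) {
        return candidates[i].index; // first to finish wins; the others stop
      }
    }
  }
}

class PlanCache {
  constructor(writeLimit = 1000) {
    this.writeLimit = writeLimit;
    this.writes = 0;
    this.races = 0;
    this.plans = new Map(); // query shape -> winning index
  }
  choose(queryShape, candidates) {
    if (!this.plans.has(queryShape)) {
      this.plans.set(queryShape, racePlans(candidates));
      this.races += 1;
    }
    return this.plans.get(queryShape);
  }
  recordWrite() {
    this.writes += 1;
    if (this.writes >= this.writeLimit) this.invalidate();
  }
  invalidate() { // would also be called on restart or index change
    this.plans.clear();
    this.writes = 0;
  }
}

const cache = new PlanCache();
const candidates = [
  { index: "a_1", workUnits: 500 },
  { index: "a_1_b_1", workUnits: 40 },
];

const first = cache.choose("a,b", candidates);  // runs the race
const second = cache.choose("a,b", candidates); // served from the cache
for (let i = 0; i < 1000; i++) cache.recordWrite();
const third = cache.choose("a,b", candidates);  // cache was cleared, races again

console.log(first, second, third, cache.races);
```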
  • 25. A Simple Yet Elegant Solution?
      • No more wrestling with the optimizer
      • Hints are supported ($hint)
        – Force a particular index
        – http://docs.mongodb.org/manual/reference/operator/meta/hint/
      • Easier since MongoDB does not support joins
  • 27. MySQL Transactions and Isolation
      InnoDB creates an MVCC view of data, locks updated rows, and commits atomically:
        mysql> BEGIN;
        ...
        mysql> INSERT INTO sample(data) VALUES ("Hello world!");
        mysql> INSERT INTO sample(data) VALUES ("Goodbye world!");
        ...
        mysql> COMMIT;
      MyISAM locks the table and commits each row immediately.
  • 28. How Does MongoDB Implement Locking?
      # Update data ranges of documents to
      # show effects of database lock.
      @col.update(
        {key => {"$gte" => first.to_s, "$lt" => last.to_s}},
        {"$set" => {"data.x" => rand(@rows)}})    # Locks database
      Test: total requests/sec
      • Single thread updating single collection: 197
      • Two threads updating two collections, same DB: 80 + 80 = 160
      • Four threads updating two collections, same DB: 29 + 29 + 30 + 30 = 118
      • Two threads updating two collections, different DBs: 190 + 179 = 369
  • 29. How Does MongoDB Implement Isolation?
      • MongoDB does not prevent threads from seeing partially committed data
      • Example: index changes can result in a “double read” of data if a query uses an index while that index is changing
      • Experiment: construct a test to:
        – Select from a numeric index and count rows
        – Simultaneously update the index to shift lower values past the end of the previous high value
  • 30. How Does MongoDB Implement Isolation?
      # Select values.
      count = 0
      @col.find("k1" => {"$gte" => 120000}).each do |doc|
        count += 1
      end
      puts "Count=#{count}"

      # Run update to increase.
      @col.update(
        {"_id" => {"$exists" => true}},
        {"$inc" => {"k1" => increment}},
        {:multi => true})

      Output:
      Count=50000
      Count=50000
      Count=100000   <-- Index shifts over tail
      Count=50000
      Count=50000
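The double read in this experiment can be reproduced without a server by simulating an index scan that remembers only the last key it visited. The function names `indexScanCount` and `makeDocs` and the tiny data set are assumptions for illustration; the "index" here is just the collection re-sorted on every step, which is far simpler than a real B-tree cursor.

```javascript
// Toy model of an index scan that races with a multi-document update.
// Illustration only: the "index" is the collection re-sorted on each step.
function indexScanCount(docs, lowerBound, afterEachRead) {
  let last = null; // copy of the last index key visited: (k1, _id)
  let count = 0;
  for (;;) {
    const next = docs
      .filter((d) => d.k1 >= lowerBound)
      .filter((d) => last === null || d.k1 > last.k1 ||
                     (d.k1 === last.k1 && d._id > last._id))
      .sort((a, b) => a.k1 - b.k1 || a._id - b._id)[0];
    if (next === undefined) return count;
    count += 1;
    last = { k1: next.k1, _id: next._id };
    if (afterEachRead) afterEachRead(count);
  }
}

function makeDocs(n) {
  return Array.from({ length: n }, (_, i) => ({ _id: i, k1: i }));
}

// No concurrent writer: every document is counted once.
const quiet = indexScanCount(makeDocs(10), 0);

// A writer shifts every key past the old maximum halfway through the scan,
// so documents that were already read reappear ahead of the cursor.
const docs = makeDocs(10);
const racy = indexScanCount(docs, 0, (readSoFar) => {
  if (readSoFar === 5) {
    for (const d of docs) d.k1 += 10;
  }
});

console.log(quiet, racy);
```

The racy scan counts more entries than there are documents, which mirrors the Count=100000 line in the slide's output.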
  • 32. Review of MySQL Replication
      Master-master configuration for fast failover
      [Diagram: a master and a slave, each with a binlog and a relay log; annotated with "set global read_only=1;"]
  • 33. How Does MongoDB Set Up Replication?
      [Diagram: one PRIMARY and two SECONDARY nodes, connected by replication and heartbeat links]
  • 34. Where Is The Replica Set Defined?
      $ mongo localhost
      ...
      rs0:PRIMARY> rs.config()
      {
        "_id" : "rs0",
        "version" : 8,
        "members" : [
          { "_id" : 0, "host" : "mongodb1:27017" },
          { "_id" : 1, "host" : "mongodb2:27017" },
          { "_id" : 2, "host" : "mongodb3:27017" }
        ]
      }
  • 35. How Do Applications Connect?
      # Connect to MongoDB replica set.
      client = MongoReplicaSetClient.new(
        ['mongodb1', 'mongodb2', 'mongodb3'])

      # Access a collection and add data.
      db = client.db("xacts")
      col = db.collection("data")
      col.insert({"data" => "hello world"})
  • 36. How Do You Read From a Slave?
      # Connect to MongoDB replica set.
      client = MongoReplicaSetClient.new(
        ['mongodb1', 'mongodb2', 'mongodb3'],
        :slave_ok => true)

      # Access a collection and select documents.
      db = client.db("xacts")
      col = db.collection("data")
      col.find()
  • 37. Where’s the Binlog?
      Find the last document in the oplog:
      rs0:PRIMARY> use local
      rs0:PRIMARY> db.oplog.rs.find().
      ...            sort({ts:-1}).limit(1)
      {
        "ts" : Timestamp(1383980308, 1),
        "h" : NumberLong("9112507265624716453"),
        "v" : 2,
        "op" : "i",
        "ns" : "xacts.data",
        "o" : { "_id" : ObjectId("527ddd116244f28f4592f6a8"), "data" : "hello world!" }
      }
  • 38. How Do You Lock the DB to Back Up?
      (= FLUSH TABLES WITH READ LOCK)
      rs0:SECONDARY> db.fsyncLock()
      {
        "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
        "seeAlso" : "http://dochub.mongodb.org/core/fsynccommand",
        "ok" : 1
      }
      ... (tar or rsync data) ...
      (= UNLOCK TABLES)
      rs0:SECONDARY> db.fsyncUnlock()
      { "ok" : 1, "info" : "unlock completed" }
  • 39. How Do You Fail Over?
      • Planned failover: update rs.config and save:
          rs0:SECONDARY> cfg = rs.conf()
          rs0:SECONDARY> cfg.members[0].priority = 1
          rs0:SECONDARY> cfg.members[1].priority = 1
          rs0:SECONDARY> cfg.members[2].priority = 2
          rs0:SECONDARY> rs.reconfig(cfg)
      • Unplanned failover: kill or stop mongod
  • 41. How is Partitioning Like Sharding?
      • MySQL partitioning breaks a table into <n> tables
        – “PARTITION” is actually a storage engine
      • Tables can be partitioned by hash or range
        – Hash = random distribution
        – Range = user-controlled distribution (date range)
      • Helpful in “big data” use cases
      • Partitions can usually be dropped efficiently
        – Unlike "delete from table1 where timeField < '12/31/2012';"
  • 42. How Does Partitioning Help Queries?
      big_table is partitioned on dateCol by month: Aug-2013, Sep-2013, Oct-2013, Nov-2013.
        select * from big_table where column1 = 5;
        select * from big_table where dateCol = '10/12/2013';
  • 43. Can I Finally Scale My Workload Horizontally?
      • MySQL partitioning is helpful, but is still constrained to a single machine
      • MongoDB supports cross-server sharding
        – Huge plus: it’s “in the box”
        – MySQL Fabric is bringing something, we’ll see
        – Many other third-party MySQL options exist
      • Only shard the collections that require it
      • Each MongoDB shard is a replica set (1 primary and 1+ secondaries)
  • 44. What Does MongoDB Sharding Look Like?
      [Diagram: client app1 and client app2 connect through mongos1 ... mongosn to shard1, shard2 ... shardn; each shard has a master and a slave]
  • 45. How Does Sharding Help Queries?
      big_table is sharded on dateCol by month: Aug-2013 (shard1), Sep-2013 (shard2), Oct-2013 (shard3), Nov-2013 (shard4), and so on for further shards.
        select * from big_table where column1 = 5;
        select * from big_table where dateCol = '10/12/2013';
  • 46. How Do I Pick a Shard Key?
      • MongoDB shards on one or more fields
      • Simple example
        – “orders” collection (customerId and productId)
        – 1: shard on customerId
          o each order writes to a single shard
          o reads by customer on single shard
          o reads by product on entire cluster
        – 2: shard on productId
          o each order writes to several shards
          o reads by customer on entire cluster
          o reads by product on single shard
        – 3: store everything twice and shard both ways
          o worst case for writes
          o best case for reads (either is single shard)
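The trade-offs in this example come down to whether a query includes the shard key. The following sketch models that routing decision with made-up names (`SHARD_COUNT`, `shardFor`, `shardsTouched`) and a modulo function standing in for MongoDB's real chunk ranges, so treat it as an illustration of the reasoning rather than of how mongos works internally.

```javascript
// Toy router: which shards must a query visit? Illustration only.
const SHARD_COUNT = 4;

// Stand-in for real chunk ranges: map a numeric key value to a shard number.
const shardFor = (keyValue) => keyValue % SHARD_COUNT;

function shardsTouched(shardKey, query) {
  if (shardKey in query) {
    // The query names the shard key, so it can be sent to one shard.
    return [shardFor(query[shardKey])];
  }
  // Otherwise every shard has to be asked (scatter/gather).
  return Array.from({ length: SHARD_COUNT }, (_, i) => i);
}

// Option 1: collection sharded on customerId.
const byCustomer = shardsTouched("customerId", { customerId: 42 });
const byProduct = shardsTouched("customerId", { productId: 7 });

// Option 3: a second copy sharded on productId makes product reads targeted too,
// at the price of writing every order twice.
const byProductCopy = shardsTouched("productId", { productId: 7 });

console.log(byCustomer, byProduct, byProductCopy);
```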
  • 48. How Secure is It?
      • Basic username/password
      • By database
      • Roles
        – read = read any collection
        – readWrite = read/write any collection
        – dbAdmin = create index, create collection, rename collection, etc.
  • 49. What About Advanced Security?
      • Kerberos support in MongoDB Enterprise Edition
      • SSL is supported, but
        – Note: The default distribution of MongoDB does not contain support for SSL. To use SSL, you must either build MongoDB locally passing the “--ssl” option to scons or use MongoDB Enterprise.
  • 50. What Else is There to Learn?
      • Tools: mongostat, mongo[export/import], mongo[dump/restore]
      • Aggregation Framework
        – Think SQL aggregate functionality
      • Map/Reduce
  • 51. What Should You Do?
  • 52. Summary
      We liked...
      • Ease of install
      • Ability to just “jump in”
      Look [out] for...
      • Query language (Tim says hang in there!)
      • You have to think about storage and queries in advance
      Highly Recommended Reading
      • Karl Seguin’s “The Little MongoDB Book”
        – http://openmymind.net/mongodb.pdf
      • MongoDB’s “SQL to MongoDB Mapping Chart”
        – http://docs.mongodb.org/manual/reference/sql-comparison/
  • 53. Questions?
      Robert Hodges, CEO, Continuent: robert.hodges@continuent.com, @continuent
      Tim Callaghan, VP/Engineering, Tokutek: tim@tokutek.com, @tmcallaghan