IN106 Performance with MongoDB

MWLUG 2017
Moving Collaboration Forward
MongoDB Performance
Kim Greene
Kim Greene Consulting, Inc.
kim@kimgreene.com

About Me
{
"name": "Kim Greene",
"email": "kim@kimgreene.com",
"company": "Kim Greene Consulting, Inc.",
"website": "www.kimgreene.com",
"twitter": "@iSeriesDomino"
}

Agenda
• Why companies are turning to MongoDB
• Hardware
• Sharding
• Database choice
• Schema design
• Indexes

Why Customers are Turning to MongoDB

Inserting Data: MongoDB vs. MySQL
• Inserted 1,615 chemical compound records into two parent-child tables
• In MySQL, turned off foreign keys during the insert and used a string builder to create a bulk INSERT SQL statement

MongoDB vs. S3 Performance
• Downloading a 220 KB object from MongoDB was 7x faster than from S3 when cold and 3x faster when warm

Hardware

Hardware Requirements
• Can use anything from commodity hardware all the way up to IBM Power and zSeries
– Use multi-core systems when possible
• Ensure indexes and the most frequently accessed data (the working set) fit in RAM
• RAM is the most important hardware factor
• db.serverStatus()
– Use to obtain information about the current working set
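With the WiredTiger engine, the cache statistics inside the serverStatus output are one practical signal of whether the working set fits. A minimal sketch, assuming the command's result has already been fetched as a dict (for example with PyMongo's `client.admin.command("serverStatus")`); the helper names, the 0.8 threshold, and the sample numbers are illustrative assumptions, not MongoDB guidance.

```python
from typing import Any, Mapping


def cache_fill_ratio(server_status: Mapping[str, Any]) -> float:
    """Fraction of the configured WiredTiger cache that is currently in use.

    `server_status` is assumed to be the document returned by the serverStatus
    command, e.g. PyMongo's client.admin.command("serverStatus").
    """
    cache = server_status["wiredTiger"]["cache"]
    return cache["bytes currently in the cache"] / cache["maximum bytes configured"]


def working_set_may_not_fit(server_status: Mapping[str, Any], threshold: float = 0.8) -> bool:
    """Illustrative heuristic: a cache that stays near its limit hints that the
    working set may not fit in RAM. The 0.8 threshold is an assumption."""
    return cache_fill_ratio(server_status) >= threshold


# Hypothetical sample values, not real measurements.
sample = {"wiredTiger": {"cache": {
    "bytes currently in the cache": 3_400_000_000,
    "maximum bytes configured": 4_000_000_000,
}}}
print(f"{cache_fill_ratio(sample):.0%}")  # 85%
print(working_set_may_not_fit(sample))    # True
```

A single reading says little; sampling the ratio over time under normal load is what makes the signal meaningful.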

Hardware Requirements
• Data placement is key!
– Use SSDs for:
• Write-heavy data
• Placement of journals
• Compression
– Can reduce footprint by up to 80%
– Means less data read from disk
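A rough way to see why a smaller footprint means less data read from disk is to compress a batch of made-up JSON documents with Python's standard `zlib` module and compare sizes. This only illustrates the idea: WiredTiger compresses its own storage blocks, and real savings depend entirely on the data.

```python
import json
import zlib

# Hypothetical, repetitive documents of the kind many collections hold.
docs = [
    {"type": "order", "status": "shipped", "warehouse": "east-1", "qty": i % 5}
    for i in range(1_000)
]
raw = json.dumps(docs).encode("utf-8")
compressed = zlib.compress(raw, 6)  # level 6: zlib's usual speed/size trade-off

saved = 1 - len(compressed) / len(raw)
print(f"raw={len(raw)} B, compressed={len(compressed)} B, saved {saved:.0%}")
```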

Compression
• WiredTiger has native compression
• Compression options for documents and indexes
– Snappy
• Default; balances high document and journal compression ratios with low CPU overhead
– zlib
• Higher compression
• Additional CPU overhead
– Prefix
• Used by indexes by default; reduces index size by ~50%
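Prefix compression works because neighbouring keys in a sorted index often share leading characters. A simplified model of the idea (not WiredTiger's actual on-disk format): each key is stored as the length of the prefix it shares with the previous key plus the remaining suffix. The function names and sample keys are hypothetical.

```python
import os
from typing import List, Tuple


def prefix_compress(sorted_keys: List[str]) -> List[Tuple[int, str]]:
    """Encode each key as (length of prefix shared with previous key, suffix)."""
    encoded = []
    previous = ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([previous, key]))
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded


def prefix_decompress(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild the original keys from (shared length, suffix) pairs."""
    keys = []
    previous = ""
    for shared, suffix in encoded:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys


keys = sorted(["customer:1001", "customer:1002", "customer:1010", "customer:2000"])
print(prefix_compress(keys))
# [(0, 'customer:1001'), (12, '2'), (11, '10'), (9, '2000')]
```

The 52 characters of keys shrink to 20 characters of suffixes plus small integers, which is the same effect, in miniature, that makes indexes smaller.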

Compression
• Snappy and SSDs
– Use for frequently accessed data
• zlib and rotational disks
– Use for older, less frequently accessed data
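A collection can override the default block compressor when it is created, which is one way to apply the Snappy/zlib split above. A small sketch, assuming PyMongo, whose `Database.create_collection` passes extra keyword arguments such as `storageEngine` through to the create command. The `collection_options` helper and the "hot"/"cold" labels are hypothetical.

```python
from typing import Any, Dict


def collection_options(access: str) -> Dict[str, Any]:
    """Create-collection options for 'hot' or 'cold' data (hypothetical tiers).

    Raises KeyError for any other label.
    """
    compressor = {"hot": "snappy", "cold": "zlib"}[access]
    return {"storageEngine": {"wiredTiger": {"configString": f"block_compressor={compressor}"}}}


# With a running server this could be used as, for example:
#   db.create_collection("orders_archive", **collection_options("cold"))
print(collection_options("cold"))
```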

Sharding

Sharding
• Distributes portions of the data across multiple servers (shards)
• Use with
– Very large data sets
– High throughput demands
– Requirements for geographic placement of data

Sharding
• Distribute data across the cluster based on query patterns or data locality
• Types of sharding:
– Range
– Hash
– Zone

Sharding
• Range sharding
– Divides data into ranges based on shard key values
– Efficient queries when reading documents in a contiguous range
– A poorly chosen shard key can cause poor read and write performance (for example, a monotonically increasing key sends all inserts to one shard)
• Hash sharding
– More even data distribution
– Can hurt range-based queries, which may have to be sent to every shard
• Zone sharding
– Used to improve locality of data
• By geographic region
• By hardware configuration for tiered storage architectures
• By application feature
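To make the range-versus-hash trade-off concrete, here is a toy routing model of the three sharding types. The chunk boundaries, shard names, zones, and the MD5-based hash are all assumptions for illustration; MongoDB computes its own hash and manages chunks itself, but the effect on a contiguous range query is the same in kind.

```python
import bisect
import hashlib
from typing import Callable, Dict, Set

SHARDS = ["shardA", "shardB", "shardC", "shardD"]
# Hypothetical chunk boundaries for range sharding on a numeric shard key:
# keys < 1000 -> shardA, < 2000 -> shardB, < 3000 -> shardC, the rest -> shardD.
RANGE_BOUNDS = [1_000, 2_000, 3_000]
# Hypothetical zones pinning each region's data to a shard located there.
ZONES: Dict[str, str] = {"EU": "shardA", "US": "shardC"}


def range_shard(key: int) -> str:
    """Range sharding: neighbouring key values land in the same chunk and shard."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, key)]


def hash_shard(key: int) -> str:
    """Hash sharding (toy MD5-based hash, not MongoDB's): neighbours are scattered."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def zone_shard(region: str) -> str:
    """Zone sharding: route by a locality attribute such as region."""
    return ZONES[region]


def shards_holding(lo: int, hi: int, router: Callable[[int], str]) -> Set[str]:
    """Shards that hold documents with lo <= key < hi under the given scheme."""
    return {router(key) for key in range(lo, hi)}


print(shards_holding(100, 200, range_shard))  # {'shardA'}
print(len(shards_holding(100, 200, hash_shard)))
```

The model only counts where matching documents live. With a hashed shard key, the query router cannot target a range query at all and sends it to every shard, so the real cost is at least what the count suggests.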

Database Choice

4 Types of Databases (Storage Engines)
• WiredTiger
– The default and most commonly used storage engine
• Encrypted
– For highly sensitive data
• In-memory
– For performance-critical data
• MMAPv1
– Improved version of the storage engine used in earlier versions of MongoDB

In-Memory Database
• Doesn’t maintain any on-disk data, including
configuration data, indexes, user credentials,
etc.
• The entire database must fit into memory
– Knowing the true “working set” is key

Schema Design

Schema Design
• Schema design is critical
– Most performance problems are caused by poor schema design
• RDBMS schema design
– What answers do I have?
• MongoDB schema design
– What questions will I have?

Schema Design
• Key items of focus
– How will the data be accessed?
– What is the projected read-to-write ratio?
– How large will documents become?
• Structure data to match how it is queried and updated

Schema Design
• Basic schema designs
– Embedding
– Referencing
– Denormalization

Embedding
• To embed or not to embed
– Favor embedding unless there is a compelling
reason not to
– If an object needs to be accessed frequently on its own, it’s best not to embed it

Embedding
• Use when all of the data is manipulated together
• Use when the relationship between the entities is one-to-one
• When applicable, normally reduces the latency of get requests by 50%
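As an illustration of a one-to-one relationship that is read and written together, here plain Python dicts stand in for documents. The `users`/`profiles` collections and their fields are hypothetical; the point is the number of lookups each design needs.

```python
from typing import Any, Dict

# Referenced design (hypothetical collections): showing a user takes two lookups.
users = {1: {"_id": 1, "name": "Ada", "profile_id": 10}}
profiles = {10: {"_id": 10, "bio": "Engineer", "timezone": "UTC"}}


def get_user_referenced(user_id: int) -> Dict[str, Any]:
    user = users[user_id]                   # first read
    profile = profiles[user["profile_id"]]  # second read: another round trip in a real database
    return {**user, "profile": profile}


# Embedded design: the one-to-one profile lives inside the user document.
users_embedded = {
    1: {"_id": 1, "name": "Ada", "profile": {"bio": "Engineer", "timezone": "UTC"}},
}


def get_user_embedded(user_id: int) -> Dict[str, Any]:
    return users_embedded[user_id]          # a single read returns everything


print(get_user_embedded(1)["profile"]["bio"])  # Engineer
```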

Referencing
• Link to other documents when:
– One-to-many relationships
– Parts of the data need to be accessed stand-alone
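A sketch of referencing in a one-to-many case where the "many" side is also useful on its own (a hypothetical moderation queue of comments). Each comment stores the `_id` of its post; with PyMongo the lookups would be `find` calls, replaced here by list comprehensions over in-memory lists. Collection and field names are assumptions.

```python
from typing import Any, Dict, List

# Hypothetical collections: each comment references its post by _id.
posts = [{"_id": 1, "title": "Indexing tips"}]
comments = [
    {"_id": 100, "post_id": 1, "text": "Great post", "flagged": False},
    {"_id": 101, "post_id": 1, "text": "Buy now!", "flagged": True},
]


def comments_for_post(post_id: int) -> List[Dict[str, Any]]:
    """One-to-many: follow the reference from the 'many' side, like find({"post_id": ...})."""
    return [c for c in comments if c["post_id"] == post_id]


def flagged_comments() -> List[Dict[str, Any]]:
    """Stand-alone access: comments are queried without loading their posts."""
    return [c for c in comments if c["flagged"]]
```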

Denormalizing
• Read/write ratio is key when deciding whether to denormalize
– Fields that are mostly read and rarely updated are good candidates
– If a field is updated frequently, don’t denormalize it

Denormalization
• Avoids having to perform application-level joins for denormalized fields
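A sketch of what denormalization saves and what it costs: order line items carry a copy of the product name, which is read on every order display but rarely changes, so showing an order needs no application-level join. The collections, fields, and helper names are hypothetical.

```python
from typing import Any, Dict, List

products: Dict[str, Dict[str, Any]] = {"p1": {"_id": "p1", "name": "Widget", "stock": 42}}

# Normalized: line items only reference the product.
order_normalized = {"_id": "o1", "items": [{"product_id": "p1", "qty": 2}]}
# Denormalized: the rarely changed product name is copied into each line item.
order_denormalized = {"_id": "o1", "items": [{"product_id": "p1", "name": "Widget", "qty": 2}]}


def item_names_normalized(order: Dict[str, Any]) -> List[str]:
    # Application-level join: one extra product lookup per line item.
    return [products[item["product_id"]]["name"] for item in order["items"]]


def item_names_denormalized(order: Dict[str, Any]) -> List[str]:
    # No join: the name is already in the order document.
    return [item["name"] for item in order["items"]]


def rename_product(product_id: str, new_name: str, orders: List[Dict[str, Any]]) -> None:
    # The price of denormalizing: every embedded copy has to be updated as well.
    products[product_id]["name"] = new_name
    for order in orders:
        for item in order["items"]:
            if item["product_id"] == product_id and "name" in item:
                item["name"] = new_name
```

The more often `rename_product` runs relative to reads, the less the denormalized copy pays off, which is the read/write trade-off the slides describe.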

Denormalization
• Consider the write/read ratio when
denormalizing
– A field that will mostly be read and only seldom
updated is a good candidate for denormalization
– As updates become more frequent relative to
queries, the savings from denormalization
decrease

Back to Embedding
• Embed computed information when you write the document
– Avoids retrieving and recomputing it over and over on every read
– Works well if writes are infrequent
– Pushes work to the application at write time; the result is dramatically improved read time
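A sketch of embedding computed information at write time: each new rating updates a running count and sum stored in the product document, so a read gets the average without fetching every rating. The document shape is hypothetical; against a live server the write would typically be a single update using the `$inc` operator, which this pure-Python version imitates.

```python
from typing import Any, Dict

product: Dict[str, Any] = {"_id": "p1", "name": "Widget", "rating_count": 0, "rating_sum": 0}


def add_rating(doc: Dict[str, Any], stars: int) -> None:
    """Write path: do the extra work once, when the rating arrives.

    With PyMongo this is roughly
    update_one({"_id": ...}, {"$inc": {"rating_count": 1, "rating_sum": stars}}).
    """
    doc["rating_count"] += 1
    doc["rating_sum"] += stars


def average_rating(doc: Dict[str, Any]) -> float:
    """Read path: no need to fetch and aggregate every individual rating."""
    return doc["rating_sum"] / doc["rating_count"] if doc["rating_count"] else 0.0


for stars in (5, 4, 3):
    add_rating(product, stars)
print(average_rating(product))  # 4.0
```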

Back to Embedding
• What to look for when choosing referencing vs
embedding data in a document
– Things that don’t change often and aren’t read
often are best stored in a separate document
– Parent document contains a reference to the less
frequently accessed/updated document

Schema Design
• The MongoDB data schema design of choice
depends – entirely – on your particular
application’s data access patterns
• Structure your data to match the ways that
your application queries and updates it

Indexes

Indexes
• ½ of all performance issues are due to missing
or incorrect secondary indexes
• Index early
• Index often

Types of Secondary Indexes
• Unique
• Compound
• Array
• Time to Live (TTL)
• Geospatial
• Partial
• Sparse
• Text search

• Unique
– Rejects any insert or update that would duplicate an existing value of the indexed field
• Compound
– Useful for queries that specify multiple predicates
• Example: Find customers based on last name, first
name, and city of residence
– Can reduce the need for single-field indexes, since queries on any leading prefix of the compound index's fields can use it
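The leading-prefix idea can be made concrete with a simplified model: an index on (last_name, first_name, city) can narrow a search on last_name alone, on last_name plus first_name, or on all three, but not on first_name or city without last_name. The helper and field names are hypothetical, and the real query planner is more nuanced than this check.

```python
from typing import List, Sequence, Set


def usable_prefix(index_fields: Sequence[str], query_fields: Set[str]) -> List[str]:
    """Leading index fields constrained by the query, stopping at the first one it omits.

    Simplified model: an empty result means the index's leading field is not
    in the query, so the index cannot be used to narrow the search.
    """
    prefix: List[str] = []
    for field in index_fields:
        if field not in query_fields:
            break
        prefix.append(field)
    return prefix


# Mirrors an index such as (PyMongo form, for reference):
#   db.customers.create_index([("last_name", 1), ("first_name", 1), ("city", 1)])
index = ["last_name", "first_name", "city"]
print(usable_prefix(index, {"last_name"}))                # ['last_name']
print(usable_prefix(index, {"last_name", "first_name"}))  # ['last_name', 'first_name']
print(usable_prefix(index, {"first_name", "city"}))       # []
```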

• Array
– For fields that contain an array, each array value is
stored as a separate index entry
• Time to Live (TTL)
– Specify a period of time after which the data is
automatically deleted from the database
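Two small models of these index types. The first shows how an array (multikey) index turns one document into several index entries. The second shows the expiry rule a TTL index applies: a document becomes eligible for deletion once its date field plus `expireAfterSeconds` is in the past. Field and collection names are hypothetical, and on a real server a background task removes expired documents periodically rather than instantly.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Tuple


def multikey_entries(docs: List[Dict[str, Any]], field: str) -> List[Tuple[Any, Any]]:
    """Model of an array (multikey) index: one (value, _id) entry per array element."""
    entries = []
    for doc in docs:
        values = doc[field] if isinstance(doc[field], list) else [doc[field]]
        entries.extend((value, doc["_id"]) for value in values)
    return sorted(entries)


def is_expired(doc: Dict[str, Any], date_field: str,
               expire_after_seconds: int, now: datetime) -> bool:
    """Model of the TTL rule: eligible for deletion once date + expireAfterSeconds has passed.

    A real TTL index could be declared in PyMongo with, for example,
    db.sessions.create_index("created_at", expireAfterSeconds=3600).
    """
    return doc[date_field] + timedelta(seconds=expire_after_seconds) < now


posts = [{"_id": 1, "tags": ["db", "perf"]}, {"_id": 2, "tags": ["db"]}]
print(multikey_entries(posts, "tags"))  # [('db', 1), ('db', 2), ('perf', 1)]

now = datetime(2017, 7, 20, 12, 0, tzinfo=timezone.utc)
old_session = {"created_at": now - timedelta(hours=2)}
print(is_expired(old_session, "created_at", 3600, now))  # True
```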

• Geospatial
– Allow MongoDB to optimize queries for
documents that contain points or a polygon that
are closest to a given point or line; that are within
a circle, rectangle, or polygon; or that intersect
with a circle, rectangle, or polygon
• Partial
– Use to include only documents that meet specific
conditions
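A minimal model of a partial index: only documents that satisfy the filter condition get an index entry, which keeps the index small. Per the MongoDB documentation, the server uses such an index only for queries whose filter guarantees the partial filter condition. The helper and the sample data are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple


def partial_index_entries(docs: List[Dict[str, Any]], field: str,
                          condition: Callable[[Dict[str, Any]], bool]) -> List[Tuple[Any, Any]]:
    """Model of a partial index: only documents passing `condition` get an entry."""
    return sorted((doc[field], doc["_id"]) for doc in docs if condition(doc))


orders = [
    {"_id": 1, "customer": "acme", "status": "open"},
    {"_id": 2, "customer": "globex", "status": "closed"},
    {"_id": 3, "customer": "initech", "status": "open"},
]
# Declared in PyMongo roughly as:
#   db.orders.create_index("customer", partialFilterExpression={"status": "open"})
print(partial_index_entries(orders, "customer", lambda d: d["status"] == "open"))
# [('acme', 1), ('initech', 3)]
```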

• Sparse
– Contain entries only for documents that have the indexed field
– Allow for smaller, more efficient indexes when
fields are not present in all documents
• Text search
– Specialized index for text search that uses
advanced, language-specific linguistic rules for
stemming, tokenization, case sensitivity and stop
words
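To show why a sparse index is smaller, here is a model comparing it with a regular index on a field that most documents lack. A regular index stores an entry for every document (a missing field is indexed as null, `None` here); a sparse index skips documents without the field. The sample data and helper names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def regular_index_entries(docs: List[Dict[str, Any]], field: str) -> List[Tuple[Any, Any]]:
    """Every document gets an entry; a missing field is indexed as null (None here)."""
    return [(doc.get(field), doc["_id"]) for doc in docs]


def sparse_index_entries(docs: List[Dict[str, Any]], field: str) -> List[Tuple[Any, Any]]:
    """Documents that do not contain the field at all are skipped."""
    return [(doc[field], doc["_id"]) for doc in docs if field in doc]


users = [
    {"_id": 1, "email": "a@example.com"},
    {"_id": 2},
    {"_id": 3},
    {"_id": 4, "email": "d@example.com"},
]
# PyMongo form, for reference: db.users.create_index("email", sparse=True)
print(len(regular_index_entries(users, "email")))  # 4
print(len(sparse_index_entries(users, "email")))   # 2
```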

Indexing Tidbits
• Query optimizer
– Selects the best index by periodically running candidate query plans and comparing them
• Index intersection
– Allows MongoDB to use more than one index to
optimize ad-hoc queries at run-time
• Covered queries
– Return results containing only indexed fields
– Very efficient, results returned without reading from
source documents
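A simplified check for whether a query can be covered. It follows the documented rule of thumb: every field in the filter and in the projection must be in the index, and `_id` must be excluded from the projection unless the index contains it. Other restrictions (such as array fields) are ignored, only inclusion projections are modelled, and `is_covered` is a hypothetical helper, not a driver API.

```python
from typing import Dict, Iterable


def is_covered(index_fields: Iterable[str], filter_fields: Iterable[str],
               projection: Dict[str, int]) -> bool:
    """True if the filter and the (inclusion) projection only touch indexed fields."""
    indexed = set(index_fields)
    returned = {field for field, include in projection.items() if include}
    # _id is returned by default unless the projection explicitly excludes it.
    if projection.get("_id", 1):
        returned.add("_id")
    return set(filter_fields) <= indexed and returned <= indexed


index = ["last_name", "first_name"]
print(is_covered(index, ["last_name"], {"first_name": 1, "_id": 0}))  # True
print(is_covered(index, ["last_name"], {"first_name": 1}))            # False: _id is returned
print(is_covered(index, ["last_name"], {"city": 1, "_id": 0}))        # False: city is not indexed
```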

Aggregation Pipeline
• Replaces find in certain scenarios
• Improves performance significantly
– Moves processing from the client side to the
server
– Saves CPU and bandwidth
• Reduces the amount of data transmitted to the application layer
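For comparison, here is the same report computed both ways. The pipeline is the list of stages PyMongo's `Collection.aggregate` accepts, using the standard `$match`, `$group`, and `$sum` operators. The collection and field names are hypothetical. The client-side function shows the work a plain `find` leaves to the application: every matching document crosses the network before being summed.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Server-side version: filtering and grouping happen inside MongoDB, and only one
# small document per customer is returned. With a live server and PyMongo:
#   results = list(db.orders.aggregate(PIPELINE))
PIPELINE: List[Dict[str, Any]] = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
]


def client_side_totals(shipped_orders: List[Dict[str, Any]]) -> Dict[str, float]:
    """Client-side version: after find({"status": "shipped"}) every matching order
    has been transferred, and the application does the grouping itself."""
    totals: Dict[str, float] = defaultdict(float)
    for order in shipped_orders:
        totals[order["customer"]] += order["amount"]
    return dict(totals)


shipped = [
    {"customer": "acme", "status": "shipped", "amount": 10.0},
    {"customer": "acme", "status": "shipped", "amount": 5.5},
    {"customer": "globex", "status": "shipped", "amount": 7.0},
]
print(client_side_totals(shipped))  # {'acme': 15.5, 'globex': 7.0}
```

Placing `$match` first lets the server discard non-matching documents before grouping, so less data flows through the rest of the pipeline.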

Where to Find More Information

Where to Find More Information
• MongoDB University
– university.mongodb.com
• YouTube tutorials
– youtube.com/mongodb
• MongoDB Performance Best Practices white paper
– mongodb.com/collateral/mongodb-performance-best-practices
