Indexes are data structures that store a subset of a collection's data in a form that is efficient to search. MongoDB stores indexes in a B-tree format. Index types include simple (single-field), compound, multikey, full-text, geospatial, and hashed indexes. Indexes improve performance by making retrieval, sorting, and filtering of documents efficient. Indexes are created with the createIndex() method, and their usage can be checked with explain plans.
2. What we'll cover
• What are indexes?
• Types
• Properties
• Why use indexes?
• How to create indexes
• Commands to check indexes and plans
4. What are indexes?
Indexes are special data structures that store a subset of
your data in an easily traversable format.
MongoDB stores indexes in a B-tree format, which allows for
efficient access to the index content.
Proper index use makes a system run optimally. Improper
index use can bring a system to a grinding halt.
5. What are indexes?
If there were an index on Origin, it would be stored in a format
similar to the following:
[ABE] -> 0xa193b48c
[ABE] -> 0x8e8b242a
[ABE] -> 0x0928cdc1
…
[DEN] -> 0x24aa4ecd
[DEN] -> 0x87396a3c
[DEN] -> 0x9392ab2f
…
[LAX] -> 0x89ccede0
…
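The layout above can be modelled with a short JavaScript sketch. It is not MongoDB's on-disk format. A plain sorted array stands in for the B-tree, and the record ids (r1, r2, and so on) are hypothetical stand-ins for the pointers shown above. Because the entries are kept sorted by key, finding every DEN entry takes two binary searches instead of a pass over every document.

```javascript
// Sketch only: a sorted array of [key, pointer] entries stands in for a B-tree.
// The record ids ("r1", "r2", ...) are hypothetical stand-ins for pointers.

// One entry per document; the field name is not stored with each entry,
// because every entry in this index is for the same field.
function buildIndex(docs, field) {
  const entries = docs.map((doc) => [doc[field], doc.recordId]);
  // Array.prototype.sort is stable, so equal keys keep insertion order.
  entries.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  return entries;
}

// Position of the first entry whose key is >= target (or > target if strict).
function bound(entries, target, strict) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    const key = entries[mid][0];
    if (key < target || (strict && key === target)) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// All pointers stored under one key, found without visiting other keys.
function lookup(entries, key) {
  return entries
    .slice(bound(entries, key, false), bound(entries, key, true))
    .map(([, pointer]) => pointer);
}

const flights = [
  { Origin: "DEN", recordId: "r1" },
  { Origin: "ABE", recordId: "r2" },
  { Origin: "LAX", recordId: "r3" },
  { Origin: "DEN", recordId: "r4" },
];
const originIndex = buildIndex(flights, "Origin");
console.log(lookup(originIndex, "DEN")); // [ 'r1', 'r4' ]
```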
7. The _id index
⢠The _id index is automatically created and cannot be removed.
⢠This is the same as a primary key in traditional RDBMS.
⢠Default value is a 12-byte ObjectId:
⢠4-byte time stamp
⢠3-byte machine id
⢠2-byte process id
⢠3-byte counter
8. Simple index
⢠A simple index is an index on a single key
⢠This is similar to a bookâs index where you look
up a word to find the pages itâs referenced on.
9. Compound index
⢠A compound index is created over two or more
fields in a document
⢠This is similar to a phone book where you can
find the phone number of a person given their
first and last names.
10. Multikey index
⢠A multikey index is an index thatâs created on a
field that contains an array.
⢠If using in a compound index, only a single field
in a given document can be an array.
⢠You will get one entry in the index for every item
in the array for the given document. This means
if you have an array with 100 items, that
document will have 100 index entries.
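The one-entry-per-item rule can be shown with a small JavaScript sketch. For each document it emits one [key, _id] entry per array element, or a single entry for a scalar. The tags field and the sample documents are hypothetical. The sketch does not model edge cases such as empty arrays or repeated elements.

```javascript
// Sketch: the entries a multikey index on one field would hold.
// Edge cases (empty arrays, repeated elements, nested arrays) are not modelled.
function multikeyEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    // A missing field is indexed as null; a scalar gives a single key.
    const keys = Array.isArray(value) ? value : [value === undefined ? null : value];
    for (const key of keys) entries.push([key, doc._id]);
  }
  return entries;
}

const docs = [
  { _id: 1, tags: ["red", "green", "blue"] },
  { _id: 2, tags: "red" },
  { _id: 3 },
];
console.log(multikeyEntries(docs, "tags").length); // 5
```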
11. Full-text index
⢠This is an index over a text based field, similar to
how Google indexes web pages.
12. Geo-spatial index
⢠A geo-spatial index will allow you to determine
distance from a given point.
⢠Works on both planar and spherical geometries.
13. Hashed indexes
⢠A hashed index is used in hash based sharding,
and allows for a more randomized distribution.
⢠Hashed indexes cannot contain compound keys
or be unique.
⢠Hashed indexes can contain the key in both a
hashed and non-hashed version. The non-
hashed version will allow for range based
queries.
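The JavaScript sketch below shows why hashing randomizes distribution. Consecutive keys produce unrelated hash values, so they land on different (hypothetical) shards. The original ordering is lost, which is also why a hashed index cannot serve range queries. FNV-1a is used here only as a stand-in; MongoDB uses its own 64-bit hash function.

```javascript
// Sketch: 32-bit FNV-1a as a stand-in hash (MongoDB uses a different 64-bit hash).
function fnv1a(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    // XOR in the next character, then multiply by the FNV prime modulo 2^32.
    h = Math.imul(h ^ str.charCodeAt(i), 0x01000193) >>> 0;
  }
  return h;
}

// Assign a key to one of `shardCount` hypothetical shards by its hash.
function shardFor(key, shardCount) {
  return fnv1a(String(key)) % shardCount;
}

// How many of the given keys land on each shard.
function distribution(keys, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (const key of keys) counts[shardFor(key, shardCount)]++;
  return counts;
}

// Monotonically increasing keys (like order numbers) still spread across shards.
const orderNumbers = Array.from({ length: 1000 }, (_, i) => i);
console.log(distribution(orderNumbers, 4));
```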
15. Unique
⢠The unique property allows for only a single
value for the indexed field, or combination of
fields for a compound index
db.collection.createIndex({âemailâ: 1}, {âuniqueâ:
true})
⢠A unique index can only have a single null or
missing field value for all documents in the
collection.
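The null/missing rule follows from how a missing field is indexed. It is stored as null, so a second document without the field collides with the first. Below is a JavaScript sketch of a single-field unique index over scalar values; the class and the sample documents are hypothetical.

```javascript
// Sketch of a unique single-field index over scalar values.
// A missing field is indexed as null, so null and "missing" share one slot.
class UniqueIndex {
  constructor(field) {
    this.field = field;
    this.keys = new Map(); // indexed value -> _id of the document holding it
  }

  insert(doc) {
    const value = doc[this.field];
    const key = value === undefined ? null : value;
    if (this.keys.has(key)) {
      throw new Error(`duplicate key for ${this.field}: ${JSON.stringify(key)}`);
    }
    this.keys.set(key, doc._id);
  }
}

const emailIndex = new UniqueIndex("email");
emailIndex.insert({ _id: 1, email: "ann@example.com" });
emailIndex.insert({ _id: 2 }); // no email: indexed as null
```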
16. Sparse
⢠The sparse property allows you to index only
documents that contain a value for the given
field.
db.collection.createIndex({âkidsâ: 1}, {âsparseâ: true})
⢠A sparse index will not be used if it would result in
an incomplete result set, unless specifically
hinted.
db.collection.find({âkidsâ: {â$gteâ: 5})
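A sparse index can give an incomplete result because it has no entry for documents that lack the field. The JavaScript sketch below builds the list of documents a sparse index on kids would contain. It then shows which ones a query over the whole collection would miss. The sample documents are hypothetical.

```javascript
// Sketch: a sparse index only has entries for documents where the field exists.
// A field set to null still exists, so it is indexed; a missing field is not.
function sparseIndexIds(docs, field) {
  return docs
    .filter((doc) => Object.prototype.hasOwnProperty.call(doc, field))
    .map((doc) => doc._id);
}

const people = [
  { _id: 1, kids: 2 },
  { _id: 2, kids: null },
  { _id: 3 }, // no "kids" field: not in the sparse index
];

const indexed = new Set(sparseIndexIds(people, "kids"));
// A query like find().sort({"kids": 1}) must return every document,
// so answering it from this index alone would silently drop these:
const missed = people.filter((doc) => !indexed.has(doc._id)).map((doc) => doc._id);
console.log(missed); // [ 3 ]
```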
17. TTL
⢠The TTL property allows for the automatic removal of
documents after a given time period.
db.collection.createIndex({âaccessTimeâ: 1}, {âexpireAfterSecondsâ:
â1200â})
⢠The indexed field should contain an ISODate() value. If
any other type is used the document will not be removed.
⢠The TTL removal process runs once every 60 seconds so
you might see the document even though the time has
expired.
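The expiry rules above can be sketched in JavaScript as a single pass of the background TTL monitor. The function name and the sample data are hypothetical. The sketch also follows MongoDB's documented handling of an array of dates, where the earliest date counts.

```javascript
// Sketch of one TTL monitor pass; MongoDB runs its monitor about every 60 seconds,
// so a document can outlive its expiry by up to roughly that long.
function expiredIds(docs, field, expireAfterSeconds, now) {
  return docs
    .filter((doc) => {
      const value = doc[field];
      // Only Date values count; for an array, the earliest Date is used.
      const dates = (Array.isArray(value) ? value : [value]).filter((v) => v instanceof Date);
      if (dates.length === 0) return false; // any other type never expires
      const earliest = Math.min(...dates.map((d) => d.getTime()));
      return earliest + expireAfterSeconds * 1000 <= now.getTime();
    })
    .map((doc) => doc._id);
}

const sessions = [
  { _id: 1, accessTime: new Date("2024-01-01T00:00:00Z") },
  { _id: 2, accessTime: new Date("2024-01-01T00:25:00Z") },
  { _id: 3, accessTime: "2024-01-01T00:00:00Z" }, // a string, not a Date
];
console.log(expiredIds(sessions, "accessTime", 1200, new Date("2024-01-01T00:30:00Z"))); // [ 1 ]
```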
18. Partial
⢠The partial property allows you to index a subset
of your data.
db.collection.createIndex({âmovieâ: 1, âreviewsâ: 1},
{âratingâ: {â$gteâ: 4}})
⢠The index will not be used if it would provide an
incomplete result set (similar to the sparse
index).
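A query can use the partial index above only if every document it can match is guaranteed to be in the index. The JavaScript sketch below is deliberately narrow. It handles only a single $gte bound on rating, and the real query planner handles far more shapes.

```javascript
// Sketch: can a query with { rating: { $gte: queryMin } } use an index built with
// partialFilterExpression { rating: { $gte: filterMin } }?
// Only a single $gte bound is modelled; the real planner is far more general.
function partialIndexUsable(filterMin, queryMin) {
  // No rating condition: the query may match unindexed documents.
  if (queryMin === undefined) return false;
  // Usable only if the query's matches are a subset of the indexed documents.
  return queryMin >= filterMin;
}

console.log(partialIndexUsable(4, 5)); // true: rating >= 5 is within rating >= 4
console.log(partialIndexUsable(4, 3)); // false: rating 3 documents are not indexed
```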
20. Why use indexes?
⢠Efficiently retrieving document matches
⢠Equality matching
⢠Inequality or range matching
⢠Sorting
⢠Lack of a usable index will cause MongoDB to
scan the entire collection.
22. Before creating indexes
⢠Think about the queries you will be running and try to
create as few indexes as possible to support those
queries. Similar query patterns could use the same
(or very similar) indexes.
⢠Think about the data that you will query and put your
highly selective fields first in the index if possible.
⢠Check your current indexes before creating new
ones. MongoDB will allow you to create indexes with
the same fields in different orders.
23. Simple indexes
⢠When creating a simple index, the sort order,
ascending (1) or descending (-1), of the values
doesnât matter as much as MongoDB can walk
the index forwards and backwards.
⢠Simple index creation:
db.flights.createIndex({âOriginâ: 1})
24. Compound indexes
⢠When creating a compound index, the sort order, ascending (1) or
descending (-1), of the values starts to matter, especially if the index is used
to sort on multiple keys.
⢠When creating compound indexes you want to add keys to the index in the
following key order:
⢠Equality matches
⢠Sort fields
⢠Inequality matches
⢠A compound index will also help any queries that are made based off the
left most subset of keys.
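The leftmost-prefix rule can be expressed as a short JavaScript check. It is a simplification. It only asks whether the queried fields form a leftmost prefix of the index keys, in which case the index can bound the scan on all of them. A query on a non-prefix set, such as Origin and ArrDelay here, may still use the index, but only the leading Origin key narrows the scan. The index and field names are illustrative.

```javascript
// Sketch: do the queried fields form a leftmost prefix of a compound index?
// The order of fields inside the query does not matter, only which fields appear.
function isLeftmostPrefix(indexKeys, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = indexKeys.slice(0, wanted.size);
  return prefix.length === wanted.size && prefix.every((key) => wanted.has(key));
}

const indexKeys = ["Origin", "Dest", "ArrDelay"];
console.log(isLeftmostPrefix(indexKeys, ["Origin"]));         // true
console.log(isLeftmostPrefix(indexKeys, ["Dest", "Origin"])); // true
console.log(isLeftmostPrefix(indexKeys, ["Dest"]));           // false
```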
26. Compound indexes
⢠An index created as follows:
db.flights.createIndex({âOriginâ: 1, âDestâ: -1})
Could be used with either of the following queries as well
since MongoDB can walk the index either way:
db.flights.find().sort({âOriginâ: 1, âDestâ: -1})
db.flights.find().sort({âOriginâ: -1, âDestâ: 1})
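The forwards/backwards rule can be checked mechanically. In the JavaScript sketch below, an index can serve a sort if the sort fields match a leftmost prefix of the index keys. In addition, either every direction must match the index or every direction must be flipped. The sketch ignores the case where equality filters on leading fields let a sort start further into the index.

```javascript
// Sketch: can an index key pattern serve a sort spec by walking it one way?
// Relies on object key order, which follows insertion order for these string keys.
function indexSupportsSort(indexSpec, sortSpec) {
  const index = Object.entries(indexSpec);
  const sort = Object.entries(sortSpec);
  if (sort.length > index.length) return false;
  // Sort fields must be a leftmost prefix of the index, in the same order.
  if (!sort.every(([field], i) => index[i][0] === field)) return false;
  const forwards = sort.every(([, dir], i) => index[i][1] === dir);
  const backwards = sort.every(([, dir], i) => index[i][1] === -dir);
  return forwards || backwards;
}

const flightsIndex = { Origin: 1, Dest: -1 };
console.log(indexSupportsSort(flightsIndex, { Origin: 1, Dest: -1 })); // true (forwards)
console.log(indexSupportsSort(flightsIndex, { Origin: -1, Dest: 1 })); // true (backwards)
console.log(indexSupportsSort(flightsIndex, { Origin: 1, Dest: 1 }));  // false
```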
27. Full-text indexes
⢠Full-text index creation:
⢠db.messages.createIndex({âbodyâ: âtextâ})
⢠To search using the index finding any of the words:
db.messages.find({â$textâ: {â$searchâ: âsome textâ}})
⢠To search using the index finding a phrase
db.message.find({â$textâ: {â$searchâ: ââsome textââ}}
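A tiny inverted index in JavaScript can illustrate the difference between an any-word search and a phrase search. This is a hypothetical sketch, not MongoDB's text index. It does no stemming, stop-word removal, or relevance scoring, and its phrase check is a simple case-insensitive substring test.

```javascript
// Split text into lowercase word tokens (no stemming or stop words).
function tokenize(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// Inverted index: term -> Set of _id values of documents containing it.
function buildTextIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const term of tokenize(doc.body)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(doc._id);
    }
  }
  return index;
}

// Like {"$search": "some text"}: documents containing ANY of the words.
function searchAny(index, query) {
  const ids = new Set();
  for (const term of tokenize(query)) {
    for (const id of index.get(term) ?? []) ids.add(id);
  }
  return [...ids].sort((a, b) => a - b);
}

// Like {"$search": "\"some text\""}: narrow candidates with the index,
// then keep only documents that contain the exact phrase.
function searchPhrase(index, docs, phrase) {
  const terms = tokenize(phrase);
  return docs
    .filter((doc) => terms.every((term) => index.get(term)?.has(doc._id)))
    .filter((doc) => doc.body.toLowerCase().includes(phrase.toLowerCase()))
    .map((doc) => doc._id);
}

const messages = [
  { _id: 1, body: "Here is some text" },
  { _id: 2, body: "Text comes before some" },
  { _id: 3, body: "Nothing to see" },
];
const textIndex = buildTextIndex(messages);
console.log(searchAny(textIndex, "some text"));              // [ 1, 2 ]
console.log(searchPhrase(textIndex, messages, "some text")); // [ 1 ]
```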
28. Covering indexes
⢠Covering indexes are indexes that will answer a
query without going back to the data. For example:
db.flights.createIndex({âOriginâ: 1, âDestâ: 1, âArrDelayâ:
1, âUniqueCarrierâ: 1})
⢠The following query would be covered as all fields
are in the index:
db.flights.find({âOriginâ: âDENâ, âDestâ: âJFKâ},
{âUniqueCarrierâ: 1, âArrDelayâ: 1, â_idâ:
0}).sort({âArrDelayâ: -1})
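The covering conditions can be written as a small JavaScript check, under simplifying assumptions. Every field the query filters on, returns, or sorts on must be an index key. _id must be excluded unless it is itself in the index. Other restrictions, for example multikey indexes, are not modelled.

```javascript
// Sketch: would this find() be covered by the given index key pattern?
// Multikey indexes and other real-world restrictions are not modelled.
function isCovered(indexSpec, filter, projection, sort = {}) {
  const indexed = new Set(Object.keys(indexSpec));
  const included = Object.keys(projection).filter((field) => projection[field]);
  // An empty or exclusion-only projection returns whole documents.
  if (included.length === 0) return false;
  // _id is returned unless the projection explicitly excludes it.
  const returnsId = projection._id !== 0 && projection._id !== false;
  if (returnsId && !indexed.has("_id")) return false;
  const used = [...Object.keys(filter), ...included, ...Object.keys(sort)];
  return used.every((field) => indexed.has(field));
}

const flightsIndex = { Origin: 1, Dest: 1, ArrDelay: 1, UniqueCarrier: 1 };
const denToJfk = { Origin: "DEN", Dest: "JFK" };
console.log(isCovered(flightsIndex, denToJfk, { UniqueCarrier: 1, ArrDelay: 1, _id: 0 }, { ArrDelay: -1 })); // true
console.log(isCovered(flightsIndex, denToJfk, { UniqueCarrier: 1, ArrDelay: 1 }, { ArrDelay: -1 }));         // false: _id returned
```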
29. Indexing nested fields/documents
• Let's say you have documents with nested documents in them, like the
following:
db.locations.findOne()
{
  "_id": ObjectId(…),
  …,
  "location": {
    "state": "Colorado",
    "city": "Lyons"
  }
}
31. Indexing nested fields/documents
• You can also index embedded documents:
db.locations.createIndex({"location": 1})
• When you query on the embedded document as a whole, the query must match it
exactly (same fields, in the same order). That means this will return the
document:
db.locations.find({"location": {"state": "Colorado", "city": "Lyons"}})
• But this won't:
db.locations.find({"location": {"city": "Lyons", "state": "Colorado"}})
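Matching an embedded document as a whole compares the complete document, field order included. The JavaScript sketch below uses JSON.stringify as a stand-in for that comparison, because it also preserves the insertion order of string keys. This only holds for plain objects like these and is not how MongoDB compares BSON internally.

```javascript
// Sketch: whole-document equality that is sensitive to field order,
// approximated with JSON.stringify (which keeps string keys in insertion order).
function embeddedEquals(stored, queried) {
  return JSON.stringify(stored) === JSON.stringify(queried);
}

const stored = { state: "Colorado", city: "Lyons" };
console.log(embeddedEquals(stored, { state: "Colorado", city: "Lyons" })); // true
console.log(embeddedEquals(stored, { city: "Lyons", state: "Colorado" })); // false
```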
33. Indexing arrays
⢠You can index fields that contain arrays as well.
⢠Compound indexes however can only have a single field that is an array in a given document. If
a document has two indexed fields that are arrays, you will get an error.
db.arrtest.createIndex({âaâ: 1, âbâ: 1})
db.arrtest.insert({"b": [1,2,3], "a": [1,2,3]})
cannot index parallel arrays [b] [a]
WriteResult({
"nInserted": 0,
"writeError": {
"code": 10088,
"errmsg": "cannot index parallel arrays [b] [a]"
}
})
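The rule behind that error can be sketched in JavaScript. For a compound index, count how many of the indexed fields hold arrays in a document, and reject the document if more than one does. Otherwise, the document gets one index entry per element of its single array field, or one entry if there is none. The function name and the error text are illustrative, not MongoDB's.

```javascript
// Sketch: how many compound-index entries a document needs, or an error
// if more than one indexed field is an array ("parallel arrays").
// Empty arrays are not modelled.
function compoundEntryCount(doc, indexFields) {
  const arrayFields = indexFields.filter((field) => Array.isArray(doc[field]));
  if (arrayFields.length > 1) {
    throw new Error(`parallel arrays are not allowed: ${arrayFields.join(", ")}`);
  }
  // One entry per element of the single array field, or one entry otherwise.
  return arrayFields.length === 1 ? doc[arrayFields[0]].length : 1;
}

console.log(compoundEntryCount({ a: [1, 2, 3], b: 5 }, ["a", "b"])); // 3
```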
34. Index Intersection
⢠Index intersection is when MongoDB uses two or more
indexes to satisfy a query.
⢠Given the following two indexes:
db.orders.createIndex({âqtyâ: 1})
db.orders.createIndex({âitemâ: 1})
⢠Index intersection means a query such as the following
could in theory use both indexes in parallel with the results
being merged together to satisfy the query:
db.orders.find({âitemâ: âABC123â, âqtyâ: {â$gteâ: 15}})
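Conceptually, an intersection scans each index for its own predicate and keeps only the record ids that both scans return. In explain output this shows up as an AND_SORTED or AND_HASH stage. Below is a JavaScript sketch with hypothetical orders, where simple filters stand in for the two index scans.

```javascript
// Sketch of index intersection: each "index scan" answers one predicate,
// then only the ids found by both scans are kept.
const orders = [
  { _id: 1, item: "ABC123", qty: 10 },
  { _id: 2, item: "ABC123", qty: 20 },
  { _id: 3, item: "XYZ789", qty: 30 },
  { _id: 4, item: "ABC123", qty: 15 },
];

// Stand-in for a scan of the {"item": 1} index: equality on item.
const itemIds = orders.filter((o) => o.item === "ABC123").map((o) => o._id);

// Stand-in for a scan of the {"qty": 1} index: qty >= 15, in qty order.
const qtyIds = orders
  .filter((o) => o.qty >= 15)
  .sort((a, b) => a.qty - b.qty)
  .map((o) => o._id);

// Hash-style intersection: keep ids from the first scan that the second also found.
function intersect(idsA, idsB) {
  const inB = new Set(idsB);
  return idsA.filter((id) => inB.has(id));
}

console.log(intersect(itemIds, qtyIds)); // [ 2, 4 ]
```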
35. Removing indexes
⢠The command to remove indexes is similar to the
one to create the index.
db.flights.dropIndex({âOriginâ: 1, âDestâ: -1})
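dropIndex() also accepts the index name. When you don't pass a name to createIndex(), MongoDB generates one by joining each field and its direction or type with underscores. The index above is therefore named Origin_1_Dest_-1, and db.flights.dropIndex("Origin_1_Dest_-1") would also drop it. A JavaScript sketch of that naming convention:

```javascript
// Sketch: build the default index name from a key pattern,
// e.g. {Origin: 1, Dest: -1} -> "Origin_1_Dest_-1".
function defaultIndexName(keyPattern) {
  return Object.entries(keyPattern)
    .map(([field, type]) => `${field}_${type}`)
    .join("_");
}

console.log(defaultIndexName({ Origin: 1, Dest: -1 })); // Origin_1_Dest_-1
console.log(defaultIndexName({ body: "text" }));         // body_text
```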
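37. View all indexes in a database
• On older servers, you can view all indexes in a database with the
following command:
db.system.indexes.find()
• For each index you'll see the fields the index was
created with, the name of the index, and the
namespace (db.collection) that the index was
built on.
• system.indexes was deprecated in MongoDB 3.0 and removed in 4.2. On
current versions, loop over the collections instead:
db.getCollectionNames().forEach(function (name) { printjson(db.getCollection(name).getIndexes()); })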
38. View indexes for a given collection
• To view all indexes for a given collection, use the
following command:
db.collection.getIndexes()
• This returns the same information as the
previous command, but is limited to the given
collection.
39. View index sizes
⢠To view the size of all indexes in a collection:
db.collection.stats()
⢠You will see the size of all indexes and the size
of each individual index in the results. The sizes
are in bytes.
40. How to see if an index is used
• To see whether an index is used, append the
.explain() method to your query:
db.flights.find({"Origin": "DEN"}).explain()
• explain() has three levels of verbosity:
  • queryPlanner - the default; returns the winning query plan
  • executionStats - adds execution stats for the winning plan
  • allPlansExecution - adds stats for the other candidate plans
41. Notes on indexes.
⢠When creating an index you need to know your
data and the queries that will run against it.
⢠Donât build indexes in isolation!
⢠While indexes can improve performance, be
careful to not over index as every index gets
updated every time you write to the collection.
43. End Notes
⢠User group discounts
⢠Manning publications: www.manning.com
⢠Code âug367â to save 36% off order
⢠APress publications: www.appress.com
⢠Code âUserGroupâ to save 10% off order
⢠OâReilly publication: www.oreilly.com
⢠Still waiting to get information
45. End Notes
⢠MongoDB World
⢠When: June 28th and 29th
⢠Where: NYC
⢠Save 25% by using code âDDuncanâ
Editor's Notes
An index does not need to store the field names, because every entry is for the same field(s). Each value is followed by a pointer back to the document's location in the data files.
The _id index is the primary key. The default value is a 12-byte ObjectId whose first 4 bytes are a timestamp of when the ObjectId was created (normally when the document was inserted into the collection). On older versions this is followed by a 3-byte machine id and a 2-byte process id (current versions use a 5-byte random value instead), then a 3-byte counter starting at a random value. You can override the default with your own values, as long as they are unique. The _id index is created automatically and cannot be removed.
A simple index is similar to a book's index, where you look up a word and find page numbers.
A compound index is similar to a phone book, where you can find a person's phone number if you know their first and last names.
Multikey indexes are indexes over fields that hold arrays. There is an entry for each item in the array, and in a compound index at most one indexed field per document can be an array.
Full-text indexes are similar to what Google does when searching for words on a website.
Geo-spatial indexes let you query by map proximity, similar to Google Maps finding restaurants around a location. Use a 2d index for planar geometry and a 2dsphere index for spherical geometry.
Hashed indexes are used in hash-based sharding, which allows for a more random distribution. They only support equality searches; for range queries on the same field, also create a regular index on it (db.coll.createIndex({"field": "hashed"}) plus db.coll.createIndex({"field": 1})). A hashed index cannot be unique, and before MongoDB 4.4 it could not be part of a compound index.
Unique indexes can store each value of the indexed key only once (or each combination of values for a compound index). Only a single document may have a null for the indexed field or be missing the field entirely; you cannot have one of each, because a missing field is indexed as null. db.coll.createIndex({"a": 1}, {"unique": true}).
Sparse indexes only index the documents in which the field exists. Missing fields are not indexed, but fields whose value is null are. db.coll.createIndex({"a": 1}, {"sparse": true}). Use db.coll.find().hint({"a": 1}) to see what the index contains. In 2.6 and earlier, a sparse index could be chosen even when it gave incomplete results, so queries could return incorrect data.
TTL indexes automatically remove documents from a collection after a given time. The indexed field needs to be a Date (or an array of Dates). db.coll.createIndex({"field1": 1}, {"expireAfterSeconds": 300}). If the indexed field holds any other type of value, the document never expires.
Partial indexes let you add a filter to the index so that only matching documents are indexed. This gives a smaller storage footprint than a regular index over the same field, and partial indexes should be preferred over sparse indexes. db.coll.createIndex({"a": 1}, {"partialFilterExpression": {"a": {"$gt": 5}}}). MongoDB will not use this index if it would return an incomplete result set: the query must contain the filter expression, or a modified version of it that matches a subset of the documents covered by the index.
I've never had a case where index intersection worked, at least not when running explain() on the query.