This document discusses indexing in MySQL databases to improve query performance. It begins by defining an index as a data structure that speeds up data retrieval from databases. It then covers various types of indexes like primary keys, unique indexes, and different indexing algorithms like B-Tree, hash, and full text. The document discusses when to create indexes, such as on columns frequently used in queries like WHERE clauses. It also covers multi-column indexes, partial indexes, and indexes to support sorting, joining tables, and avoiding full table scans. The concepts of cardinality and selectivity are introduced. The document concludes with a discussion of index overhead and using EXPLAIN to view query execution plans and index usage.
2. Jehad Keriaki 2014
What is an Index
A data structure that improves the speed of data retrieval from databases.
MySQL: Indexing for Better Performance2
Why Would We Use Indexes
Speed, Speed, and Speed
Constraints (Uniqueness)
IO Optimization
MAX, MIN
Sorting, Grouping
Index Types
Primary Key (PK), Unique, Key
Primary Key vs Unique
A Unique index allows NULL values; a Primary Key does not
InnoDB tables are clustered on the PK
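To make the PK vs Unique difference concrete, here is a minimal sketch. The `account` table and its columns are hypothetical and do not come from the slides.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE account (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
  email VARCHAR(255) NULL,
  PRIMARY KEY (id),            -- unique and implicitly NOT NULL
  UNIQUE KEY uq_email (email)  -- unique, but NULL is allowed
) ENGINE=InnoDB;

-- Accepted: a unique index permits several NULLs.
INSERT INTO account (email) VALUES (NULL), (NULL);

INSERT INTO account (email) VALUES ('a@example.com');
-- Rejected with a duplicate-entry error: the value already exists.
INSERT INTO account (email) VALUES ('a@example.com');
```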
Types (Algorithm)
B-Tree, R-Tree, Hash, Full text
R-Tree: Geo-spatial
Hash: MEMORY engine only; fast for equality lookups; the whole key is used; no range scans
Full-text:
For MyISAM, and as of 5.6 for InnoDB too.
SELECT * FROM tbl WHERE MATCH(description) AGAINST ('toshiba');
(tbl: any table with a FULLTEXT index on description)
Things to know: boolean mode, query expansion, stop words, minimum word length, the 50% rule
A better choice would be to use a search server like Sphinx
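The full-text options listed above could look like this in practice. This is a sketch only: the `product` table is a hypothetical example, and the search terms are made up.

```sql
-- Hypothetical table; FULLTEXT on InnoDB needs MySQL 5.6 or later.
CREATE TABLE product (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  description TEXT,
  FULLTEXT KEY ft_description (description)
) ENGINE=InnoDB;

-- Natural-language search (the default mode).
SELECT id FROM product
WHERE MATCH(description) AGAINST ('toshiba');

-- Boolean mode: require one word, exclude another.
SELECT id FROM product
WHERE MATCH(description) AGAINST ('+toshiba -refurbished' IN BOOLEAN MODE);

-- Query expansion: a second pass adds words from the top matches.
SELECT id FROM product
WHERE MATCH(description) AGAINST ('toshiba' WITH QUERY EXPANSION);
```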
Types (Algorithm) [cont'd]
B-Tree:
For comparison operations (<, >, =, etc.)
Range (BETWEEN)
LIKE, which is a special case of range when the pattern ends with % (e.g. 'abc%')
It is the DEFAULT in MySQL
In B-Tree, data are stored in the leaf nodes
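As a rough illustration of the operations a B-Tree index supports, the queries below assume the `employee` table used elsewhere in these slides, with separate indexes on `last_name` and `hiring_date` (the indexes are an assumption).

```sql
-- Equality and comparison: can use the B-Tree index on last_name.
SELECT * FROM employee WHERE last_name = 'tran';

-- Range: can use the B-Tree index on hiring_date.
SELECT * FROM employee
WHERE hiring_date BETWEEN '2014-01-01' AND '2014-06-30';

-- LIKE with a trailing %: treated as a range over the prefix 'tr'.
SELECT * FROM employee WHERE last_name LIKE 'tr%';

-- LIKE with a leading %: no prefix to seek on, so no index range scan.
SELECT * FROM employee WHERE last_name LIKE '%an';
```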
Types (Structure)
One column
Multi-Column [composite]
Partial [prefix]
Any one of them can be a "Covering Index", except 'partial'
What Indexes to Create?
A PK is a must
Best as an unsigned auto-increment integer [smallest int type that fits]
PK and InnoDB (Clustered)
InnoDB tables are clustered based on PKs
Each secondary index has the PK in it. Example:
INDEX(name) is in fact (name, id)
AVOID long PKs. Why? Every secondary index stores a copy of the PK
AVOID md5(), uuid(), etc. as PK values: they are long and arrive in random order
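A minimal sketch of a table that follows this advice. The column list and sizes are assumptions for illustration, not the author's actual schema.

```sql
-- Illustrative schema; column sizes are assumptions.
CREATE TABLE employee (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- short, ever-increasing PK
  first_name VARCHAR(50) NOT NULL,
  last_name  VARCHAR(50) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_last_name (last_name)  -- InnoDB stores this as (last_name, id)
) ENGINE=InnoDB;

-- Because id rides along in idx_last_name, this query can be answered
-- from the secondary index without touching the clustered rows.
SELECT id FROM employee WHERE last_name = 'tran';
```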
MyISAM and InnoDB
In MyISAM:
Index entry tells the physical offset of the row in the data file
In InnoDB:
PK index has the data. Secondary indexes store the PK as a pointer.
A key on field F is (F, PK): good for sorting and covering indexes
Cardinality and Selectivity
Cardinality: Number of distinct values
Selectivity: Cardinality / total number of rows
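What values are better: selectivity close to 1 (nearly unique) filters best
Statistics are refreshed by ANALYZE TABLE (OPTIMIZE TABLE also updates them)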
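The two definitions can be measured directly. The sketch below assumes the `employee` table and its `department` column from the later slides.

```sql
-- Cardinality and selectivity of one column, computed exactly.
SELECT COUNT(DISTINCT department)            AS cardinality,
       COUNT(*)                              AS total_rows,
       COUNT(DISTINCT department) / COUNT(*) AS selectivity
FROM employee;

-- The optimizer works from estimates; the Cardinality column here
-- shows what it currently believes for each index.
SHOW INDEX FROM employee;

-- Refresh those estimates after large data changes.
ANALYZE TABLE employee;
```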
One Column Index
This index is on one column only
Query example:
SELECT * FROM employee WHERE first_name LIKE 'stephane';
Index solution:
ALTER TABLE employee ADD INDEX (first_name);
Notes:
For long char/varchar/text fields, index only the first n characters
Do not apply a function to the indexed column, e.g.
WHERE md5(field)='1bc29b36f623ba82aaf6724fd3b16718'
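The same problem shows up with date functions. As a sketch, assuming an index on `employee.hiring_date` (the index is an assumption), a function call on the column can often be rewritten as a plain range:

```sql
-- The function hides the column from the index: every row is evaluated.
SELECT * FROM employee
WHERE YEAR(hiring_date) = 2014;

-- Equivalent condition on the bare column: can use the index as a range.
SELECT * FROM employee
WHERE hiring_date >= '2014-01-01'
  AND hiring_date <  '2015-01-01';
```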
Multi Column Index
What is it:
Index that involves more than one column.
Higher-cardinality field goes first, with exceptions.
The 'leftmost prefix' rule: INDEX (A, B, C) can serve conditions on A, on (A, B), and on (A, B, C).
Query example:
SELECT * FROM employee
WHERE department = 5 AND last_name LIKE 'tran';
Index solution:
ALTER TABLE employee ADD INDEX (last_name, department);
{Why not (department, last_name)? last_name has the higher cardinality.}
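The leftmost rule for INDEX (A, B, C) can be sketched as follows. The table `t` and the index name are hypothetical placeholders.

```sql
-- Hypothetical table t with integer columns A, B, C.
ALTER TABLE t ADD INDEX idx_abc (A, B, C);

-- These can seek in idx_abc: each condition set starts at column A.
SELECT * FROM t WHERE A = 1;
SELECT * FROM t WHERE A = 1 AND B = 2;
SELECT * FROM t WHERE A = 1 AND B = 2 AND C = 3;

-- Only the A part is used for seeking; C is not adjacent to A.
SELECT * FROM t WHERE A = 1 AND C = 3;

-- No seek possible: the leftmost column A is missing.
SELECT * FROM t WHERE B = 2;
SELECT * FROM t WHERE B = 2 AND C = 3;
```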
Multi Column Index [Cont’d]
Query example:
SELECT * FROM employee WHERE department = 5 and
hiring_date>='2014-01-01';
Index solution:
ALTER TABLE employee ADD INDEX (department, hiring_date);
Notes
Should it be (hiring_date, department)? No: this is an exception, the equality column goes before the range column
Order of columns IS important
WILL NOT USE THE INDEX (hiring_date is not the leftmost column):
SELECT * FROM employee WHERE hiring_date>='2014-01-01';
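One way to check the column-order point is to compare plans with EXPLAIN. This sketch assumes the `employee` table from the slides; the index name is made up.

```sql
ALTER TABLE employee ADD INDEX idx_dept_hired (department, hiring_date);

-- Can seek on both parts: equality on department, then a range on hiring_date.
EXPLAIN SELECT * FROM employee
WHERE department = 5 AND hiring_date >= '2014-01-01';

-- Cannot seek with idx_dept_hired: hiring_date is not the leftmost column.
EXPLAIN SELECT * FROM employee
WHERE hiring_date >= '2014-01-01';
```

In the first plan the `key` column would normally name `idx_dept_hired`; in the second it would not, unless some other index on `hiring_date` exists.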
Partial Index
What is it: Index on the first n char of a field.
Query example:
email: varchar(255);
SELECT * FROM users WHERE email like 'richardmelo@yahoo.com';
Index solution
ALTER TABLE users ADD INDEX (email(12));
vs
ALTER TABLE users ADD INDEX (email);
Notes:
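Saves space, cheaper writes, and nearly the same lookup performance if the prefix is selective enough
Check how many distinct values a prefix keeps: SELECT COUNT(DISTINCT(LEFT(field, 20))) FROM table
Rough target: a prefix that keeps about 85-90% of the full column's distinct values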
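A sketch of how the prefix length might be chosen, using the `users.email` column from this slide. The candidate lengths (8, 12, 20) and the index name are arbitrary choices for illustration.

```sql
-- Compare the selectivity of several prefix lengths with the full column.
SELECT COUNT(DISTINCT email)           / COUNT(*) AS full_selectivity,
       COUNT(DISTINCT LEFT(email, 8))  / COUNT(*) AS prefix_8,
       COUNT(DISTINCT LEFT(email, 12)) / COUNT(*) AS prefix_12,
       COUNT(DISTINCT LEFT(email, 20)) / COUNT(*) AS prefix_20
FROM users;

-- Pick the shortest prefix whose selectivity is close enough to the
-- full column's, then index only that many characters.
ALTER TABLE users ADD INDEX idx_email_prefix (email(12));
```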
Joins and Indexes
Linking two or more tables to get related rows
Query example:
SELECT employee.first_name, employee.last_name
FROM department
INNER JOIN employee ON department.id = employee.department
WHERE department.location='MTL';
Index solution:
ALTER TABLE department ADD INDEX (location);
ALTER TABLE employee ADD INDEX (department);
Notes: The join column on department need not be indexed for this query,
but an index must exist on the employee side (employee.department)
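To see how the two indexes are used, the join can be run through EXPLAIN. This is a sketch: the exact plan depends on data volume and statistics.

```sql
EXPLAIN
SELECT employee.first_name, employee.last_name
FROM department
INNER JOIN employee ON department.id = employee.department
WHERE department.location = 'MTL';

-- What to look for, one output row per table:
--   department: key = the index on location (filters to 'MTL' first)
--   employee:   key = the index on department (looked up once per
--               matching department row)
```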
Multiple Indexes OR Multi-Col Index
What is it:
Multiple indexes: ALTER TABLE tbl ADD INDEX(field1), ADD INDEX(field2);
Multi-col index: ALTER TABLE tbl ADD INDEX(field1, field2);
(tbl is a placeholder table name)
Query example:
WHERE field1=1 OR field2=2 [multiple indexes]
WHERE field1=1 AND field2=2 [multi-col index]
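The OR vs AND distinction can be sketched like this. The table `t` and the index names are hypothetical; whether the optimizer actually picks an index merge depends on the data.

```sql
-- Two single-column indexes: suited to OR conditions.
ALTER TABLE t ADD INDEX idx_f1 (field1), ADD INDEX idx_f2 (field2);

-- May show type = index_merge with "Using union(idx_f1,idx_f2)" in Extra.
EXPLAIN SELECT * FROM t WHERE field1 = 1 OR field2 = 2;

-- One composite index: suited to AND conditions on both columns.
ALTER TABLE t ADD INDEX idx_f1_f2 (field1, field2);

-- Can seek directly to the (1, 2) entries in idx_f1_f2.
EXPLAIN SELECT * FROM t WHERE field1 = 1 AND field2 = 2;
```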
Covering Index
When the index contains every column the query needs, there is no need
to read the table's rows!
Example:
employee(id, first_name, last_name, email, phone, hiring_date)
SELECT email FROM employee WHERE phone='123456789';
ALTER TABLE employee ADD INDEX(phone, email);
min(), max() functions use the index only.
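A covering index is usually visible in the EXPLAIN output. The sketch below uses the `employee` table from this slide; the index names and the separate index on `hiring_date` are assumptions.

```sql
ALTER TABLE employee ADD INDEX idx_phone_email (phone, email);

-- Covered: both phone and email live in the index.
-- Extra typically shows "Using index".
EXPLAIN SELECT email FROM employee WHERE phone = '123456789';

-- Not covered: first_name is not in the index, so rows must be read.
EXPLAIN SELECT email, first_name FROM employee WHERE phone = '123456789';

-- With an index on hiring_date, MAX() can be read from the end of the index.
-- Extra typically shows "Select tables optimized away".
ALTER TABLE employee ADD INDEX idx_hiring_date (hiring_date);
EXPLAIN SELECT MAX(hiring_date) FROM employee;
```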
Covering Index - Note
Only in InnoDB:
INDEX myindex(col1, col2)
SELECT col1 FROM table1 WHERE col2 = 200;  -- will use the index (scanned as a covering index)
SELECT * FROM table1 WHERE col2 = 200;     -- will NOT use the index
ICP (Index Condition Pushdown) [5.6]
Lets the storage engine check WHERE conditions against the index entries
before reading the table's rows.
employee(id, first_name, last_name, department, phone, email, address)
INDEX(department, email)
SELECT * FROM employee
WHERE department=5
AND email LIKE '%@beta.example%'
[and address LIKE '%montreal%'];
Without ICP, the index is used for department only, and the email
condition is checked against every row fetched from the table. With ICP,
the email condition is checked in the index first, and the row is fetched
from the table only if it matches.
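ICP can be observed, and switched off for comparison, through `optimizer_switch`. This sketch assumes MySQL 5.6 or later and the `employee` table from this slide; the index name is made up.

```sql
ALTER TABLE employee ADD INDEX idx_dept_email (department, email);

-- With ICP active, Extra typically shows "Using index condition".
EXPLAIN SELECT * FROM employee
WHERE department = 5
  AND email LIKE '%@beta.example%';

-- Turn ICP off for this session to compare.
SET optimizer_switch = 'index_condition_pushdown=off';

-- Now the email filter runs on fetched rows: Extra shows "Using where".
EXPLAIN SELECT * FROM employee
WHERE department = 5
  AND email LIKE '%@beta.example%';

-- Restore the default.
SET optimizer_switch = 'index_condition_pushdown=on';
```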
Using Index for Sorting
ORDER BY x (index on x)
WHERE x = const ORDER BY y (index on x, y)
WHERE x ORDER BY x DESC, y DESC (index on x, y)
WHERE x ORDER BY x ASC, y DESC (Can't use index: mixed directions, unless a MySQL 8.0+ descending index matches)
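The sorting cases above, written out as full queries. The table `t`, its columns, and the index name are hypothetical; a LIMIT is added because the optimizer may otherwise prefer a table scan plus sort.

```sql
-- Hypothetical table t with an index on (x, y).
ALTER TABLE t ADD INDEX idx_xy (x, y);

-- Rows come out of idx_xy already ordered by x.
EXPLAIN SELECT * FROM t ORDER BY x LIMIT 10;

-- x is fixed, so entries within x = 5 are already ordered by y.
EXPLAIN SELECT * FROM t WHERE x = 5 ORDER BY y LIMIT 10;

-- Same direction on both columns: the index is read backwards.
EXPLAIN SELECT * FROM t ORDER BY x DESC, y DESC LIMIT 10;

-- Mixed directions: Extra is expected to show "Using filesort".
EXPLAIN SELECT * FROM t ORDER BY x ASC, y DESC LIMIT 10;
```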
Exceptions
E.g. a date column indexed together with a lower-cardinality column: when the date is a range condition, the lower-cardinality column goes first.
Status or Gender special cases
Overhead of indexing
IO: Each DML operation will modify the indexes
Disk space
More indexes => Higher possibility of deadlock
ABOUT EXPLAIN
It lets us know the plan of query execution
What index would be used, if any
Rows to be scanned
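A minimal EXPLAIN example, reusing a query from the multi-column slide. The comments name the standard output columns; the values you see depend on your data and indexes.

```sql
EXPLAIN
SELECT * FROM employee
WHERE department = 5 AND hiring_date >= '2014-01-01';

-- Columns worth reading first:
--   type           access method (ALL = full table scan; ref, range = index access)
--   possible_keys  indexes the optimizer considered
--   key            index actually chosen (NULL = none)
--   key_len        how many bytes of the index are used
--   rows           estimated number of rows to examine
--   Extra          notes such as "Using index", "Using where", "Using filesort"
```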