2. MySQL Timeline
Ownership: MySQL AB until 2008; Sun Microsystems 2008-2010; Oracle since 2010.

Version 3.23 (2001)
- Added replication
- Added full-text indexing for MyISAM

Version 4.0 (2003)
- Query cache
- InnoDB became a standard part of the server
- Replication was rewritten to use two threads
- Support for UNION and multi-table DELETE statements

Version 4.1 (2005)
- Added subqueries
- Added INSERT ... ON DUPLICATE KEY UPDATE
- Support for the UTF-8 character set
- New binary protocol and prepared statement support

Version 5.0 (2006)
- Introduced views, triggers, and stored procedures
- ISAM engine was removed completely

Version 5.1 (2008)
- Introduced partitioning
- Added row-based replication
- Added a variety of plugin APIs
- Support for the InnoDB plugin
- BerkeleyDB storage engine was removed

Version 6.0 (2008) - canceled

Version 5.5 (2010)
- Optimized performance, scalability, and replication
- InnoDB became the default storage engine
- Added the PERFORMANCE_SCHEMA database
- APIs for replication and authentication

Version 5.6 (2013)
- Improved query optimizer
- More plugin APIs
- Implemented GTID-based replication
- Delayed replication (delayed slave)

Version 5.7 (2015)
- Multi-source replication
- Added JSON support
- Improved InnoDB scalability

Version 8.0 (2018)
- SQL window functions
- Extended JSON syntax
- Default character set is now utf8mb4
- Better at read/write workloads
- SQL roles
MySQL
3. MySQL Architecture
Client connectors: JDBC, ODBC, PHP, Python, .NET, Perl, Ruby, Native C API, ...

MySQL Server:
- Connection pool: connection handling, thread reuse, authentication, security, connection limits, caches
- Services and utilities: backup & recovery, administration, security, replication, cluster, migration & metadata, monitoring
- SQL interface: DML, DDL, stored procedures, triggers, views
- Parser: query translation, syntactic analyzer, lexical/semantic analysis, code generation, object privileges
- Optimizer: rewriting, statistics, order of scanning, index usage
- Caches: global and engine-specific caches and buffers
- Storage engines: InnoDB, MyISAM, Memory, Archive, Federated, CSV, Merge, Blackhole
- File system (NTFS, ext4, SAN, NAS): files and logs - binary, error, slow, general, redo, undo, data, index
4. Transactions
ACID - an acronym standing for atomicity, consistency, isolation, and durability. These properties are all desirable in a database system, and
are all closely tied to the notion of a transaction. The transactional features of InnoDB adhere to the ACID principles.
Transactions are atomic units of work that can be committed or rolled back. When a transaction makes multiple changes to the database,
either all the changes succeed when the transaction is committed, or all the changes are undone when the transaction is rolled back.
A banking application is the classic example of why transactions are necessary, for example transfer money between two accounts:
START TRANSACTION;
UPDATE accounts SET balance = balance - 100.00 WHERE customer_id = 1;
UPDATE accounts SET balance = balance + 100.00 WHERE customer_id = 2;
COMMIT;
The entire operation should be wrapped in a transaction so that if any one of the steps fails, any completed steps can be rolled back.
You start a transaction with the START TRANSACTION statement and then either make its changes permanent with COMMIT or discard the changes with ROLLBACK.
What happens if the database server crashes while performing line 3? The customer probably just lost $100. And what if another process
comes along between lines 2 and 3 and removes the entire checking account balance? The bank has given the customer a $100 credit
without even knowing it.
Transactions aren't enough unless the system passes the ACID test.
A: atomicity - transaction must function as a single indivisible unit of work so that the entire transaction is either applied or rolled back. When
transactions are atomic, there is no such thing as a partially completed transaction: it's all or nothing.
C: consistency - the database should always move from one consistent state to the next. In our example, consistency ensures that a crash
between lines 2 and 3 doesn't result in $100 disappearing from the checking account. Because the transaction is never committed, none of
the transaction’s changes are ever reflected in the database.
I: isolation - the results of a transaction are usually invisible to other transactions until the transaction is complete. This ensures that if a bank
account summary runs after line 2 but before line 3 in our example, it will still see the $100 in the checking account. When we discuss
isolation levels, you’ll understand why we said usually invisible.
D: durability - once committed, a transaction’s changes are permanent. This means the changes must be recorded such that data won’t be
lost in a system crash.
Deadlocks. A deadlock is when two or more transactions are mutually holding and requesting locks on the same resources, creating a cycle
of dependencies. Deadlocks occur when transactions try to lock resources in a different order. They can happen whenever
multiple transactions lock the same resources.
To combat this problem, database systems implement various forms of deadlock detection and timeouts. The more sophisticated systems,
such as the InnoDB storage engine, will notice circular dependencies and return an error instantly.
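As an illustrative sketch (table and id values are invented), two sessions that lock the same rows in opposite order can deadlock:

```sql
-- Session 1:
START TRANSACTION;
UPDATE accounts SET balance = balance - 10 WHERE customer_id = 1;  -- locks row 1

-- Session 2:
START TRANSACTION;
UPDATE accounts SET balance = balance - 10 WHERE customer_id = 2;  -- locks row 2

-- Session 1 blocks, waiting for row 2:
UPDATE accounts SET balance = balance + 10 WHERE customer_id = 2;

-- Session 2 would now wait for row 1, closing the cycle; InnoDB detects
-- the deadlock and rolls back one of the transactions (ERROR 1213):
UPDATE accounts SET balance = balance + 10 WHERE customer_id = 1;
```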
5. Isolation Levels
Isolation level     Dirty reads  Nonrepeatable reads  Phantom reads  Locking reads
READ UNCOMMITTED    Yes          Yes                  Yes            No
READ COMMITTED      No           Yes                  Yes            No
REPEATABLE READ     No           No                   Yes            No
SERIALIZABLE        No           No                   No             Yes
READ UNCOMMITTED
In the READ UNCOMMITTED isolation level, transactions can view the results of uncommitted
transactions. At this level, many problems can occur unless you really, really know what you are
doing and have a good reason for doing it. Reading uncommitted data is also known as a dirty read.
READ COMMITTED
READ COMMITTED satisfies the simple definition of isolation used earlier: a transaction will see
only those changes made by transactions that were already committed when it began, and its
changes won't be visible to others until it has committed. This level still allows what's known as a
nonrepeatable read. This means you can run the same statement twice and see different data.
REPEATABLE READ
REPEATABLE READ is MySQL's default transaction isolation level.
REPEATABLE READ guarantees that any rows a transaction reads will "look the same" in
subsequent reads within the same transaction, but in theory it still allows another tricky problem:
phantom reads. Simply put, a phantom read can happen when you select some range of rows,
another transaction inserts a new row into the range, and then you select the same range again;
you will then see the new "phantom" row.
SERIALIZABLE
The highest level of isolation, SERIALIZABLE, solves the phantom read problem by forcing
transactions to be ordered so that they can't possibly conflict. SERIALIZABLE places a lock on
every row it reads. At this level, a lot of timeouts and lock contention can occur.
The SQL standard defines four isolation levels, with specific rules for which changes are and aren't visible inside and outside a transaction.
Lower isolation levels typically allow higher concurrency and have lower overhead.
The table above summarizes the various isolation levels and the drawbacks associated with each one.
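As a quick sketch, you can inspect and change the level per session (the variable is named transaction_isolation in recent MySQL versions, tx_isolation in older ones):

```sql
SELECT @@transaction_isolation;   -- e.g. REPEATABLE-READ, the MySQL default
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```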
6. Multiversion Concurrency Control
InnoDB doesn't use a simple row-locking mechanism. Instead, it uses row-level locking in conjunction with multiversion concurrency control
(MVCC). MVCC works by keeping a snapshot of the data as it existed at some point in time. This means transactions can see a consistent
view of the data, no matter how long they run.
InnoDB implements MVCC by storing, with each row, two additional hidden values that record when the row was created (inserted) and when
it was expired (deleted). Each value is actually a system version number, which increments each time a transaction begins. Each transaction
keeps its own record of the current system version as of the time it began, and each query has to check each row's version numbers against
the transaction's version.
MVCC works only with the REPEATABLE READ and READ COMMITTED isolation levels.
SELECT
InnoDB must examine each row to ensure that it meets two criteria:
1. InnoDB must find a version of the row that is at least as old as the transaction (i.e., its version must be less than or equal to the
transaction’s version). This ensures that either the row existed before the transaction began, or the transaction created or altered the row.
2. The row's deletion version must be undefined or greater than the transaction's version. This ensures that the row wasn't deleted before
the transaction began.
Rows that pass both tests may be returned as the query's result.
INSERT
InnoDB records the current system version number with the new row.
DELETE
InnoDB records the current system version number as the row's deletion ID.
UPDATE
InnoDB writes a new copy of the row, using the system version number for the new row's version. It also writes the system version
number as the old row's deletion version.
The result of all this extra record keeping is that most read queries never acquire locks. They simply read data as fast as they can, making
sure to select only rows that meet the criteria. The drawbacks are that the storage engine has to store more data with each row, do more
work when examining rows, and handle some additional housekeeping operations.
7. Database normalization
Database normalization is a process by which an existing schema is modified to bring its component tables into compliance with a series of
progressive normal forms. The concept of database normalization was first introduced by Edgar Frank Codd in his paper: "A Relational
Model of Data for Large Shared Data Banks".
First Normal Form
The first normal form (or 1NF) requires that the values in each column of a table are atomic. By atomic we mean that there are no sets of
values within a column.
One method for bringing a table into first normal form is to separate the entities contained in the table into separate tables.
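As an illustration (table and column names are invented), a column holding a set of phone numbers violates 1NF; splitting the set into a child table restores atomicity:

```sql
-- Violates 1NF: phones holds several values in one column, e.g. '555-1234,555-9876'
-- CREATE TABLE customers (id INT PRIMARY KEY, name VARCHAR(100), phones VARCHAR(255));

-- 1NF: one atomic value per column, one row per phone number
CREATE TABLE customers (
  id   INT PRIMARY KEY,
  name VARCHAR(100)
);
CREATE TABLE customer_phones (
  customer_id INT,
  phone       VARCHAR(20),
  PRIMARY KEY (customer_id, phone),
  FOREIGN KEY (customer_id) REFERENCES customers(id)
);
```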
Second Normal Form
The second normal form (or 2NF) requires that any non-key column depend on the entire primary key. In the case of a composite primary key, this
means that a non-key column cannot depend on only part of the composite key.
Third Normal Form
Third Normal Form (3NF) requires that all columns depend directly on the primary key. Tables violate the Third Normal Form when one
column depends on another column, which in turn depends on the primary key (a transitive dependency).
Normal forms, from weakest to strongest: 1st NF, 2nd NF, 3rd NF, Boyce-Codd NF, 4th NF, 5th NF, Domain Key NF, 6th NF.
Boyce-Codd Normal Form
Boyce-Codd normal form (BCNF) is a slightly stronger version of the third normal form (3NF). If a relational schema is in BCNF, then all redundancy
based on functional dependency has been removed, although other types of redundancy may still exist.
Fourth Normal Form
Fourth normal form (4NF) is a level of database normalization where there are no non-trivial multivalued dependencies other than a
candidate key.
Fifth Normal Form
Fifth normal form (5NF) is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued
facts by isolating semantically related multiple relationships.
Domain-key Normal Form
Domain-key normal form (DK/NF) is a normal form which requires that the database contains no constraints other than domain constraints
and key constraints. A domain constraint specifies the permissible values for a given attribute, while a key constraint specifies the attributes
that uniquely identify a row in a given table.
Sixth Normal Form
Sixth normal form (6NF) is based on an extension of the relational algebra. Relational operators, such as join, are generalized to support a natural
treatment of interval data, such as sequences of dates or moments in time, for instance in temporal databases.
8. MySQL's Storage Engines
MySQL stores each database (also called a schema) as a subdirectory of its data directory in the underlying filesystem. When you create a
table, MySQL stores the table definition in a .frm file with the same name as the table.
InnoDB: The default storage engine in MySQL 8.0. InnoDB is a transaction-safe (ACID compliant) storage engine for MySQL that has
commit, rollback, and crash-recovery capabilities to protect user data. InnoDB row-level locking (without escalation to coarser granularity
locks) and Oracle-style consistent nonlocking reads increase multi-user concurrency and performance. InnoDB stores user data in clustered
indexes to reduce I/O for common queries based on primary keys. To maintain data integrity, InnoDB also supports FOREIGN KEY
referential-integrity constraints.
MyISAM: These tables have a small footprint. Table-level locking limits the performance in read/write workloads, so it is often used in read-
only or read-mostly workloads in Web and data warehousing configurations.
Memory: Stores all data in RAM, for fast access in environments that require quick lookups of non-critical data. This engine was formerly
known as the HEAP engine. Its use cases are decreasing; InnoDB with its buffer pool memory area provides a general-purpose and durable
way to keep most or all data in memory, and NDBCLUSTER provides fast key-value lookups for huge distributed data sets.
CSV: Its tables are really text files with comma-separated values. CSV tables let you import or dump data in CSV format, to exchange data
with scripts and applications that read and write that same format. Because CSV tables are not indexed, you typically keep the data in
InnoDB tables during normal operation, and only use CSV tables during the import or export stage.
Archive: These compact, unindexed tables are intended for storing and retrieving large amounts of seldom-referenced historical, archived,
or security audit information.
Blackhole: The Blackhole storage engine accepts but does not store data, similar to the Unix /dev/null device. Queries always return an
empty set. These tables can be used in replication configurations where DML statements are sent to slave servers, but the master server
does not keep its own copy of the data.
NDB (also known as NDBCLUSTER): This clustered database engine is particularly suited for applications that require the highest possible
degree of uptime and availability.
Merge: Enables a MySQL DBA or developer to logically group a series of identical MyISAM tables and reference them as one object. Good
for VLDB environments such as data warehousing.
Federated: Offers the ability to link separate MySQL servers to create one logical database from many physical servers. Very good for
distributed or data mart environments.
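For example (table name invented), you can list the available engines and pick one per table at creation time:

```sql
SHOW ENGINES;  -- lists supported storage engines and marks the default

-- Choose an engine per table at creation time:
CREATE TABLE audit_log (
  logged_at DATETIME,
  message   TEXT
) ENGINE=ARCHIVE;

-- Convert an existing table to another engine:
ALTER TABLE audit_log ENGINE=InnoDB;
```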
9. Execution path of a query
Main phases of query execution, from the client through the MySQL Server to the storage engines:

1. The client sends the SQL query to the server over the client/server protocol.
2. The server checks the query cache. If there's a hit, it returns the stored result from the cache; otherwise, it passes the SQL statement to the next step.
3. MySQL's parser breaks the query into tokens and builds a "parse tree" from them. The preprocessor then checks the resulting parse tree for additional semantics: it checks that tables and columns exist, and it checks privileges.
4. The query optimizer turns the parsed tree into a query execution plan: logical transformations, cost-based choice of join order and access methods, and plan refinement.
5. The query execution engine executes the plan by making calls to the storage engine API (MyISAM, InnoDB, ...), and the result is returned to the client.
10. Query Cache
The MySQL query cache holds the exact bits that a completed query returned to the client. When a query cache hit occurs, the server can
simply return the stored results immediately, skipping the parsing, optimization, and execution steps. The query cache keeps track of which
tables a query uses, and if any of those tables changes, it invalidates the cache entry.
MySQL does not parse, "normalize", or parameterize a statement when it checks for a cache hit; it uses the statement and other bits of data
exactly as the client sends them. Any difference in character case, spacing, or comments (any difference at all) will prevent a query from
matching a previously cached version.
Another caching consideration is that the query cache will not store a result unless the query that generated it was deterministic. Thus, any
query that contains a nondeterministic function, such as NOW() or CURRENT_DATE(), will not be cached. In fact, the query cache does not
work for queries that refer to user-defined functions, stored functions, user variables, or temporary tables.
Any SELECT query that MySQL doesn't serve from the cache is a cache miss. A cache miss can occur for any of the following reasons:
The query is not cacheable, either because it contains a nondeterministic construct (such as CURRENT_DATE, NOW, etc) or because
its result set is too large to store.
The server has never seen the query before, so it never had a chance to cache its result.
The query's result was previously cached, but the server removed it. This can happen because there wasn't enough memory to keep it,
because someone instructed the server to remove it, or because it was invalidated.
If your server has a lot of cache misses but very few uncacheable queries, one of the following must be true:
The query cache is not warmed up yet. That is, the server hasn't had a chance to fill the cache with result sets.
The server is seeing queries it hasn't seen before. If you don't have a lot of repeated queries, this can happen even after the cache is
warmed up.
There are a lot of cache invalidations.
Cache invalidations can happen because of fragmentation, insufficient memory, or data modifications.
Query Cache Optimizations
It's more efficient to batch writes than to do them singly, because batching invalidates cache entries only once. You cannot control
the query cache on a per-database or per-table basis, but you can include or exclude individual queries with the SQL_CACHE and
SQL_NO_CACHE modifiers in the SELECT statement. You can also enable or disable the query cache on a per-connection basis by setting
the session-level query_cache_type server variable to the appropriate value.
If you want to avoid the query cache for most queries, but you know that some will benefit significantly from caching, you can set the global
query_cache_type to DEMAND and then add the SQL_CACHE hint to those queries you want to cache. Although this requires you to do
more work, it gives you very fine-grained control over the cache. Conversely, if you want to cache most queries and exclude just a few, you
can add SQL_NO_CACHE to them.
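For example (table names are invented, and the global query_cache_type is assumed set to DEMAND):

```sql
-- Only queries that ask for caching are cached:
SELECT SQL_CACHE id, name FROM customers WHERE region = 'EU';

-- Conversely, with caching on by default, opt a single query out:
SELECT SQL_NO_CACHE COUNT(*) FROM orders;

-- Or disable the cache for the current connection entirely:
SET SESSION query_cache_type = OFF;
```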
11. The parser and the preprocessor
MySQL's parser breaks the query into tokens and builds a "parse tree" from them. The parser uses MySQL's SQL grammar to interpret and
validate the query. For instance, it ensures that the tokens in the query are valid and in the proper order, and it checks for mistakes such as
quoted strings that aren't terminated.
The preprocessor then checks the resulting parse tree for additional semantics that the parser can't resolve. For example, it checks that
tables and columns exist, and it resolves names and aliases to ensure that column references aren't ambiguous. Next, the preprocessor
checks privileges.
Query optimizer
The parse tree is now valid and ready for the optimizer to turn it into a query execution plan. A query can often be executed many different
ways and produce the same result. The optimizer's job is to find the best option. MySQL uses a cost-based optimizer, which means it tries to
predict the cost of various execution plans and choose the least expensive. The unit of cost was originally a single random 4 KB data page
read, but it has become more sophisticated and now includes factors such as the estimated cost of executing a WHERE clause comparison.
You can see how expensive the optimizer estimated a query to be by running the query, then inspecting the Last_query_cost session
variable.
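For example (table name is invented):

```sql
SELECT COUNT(*) FROM film_actor;      -- run the query to be estimated
SHOW STATUS LIKE 'Last_query_cost';   -- estimated cost of the last query, in cost units (roughly page reads)
```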
The score is calculated based on various statistical information: the number of pages per table or index, the cardinality (number of distinct
values) of the indexes, the length of the rows and keys, and the key distribution. The optimizer does not include the effects of any type of
caching in its estimates - it assumes every read will result in a disk I/O operation.
The optimizer might not always choose the best plan, for many reasons:
The statistics could be wrong. The server relies on storage engines to provide statistics, and they can range from exactly correct to
wildly inaccurate. For example, the InnoDB storage engine doesn't maintain accurate statistics about the number of rows in a table
because of its MVCC architecture.
The cost metric is not exactly equivalent to the true cost of running the query, so even when the statistics are accurate, the query might
be more or less expensive than MySQL's approximation. A plan that reads more pages might actually be cheaper in some cases, such
as when the reads are sequential so the disk I/O is faster, or when the pages are already cached in memory. MySQL also doesn't
understand which pages are in memory and which pages are on disk, so it doesn't really know how much I/O the query will cause.
MySQL doesn't consider other queries that are running concurrently, which can affect how quickly the query runs.
The optimizer doesn't take into account the cost of operations not under its control, such as executing stored functions or user-defined
functions.
MySQL's query optimizer is a highly complex piece of software, and it uses many optimizations to transform the query into an execution plan.
There are two basic types of optimizations, which we call static and dynamic.
Static optimizations can be performed simply by inspecting the parse tree. For example, the optimizer can transform the WHERE clause
into an equivalent form by applying algebraic rules. Static optimizations are independent of values, such as the value of a constant in a
WHERE clause. They can be performed once and will always be valid, even when the query is reexecuted with different values.
Dynamic optimizations are based on context and can depend on many factors, such as which value is in a WHERE clause or how many
rows are in an index. They must be reevaluated each time the query is executed.
12. Optimizing SQL Statements
The core logic of a database application is performed through SQL statements, whether issued directly through an interpreter or submitted
behind the scenes through an API. The guidelines cover SQL operations that read and write data, as well as the behind-the-scenes overhead
for SQL operations in general. This section discusses optimizations that can be made for processing WHERE clauses.
The examples use SELECT statements, but the same optimizations apply to WHERE clauses in DELETE and UPDATE statements.
Removal of unnecessary parentheses:
  ((a AND b) AND c OR (((a AND b) AND (c AND d))))
rewrites to:
  (a AND b AND c) OR (a AND b AND c AND d)

Constant folding:
  (a < b AND b = c) AND a = 5
rewrites to:
  b > 5 AND b = c AND a = 5

Constant condition removal:
  (b >= 5 AND b = 5) OR (b = 6 AND 5 = 5) OR (b = 7 AND 5 = 6)
rewrites to:
  b = 5 OR b = 6

Equality range optimization of many-valued comparisons:
  col_name = val1 OR ... OR col_name = valN
rewrites to:
  col_name IN (val1, ..., valN)
MySQL sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(log n) in the size of the list,
whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists).

Range optimization of row constructor expressions:
  SELECT ... FROM t1 WHERE (col_1 = 'a' AND col_2 = 'b') OR (col_1 = 'c' AND col_2 = 'd');
rewrites to:
  SELECT ... FROM t1 WHERE (col_1, col_2) IN (('a', 'b'), ('c', 'd'));
13. SQL JOIN
MySQL executes joins between tables using a nested-loop algorithm or variations on it.

Nested-Loop Join Algorithm. A simple nested-loop join (NLJ) algorithm reads rows from the first table in a loop one at a time, passing each
row to a nested loop that processes the next table in the join. This process is repeated as many times as there remain tables to be joined.
Use indexes on the joined columns to optimize the performance of retrieving rows from other tables when performing joins. MySQL can use
indexes on columns more efficiently if they are declared as the same type and size.

Assume that a join between three tables t1, t2, and t3 is to be executed using the following join types:

Table  Join Type
t1     range
t2     ref
t3     ALL

If a simple NLJ algorithm is used, the join is processed like this:

for each row in t1 matching range {
  for each row in t2 matching reference key {
    for each row in t3 {
      if row satisfies join conditions, send to client
    }
  }
}

(Swim-lane diagram: the outer table is scanned in one loop; for each of its rows, inner loops retrieve the matching rows, and the joined columns are combined into the result set.)

MySQL's query execution plans always take the form of a left-deep tree.

A FULL JOIN can't be executed with nested loops and backtracking as soon as a table with no matching rows is found, because it might
begin with a table that has no matching rows. This explains why MySQL doesn't support FULL JOIN. If you need a FULL JOIN in MySQL,
you can combine LEFT JOIN and RIGHT JOIN:

SELECT <select list>
FROM table_a AS A
LEFT JOIN table_b AS B ON A.key = B.key
UNION
SELECT <select list>
FROM table_a AS A
RIGHT JOIN table_b AS B ON A.key = B.key
14. SQL TYPES OF JOIN

INNER JOIN: select all records from Table A and Table B where the join condition is met.
SELECT <select list>
FROM table_a AS A
INNER JOIN table_b AS B ON A.Key = B.Key;

LEFT JOIN: select all records from Table A, along with records from Table B for which the join condition is met (if at all).
SELECT <select list>
FROM table_a AS A
LEFT JOIN table_b AS B ON A.Key = B.Key;

RIGHT JOIN: select all records from Table B, along with records from Table A for which the join condition is met (if at all).
SELECT <select list>
FROM table_a AS A
RIGHT JOIN table_b AS B ON A.Key = B.Key;

LEFT JOIN excluding matches: select records from Table A which do not exist in Table B.
SELECT <select list>
FROM table_a AS A
LEFT JOIN table_b AS B ON A.Key = B.Key
WHERE B.Key IS NULL;

RIGHT JOIN excluding matches: select records from Table B which do not exist in Table A.
SELECT <select list>
FROM table_a AS A
RIGHT JOIN table_b AS B ON A.Key = B.Key
WHERE A.Key IS NULL;

FULL JOIN: select all records from Table A and Table B, regardless of whether the join condition is met or not. (MySQL doesn't support FULL JOIN directly; emulate it with a UNION of LEFT and RIGHT joins, as shown in the previous section.)
SELECT <select list>
FROM table_a AS A
FULL JOIN table_b AS B ON A.Key = B.Key;

FULL JOIN excluding matches: select all records from Table A and Table B, excluding the common (intersecting) rows.
SELECT <select list>
FROM table_a AS A
FULL JOIN table_b AS B ON A.Key = B.Key
WHERE A.Key IS NULL OR B.Key IS NULL;
15. MySQL Indexes
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read
through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in
question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. This is much
faster than reading every row sequentially. Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees.
Exceptions: Indexes on spatial data types use R-trees; MEMORY tables also support hash indexes; InnoDB uses inverted lists for FULLTEXT
indexes.
B-Tree indexes
When people talk about an index without mentioning a type, they are
probably referring to a B-Tree index. Most of MySQL's storage engines
support this index type.
A B-Tree index speeds up data access because the storage engine
doesn't have to scan the whole table to find the desired data.
Instead, it starts at the root node; the slots in the root node hold pointers
to child nodes, and the storage engine follows these pointers. It finds
the right pointer by looking at the values in the node pages.
Eventually, the storage engine either determines that the desired value
doesn't exist or successfully reaches a leaf page, which has pointers
to the indexed data.
(Diagram: a B+ tree example linking the keys 1-7 to the data values d1-d7; the root node holds the keys 3 and 5, and the leaf pages hold the keys 1-7 with pointers to d1-d7.)
Index selectivity is the ratio of the number of distinct indexed values (the cardinality) to the total number of rows in the table (#N), and
ranges from 1/#N to 1. A highly selective index is good because it lets MySQL filter out more rows when it looks for matches. A unique index
has a selectivity of 1, which is as good as it gets.
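As a sketch (table and column names are invented), you can measure a column's selectivity directly before deciding to index it:

```sql
SELECT COUNT(DISTINCT last_name) / COUNT(*) AS selectivity
FROM customers;
-- A value near 1 indicates a highly selective index candidate;
-- a unique column yields exactly 1.
```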
Clustered Indexes. When a table has a clustered index, its rows are actually stored in the index's leaf pages. The term "clustered" refers to
the fact that rows with adjacent key values are stored close to each other. You can have only one clustered index per table.
If you don't define a primary key, InnoDB will try to use a unique nonnullable index instead. If there's no such index, InnoDB will define a
hidden primary key for you and then cluster on that. InnoDB clusters records together only within a page. Pages with adjacent key values
might be distant from each other.
A clustering primary key can help performance.
16. Types of queries that can use an index
B-Tree indexes work well for lookups by the full key value, a key range, or a key prefix. Any leftmost prefix of a multicolumn index can be
used by the optimizer to look up rows. Assume a multicolumn index on columns A, B, C:

INDEX(A,B,C)

Use the full index:
WHERE A = 'a' AND B = 'b' AND C = 'c';  -- match on the full key value: values specified for all columns in the index

Use the full index or a left part of it:
WHERE A = 'a' AND B IN ('a','b') AND C > 'c';  -- match one part exactly and a range on another part: uses columns A, B, C(range)
WHERE A = 'a' AND B >= 'b' AND C = 'c';        -- match a left part of the index: uses columns A, B(range)
WHERE A >= 'a' AND B = 'b' AND C = 'c';        -- match a leftmost prefix: uses only column A(range)

MySQL stops using key parts of a multipart index as soon as it meets a real range (<, >, BETWEEN); it can, however, continue using key
parts further to the right when an IN(...) range is used.

Do not use the index:
WHERE B = 2 AND C = 3;  -- leading column A is not referenced
WHERE C = 3;            -- leading columns A and B are not referenced

Use of the index with LIKE:
WHERE A LIKE 'abc%';   -- uses the index (left-anchored prefix)
WHERE A LIKE '%abc';   -- does not use the index
WHERE A LIKE '%abc%';  -- does not use the index

Covering Indexes
Indexes need to be designed for the whole query, not just the WHERE clause. Indexes are indeed a way to find rows efficiently, but MySQL
can also use an index to retrieve a column's data, so it doesn't have to read the row at all. The index's leaf nodes contain the values they
index: why read the row when reading the index can give you the data you want? An index that contains (or "covers") all the data needed to
satisfy a query is called a covering index.
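As a sketch (table and index names are invented), EXPLAIN reveals when an index covers a query:

```sql
ALTER TABLE t ADD INDEX idx_a_b (A, B);

-- Both the filter and the select list are satisfied from the index alone,
-- so the storage engine never has to read the full rows:
EXPLAIN SELECT A, B FROM t WHERE A = 'a';
-- The Extra column shows "Using index" when the index covers the query.
```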
17. Main types of indexes
Primary Key. The primary key for a table represents the column or set of columns that uniquely identify each row. Query performance benefits
from the NOT NULL optimization, because a primary key cannot include any NULL values. With the InnoDB storage engine, the table data is
physically organized to do ultra-fast lookups and sorts based on the primary key column or columns. If the table has no natural column to use
as a primary key, you might create a separate column with auto-increment values to use as the primary key. These unique IDs can serve as
pointers to corresponding rows in other tables when you join tables using foreign keys.
Foreign key. A foreign key is a field in a table that matches another field of another table. A foreign key places constraints on data in the
related tables, which enables MySQL to maintain referential integrity.
[CONSTRAINT [symbol]] FOREIGN KEY
[index_name](col_name,...)
REFERENCES tbl_name(col_name,...)
[ON DELETE reference_option]
[ON UPDATE reference_option]
reference_option:
RESTRICT | CASCADE | SET NULL | NO ACTION | SET DEFAULT
For storage engines supporting foreign keys, MySQL rejects any INSERT or UPDATE operation that attempts to create a foreign key value in
a child table if there is no matching candidate key value in the parent table. When an UPDATE or DELETE operation affects a key value in
the parent table that has matching rows in the child table, the result depends on the referential action specified using ON UPDATE and ON
DELETE subclauses of the FOREIGN KEY clause. MySQL supports five options regarding the action to be taken, listed here:
CASCADE: Delete or update the row from the parent table, and automatically delete or update the matching rows in the child table. Both
ON DELETE CASCADE and ON UPDATE CASCADE are supported. Between two tables, do not define several ON UPDATE CASCADE
clauses that act on the same column in the parent table or in the child table.
SET NULL: Delete or update the row from the parent table, and set the foreign key column or columns in the child table to NULL. Both
ON DELETE SET NULL and ON UPDATE SET NULL clauses are supported.
RESTRICT: Rejects the delete or update operation for the parent table. Specifying RESTRICT (or NO ACTION) is the same as omitting
the ON DELETE or ON UPDATE clause.
NO ACTION: A keyword from standard SQL. In MySQL, equivalent to RESTRICT. The MySQL Server rejects the delete or update
operation for the parent table if there is a related foreign key value in the referenced table. Some database systems have deferred
checks, and NO ACTION is a deferred check. In MySQL, foreign key constraints are checked immediately, so NO ACTION is the same
as RESTRICT.
SET DEFAULT: This action is recognized by the MySQL parser, but both InnoDB and NDB reject table definitions containing ON
DELETE SET DEFAULT or ON UPDATE SET DEFAULT clauses.
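Putting the syntax and the referential actions together (table and constraint names are illustrative):

```sql
CREATE TABLE customers (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE orders (
    order_id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    customer_id INT UNSIGNED NOT NULL,
    CONSTRAINT fk_orders_customer
        FOREIGN KEY (customer_id) REFERENCES customers (id)
        ON DELETE CASCADE   -- deleting a customer deletes its orders
        ON UPDATE RESTRICT  -- reject updates to a referenced customer id
) ENGINE=InnoDB;
```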
Foreign key. Example

customers            orders
---------            --------
id                   order_id
name                 order_name
date                 datetime
...                  customer_id
                     ...

We have two tables: customers and orders. Each customer has zero or more orders,
and each order belongs to exactly one customer. The relationship between the
customers table and the orders table is one-to-many, and it is established by a
foreign key in the orders table, the customer_id field. The customer_id field in the
orders table refers to the id primary key field in the customers table.
The customers table is called the parent table or referenced table, and the orders
table is known as the child table or referencing table. A foreign key can be a
column or a set of columns. The columns in the child table often refer to the
primary key columns in the parent table. A table may have more than one foreign
key, and each foreign key in the child table may refer to a different parent table.

Column Indexes. The most common type of index involves a single column, storing copies of the values from that column in a data
structure, allowing fast lookups for the rows with the corresponding column values. The B-tree data structure lets the index quickly find a
specific value, a set of values, or a range of values, corresponding to operators such as =, >, ≤, BETWEEN, IN, and so on, in a WHERE
clause.

Multiple-Column Indexes. MySQL can create composite indexes (that is, indexes on multiple columns). An index may consist of up to
16 columns. A multiple-column index can be considered a sorted array whose rows contain values created by concatenating the values of
the indexed columns.

FULLTEXT Indexes. FULLTEXT indexes are used for full-text searches. Only the InnoDB and MyISAM storage engines support
FULLTEXT indexes, and only for CHAR, VARCHAR, and TEXT columns. Indexing always takes place over the entire column; column
prefix indexing is not supported. For queries that contain full-text expressions, MySQL evaluates those expressions during the optimization
phase of query execution. The optimizer does not just look at full-text expressions and make estimates; it actually evaluates them in the
process of developing an execution plan.

18. Prefix Indexes. Sometimes you need to index very long character columns, which makes your indexes large and slow. You can save space
and still get good performance by indexing the first few characters instead of the whole value. This makes your indexes use less space, but it
also makes them less selective.
The trick is to choose a prefix that's long enough to give good selectivity, but short enough to save space. One way to find a good prefix
length is to compute the full column's selectivity and look for the shortest prefix whose selectivity comes close to that value:

SELECT COUNT(DISTINCT name)/COUNT(*) AS sel_complete,
       COUNT(DISTINCT LEFT(name, 3))/COUNT(*) AS sel_3,
       COUNT(DISTINCT LEFT(name, 4))/COUNT(*) AS sel_4,
       COUNT(DISTINCT LEFT(name, 5))/COUNT(*) AS sel_5,
       COUNT(DISTINCT LEFT(name, 6))/COUNT(*) AS sel_6,
       COUNT(DISTINCT LEFT(name, 7))/COUNT(*) AS sel_7,
       COUNT(DISTINCT LEFT(name, 8))/COUNT(*) AS sel_8
FROM table;

sel_complete  sel_3  sel_4  sel_5  sel_6  sel_7  sel_8
0.317         0.105  0.188  0.245  0.309  0.317  0.317

A seven-character prefix already matches the full column's selectivity (0.317), so index the prefix:

ALTER TABLE table ADD KEY (name(7));
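As a sketch of the FULLTEXT indexes described above (table and column names are illustrative):

```sql
-- A full-text index over two text columns (InnoDB or MyISAM).
CREATE TABLE articles (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(200),
    body  TEXT,
    FULLTEXT KEY ft_title_body (title, body)
) ENGINE=InnoDB;

-- Full-text search uses MATCH ... AGAINST rather than LIKE.
SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('replication topology' IN NATURAL LANGUAGE MODE);
```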
19. Replication Overview
(Diagram: clients write to the master, which records data changes in its binary logs; the slave reads those logs into its relay logs and replays them, while serving reads.)
Replication enables data from one MySQL database server (the master) to be copied to one or more MySQL database servers (the slaves).
Replication is asynchronous by default. Depending on the configuration, you can replicate all databases, selected databases, or even
selected tables within a database. Advantages of replication in MySQL include: Scale-out solutions, Data security, Analytics.
How Replication Works
1. The master records changes to its data in its binary log. (These records are
called binary log events.) Just before each transaction that updates data
completes on the master, the master records the changes in its binary log.
2. The replica copies the master's binary log events to its relay log. To begin, it
starts a worker thread, called the I/O slave thread. The I/O thread opens an
ordinary client connection to the master, then starts a special binlog dump
process. The binlog dump process reads events from the master's binary log.
If it catches up to the master, it goes to sleep and waits for the master to
signal it when there are new events. The I/O thread writes the events to the
replica's relay log.
3. The replica replays the events in the relay log, applying the changes to its
own data. The SQL slave thread reads and replays events from the relay log,
thus updating the replica’s data to match the master's. The events the SQL
thread executes can optionally go into the replica's own binary log.
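The three steps above assume the replica has been pointed at the master's binary log; a minimal sketch of that wiring, with hostnames, credentials, and log coordinates as placeholders, looks like:

```sql
-- On the master: create an account the replica's I/O thread will use.
CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

-- On the replica: point the I/O thread at the master's binary log.
CHANGE MASTER TO
    MASTER_HOST = 'master.example.com',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'password',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS = 4;

START SLAVE;

-- Verify: Slave_IO_Running and Slave_SQL_Running should both say "Yes".
SHOW SLAVE STATUS\G
```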
Replication Formats
With statement-based binary logging, the master writes SQL statements to the binary log. Replication to the slave works by
executing those SQL statements on the slave. This is called statement-based replication (SBR), which corresponds to the MySQL
statement-based binary logging format.
With row-based logging, the master writes events to the binary log that indicate how individual table rows are changed. Replication to the
slave works by copying the events representing the row changes to the slave. This is called row-based replication (RBR). Row-based
logging is the default method.
Mixed-format logging. You can also configure MySQL to use a mix of both statement-based and row-based logging, depending on which
is most appropriate for the change to be logged. With mixed-format logging, statement-based logging is used by default; depending on the
statement being executed, and also on the storage engine being used, the log is automatically switched to row-based in particular cases.
Replication using the mixed format is referred to as mixed-based replication or mixed-format replication.
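The format is controlled by the binlog_format system variable (changing it globally requires appropriate privileges):

```sql
-- Choose the binary logging format for the server.
SET GLOBAL binlog_format = 'ROW';   -- or 'STATEMENT', 'MIXED'

-- Confirm the current setting.
SHOW VARIABLES LIKE 'binlog_format';
```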
20. Replication Topologies
Common topologies are: single, chain, circular, multiple (multi-source), master with backup master (multiple replication), and multi-circular.

Single. This is the most straightforward MySQL replication topology. One master receives writes; one or more read-only slaves replicate
from the same master via asynchronous or semi-synchronous replication. If the designated master goes down, the most up-to-date slave
must be promoted as the new master, and the remaining slaves resume replication from the new master.

Chain. This setup uses an intermediate master to act as a relay to the other slaves in the replication chain. When there are many slaves
connected to a master, the network interface of the master can get overloaded. This topology allows the read replicas to pull the
replication stream from the relay server to offload the master server.

Circular. The ring topology requires two or more MySQL servers which act as masters. All masters receive writes and generate binlogs,
with a few caveats: set a distinct auto-increment offset on each server to avoid primary key collisions, and there is no conflict resolution.
Common practice is to write to only one master while the other master acts as a hot-standby node. Still, if you have slaves below that tier,
you have to switch them to the new master manually if the designated master fails.

Master with Backup Master (Multiple Replication). The master pushes changes to a backup master and to one or more slaves.
Semi-synchronous replication is used between the master and the backup master: the backup master gets the update, writes it to its relay
log, and flushes it to disk. This topology works well when performing master failover in case the master goes down. The backup master
acts as a warm-standby server, as it has the highest probability of having up-to-date data compared to the other slaves.
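Slave tiers in these topologies are usually kept read-only so that applications cannot write to them by mistake:

```sql
-- Make a slave reject writes from ordinary clients.
-- Users with the SUPER privilege and the replication SQL thread
-- can still write, so replication keeps flowing.
SET GLOBAL read_only = ON;
```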