MySQL
MySQL Overview
A.Sidelev
2019
Architecture
Transactions
Replication
Indexing
Optimization
Storage Engines
Query Execution
Isolation Levels
Normalization
MySQL Timeline
MySQL AB: until 2008 · Sun Microsystems: 2008-2010 · Oracle: since 2010
Timeline: 2001, 2003, 2005, 2006, 2008, 2010, 2013, 2015, 2018

Version 3.23
Added replication
Added full-text indexing in MyISAM

Version 4.0
Query cache
InnoDB became a standard part of the server
Replication was rewritten to use two threads
Support for UNION and multi-table DELETE statements

Version 4.1
Added subqueries
Added INSERT ... ON DUPLICATE KEY UPDATE
Support for the UTF-8 character set
New binary protocol and prepared statement support

Version 5.0
Introduced views
Introduced triggers
Introduced stored procedures
The ISAM engine was removed completely

Version 5.1
Introduced partitioning
Added row-based replication
Added a variety of plugin APIs
Support for the InnoDB plugin
The BerkeleyDB storage engine was removed

Version 5.5
Optimized performance, scalability, and replication
InnoDB became the default storage engine
Added the PERFORMANCE_SCHEMA database
APIs for replication and authentication

Version 5.6
Improved query optimizer
More plugin APIs
Implemented GTID-based replication
Delayed slave replication

Version 5.7
Multi-source replication
Added JSON support
Improved InnoDB scalability

Version 6.0
Canceled

Version 8.0
SQL window functions
Extended JSON syntax
Default character set utf8mb4
Better at read/write workloads
SQL roles
MySQL Architecture
Client connectors
JDBC, ODBC, PHP, Python, .NET, Perl, Ruby, Native C API, ...

Services and utilities
Backup & Recovery, Administration, Security, Replication, Cluster, Migration & Metadata, Monitoring

Connection pool
Connection handling, thread reuse, authentication, security, connection limits, caches

SQL interface
DML, DDL, stored procedures, triggers, views

Parser
Query translation, syntactic analysis, lexical and semantic analysis, code generation, object privileges

Optimizer
Query rewriting, statistics, order of scanning, use of indexes

Caches
Global and engine-specific caches and buffers

Storage engines
InnoDB, MyISAM, Memory, Archive, Federated, CSV, Merge, Blackhole

File system
NTFS, ext4, SAN, NAS

Files and logs
Binary, error, slow, general, redo, undo, data, index
Transactions
ACID - an acronym standing for atomicity, consistency, isolation, and durability. These properties are all desirable in a database system, and
are all closely tied to the notion of a transaction. The transactional features of InnoDB adhere to the ACID principles.
Transactions are atomic units of work that can be committed or rolled back. When a transaction makes multiple changes to the database,
either all the changes succeed when the transaction is committed, or all the changes are undone when the transaction is rolled back.
A banking application is the classic example of why transactions are necessary; for example, transferring money between two accounts:
START TRANSACTION;
UPDATE accounts SET balance = balance - 100.00 WHERE customer_id = 1;
UPDATE accounts SET balance = balance + 100.00 WHERE customer_id = 2;
COMMIT;
The entire operation should be wrapped in a transaction so that if any one of the steps fails, any completed steps can be rolled back.
You start a transaction with the START TRANSACTION statement and then either make its changes permanent with COMMIT or discard the
changes with ROLLBACK.
What happens if the database server crashes while performing line 3? The customer probably just lost $100. And what if another process
comes along between lines 2 and 3 and removes the entire checking account balance? The bank has given the customer a $100 credit
without even knowing it.
Transactions aren't enough unless the system passes the ACID test.
A: atomicity - transaction must function as a single indivisible unit of work so that the entire transaction is either applied or rolled back. When
transactions are atomic, there is no such thing as a partially completed transaction: it's all or nothing.
C: consistency - the database should always move from one consistent state to the next. In our example, consistency ensures that a crash
between lines 2 and 3 doesn't result in $100 disappearing from the checking account. Because the transaction is never committed, none of
the transaction’s changes are ever reflected in the database.
I: isolation - the results of a transaction are usually invisible to other transactions until the transaction is complete. This ensures that if a bank
account summary runs after line 2 but before line 3 in our example, it will still see the $100 in the checking account. When we discuss
isolation levels, you’ll understand why we said usually invisible.
D: durability - once committed, a transaction’s changes are permanent. This means the changes must be recorded such that data won’t be
lost in a system crash.
Deadlocks. A deadlock is when two or more transactions are mutually holding and requesting locks on the same resources, creating a cycle
of dependencies. Deadlocks occur when transactions try to lock resources in a different order. They can happen whenever
multiple transactions lock the same resources.
To combat this problem, database systems implement various forms of deadlock detection and timeouts. The more sophisticated systems,
such as the InnoDB storage engine, will notice circular dependencies and return an error instantly.
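The classic deadlock pattern can be reproduced from two sessions that lock the same rows in opposite order (the table and ids here are hypothetical):

```sql
-- Session 1:
START TRANSACTION;
UPDATE t SET c = c + 1 WHERE id = 1;   -- locks row 1
-- Session 2:
START TRANSACTION;
UPDATE t SET c = c + 1 WHERE id = 2;   -- locks row 2
-- Session 1:
UPDATE t SET c = c + 1 WHERE id = 2;   -- blocks, waiting for session 2
-- Session 2:
UPDATE t SET c = c + 1 WHERE id = 1;   -- completes the cycle: InnoDB detects it
                                       -- and rolls one transaction back with
                                       -- error 1213 (ER_LOCK_DEADLOCK)
```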
Isolation Levels
Isolation level     Dirty reads  Nonrepeatable reads  Phantom reads  Locking reads
READ UNCOMMITTED    Yes          Yes                  Yes            No
READ COMMITTED      No           Yes                  Yes            No
REPEATABLE READ     No           No                   Yes            No
SERIALIZABLE        No           No                   No             Yes
READ UNCOMMITTED
In the READ UNCOMMITTED isolation level, transactions can view the results of uncommitted
transactions. At this level, many problems can occur unless you really, really know what you are
doing and have a good reason for doing it. Reading uncommitted data is also known as a dirty read.
READ COMMITTED
READ COMMITTED satisfies the simple definition of isolation used earlier: a transaction will see
only those changes made by transactions that were already committed when it began, and its
changes won't be visible to others until it has committed. This level still allows what's known as a
nonrepeatable read. This means you can run the same statement twice and see different data.
REPEATABLE READ
REPEATABLE READ is MySQL's default transaction isolation level.
REPEATABLE READ guarantees that any rows a transaction reads will "look the same" in
subsequent reads within the same transaction, but in theory it still allows another tricky problem:
phantom reads. Simply put, a phantom read can happen when you select some range of rows,
another transaction inserts a new row into the range, and then you select the same range again;
you will then see the new "phantom" row.
SERIALIZABLE
The highest level of isolation, SERIALIZABLE, solves the phantom read problem by forcing
transactions to be ordered so that they can't possibly conflict. SERIALIZABLE places a lock on
every row it reads. At this level, a lot of timeouts and lock contention can occur. 
The SQL standard defines four isolation levels, with specific rules for which changes are and aren't visible inside and outside a transaction.
Lower isolation levels typically allow higher concurrency and have lower overhead.
The table above summarizes the various isolation levels and the drawbacks associated with each one.
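The isolation level can be changed per transaction or per session, for example:

```sql
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;        -- next transaction only
SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;  -- current session
SELECT @@transaction_isolation;                        -- inspect the current level
                                                       -- (@@tx_isolation before 5.7.20)
```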
Multiversion Concurrency Control
InnoDB doesn't use a simple row-locking mechanism. It uses row-level locking in conjunction with multiversion concurrency control
(MVCC). MVCC works by keeping a snapshot of the data as it existed at some point in time. This means transactions can see a consistent
view of the data, no matter how long they run.
InnoDB implements MVCC by storing with each row two additional, hidden values that record when the row was created (inserted) and when
it was expired (deleted). Each value is a system version number, which increments each time a transaction begins. Each transaction keeps
its own record of the current system version, as of the time it began, and each query has to check each row's version numbers against the
transaction's version.
MVCC works only with the REPEATABLE READ and READ COMMITTED isolation levels.
SELECT
    InnoDB must examine each row to ensure that it meets two criteria:
    1. InnoDB must find a version of the row that is at least as old as the transaction (i.e., its version must be less than or equal to the             
    transaction’s version). This ensures that either the row existed before the transaction began, or the transaction created or altered the row.
    2. The row's deletion version must be undefined or greater than the transaction's version. This ensures that the row wasn't deleted before 
    the transaction began.
    Rows that pass both tests may be returned as the query's result.
INSERT
    InnoDB records the current system version number with the new row.
DELETE
    InnoDB records the current system version number as the row's deletion ID.
UPDATE
    InnoDB writes a new copy of the row, using the system version number for the new row's version. It also writes the system version             
    number as the old row's deletion version.
The result of all this extra record keeping is that most read queries never acquire locks. They simply read data as fast as they can, making
sure to select only rows that meet the criteria. The drawbacks are that the storage engine has to store more data with each row, do more
work when examining rows, and handle some additional housekeeping operations.
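The SELECT visibility rules above can be sketched in a few lines of Python (a toy model only; real InnoDB uses transaction IDs, roll pointers, and undo logs rather than plain version pairs):

```python
def visible(row, txn_version):
    """A row is visible if it was created at or before the transaction's
    version and not deleted at or before that version."""
    created, deleted = row
    if created > txn_version:                       # inserted after txn began
        return False
    if deleted is not None and deleted <= txn_version:  # deleted before txn began
        return False
    return True

# Each row: (creation version, deletion version or None if still live).
rows = [
    (1, None),   # inserted at version 1, still live
    (2, 4),      # inserted at version 2, deleted at version 4
    (6, None),   # inserted at version 6
]

txn = 3  # a transaction that began at system version 3
print([r for r in rows if visible(r, txn)])  # → [(1, None), (2, 4)]
```

The deleted row (2, 4) is still visible to transaction 3 because its deletion happened after the transaction began, which is exactly how long-running transactions keep a consistent view.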
Database normalization
Database normalization is a process by which an existing schema is modified to bring its component tables into compliance with a series of
progressive normal forms. The concept of database normalization was first introduced by Edgar Frank Codd in his paper: "A Relational
Model of Data for Large Shared Data Banks".
First Normal Form
The first normal form (or 1NF) requires that the values in each column of a table are atomic. By atomic we mean that there are no sets of
values within a column.
One method for bringing a table into first normal form is to separate the entities contained in the table into separate tables.
Second Normal Form
The second normal form (or 2NF) requires that any non-key column depend on the entire primary key. In the case of a composite primary
key, this means that a non-key column cannot depend on only part of the composite key.
Third Normal Form
Third Normal Form (3NF) requires that all columns depend directly on the primary key. Tables violate the Third Normal Form when one
column depends on another column, which in turn depends on the primary key (a transitive dependency).
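As an illustration of the first normal form (the table and column names here are made up), a 1NF violation and its fix:

```sql
-- Violates 1NF: the phones column holds a set of values.
CREATE TABLE customers (
  id     INT PRIMARY KEY,
  name   VARCHAR(100),
  phones VARCHAR(255)        -- e.g. '555-1111,555-2222'
);

-- Normalized: one phone number per row in a separate table.
CREATE TABLE customer_phones (
  customer_id INT,
  phone       VARCHAR(20),
  PRIMARY KEY (customer_id, phone),
  FOREIGN KEY (customer_id) REFERENCES customers (id)
);
```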
1NF → 2NF → 3NF → Boyce-Codd NF → 4NF → 5NF → Domain-Key NF → 6NF
Boyce-Codd Normal Form
Boyce-Codd normal form (BCNF) is a slightly stronger version of the third normal form (3NF). If a relational schema is in BCNF, then all
redundancy based on functional dependency has been removed, although other types of redundancy may still exist.
Fourth Normal Form
Fourth normal form (4NF) is a level of database normalization where there are no non-trivial multivalued dependencies other than a
candidate key.
Fifth Normal Form
Fifth normal form (5NF) is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued
facts by isolating semantically related multiple relationships.
Domain-key Normal Form
Domain-key normal form (DK/NF) is a normal form which requires that the database contains no constraints other than domain constraints
and key constraints. A domain constraint specifies the permissible values for a given attribute, while a key constraint specifies the attributes
that uniquely identify a row in a given table.
Sixth Normal Form
Sixth normal form (6NF) is based on an extension of the relational algebra. Relational operators, such as join, are generalized to support a
natural treatment of interval data, such as sequences of dates or moments in time, for instance in temporal databases.
MySQL's Storage Engines
MySQL stores each database (also called a schema) as a subdirectory of its data directory in the underlying filesystem. Before MySQL 8.0,
when you created a table, MySQL stored the table definition in a .frm file with the same name as the table; in 8.0 the definition lives in the
transactional data dictionary instead.
InnoDB: The default storage engine in MySQL 8.0. InnoDB is a transaction-safe (ACID compliant) storage engine for MySQL that has
commit, rollback, and crash-recovery capabilities to protect user data. InnoDB row-level locking (without escalation to coarser granularity
locks) and Oracle-style consistent nonlocking reads increase multi-user concurrency and performance. InnoDB stores user data in clustered
indexes to reduce I/O for common queries based on primary keys. To maintain data integrity, InnoDB also supports FOREIGN KEY
referential-integrity constraints.
MyISAM: These tables have a small footprint. Table-level locking limits the performance in read/write workloads, so it is often used in read-
only or read-mostly workloads in Web and data warehousing configurations.
Memory: Stores all data in RAM, for fast access in environments that require quick lookups of non-critical data. This engine was formerly
known as the HEAP engine. Its use cases are decreasing; InnoDB with its buffer pool memory area provides a general-purpose and durable
way to keep most or all data in memory, and NDBCLUSTER provides fast key-value lookups for huge distributed data sets. 
CSV: Its tables are really text files with comma-separated values. CSV tables let you import or dump data in CSV format, to exchange data
with scripts and applications that read and write that same format. Because CSV tables are not indexed, you typically keep the data in
InnoDB tables during normal operation, and only use CSV tables during the import or export stage.
Archive: These compact, unindexed tables are intended for storing and retrieving large amounts of seldom-referenced historical, archived,
or security audit information.
Blackhole: The Blackhole storage engine accepts but does not store data, similar to the Unix /dev/null device. Queries always return an
empty set. These tables can be used in replication configurations where DML statements are sent to slave servers, but the master server
does not keep its own copy of the data. 
NDB (also known as NDBCLUSTER): This clustered database engine is particularly suited for applications that require the highest possible
degree of uptime and availability.
Merge: Enables a MySQL DBA or developer to logically group a series of identical MyISAM tables and reference them as one object. Good
for VLDB environments such as data warehousing.
Federated: Offers the ability to link separate MySQL servers to create one logical database from many physical servers. Very good for
distributed or data mart environments.
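The engine is chosen per table, and SHOW ENGINES lists what a given server supports. A quick sketch (table names are hypothetical):

```sql
SHOW ENGINES;                       -- list available storage engines

CREATE TABLE page_hits (
  id  INT PRIMARY KEY,
  url VARCHAR(255)
) ENGINE = InnoDB;                  -- the default since 5.5

CREATE TABLE session_cache (
  k VARCHAR(64) PRIMARY KEY,
  v VARCHAR(255)
) ENGINE = MEMORY;                  -- RAM-only, non-durable lookups
```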
Execution path of a query
Client → (client/server protocol) → MySQL server: query cache → parser → preprocessor → query optimizer → query execution engine →
storage engine API (MyISAM, InnoDB, ...) → data.

Main phases of query execution:
1. The client sends the SQL statement to the server over the client/server protocol.
2. The server checks the query cache. If there's a hit, it returns the stored result from the cache; otherwise, it passes the SQL statement
to the next step.
3. MySQL's parser breaks the query into tokens and builds a "parse tree" from them. The preprocessor then checks the resulting parse
tree for additional semantics: it checks that tables and columns exist, and checks privileges.
4. The query optimizer turns the parsed tree into a query execution plan: logical transformations, cost-based choice of join order and
access methods, and plan refinement.
5. The query execution engine executes the plan by making calls to the storage engine API, and the result is returned to the client.
Query Cache
The MySQL query cache holds the exact bits that a completed query returned to the client. When a query cache hit occurs, the server can
simply return the stored results immediately, skipping the parsing, optimization, and execution steps. The query cache keeps track of which
tables a query uses, and if any of those tables changes, it invalidates the cache entry.
MySQL does not parse, "normalize", or parameterize a statement when it checks for a cache hit; it uses the statement and other bits of data
exactly as the client sends them. Any difference in character case, spacing, or comments (any difference at all) will prevent a query from
matching a previously cached version.
Another caching consideration is that the query cache will not store a result unless the query that generated it was deterministic. Thus, any
query that contains a nondeterministic function, such as NOW() or CURRENT_DATE() , will not be cached. In fact, the query cache does not
work for queries that refer to user-defined functions, stored functions, user variables, or temporary tables.
Any SELECT query that MySQL doesn't serve from the cache is a cache miss.  A cache miss can occur for any of the following reasons:
The query is not cacheable, either because it contains a nondeterministic construct (such as CURRENT_DATE, NOW, etc) or because
its result set is too large to store.
The server has never seen the query before, so it never had a chance to cache its result.
The query's result was previously cached, but the server removed it. This can happen because there wasn't enough memory to keep it,
because someone instructed the server to remove it, or because it was invalidated.
If your server has a lot of cache misses but very few uncacheable queries, one of the following must be true:
The query cache is not warmed up yet. That is, the server hasn't had a chance to fill the cache with result sets.
The server is seeing queries it hasn't seen before. If you don't have a lot of repeated queries, this can happen even after the cache is
warmed up.
There are a lot of cache invalidations.
Cache invalidations can happen because of fragmentation, insufficient memory, or data modifications.
Query Cache Optimizations
It's more efficient to batch writes than to do them singly, because batching invalidates cached entries only once. You cannot control
the query cache on a per-database or per-table basis, but you can include or exclude individual queries with the SQL_CACHE and
SQL_NO_CACHE modifiers in the SELECT statement. You can also enable or disable the query cache on a per-connection basis by setting
the session-level query_cache_type server variable to the appropriate value.
If you want to avoid the query cache for most queries, but you know that some will benefit significantly from caching, you can set the global
query_cache_type to DEMAND and then add the SQL_CACHE hint to those queries you want to cache. Although this requires you to do
more work, it gives you very fine-grained control over the cache. Conversely, if you want to cache most queries and exclude just a few, you
can add SQL_NO_CACHE to them.
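For example (table names hypothetical; note that the query cache and these modifiers were removed in MySQL 8.0):

```sql
-- With query_cache_type = DEMAND, only explicitly marked queries are cached:
SELECT SQL_CACHE id, name FROM customers WHERE id = 42;

-- With caching on by default, opt a volatile query out:
SELECT SQL_NO_CACHE COUNT(*) FROM page_hits;
```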
MySQL's parser breaks the query into tokens and builds a "parse tree" from them. The parser uses  MySQL's SQL grammar to interpret and
validate the query. For instance, it ensures that the tokens in the query are valid and in the proper order, and it checks for mistakes such as
quoted strings that aren't terminated. 
The preprocessor then checks the resulting parse tree for additional semantics that the parser can't resolve. For example, it checks that
tables and columns exist, and it resolves names and aliases to ensure that column references aren't ambiguous. Next, the preprocessor
checks privileges.
The parser and the preprocessor
Query optimizer
The parse tree is now valid and ready for the optimizer to turn it into a query execution plan. A query can often be executed many different
ways and produce the same result. The optimizer's job is to find the best option. MySQL uses a cost-based optimizer, which means it tries to
predict the cost of various execution plans and choose the least expensive. The unit of cost was originally a single random 4 KB data page
read, but it has become more sophisticated and now includes factors such as the estimated cost of executing a WHERE clause comparison.
You can see how expensive the optimizer estimated a query to be by running the query, then inspecting the Last_query_cost session
variable.
The cost is calculated based on various statistical information: the number of pages per table or index, the cardinality (number of distinct
values) of the indexes, the length of the rows and keys, and the key distribution. The optimizer does not include the effects of any type of
caching in its estimates - it assumes every read will result in a disk I/O operation.
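For example (the table is hypothetical), you can run a query and then ask the server what the optimizer estimated it would cost, in units of random page reads:

```sql
SELECT COUNT(*) FROM orders WHERE status = 'shipped';
SHOW SESSION STATUS LIKE 'Last_query_cost';
```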
The optimizer might not always choose the best plan, for many reasons:
The statistics could be wrong. The server relies on storage engines to provide statistics, and they can range from exactly correct to
wildly inaccurate. For example, the InnoDB storage engine doesn't maintain accurate statistics about the number of rows in a table
because of its MVCC architecture.
The cost metric is not exactly equivalent to the true cost of running the query, so even when the statistics are accurate, the query might
be more or less expensive than MySQL's approximation. A plan that reads more pages might actually be cheaper in some cases, such
as when the reads are sequential so the disk I/O is faster, or when the pages are already cached in memory. MySQL also doesn't
understand which pages are in memory and which pages are on disk, so it doesn't really know how much I/O the query will cause.
MySQL doesn't consider other queries that are running concurrently, which can affect how quickly the query runs.
The optimizer doesn't take into account the cost of operations not under its control, such as executing stored functions or user-defined
functions.
MySQL's query optimizer is a highly complex piece of software, and it uses many optimizations to transform the query into an execution plan.
There are two basic types of optimizations, which we call static and dynamic.
Static optimizations can be performed simply by inspecting the parse tree. For example, the optimizer can transform the WHERE clause
into an equivalent form by applying algebraic rules. Static optimizations are independent of values, such as the value of a constant in a
WHERE clause. They can be performed once and will always be valid, even when the query is reexecuted with different values.
Dynamic optimizations are based on context and can depend on many factors, such as which value is in a WHERE clause or how many
rows are in an index. They must be reevaluated each time the query is executed.
Optimizing SQL Statements
The core logic of a database application is performed through SQL statements, whether issued directly through an interpreter or submitted
behind the scenes through an API. The guidelines cover SQL operations that read and write data, and the behind-the-scenes overhead of
SQL operations in general. This section discusses optimizations that can be made for processing WHERE clauses.
The examples use SELECT statements, but the same optimizations apply for WHERE clauses in DELETE and UPDATE statements. 
Removal of unnecessary parentheses:
((a AND b) AND c OR (((a AND b) AND (c AND d))))
→ (a AND b AND c) OR (a AND b AND c AND d)

Constant folding:
(a < b AND b = c) AND a = 5
→ b > 5 AND b = c AND a = 5

Constant condition removal:
(b >= 5 AND b = 5) OR (b = 6 AND 5 = 5) OR (b = 7 AND 5 = 6)
→ b = 5 OR b = 6

Equality range optimization of many-valued comparisons:
col_name = val1 OR ... OR col_name = valN
→ col_name IN (val1, ..., valN)

MySQL sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(log n) in the size of the
list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists).

Range optimization of row constructor expressions:
SELECT ... FROM t1 WHERE (col_1 = 'a' AND col_2 = 'b') OR (col_1 = 'c' AND col_2 = 'd');
→ SELECT ... FROM t1 WHERE (col_1, col_2) IN (('a', 'b'), ('c', 'd'));
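The IN() binary search described above can be sketched in Python to show why it beats a chain of ORs: the list is sorted once, then each lookup is O(log n) instead of a linear O(n) scan:

```python
from bisect import bisect_left

def in_list(sorted_vals, x):
    """Binary search: return True if x is in the sorted list."""
    i = bisect_left(sorted_vals, x)
    return i < len(sorted_vals) and sorted_vals[i] == x

vals = sorted([7, 3, 42, 19, 5])          # sort once, like MySQL's IN() list
print(in_list(vals, 19), in_list(vals, 8))  # → True False
```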
for each row in t1 matching range {
  for each row in t2 matching reference key {
    for each row in t3 {
      if row satisfies join conditions, send to client
    }
  }
}
SQL JOIN
Nested-Loop Join Algorithm. A simple nested-loop join (NLJ) algorithm reads rows from the first table in a loop one at a time, passing each
row to a nested loop that processes the next table in the join. This process is repeated as many times as there remain tables to be joined.
Use indexes on the joined columns to optimize performance when retrieving rows from other tables during a join. MySQL can use
indexes on columns more efficiently if they are declared as the same type and size.
Assume that a join between three tables t1, t2, and t3 is to be executed using the following join types:

Table   Join Type
t1      range
t2      ref
t3      ALL

If a simple NLJ algorithm is used, the join is processed as in the pseudocode above.
A FULL JOIN can't be executed with nested loops and backtracking as soon as a table with no matching rows is found, because the join
might begin with a table that has no matching rows. This explains why MySQL doesn't support FULL JOIN.
If you need a FULL JOIN in MySQL, you can combine a LEFT JOIN and a RIGHT JOIN with UNION:
SELECT <select list>
FROM table_a AS A
LEFT JOIN table_b AS B ON A.key = B.key
UNION
SELECT <select list>
FROM table_a AS A
RIGHT JOIN table_b AS B ON A.key = B.key
[Swim-lane diagram illustrating retrieving rows using a join: one loop over table_a and, for each of its rows, N loops over table_b,
producing the joined results.]
MySQL's query execution plans always take the form of a left-deep tree.
MySQL executes joins between tables using a nested-loop algorithm or variations on it. 
SQL TYPES OF JOIN
Select all records from Table A and Table B, where the
join condition is met.
Select all records from Table A, along with records from
Table B for which the join condition is met (if at all).
Select all records from Table B, along with records from
Table A for which the join condition is met (if at all).
Select all records from Table A and Table B, regardless
of whether the join condition is met or not.
Select records from Table A which do not exist in Table B.
Select records from Table B which do not exist in Table A.
Select all records from Table A and Table B, excluding the common (intersecting) rows.
-- INNER JOIN:
SELECT <select list>
FROM table_a AS A
INNER JOIN table_b AS B ON A.Key = B.Key;

-- LEFT JOIN:
SELECT <select list>
FROM table_a AS A
LEFT JOIN table_b AS B ON A.Key = B.Key;

-- RIGHT JOIN:
SELECT <select list>
FROM table_a AS A
RIGHT JOIN table_b AS B ON A.Key = B.Key;

-- LEFT JOIN excluding matches (rows in A with no match in B):
SELECT <select list>
FROM table_a AS A
LEFT JOIN table_b AS B ON A.Key = B.Key
WHERE B.Key IS NULL;

-- RIGHT JOIN excluding matches (rows in B with no match in A):
SELECT <select list>
FROM table_a AS A
RIGHT JOIN table_b AS B ON A.Key = B.Key
WHERE A.Key IS NULL;

-- FULL JOIN (not supported by MySQL; emulate with UNION as shown earlier):
SELECT <select list>
FROM table_a AS A
FULL JOIN table_b AS B ON A.Key = B.Key;

-- FULL JOIN excluding common rows:
SELECT <select list>
FROM table_a AS A
FULL JOIN table_b AS B ON A.Key = B.Key
WHERE A.Key IS NULL OR B.Key IS NULL;
MySQL Indexes
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read
through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in
question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. This is much
faster than reading every row sequentially. Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees.
Exceptions: Indexes on spatial data types use R-trees; MEMORY tables also support hash indexes; InnoDB uses inverted lists for FULLTEXT
indexes. 
B-Tree indexes
When people talk about an index without mentioning a type, they are
probably referring to a B-Tree index. Most of MySQL's storage engines
support this index type. 
B-Tree index speeds up data access because the storage engine
doesn't have to scan the whole table to find the desired data.
Instead, it starts at the root node, the slots in the root node hold pointers
to child nodes, and the storage engine follows these pointers. It finds
the right pointer by looking at the values in the node pages.
Eventually, the storage engine either determines that the desired value
doesn't exist or successfully reaches a leaf page, which has pointers
to the indexed data.
[B+Tree example linking the keys 1-7 to data values d1-d7: an internal node (3, 5) points to leaf pages holding keys 1-2, 3-4, and 5-7.]
Index selectivity is the ratio of the number of distinct indexed values (the cardinality) to the total number of rows in the table (#N), and
ranges from 1/#N to 1. A highly selective index is good because it lets MySQL filter out more rows when it looks for matches. A unique index
has a selectivity of 1, which is as good as it gets.
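The ratio can be estimated directly in SQL (the column and table names here are hypothetical):

```sql
SELECT COUNT(DISTINCT last_name) / COUNT(*) AS selectivity
FROM people;
```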
Clustered Indexes. When a table has a clustered index, its rows are actually stored in the index's leaf pages. The term "clustered" refers to
the fact that rows with adjacent key values are stored close to each other. You can have only one clustered index per table.
If you don't define a primary key, InnoDB will try to use a unique nonnullable index instead. If there's no such index, InnoDB will define a
hidden primary key for you and then cluster on that. InnoDB clusters records together only within a page. Pages with adjacent key values
might be distant from each other.
A clustering primary key can help performance.
Types of queries that can use index.
B-Tree indexes work well for lookups by the full key value, a key range, or a key prefix.
INDEX(A, B, C)   -- a multicolumn index; any leftmost prefix of the index
                 -- can be used by the optimizer to look up rows

Uses the full index or a left part of it:

WHERE A = 'a' AND B = 'b' AND C = 'c';        -- match on the full key value:
                                              -- uses columns A, B, C
WHERE A = 'a' AND B IN ('a','b') AND C > 'c'; -- match one part exactly and a range
                                              -- on another: uses A, B, C (range)
WHERE A = 'a' AND B >= 'b' AND C = 'c';       -- match a left part of the index:
                                              -- uses A, B (range)
WHERE A >= 'a' AND B = 'b' AND C = 'c';       -- match a leftmost prefix:
                                              -- uses only column A (range)

MySQL stops using key parts in a multipart index as soon as it meets a real range (<, >, BETWEEN); it can, however, continue using key
parts further to the right if an IN(...) range is used.
Covering Indexes
Indexes need to be designed for the whole query, not just the WHERE clause. Indexes are indeed a way to find rows efficiently, but MySQL
can also use an index to retrieve a column's data, so it doesn't have to read the row at all. The index's leaf nodes contain the values they
index; why read the row when reading the index can give you the data you want? An index that contains (or "covers") all the data needed to
satisfy a query is called a covering index.
More examples against INDEX(A, B, C):

WHERE B = 2 AND C = 3;    -- leading column A is not referenced: the index is not used
WHERE C = 3;              -- leading columns A and B are not referenced: the index is not used
WHERE A LIKE 'abc%';      -- prefix match on the leftmost column: the index is used
WHERE A LIKE '%abc';      -- leading wildcard: the index is not used
WHERE A LIKE '%abc%';     -- leading wildcard: the index is not used
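A minimal sketch of a covering index (the table and index names are hypothetical): when an index holds every column a query touches, EXPLAIN reports "Using index" in its Extra column:

```sql
CREATE INDEX idx_ab ON t (A, B);

EXPLAIN SELECT A, B FROM t WHERE A = 'a';
-- Extra: Using index  (the query is satisfied from the index alone,
--                      without reading the rows)
```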
Main types of indexes
   Primary Key. The primary key for a table is the column or set of columns that uniquely identifies each row. Query performance benefits
from the NOT NULL optimization, because a primary key cannot include any NULL values. With the InnoDB storage engine, the table data is
physically organized to do ultra-fast lookups and sorts based on the primary key column or columns. If the table has no natural primary key,
you might create a separate column with auto-increment values to use as the primary key. These unique IDs can serve as pointers to
corresponding rows in other tables when you join tables using foreign keys.
   Foreign key. A foreign key is a field in a table that matches another field of another table. A foreign key places constraints on data in the
related tables, which enables MySQL to maintain referential integrity.
[CONSTRAINT	[symbol]]	FOREIGN	KEY
[index_name](col_name,...)
REFERENCES	tbl_name(col_name,...)
[ON	DELETE	reference_option]
[ON	UPDATE	reference_option]
reference_option:
				RESTRICT	|	CASCADE	|	SET	NULL	|	NO	ACTION	|	SET	DEFAULT
For storage engines supporting foreign keys, MySQL rejects any INSERT or UPDATE operation that attempts to create a foreign key value in
a child table if there is no matching candidate key value in the parent table. When an UPDATE or DELETE operation affects a key value in
the parent table that has matching rows in the child table, the result depends on the referential action specified using the ON UPDATE and ON
DELETE subclauses of the FOREIGN KEY clause. MySQL supports five options regarding the action to be taken, listed here:
CASCADE: Delete or update the row from the parent table, and automatically delete or update the matching rows in the child table. Both
ON DELETE CASCADE and ON UPDATE CASCADE are supported. Between two tables, do not define several ON UPDATE CASCADE
clauses that act on the same column in the parent table or in the child table.
SET NULL: Delete or update the row from the parent table, and set the foreign key column or columns in the child table to NULL. Both
ON DELETE SET NULL and ON UPDATE SET NULL clauses are supported.
RESTRICT: Rejects the delete or update operation for the parent table. Specifying RESTRICT (or NO ACTION) is the same as omitting
the ON DELETE or ON UPDATE clause.
NO ACTION: A keyword from standard SQL. In MySQL, equivalent to RESTRICT. The MySQL Server rejects the delete or update
operation for the parent table if there is a related foreign key value in the referenced table. Some database systems have deferred
checks, and NO ACTION is a deferred check. In MySQL, foreign key constraints are checked immediately, so NO ACTION is the same
as RESTRICT.
SET DEFAULT: This action is recognized by the MySQL parser, but both InnoDB and NDB reject table definitions containing ON
DELETE SET DEFAULT or ON UPDATE SET DEFAULT clauses.
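The referential actions above can be seen with a tiny experiment. A sketch using Python's built-in sqlite3 module (SQLite enforces foreign keys only after PRAGMA foreign_keys = ON; the table names are invented for the example, and the same DDL shape works with MySQL/InnoDB):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite: FK checks are off by default
con.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY);
CREATE TABLE child (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER,
    FOREIGN KEY (parent_id) REFERENCES parent (id) ON DELETE CASCADE
);
INSERT INTO parent VALUES (1);
INSERT INTO child  VALUES (10, 1);
""")

# An orphan row is rejected: no matching candidate key in the parent table.
try:
    con.execute("INSERT INTO child VALUES (11, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# ON DELETE CASCADE: deleting the parent removes the matching child rows.
con.execute("DELETE FROM parent WHERE id = 1")
children_left = con.execute("SELECT COUNT(*) FROM child").fetchone()[0]

print(orphan_rejected, children_left)  # True 0
```

Swapping ON DELETE CASCADE for SET NULL or RESTRICT in the DDL changes the outcome of the DELETE accordingly.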
   Prefix Indexes. Sometimes you need to index very long character columns, which makes your indexes large and slow. You can save space
and still get good performance by indexing the first few characters instead of the whole value. This makes your indexes use less space, but it
also makes them less selective.
The trick is to choose a prefix that is long enough to give good selectivity, but short enough to save space. One way to find a good prefix
length is to compute the full column's selectivity and try to make the prefix's selectivity close to that value:
SELECT COUNT(DISTINCT name)/COUNT(*)          AS sel_complete,
       COUNT(DISTINCT LEFT(name, 3))/COUNT(*) AS sel_3,
       COUNT(DISTINCT LEFT(name, 4))/COUNT(*) AS sel_4,
       COUNT(DISTINCT LEFT(name, 5))/COUNT(*) AS sel_5,
       COUNT(DISTINCT LEFT(name, 6))/COUNT(*) AS sel_6,
       COUNT(DISTINCT LEFT(name, 7))/COUNT(*) AS sel_7,
       COUNT(DISTINCT LEFT(name, 8))/COUNT(*) AS sel_8
FROM table;

sel_complete  sel_3  sel_4  sel_5  sel_6  sel_7  sel_8
0.317         0.105  0.188  0.245  0.309  0.317  0.317

A seven-character prefix (sel_7 = 0.317) is as selective as the full column, so the prefix index can be created as:
ALTER TABLE table ADD KEY(name(7));

Foreign key. Example
customers: id, name, date, ...
orders: order_id, order_name, datetime, customer_id, ...
We have two tables: customers and orders. Each customer has zero or more orders,
and each order belongs to only one customer. The relationship between the customers
table and the orders table is one-to-many, and it is established by a foreign key in the
orders table specified by the customer_id field. The customer_id field in the orders
table refers to the id primary key field in the customers table.
The customers table is called the parent table or referenced table, and the orders table
is known as the child table or referencing table. A foreign key can be a column or a set
of columns. The columns in the child table often refer to the primary key columns in
the parent table. A table may have more than one foreign key, and each foreign key
in the child table may refer to a different parent table.
   Column Indexes. The most common type of index involves a single column, storing copies of the values from that column in a data
structure, allowing fast lookups for the rows with the corresponding column values. The B-tree data structure lets the index quickly find a
specific value, a set of values, or a range of values, corresponding to operators such as =, >, ≤, BETWEEN, IN, and so on, in a WHERE
clause.
   Multiple-Column Indexes. MySQL can create composite indexes (that is, indexes on multiple columns). An index may consist of up to
16 columns. A multiple-column index can be considered a sorted array, the rows of which contain values that are created by concatenating
the values of the indexed columns.
   FULLTEXT Indexes. FULLTEXT indexes are used for full-text searches. Only the InnoDB and MyISAM storage engines support
FULLTEXT indexes, and only for CHAR, VARCHAR, and TEXT columns. Indexing always takes place over the entire column; column
prefix indexing is not supported. For queries that contain full-text expressions, MySQL evaluates those expressions during the optimization
phase of query execution. The optimizer does not just look at full-text expressions and make estimates; it actually evaluates them in the
process of developing an execution plan.
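The prefix-index selectivity calculation shown earlier is simple enough to sketch in plain Python (the sample names are made up; real data will give different numbers):

```python
def selectivity(values, prefix_len=None):
    """Fraction of distinct values: COUNT(DISTINCT LEFT(name, n)) / COUNT(*)."""
    cut = [v[:prefix_len] if prefix_len else v for v in values]
    return len(set(cut)) / len(cut)

names = ["anderson", "andrews", "andrea", "smith", "smithson", "smythe"]

full = selectivity(names)     # every full name is unique
sel3 = selectivity(names, 3)  # only "and", "smi", "smy" remain distinct
sel6 = selectivity(names, 6)  # a 6-char prefix is as selective as the full value

print(full, sel3, sel6)  # 1.0 0.5 1.0
```

A selectivity of 1.0 means every row is distinguishable by the (prefix of the) column; the shortest prefix whose selectivity approaches the full column's is the one worth indexing.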
Replication Overview
[Diagram: the client writes to the master; data changes are recorded in the master's binary logs; the slave reads them into its relay logs and replays them; clients read from both.]
Replication enables data from one MySQL database server (the master) to be copied to one or more MySQL database servers (the slaves).
Replication is asynchronous by default. Depending on the configuration, you can replicate all databases, selected databases, or even
selected tables within a database. Advantages of replication in MySQL include: Scale-out solutions, Data security, Analytics.
How Replication Works
1. The master records changes to its data in its binary log. (These records are
called binary log events.) Just before each transaction that updates data
completes on the master, the master records the changes in its binary log.
2. The replica copies the master's binary log events to its relay log. To begin, it
starts a worker thread, called the I/O slave thread. The I/O thread opens an
ordinary client connection to the master, then starts a special binlog dump
process. The binlog dump process reads events from the master's binary log.
If it catches up to the master, it goes to sleep and waits for the master to
signal it when there are new events. The I/O thread writes the events to the
replica's relay log.
3. The replica replays the events in the relay log, applying the changes to its
own data. The SQL slave thread reads and replays events from the relay log,
thus updating the replica’s data to match the master's.  The events the SQL
thread executes can optionally go into the replica's own binary log.
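The three steps can be caricatured as a toy model in Python: a master that appends change events to a "binlog", and a replica whose I/O step copies new events into a relay log and whose SQL step replays them. This only illustrates the data flow, not how mysqld actually implements the threads:

```python
class Master:
    def __init__(self):
        self.data = {}
        self.binlog = []   # step 1: changes are recorded in the binary log

    def execute(self, key, value):
        self.data[key] = value
        self.binlog.append((key, value))

class Replica:
    def __init__(self):
        self.data = {}
        self.relay_log = []
        self.replayed = 0

    def io_thread(self, master):
        # step 2: copy events not yet seen from the master's binlog
        # into the local relay log
        self.relay_log.extend(master.binlog[len(self.relay_log):])

    def sql_thread(self):
        # step 3: replay relay-log events against the replica's own data
        while self.replayed < len(self.relay_log):
            key, value = self.relay_log[self.replayed]
            self.data[key] = value
            self.replayed += 1

master, replica = Master(), Replica()
master.execute("accounts:1", 900)
master.execute("accounts:2", 1100)
replica.io_thread(master)
replica.sql_thread()
print(replica.data == master.data)  # True
```

Because the copy and replay steps run independently of the master's writes, the model also shows why replication is asynchronous by default: the replica lags until both steps have caught up.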
Replication Formats
With statement-based binary logging, the master writes SQL statements to the binary log. Replication from the master to the slave works by
executing those SQL statements on the slave. This is called statement-based replication (SBR), which corresponds to the MySQL
statement-based binary logging format.
With row-based logging, the master writes events to the binary log that indicate how individual table rows are changed. Replication from the
master to the slave works by copying the events representing the changes to the table rows to the slave. This is called row-based
replication (RBR). Row-based logging is the default method.
With mixed-format logging, you can configure MySQL to use a mix of both statement-based and row-based logging, depending on which
is most appropriate for the change to be logged. When using mixed-format logging, a statement-based log is used by default. Depending
on the statement, and also the storage engine being used, the log is automatically switched to row-based in particular cases.
Replication using the mixed format is referred to as mixed-based replication or mixed-format replication.
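The active format is controlled by the binlog_format system variable; a sketch (changing it requires appropriate privileges, and in recent MySQL 8.0 releases the variable is deprecated in favour of always using row-based logging):

```sql
-- Inspect the current binary log format
SHOW VARIABLES LIKE 'binlog_format';

-- Switch the format (ROW | STATEMENT | MIXED) for new sessions
SET GLOBAL binlog_format = 'ROW';
```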
Replication Topologies
Single. This is the most straightforward MySQL replication topology.
One master receives writes; one or more (read-only) slaves replicate from the
same master via asynchronous or semi-synchronous replication. If
the designated master goes down, the most up-to-date slave must
be promoted as the new master. The remaining slaves resume
replication from the new master.
Chain. This setup uses an intermediate master to act as a relay to the other
slaves in the replication chain. When there are many slaves connected to
a master, the network interface of the master can get overloaded.
This topology allows the read replicas to pull the replication stream from
the relay server to offload the master server.
Circular. The ring topology requires two or more MySQL servers which act
as masters. All masters receive writes and generate binlogs, with a few caveats:
set the auto-increment offset on each server to avoid primary key collisions,
and there is no conflict resolution.
Common practice is to write to only one master while the other master acts
as a hot-standby node. Still, if you have slaves below that tier, you have to
switch to the new master manually if the designated master fails.
Master with Backup Master (Multiple Replication). The master pushes changes
to a backup master and to one or more slaves. Semi-synchronous replication is
used between the master and the backup master: the backup master gets the
update, writes it to its relay log, and flushes it to disk. This topology works well
when performing master failover in case the master goes down. The backup
master acts as a warm-standby server, as it has the highest probability of having
up-to-date data when compared to the other slaves.

MySQL Overview

  • 1. MySQL MySQL Overview A.Sidelev 2019 Architecture Transactions Replication Indexing Optimization Storage Engines Query Execution Isolation Levels Normalization
  • 2. MySQL Timeline Sun Microsystems: 2008 - 2010MySQL AB: until 2008 Oracle: since 2010 2001 2003 2005 2006 2008 2010 2013 20182015 Version 3.23 Version 4.0 Version 5.5 Version 4.1 Version 5.6 Version 5.7 Version 8.0Version 5.1 Version 5.0 Version 6.0 Query cache InnoDB a standard part of the server Replication was rewritten to use two threads Support for UNION and multi-table DELETE statements Introduced views Introduced triggers Introduced stored procedures ISAM engine was removed completely Add replication Add full-text indexing MyISAM Support InnoDB plugin Add Subqueries Add INSERT ON DUPLICATE KEY UPDATE Support UTF-8 character New binary protocol and prepared statement support Introduced partitioning Add row-based replication Add variety of plugin APIs BerkeleyDB storage engine was removed Optimized performance, scalability, replication InnoDB default storage engine Add database PERFORMANCE_SCHEMA APIs for replication, authentication Improvement query optimizer More plugin APIs Implement replication GTID Delayed slave  Multi source replication Added JSON support Improved InnoDB scalability Canceled SQL Window functions JSON Extended syntax Default character utf8mb4 Better at Read/Write workloads SQL Roles MySQL
  • 3. MySQL Architecture Client connecors JDBC, ODBC, PHP, Python, .NET, Perl, Ruby, Native C API, ...   Services and utilities Connection pool Storage engines Connection Handling, Thread Reuse, Authentication, Security, Connections Limits, Caches Backup & Recovery Administration Security Replication Cluster Migration & Metadata Monitoring SQL Interface DML, DDL Stored procedures Triggers Views Parser Queries translation Syntactic analyzer Lexical semantic Code generation Object privileges Optimizer Rewriting Statistic Order of scanning Usege Indexes Caches Global and Engines specific caches and buffers MyISAMInnoDB Memory Archive Federated CSV Merge Blackhole File system Files and Logs NTFS, ext4, SAN, NAS Binary, Error, Slow, General, Redo, Undo, Data, Index MySQL SERVER MySQL
  • 4. Transactions ACID - an acronym standing for atomicity, consistency, isolation, and durability. These properties are all desirable in a database system, and are all closely tied to the notion of a transaction. The transactional features of InnoDB adhere to the ACID principles. Transactions are atomic units of work that can be committed or rolled back. When a transaction makes multiple changes to the database, either all the changes succeed when the transaction is committed, or all the changes are undone when the transaction is rolled back. A banking application is the classic example of why transactions are necessary, for example transfer money between two accounts: START TRANSACTION; UPDATE accounts.balance SET balance = balance - 100.00 WHERE customer_id = 1; UPDATE accounts.balance SET balance = balance + 100.00 WHERE customer_id = 2; COMMIT; The entire operation should be wrapped in a transaction so that if any one of the steps fails, any completed steps can be rolled back. You start a transaction with the START TRANSACTION statement and then either make its changes permanent with COMMIT or discard the changes with ROLLBACK . What happens if the database server crashes while performing line 3? The customer probably just lost $100. And what if another process comes along between lines 2 and 3 and removes the entire checking account balance? The bank has given the customer a $100 credit without even knowing it. Transactions aren't enough unless the system passes the ACID test. A: atomicity - transaction must function as a single indivisible unit of work so that the entire transaction is either applied or rolled back. When transactions are atomic, there is no such thing as a partially completed transaction: it's all or nothing. C: consistency - the database should always move from one consistent state to the next. In our example, consistency ensures that a crash between lines 2 and 3 doesn't result in $100 disappearing from the checking account. 
Because the transaction is never committed, none of the transaction’s changes are ever reflected in the database. I: isolation - the results of a transaction are usually invisible to other transactions until the transaction is complete. This ensures that if a bank account summary runs after line 2 but before line 3 in our example, it will still see the $100 in the checking account. When we discuss isolation levels, you’ll understand why we said usually invisible. D: durability - once committed, a transaction’s changes are permanent. This means the changes must be recorded such that data won’t be lost in a system crash. Deadlocks. A deadlock is when two or more transactions are mutually holding and requesting locks on the same resources, creating a cycle of dependencies. Deadlocks occur when transactions try to lock resources in a different order. They can happen whenever multiple transactions lock the same resources. To combat this problem, database systems implement various forms of deadlock detection and timeouts. The more sophisticated systems, such as the InnoDB storage engine, will notice circular dependencies and return an error instantly. MySQL
  • 5. Isolation Levels Isolation level Dirty reads Phantom reads Locking readsNonrepeatable reads READ UNCOMMITTED READ COMMITTED REPEATABLE READ SERIALIZABLE READ UNCOMMITTED In the READ UNCOMMITTED isolation level, transactions can view the results of uncommitted transactions. At this level, many problems can occur unless you really, really know what you are doing and have a good reason for doing it. Reading uncommitted data is also known as a dirty read. READ COMMITTED READ COMMITTED satisfies the simple definition of isolation used earlier: a transaction will see only those changes made by transactions that were already committed when it began, and its changes won't be visible to others until it has committed. This level still allows what's known as a nonrepeatable read. This means you can run the same statement twice and see different data. REPEATABLE READ REPEATABLE READ is MySQL's default transaction isolation level. REPEATABLE READ guarantees that any rows a transaction reads will "look the same" in subsequent reads within the same transaction, but in theory it still allows another tricky problem: phantom reads. Simply put, a phantom read can happen when you select some range of rows, another transaction inserts a new row into the range, and then you select the same range again; you will then see the new "phantom" row. SERIALIZABLE The highest level of isolation, SERIALIZABLE, solves the phantom read problem by forcing transactions to be ordered so that they can't possibly conflict. SERIALIZABLE places a lock on every row it reads. At this level, a lot of timeouts and lock contention can occur.  The SQL standard defines four isolation levels, with specific rules for which changes are and aren't visible inside and outside a transaction. Lower isolation levels typically allow higher concurrency and have lower overhead. Summarizes the various isolation levels and the drawbacks associated with each one: MySQL
  • 6. Multiversion Concurrency Control

InnoDB does not use a simple row-locking mechanism. Instead, it uses row-level locking in conjunction with multiversion concurrency control (MVCC). MVCC works by keeping a snapshot of the data as it existed at some point in time. This means transactions can see a consistent view of the data, no matter how long they run.

InnoDB implements MVCC by storing with each row two additional, hidden values that record when the row was created (inserted) and when it was expired (deleted). Each value is a system version number, which increments each time a transaction begins. Each transaction keeps its own record of the current system version, as of the time it began, and each query has to check each row's version numbers against the transaction's version. MVCC works only with the REPEATABLE READ and READ COMMITTED isolation levels.

SELECT
InnoDB must examine each row to ensure that it meets two criteria:
1. InnoDB must find a version of the row that is at least as old as the transaction (i.e., its creation version must be less than or equal to the transaction's version). This ensures that either the row existed before the transaction began, or the transaction created or altered the row.
2. The row's deletion version must be undefined or greater than the transaction's version. This ensures that the row wasn't deleted before the transaction began.
Rows that pass both tests may be returned as the query's result.

INSERT
InnoDB records the current system version number with the new row.

DELETE
InnoDB records the current system version number as the row's deletion ID.

UPDATE
InnoDB writes a new copy of the row, using the system version number for the new row's version. It also writes the system version number as the old row's deletion version.

The result of all this extra record keeping is that most read queries never acquire locks. They simply read data as fast as they can, making sure to select only rows that meet the criteria. The drawbacks are that the storage engine has to store more data with each row, do more work when examining rows, and handle some additional housekeeping operations.
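The two SELECT visibility criteria above can be sketched in a few lines. This is a toy model with illustrative names; InnoDB's real implementation uses roll pointers and undo logs rather than explicit per-row fields like these:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RowVersion:
    data: str
    created: int                    # system version when the row was inserted
    deleted: Optional[int] = None   # system version when the row was deleted

def visible(row: RowVersion, txn_version: int) -> bool:
    """A row version is visible if it was created at or before the transaction
    began (criterion 1) and not deleted before it began (criterion 2)."""
    created_ok = row.created <= txn_version
    not_deleted = row.deleted is None or row.deleted > txn_version
    return created_ok and not_deleted

rows = [
    RowVersion("old row", created=5),
    RowVersion("deleted before txn", created=3, deleted=8),
    RowVersion("inserted after txn began", created=12),
]
txn = 10
print([r.data for r in rows if visible(r, txn)])  # ['old row']
```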
  • 7. Database normalization

Database normalization is a process by which an existing schema is modified to bring its component tables into compliance with a series of progressive normal forms. The concept of database normalization was first introduced by Edgar Frank Codd in his paper "A Relational Model of Data for Large Shared Data Banks".

The normal forms build on one another: 1NF, 2NF, 3NF, Boyce-Codd NF, 4NF, 5NF, Domain-Key NF, 6NF.

First Normal Form
The first normal form (1NF) requires that the values in each column of a table are atomic. By atomic we mean that there are no sets of values within a column. One method for bringing a table into first normal form is to separate the entities contained in the table into separate tables.

Second Normal Form
The second normal form (2NF) requires that any non-key column depend on the entire primary key. In the case of a composite primary key, this means that a non-key column cannot depend on only part of the composite key.

Third Normal Form
The third normal form (3NF) requires that all columns depend directly on the primary key. Tables violate the third normal form when one column depends on another column, which in turn depends on the primary key (a transitive dependency).

Boyce-Codd Normal Form
Boyce-Codd normal form (BCNF) is a slightly stronger version of the third normal form. If a relational schema is in BCNF, then all redundancy based on functional dependency has been removed, although other types of redundancy may still exist.

Fourth Normal Form
Fourth normal form (4NF) is a level of database normalization where there are no non-trivial multivalued dependencies other than a candidate key.

Fifth Normal Form
Fifth normal form (5NF) is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships.
Domain-Key Normal Form
Domain-key normal form (DK/NF) requires that the database contain no constraints other than domain constraints and key constraints. A domain constraint specifies the permissible values for a given attribute, while a key constraint specifies the attributes that uniquely identify a row in a given table.

Sixth Normal Form
Sixth normal form (6NF) is based on an extension of the relational algebra in which relational operators, such as join, are generalized to support a natural treatment of interval data, such as sequences of dates or moments in time, for instance in temporal databases.
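A transitive dependency, the problem 3NF removes, can be shown with a small worked example. The tables and values here are hypothetical: in the flat table, city depends on customer_id, which in turn depends on the key order_id:

```python
# Hypothetical denormalized table: city depends on customer_id, not on the
# primary key order_id directly (a transitive dependency), so it repeats.
flat = [
    {"order_id": 1, "customer_id": 10, "city": "Minsk"},
    {"order_id": 2, "customer_id": 10, "city": "Minsk"},    # city duplicated
    {"order_id": 3, "customer_id": 20, "city": "Vilnius"},
]

# 3NF decomposition: move the customer -> city dependency to its own table.
orders = [{"order_id": r["order_id"], "customer_id": r["customer_id"]} for r in flat]
customers = {r["customer_id"]: r["city"] for r in flat}   # each city stored once

# The original rows are recoverable by joining on customer_id (lossless).
rejoined = [dict(o, city=customers[o["customer_id"]]) for o in orders]
print(rejoined == flat)  # True
```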
  • 8. MySQL's Storage Engines

MySQL stores each database (also called a schema) as a subdirectory of its data directory in the underlying filesystem. Before MySQL 8.0, when you create a table, MySQL stores the table definition in a .frm file with the same name as the table; in 8.0, table definitions moved into the transactional data dictionary.

InnoDB: The default storage engine in MySQL 8.0. InnoDB is a transaction-safe (ACID compliant) storage engine for MySQL that has commit, rollback, and crash-recovery capabilities to protect user data. InnoDB's row-level locking (without escalation to coarser-granularity locks) and Oracle-style consistent nonlocking reads increase multi-user concurrency and performance. InnoDB stores user data in clustered indexes to reduce I/O for common queries based on primary keys. To maintain data integrity, InnoDB also supports FOREIGN KEY referential-integrity constraints.

MyISAM: These tables have a small footprint. Table-level locking limits the performance in read/write workloads, so MyISAM is often used in read-only or read-mostly workloads in Web and data warehousing configurations.

Memory: Stores all data in RAM, for fast access in environments that require quick lookups of non-critical data. This engine was formerly known as the HEAP engine. Its use cases are decreasing; InnoDB with its buffer pool memory area provides a general-purpose and durable way to keep most or all data in memory, and NDBCLUSTER provides fast key-value lookups for huge distributed data sets.

CSV: Its tables are really text files with comma-separated values. CSV tables let you import or dump data in CSV format, to exchange data with scripts and applications that read and write that same format. Because CSV tables are not indexed, you typically keep the data in InnoDB tables during normal operation, and only use CSV tables during the import or export stage.

Archive: These compact, unindexed tables are intended for storing and retrieving large amounts of seldom-referenced historical, archived, or security audit information.
Blackhole: The Blackhole storage engine accepts but does not store data, similar to the Unix /dev/null device. Queries always return an empty set. These tables can be used in replication configurations where DML statements are sent to slave servers, but the master server does not keep its own copy of the data.  NDB (also known as NDBCLUSTER): This clustered database engine is particularly suited for applications that require the highest possible degree of uptime and availability. Merge: Enables a MySQL DBA or developer to logically group a series of identical MyISAM tables and reference them as one object. Good for VLDB environments such as data warehousing. Federated: Offers the ability to link separate MySQL servers to create one logical database from many physical servers. Very good for distributed or data mart environments.
  • 9. Execution path of a query

Main phases of query execution (Client -> MySQL Server -> Storage engines):

1. The client sends the SQL statement to the server over the client/server protocol.
2. The server checks the query cache. If there's a hit, it returns the stored result from the cache; otherwise, it passes the SQL statement to the next step.
3. MySQL's parser breaks the query into tokens and builds a "parse tree" from them. The preprocessor then checks the resulting parse tree for additional semantics: it checks that tables and columns exist, and it checks privileges.
4. The query optimizer turns the parse tree into a query execution plan: logical transformations; cost-based choice of join order and access methods; plan refinement.
5. The query execution engine executes the plan by making calls to the storage engine API (MyISAM, InnoDB, ...), and the result is returned to the client.
  • 10. Query Cache

The MySQL query cache holds the exact bits that a completed query returned to the client. When a query cache hit occurs, the server can simply return the stored results immediately, skipping the parsing, optimization, and execution steps. The query cache keeps track of which tables a query uses, and if any of those tables changes, it invalidates the cache entry. (The query cache was deprecated in MySQL 5.7 and removed in MySQL 8.0.)

MySQL does not parse, "normalize", or parameterize a statement when it checks for a cache hit; it uses the statement and other bits of data exactly as the client sends them. Any difference in character case, spacing, or comments, any difference at all, will prevent a query from matching a previously cached version.

Another caching consideration is that the query cache will not store a result unless the query that generated it was deterministic. Thus, any query that contains a nondeterministic function, such as NOW() or CURRENT_DATE(), will not be cached. In fact, the query cache does not work for queries that refer to user-defined functions, stored functions, user variables, or temporary tables.

Any SELECT query that MySQL doesn't serve from the cache is a cache miss. A cache miss can occur for any of the following reasons:
- The query is not cacheable, either because it contains a nondeterministic construct (such as CURRENT_DATE, NOW, etc.) or because its result set is too large to store.
- The server has never seen the query before, so it never had a chance to cache its result.
- The query's result was previously cached, but the server removed it. This can happen because there wasn't enough memory to keep it, because someone instructed the server to remove it, or because it was invalidated.

If your server has a lot of cache misses but very few uncacheable queries, one of the following must be true:
- The query cache is not warmed up yet. That is, the server hasn't had a chance to fill the cache with result sets.
- The server is seeing queries it hasn't seen before. If you don't have a lot of repeated queries, this can happen even after the cache is warmed up.
- There are a lot of cache invalidations. Cache invalidations can happen because of fragmentation, insufficient memory, or data modifications.

Query Cache Optimizations
It's more efficient to batch writes than to do them singly, because batching invalidates cache entries only once. You cannot control the query cache on a per-database or per-table basis, but you can include or exclude individual queries with the SQL_CACHE and SQL_NO_CACHE modifiers in the SELECT statement. You can also enable or disable the query cache on a per-connection basis by setting the session-level query_cache_type server variable to the appropriate value.

If you want to avoid the query cache for most queries, but you know that some will benefit significantly from caching, you can set the global query_cache_type to DEMAND and then add the SQL_CACHE hint to those queries you want to cache. Although this requires you to do more work, it gives you very fine-grained control over the cache. Conversely, if you want to cache most queries and exclude just a few, you can add SQL_NO_CACHE to them.
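The exact-byte matching and table-based invalidation described above can be sketched as a tiny in-memory cache. This is an illustrative model, not MySQL's implementation:

```python
# Minimal sketch of the query cache behaviour: lookup is by the exact
# statement text (no normalization), and any write to a referenced table
# invalidates every cached entry that uses that table.
cache = {}                          # raw statement -> (result, tables)

def cache_get(stmt):
    entry = cache.get(stmt)         # byte-for-byte match only
    return entry[0] if entry else None

def cache_put(stmt, result, tables):
    cache[stmt] = (result, set(tables))

def invalidate(table):
    for stmt in [s for s, (_, tabs) in cache.items() if table in tabs]:
        del cache[stmt]

cache_put("SELECT * FROM users", [("alice",)], ["users"])
print(cache_get("SELECT * FROM users"))   # [('alice',)] -- a hit
print(cache_get("select * from users"))   # None -- case differs, cache miss
invalidate("users")                       # a write to `users` occurred
print(cache_get("SELECT * FROM users"))   # None -- entry was invalidated
```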
  • 11. The parser and the preprocessor

MySQL's parser breaks the query into tokens and builds a "parse tree" from them. The parser uses MySQL's SQL grammar to interpret and validate the query. For instance, it ensures that the tokens in the query are valid and in the proper order, and it checks for mistakes such as quoted strings that aren't terminated.

The preprocessor then checks the resulting parse tree for additional semantics that the parser can't resolve. For example, it checks that tables and columns exist, and it resolves names and aliases to ensure that column references aren't ambiguous. Next, the preprocessor checks privileges.

Query optimizer

The parse tree is now valid and ready for the optimizer to turn it into a query execution plan. A query can often be executed many different ways and produce the same result. The optimizer's job is to find the best option. MySQL uses a cost-based optimizer, which means it tries to predict the cost of various execution plans and choose the least expensive. The unit of cost was originally a single random 4 KB data page read, but it has become more sophisticated and now includes factors such as the estimated cost of executing a WHERE clause comparison. You can see how expensive the optimizer estimated a query to be by running the query, then inspecting the Last_query_cost session variable. The cost is calculated from various statistical information: the number of pages per table or index, the cardinality (number of distinct values) of the indexes, the length of the rows and keys, and the key distribution. The optimizer does not include the effects of any type of caching in its estimates; it assumes every read will result in a disk I/O operation.

The optimizer might not always choose the best plan, for many reasons:
- The statistics could be wrong. The server relies on storage engines to provide statistics, and they can range from exactly correct to wildly inaccurate. For example, the InnoDB storage engine doesn't maintain accurate statistics about the number of rows in a table because of its MVCC architecture.
- The cost metric is not exactly equivalent to the true cost of running the query, so even when the statistics are accurate, the query might be more or less expensive than MySQL's approximation. A plan that reads more pages might actually be cheaper in some cases, such as when the reads are sequential so the disk I/O is faster, or when the pages are already cached in memory.
- MySQL doesn't understand which pages are in memory and which pages are on disk, so it doesn't really know how much I/O the query will cause.
- MySQL doesn't consider other queries that are running concurrently, which can affect how quickly the query runs.
- The optimizer doesn't take into account the cost of operations not under its control, such as executing stored functions or user-defined functions.

MySQL's query optimizer is a highly complex piece of software, and it uses many optimizations to transform the query into an execution plan. There are two basic types of optimizations, which we call static and dynamic. Static optimizations can be performed simply by inspecting the parse tree. For example, the optimizer can transform the WHERE clause into an equivalent form by applying algebraic rules. Static optimizations are independent of values, such as the value of a constant in a WHERE clause. They can be performed once and will always be valid, even when the query is reexecuted with different values. Dynamic optimizations are based on context and can depend on many factors, such as which value is in a WHERE clause or how many rows are in an index. They must be reevaluated each time the query is executed.
  • 12. Optimizing SQL Statements

The core logic of a database application is performed through SQL statements, whether issued directly through an interpreter or submitted behind the scenes through an API. The guidelines cover SQL operations that read and write data, and the behind-the-scenes overhead for SQL operations in general. This section discusses optimizations that can be made for processing WHERE clauses. The examples use SELECT statements, but the same optimizations apply to WHERE clauses in DELETE and UPDATE statements.

Removal of unnecessary parentheses:
((a AND b) AND c OR (((a AND b) AND (c AND d))))
-> (a AND b AND c) OR (a AND b AND c AND d)

Constant folding:
(a < b AND b = c) AND a = 5
-> b > 5 AND b = c AND a = 5

Constant condition removal:
(b >= 5 AND b = 5) OR (b = 6 AND 5 = 5) OR (b = 7 AND 5 = 6)
-> b = 5 OR b = 6

Equality range optimization of many-valued comparisons:
col_name = val1 OR ... OR col_name = valN
-> col_name IN(val1,...,valN)
MySQL sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists).

Range optimization of row constructor expressions:
SELECT ... FROM t1 WHERE (col_1 = 'a' AND col_2 = 'b') OR (col_1 = 'c' AND col_2 = 'd');
-> SELECT ... FROM t1 WHERE (col_1, col_2) IN (('a', 'b'), ('c', 'd'));
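The sorted-list-plus-binary-search strategy behind the IN() optimization can be sketched directly; this is a conceptual illustration of the O(log n) probe, not MySQL's actual code:

```python
import bisect

# Why IN() beats a long OR chain: sort the value list once, then each probe
# is a binary search, O(log n) per comparison instead of O(n).
def in_list(value, values_sorted):
    i = bisect.bisect_left(values_sorted, value)
    return i < len(values_sorted) and values_sorted[i] == value

vals = sorted([7, 3, 99, 12, 5])   # MySQL sorts the IN() list once
print(in_list(12, vals))  # True
print(in_list(4, vals))   # False
```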
  • 13. SQL JOIN

MySQL executes joins between tables using a nested-loop algorithm or variations on it.

Nested-Loop Join Algorithm. A simple nested-loop join (NLJ) algorithm reads rows from the first table in a loop one at a time, passing each row to a nested loop that processes the next table in the join. This process is repeated as many times as there remain tables to be joined. Use indexes on the columns referenced in join conditions to optimize performance when retrieving rows from other tables during a join. MySQL can use indexes on columns more efficiently if they are declared with the same type and size.

Assume that a join between three tables t1, t2, and t3 is to be executed using the following join types:

Table  Join type
t1     range
t2     ref
t3     ALL

If a simple NLJ algorithm is used, the join is processed like this:

for each row in t1 matching range {
  for each row in t2 matching reference key {
    for each row in t3 {
      if row satisfies join conditions, send to client
    }
  }
}

(The slide's swim-lane diagram illustrates this: one loop over the first table, N loops per row over the next.) MySQL's query execution plans always take the form of a left-deep tree.

FULL JOIN can't be executed with nested loops and backtracking as soon as a table with no matching rows is found, because it might begin with a table that has no matching rows. This explains why MySQL doesn't support FULL JOIN. If you need a FULL JOIN in MySQL, you can combine LEFT JOIN and RIGHT JOIN:

SELECT <select list> FROM table_a AS A LEFT JOIN table_b AS B ON A.key = B.key
UNION
SELECT <select list> FROM table_a AS A RIGHT JOIN table_b AS B ON A.key = B.key
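The NLJ pseudocode above can be run directly over in-memory rows. The data is hypothetical, and real MySQL fetches rows through the storage engine API (an index range scan for t1, a key lookup for t2, a full scan for t3):

```python
# Toy nested-loop join over three tables, mirroring the pseudocode.
t1 = [{"a": 1}, {"a": 2}]
t2 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
t3 = [{"b": 10, "c": "x"}, {"b": 99, "c": "y"}]

result = []
for r1 in t1:                        # outer loop: rows matching the range
    for r2 in t2:                    # middle loop: reference-key lookup
        if r2["a"] != r1["a"]:
            continue
        for r3 in t3:                # inner loop: full scan (join type ALL)
            if r3["b"] == r2["b"]:   # join condition satisfied -> send row
                result.append((r1["a"], r2["b"], r3["c"]))

print(result)  # [(1, 10, 'x')]
```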
  • 14. SQL TYPES OF JOIN

Select all records from Table A and Table B where the join condition is met:
SELECT <select list> FROM table_a AS A INNER JOIN table_b AS B ON A.Key = B.Key;

Select all records from Table A, along with records from Table B for which the join condition is met (if at all):
SELECT <select list> FROM table_a AS A LEFT JOIN table_b AS B ON A.Key = B.Key;

Select all records from Table B, along with records from Table A for which the join condition is met (if at all):
SELECT <select list> FROM table_a AS A RIGHT JOIN table_b AS B ON A.Key = B.Key;

Select records from Table A which do not exist in Table B:
SELECT <select list> FROM table_a AS A LEFT JOIN table_b AS B ON A.Key = B.Key WHERE B.Key IS NULL;

Select records from Table B which do not exist in Table A:
SELECT <select list> FROM table_a AS A RIGHT JOIN table_b AS B ON A.Key = B.Key WHERE A.Key IS NULL;

Select all records from Table A and Table B, regardless of whether the join condition is met or not:
SELECT <select list> FROM table_a AS A FULL JOIN table_b AS B ON A.Key = B.Key;

Select all records from Table A and Table B, excluding the common (intersecting) rows:
SELECT <select list> FROM table_a AS A FULL JOIN table_b AS B ON A.Key = B.Key WHERE A.Key IS NULL OR B.Key IS NULL;
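The LEFT JOIN UNION RIGHT JOIN emulation of FULL JOIN can be checked with a small model, dicts standing in for tables keyed on the join column (hypothetical data):

```python
# Emulating FULL JOIN as the union of a left join and a right join,
# as in the SQL workaround for MySQL shown earlier.
a = {1: "a1", 2: "a2"}            # table_a: Key -> payload
b = {2: "b2", 3: "b3"}            # table_b: Key -> payload

left  = [(k, a[k], b.get(k)) for k in a]   # LEFT JOIN: every A row, B or NULL
right = [(k, a.get(k), b[k]) for k in b]   # RIGHT JOIN: every B row, A or NULL
full  = sorted(set(left) | set(right))     # UNION removes the duplicate matches

print(full)  # [(1, 'a1', None), (2, 'a2', 'b2'), (3, None, 'b3')]
```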
  • 15. MySQL Indexes

Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. This is much faster than reading every row sequentially.

Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees. Exceptions: indexes on spatial data types use R-trees; MEMORY tables also support hash indexes; InnoDB uses inverted lists for FULLTEXT indexes.

B-Tree indexes
When people talk about an index without mentioning a type, they are probably referring to a B-Tree index. Most of MySQL's storage engines support this index type. A B-Tree index speeds up data access because the storage engine doesn't have to scan the whole table to find the desired data. Instead, it starts at the root node; the slots in the root node hold pointers to child nodes, and the storage engine follows these pointers. It finds the right pointer by looking at the values in the node pages. Eventually, the storage engine either determines that the desired value doesn't exist or successfully reaches a leaf page, which holds pointers to the indexed data. (The original slide shows a B+Tree example linking the keys 1-7 to data values d1-d7.)

Index selectivity is the ratio of the number of distinct indexed values (the cardinality) to the total number of rows in the table (#N), and ranges from 1/#N to 1. A highly selective index is good because it lets MySQL filter out more rows when it looks for matches. A unique index has a selectivity of 1, which is as good as it gets.

Clustered Indexes
When a table has a clustered index, its rows are actually stored in the index's leaf pages. The term "clustered" refers to the fact that rows with adjacent key values are stored close to each other. You can have only one clustered index per table. If you don't define a primary key, InnoDB will try to use a unique nonnullable index instead. If there's no such index, InnoDB will define a hidden primary key for you and then cluster on that. InnoDB clusters records together only within a page; pages with adjacent key values might be distant from each other. A clustering primary key can help performance.
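The selectivity ratio defined above is easy to compute directly; the column data here is made up for illustration:

```python
# Index selectivity = distinct values / total rows, ranging from 1/#N to 1.
def selectivity(values):
    return len(set(values)) / len(values)

emails = ["a@x", "b@x", "c@x", "d@x"]     # unique column: every value distinct
status = ["new", "done", "new", "done"]   # low-cardinality column

print(selectivity(emails))  # 1.0 -- as selective as it gets (unique index)
print(selectivity(status))  # 0.5
```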
  • 16. Types of queries that can use an index

B-Tree indexes work well for lookups by the full key value, a key range, or a key prefix. Any leftmost prefix of the index can be used by the optimizer to look up rows.

INDEX(A, B, C)

Queries that can use the full index or part of it:
WHERE A = 'a' AND B = 'b' AND C = 'c';        -- match on the full key value: uses columns A, B, C
WHERE A = 'a' AND B IN('a','b') AND C > 'c';  -- match one part exactly and a range on another: uses A, B, C (range)
WHERE A = 'a' AND B >= 'b' AND C = 'c';       -- match a left part of the index: uses A, B (range)
WHERE A >= 'a' AND B = 'b' AND C = 'c';       -- match a leftmost prefix: uses only A (range)

Queries that cannot use the index:
WHERE B = 2 AND C = 3;                        -- leading column A is not referenced
WHERE C = 3;                                  -- leading columns A, B are not referenced

Index use with LIKE:
WHERE A LIKE 'abc%';                          -- can use the index (left-anchored prefix)
WHERE A LIKE '%abc';                          -- cannot use the index
WHERE A LIKE '%abc%';                         -- cannot use the index

MySQL stops using key parts in a multi-part index as soon as it meets a real range (<, >, BETWEEN); it can, however, continue using key parts further to the right if an IN(...) range is used.

Covering Indexes
Indexes need to be designed for the whole query, not just the WHERE clause. Indexes are indeed a way to find rows efficiently, but MySQL can also use an index to retrieve a column's data, so it doesn't have to read the row at all. The index's leaf nodes contain the values they index; why read the row when reading the index can give you the data you want? An index that contains (or "covers") all the data needed to satisfy a query is called a covering index.
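The leftmost-prefix and range-stop rules above can be captured in a small checker. This is a simplified model (predicate kinds and function names are mine, and real optimizer behavior has more cases):

```python
# Simplified leftmost-prefix rule for a composite index: walk the index
# columns left to right; a missing predicate ends the usable prefix, a real
# range (<, >, BETWEEN) ends it after that column, and IN() does not stop it.
def usable_prefix(index_cols, predicates):
    """predicates: column -> 'eq' | 'in' | 'range'. Returns usable columns."""
    used = []
    for col in index_cols:
        kind = predicates.get(col)
        if kind is None:
            break                    # gap: later columns can't be used
        used.append(col)
        if kind == "range":
            break                    # a real range ends the usable prefix
    return used

print(usable_prefix(["A", "B", "C"], {"A": "eq", "B": "in", "C": "range"}))  # ['A', 'B', 'C']
print(usable_prefix(["A", "B", "C"], {"A": "eq", "C": "eq"}))                # ['A']
print(usable_prefix(["A", "B", "C"], {"B": "eq", "C": "eq"}))                # []
```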
  • 17. Main types of indexes

Primary Key. The primary key for a table is the column or set of columns that uniquely identifies each row. Query performance benefits from the NOT NULL optimization, because a primary key cannot include any NULL values. With the InnoDB storage engine, the table data is physically organized to do ultra-fast lookups and sorts based on the primary key column or columns. If the table has no natural column to use as a primary key, you might create a separate column with auto-increment values to use as the primary key. These unique IDs can serve as pointers to corresponding rows in other tables when you join tables using foreign keys.

Foreign key. A foreign key is a field in a table that matches a field of another table. A foreign key places constraints on data in the related tables, which enables MySQL to maintain referential integrity.

[CONSTRAINT [symbol]] FOREIGN KEY [index_name] (col_name, ...)
REFERENCES tbl_name (col_name, ...)
[ON DELETE reference_option]
[ON UPDATE reference_option]

reference_option: RESTRICT | CASCADE | SET NULL | NO ACTION | SET DEFAULT

For storage engines supporting foreign keys, MySQL rejects any INSERT or UPDATE operation that attempts to create a foreign key value in a child table if there is no matching candidate key value in the parent table. When an UPDATE or DELETE operation affects a key value in the parent table that has matching rows in the child table, the result depends on the referential action specified using the ON UPDATE and ON DELETE subclauses of the FOREIGN KEY clause. MySQL supports five options regarding the action to be taken, listed here:

CASCADE: Delete or update the row from the parent table, and automatically delete or update the matching rows in the child table. Both ON DELETE CASCADE and ON UPDATE CASCADE are supported. Between two tables, do not define several ON UPDATE CASCADE clauses that act on the same column in the parent table or in the child table.
SET NULL: Delete or update the row from the parent table, and set the foreign key column or columns in the child table to NULL. Both ON DELETE SET NULL and ON UPDATE SET NULL clauses are supported. RESTRICT: Rejects the delete or update operation for the parent table. Specifying RESTRICT (or NO ACTION) is the same as omitting the ON DELETE or ON UPDATE clause. NO ACTION: A keyword from standard SQL. In MySQL, equivalent to RESTRICT. The MySQL Server rejects the delete or update operation for the parent table if there is a related foreign key value in the referenced table. Some database systems have deferred checks, and NO ACTION is a deferred check. In MySQL, foreign key constraints are checked immediately, so NO ACTION is the same as RESTRICT. SET DEFAULT: This action is recognized by the MySQL parser, but both InnoDB and NDB reject table definitions containing ON DELETE SET DEFAULT or ON UPDATE SET DEFAULT clauses. MySQL
  • 18. Prefix Indexes. Sometimes you need to index very long character columns, which makes your indexes large and slow. You can save space and get good performance by indexing the first few characters instead of the whole value. This makes your indexes use less space, but it also makes them less selective. The trick is to choose a prefix that's long enough to give good selectivity, but short enough to save space.

One way to calculate a good prefix length is to compute the full column's selectivity and try to find a prefix whose selectivity is close to that value:

SELECT COUNT(DISTINCT name)/COUNT(*) AS sel_complete,
       COUNT(DISTINCT LEFT(name, 3))/COUNT(*) AS sel_3,
       COUNT(DISTINCT LEFT(name, 4))/COUNT(*) AS sel_4,
       COUNT(DISTINCT LEFT(name, 5))/COUNT(*) AS sel_5,
       COUNT(DISTINCT LEFT(name, 6))/COUNT(*) AS sel_6,
       COUNT(DISTINCT LEFT(name, 7))/COUNT(*) AS sel_7,
       COUNT(DISTINCT LEFT(name, 8))/COUNT(*) AS sel_8
FROM table;

sel_complete  sel_3  sel_4  sel_5  sel_6  sel_7  sel_8
0.317         0.105  0.188  0.245  0.309  0.317  0.317

Here a 7-character prefix already reaches the full column's selectivity, so the prefix index can be created with:

ALTER TABLE table ADD KEY(name(7));

Foreign key. Example
We have two tables: customers (id, name, date, ...) and orders (order_id, order_name, datetime, customer_id, ...). Each customer has zero or more orders and each order belongs to only one customer. The relationship between the customers table and the orders table is one-to-many, and it is established by a foreign key in the orders table specified by the customer_id field. The customer_id field in the orders table relates to the id primary key field in the customers table. The customers table is called the parent table or referenced table, and the orders table is known as the child table or referencing table. A foreign key can be a column or a set of columns. The columns in the child table often refer to the primary key columns in the parent table. A table may have more than one foreign key, and each foreign key in the child table may refer to a different parent table.

Column Indexes. The most common type of index involves a single column, storing copies of the values from that column in a data structure, allowing fast lookups for the rows with the corresponding column values. The B-tree data structure lets the index quickly find a specific value, a set of values, or a range of values, corresponding to operators such as =, >, ≤, BETWEEN, IN, and so on, in a WHERE clause.

Multiple-Column Indexes. MySQL can create composite indexes (that is, indexes on multiple columns). An index may consist of up to 16 columns. A multiple-column index can be considered a sorted array, the rows of which contain values created by concatenating the values of the indexed columns.

FULLTEXT Indexes. FULLTEXT indexes are used for full-text searches. Only the InnoDB and MyISAM storage engines support FULLTEXT indexes, and only for CHAR, VARCHAR, and TEXT columns. Indexing always takes place over the entire column; column prefix indexing is not supported. For queries that contain full-text expressions, MySQL evaluates those expressions during the optimization phase of query execution. The optimizer does not just look at full-text expressions and make estimates; it actually evaluates them in the process of developing an execution plan.
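The prefix-selectivity calculation done in SQL above can be reproduced in plain Python to see the idea; the name list is made up:

```python
# Compare full-column selectivity with the selectivity of the first n chars,
# mirroring the COUNT(DISTINCT LEFT(name, n))/COUNT(*) query.
def prefix_selectivity(names, n=None):
    vals = names if n is None else [s[:n] for s in names]
    return len(set(vals)) / len(names)

names = ["london", "london2", "lonmark", "paris", "parker", "berlin"]
print(prefix_selectivity(names))       # 1.0 -- full column, all distinct
print(prefix_selectivity(names, 3))    # 0.5 -- 3-char prefix too short
for n in (4, 5):
    print(n, round(prefix_selectivity(names, n), 3))
```

A prefix whose selectivity approaches the full column's value is long enough; here even 4 characters recover most of it.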
  • 19. Replication Overview

Replication enables data from one MySQL database server (the master) to be copied to one or more MySQL database servers (the slaves). Replication is asynchronous by default. Depending on the configuration, you can replicate all databases, selected databases, or even selected tables within a database. Advantages of replication in MySQL include scale-out solutions, data security, and analytics.

How Replication Works
1. The master records changes to its data in its binary log. (These records are called binary log events.) Just before each transaction that updates data completes on the master, the master records the changes in its binary log.
2. The replica copies the master's binary log events to its relay log. To begin, it starts a worker thread, called the slave I/O thread. The I/O thread opens an ordinary client connection to the master, which then starts a special binlog dump process. The binlog dump process reads events from the master's binary log. If it catches up to the master, it goes to sleep and waits for the master to signal it when there are new events. The I/O thread writes the events to the replica's relay log.
3. The replica replays the events in the relay log, applying the changes to its own data. The slave SQL thread reads and replays events from the relay log, thus updating the replica's data to match the master's. The events the SQL thread executes can optionally go into the replica's own binary log.

Replication Formats
With statement-based binary logging, the master writes SQL statements to the binary log. Replication from the master to the slave works by executing the SQL statements on the slave. This is called statement-based replication (SBR), which corresponds to the MySQL statement-based binary logging format.

With row-based logging, the master writes events to the binary log that indicate how individual table rows are changed.
Replication from the master to the slave works by copying the events representing the changes to the table rows to the slave. This is called row-based replication (RBR). Row-based logging is the default method.

Mixed-format logging. You can also configure MySQL to use a mix of both statement-based and row-based logging, depending on which is most appropriate for the change to be logged. When using mixed-format logging, a statement-based log is used by default. Depending on certain statements, and also the storage engine being used, the log is automatically switched to row-based in particular cases. Replication using the mixed format is referred to as mixed-based replication or mixed-format replication.
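The three steps above, binlog on the master, I/O thread copying into the relay log, SQL thread replaying, can be sketched as a toy pipeline. This is purely illustrative: real events are binary log records, and the threads run concurrently:

```python
# Toy model of the replication pipeline: master binlog -> relay log -> replay.
master_data, master_binlog = {}, []

def master_write(key, value):
    master_data[key] = value
    master_binlog.append(("set", key, value))    # step 1: record in binlog

replica_data, relay_log = {}, []
io_pos = 0

def io_thread():                                 # step 2: copy new events
    global io_pos
    relay_log.extend(master_binlog[io_pos:])     # into the replica's relay log
    io_pos = len(master_binlog)

def sql_thread():                                # step 3: replay relay log
    while relay_log:
        op, key, value = relay_log.pop(0)
        if op == "set":
            replica_data[key] = value

master_write("k1", "v1")
master_write("k2", "v2")
io_thread()
sql_thread()
print(replica_data == master_data)  # True -- replica caught up
```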
  • 20. Replication Topologies

Common topologies: Single, Chain, Circular, Master with Backup Master (Multiple Replication), Multi-Circular.

Single: This is the most straightforward MySQL replication topology. One master receives writes; one or more slaves replicate from the same master via asynchronous or semi-synchronous replication. If the designated master goes down, the most up-to-date slave must be promoted as the new master. The remaining slaves resume replication from the new master.

Chain: This setup uses an intermediate master to act as a relay to the other slaves in the replication chain. When there are many slaves connected to a master, the network interface of the master can get overloaded. This topology allows the read replicas to pull the replication stream from the relay server to offload the master server.

Circular: The ring topology requires two or more MySQL servers which act as masters. All masters receive writes and generate binlogs, with a few caveats: set the auto-increment offset on each server to avoid primary key collisions, and there is no conflict resolution. Common practice is to write to only one master while the other master acts as a hot-standby node. Still, if you have slaves below that tier, you have to switch to the new master manually if the designated master fails.

Master with Backup Master (Multiple Replication): The master pushes changes to a backup master and to one or more slaves. Semi-synchronous replication is used between the master and the backup master. The backup master gets the update, writes it to its relay log, and flushes it to disk. This topology works well when performing master failover in case the master goes down. The backup master acts as a warm-standby server, as it has the highest probability of having up-to-date data when compared to the other slaves.