SlideShare ist ein Scribd-Unternehmen logo
1 von 43
Downloaden Sie, um offline zu lesen
Deep Dive: InnoDB
Transactions and Write Paths
From the client connection to physical storage
Marko Mäkelä, Lead Developer InnoDB
Michaël de Groot, MariaDB Consultant
InnoDB
Concepts
A mini-transaction is an atomic
set of page reads or writes,
with write-ahead redo log.
A transaction writes undo log
before modifying indexes.
The read view of a transaction
may access the undo logs of
newer transactions to retrieve
old versions.
Purge may remove old undo logs
and delete-marked records
once no read view needs them.
Some terms that an Advanced
DBA should be familiar with
The “storage stack” of InnoDB
A Comparison to the OSI Model (Upper Layers)
The Open Systems Interconnection Model
7. Application layer
● Example: HTML5 web application
● Example: apt update; apt upgrade
6. Presentation layer
● XML, HTML, CSS, …
● JSON, BSON, …
● ASN.1 BER, …
5. Session layer
● SSL, TLS
● Web browser cookies, …
Some layers inside the MariaDB Server:
7. client connection
● Encrypted or cleartext
● Direct or via proxy
6. SQL
● Parser
● Access control
● Query optimization & execution
5. Storage engine interface
● Transactions: start, commit, rollback
● Tables: open, close, read, write
A Comparison to the OSI Model (Lower Layers)
The Open Systems Interconnection Model
4. Transport layer
● TCP/IP turns packets into reliable streams
● Retransmission, flow control
3. Network layer
● router/switch
● IP, ICMP, UDP, BGP, DNS, …
2. Data link
● Packet framing
● Checksums
1. Physical
● MAC: CSMA/CD, CSMA/CA, …
● LAN, WLAN, ATM, RS-232, …
InnoDB Storage Engine
4. Transaction
● Atomic, Consistent, Isolated access to
multiple tables via Locks & Read Views
● XA 2PC (distributed transactions by user,
or binlog-driven for cross-engine commit)
3. Mini-transaction
● Atomic, Durable multi-page changes
● Page checksums, crash recovery
2. Operating System (file system, block
device)
● Ext4, XFS, ZFS, NFS, …
1. Hardware/Firmware (physical storage)
● Hard disk, SSD, NVRAM, …
SQL and storage engine interface
(5-6)
Step 1: SQL Layer
UPDATE talk SET attendees = 25 WHERE conference="M|18"
AND name="Deep Dive";
● Constructs a parse tree.
● Checks for permissions.
● Acquires metadata lock on the table name (prevent DDL) and opens the table.
● Retrieves index cardinality statistics from the storage engine(s).
● Constructs a query execution plan.
Step 2a: Read via the Storage Engine Interface
UPDATE talk SET attendees = 25 WHERE conference="M|18"
AND name="Deep Dive";
● Find the matching record(s) via the chosen index (secondary index or primary
key index), either via lookup or index scan.
● On the very first read, InnoDB will lazily start a transaction:
○ Assign a new DB_TRX_ID (incrementing number global in InnoDB)
○ No read view is needed, because this is a locking operation.
Step 2b: Filtering the Rows
UPDATE talk SET attendees = 25 WHERE conference="M|18"
AND name="Deep Dive";
● If the WHERE condition can only be satisfied by range scan or table scan, we
will have to filter out non-matching rows.
○ If Index Condition Pushdown is used, InnoDB evaluates the predicate and filters rows.
○ Else, InnoDB will return every record one by one, and the SQL layer will decide what to do.
● After returning a single record, a new mini-transaction has to be started
○ Repositioning the cursor (search for a key) is moderately expensive
○ Optimization: After 4 separate reads, InnoDB will prefetch 8 rows (could be improved)
Step 2c: Locking the rows
● InnoDB will write-lock each index leaf read
○ Note: InnoDB has no “table row locks”, but “record locks” in each index separately.
○ If using index condition pushdown, non-matching records are not locked.
● All records that may be changed are locked exclusively
○ When using a secondary index, all possibly matching records stay locked
○ Depending on the isolation level, InnoDB may unlock non-matching rows in primary key scan
(MySQL Bug #3300, “semi-consistent read”)
○ Updates with a WHERE condition on PRIMARY KEY are faster!
Transaction layer
InnoDB Transaction Layer
The InnoDB transaction layer relies on atomically updated data pages forming
persistent data structures:
● Page allocation bitmap, index trees, undo log directories, undo logs
● Undo logs are the glue that make the indexes of a table look consistent.
● On startup, any pending transactions (with implicit locks) will be recovered
from undo logs.
○ Locks will be kept until incomplete transactions are rolled back, or
○ When found in binlog after InnoDB recovery completes, or
○ until explicit XA COMMIT or XA ROLLBACK.
● InnoDB supports non-locking reads (multi-versioning concurrency control)
by read view ‘snapshots’ that are based on undo logs.
128
B-tree root page B-tree leaf page
DB_ROLL_PTR
A Page View of the InnoDB Transaction Layer
TRX_SYS
Page
(ibdata1)
Rollback segment header page
Table
.ibd
Index
History List (to-purge committed)
Uncommitted transactions
(pkN,child page)
(pkM,child page)
B-tree internal page
(pkN1,child page)
(pkN2,child page) (pk2,DB_TRX_ID,DB_ROLL_PTR,cols)
(pk1,DB_TRX_ID,DB_ROLL_PTR,cols)
Undo Log
(linked list of pages)
Undo Log
(linked list of pages)
DB_TRX_ID and DB_ROLL_PTR
● DB_TRX_ID is an increasing sequence number, global in InnoDB
○ Each user transaction (except read-only non-locking) gets a new one, upon transaction
create
○ While other transaction ids (GTID, binlog position) are in commit order, this transaction id is in
create order
○ Each record has metadata with its current DB_TRX_ID
○ If the DB_TRX_ID is in a list of uncommitted transactions, there is an implicit lock
● DB_ROLL_PTR is a pointer to the undo log record of the previous version
○ Stored in the record's metadata (hidden column in the clustered index)
○ Or a flag in it is set, meaning that there is no previous version (the record was inserted)
Step 3a: Creating undo log records
● On the first write, the transaction is created in the buffer pool:
○ Create or reuse a page (same size as other pages)
○ Create undo log header, update undo log directory (also known as “rollback segment page”).
○ Done in separate mini-transaction from undo log record write; optimized away in MariaDB 10.3
● Undo log pages are normal data pages: They can fill up and get persisted
● All changes of the same transaction are added to the same undo log page(s)
● Writing each undo log record means (in a mini-transaction):
○ Append to last undo page, or allocate a new undo page and write the record there.
○ InnoDB writes each undo log in advance, before updating each index affected by the row
operation.
● Undo log records need to be purged when they are no longer needed
UPDATE talk SET attendees = 25 WHERE conference="M|18"
AND name="Deep Dive";
● update_row(old_row,new_row) is invoked for each matching row
● InnoDB calculates an “update vector” and applies to the current row.
● The primary key (clustered index) record will be write-locked.
○ It may already have been locked in the “Read” step.
● An undo log record will be written in its own mini-transaction.
● The primary key index will be updated in its own mini-transaction.
● For each secondary index on updated columns, use 2 mini-transactions:
(1) search, lock, delete-mark old record, (2) insert new record.
Step 3b: Updating the Matching Rows
UPDATE talk SET attendees = 25 WHERE conference="M|18"
AND name="Deep Dive";
● Because autocommit was enabled and there was no BEGIN, the SQL layer will
automatically commit at the end of each statement.
● In InnoDB, commit updates the undo log information in a mini-transaction:
1. Update undo log directory (rollback segments) and undo log header pages.
2. Commit the mini-transaction (write to the redo log buffer).
3. Obtain the last written LSN (redo log sequence number) in the buffer.
4. Optionally, ensure that all redo log buffer is written at least up to the LSN.
Step 4a: Commit
Alternate step 4a: Rollbacks
● Rollback is expensive because
○ All changes are done in reverse (jumping read pointers)
○ Reading undo log or index pages that were evicted from the buffer pool to disk
○ Secondary index leaf records might need removal (must look up primary key records)
● For each undo log record from the end to start (or savepoint), the log is
applied in reverse:
○ For UPDATE: the old version of the data is read and an update vector calculated
● The same process as the update step is followed but reversed
○ 1 mini-transaction per clustered index
○ 2 mini-transactions per secondary index
● The undo log pages are freed in a maintenance operation called Truncate
undo log tail.
Step 4b: Cleaning up
After User COMMIT or ROLLBACK is done, there are some clean-up steps:
● Transactional row locks can be released and waiting threads woken up.
● InnoDB will also wake up any other transactions that waited for the locks.
● If the binlog is enabled, a commit event will be synchronously written to it.
○ An internal XA PREPARE transition must have been persisted in the InnoDB redo log earlier.
○ Only the internal XA PREPARE but not the XA COMMIT needs to be flushed to redo log.
○ Listen to the replication talk by Andrei Elkin (next talk in the same room)
● Finally, the SQL layer will release metadata locks on the table name - the
table can now be ALTERed again
Mini-Transaction Layer
The InnoDB Mini-Transaction Layer
Each layer provides service to the upper layers, relying on the service provided
by the lower layer, adding some encapsulation or refinement:
● File system (or the layers below): Block address translation
○ Wear leveling, bad block avoidance
○ Deduplication
○ Redundancy or replication for fault tolerance or HA
● The InnoDB mini-transaction layer depends on a write-ahead redo log.
○ Provides an AcID service of modifying a number of persistent pages.
■ Example: Inserting a record into a B-tree, with a series of page splits.
○ An atomic mini-transaction becomes durable when its log is fully written to the redo log.
○ Recovery: Any redo log records after the latest checkpoint will be parsed and applied to the
buffer pool copies of referenced persistent data file pages.
Mini-Transactions and Page Locking
A mini-transaction is not a user transaction. It is a short operation comprising:
● A list of index, tablespace or page locks that have been acquired.
● A list of modifications to pages.
There is no rollback for mini-transactions.
Locks of unmodified pages can be released any time. Commit will:
1. Append the log (if anything was modified) to the redo log buffer
2. Release all locks
Index page lock acquisition must avoid deadlocks (server hangs):
● Backward scans are not possible. Only one index update per
mini-transaction.
● Forward scan: acquire next page lock, release previous page (if not changed)
Mini-Transaction
Mini-Transactions: RW-Locks and Redo Logs
Memo:
Locks or
Buffer-Fixes
Index tree latch
(dict_index_t::lock):
covers internal pages
Tablespace latch
(fil_space_t::lock):
allocating/freeing pages
Log:
Page
Changes Data Files
FIL_PAGE_LSN
Flush (after log written)
Redo Log Files
(ib_logfile*)
Redo Log Buffer
(log_sys_t::buf)
Write ahead (of page flush) to log (make durable)
Buffer pool page
buf_page_t::oldest_
modification
commit
A mini-transaction commit
stores the log position (LSN) to
each changed page.
Recovery will redo changes:
Apply log if the page LSN is
older than the log record LSN.
Log position
(LSN)
Mini-transactions and Durability
Depending on when the redo log buffer was written to the redo log files, recovery
may miss some latest mini-transaction commits.
When innodb_flush_log_at_trx_commit=1, the only mini-transaction
persisted to disk is that of a User transaction commit. All earlier mini-transactions
are also persisted.
Mini-transaction Write operations on Index Trees
In InnoDB, a table is a collection of indexes: PRIMARY KEY (clustered index,
storing all materialized columns), and optional secondary indexes.
A mini-transaction can only modify one index at a time.
● Metadata change: Delete-mark (or unmark) a record
● Insert: optimistic (fits in leaf page) or pessimistic (allocate page+split+write)
○ Allocate operation modifies the page allocation bitmap page belonging to that page number
○ Tablespace extension occurs if needed (file extended, tablespace metadata change)
● Delete (purge,rollback): optimistic or pessimistic (merge+free page)
● Index lookup: Acquire index lock, dive from the root to the leaf page to edit
Read Mini-Transactions in Transactions
In InnoDB, a table is a collection of indexes: PRIMARY KEY (clustered index,
storing all materialized columns), and optional secondary indexes.
A mini-transaction can read only 1 secondary index, the clustered index and its
undo log records
● Index lookup: Look up a single record, dive from the root to the leaf page.
○ If this version is not to be seen, InnoDB will read the older version using DB_ROLL_PTR
● Index range scan: Index lookup followed by:
○ release locks except the leaf page
○ acquire next leaf page lock, release previous page lock, …
○ Typical use case: Filtering records for the read view, or for pushed-down WHERE clause
● Encryption key rotation: Write a dummy change to an encrypted page, to cause
the page to be encrypted with a new key.
Flushing dirty pages
● This process runs asynchronously from the transactions
● Every second up to innodb_io_capacity iops will be used to write
changes to data files
● Includes (potentially no longer needed) undo log pages and the contents of
freed pages. To be improved: Collect garbage, do not write it!
● Data-at-rest encryption and page compression is done at this point
● Pages with oldest changes are written first
○ InnoDB keeps a list of dirty pages, sorted by the mini-transaction end LSN
On startup, any redo log after the latest checkpoint will be read and applied.
● A redo log checkpoint logically truncates the start of the redo log, up to the
LSN of the oldest dirty page.
● A clean shutdown ends with a log checkpoint.
● The InnoDB master thread periodically makes log checkpoints.
● When the redo log is full, a checkpoint will be initiated. Aggressive flushing
can occur on any mini-transaction write.
○ Until enough space is freed up, the server blocks any other write
○ This is very bad for performance, a bigger innodb_log_file_size would help
Redo Log Checkpoint (Truncating Redo Log)
What if the server was killed in the middle of a page flush?
● On Linux, killing a process may result in a partial write (usually 4096n bytes)
● Power outage? Disconnected cable? Cheating fsync()?
● On Windows with innodb_page_size=4k and matching sector size, no
problem.
Page writes first go to the doublewrite buffer (128 pages in ibdata1).
● If startup finds a corrupted page, it will try doublewrite buffer recovery.
● If it fails, then we are out of luck, because redo log records (usually) do not
contain the full page image.
Bugs and improvement potential: MDEV-11799, MDEV-12699, MDEV-12905
One More Layer: the Doublewrite Buffer
On startup, InnoDB performs some recovery steps:
● Read and apply all redo log since the latest redo log checkpoint.
○ While doing this, restore any half-written pages from the doublewrite buffer.
○ A clean shutdown ends with a log checkpoint, so the log would be empty.
● Resurrect non-committed transactions from the undo logs.
○ Tables referred to by the undo log records are opened.
○ This also resurrects implicit locks on all modified records, and table locks to block DDL.
○ XA PREPARE transactions will remain until explicit XA COMMIT or XA ROLLBACK.
● Initiate background ROLLBACK of active transactions.
○ Until this is completed, the locks may block some operations from new operations.
○ DBA hack: If you intend to DROP TABLE right after restart, delete the .ibd file upfront
Crash Recovery
Transaction Isolation Levels and
Multi-Version Concurrency Control
Read Views and Isolation Levels
The lowest transaction isolation level is READ UNCOMMITTED.
● Returns the newest data from the indexes, “raw” mini-transaction view.
● Indexes may appear inconsistent both with each other and internally: an
UPDATE of a key may be observed as a DELETE, with no INSERT observed.
REPEATABLE READ uses non-locking read views:
● Created on the first read of a record from an InnoDB table, or
● Explicitly by START TRANSACTION WITH CONSISTENT SNAPSHOT
READ COMMITTED is similar to REPEATABLE READ, but the read view is
● Created at the start of each statement, on the first read of an InnoDB record
Locks and Isolation Levels
The highest transaction isolation level SERIALIZABLE implies
SELECT…LOCK IN SHARE MODE
● Only a committed record can be locked. It is always the latest version.
● DELETE and UPDATE will internally do SELECT…FOR UPDATE, acquiring
explicit exclusive locks.
● INSERT initially acquires an implicit lock, identified by DB_TRX_ID pointing
to a non-committed transaction.
Read Views, Implicit Locks, and the Undo Log
InnoDB may have to read undo logs for the purposes of:
● Multi-versioned (non-locking) reads based on a read view:
○ Get the appropriate version of a record, or
○ “Unsee” a record that was inserted in the future or whose deletion is visible.
● Determining if a secondary index record is locked by an active transaction.
○ The PRIMARY KEY record is implicitly locked if DB_TRX_ID refers to an active transaction.
○ On conflict, the lock waiter will convert the implicit exclusive lock into an explicit one.
● Transaction rollback
○ Rollback to the start of statement, for example on duplicate key error
○ ROLLBACK [TO SAVEPOINT]
○ Explicit locks will be retained until the transaction is completed
○ Implicit locks are released one by one, for each rolled-back row change.
MVCC Read Views and the Undo Log
Multi-versioning concurrency control: provide non-locking reads from a virtual
‘snapshot’ that corresponds to a read view: The current DB_TRX_ID and a list
of DB_TRX_IDs that are not committed at that time.
The read view contains:
● Any changes made by the current transaction.
● Any records that were committed when the read view was created.
● No changes of transactions that were started or committed after the read
view was created.
“Too new” INSERT can be simply ignored. DELETE and UPDATE make the
previous version available by pointing to the undo log record.
MVCC Read Views in the PRIMARY Index
The hidden PRIMARY index fields DB_TRX_ID, DB_ROLL_PTR and undo log
records constitute a singly-linked list, from the newest version to the oldest.
A non-locking read iterates this list in a loop, starting from the newest version:
● If DB_TRX_ID is in the read view:
Return the record, or skip it if delete-marked.
● If DB_TRX_ID is in the future and DB_ROLL_PTR carries the “insert” flag:
Skip the record (the oldest version was inserted in the future).
● Else: Find the undo log record by DB_ROLL_PTR , construct the previous
version, go to next iteration.
MVCC Read Views and Secondary Indexes
Secondary indexes only contain a PAGE_MAX_TRX_ID.
● Secondary indexes may contain multiple (indexed_col, pk) records pointing to
a single pk, one for each value of indexed_col.
● All versions except at most one must be delete-marked.
● If a secondary index record is delete-marked, MVCC must look up pk in the
PRIMARY index and attempt to construct a version that matches indexed_col,
to determine if (indexed_col, pk) exists in the read view.
● If the PAGE_MAX_TRX_ID is too new, for each record in the secondary index
leaf page, we must look up pk.
● For old read views, secondary index scan can be very expensive!
Purging Old History
UPDATE talk SET attendees = 25 WHERE conference="M|18"
AND name="Deep Dive";
● If there were any secondary indexes that depend on the updated columns, we
delete-marked the records that contained the old values, before inserting
attendees=25. Those records should be removed eventually.
● In the primary key index, in MariaDB 10.3 we would set DB_TRX_ID=0 after
the history is no longer needed, to speed up future MVCC and locking checks.
● Also, the undo log pages are eventually freed for reuse, in a mini-transaction
Truncate undo log head.
Purging Old History
The purge threads can start removing history as soon as it is no longer visible
to any active read view.
● READ COMMITTED switches read views at the start of each statement.
● REPEATABLE READ keeps the read view until commit (or full rollback).
Undo log is by default stored in the InnoDB system tablespace (ibdata1).
● Because of purge lag, this file could grow a lot. InnoDB files never shrink!
● Open read views prevent purge from running.
● Purge lag (long History list length in SHOW ENGINE INNODB
STATUS) causes secondary indexes and undo logs to grow, and slows down
operations.
Purge Lag and Long-Running Transactions
Secondary Indexes and the Purge Lag
In the worst case, MVCC and implicit lock checks will require a clustered index
lookup for each secondary index record, followed by undo log lookups.
● Updates of indexed columns make this very expensive: O(versions·rows²).
● Ensure that purge can remove old history: Avoid long-lived read views.
● Avoid secondary index scans in long-lived read views.
● Watch the History list length in SHOW ENGINE INNODB STATUS!
Thank you!
To be continued in the other Deep Dive:
InnoDB Transactions and Replication

Weitere ähnliche Inhalte

Was ist angesagt?

Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBMydbops
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsMongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsMydbops
 
MySQL8.0_performance_schema.pptx
MySQL8.0_performance_schema.pptxMySQL8.0_performance_schema.pptx
MySQL8.0_performance_schema.pptxNeoClova
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performancePostgreSQL-Consulting
 
Secondary Index Search in InnoDB
Secondary Index Search in InnoDBSecondary Index Search in InnoDB
Secondary Index Search in InnoDBMIJIN AN
 
In-memory OLTP storage with persistence and transaction support
In-memory OLTP storage with persistence and transaction supportIn-memory OLTP storage with persistence and transaction support
In-memory OLTP storage with persistence and transaction supportAlexander Korotkov
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsAlexander Korotkov
 
MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바NeoClova
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL AdministrationEDB
 
[오픈소스컨설팅]Day #1 MySQL 엔진소개, 튜닝, 백업 및 복구, 업그레이드방법
[오픈소스컨설팅]Day #1 MySQL 엔진소개, 튜닝, 백업 및 복구, 업그레이드방법[오픈소스컨설팅]Day #1 MySQL 엔진소개, 튜닝, 백업 및 복구, 업그레이드방법
[오픈소스컨설팅]Day #1 MySQL 엔진소개, 튜닝, 백업 및 복구, 업그레이드방법Ji-Woong Choi
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1Federico Campoli
 
MariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & OptimizationMariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & OptimizationMariaDB plc
 
What is new in PostgreSQL 14?
What is new in PostgreSQL 14?What is new in PostgreSQL 14?
What is new in PostgreSQL 14?Mydbops
 
What's new in MariaDB TX 3.0
What's new in MariaDB TX 3.0What's new in MariaDB TX 3.0
What's new in MariaDB TX 3.0MariaDB plc
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerMongoDB
 
Upgrade from MySQL 5.7 to MySQL 8.0
Upgrade from MySQL 5.7 to MySQL 8.0Upgrade from MySQL 5.7 to MySQL 8.0
Upgrade from MySQL 5.7 to MySQL 8.0Olivier DASINI
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsCommand Prompt., Inc
 
MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용I Goo Lee
 
Query Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
Query Optimization with MySQL 8.0 and MariaDB 10.3: The BasicsQuery Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
Query Optimization with MySQL 8.0 and MariaDB 10.3: The BasicsJaime Crespo
 
MySQL Space Management
MySQL Space ManagementMySQL Space Management
MySQL Space ManagementMIJIN AN
 

Was ist angesagt? (20)

Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDB
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsMongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To Transactions
 
MySQL8.0_performance_schema.pptx
MySQL8.0_performance_schema.pptxMySQL8.0_performance_schema.pptx
MySQL8.0_performance_schema.pptx
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
Secondary Index Search in InnoDB
Secondary Index Search in InnoDBSecondary Index Search in InnoDB
Secondary Index Search in InnoDB
 
In-memory OLTP storage with persistence and transaction support
In-memory OLTP storage with persistence and transaction supportIn-memory OLTP storage with persistence and transaction support
In-memory OLTP storage with persistence and transaction support
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problems
 
MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
 
[오픈소스컨설팅]Day #1 MySQL 엔진소개, 튜닝, 백업 및 복구, 업그레이드방법
[오픈소스컨설팅]Day #1 MySQL 엔진소개, 튜닝, 백업 및 복구, 업그레이드방법[오픈소스컨설팅]Day #1 MySQL 엔진소개, 튜닝, 백업 및 복구, 업그레이드방법
[오픈소스컨설팅]Day #1 MySQL 엔진소개, 튜닝, 백업 및 복구, 업그레이드방법
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1
 
MariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & OptimizationMariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & Optimization
 
What is new in PostgreSQL 14?
What is new in PostgreSQL 14?What is new in PostgreSQL 14?
What is new in PostgreSQL 14?
 
What's new in MariaDB TX 3.0
What's new in MariaDB TX 3.0What's new in MariaDB TX 3.0
What's new in MariaDB TX 3.0
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTiger
 
Upgrade from MySQL 5.7 to MySQL 8.0
Upgrade from MySQL 5.7 to MySQL 8.0Upgrade from MySQL 5.7 to MySQL 8.0
Upgrade from MySQL 5.7 to MySQL 8.0
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System Administrators
 
MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용
 
Query Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
Query Optimization with MySQL 8.0 and MariaDB 10.3: The BasicsQuery Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
Query Optimization with MySQL 8.0 and MariaDB 10.3: The Basics
 
MySQL Space Management
MySQL Space ManagementMySQL Space Management
MySQL Space Management
 

Ähnlich wie M|18 Deep Dive: InnoDB Transactions and Write Paths

Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBMariaDB plc
 
Evolution of MySQL Parallel Replication
Evolution of MySQL Parallel Replication Evolution of MySQL Parallel Replication
Evolution of MySQL Parallel Replication Mydbops
 
cPanelCon 2014: InnoDB Anatomy
cPanelCon 2014: InnoDB AnatomycPanelCon 2014: InnoDB Anatomy
cPanelCon 2014: InnoDB AnatomyRyan Robson
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsJean-François Gagné
 
Inno db datafiles backup and retore
Inno db datafiles backup and retoreInno db datafiles backup and retore
Inno db datafiles backup and retoreVasudeva Rao
 
MySQL Parallel Replication: inventory, use-cases and limitations
MySQL Parallel Replication: inventory, use-cases and limitationsMySQL Parallel Replication: inventory, use-cases and limitations
MySQL Parallel Replication: inventory, use-cases and limitationsJean-François Gagné
 
Webinar Slides: Migrating to Galera Cluster
Webinar Slides: Migrating to Galera ClusterWebinar Slides: Migrating to Galera Cluster
Webinar Slides: Migrating to Galera ClusterSeveralnines
 
Percona 服务器与 XtraDB 存储引擎
Percona 服务器与 XtraDB 存储引擎Percona 服务器与 XtraDB 存储引擎
Percona 服务器与 XtraDB 存储引擎YUCHENG HU
 
InnoDB: архитектура транзакционного хранилища (Константин Осипов)
InnoDB: архитектура транзакционного хранилища (Константин Осипов)InnoDB: архитектура транзакционного хранилища (Константин Осипов)
InnoDB: архитектура транзакционного хранилища (Константин Осипов)Ontico
 
InnoDB Scalability improvements in MySQL 8.0
InnoDB Scalability improvements in MySQL 8.0InnoDB Scalability improvements in MySQL 8.0
InnoDB Scalability improvements in MySQL 8.0Mydbops
 
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous AvailabilityRamp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous AvailabilityPythian
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsJean-François Gagné
 
jacobs_tuuri_performance
jacobs_tuuri_performancejacobs_tuuri_performance
jacobs_tuuri_performanceHiroshi Ono
 
Transaction unit 1 topic 4
Transaction unit 1 topic 4Transaction unit 1 topic 4
Transaction unit 1 topic 4avniS
 
Microsoft Hekaton
Microsoft HekatonMicrosoft Hekaton
Microsoft HekatonSiraj Memon
 
Introduction to redis - version 2
Introduction to redis - version 2Introduction to redis - version 2
Introduction to redis - version 2Dvir Volk
 
MySQL/MariaDB Parallel Replication: inventory, use-case and limitations
MySQL/MariaDB Parallel Replication: inventory, use-case and limitationsMySQL/MariaDB Parallel Replication: inventory, use-case and limitations
MySQL/MariaDB Parallel Replication: inventory, use-case and limitationsJean-François Gagné
 
Percona XtraBackup - New Features and Improvements
Percona XtraBackup - New Features and ImprovementsPercona XtraBackup - New Features and Improvements
Percona XtraBackup - New Features and ImprovementsMarcelo Altmann
 
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, ConfluentKafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, ConfluentHostedbyConfluent
 

Ähnlich wie M|18 Deep Dive: InnoDB Transactions and Write Paths (20)

Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDB
 
Evolution of MySQL Parallel Replication
Evolution of MySQL Parallel Replication Evolution of MySQL Parallel Replication
Evolution of MySQL Parallel Replication
 
cPanelCon 2014: InnoDB Anatomy
cPanelCon 2014: InnoDB AnatomycPanelCon 2014: InnoDB Anatomy
cPanelCon 2014: InnoDB Anatomy
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitations
 
Inno db datafiles backup and retore
Inno db datafiles backup and retoreInno db datafiles backup and retore
Inno db datafiles backup and retore
 
MySQL and MariaDB Backups
MySQL and MariaDB BackupsMySQL and MariaDB Backups
MySQL and MariaDB Backups
 
MySQL Parallel Replication: inventory, use-cases and limitations
MySQL Parallel Replication: inventory, use-cases and limitationsMySQL Parallel Replication: inventory, use-cases and limitations
MySQL Parallel Replication: inventory, use-cases and limitations
 
Webinar Slides: Migrating to Galera Cluster
Webinar Slides: Migrating to Galera ClusterWebinar Slides: Migrating to Galera Cluster
Webinar Slides: Migrating to Galera Cluster
 
Percona 服务器与 XtraDB 存储引擎
Percona 服务器与 XtraDB 存储引擎Percona 服务器与 XtraDB 存储引擎
Percona 服务器与 XtraDB 存储引擎
 
InnoDB: архитектура транзакционного хранилища (Константин Осипов)
InnoDB: архитектура транзакционного хранилища (Константин Осипов)InnoDB: архитектура транзакционного хранилища (Константин Осипов)
InnoDB: архитектура транзакционного хранилища (Константин Осипов)
 
InnoDB Scalability improvements in MySQL 8.0
InnoDB Scalability improvements in MySQL 8.0InnoDB Scalability improvements in MySQL 8.0
InnoDB Scalability improvements in MySQL 8.0
 
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous AvailabilityRamp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitations
 
jacobs_tuuri_performance
jacobs_tuuri_performancejacobs_tuuri_performance
jacobs_tuuri_performance
 
Transaction unit 1 topic 4
Transaction unit 1 topic 4Transaction unit 1 topic 4
Transaction unit 1 topic 4
 
Microsoft Hekaton
Microsoft HekatonMicrosoft Hekaton
Microsoft Hekaton
 
Introduction to redis - version 2
Introduction to redis - version 2Introduction to redis - version 2
Introduction to redis - version 2
 
MySQL/MariaDB Parallel Replication: inventory, use-case and limitations
MySQL/MariaDB Parallel Replication: inventory, use-case and limitationsMySQL/MariaDB Parallel Replication: inventory, use-case and limitations
MySQL/MariaDB Parallel Replication: inventory, use-case and limitations
 
Percona XtraBackup - New Features and Improvements
Percona XtraBackup - New Features and ImprovementsPercona XtraBackup - New Features and Improvements
Percona XtraBackup - New Features and Improvements
 
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, ConfluentKafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
 

Mehr von MariaDB plc

MariaDB Paris Workshop 2023 - MaxScale 23.02.x
MariaDB Paris Workshop 2023 - MaxScale 23.02.xMariaDB Paris Workshop 2023 - MaxScale 23.02.x
MariaDB Paris Workshop 2023 - MaxScale 23.02.xMariaDB plc
 
MariaDB Paris Workshop 2023 - Newpharma
MariaDB Paris Workshop 2023 - NewpharmaMariaDB Paris Workshop 2023 - Newpharma
MariaDB Paris Workshop 2023 - NewpharmaMariaDB plc
 
MariaDB Paris Workshop 2023 - Cloud
MariaDB Paris Workshop 2023 - CloudMariaDB Paris Workshop 2023 - Cloud
MariaDB Paris Workshop 2023 - CloudMariaDB plc
 
MariaDB Paris Workshop 2023 - MariaDB Enterprise
MariaDB Paris Workshop 2023 - MariaDB EnterpriseMariaDB Paris Workshop 2023 - MariaDB Enterprise
MariaDB Paris Workshop 2023 - MariaDB EnterpriseMariaDB plc
 
MariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance OptimizationMariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance OptimizationMariaDB plc
 
MariaDB Paris Workshop 2023 - MaxScale
MariaDB Paris Workshop 2023 - MaxScale MariaDB Paris Workshop 2023 - MaxScale
MariaDB Paris Workshop 2023 - MaxScale MariaDB plc
 
MariaDB Paris Workshop 2023 - novadys presentation
MariaDB Paris Workshop 2023 - novadys presentationMariaDB Paris Workshop 2023 - novadys presentation
MariaDB Paris Workshop 2023 - novadys presentationMariaDB plc
 
MariaDB Paris Workshop 2023 - DARVA presentation
MariaDB Paris Workshop 2023 - DARVA presentationMariaDB Paris Workshop 2023 - DARVA presentation
MariaDB Paris Workshop 2023 - DARVA presentationMariaDB plc
 
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server MariaDB plc
 
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-BackupMariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-BackupMariaDB plc
 
Einführung : MariaDB Tech und Business Update Hamburg 2023
Einführung : MariaDB Tech und Business Update Hamburg 2023Einführung : MariaDB Tech und Business Update Hamburg 2023
Einführung : MariaDB Tech und Business Update Hamburg 2023MariaDB plc
 
Hochverfügbarkeitslösungen mit MariaDB
Hochverfügbarkeitslösungen mit MariaDBHochverfügbarkeitslösungen mit MariaDB
Hochverfügbarkeitslösungen mit MariaDBMariaDB plc
 
Die Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise ServerDie Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise ServerMariaDB plc
 
Global Data Replication with Galera for Ansell Guardian®
Global Data Replication with Galera for Ansell Guardian®Global Data Replication with Galera for Ansell Guardian®
Global Data Replication with Galera for Ansell Guardian®MariaDB plc
 
Introducing workload analysis
Introducing workload analysisIntroducing workload analysis
Introducing workload analysisMariaDB plc
 
Under the hood: SkySQL monitoring
Under the hood: SkySQL monitoringUnder the hood: SkySQL monitoring
Under the hood: SkySQL monitoringMariaDB plc
 
Introducing the R2DBC async Java connector
Introducing the R2DBC async Java connectorIntroducing the R2DBC async Java connector
Introducing the R2DBC async Java connectorMariaDB plc
 
MariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB plc
 
The architecture of SkySQL
The architecture of SkySQLThe architecture of SkySQL
The architecture of SkySQLMariaDB plc
 
What to expect from MariaDB Platform X5, part 1
What to expect from MariaDB Platform X5, part 1What to expect from MariaDB Platform X5, part 1
What to expect from MariaDB Platform X5, part 1MariaDB plc
 

Mehr von MariaDB plc (20)

MariaDB Paris Workshop 2023 - MaxScale 23.02.x
MariaDB Paris Workshop 2023 - MaxScale 23.02.xMariaDB Paris Workshop 2023 - MaxScale 23.02.x
MariaDB Paris Workshop 2023 - MaxScale 23.02.x
 
MariaDB Paris Workshop 2023 - Newpharma
MariaDB Paris Workshop 2023 - NewpharmaMariaDB Paris Workshop 2023 - Newpharma
MariaDB Paris Workshop 2023 - Newpharma
 
MariaDB Paris Workshop 2023 - Cloud
MariaDB Paris Workshop 2023 - CloudMariaDB Paris Workshop 2023 - Cloud
MariaDB Paris Workshop 2023 - Cloud
 
MariaDB Paris Workshop 2023 - MariaDB Enterprise
MariaDB Paris Workshop 2023 - MariaDB EnterpriseMariaDB Paris Workshop 2023 - MariaDB Enterprise
MariaDB Paris Workshop 2023 - MariaDB Enterprise
 
MariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance OptimizationMariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance Optimization
 
MariaDB Paris Workshop 2023 - MaxScale
MariaDB Paris Workshop 2023 - MaxScale MariaDB Paris Workshop 2023 - MaxScale
MariaDB Paris Workshop 2023 - MaxScale
 
MariaDB Paris Workshop 2023 - novadys presentation
MariaDB Paris Workshop 2023 - novadys presentationMariaDB Paris Workshop 2023 - novadys presentation
MariaDB Paris Workshop 2023 - novadys presentation
 
MariaDB Paris Workshop 2023 - DARVA presentation
MariaDB Paris Workshop 2023 - DARVA presentationMariaDB Paris Workshop 2023 - DARVA presentation
MariaDB Paris Workshop 2023 - DARVA presentation
 
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
 
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-BackupMariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
 
Einführung : MariaDB Tech und Business Update Hamburg 2023
Einführung : MariaDB Tech und Business Update Hamburg 2023Einführung : MariaDB Tech und Business Update Hamburg 2023
Einführung : MariaDB Tech und Business Update Hamburg 2023
 
Hochverfügbarkeitslösungen mit MariaDB
Hochverfügbarkeitslösungen mit MariaDBHochverfügbarkeitslösungen mit MariaDB
Hochverfügbarkeitslösungen mit MariaDB
 
Die Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise ServerDie Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise Server
 
Global Data Replication with Galera for Ansell Guardian®
Global Data Replication with Galera for Ansell Guardian®Global Data Replication with Galera for Ansell Guardian®
Global Data Replication with Galera for Ansell Guardian®
 
Introducing workload analysis
Introducing workload analysisIntroducing workload analysis
Introducing workload analysis
 
Under the hood: SkySQL monitoring
Under the hood: SkySQL monitoringUnder the hood: SkySQL monitoring
Under the hood: SkySQL monitoring
 
Introducing the R2DBC async Java connector
Introducing the R2DBC async Java connectorIntroducing the R2DBC async Java connector
Introducing the R2DBC async Java connector
 
MariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introduction
 
The architecture of SkySQL
The architecture of SkySQLThe architecture of SkySQL
The architecture of SkySQL
 
What to expect from MariaDB Platform X5, part 1
What to expect from MariaDB Platform X5, part 1What to expect from MariaDB Platform X5, part 1
What to expect from MariaDB Platform X5, part 1
 

Kürzlich hochgeladen

办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excelysmaelreyes
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGIThomas Poetter
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一F sss
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...ttt fff
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 

Kürzlich hochgeladen (20)

办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excel
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 

M|18 Deep Dive: InnoDB Transactions and Write Paths

  • 1. Deep Dive: InnoDB Transactions and Write Paths From the client connection to physical storage Marko Mäkelä, Lead Developer InnoDB Michaël de Groot, MariaDB Consultant
  • 2. InnoDB Concepts A mini-transaction is an atomic set of page reads or writes, with write-ahead redo log. A transaction writes undo log before modifying indexes. The read view of a transaction may access the undo logs of newer transactions to retrieve old versions. Purge may remove old undo logs and delete-marked records once no read view needs them. Some terms that an Advanced DBA should be familiar with
  • 4. A Comparison to the OSI Model (Upper Layers) The Open Systems Interconnection Model 7. Application layer ● Example: HTML5 web application ● Example: apt update; apt upgrade 6. Presentation layer ● XML, HTML, CSS, … ● JSON, BSON, … ● ASN.1 BER, … 5. Session layer ● SSL, TLS ● Web browser cookies, … Some layers inside the MariaDB Server: 7. client connection ● Encrypted or cleartext ● Direct or via proxy 6. SQL ● Parser ● Access control ● Query optimization & execution 5. Storage engine interface ● Transactions: start, commit, rollback ● Tables: open, close, read, write
  • 5. A Comparison to the OSI Model (Lower Layers) The Open Systems Interconnection Model 4. Transport layer ● TCP/IP turns packets into reliable streams ● Retransmission, flow control 3. Network layer ● router/switch ● IP, ICMP, UDP, BGP, DNS, … 2. Data link ● Packet framing ● Checksums 1. Physical ● MAC: CSMA/CD, CSMA/CA, … ● LAN, WLAN, ATM, RS-232, … InnoDB Storage Engine 4. Transaction ● Atomic, Consistent, Isolated access to multiple tables via Locks & Read Views ● XA 2PC (distributed transactions by user, or binlog-driven for cross-engine commit) 3. Mini-transaction ● Atomic, Durable multi-page changes ● Page checksums, crash recovery 2. Operating System (file system, block device) ● Ext4, XFS, ZFS, NFS, … 1. Hardware/Firmware (physical storage) ● Hard disk, SSD, NVRAM, …
  • 6. SQL and storage engine interface (5-6)
  • 7. Step 1: SQL Layer UPDATE talk SET attendees = 25 WHERE conference="M|18" AND name="Deep Dive"; ● Constructs a parse tree. ● Checks for permissions. ● Acquires metadata lock on the table name (prevent DDL) and opens the table. ● Retrieves index cardinality statistics from the storage engine(s). ● Constructs a query execution plan.
  • 8. Step 2a: Read via the Storage Engine Interface UPDATE talk SET attendees = 25 WHERE conference="M|18" AND name="Deep Dive"; ● Find the matching record(s) via the chosen index (secondary index or primary key index), either via lookup or index scan. ● On the very first read, InnoDB will lazily start a transaction: ○ Assign a new DB_TRX_ID (incrementing number global in InnoDB) ○ No read view is needed, because this is a locking operation.
  • 9. Step 2b: Filtering the Rows UPDATE talk SET attendees = 25 WHERE conference="M|18" AND name="Deep Dive"; ● If the WHERE condition can only be satisfied by range scan or table scan, we will have to filter out non-matching rows. ○ If Index Condition Pushdown is used, InnoDB evaluates the predicate and filters rows. ○ Else, InnoDB will return every record one by one, and the SQL layer will decide what to do. ● After returning a single record, a new mini-transaction has to be started ○ Repositioning the cursor (search for a key) is moderately expensive ○ Optimization: After 4 separate reads, InnoDB will prefetch 8 rows (could be improved)
  • 10. Step 2c: Locking the rows ● InnoDB will write-lock each index leaf read ○ Note: InnoDB has no “table row locks”, but “record locks” in each index separately. ○ If using index condition pushdown, non-matching records are not locked. ● All records that may be changed are locked exclusively ○ When using a secondary index, all possibly matching records stay locked ○ Depending on the isolation level, InnoDB may unlock non-matching rows in primary key scan (MySQL Bug #3300, “semi-consistent read”) ○ Updates with a WHERE condition on PRIMARY KEY are faster!
  • 12. InnoDB Transaction Layer The InnoDB transaction layer relies on atomically updated data pages forming persistent data structures: ● Page allocation bitmap, index trees, undo log directories, undo logs ● Undo logs are the glue that make the indexes of a table look consistent. ● On startup, any pending transactions (with implicit locks) will be recovered from undo logs. ○ Locks will be kept until incomplete transactions are rolled back, or ○ When found in binlog after InnoDB recovery completes, or ○ until explicit XA COMMIT or XA ROLLBACK. ● InnoDB supports non-locking reads (multi-versioning concurrency control) by read view ‘snapshots’ that are based on undo logs.
  • 13. 128 B-tree root page B-tree leaf page DB_ROLL_PTR A Page View of the InnoDB Transaction Layer TRX_SYS Page (ibdata1) Rollback segment header page Table .ibd Index History List (to-purge committed) Uncommitted transactions (pkN,child page) (pkM,child page) B-tree internal page (pkN1,child page) (pkN2,child page) (pk2,DB_TRX_ID,DB_ROLL_PTR,cols) (pk1,DB_TRX_ID,DB_ROLL_PTR,cols) Undo Log (linked list of pages) Undo Log (linked list of pages)
  • 14. DB_TRX_ID and DB_ROLL_PTR ● DB_TRX_ID is an increasing sequence number, global in InnoDB ○ Each user transaction (except read-only non-locking) gets a new one, upon transaction create ○ While other transaction ids (GTID, binlog position) are in commit order, this transaction id is in create order ○ Each record has metadata with its current DB_TRX_ID ○ If the DB_TRX_ID is in a list of uncommitted transactions, there is an implicit lock ● DB_ROLL_PTR is a pointer to the undo log record of the previous version ○ Stored in the record's metadata (hidden column in the clustered index) ○ Or a flag in it is set, meaning that there is no previous version (the record was inserted)
  • 15. Step 3a: Creating undo log records ● On the first write, the transaction is created in the buffer pool: ○ Create or reuse a page (same size as other pages) ○ Create undo log header, update undo log directory (also known as “rollback segment page”). ○ Done in separate mini-transaction from undo log record write; optimized away in MariaDB 10.3 ● Undo log pages are normal data pages: They can fill up and get persisted ● All changes of the same transaction are added to the same undo log page(s) ● Writing each undo log record means (in a mini-transaction): ○ Append to last undo page, or allocate a new undo page and write the record there. ○ InnoDB writes each undo log in advance, before updating each index affected by the row operation. ● Undo log records need to be purged when they are no longer needed
  • 16. UPDATE talk SET attendees = 25 WHERE conference="M|18" AND name="Deep Dive"; ● update_row(old_row,new_row) is invoked for each matching row ● InnoDB calculates an “update vector” and applies to the current row. ● The primary key (clustered index) record will be write-locked. ○ It may already have been locked in the “Read” step. ● An undo log record will be written in its own mini-transaction. ● The primary key index will be updated in its own mini-transaction. ● For each secondary index on updated columns, use 2 mini-transactions: (1) search, lock, delete-mark old record, (2) insert new record. Step 3b: Updating the Matching Rows
  • 17. UPDATE talk SET attendees = 25 WHERE conference="M|18" AND name="Deep Dive"; ● Because autocommit was enabled and there was no BEGIN, the SQL layer will automatically commit at the end of each statement. ● In InnoDB, commit updates the undo log information in a mini-transaction: 1. Update undo log directory (rollback segments) and undo log header pages. 2. Commit the mini-transaction (write to the redo log buffer). 3. Obtain the last written LSN (redo log sequence number) in the buffer. 4. Optionally, ensure that all redo log buffer is written at least up to the LSN. Step 4a: Commit
  • 18. Alternate step 4a: Rollbacks ● Rollback is expensive because ○ All changes are done in reverse (jumping read pointers) ○ Reading undo log or index pages that were evicted from the buffer pool to disk ○ Secondary index leaf records might need removal (must look up primary key records) ● For each undo log record from the end to start (or savepoint), the log is applied in reverse: ○ For UPDATE: the old version of the data is read and an update vector calculated ● The same process as the update step is followed but reversed ○ 1 mini-transaction per clustered index ○ 2 mini-transactions per secondary index ● The undo log pages are freed in a maintenance operation called Truncate undo log tail.
  • 19. Step 4b: Cleaning up After User COMMIT or ROLLBACK is done, there are some clean-up steps: ● Transactional row locks can be released and waiting threads woken up. ● InnoDB will also wake up any other transactions that waited for the locks. ● If the binlog is enabled, a commit event will be synchronously written to it. ○ An internal XA PREPARE transition must have been persisted in the InnoDB redo log earlier. ○ Only the internal XA PREPARE but not the XA COMMIT needs to be flushed to redo log. ○ Listen to the replication talk by Andrei Elkin (next talk in the same room) ● Finally, the SQL layer will release metadata locks on the table name - the table can now be ALTERed again
  • 21. The InnoDB Mini-Transaction Layer Each layer provides service to the upper layers, relying on the service provided by the lower layer, adding some encapsulation or refinement: ● File system (or the layers below): Block address translation ○ Wear leveling, bad block avoidance ○ Deduplication ○ Redundancy or replication for fault tolerance or HA ● The InnoDB mini-transaction layer depends on a write-ahead redo log. ○ Provides an AcID service of modifying a number of persistent pages. ■ Example: Inserting a record into a B-tree, with a series of page splits. ○ An atomic mini-transaction becomes durable when its log is fully written to the redo log. ○ Recovery: Any redo log records after the latest checkpoint will be parsed and applied to the buffer pool copies of referenced persistent data file pages.
  • 22. Mini-Transactions and Page Locking A mini-transaction is not a user transaction. It is a short operation comprising: ● A list of index, tablespace or page locks that have been acquired. ● A list of modifications to pages. There is no rollback for mini-transactions. Locks of unmodified pages can be released any time. Commit will: 1. Append the log (if anything was modified) to the redo log buffer 2. Release all locks Index page lock acquisition must avoid deadlocks (server hangs): ● Backward scans are not possible. Only one index update per mini-transaction. ● Forward scan: acquire next page lock, release previous page (if not changed)
  • 23. Mini-Transaction Mini-Transactions: RW-Locks and Redo Logs Memo: Locks or Buffer-Fixes Index tree latch (dict_index_t::lock): covers internal pages Tablespace latch (fil_space_t::lock): allocating/freeing pages Log: Page Changes Data Files FIL_PAGE_LSN Flush (after log written) Redo Log Files (ib_logfile*) Redo Log Buffer (log_sys_t::buf) Write ahead (of page flush) to log (make durable) Buffer pool page buf_page_t::oldest_ modification commit A mini-transaction commit stores the log position (LSN) to each changed page. Recovery will redo changes: Apply log if the page LSN is older than the log record LSN. Log position (LSN)
  • 24. Mini-transactions and Durability Depending on when the redo log buffer was written to the redo log files, recovery may miss some latest mini-transaction commits. When innodb_flush_log_at_trx_commit=1, the only mini-transaction persisted to disk is that of a User transaction commit. All earlier mini-transactions are also persisted.
  • 25. Mini-transaction Write operations on Index Trees In InnoDB, a table is a collection of indexes: PRIMARY KEY (clustered index, storing all materialized columns), and optional secondary indexes. A mini-transaction can only modify one index at a time. ● Metadata change: Delete-mark (or unmark) a record ● Insert: optimistic (fits in leaf page) or pessimistic (allocate page+split+write) ○ Allocate operation modifies the page allocation bitmap page belonging to that page number ○ Tablespace extension occurs if needed (file extended, tablespace metadata change) ● Delete (purge,rollback): optimistic or pessimistic (merge+free page) ● Index lookup: Acquire index lock, dive from the root to the leaf page to edit
  • 26. Read Mini-Transactions in Transactions In InnoDB, a table is a collection of indexes: PRIMARY KEY (clustered index, storing all materialized columns), and optional secondary indexes. A mini-transaction can read only 1 secondary index, the clustered index and its undo log records ● Index lookup: Look up a single record, dive from the root to the leaf page. ○ If this version is not to be seen, InnoDB will read the older version using DB_ROLL_PTR ● Index range scan: Index lookup followed by: ○ release locks except the leaf page ○ acquire next leaf page lock, release previous page lock, … ○ Typical use case: Filtering records for the read view, or for pushed-down WHERE clause
  • 27. ● Encryption key rotation: Write a dummy change to an encrypted page, to cause the page to be encrypted with a new key.
  • 28. Flushing dirty pages ● This process runs asynchronously from the transactions ● Every second up to innodb_io_capacity iops will be used to write changes to data files ● Includes (potentially no longer needed) undo log pages and the contents of freed pages. To be improved: Collect garbage, do not write it! ● Data-at-rest encryption and page compression is done at this point ● Pages with oldest changes are written first ○ InnoDB keeps a list of dirty pages, sorted by the mini-transaction end LSN
  • 29. On startup, any redo log after the latest checkpoint will be read and applied. ● A redo log checkpoint logically truncates the start of the redo log, up to the LSN of the oldest dirty page. ● A clean shutdown ends with a log checkpoint. ● The InnoDB master thread periodically makes log checkpoints. ● When the redo log is full, a checkpoint will be initiated. Aggressive flushing can occur on any mini-transaction write. ○ Until enough space is freed up, the server blocks any other write ○ This is very bad for performance, a bigger innodb_log_file_size would help Redo Log Checkpoint (Truncating Redo Log)
  • 30. What if the server was killed in the middle of a page flush? ● On Linux, killing a process may result in a partial write (usually 4096n bytes) ● Power outage? Disconnected cable? Cheating fsync()? ● On Windows with innodb_page_size=4k and matching sector size, no problem. Page writes first go to the doublewrite buffer (128 pages in ibdata1). ● If startup finds a corrupted page, it will try doublewrite buffer recovery. ● If it fails, then we are out of luck, because redo log records (usually) do not contain the full page image. Bugs and improvement potential: MDEV-11799, MDEV-12699, MDEV-12905 One More Layer: the Doublewrite Buffer
  • 31. On startup, InnoDB performs some recovery steps: ● Read and apply all redo log since the latest redo log checkpoint. ○ While doing this, restore any half-written pages from the doublewrite buffer. ○ A clean shutdown ends with a log checkpoint, so the log would be empty. ● Resurrect non-committed transactions from the undo logs. ○ Tables referred to by the undo log records are opened. ○ This also resurrects implicit locks on all modified records, and table locks to block DDL. ○ XA PREPARE transactions will remain until explicit XA COMMIT or XA ROLLBACK. ● Initiate background ROLLBACK of active transactions. ○ Until this is completed, the locks may block some operations from new operations. ○ DBA hack: If you intend to DROP TABLE right after restart, delete the .ibd file upfront Crash Recovery
  • 32. Transaction Isolation Levels and Multi-Version Concurrency Control
  • 33. Read Views and Isolation Levels The lowest transaction isolation level is READ UNCOMMITTED. ● Returns the newest data from the indexes, “raw” mini-transaction view. ● Indexes may appear inconsistent both with each other and internally: an UPDATE of a key may be observed as a DELETE, with no INSERT observed. REPEATABLE READ uses non-locking read views: ● Created on the first read of a record from an InnoDB table, or ● Explicitly by START TRANSACTION WITH CONSISTENT SNAPSHOT READ COMMITTED is similar to REPEATABLE READ, but the read view is ● Created at the start of each statement, on the first read of an InnoDB record
  • 34. Locks and Isolation Levels The highest transaction isolation level SERIALIZABLE implies SELECT…LOCK IN SHARE MODE ● Only a committed record can be locked. It is always the latest version. ● DELETE and UPDATE will internally do SELECT…FOR UPDATE, acquiring explicit exclusive locks. ● INSERT initially acquires an implicit lock, identified by DB_TRX_ID pointing to a non-committed transaction.
  • 35. Read Views, Implicit Locks, and the Undo Log InnoDB may have to read undo logs for the purposes of: ● Multi-versioned (non-locking) reads based on a read view: ○ Get the appropriate version of a record, or ○ “Unsee” a record that was inserted in the future or whose deletion is visible. ● Determining if a secondary index record is locked by an active transaction. ○ The PRIMARY KEY record is implicitly locked if DB_TRX_ID refers to an active transaction. ○ On conflict, the lock waiter will convert the implicit exclusive lock into an explicit one. ● Transaction rollback ○ Rollback to the start of statement, for example on duplicate key error ○ ROLLBACK [TO SAVEPOINT] ○ Explicit locks will be retained until the transaction is completed ○ Implicit locks are released one by one, for each rolled-back row change.
  • 36. MVCC Read Views and the Undo Log Multi-versioning concurrency control: provide non-locking reads from a virtual ‘snapshot’ that corresponds to a read view: The current DB_TRX_ID and a list of DB_TRX_IDs that are not committed at that time. The read view contains: ● Any changes made by the current transaction. ● Any records that were committed when the read view was created. ● No changes of transactions that were started or committed after the read view was created. “Too new” INSERT can be simply ignored. DELETE and UPDATE make the previous version available by pointing to the undo log record.
  • 37. MVCC Read Views in the PRIMARY Index The hidden PRIMARY index fields DB_TRX_ID, DB_ROLL_PTR and undo log records constitute a singly-linked list, from the newest version to the oldest. A non-locking read iterates this list in a loop, starting from the newest version: ● If DB_TRX_ID is in the read view: Return the record, or skip it if delete-marked. ● If DB_TRX_ID is in the future and DB_ROLL_PTR carries the “insert” flag: Skip the record (the oldest version was inserted in the future). ● Else: Find the undo log record by DB_ROLL_PTR , construct the previous version, go to next iteration.
  • 38. MVCC Read Views and Secondary Indexes Secondary indexes only contain a PAGE_MAX_TRX_ID. ● Secondary indexes may contain multiple (indexed_col, pk) records pointing to a single pk, one for each value of indexed_col. ● All versions except at most one must be delete-marked. ● If a secondary index record is delete-marked, MVCC must look up pk in the PRIMARY index and attempt to construct a version that matches indexed_col, to determine if (indexed_col, pk) exists in the read view. ● If the PAGE_MAX_TRX_ID is too new, for each record in the secondary index leaf page, we must look up pk. ● For old read views, secondary index scan can be very expensive!
  • 40. UPDATE talk SET attendees = 25 WHERE conference="M|18" AND name="Deep Dive"; ● If there were any secondary indexes that depend on the updated columns, we delete-marked the records that contained the old values, before inserting attendees=25. Those records should be removed eventually. ● In the primary key index, in MariaDB 10.3 we would set DB_TRX_ID=0 after the history is no longer needed, to speed up future MVCC and locking checks. ● Also, the undo log pages are eventually freed for reuse, in a mini-transaction Truncate undo log head. Purging Old History
  • 41. The purge threads can start removing history as soon as it is no longer visible to any active read view. ● READ COMMITTED switches read views at the start of each statement. ● REPEATABLE READ keeps the read view until commit (or full rollback). Undo log is by default stored in the InnoDB system tablespace (ibdata1). ● Because of purge lag, this file could grow a lot. InnoDB files never shrink! ● Open read views prevent purge from running. ● Purge lag (long History list length in SHOW ENGINE INNODB STATUS) causes secondary indexes and undo logs to grow, and slows down operations. Purge Lag and Long-Running Transactions
  • 42. Secondary Indexes and the Purge Lag In the worst case, MVCC and implicit lock checks will require a clustered index lookup for each secondary index record, followed by undo log lookups. ● Updates of indexed columns make this very expensive: O(versions·rows²). ● Ensure that purge can remove old history: Avoid long-lived read views. ● Avoid secondary index scans in long-lived read views. ● Watch the History list length in SHOW ENGINE INNODB STATUS!
  • 43. Thank you! To be continued in the other Deep Dive: InnoDB Transactions and Replication