Transactional Storage for MySQL
FAST. RELIABLE. PROVEN.

InnoDB Internals: InnoDB File Formats and Source Code Structure
MySQL University, October 2009

Calvin Sun
Principal Engineer
Oracle Corporation
Today’s Topics
•   Goals of InnoDB
•   Key Functional Characteristics
•   InnoDB Design Considerations
•   InnoDB Architecture
•   InnoDB On Disk Format
•   Source Code Structure
•   Q&A
Goals of InnoDB


•   OLTP oriented
•   Performance, Reliability, Scalability
•   Data Protection
•   Portability
InnoDB Key Functional Characteristics
•   Full transaction support
•   Row-level locking
•   MVCC
•   Crash recovery
•   Efficient IO
Design Considerations
• Modeled on Gray & Reuter’s “Transaction
  Processing: Concepts and Techniques”
• Also emulated the Oracle architecture
• Added unique subsystems
  • Doublewrite
  • Insert buffering
  • Adaptive hash index
• Designed to evolve with changing
  hardware & requirements
InnoDB Architecture
Layers, top to bottom:
  Server                 Applications
  Handler API            Embedded InnoDB API
                Transaction
                Cursor / Row
  Mini-transaction   B-tree   Lock
                Page
                Buffer
                File Space Manager
                IO
InnoDB On Disk Format
•   InnoDB Database Files
•   InnoDB Tablespaces
•   InnoDB Pages / Extents
•   InnoDB Rows
•   InnoDB Indexes
•   InnoDB Logs
•   File Format Design Considerations
InnoDB Database Files
MySQL Data Directory
• System tablespace (ibdata files)
  • internal data dictionary
  • insert buffer
  • undo logs
  • InnoDB tables
• OR, with innodb_file_per_table: InnoDB tables in .ibd files
• .frm files
InnoDB Tablespaces
• A tablespace consists of multiple files and/or
  raw disk partitions.
  file_name:file_size[:autoextend[:max:max_file_size]]
• A file/partition is a collection of segments.
• A segment consists of fixed-length pages.
• The page size is always 16KB in uncompressed
  tablespaces, and 1KB-16KB in compressed
  tablespaces (for both data and index).
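The data file syntax above can be illustrated with a small parser. This is a minimal sketch, not InnoDB's own parsing code: the struct, the function names, and the K/M/G size suffixes are assumptions made for the example, and file names containing a colon are not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for one parsed data file specification. */
struct data_file_spec {
    char name[256];
    unsigned long long size_bytes;
    int autoextend;                 /* 1 if ":autoextend" was given */
    unsigned long long max_bytes;   /* 0 means no ":max:" limit was given */
};

/* Parse a size such as "10M". Returns 0 on error (suffixes are assumed). */
static unsigned long long parse_size(const char *s)
{
    char *end;
    unsigned long long n = strtoull(s, &end, 10);
    if (end == s) return 0;
    switch (*end) {
    case 'K': case 'k': n <<= 10; end++; break;
    case 'M': case 'm': n <<= 20; end++; break;
    case 'G': case 'g': n <<= 30; end++; break;
    default: break;
    }
    return *end == '\0' ? n : 0;
}

/* Parse file_name:file_size[:autoextend[:max:max_file_size]].
   Returns 0 on success, -1 on a malformed specification. */
static int parse_data_file_spec(const char *spec, struct data_file_spec *out)
{
    char buf[512];
    char *tok[6];
    int n = 0;

    if (strlen(spec) >= sizeof buf) return -1;
    strcpy(buf, spec);
    for (char *p = strtok(buf, ":"); p != NULL && n < 6; p = strtok(NULL, ":"))
        tok[n++] = p;
    if (n != 2 && n != 3 && n != 5) return -1;
    if (strlen(tok[0]) >= sizeof out->name) return -1;

    strcpy(out->name, tok[0]);
    out->size_bytes = parse_size(tok[1]);
    out->autoextend = 0;
    out->max_bytes = 0;
    if (out->size_bytes == 0) return -1;
    if (n >= 3) {
        if (strcmp(tok[2], "autoextend") != 0) return -1;
        out->autoextend = 1;
    }
    if (n == 5) {
        if (strcmp(tok[3], "max") != 0) return -1;
        out->max_bytes = parse_size(tok[4]);
        if (out->max_bytes == 0) return -1;
    }
    return 0;
}
```

Passing a specification such as "ibdata1:10M:autoextend:max:500M" fills in the name, a 10 MB initial size, the autoextend flag, and a 500 MB cap.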
System Tablespace
•   Internal Data Dictionary
•   Undo
•   Insert Buffer
•   Doublewrite Buffer
•   MySQL Replication Info
InnoDB Tablespaces
Containment hierarchy:
  Tablespace
    Segments: leaf node segment, non-leaf node segment, rollback segment
      Extents (an extent = 64 pages)
        Pages
          Rows
Row layout:
  Trx id | Roll pointer | Field pointers | Field 1 | Field 2 | ... | Field n
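The page and extent sizes translate into simple address arithmetic. The sketch below assumes the 16KB uncompressed page size and 64-page extents stated in the slides, and treats the tablespace as one contiguous file; the constant and function names are illustrative, not InnoDB's.

```c
#include <stdint.h>

/* Values taken from the slides; names are made up for this example. */
enum { PAGE_SIZE_BYTES = 16 * 1024, PAGES_PER_EXTENT = 64 };

/* Byte offset of a page from the start of the tablespace. */
static uint64_t page_byte_offset(uint32_t page_no)
{
    return (uint64_t)page_no * PAGE_SIZE_BYTES;
}

/* Index of the extent that contains the given page. */
static uint32_t extent_of_page(uint32_t page_no)
{
    return page_no / PAGES_PER_EXTENT;
}

/* Size of one extent in bytes: 64 pages * 16 KB = 1 MB. */
static uint64_t extent_size_bytes(void)
{
    return (uint64_t)PAGES_PER_EXTENT * PAGE_SIZE_BYTES;
}
```

With these values, page 64 is the first page of extent 1, and each extent covers exactly 1 MB of the file.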
InnoDB Pages
InnoDB Page Types
Symbol                   Value   Notes
FIL_PAGE_INODE           3       File segment inode
FIL_PAGE_INDEX           17855   B-tree node
FIL_PAGE_TYPE_BLOB       10      Uncompressed BLOB page
FIL_PAGE_TYPE_ZBLOB      11      1st compressed BLOB page
FIL_PAGE_TYPE_ZBLOB2     12      Subsequent compressed BLOB page
FIL_PAGE_TYPE_SYS        6       System page
FIL_PAGE_TYPE_TRX_SYS    7       Transaction system page
others                           insert buffer bitmap, insert buffer free list,
                                 file space header, extent descriptor page,
                                 newly allocated page
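The page type codes can be written as an enumeration with a lookup for the notes column. The seven named values come from the slide; the five codes for the "others" row (0, 4, 5, 8, 9) follow InnoDB's fil0fil.h of that period and should be treated as an assumption here. The function name is made up for the example.

```c
#include <stdint.h>

enum fil_page_type {
    FIL_PAGE_TYPE_ALLOCATED = 0,   /* assumed: newly allocated page */
    FIL_PAGE_INODE          = 3,
    FIL_PAGE_IBUF_FREE_LIST = 4,   /* assumed: insert buffer free list */
    FIL_PAGE_IBUF_BITMAP    = 5,   /* assumed: insert buffer bitmap */
    FIL_PAGE_TYPE_SYS       = 6,
    FIL_PAGE_TYPE_TRX_SYS   = 7,
    FIL_PAGE_TYPE_FSP_HDR   = 8,   /* assumed: file space header */
    FIL_PAGE_TYPE_XDES      = 9,   /* assumed: extent descriptor page */
    FIL_PAGE_TYPE_BLOB      = 10,
    FIL_PAGE_TYPE_ZBLOB     = 11,
    FIL_PAGE_TYPE_ZBLOB2    = 12,
    FIL_PAGE_INDEX          = 17855
};

/* Map a page type code to a short description. */
static const char *page_type_name(uint16_t type)
{
    switch (type) {
    case FIL_PAGE_TYPE_ALLOCATED: return "Newly allocated page";
    case FIL_PAGE_INODE:          return "File segment inode";
    case FIL_PAGE_IBUF_FREE_LIST: return "Insert buffer free list";
    case FIL_PAGE_IBUF_BITMAP:    return "Insert buffer bitmap";
    case FIL_PAGE_TYPE_SYS:       return "System page";
    case FIL_PAGE_TYPE_TRX_SYS:   return "Transaction system page";
    case FIL_PAGE_TYPE_FSP_HDR:   return "File space header";
    case FIL_PAGE_TYPE_XDES:      return "Extent descriptor page";
    case FIL_PAGE_TYPE_BLOB:      return "Uncompressed BLOB page";
    case FIL_PAGE_TYPE_ZBLOB:     return "1st compressed BLOB page";
    case FIL_PAGE_TYPE_ZBLOB2:    return "Subsequent compressed BLOB page";
    case FIL_PAGE_INDEX:          return "B-tree node";
    default:                      return "Unknown";
    }
}
```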
InnoDB Pages
A page consists of: a page header, a page
  trailer, and a page body (rows or other
  contents).
Page layout, top to bottom:
  Page header
  Rows (page body)
  Row offset array
  Page trailer
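Page Declares
/* Illustrative declarations of the page layout, not the actual InnoDB source.
   The three typedefs below are assumed so that the declarations compile. */
typedef unsigned long ulint;
typedef struct { ulint high; ulint low; } dulint;  /* 64-bit value in two words */
typedef ulint PAGE_TYPE;                           /* file page type code */

typedef struct                    /* a space address */
  {
    ulint     pageno;             /* page number within the file */
    ulint     boffset;            /* byte offset within the page */
  } fil_addr_t;

typedef struct                    /* page header, followed by the page body */
  {
    ulint      checksum;          /* checksum of the page (since 4.0.14) */
    ulint      page_offset;       /* page offset inside space */
    fil_addr_t previous;          /* offset or fil_addr_t */
    fil_addr_t next;              /* offset or fil_addr_t */
    dulint     page_lsn;          /* lsn of the end of the newest
                                     modification log record to the page */
    PAGE_TYPE  page_type;         /* file page type */
    dulint     file_flush_lsn;    /* the file has been flushed to disk
                                     at least up to this lsn */
    int        space_id;          /* space id of the page */
    char       data[];            /* page body; will grow */
  } PAGE;

typedef struct                    /* page trailer, stored after the page body */
  {
    ulint      page_lsn_low;      /* the last 4 bytes of page_lsn */
    ulint      checksum;          /* page checksum, or checksum magic, or 0 */
  } PAGE_TRAILER;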
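On disk, the header fields are stored as big-endian integers at fixed offsets rather than as a C struct. The sketch below decodes a few of them from a raw page buffer. The byte offsets assume the commonly documented 38-byte file page header (with previous and next stored as 4-byte page numbers); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed byte offsets within the 38-byte file page header. */
enum {
    HDR_CHECKSUM  = 0,   /* 4 bytes */
    HDR_PAGE_NO   = 4,   /* 4 bytes */
    HDR_LSN       = 16,  /* 8 bytes */
    HDR_TYPE      = 24,  /* 2 bytes */
    HDR_SPACE_ID  = 34   /* 4 bytes */
};

/* Hypothetical in-memory view of the decoded fields. */
struct page_header_view {
    uint32_t checksum;
    uint32_t page_no;
    uint64_t lsn;
    uint16_t type;
    uint32_t space_id;
};

static uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
    return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode selected header fields; `page` must hold at least 38 bytes. */
static struct page_header_view decode_page_header(const unsigned char *page)
{
    struct page_header_view h;
    h.checksum = read_be32(page + HDR_CHECKSUM);
    h.page_no  = read_be32(page + HDR_PAGE_NO);
    h.lsn      = read_be64(page + HDR_LSN);
    h.type     = read_be16(page + HDR_TYPE);
    h.space_id = read_be32(page + HDR_SPACE_ID);
    return h;
}
```

Reading byte by byte keeps the decoding independent of the host's endianness and of struct padding.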
InnoDB Compressed Pages
Page layout, top to bottom: page header, compressed data, modification log,
empty space, BLOB pointers, page directory, page trailer.
• InnoDB keeps a “modification log” in each page
  • Updates & inserts of small records are written to the log
    w/o page reconstruction; deletes don’t even require
    uncompression
• Log also tells InnoDB if the page will compress to fit page size
• When log space runs out, InnoDB uncompresses the page,
  applies the changes and recompresses the page
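The modification log behavior can be pictured with a toy model. This is only a sketch of the idea described on the slide, not InnoDB's page0zip code: the struct, its fields, and the fixed log capacity are assumptions, and real recompression can also fail and force a page split, which is ignored here.

```c
#include <stddef.h>

/* Toy model of a compressed page's modification log. */
struct zpage {
    size_t log_capacity;      /* bytes available for the modification log */
    size_t log_used;          /* bytes of the log already filled */
    unsigned recompressions;  /* how many times the page was rebuilt */
};

/* Record a change of `len` bytes.
   Returns 1 if the change was appended to the log without touching the
   compressed data, 0 if the log was full and the page had to be
   uncompressed, updated, and recompressed (which empties the log). */
static int zpage_log_change(struct zpage *p, size_t len)
{
    if (len <= p->log_capacity - p->log_used) {
        p->log_used += len;
        return 1;
    }
    p->recompressions++;
    p->log_used = 0;
    return 0;
}
```

The cheap path is the append; the expensive rebuild happens only when the log space runs out.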
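InnoDB Rows
COMPACT format: the record keeps a 768-byte prefix of a long column,
  with the rest stored on an overflow page.
DYNAMIC format: the record keeps only 20 bytes that point to the
  overflow page.
Record layout:
Record hdr | Trx ID | Roll ptr | Fld ptrs | overflow-page ptr | .. | Field values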
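The difference between the two row formats comes down to how many bytes of an off-page column stay in the record. The sketch below assumes a column that InnoDB has already decided to store off-page, and assumes that COMPACT keeps the 768-byte prefix plus a 20-byte pointer while DYNAMIC keeps the 20-byte pointer only; the enum and function names are illustrative.

```c
#include <stddef.h>

enum row_format { ROW_FORMAT_COMPACT, ROW_FORMAT_DYNAMIC };

enum { OVERFLOW_PTR_BYTES = 20, COMPACT_PREFIX_BYTES = 768 };

/* Bytes an off-page column occupies inside the B-tree record. */
static size_t inrow_bytes_for_offpage_column(enum row_format fmt)
{
    if (fmt == ROW_FORMAT_COMPACT)
        return COMPACT_PREFIX_BYTES + OVERFLOW_PTR_BYTES;
    return OVERFLOW_PTR_BYTES;
}

/* Bytes of the column value that end up on overflow pages. */
static size_t overflow_bytes(enum row_format fmt, size_t col_len)
{
    if (fmt == ROW_FORMAT_COMPACT)
        return col_len > COMPACT_PREFIX_BYTES
                   ? col_len - COMPACT_PREFIX_BYTES : 0;
    return col_len;
}
```

Under these assumptions a 1000-byte off-page column leaves 788 bytes in a COMPACT record and 20 bytes in a DYNAMIC record, which is why DYNAMIC fits more rows per page when long columns are common.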
InnoDB Indexes - Primary
• Data rows are stored in the B-tree leaf nodes of a clustered index
• B-tree is organized by primary key or non-null unique key of table,
  if defined; else, an internal column with 6-byte ROW_ID is added.

Primary Index: clustered (primary key) index
  Root:      PK values 001 - nnn
  Non-leaf:  001 - 500 | 500 - 800 | 801 - nnn
  Leaf:      001 - 275 | 276 - 500 | 501 - 630 | 631 - 768 |
             769 - 800 | 801 - 949 | 950 - xxx | xxx - nnn
  Leaf node contents: key values (for example 501-630)
             + data for corresponding rows
InnoDB Indexes - Secondary
• Secondary index B-tree leaf nodes contain, for each key value,
  the primary keys of the corresponding rows, used to access
  the clustering index to obtain the data

Clustered (primary key) index: PK values 001 - nnn;
  B-tree leaf nodes, containing data
Secondary index: key values A - Z;
  B-tree leaf nodes, containing PKs
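The two-step lookup through a secondary index can be sketched with two sorted arrays standing in for the B-tree leaf levels. This is a toy model under stated assumptions: the table, its columns, and the function names are invented for the example, and binary search replaces the real B-tree descent.

```c
#include <stddef.h>
#include <string.h>

/* Toy clustered index: entries sorted by primary key, holding the row. */
struct row { int pk; const char *name; const char *city; };

/* Toy secondary index on `name`: sorted by key, holding only the PK. */
struct sec_entry { const char *name; int pk; };

static const struct row clustered[] = {
    {1, "Zoe", "Oslo"}, {2, "Adam", "Rome"}, {3, "Mia", "Lima"}
};
static const struct sec_entry by_name[] = {
    {"Adam", 2}, {"Mia", 3}, {"Zoe", 1}
};

/* Step 2: find the row in the clustered index by primary key. */
static const struct row *clustered_lookup(int pk)
{
    size_t lo = 0, hi = sizeof clustered / sizeof clustered[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (clustered[mid].pk == pk) return &clustered[mid];
        if (clustered[mid].pk < pk) lo = mid + 1; else hi = mid;
    }
    return NULL;
}

/* Step 1: find the key in the secondary index to get the PK,
   then fetch the row through the clustered index. */
static const struct row *lookup_by_name(const char *name)
{
    size_t lo = 0, hi = sizeof by_name / sizeof by_name[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strcmp(by_name[mid].name, name);
        if (c == 0) return clustered_lookup(by_name[mid].pk);
        if (c < 0) lo = mid + 1; else hi = mid;
    }
    return NULL;
}
```

A lookup by name therefore costs two index searches, which is why queries that can be answered from the secondary index alone avoid the second step.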
InnoDB Logging
  Log Buffer   -> log thread   -> redo log: Log File #1, Log File #2 (log files)
  Buffer Pool  -> write thread -> DATA and rollback segments (ibdata files)
InnoDB Redo Log
Positions marked in the log: end of log, start of log, last checkpoint, min LSN

Redo log structure:
        Space id | PageNo | OpCode | Data
File Format Management
Applies to .ibd data files (file per table).
• Built-in InnoDB format: “Antelope”
• New “Barracuda” format enables compression, ROW_FORMAT=DYNAMIC
  • Fast index creation, other features do not
    require Barracuda file format
• Built-in InnoDB can access “Antelope” databases,
  but not “Barracuda” databases
  • Check file format tag in system tablespace on startup
• Enable a file format with new dynamic
  parameter innodb_file_format
• Preserves ability to downgrade easily
InnoDB File Format Design Considerations
• Durability
  • Logging, doublewrite, checksum
• Performance
  • Insert buffering, table compression
• Efficiency
  • Dynamic row format, table compression
• Compatibility
  • File format management
Source Code Structure
• 31 subdirectories
• Relevant InnoDB source files on file
  formats
  • Tablespace: fsp0fsp {.c, .ic, .h}
  • Page: page0page, page0zip {.c, .ic, .h}
  • Log: log0log {.c, .ic, .h}
Source Code Subdirectories
•   buf       •   ibuf      •   que
•   data      •   include   •   read
•   db        •   lock      •   rem
•   dict      •   log       •   row
•   dyn       •   math      •   srv
•   eval      •   mem       •   sync
•   fil       •   mtr       •   thr
•   fsp       •   os        •   trx
•   fut       •   page      •   usr
•   ha        •   pars      •   ut
•   handler
Summary: Durability, Performance, Compatibility & Efficiency
• InnoDB is the leading transactional storage engine
  for MySQL
• InnoDB’s architecture is well-suited to modern online
  transactional applications, as well as embedded
  applications.
• InnoDB’s file format is designed for high durability,
  better performance, and easy management
QUESTIONS
 ANSWERS
InnoDB Size Limits
•   Max # of tables: 4 G
•   Max size of a table: 32TB
•   Columns per table: 1000
•   Max row size: n*4 GB
    • 8 kB if stored on the same page
    • n*4 GB with n BLOBs
• Max key length: 3500
• Maximum tablespace size: 64 TB
• Max # of concurrent trxs: 1023
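One way to read the 64 TB tablespace limit is as the product of the page size and the number of addressable pages. The sketch below assumes a 32-bit page number and the 16KB page size; the function name is illustrative.

```c
#include <stdint.h>

/* Largest tablespace addressable with a 32-bit page number. */
static uint64_t max_tablespace_bytes(uint32_t page_size_bytes)
{
    /* 2^32 pages, each page_size_bytes long */
    return ((uint64_t)1 << 32) * page_size_bytes;
}
```

For 16KB pages this gives 2^32 * 2^14 = 2^46 bytes, which is 64 TB.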

InnoDB Internal

  • 1. Transactional Storage for MySQL FAST. RELIABLE. PROVEN. InnoDB Internals: InnoDB File Formats and Source Code Structure MySQL University, October 2009 Calvin Sun Principal Engineer Oracle Corporation
  • 2. Today’s Topics • Goals of InnoDB • Key Functional Characteristics • InnoDB Design Considerations • InnoDB Architecture • InnoDB On Disk Format • Source Code Structure • Q&A
  • 3. Goals of InnoDB • OLTP oriented • Performance, Reliability, Scalability • Data Protection • Portability
  • 4. InnoDB Key Functional Characteristics • Full transaction support • Row-level locking • MVCC • Crash recovery • Efficient IO
  • 5. Design Considerations • Modeled on Gray & Reuter’s “Transaction Processing: Concepts and Techniques” • Also emulated the Oracle architecture • Added unique subsystems: doublewrite, insert buffering, adaptive hash index • Designed to evolve with changing hardware & requirements
  • 6. InnoDB Architecture (layer diagram, top to bottom) • Server / Applications • Handler API / Embedded InnoDB API • Transaction • Cursor / Row • Mini-transaction | B-tree | Lock • Page • Buffer • File Space Manager • IO
  • 7. InnoDB On Disk Format • InnoDB Database Files • InnoDB Tablespaces • InnoDB Pages / Extents • InnoDB Rows • InnoDB Indexes • InnoDB Logs • File Format Design Considerations
  • 8. InnoDB Database Files (diagram) • MySQL Data Directory: .frm files • System tablespace (ibdata files): internal data dictionary, insert buffer, undo logs, InnoDB tables • OR, with innodb_file_per_table: InnoDB tables in .ibd files
  • 9. InnoDB Tablespaces • A tablespace consists of multiple files and/or raw disk partitions, each specified as file_name:file_size[:autoextend[:max:max_file_size]] • A file/partition is a collection of segments. • A segment consists of fixed-length pages. • The page size is always 16KB in uncompressed tablespaces, and 1KB-16KB in compressed tablespaces (for both data and index).
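The data file specification on slide 9, file_name:file_size[:autoextend[:max:max_file_size]], can be parsed roughly as follows. This is a minimal sketch, not InnoDB's own parser: the type `datafile_spec_t` and both function names are hypothetical, the size suffixes K, M and G are an assumption about the accepted units, and file names that themselves contain a colon (for example Windows drive letters) are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for illustration; not an InnoDB structure. */
typedef struct {
    char               name[256];
    unsigned long long size_bytes;
    bool               autoextend;
    unsigned long long max_bytes;   /* 0 means no explicit maximum */
} datafile_spec_t;

/* Parse "<digits>[K|M|G]" at *p and advance *p past it.
   The K/M/G suffixes are an assumption of this sketch. */
static bool parse_size(const char **p, unsigned long long *out)
{
    char *end;
    unsigned long long n;

    if (**p < '0' || **p > '9') return false;
    n = strtoull(*p, &end, 10);
    switch (*end) {
    case 'K': n <<= 10; end++; break;
    case 'M': n <<= 20; end++; break;
    case 'G': n <<= 30; end++; break;
    default: break;
    }
    *p = end;
    *out = n;
    return true;
}

/* Parse file_name:file_size[:autoextend[:max:max_file_size]]. */
static bool parse_datafile_spec(const char *spec, datafile_spec_t *out)
{
    static const char k_auto[] = ":autoextend";
    static const char k_max[]  = ":max:";
    const char *colon, *p;
    size_t len;

    assert(spec != NULL && out != NULL);
    colon = strchr(spec, ':');
    if (colon == NULL || colon == spec) return false;
    len = (size_t)(colon - spec);
    if (len >= sizeof out->name) return false;
    memcpy(out->name, spec, len);
    out->name[len] = '\0';
    out->autoextend = false;
    out->max_bytes = 0;

    p = colon + 1;
    if (!parse_size(&p, &out->size_bytes)) return false;
    if (*p == '\0') return true;              /* file_name:file_size */

    if (strncmp(p, k_auto, sizeof k_auto - 1) != 0) return false;
    p += sizeof k_auto - 1;
    out->autoextend = true;
    if (*p == '\0') return true;              /* ...:autoextend */

    if (strncmp(p, k_max, sizeof k_max - 1) != 0) return false;
    p += sizeof k_max - 1;                    /* ...:max:max_file_size */
    return parse_size(&p, &out->max_bytes) && *p == '\0';
}
```

Calling `parse_datafile_spec("ibdata1:12M:autoextend:max:500M", &spec)` fills in the name, an initial size of 12 MB, the autoextend flag and a 500 MB cap; the nesting of the optional parts mirrors the brackets in the slide's syntax.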
  • 10. System Tablespace • Internal Data Dictionary • Undo • Insert Buffer • Doublewrite Buffer • MySQL Replication Info
  • 11. InnoDB Tablespaces (diagram) • Tablespace → Segments (leaf node segment, non-leaf node segment, rollback segment) → Extents → Pages → Rows • Row: Trx id | Roll pointer | Field pointers | Field 1 | Field 2 | ... | Field n • An extent = 64 pages
  • 12. InnoDB Pages • InnoDB Page Types
      Symbol                 Value   Notes
      FIL_PAGE_INODE         3       File segment inode
      FIL_PAGE_INDEX         17855   B-tree node
      FIL_PAGE_TYPE_BLOB     10      Uncompressed BLOB page
      FIL_PAGE_TYPE_ZBLOB    11      1st compressed BLOB page
      FIL_PAGE_TYPE_ZBLOB2   12      Subsequent compressed BLOB page
      FIL_PAGE_TYPE_SYS      6       System page
      FIL_PAGE_TYPE_TRX_SYS  7       Transaction system page
      others                         i-buf bitmap, i-buf free list, file space header, extent descriptor page, newly allocated page
  • 13. InnoDB Pages • A page consists of: a page header, a page trailer, and a page body (rows or other contents). • Layout (diagram): Page header | Rows | row offset array | Page trailer
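Slide 11's "an extent = 64 pages", combined with the 16KB uncompressed page size from slide 9, gives an extent size of 1 MB. The arithmetic can be sketched as below; the constant and function names are hypothetical, and `page_byte_offset` additionally assumes a tablespace made of a single file with pages laid out back to back.

```c
#include <stdint.h>

enum {
    PAGE_SIZE_BYTES  = 16 * 1024, /* uncompressed page size (slide 9) */
    PAGES_PER_EXTENT = 64         /* extent size in pages (slide 11) */
};

/* Size of one extent in bytes: 64 * 16 KB = 1 MB. */
static uint64_t extent_size_bytes(void)
{
    return (uint64_t)PAGE_SIZE_BYTES * PAGES_PER_EXTENT;
}

/* Index of the extent that contains a given page. */
static uint32_t extent_of_page(uint32_t page_no)
{
    return page_no / PAGES_PER_EXTENT;
}

/* Position of the page inside its extent (0..63). */
static uint32_t slot_in_extent(uint32_t page_no)
{
    return page_no % PAGES_PER_EXTENT;
}

/* Byte offset of a page, assuming a single-file tablespace. */
static uint64_t page_byte_offset(uint32_t page_no)
{
    return (uint64_t)page_no * PAGE_SIZE_BYTES;
}
```

For example, page 130 falls in extent 2 at slot 2, because 130 = 2 * 64 + 2.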
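The page type values listed on slide 12 map naturally onto a C enum. The sketch below copies the symbols and values from the slide; the enum type name and the helper `fil_page_type_describe` are hypothetical and exist only for illustration.

```c
/* Symbols and values as listed on slide 12. */
typedef enum {
    FIL_PAGE_INODE        = 3,     /* file segment inode */
    FIL_PAGE_TYPE_SYS     = 6,     /* system page */
    FIL_PAGE_TYPE_TRX_SYS = 7,     /* transaction system page */
    FIL_PAGE_TYPE_BLOB    = 10,    /* uncompressed BLOB page */
    FIL_PAGE_TYPE_ZBLOB   = 11,    /* 1st compressed BLOB page */
    FIL_PAGE_TYPE_ZBLOB2  = 12,    /* subsequent compressed BLOB page */
    FIL_PAGE_INDEX        = 17855  /* B-tree node */
} fil_page_type_t;

/* Hypothetical helper: map a page type value to the slide's note. */
static const char *fil_page_type_describe(unsigned type)
{
    switch (type) {
    case FIL_PAGE_INODE:        return "File segment inode";
    case FIL_PAGE_TYPE_SYS:     return "System page";
    case FIL_PAGE_TYPE_TRX_SYS: return "Transaction system page";
    case FIL_PAGE_TYPE_BLOB:    return "Uncompressed BLOB page";
    case FIL_PAGE_TYPE_ZBLOB:   return "1st compressed BLOB page";
    case FIL_PAGE_TYPE_ZBLOB2:  return "Subsequent compressed BLOB page";
    case FIL_PAGE_INDEX:        return "B-tree node";
    default:                    return "other";
    }
}
```

Types the slide groups under "others" (insert buffer bitmap, free list and so on) fall through to the default branch, since the slide does not give their values.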
  • 14. Page Declares
      typedef struct /* a space address */
      {
          ulint pageno;   /* page number within the file */
          ulint boffset;  /* byte offset within the page */
      } fil_addr_t;

      /* Conceptual page layout: data[] sits between the header and the
         trailer, so this is an illustration rather than a literal C struct. */
      typedef struct
      {
          ulint      checksum;         /* checksum of the page (since 4.0.14) */
          ulint      page_offset;      /* page offset inside space */
          fil_addr_t previous;         /* offset or fil_addr_t */
          fil_addr_t next;             /* offset or fil_addr_t */
          dulint     page_lsn;         /* lsn of the end of the newest modification log record to the page */
          PAGE_TYPE  page_type;        /* file page type */
          dulint     file_flush_lsn;   /* the file has been flushed to disk at least up to this lsn */
          int        space_id;         /* space id of the page */
          char       data[];           /* will grow */
          ulint      page_lsn_low;     /* trailer: the last 4 bytes of page_lsn */
          ulint      trailer_checksum; /* trailer: page checksum, or checksum magic, or 0 */
      } PAGE, *PAGE_PTR;
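To make the header fields on slide 14 concrete, the sketch below decodes them from a raw page buffer. The slide does not give byte offsets: the ones used here (a 38-byte header, 4-byte previous/next page numbers, big-endian integers) are recalled from InnoDB's fil0fil.h of that era and should be treated as assumptions to verify against the source. All type, constant and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed byte offsets of the file page header fields. */
enum {
    HDR_OFF_CHECKSUM  = 0,   /* 4 bytes */
    HDR_OFF_PAGE_NO   = 4,   /* 4 bytes, page offset inside space */
    HDR_OFF_PREV      = 8,   /* 4 bytes */
    HDR_OFF_NEXT      = 12,  /* 4 bytes */
    HDR_OFF_LSN       = 16,  /* 8 bytes */
    HDR_OFF_TYPE      = 24,  /* 2 bytes */
    HDR_OFF_FLUSH_LSN = 26,  /* 8 bytes */
    HDR_OFF_SPACE_ID  = 34,  /* 4 bytes */
    FIL_HEADER_BYTES  = 38   /* page body starts here */
};

typedef struct {
    uint32_t checksum, page_no, prev, next, space_id;
    uint64_t lsn, flush_lsn;
    uint16_t type;
} page_header_t;

/* Big-endian readers (most significant byte first). */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint64_t read_be64(const uint8_t *p)
{
    return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode the header from the first FIL_HEADER_BYTES bytes of a page. */
static void decode_page_header(const uint8_t *page, page_header_t *out)
{
    out->checksum  = read_be32(page + HDR_OFF_CHECKSUM);
    out->page_no   = read_be32(page + HDR_OFF_PAGE_NO);
    out->prev      = read_be32(page + HDR_OFF_PREV);
    out->next      = read_be32(page + HDR_OFF_NEXT);
    out->lsn       = read_be64(page + HDR_OFF_LSN);
    out->type      = read_be16(page + HDR_OFF_TYPE);
    out->flush_lsn = read_be64(page + HDR_OFF_FLUSH_LSN);
    out->space_id  = read_be32(page + HDR_OFF_SPACE_ID);
}
```

Reading byte by byte avoids any dependence on the host's endianness or struct padding, which is why the sketch does not simply cast the buffer to a struct.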
  • 15. InnoDB Compressed Pages • InnoDB keeps a “modification log” in each page • Updates & inserts of small records are written to the log w/o page reconstruction; deletes don’t even require uncompression • Log also tells InnoDB if the page will compress to fit page size • When log space runs out, InnoDB uncompresses the page, applies the changes and recompresses the page • Page layout (diagram): Page header | compressed data | modification log | empty space | BLOB pointers | page directory | Page trailer
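  • 16. InnoDB Rows • COMPACT format: the row keeps a 768-byte prefix of a long column plus a 20-byte pointer to the overflow page • DYNAMIC format: the row keeps only the 20-byte pointer to the overflow page • Record layout: Record hdr | Trx ID | Roll ptr | Fld ptrs | overflow-page ptr | .. | Field values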
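The difference between the two row formats on slide 16 comes down to how many bytes of a long column stay in the row once it has been moved to an overflow page. A minimal sketch, assuming the 768-byte prefix and 20-byte pointer shown on the slide; the names are hypothetical, and the rule InnoDB uses to decide whether a column is moved off-page at all is deliberately not modeled.

```c
typedef enum { ROW_FORMAT_COMPACT, ROW_FORMAT_DYNAMIC } row_format_t;

enum {
    COMPACT_PREFIX_BYTES = 768, /* prefix kept in the row (COMPACT) */
    OVERFLOW_PTR_BYTES   = 20   /* pointer to the overflow page */
};

/* Bytes that remain in the row for a column stored off-page. */
static unsigned long in_row_bytes(row_format_t fmt)
{
    return fmt == ROW_FORMAT_COMPACT
        ? (unsigned long)COMPACT_PREFIX_BYTES + OVERFLOW_PTR_BYTES
        : (unsigned long)OVERFLOW_PTR_BYTES;
}

/* Bytes placed on overflow pages for an off-page column of
   column_len bytes (column_len assumed larger than the prefix). */
static unsigned long overflow_bytes(row_format_t fmt, unsigned long column_len)
{
    if (fmt == ROW_FORMAT_COMPACT && column_len > COMPACT_PREFIX_BYTES) {
        return column_len - COMPACT_PREFIX_BYTES;
    }
    return column_len;
}
```

Under these assumptions a 10,000-byte off-page column leaves 788 bytes in a COMPACT row and 20 bytes in a DYNAMIC row, which is why DYNAMIC fits more rows per page when tables hold large BLOB or text columns.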
  • 17. InnoDB Indexes - Primary • Data rows are stored in the B-tree leaf nodes of a clustered index • B-tree is organized by primary key or non-null unique key of table, if defined; else, an internal column with 6-byte ROW_ID is added. • Diagram, clustered (primary key) index: root PK values 001 - nnn; internal nodes 001 - 500 | 501 - 800 | 801 - nnn; leaf nodes 001 - 275 | 276 - 500 | 501 - 630 | 631 - 768 | 769 - 800 | 801 - 949 | 950 - xxx | xxx - nnn; each leaf holds key values (e.g. 501-630) + data for corresponding rows
  • 18. InnoDB Indexes - Secondary • Secondary index B-tree leaf nodes contain, for each key value, the primary keys of the corresponding rows, used to access clustering index to obtain the data • Diagram: secondary index (key values A - Z, B-tree leaf nodes containing PKs) points into the clustered (primary key) index (PK values 001 - nnn, B-tree leaf nodes containing data)
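The two-step lookup described on slide 18 (secondary key → primary key → row) can be illustrated with two sorted arrays standing in for the leaf levels of the two B-trees. This is a toy model under stated assumptions, not InnoDB code: the types and function names are hypothetical, secondary keys are assumed unique, and `bsearch` replaces the actual B-tree descent.

```c
#include <stdlib.h>
#include <string.h>

/* Clustered index leaf entry: primary key plus the row data. */
typedef struct { int pk; const char *row_data; } clustered_row_t;

/* Secondary index leaf entry: key value plus the row's primary key. */
typedef struct { const char *key; int pk; } secondary_entry_t;

static int cmp_pk(const void *k, const void *elem)
{
    int key = *(const int *)k;
    int pk  = ((const clustered_row_t *)elem)->pk;
    return (key > pk) - (key < pk);
}

static int cmp_sec(const void *k, const void *elem)
{
    return strcmp((const char *)k, ((const secondary_entry_t *)elem)->key);
}

/* Step 1: find the primary key in the secondary index.
   Step 2: use it to find the row in the clustered index.
   Both arrays must be sorted by their respective keys. */
static const clustered_row_t *
lookup_via_secondary(const secondary_entry_t *sec, size_t nsec,
                     const clustered_row_t *rows, size_t nrows,
                     const char *key)
{
    const secondary_entry_t *s =
        bsearch(key, sec, nsec, sizeof *sec, cmp_sec);
    if (s == NULL) {
        return NULL;
    }
    return bsearch(&s->pk, rows, nrows, sizeof *rows, cmp_pk);
}
```

The model shows why a secondary index lookup costs two searches, and why a longer primary key makes every secondary index entry larger.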
  • 19. InnoDB Logging (diagram) • Log Buffer → log thread → redo log (Log File #1, Log File #2; log files) • Buffer Pool → write thread → DATA and rollback segments (rollback log; ibdata files)
  • 20. InnoDB Redo Log • Diagram labels: end of log, start of log, last checkpoint, min LSN • Redo log structure: Space id | PageNo | OpCode | Data
  • 21. File Format Management • Builtin InnoDB format: “Antelope” • New “Barracuda” format enables compression, ROW_FORMAT=DYNAMIC • Fast index creation, other features do not require Barracuda file format • Builtin InnoDB can access “Antelope” databases, but not “Barracuda” databases • Check file format tag in system tablespace on startup • Enable a file format with new dynamic parameter innodb_file_format • Preserves ability to downgrade easily • Diagram label: .ibd data files (file per table)
  • 22. InnoDB File Format Design Considerations • Durability: logging, doublewrite, checksum • Performance: insert buffering, table compression • Efficiency: dynamic row format, table compression • Compatibility: file format management
  • 23. Source Code Structure • 31 subdirectories • Relevant InnoDB source files on file formats • Tablespace: fsp0fsp {.c, .ic, .h} • Page: page0page, page0zip {.c, .ic, .h} • Log: log0log {.c, .ic, .h}
  • 24. Source Code Subdirectories • buf • data • db • dict • dyn • eval • fil • fsp • fut • ha • handler • ibuf • include • lock • log • math • mem • mtr • os • page • pars • que • read • rem • row • srv • sync • thr • trx • usr • ut
  • 25. Summary: Durability, Performance, Compatibility & Efficiency • InnoDB is the leading transactional storage engine for MySQL • InnoDB’s architecture is well-suited to modern online transactional applications, as well as embedded applications. • InnoDB’s file format is designed for high durability, better performance, and ease of management
  • 27. InnoDB Size Limits • Max # of tables: 4 G • Max size of a table: 32TB • Columns per table: 1000 • Max row size: n*4 GB (8 kB if stored on the same page; n*4 GB with n BLOBs) • Max key length: 3500 bytes • Maximum tablespace size: 64 TB • Max # of concurrent trxs: 1023
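The 64 TB tablespace limit on slide 27 is consistent with 32-bit page numbers and 16KB pages: 2^32 pages * 16 KB = 64 TB. The slide states only the limit, so the 32-bit page number is an assumption of this sketch, and the function name is hypothetical.

```c
#include <stdint.h>

/* Largest tablespace addressable with 32-bit page numbers
   (assumed) at the given page size. */
static uint64_t max_tablespace_bytes(uint32_t page_size_bytes)
{
    return ((uint64_t)1 << 32) * page_size_bytes;
}
```

With `page_size_bytes` set to 16384 the result is 2^46 bytes, which is 64 TB.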