Covering Indexes:
Orders-of-Magnitude Improvements

         Bradley C. Kuszmaul
           Chief Architect




  Percona Performance Conference 2009
A Performance Example
A fact table drawn from iiBench, but without iiBench's secondary indexes.
As in TPC-H, the integer values are essentially random.

CREATE TABLE `facts` (
  `xid` int(11) NOT NULL AUTO_INCREMENT,
  `dateandtime` datetime DEFAULT NULL,
  `cashregisterid` int(11) NOT NULL,
  `cust_id` int(11) NOT NULL,
  `prod_id` int(11) NOT NULL,
  `price` float NOT NULL,
  PRIMARY KEY (`xid`)
) ENGINE=TOKUDB;


Populated with 1 billion rows.
Bradley C. Kuszmaul           Covering Indexes 2
The Query
A simple query:
mysql> select prod_id from facts
              where cust_id = 50000;
 prod_id
     525
     654
     704
       .
       .
     984
     276
     576
10014 rows in set (11 min 26.26 sec)

Implemented via table scan.
14.6 rows/s.

An Index Doubles Speed
mysql> alter table facts add index
             cust_idx (cust_id);
mysql> select prod_id from facts
              where cust_id = 50000;
 prod_id
     525
     654
     704
       .
       .
     984
     276
     576
10014 rows in set (5 min 48.94 sec)

Looks at only 0.001% of the data, yet the speedup is just 2x.
29 rows/s.
Covering Index
A covering index is an index in which all the
columns a query needs are part of the key.
mysql> alter table facts add index
             cust_prod_idx (cust_id, prod_id);

Even if we have a key that happens to be unique, we
can append more columns to it (which doesn't change
the index order) to make it a covering index.
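To see the distinction without a TokuDB server, here is a minimal sketch that uses SQLite (through Python's standard `sqlite3` module) as a stand-in for MySQL; that substitution is an assumption, since SQLite's planner is not MySQL's. SQLite's `EXPLAIN QUERY PLAN` reports `COVERING INDEX` when it can answer a query from the index alone. The table is empty, so this shows only which plan is chosen, not timings.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL/TokuDB (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE facts (
           xid            INTEGER PRIMARY KEY,
           dateandtime    TEXT,
           cashregisterid INTEGER NOT NULL,
           cust_id        INTEGER NOT NULL,
           prod_id        INTEGER NOT NULL,
           price          REAL NOT NULL)"""
)

QUERY = "SELECT prod_id FROM facts WHERE cust_id = 50000"


def plan(sql: str) -> str:
    """Return SQLite's plan description (the last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


# 1. No secondary index: the planner has to scan the whole table.
plan_no_index = plan(QUERY)

# 2. Index on cust_id only: matches are found in the index, but prod_id
#    must still be fetched from the table row for every match.
conn.execute("CREATE INDEX cust_idx ON facts (cust_id)")
plan_cust_idx = plan(QUERY)

# 3. Replace it with (cust_id, prod_id): every column the query needs is
#    in the index, so the table itself is never read.
conn.execute("DROP INDEX cust_idx")
conn.execute("CREATE INDEX cust_prod_idx ON facts (cust_id, prod_id)")
plan_covering = plan(QUERY)

for label, p in [("no index", plan_no_index),
                 ("cust_idx", plan_cust_idx),
                 ("cust_prod_idx", plan_covering)]:
    print(f"{label:14} {p}")
```

On recent SQLite versions the three plans read roughly `SCAN facts`, `SEARCH facts USING INDEX cust_idx (cust_id=?)`, and `SEARCH facts USING COVERING INDEX cust_prod_idx (cust_id=?)`; the exact wording varies between versions.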




Covering Index Is 1300x Faster
mysql> alter table facts add index
             cust_prod_idx (cust_id, prod_id);
mysql> select prod_id from facts
              where cust_id = 50000;
 prod_id
       0
       0
       .
       .
     999
     999
10014 rows in set (0.26 sec)



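Note the different row order: results now come back sorted by prod_id.
The 1300x is relative to cust_idx (348.94 s vs. 0.26 s); relative to the
table scan (686.26 s) it is about 2600x.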
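The throughput and speedup figures on these slides follow directly from the reported timings. A small sketch of that arithmetic; the only inputs are the wall-clock times and the 10014-row result size shown above.

```python
# Wall-clock times reported for the same query
# (10014 rows returned from a 1-billion-row table).
ROWS = 10_014
seconds = {
    "table scan": 11 * 60 + 26.26,      # 686.26 s
    "cust_idx": 5 * 60 + 48.94,         # 348.94 s
    "cust_prod_idx": 0.26,
}

for plan_name, t in seconds.items():
    print(f"{plan_name:14} {t:8.2f} s  {ROWS / t:10.1f} rows/s")

speedup_index = seconds["table scan"] / seconds["cust_idx"]
speedup_covering = seconds["cust_idx"] / seconds["cust_prod_idx"]
speedup_total = seconds["table scan"] / seconds["cust_prod_idx"]
print(f"cust_idx vs. table scan:      {speedup_index:.1f}x")
print(f"cust_prod_idx vs. cust_idx:   {speedup_covering:.0f}x")
print(f"cust_prod_idx vs. table scan: {speedup_total:.0f}x")
```

The computed rates reproduce the 14.6 and 29 rows/s quoted earlier, and the ratios reproduce the 2x and roughly 1300x speedups.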

Outline
For a database that doesn’t fit in main memory, how
can we predict performance?

   • The Disk Access Model (DAM).
   • Analysis using the DAM.
   • Which indexes should we maintain?
   • Another brief example: TPC-H.
   • Does SSD help?

In this talk, I’ll describe a theoretical model for
predicting performance, and show how to use it.

The Disk-Access Model
[Figure: a processor, main memory, and disk; memory and disk are both divided into blocks of size B.]
A theoretical model for understanding the performance of data structures on disk.
Memory is organized in blocks of size B.
Blocks are transferred between memory and disk.
Count only the number of block transfers.
Model can predict the performance of a query plan.
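A minimal sketch of the accounting the model prescribes: rows live in blocks of B consecutive rows, and only block transfers are counted. The slide does not specify how much fits in main memory or what gets evicted, so the `cache_blocks` limit and the least-recently-used eviction below are assumptions, and `count_transfers` is a hypothetical helper.

```python
from collections import OrderedDict


def count_transfers(row_positions, rows_per_block, cache_blocks):
    """Count block transfers in a Disk-Access-Model style simulation.

    Row i lives in block i // rows_per_block. Main memory holds at most
    cache_blocks blocks, evicted least-recently-used (an assumption).
    Touching a row whose block is not in memory costs one transfer;
    all other work is free.
    """
    cache = OrderedDict()  # block number -> None, least recently used first
    transfers = 0
    for pos in row_positions:
        block = pos // rows_per_block
        if block in cache:
            cache.move_to_end(block)   # hit: already in memory, costs nothing
            continue
        transfers += 1                 # miss: one block transfer from disk
        cache[block] = None
        if len(cache) > cache_blocks:
            cache.popitem(last=False)  # evict the least recently used block
    return transfers


B = 4  # tiny block size for the demo
print(count_transfers(range(20), B, cache_blocks=2))            # 5
print(count_transfers(range(0, 2000, 100), B, cache_blocks=2))  # 20
```

Reading 20 consecutive rows costs 20/B = 5 transfers, while reading 20 rows that are 100 positions apart costs one transfer per row: the same distinction that separates the three query plans.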

Analysis for No Index
[Figure: the facts table in xid order, $10^9$ rows packed into blocks 0, 1, ..., $10^9/B$ of size B each. The rows with cust_id = 50000 are scattered throughout.]
          xid  cust_id  prod_id
            1       42      501
          ...
         6044    50000      525
          ...
        20480    50000      654
          ...
        44921    50000      704
          ...
    999703368    50000      984
          ...
    999850921    50000      276
          ...
    999923451    50000      576
          ...

Table scan requires $O(10^9/B)$ transfers.

Analysis for Index
cust_idx (sorted by cust_id):
    cust_id        xid
         42          1
        ...
      50000       6044
      50000      20480
      50000      44921
        ...
      50000  999703368
      50000  999850921
      50000  999923451
        ...

facts (sorted by xid):
          xid  cust_id  prod_id
            1       42      501
          ...
         6044    50000      525
          ...
        20480    50000      654
          ...
        44921    50000      704
          ...
    999703368    50000      984
          ...
    999850921    50000      276
          ...
    999923451    50000      576
          ...

Fetch only 10014 rows, but mostly in different
blocks, so $O(10014)$ memory transfers.

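Why "mostly in different blocks"? Since the cust_id values are essentially random, the xids of the matching rows behave like a random sample of the table. A quick sketch under two assumptions: uniformly random xids, and B = 4096 rows per block (the value the talk uses later for Fractal Trees).

```python
import random

N = 10**9          # rows in facts, stored in xid order
B = 4096           # rows per block (assumed value)
MATCHES = 10_014   # rows with cust_id = 50000

# Model the matching xids as a uniform random sample of row positions.
rng = random.Random(2009)
matching_xids = rng.sample(range(N), MATCHES)

# Fetching each row through cust_idx touches the block that holds it.
blocks_touched = len({xid // B for xid in matching_xids})
print(blocks_touched, "distinct blocks for", MATCHES, "rows")
```

With about 244,000 blocks in the table, nearly every one of the 10014 fetches lands in a block no other fetch uses, so the cost is about one transfer per row.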
Analysis With Covering Index
cust_prod_idx (sorted by cust_id, then prod_id):
    cust_id  prod_id        xid
         42      501          1
        ...
      50000      276  999850921
      50000      525       6044
      50000      576  999923451
      50000      654      20480
      50000      704      44921
      50000      984  999703368
        ...

Answer directly out of the index: $O(10014/B)$
transfers; the facts table is never read.
(The prod_ids are sorted for each customer.)
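Plugging numbers into the three analyses shows the orders of magnitude that separate them. A sketch assuming B = 4096 rows per block (the value used on the Fractal Tree slide) and ignoring the few transfers needed to descend from the root of the index to its first matching leaf.

```python
import math

N = 10**9        # rows in facts
K = 10_014       # rows with cust_id = 50000
B = 4096         # rows per block (assumed)

# No index: read every block of the table.
scan = math.ceil(N / B)

# cust_idx: read the K contiguous index entries, then one table block per
# matching row, because the matching rows are scattered across the table.
plain_index = math.ceil(K / B) + K

# cust_prod_idx: the K entries are contiguous in the index; a run of K rows
# spans at most one block more than ceil(K / B).
covering = math.ceil(K / B) + 1

for name, cost in [("table scan", scan), ("cust_idx", plain_index),
                   ("cust_prod_idx", covering)]:
    print(f"{name:14} ~{cost:>7} transfers")
```

The model is only meant to predict orders of magnitude: it puts the plain index about 24x ahead of the scan and the covering index three more orders of magnitude ahead, the same shape as the measured 2x and 1300x, without matching them exactly.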

Which Indexes?
Problem: MySQL only allows 16 columns in an
index (32 in TokuDB).
Covering indexes speed up queries.
But which columns should we throw in?
Solution: We defined clustering indexes.
A clustering index is an index that includes all
the columns of the table.
mysql> alter table facts add clustering index
             cust_cluster_idx (cust_id);

Materializes a copy of the table, sorted in a different
order: clustered on the index key.
A clustering index is a covering index for all
queries.
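A toy model of what "materializes a table, sorted in a different order" means: a full copy of every row, kept sorted by the index key, so any lookup on that key returns complete rows from one contiguous slice. `ClusteringIndex` is a hypothetical illustration, not TokuDB's implementation, and the sample rows use made-up values for the columns the slides do not show.

```python
from bisect import bisect_left, bisect_right
from typing import NamedTuple


class Fact(NamedTuple):
    xid: int
    dateandtime: str
    cashregisterid: int
    cust_id: int
    prod_id: int
    price: float


class ClusteringIndex:
    """Hypothetical model: every row copied and sorted by (key column, xid)."""

    def __init__(self, rows, key: str):
        self.rows = sorted(rows, key=lambda r: (getattr(r, key), r.xid))
        self.keys = [getattr(r, key) for r in self.rows]  # sorted key values

    def lookup(self, value):
        """Return all full rows whose key equals value (one contiguous slice)."""
        lo = bisect_left(self.keys, value)
        hi = bisect_right(self.keys, value)
        return self.rows[lo:hi]


# Made-up sample rows; only xid, cust_id and prod_id come from the slides.
facts = [
    Fact(20480, "2009-04-22 10:02:00", 5, 50000, 654, 2.25),
    Fact(1, "2009-04-22 10:00:00", 7, 42, 501, 9.99),
    Fact(6044, "2009-04-22 10:01:00", 3, 50000, 525, 1.50),
]
cust_cluster_idx = ClusteringIndex(facts, "cust_id")
for row in cust_cluster_idx.lookup(50000):
    print(row.prod_id, row.price)  # any column is available without a fetch
```

Because the copy holds every column, any query that restricts cust_id can be answered from it, which is why a clustering index covers all queries; the price is storing and maintaining a second full copy of the table.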
TPC-H Q17
We’ve been trying to figure out how to make
TPC-H-like queries run faster, so we picked Q17,
which is one of the slowest in MySQL.
Results: With a clustering index on
(L_PARTKEY, L_QUANTITY):
     Scale            Standard    Clustering    Speedup
     SF10 (10GB)      > 3600 s    101 s         > 35x
     SF100 (100GB)    680533 s    773 s         ≈ 880x
I'll write more about this on the blog at tokuview.com.



Indexes Are Expensive (or Are They?)
The downside of maintaining indexes is that
insertions are more expensive.
Fractal Tree indexes speed up insertions by orders
of magnitude, however.
For B = 4096 rows and N = $10^9$ rows, the number
of memory transfers per operation is

                  B-Trees                       Fractal Trees
   Point Query    $O(\log N/\log B) \ge 1$      $O(\log_B N) \ge 1$
   Range Query    $O(S/B)$                      $O(S/B)$
   Insertion      $O(\log N/\log B) \ge 1$      $O((\log N)/B) = 0.007$

where S is the number of rows in the range.
TokuDB, the Tokutek storage engine, implements
Fractal Tree indexes.
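The numbers in the table can be checked directly. A sketch that drops the constants hidden by the O() notation; the 0.007 on the slide matches a base-2 logarithm in the Fractal Tree insertion bound, which is an amortized cost per insertion.

```python
import math

N = 10**9   # rows
B = 4096    # rows per block

# B-tree point query and insertion: one root-to-leaf path, log N / log B blocks.
btree_path = math.log(N) / math.log(B)

# Fractal Tree insertion: (log N) / B transfers per insertion, amortized,
# taking the logarithm base 2.
fractal_insert = math.log2(N) / B

print(f"B-tree point query / insertion: {btree_path:.2f} transfers")
print(f"Fractal Tree insertion:         {fractal_insert:.4f} transfers")
print(f"ratio: about {btree_path / fractal_insert:.0f}x")
```

A B-tree insertion costs at least one transfer (about 2.5 here), while a Fractal Tree insertion costs well under a hundredth of one, which is the orders-of-magnitude gap that makes maintaining extra covering and clustering indexes affordable.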
Other Materialization Ideas
Better tools are needed to help maintain other
interesting materializations for MySQL. For
example:
 • Denormalization: prejoining some columns.
 • Multidimensional indexing: WHERE clauses often
   look like range queries on multiple columns, e.g.
          38<=a and a<=42 and 90<=b and b<=99
   Partitions can provide a painful substitute for
   multidimensional indexing.
Fractal Tree indexes can help solve these problems.
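Why an ordinary one-dimensional index handles such a WHERE clause poorly can be seen in a toy sketch: an index on `a` alone finds every row in the `a` range and then filters on `b`, so the work is proportional to the `a` range rather than to the answer. The table (every `(a, b)` pair with values 0 to 99) and the `range_query` helper are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: every (a, b) pair with 0 <= a, b < 100 (10,000 rows).
rows = [(a, b) for a in range(100) for b in range(100)]

# A one-dimensional index on a: the rows sorted by a.
by_a = sorted(rows)
a_keys = [a for a, _ in by_a]


def range_query(a_lo, a_hi, b_lo, b_hi):
    """Answer a_lo<=a<=a_hi AND b_lo<=b<=b_hi using only the index on a.

    Returns (matching rows, number of rows examined).
    """
    lo = bisect_left(a_keys, a_lo)
    hi = bisect_right(a_keys, a_hi)
    candidates = by_a[lo:hi]                 # every row in the a-range
    matches = [(a, b) for a, b in candidates if b_lo <= b <= b_hi]
    return matches, len(candidates)


matches, examined = range_query(38, 42, 90, 99)
print(len(matches), "rows returned,", examined, "rows examined")  # 50, 500
```

Here 90% of the examined rows are thrown away; a multidimensional index would aim to examine close to the 50 rows actually returned.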

iiBench on RAID10 and SSD
[Figure: iiBench insertion rate (y-axis, 0 to 35000) vs. cumulative insertions (x-axis, 0 to 1.5e+08); one curve each for TokuDB and InnoDB on FusionIO, X25-E, and RAID10.]

Percona measured TokuDB and InnoDB on iiBench on RAID 10 disks,
an Intel X25-E 32GB SSD, and a FusionIO 160GB SSD.
These SSDs provide surprisingly little performance benefit.
Bradley C. Kuszmaul                         Covering Indexes 16

Weitere ähnliche Inhalte

Was ist angesagt?

Struktur data 05 (bs avl tree)
Struktur data 05 (bs avl tree)Struktur data 05 (bs avl tree)
Struktur data 05 (bs avl tree)Sunarya Marwah
 
Tugas Mandiri Riset Operasi
Tugas Mandiri Riset OperasiTugas Mandiri Riset Operasi
Tugas Mandiri Riset OperasiPrincess Nisa
 
Data Management (Data Mining Association Rule)
Data Management (Data Mining Association Rule)Data Management (Data Mining Association Rule)
Data Management (Data Mining Association Rule)Adam Mukharil Bachtiar
 
Program penjumlahan dan pengurangan matriks
Program penjumlahan dan pengurangan matriksProgram penjumlahan dan pengurangan matriks
Program penjumlahan dan pengurangan matriksSimon Patabang
 
Matriks, relasi dan fungsi
Matriks, relasi dan fungsi Matriks, relasi dan fungsi
Matriks, relasi dan fungsi Aisyah Turidho
 
Teori bilangan (induksi matematika)
Teori bilangan (induksi matematika)Teori bilangan (induksi matematika)
Teori bilangan (induksi matematika)1724143052
 
Aljabar matriks-its
Aljabar matriks-itsAljabar matriks-its
Aljabar matriks-itsMasnia Siti
 
EVALUASI KOMPRESI DATA MENGGUNAKAN ALGORITMA LEMPEL-ZIV-MARKOV CHAIN DENGAN A...
EVALUASI KOMPRESI DATA MENGGUNAKAN ALGORITMA LEMPEL-ZIV-MARKOV CHAIN DENGAN A...EVALUASI KOMPRESI DATA MENGGUNAKAN ALGORITMA LEMPEL-ZIV-MARKOV CHAIN DENGAN A...
EVALUASI KOMPRESI DATA MENGGUNAKAN ALGORITMA LEMPEL-ZIV-MARKOV CHAIN DENGAN A...Basri Yasin
 
Codetainer: a Docker-based browser code 'sandbox'
Codetainer: a Docker-based browser code 'sandbox'Codetainer: a Docker-based browser code 'sandbox'
Codetainer: a Docker-based browser code 'sandbox'Jen Andre
 
Algoritma dan Struktur Data - Pohon Biner
Algoritma dan Struktur Data - Pohon BinerAlgoritma dan Struktur Data - Pohon Biner
Algoritma dan Struktur Data - Pohon BinerKuliahKita
 
Bab 1 operasi bilangan real
Bab 1 operasi bilangan realBab 1 operasi bilangan real
Bab 1 operasi bilangan realEko Supriyadi
 
Algoritma Greedy (contoh soal)
Algoritma Greedy (contoh soal)Algoritma Greedy (contoh soal)
Algoritma Greedy (contoh soal)Ajeng Savitri
 
Pertemuan 4 Pemrograman Dasar
Pertemuan 4 Pemrograman DasarPertemuan 4 Pemrograman Dasar
Pertemuan 4 Pemrograman DasarDisma Ariyanti W
 
Permainan dua pemain jumlah-nol
Permainan dua pemain jumlah-nolPermainan dua pemain jumlah-nol
Permainan dua pemain jumlah-nolgleebelle
 

Was ist angesagt? (20)

Struktur data 05 (bs avl tree)
Struktur data 05 (bs avl tree)Struktur data 05 (bs avl tree)
Struktur data 05 (bs avl tree)
 
Bab 9 tree
Bab 9 treeBab 9 tree
Bab 9 tree
 
Tugas Mandiri Riset Operasi
Tugas Mandiri Riset OperasiTugas Mandiri Riset Operasi
Tugas Mandiri Riset Operasi
 
Soal uas struktur data
Soal uas struktur dataSoal uas struktur data
Soal uas struktur data
 
Data Management (Data Mining Association Rule)
Data Management (Data Mining Association Rule)Data Management (Data Mining Association Rule)
Data Management (Data Mining Association Rule)
 
Program penjumlahan dan pengurangan matriks
Program penjumlahan dan pengurangan matriksProgram penjumlahan dan pengurangan matriks
Program penjumlahan dan pengurangan matriks
 
Matriks, relasi dan fungsi
Matriks, relasi dan fungsi Matriks, relasi dan fungsi
Matriks, relasi dan fungsi
 
Ppt matriks
Ppt matriksPpt matriks
Ppt matriks
 
Teori bilangan (induksi matematika)
Teori bilangan (induksi matematika)Teori bilangan (induksi matematika)
Teori bilangan (induksi matematika)
 
Priority queues
Priority queuesPriority queues
Priority queues
 
Aljabar matriks-its
Aljabar matriks-itsAljabar matriks-its
Aljabar matriks-its
 
EVALUASI KOMPRESI DATA MENGGUNAKAN ALGORITMA LEMPEL-ZIV-MARKOV CHAIN DENGAN A...
EVALUASI KOMPRESI DATA MENGGUNAKAN ALGORITMA LEMPEL-ZIV-MARKOV CHAIN DENGAN A...EVALUASI KOMPRESI DATA MENGGUNAKAN ALGORITMA LEMPEL-ZIV-MARKOV CHAIN DENGAN A...
EVALUASI KOMPRESI DATA MENGGUNAKAN ALGORITMA LEMPEL-ZIV-MARKOV CHAIN DENGAN A...
 
Codetainer: a Docker-based browser code 'sandbox'
Codetainer: a Docker-based browser code 'sandbox'Codetainer: a Docker-based browser code 'sandbox'
Codetainer: a Docker-based browser code 'sandbox'
 
Algoritma dan Struktur Data - Pohon Biner
Algoritma dan Struktur Data - Pohon BinerAlgoritma dan Struktur Data - Pohon Biner
Algoritma dan Struktur Data - Pohon Biner
 
Bab 1 operasi bilangan real
Bab 1 operasi bilangan realBab 1 operasi bilangan real
Bab 1 operasi bilangan real
 
Algoritma Greedy (contoh soal)
Algoritma Greedy (contoh soal)Algoritma Greedy (contoh soal)
Algoritma Greedy (contoh soal)
 
Probabilitas Manprod 2
Probabilitas Manprod 2Probabilitas Manprod 2
Probabilitas Manprod 2
 
Pertemuan 4 Pemrograman Dasar
Pertemuan 4 Pemrograman DasarPertemuan 4 Pemrograman Dasar
Pertemuan 4 Pemrograman Dasar
 
Permainan dua pemain jumlah-nol
Permainan dua pemain jumlah-nolPermainan dua pemain jumlah-nol
Permainan dua pemain jumlah-nol
 
Matrik
MatrikMatrik
Matrik
 

Mehr von PerconaPerformance

Drizzles Approach To Improving Performance Of The Server
Drizzles  Approach To  Improving  Performance Of The  ServerDrizzles  Approach To  Improving  Performance Of The  Server
Drizzles Approach To Improving Performance Of The ServerPerconaPerformance
 
E M T Better Performance Monitoring
E M T  Better  Performance  MonitoringE M T  Better  Performance  Monitoring
E M T Better Performance MonitoringPerconaPerformance
 
Automated Performance Testing With J Meter And Maven
Automated  Performance  Testing With  J Meter And  MavenAutomated  Performance  Testing With  J Meter And  Maven
Automated Performance Testing With J Meter And MavenPerconaPerformance
 
Galera Multi Master Synchronous My S Q L Replication Clusters
Galera  Multi Master  Synchronous  My S Q L  Replication  ClustersGalera  Multi Master  Synchronous  My S Q L  Replication  Clusters
Galera Multi Master Synchronous My S Q L Replication ClustersPerconaPerformance
 
My S Q L Replication Getting The Most From Slaves
My S Q L  Replication  Getting  The  Most  From  SlavesMy S Q L  Replication  Getting  The  Most  From  Slaves
My S Q L Replication Getting The Most From SlavesPerconaPerformance
 
Performance Instrumentation Beyond What You Do Now
Performance  Instrumentation  Beyond  What  You  Do  NowPerformance  Instrumentation  Beyond  What  You  Do  Now
Performance Instrumentation Beyond What You Do NowPerconaPerformance
 
Boost Performance With My S Q L 51 Partitions
Boost Performance With  My S Q L 51 PartitionsBoost Performance With  My S Q L 51 Partitions
Boost Performance With My S Q L 51 PartitionsPerconaPerformance
 
Trees And More With Postgre S Q L
Trees And  More With  Postgre S Q LTrees And  More With  Postgre S Q L
Trees And More With Postgre S Q LPerconaPerformance
 
Database Performance With Proxy Architectures
Database  Performance With  Proxy  ArchitecturesDatabase  Performance With  Proxy  Architectures
Database Performance With Proxy ArchitecturesPerconaPerformance
 
Running A Realtime Stats Service On My Sql
Running A Realtime Stats Service On My SqlRunning A Realtime Stats Service On My Sql
Running A Realtime Stats Service On My SqlPerconaPerformance
 
How To Think About Performance
How To Think About PerformanceHow To Think About Performance
How To Think About PerformancePerconaPerformance
 
Object Oriented Css For High Performance Websites And Applications
Object Oriented Css For High Performance Websites And ApplicationsObject Oriented Css For High Performance Websites And Applications
Object Oriented Css For High Performance Websites And ApplicationsPerconaPerformance
 
Your Disk Array Is Slower Than It Should Be
Your Disk Array Is Slower Than It Should BeYour Disk Array Is Slower Than It Should Be
Your Disk Array Is Slower Than It Should BePerconaPerformance
 

Mehr von PerconaPerformance (17)

Drizzles Approach To Improving Performance Of The Server
Drizzles  Approach To  Improving  Performance Of The  ServerDrizzles  Approach To  Improving  Performance Of The  Server
Drizzles Approach To Improving Performance Of The Server
 
E M T Better Performance Monitoring
E M T  Better  Performance  MonitoringE M T  Better  Performance  Monitoring
E M T Better Performance Monitoring
 
Automated Performance Testing With J Meter And Maven
Automated  Performance  Testing With  J Meter And  MavenAutomated  Performance  Testing With  J Meter And  Maven
Automated Performance Testing With J Meter And Maven
 
Galera Multi Master Synchronous My S Q L Replication Clusters
Galera  Multi Master  Synchronous  My S Q L  Replication  ClustersGalera  Multi Master  Synchronous  My S Q L  Replication  Clusters
Galera Multi Master Synchronous My S Q L Replication Clusters
 
My S Q L Replication Getting The Most From Slaves
My S Q L  Replication  Getting  The  Most  From  SlavesMy S Q L  Replication  Getting  The  Most  From  Slaves
My S Q L Replication Getting The Most From Slaves
 
Performance Instrumentation Beyond What You Do Now
Performance  Instrumentation  Beyond  What  You  Do  NowPerformance  Instrumentation  Beyond  What  You  Do  Now
Performance Instrumentation Beyond What You Do Now
 
Boost Performance With My S Q L 51 Partitions
Boost Performance With  My S Q L 51 PartitionsBoost Performance With  My S Q L 51 Partitions
Boost Performance With My S Q L 51 Partitions
 
High Performance Erlang
High  Performance  ErlangHigh  Performance  Erlang
High Performance Erlang
 
Websites On Speed
Websites On  SpeedWebsites On  Speed
Websites On Speed
 
Trees And More With Postgre S Q L
Trees And  More With  Postgre S Q LTrees And  More With  Postgre S Q L
Trees And More With Postgre S Q L
 
Database Performance With Proxy Architectures
Database  Performance With  Proxy  ArchitecturesDatabase  Performance With  Proxy  Architectures
Database Performance With Proxy Architectures
 
Using Storage Class Memory
Using Storage Class MemoryUsing Storage Class Memory
Using Storage Class Memory
 
Websites On Speed
Websites On SpeedWebsites On Speed
Websites On Speed
 
Running A Realtime Stats Service On My Sql
Running A Realtime Stats Service On My SqlRunning A Realtime Stats Service On My Sql
Running A Realtime Stats Service On My Sql
 
How To Think About Performance
How To Think About PerformanceHow To Think About Performance
How To Think About Performance
 
Object Oriented Css For High Performance Websites And Applications
Object Oriented Css For High Performance Websites And ApplicationsObject Oriented Css For High Performance Websites And Applications
Object Oriented Css For High Performance Websites And Applications
 
Your Disk Array Is Slower Than It Should Be
Your Disk Array Is Slower Than It Should BeYour Disk Array Is Slower Than It Should Be
Your Disk Array Is Slower Than It Should Be
 

Kürzlich hochgeladen

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 

Kürzlich hochgeladen (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 

Covering Indexes: Orders-of-Magnitude Improvements

  • 1. Covering Indexes: Orders-of-Magnitude Improvements
    Bradley C. Kuszmaul, Chief Architect
    Percona Performance Conference 2009
  • 2. A Performance Example
    A fact table drawn from iiBench, but with no secondary indexes. As in TPC-H, the int values are essentially random.
    CREATE TABLE `facts` (
      `xid` int(11) NOT NULL AUTO_INCREMENT,
      `dateandtime` datetime DEFAULT NULL,
      `cashregisterid` int(11) NOT NULL,
      `cust_id` int(11) NOT NULL,
      `prod_id` int(11) NOT NULL,
      `price` float NOT NULL,
      PRIMARY KEY (`xid`)
    ) ENGINE=TokuDB;
    Populated with 1 billion rows.
  • 3. The Query
    A simple query:
    mysql> select prod_id from facts where cust_id = 50000;
    prod_id
        525
        654
        704
        ...
        984
        276
        576
    10014 rows in set (11 min 26.26 sec)
    Implemented via a table scan: 14.6 rows/s.
  • 4. An Index Doubles Speed
    mysql> alter table facts add index cust_idx (cust_id);
    mysql> select prod_id from facts where cust_id = 50000;
    prod_id
        525
        654
        704
        ...
        984
        276
        576
    10014 rows in set (5 min 48.94 sec)
    The query looks at only 0.001% of the data, yet gets just a 2x speedup: 29 rows/s.
  • 5. Covering Index
    A covering index is an index in which all the columns a query needs are part of the key.
    mysql> alter table facts add index cust_prod_idx (cust_id, prod_id);
    Even if a key is already unique, we can append more columns to it (which doesn't change the index order) to make it a covering index.
  • 6. Covering Index Is 1300x Faster
    mysql> alter table facts add index cust_prod_idx (cust_id, prod_id);
    mysql> select prod_id from facts where cust_id = 50000;
    prod_id
          0
          0
        ...
        999
        999
    10014 rows in set (0.26 sec)
    That is about 1300x faster than with cust_idx alone (348.94 s vs. 0.26 s). Note the different row order: the rows now come back sorted by prod_id, in index order.
  • 7. Outline
    For a database that doesn't fit in main memory, how can we predict performance?
      • The Disk-Access Model (DAM).
      • Analysis using the DAM.
      • Which indexes should we maintain?
      • Another brief example: TPC-H.
      • Does SSD help?
    In this talk, I'll describe a theoretical model for predicting performance and show how to use it.
  • 8. The Disk-Access Model
    [Figure: processor, main memory, and disk; data moves between memory and disk in blocks of size B.]
    A theoretical model for understanding the performance of data structures on disk.
      • Memory is organized in blocks of size B.
      • Blocks are transferred between memory and disk.
      • Count only the number of block transfers.
    The model can predict the performance of a query plan.
  • 9. Analysis for No Index
    [Figure: the facts table, 10^9 rows stored in xid order in blocks of size B (block 0 through block 10^9/B). The rows with cust_id = 50000 (xids 6044, 20480, 44921, ..., 999703368, 999850921, 999923451, with prod_ids 525, 654, 704, ..., 984, 276, 576) are scattered across the blocks.]
    A table scan requires $O(10^9/B)$ transfers.
  • 10. Analysis for Index cust_idx
    [Figure: cust_idx stores (cust_id, xid) pairs sorted by cust_id. The entries (50000, 6044), (50000, 20480), (50000, 44921), ..., (50000, 999703368), (50000, 999850921), (50000, 999923451) are adjacent in the index but point to rows in different blocks of facts.]
    We fetch only 10014 rows, but mostly from different blocks, so the query costs $O(10014)$ memory transfers.
  • 11. Analysis With Covering Index
    [Figure: cust_prod_idx stores (cust_id, prod_id, xid) sorted by cust_id, then prod_id. The entries for cust_id = 50000 are contiguous: (50000, 276, 999850921), (50000, 525, 6044), (50000, 576, 999923451), (50000, 654, 20480), (50000, 704, 44921), ..., (50000, 984, 999703368).]
    We answer directly out of the index: $O(10014/B)$ transfers. (The prod_ids are sorted for each customer.)
  • 12. Which Indexes?
    Problem: MySQL allows only 16 columns in an index (32 in TokuDB). Covering indexes speed up queries, but which columns should we throw in?
    Solution: We defined clustering indexes. A clustering index is an index that includes all the columns of the table.
    mysql> alter table facts add clustering index cust_cluster_idx (cust_id);
    This materializes a copy of the table, sorted in a different order: clustered on the index key. A clustering index is a covering index for all queries.
  • 13. TPC-H Q17
    We've been trying to figure out how to make TPC-H-like queries run faster, so we picked Q17, which is one of the slowest in MySQL.
    Results with a clustering index on (L_PARTNUM, L_QTY):
      Scale            Standard     Clustering
      SF10 (10GB)      > 3600 s     101 s (> 36x speedup)
      SF100 (100GB)    680533 s     773 s (770x speedup)
    I'll write more about this on the tokuview.com blog.
  • 14. Indexes Are Expensive (or Are They?)
    The downside of maintaining indexes is that insertions become more expensive. Fractal Tree indexes, however, speed up insertions by orders of magnitude.
    For $B = 4096$ rows and $N = 10^9$ rows, the number of memory transfers per operation is:
                     B-Trees                          Fractal Trees
      Point query    $O(\log N / \log B) \ge 1$       $O(\log_B N) \ge 1$
      Range query    $O(S/B)$                         $O(S/B)$
      Insertion      $O(\log N / \log B) \ge 1$       $O(\log N / B) = 0.007$
    where S is the number of rows in the range.
    TokuDB, the Tokutek storage engine, implements Fractal Tree indexes.
  • 15. Other Materialization Ideas
    Better tools are needed to help maintain other interesting materializations in MySQL. For example:
      • Denormalization: prejoining some columns.
      • Multidimensional indexing: WHERE clauses often look like range queries on multiple columns, such as
        38 <= a and a <= 42 and 90 <= b and b <= 99
    Partitions can provide a painful substitute for multidimensional indexing. Fractal Tree indexes can help solve these problems.
  • 16. iiBench on RAID10 and SSD
    [Figure: insertion rate (0 to 35000) vs. cumulative insertions (0 to 1.5e+08) for TokuDB and InnoDB, each on FusionIO, Intel X25-E, and RAID10.]
    Percona measured TokuDB and InnoDB on iiBench on RAID 10 disks, an Intel X25-E 32GB SSD, and a FusionIO 160GB SSD. These SSDs provide surprisingly little performance benefit.
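The timings on slides 3, 4 and 6 can be checked with a little arithmetic. The sketch below only re-derives rates and ratios from the reported elapsed times; the helper names are illustrative, not part of any tool.

```python
ROWS = 10014            # rows returned on slides 3, 4 and 6
TABLE_ROWS = 10**9      # rows in facts (slide 2)

# Elapsed times reported by the mysql client, converted to seconds.
timings = {
    "table scan": 11 * 60 + 26.26,   # slide 3: 11 min 26.26 sec
    "cust_idx": 5 * 60 + 48.94,      # slide 4: 5 min 48.94 sec
    "cust_prod_idx": 0.26,           # slide 6: 0.26 sec
}

def rows_per_second(seconds):
    return ROWS / seconds

def speedup(slow, fast):
    return timings[slow] / timings[fast]

for plan, secs in timings.items():
    print(f"{plan:14} {secs:8.2f} s {rows_per_second(secs):10.1f} rows/s")

print(f"fraction of table returned: {ROWS / TABLE_ROWS:.5%}")
print(f"cust_idx vs scan:     {speedup('table scan', 'cust_idx'):.2f}x")
print(f"covering vs cust_idx: {speedup('cust_idx', 'cust_prod_idx'):.0f}x")
print(f"covering vs scan:     {speedup('table scan', 'cust_prod_idx'):.0f}x")
```

The "1300x" on slide 6 is the covering index measured against cust_idx (about 1342x). Against the original table scan the gap is roughly 2600x.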
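The difference between slides 4 and 6 shows up in a query planner's choice of plan. TokuDB is not available here, so this sketch assumes SQLite (through Python's built-in sqlite3 module) as a stand-in: it builds the facts table from slide 2 and asks EXPLAIN QUERY PLAN how it would run the slide 3 query with no index, with cust_idx, and with cust_prod_idx. (MySQL's own EXPLAIN reports a covering index as "Using index" in its Extra column.)

```python
import sqlite3

QUERY = "SELECT prod_id FROM facts WHERE cust_id = ?"

def query_plan(conn, sql, params=()):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

conn = sqlite3.connect(":memory:")
# Same columns as slide 2; INTEGER PRIMARY KEY makes xid SQLite's rowid.
conn.execute("""
    CREATE TABLE facts (
        xid            INTEGER PRIMARY KEY,
        dateandtime    TEXT,
        cashregisterid INTEGER NOT NULL,
        cust_id        INTEGER NOT NULL,
        prod_id        INTEGER NOT NULL,
        price          REAL NOT NULL
    )
""")

plans = {"no index": query_plan(conn, QUERY, (50000,))}

# Slide 4: index on cust_id alone; prod_id must still be fetched from facts.
conn.execute("CREATE INDEX cust_idx ON facts (cust_id)")
plans["cust_idx"] = query_plan(conn, QUERY, (50000,))

# Slide 5: replace it with (cust_id, prod_id), which covers the query.
conn.execute("DROP INDEX cust_idx")
conn.execute("CREATE INDEX cust_prod_idx ON facts (cust_id, prod_id)")
plans["cust_prod_idx"] = query_plan(conn, QUERY, (50000,))

for name, plan in plans.items():
    print(f"{name:14} {plan}")
```

On recent SQLite versions the three plans read roughly "SCAN facts", "SEARCH facts USING INDEX cust_idx (cust_id=?)" and "SEARCH facts USING COVERING INDEX cust_prod_idx (cust_id=?)"; older versions print "SCAN TABLE" and "SEARCH TABLE". The exact wording varies, but the COVERING marker is what distinguishes the slide 6 plan.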
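The Disk-Access Model analyses on slides 9 to 11 can be turned into concrete transfer counts. This is a minimal sketch under stated assumptions: B = 4096 rows per block is borrowed from slide 14, a root-to-leaf index walk is charged about log_B N transfers (slides 10 and 11 omit this small term), and the plain-index plan assumes the worst case, where every matching row sits in its own block.

```python
import math

N = 10**9     # rows in facts (slide 2)
K = 10014     # rows with cust_id = 50000 (slide 3)
B = 4096      # rows per block: an assumption, borrowed from slide 14

def descend(n, b):
    # Root-to-leaf walk of a B-tree-like index: about log_B N transfers.
    return math.ceil(math.log(n) / math.log(b))

def table_scan(n, b):
    # Slide 9: read every block of facts once, O(N/B).
    return math.ceil(n / b)

def plain_index(n, k, b):
    # Slide 10: read k contiguous index entries, then fetch each row.
    # Worst case: every matching row lives in a different block of facts.
    return descend(n, b) + math.ceil(k / b) + k

def covering_index(n, k, b):
    # Slide 11: the answer comes straight out of k contiguous index entries.
    return descend(n, b) + math.ceil(k / b)

for name, cost in [
    ("table scan", table_scan(N, B)),
    ("cust_idx", plain_index(N, K, B)),
    ("cust_prod_idx", covering_index(N, K, B)),
]:
    print(f"{name:14} {cost:>8,} block transfers")
```

The DAM charges every transfer the same, so these numbers capture the shape of the three analyses (N/B vs. K vs. K/B) rather than wall-clock time; changing the assumed B moves the absolute counts.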
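Slide 10's claim that the matching rows are "mostly in different blocks" follows from the values being essentially random (slide 2). A small simulation can check it. It assumes the matching xids are a uniform random sample of the 10^9 row positions, that blocks hold B = 4096 rows (slide 14), and, for the covering index, that an index block also holds B entries, so the K matching entries form one contiguous run.

```python
import random

N = 10**9     # rows in facts, stored in xid order
K = 10014     # rows with cust_id = 50000
B = 4096      # rows (or index entries) per block: assumed, as on slide 14

rng = random.Random(2009)  # fixed seed so the run is repeatable

# Slide 10: model the matching xids as a uniform random sample of rows.
matching_xids = rng.sample(range(N), K)
blocks_for_row_fetches = len({xid // B for xid in matching_xids})

# Slide 11: in cust_prod_idx the K matching entries are adjacent, so they
# occupy a contiguous run of index positions [start, start + K).
start = rng.randrange(N - K)
blocks_for_covering_scan = (start + K - 1) // B - start // B + 1

print(blocks_for_row_fetches)    # close to K: nearly every row in its own block
print(blocks_for_covering_scan)  # 3 or 4 blocks, depending on alignment
```

With about 244,000 blocks and 10,014 random rows, only around 2% of the fetches share a block with another, which is why the plain index costs O(10014) transfers rather than O(10014/B).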
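Slide 12 defines a clustering index as one that carries every column of the table, which is why it covers any query on its key. The toy model below is only an in-memory illustration of that idea, not TokuDB's implementation: the class name ClusteringIndex is hypothetical, and the sample rows are made up (their xids and prod_ids echo slides 9 to 11; the prices are invented).

```python
import bisect
from typing import NamedTuple

class Row(NamedTuple):
    xid: int
    cust_id: int
    prod_id: int
    price: float

class ClusteringIndex:
    """Toy model: a full copy of the table, kept sorted by one column."""

    def __init__(self, rows, column):
        self.column = column
        # Order by (indexed column, primary key) so duplicate keys stay in a
        # deterministic order, as they would in a secondary index.
        self.rows = sorted(rows, key=lambda r: (getattr(r, column), r.xid))
        self.keys = [(getattr(r, column), r.xid) for r in self.rows]

    def lookup(self, value):
        """All rows with column == value: one contiguous slice of full rows."""
        lo = bisect.bisect_left(self.keys, (value, float("-inf")))
        hi = bisect.bisect_right(self.keys, (value, float("inf")))
        return self.rows[lo:hi]

# Hypothetical sample rows for illustration only.
facts = [
    Row(1, 42, 501, 9.99),
    Row(6044, 50000, 525, 1.50),
    Row(20480, 50000, 654, 3.25),
    Row(44921, 50000, 704, 0.75),
]
cust_cluster_idx = ClusteringIndex(facts, "cust_id")

hits = cust_cluster_idx.lookup(50000)
print([r.prod_id for r in hits])    # [525, 654, 704]
print(sum(r.price for r in hits))   # 5.5, read without going back to facts
```

The price of this convenience is a second full copy of every row that each insertion must update, which is the cost slide 14 weighs.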
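The insertion costs on slide 14 can be evaluated directly. This sketch plugs B = 4096 and N = 10^9 into the slide's formulas with constant factors dropped, so the results are model estimates, not measurements. A base-2 logarithm is assumed for the Fractal Tree term because it reproduces the slide's 0.007; the B-tree ratio log N / log B is the same in any base.

```python
import math

B = 4096      # rows per block (slide 14)
N = 10**9     # rows in the table (slide 14)

def btree_transfers(n, b):
    # B-tree point query or insertion: one root-to-leaf path,
    # log N / log B = log_B N block transfers.
    return math.log(n) / math.log(b)

def fractal_tree_insert_transfers(n, b):
    # Fractal Tree insertion per slide 14: O(log N / B) amortized transfers,
    # evaluated here with a base-2 logarithm.
    return math.log2(n) / b

bt = btree_transfers(N, B)
ft = fractal_tree_insert_transfers(N, B)
print(f"B-tree insert:       {bt:.2f} transfers")
print(f"Fractal Tree insert: {ft:.4f} transfers")
print(f"ratio:               {bt / ft:.0f}x")
```

The log N terms cancel in the ratio, leaving B / log2 B = 4096 / 12, or about 341x, which is the "orders of magnitude" the slide refers to.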
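Slide 15's multidimensional range query shows why one sort order is not enough. The sketch assumes a hypothetical composite index on (a, b) over integer data (every pair with 0 <= a, b < 100) and compares two ways to answer the slide's WHERE clause: one contiguous scan that filters b, or one short probe per value of a. It illustrates the problem the slide describes; it is not a multidimensional index.

```python
import bisect

# Hypothetical composite index on (a, b) holding every pair with
# 0 <= a, b < 100, kept in key order the way index leaves are.
index = [(a, b) for a in range(100) for b in range(100)]

def single_range_scan(idx, a_lo, a_hi, b_lo, b_hi):
    # One contiguous scan from (a_lo, b_lo) to (a_hi, b_hi); every entry in
    # between satisfies the a-range, but the b-range must be filtered.
    lo = bisect.bisect_left(idx, (a_lo, b_lo))
    hi = bisect.bisect_right(idx, (a_hi, b_hi))
    scanned = idx[lo:hi]
    return len(scanned), [k for k in scanned if b_lo <= k[1] <= b_hi]

def probe_per_a(idx, a_lo, a_hi, b_lo, b_hi):
    # One short range probe per value of a (assumes a is an integer).
    hits = []
    for a in range(a_lo, a_hi + 1):
        lo = bisect.bisect_left(idx, (a, b_lo))
        hi = bisect.bisect_right(idx, (a, b_hi))
        hits.extend(idx[lo:hi])
    return a_hi - a_lo + 1, hits

# WHERE 38 <= a AND a <= 42 AND 90 <= b AND b <= 99
scanned, hits = single_range_scan(index, 38, 42, 90, 99)
probes, hits2 = probe_per_a(index, 38, 42, 90, 99)
print(scanned, len(hits))   # 410 entries read to return 50
print(probes, len(hits2))   # 5 separate probes, 50 entries
```

The scan reads eight times more entries than it returns, while the probing approach needs a separate index descent per value of a and only works when a takes few discrete values.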