SSD Aware Scan Optimization in PostgreSQL

This document summarizes a study on optimizing scan operations in PostgreSQL for SSD storage. It hypothesizes that index scans may outperform other scan methods on SSDs because random and sequential access times are nearly equal. The methodology measures scan performance on an SSD-equipped system, comparing index scans against bitmap index scans and sequential scans. Results show running time reductions of 29% to 56% for selective queries with measurable running times. The benefit appears only when at least the majority of the table can reside in main memory.

A Study on SSD Aware
Scan Operation
Optimization in
PostgreSQL Database

SSDs
Silicon memory chips
No moving parts
No rotational delay
Near zero seek time
Random and sequential block access
times are almost the same!

But ...
The cost models in RDBMSs are based on the
characteristics of spinning HDDs.
They assume random_block_access_time >
sequential_block_access_time.
With SSDs this assumption is not
valid.
- Are there opportunities for improvement?

Background information
Scan operation
- SELECT * FROM table WHERE condition
Selectivity
- the fraction of the table's rows that satisfy the condition
Scan operation alternatives in PostgreSQL
- Sequential scan (heap scan)
- Bitmap index scan (BIS) + Bitmap heap scan (BHS)
- Index scan

Our Hypothesis
An index scan on a secondary index can
perform better than the other scan operations in
databases that run on SSD storage
media.
This is based on the fact that on SSDs the random
block access cost is almost the same as the
sequential block access cost.

Our Hypothesis (Continued)
SELECT * FROM table WHERE column = val
- column is indexed (not primary)
- correlation between primary index and
secondary index is zero

Methodology
Kingston 8GB Data Traveler
Dedicated PC running Ubuntu 12.04 (i5 2.3 GHz processor
and 4GB system memory)
PostgreSQL 9.3
Table with 36 columns, 6,000,000 rows of data
SELECT * FROM table_1 WHERE column_1 > val_1 AND
column_1 < val_2
1.7 GB of data (with indexes)

Methodology (Continued)
Numeric field “idx_column” indexed using a
B-tree index
Correlation between primary index and
secondary index = 0.000000…
Cardinality (number of distinct values) of the “idx_column” field is 933900
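
The zero-correlation condition matters because rows that match a range on the indexed column are then scattered across the table, so an index scan fetches heap blocks in effectively random order. The following is a minimal Python sketch of that idea, assuming a hypothetical table of 10,000 rows whose column values are either stored in sorted order or shuffled. It computes a plain Pearson correlation between row position and column value as a simplified illustration; the row count, the seed, and the helper name `pearson` are assumptions for the example, not part of the study.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical table: position i is the physical row order.
positions = list(range(10_000))

# Case 1: column values stored in sorted order (fully correlated).
sorted_values = list(range(10_000))

# Case 2: the same values in shuffled order (uncorrelated with position).
rng = random.Random(42)  # fixed seed so the sketch is repeatable
shuffled_values = sorted_values[:]
rng.shuffle(shuffled_values)

clustered_corr = pearson(positions, sorted_values)
scattered_corr = pearson(positions, shuffled_values)

print(f"sorted column:   correlation = {clustered_corr:.4f}")
print(f"shuffled column: correlation = {scattered_corr:.4f}")
```

With a correlation near zero, as in the shuffled case, neighbouring index entries point to unrelated heap blocks, which is the worst case for an index scan on a spinning disk and the case the hypothesis targets.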

Running times (ms) by scan method

Selectivity (log)   Seq scan   BIS + BHS   Index scan
-4                  10594      0           0
-3                  10269      1           0
-2                  10255      9           4
-1                  10260      94          44
0                   10278      644         457
1                   10407      8794        4915
2                   11600      16528       49395
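
To make the crossover in these measurements easier to see, here is a small Python sketch that picks the fastest scan method at each selectivity level. The numbers are copied from the measurements above and are assumed to be in milliseconds, as in the later before/after comparison; the names `RUNNING_TIMES_MS` and `fastest_scans` are illustrative only.

```python
# Running times copied from the measurements above (assumed to be ms).
RUNNING_TIMES_MS = {
    -4: {"seq scan": 10594, "BIS + BHS": 0, "index scan": 0},
    -3: {"seq scan": 10269, "BIS + BHS": 1, "index scan": 0},
    -2: {"seq scan": 10255, "BIS + BHS": 9, "index scan": 4},
    -1: {"seq scan": 10260, "BIS + BHS": 94, "index scan": 44},
    0: {"seq scan": 10278, "BIS + BHS": 644, "index scan": 457},
    1: {"seq scan": 10407, "BIS + BHS": 8794, "index scan": 4915},
    2: {"seq scan": 11600, "BIS + BHS": 16528, "index scan": 49395},
}


def fastest_scans(times):
    """Return the scan method(s) with the lowest running time, sorted by name."""
    best = min(times.values())
    return sorted(name for name, ms in times.items() if ms == best)


fastest = {sel: fastest_scans(t) for sel, t in RUNNING_TIMES_MS.items()}

for sel in sorted(fastest):
    print(f"selectivity (log) {sel:>2}: fastest = {', '.join(fastest[sel])}")
```

On these figures the index scan is fastest (or tied) at every level except the highest selectivity, where the sequential scan wins.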

In PostgreSQL
random_block_access_time
= 4 * seq_block_access_time
(defaults: random_page_cost = 4.0, seq_page_cost = 1.0)
This assumes spinning HDDs.
What is the relation for SSDs?
random_block_access_time
= seq_block_access_time ?
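
Why the 4:1 ratio matters can be shown with a deliberately simplified cost model. The Python sketch below is not PostgreSQL's actual cost formula: it assumes a sequential scan reads every table page once at the sequential cost, and an index scan on an uncorrelated column pays one random page fetch per matching row, ignoring index pages, CPU cost, and caching. The 6,000,000 rows come from the methodology; the figure of 200,000 table pages is a hypothetical value chosen for illustration.

```python
def seq_scan_cost(total_pages, seq_cost=1.0):
    """Toy cost of reading every page of the table sequentially."""
    return total_pages * seq_cost


def index_scan_cost(matching_rows, random_cost):
    """Toy cost of one random heap page fetch per matching row."""
    return matching_rows * random_cost


def crossover_selectivity(total_rows, total_pages, random_cost, seq_cost=1.0):
    """Fraction of rows at which both toy costs are equal.

    Below this fraction the toy model prefers the index scan.
    """
    return (total_pages * seq_cost) / (total_rows * random_cost)


TOTAL_ROWS = 6_000_000   # from the methodology
TOTAL_PAGES = 200_000    # hypothetical page count, for illustration only

hdd_crossover = crossover_selectivity(TOTAL_ROWS, TOTAL_PAGES, random_cost=4.0)
ssd_crossover = crossover_selectivity(TOTAL_ROWS, TOTAL_PAGES, random_cost=1.0)

print(f"random = 4 * seq: index scan preferred below {hdd_crossover:.2%} of rows")
print(f"random = 1 * seq: index scan preferred below {ssd_crossover:.2%} of rows")
```

In this toy model, lowering the random access cost from 4 to 1 lets the index scan be chosen for queries that return four times as many rows, which is the kind of shift the study looks for.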

Running times before and after optimization

Selectivity   Before optimization   Optimum   After optimization   Cost reduction   Cost reduction
(log)         (ms)                  (ms)      (ms)                 (ms)             (%)
-4            0                     0         0                    0                -
-3            1                     0         0                    1                100
-2            9                     4         4                    5                56
-1            94                    44        44                   50               53
0             644                   457       457                  187              29
1             8794                  4915      4915                 3879             44
2             11600                 11600     11600                0                0
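
The cost reduction columns follow directly from the before and after running times. As a quick check, this Python sketch recomputes them from the figures above; the percentage is rounded to the nearest whole number and left undefined when the "before" time is 0 ms. The names `MEASUREMENTS` and `cost_reduction` are illustrative only.

```python
# (selectivity log, time before optimization in ms, time after in ms),
# copied from the comparison above.
MEASUREMENTS = [
    (-4, 0, 0),
    (-3, 1, 0),
    (-2, 9, 4),
    (-1, 94, 44),
    (0, 644, 457),
    (1, 8794, 4915),
    (2, 11600, 11600),
]


def cost_reduction(before_ms, after_ms):
    """Return (saved ms, saved percent); percent is None if before_ms is 0."""
    saved = before_ms - after_ms
    percent = None if before_ms == 0 else round(100 * saved / before_ms)
    return saved, percent


reductions = {sel: cost_reduction(before, after)
              for sel, before, after in MEASUREMENTS}

for sel, (saved, percent) in reductions.items():
    shown = "-" if percent is None else f"{percent}%"
    print(f"selectivity (log) {sel:>2}: saved {saved} ms ({shown})")
```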

Are we done?
We haven't considered an important factor
- the size of the table relative to the
system memory

Observations
Sequential scan performance remains consistent across all
system memory values. Why?
Both BIS + BHS and index scan drastically
underperform when system memory is
reduced.
BIS + BHS performs slightly better than index
scan.

So the optimization works only under special
conditions, where at least the majority of the
table content can reside in main
memory.
- Does this mean the optimization is of no
use?

Potential of this optimization
- Small table size databases
- Embedded devices
- Mobile phones etc.
