SlideShare ist ein Scribd-Unternehmen logo
1 von 38
Downloaden Sie, um offline zu lesen
Performance evaluation of fast
integer compression
techniques over tables

Ikhtear Md. Sharif Bhuyan
Supervisors: Hazel Webb, Daniel Lemire, Owen Kaser
©Ikhtear Md. Sharif Bhuyan
Overview
•
•
•
•
•
•

Introduction
Compression in databases and issues
Objectives
Experimental Results
Conclusion
Future Work

Performance evaluation of fast integer compression techniques over tables

12/4/2013

2
Query processing
Processor

Cache
RAM

Disk

Performance evaluation of fast integer compression techniques over tables

12/4/2013

3
Compression in databases
•
•
•
•

Reduce storage
Query processing speed
Save I/O bandwidth
Improve performance for I/O-bound operation

Performance evaluation of fast integer compression techniques over tables

12/4/2013

4
Selecting Compression in
databases
• Lossless
• Trade off between compression ratio and speed of
compression and decompression

Performance evaluation of fast integer compression techniques over tables

12/4/2013

5
Objective
• Examining and comparing the performance of
patched schemes with other methods with respect
to compression ratio, decompression speed and
compression speed.
• Assessing the effect of different factors such as row
order.

Performance evaluation of fast integer compression techniques over tables

12/4/2013

6
Column-oriented database
system
104543

ID

Name

104543

Peter

203456

Sam

234321

Peter

203456

Sam

234321

Maria

Maria

Row-oriented database

104543

203456

234621

Peter

Sam

Maria

Column-oriented database
Performance evaluation of fast integer compression techniques over tables

12/4/2013

7
Compression Algorithm
• Variable length output
o Byte-oriented compression: Integers are coded in
units of bytes. i.e., Variable-Byte
o Block-based compression: These schemes use a
fixed number of input integers and output a
variable number of bytes. e.g., FOR, NewPFD,
FastPFD

Performance evaluation of fast integer compression techniques over tables

12/4/2013

8
Compression Algorithm (Contd …)
• Fixed length output
Each step takes a variable number of integers
and produces a compressed form of those integers
using a fixed number of bits as a unit. i.e., Simple9

Performance evaluation of fast integer compression techniques over tables

12/4/2013

9
Binary packing
• Original Sequence
67

78

85

96

98

• the numbers range from 67 to 98.
• Compressed Sequence
0

11

18

29

Performance evaluation of fast integer compression techniques over tables

31

12/4/2013

10
Patched Compression
• Original Sequence
11

1

10

…

11

11

11111 10

11

• The exception # 11111.
• Base value b=2 (non-exceptional values), maximum
number of bits 5, number of exception 1, location
of exception 125
• Compressed Sequence
11

1

10

…

11

11

11

Performance evaluation of fast integer compression techniques over tables

10

11
12/4/2013

11
Synthetic data experiments
• Compression Ratio Clustered data

Clustered Data

Performance evaluation of fast integer compression techniques over tables

12/4/2013

12
Synthetic data experiments (Contd …)
• Compression Ratio Uniform data

Uniform data
Performance evaluation of fast integer compression techniques over tables

12/4/2013

13
Synthetic data experiments(Contd …)
• Decompression Speed:

Clustered data

Performance evaluation of fast integer compression techniques over tables

12/4/2013

14
Synthetic data experiments(Contd …)

Uniform Data
Performance evaluation of fast integer compression techniques over tables

12/4/2013

15
Real Data Sets
• Census-Income
• Census1881
• Star Schema Benchmark

Performance evaluation of fast integer compression techniques over tables

12/4/2013

16
Column wise Compressed size

Original

Shuffled

Column-wise compressed size for Census1881 of frequency coded file
Performance evaluation of fast integer compression techniques over tables

12/4/2013

17
Column wise Compressed size
(Contd …)

Sort High Cardinality Column (column 1)

Sort Low Cardinality Column(column 3)

Column-wise compressed size for Census1881 of frequency coded file
Performance evaluation of fast integer compression techniques over tables

12/4/2013

18
Column wise Compression
speed

Column-wise compression speed for Census1881 of frequency coded file
Performance evaluation of fast integer compression techniques over tables

12/4/2013

19
Column wise Compression
speed (Contd …)

Column-wise compression speed for Census1881 of frequency coded file
Performance evaluation of fast integer compression techniques over tables

12/4/2013

20
Column wise Decompression
speed

Column-wise decompression speed for Census1881 of frequency coded file
Performance evaluation of fast integer compression techniques over tables

12/4/2013

21
Column wise Decompression
speed (Contd …)

Column-wise decompression speed for Census1881 of frequency coded file
Performance evaluation of fast integer compression techniques over tables

12/4/2013

22
Effect of Row Order

Histogram of compressed size (bits/int)
Performance evaluation of fast integer compression techniques over tables

12/4/2013

23
Conclusion
• Sorting columns results in good compressed size.
• Sorted columns can be compressed and
decompressed faster than shuffled order.
• Selection of compression schemes depends on the
nature of database(OLPT/OLAP) and the
requirement of storage and data access speed.

Performance evaluation of fast integer compression techniques over tables

12/4/2013

24
Future Work
• Incorporating a query engine to asses real world
performance.
• Comparing on processor-level metrics.
• Using multiple threads in compression algorithm.
• Query in compressed form

Performance evaluation of fast integer compression techniques over tables

12/4/2013

25
Thank You

Performance evaluation of fast integer compression techniques over tables

12/4/2013

26
Backup

Performance evaluation of fast integer compression techniques over tables

12/4/2013

27
Key Issues
• Data access latency
The time it takes between the request sent and the
data is found on disk to start processing.
• Disk bandwidth
The amount of data can be sent per second from the
disk.

Performance evaluation of fast integer compression techniques over tables

12/4/2013

28
Experimental Setup
• Hardware
o
o
o
o

Intel Core i5-2400
RAM: 8 GB
Cache: 6MB L3
Memory Clock Speed: 1333 MHz

• Software
o Java SDK version 1.7.0
o https://github.com/lemire/JavaFastPFOR
o Single-threaded

• More Info
o http://hdl.handle.net/1882/45703

Performance evaluation of fast integer compression techniques over tables

12/4/2013

29
Compressed Size
Coding Scheme

Original

Shuffled High Card.

Low Card.

Variable-Byte

15.00

15.00

15.00

15.00

Binary Packing

11.37

11.42

11.15

11.37

NewPFD

13.06

13.19

12.32

13.14

OptPFD

11.84

11.85

11.80

11.80

FastPFOR

11.27

11.29

11.06

11.24

Simple9

15.75

15.90

15.72

15.84

Result of compression (bits per integer) on SSB with frequency coded file
Performance evaluation of fast integer compression techniques over tables

12/4/2013

30
Compression Speed
Coding Scheme

Original

Shuffled High Card.

Low Card.

Variable-Byte

33

31

33

31

Binary Packing

729

711

746

732

NewPFD

52

36

40

34

OptPFD

6

3

5

4

FastPFOR

104

76

89

84

Simple9

78

60

69

64

Result of compression speed (mis) on Census1881 with frequency coded
file
Performance evaluation of fast integer compression techniques over tables

12/4/2013

31
Decompression Speed
Coding Scheme

Original

Shuffled High Card.

Low Card.

Variable-Byte

165

197

214

186

Binary Packing

1151

1089

1151

1135

NewPFD

709

615

729

689

OptPFD

421

357

482

381

FastPFOR

776

707

763

730

Simple9

488

377

447

398

Result of decompression speed (mis) on Census1881 with frequency
coded file
Performance evaluation of fast integer compression techniques over tables

12/4/2013

32
Column wise Compressed size

Original

Shuffled

Column-wise compressed size for Census1881 of frequency coded file
Performance evaluation of fast integer compression techniques over tables

12/4/2013

33
Column wise Compressed size

Sort High Cardinality Column (column 1)

Sort Low Cardinality Column(column 3)

Column-wise compressed size for Census1881 of frequency coded file
Performance evaluation of fast integer compression techniques over tables

12/4/2013

34
Column wise Compression
speed

Column-wise compression speed for Census1881 of frequency coded file
Performance evaluation of fast integer compression techniques over tables

12/4/2013

35
Column wise Decompression
speed

Column-wise decompression speed for Census1881 of frequency coded file
Performance evaluation of fast integer compression techniques over tables

12/4/2013

36
Effect of CPU family on
compression speed

Compression speed (mis) on different processor

Performance evaluation of fast integer compression techniques over tables

12/4/2013

37
Effect of CPU family on
decompression speed

Decompression speed (mis) on different processor
Performance evaluation of fast integer compression techniques over tables

12/4/2013

38

Weitere ähnliche Inhalte

Kürzlich hochgeladen

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Kürzlich hochgeladen (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Empfohlen

Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 

Empfohlen (20)

Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 

Performance evaluation of fast integer compression techniques over tables

  • 1. Performance evaluation of fast integer compression techniques over tables Ikhtear Md. Sharif Bhuyan Supervisors: Hazel Webb, Daniel Lemire, Owen Kaser ©Ikhtear Md. Sharif Bhuyan
  • 2. Overview • • • • • • Introduction Compression in databases and issues Objectives Experimental Results Conclusion Future Work Performance evaluation of fast integer compression techniques over tables 12/4/2013 2
  • 3. Query processing Processor Cache RAM Disk Performance evaluation of fast integer compression techniques over tables 12/4/2013 3
  • 4. Compression in databases • • • • Reduce storage Query processing speed Save I/O bandwidth Improve performance for I/O-bound operation Performance evaluation of fast integer compression techniques over tables 12/4/2013 4
  • 5. Selecting Compression in databases • Lossless • Trade off between compression ratio and speed of compression and decompression Performance evaluation of fast integer compression techniques over tables 12/4/2013 5
  • 6. Objective • Examining and comparing the performance of patched schemes with other methods with respect to compression ratio, decompression speed and compression speed. • Assessing the effect of different factors such as row order. Performance evaluation of fast integer compression techniques over tables 12/4/2013 6
  • 8. Compression Algorithm • Variable length output o Byte-oriented compression: Integers are coded in units of bytes. i.e., Variable-Byte o Block-based compression: These schemes use a fixed number of input integers and output a variable number of bytes. e.g., FOR, NewPFD, FastPFD Performance evaluation of fast integer compression techniques over tables 12/4/2013 8
  • 9. Compression Algorithm (Contd …) • Fixed length output Each step takes a variable number of integers and produces a compressed form of those integers using a fixed number of bits as a unit. i.e., Simple9 Performance evaluation of fast integer compression techniques over tables 12/4/2013 9
  • 10. Binary packing • Original Sequence 67 78 85 96 98 • the numbers range from 67 to 98. • Compressed Sequence 0 11 18 29 Performance evaluation of fast integer compression techniques over tables 31 12/4/2013 10
  • 11. Patched Compression • Original Sequence 11 1 10 … 11 11 11111 10 11 • The exception # 11111. • Base value b=2 (non-exceptional values), maximum number of bits 5, number of exception 1, location of exception 125 • Compressed Sequence 11 1 10 … 11 11 11 Performance evaluation of fast integer compression techniques over tables 10 11 12/4/2013 11
  • 12. Synthetic data experiments • Compression Ratio Clustered data Clustered Data Performance evaluation of fast integer compression techniques over tables 12/4/2013 12
  • 13. Synthetic data experiments (Contd …) • Compression Ratio Uniform data Uniform data Performance evaluation of fast integer compression techniques over tables 12/4/2013 13
  • 14. Synthetic data experiments(Contd …) • Decompression Speed: Clustered data Performance evaluation of fast integer compression techniques over tables 12/4/2013 14
  • 15. Synthetic data experiments(Contd …) Uniform Data Performance evaluation of fast integer compression techniques over tables 12/4/2013 15
  • 16. Real Data Sets • Census-Income • Census1881 • Star Schema Benchmark Performance evaluation of fast integer compression techniques over tables 12/4/2013 16
  • 17. Column wise Compressed size Original Shuffled Column-wise compressed size for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 17
  • 18. Column wise Compressed size (Contd …) Sort High Cardinality Column (column 1) Sort Low Cardinality Column(column 3) Column-wise compressed size for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 18
  • 19. Column wise Compression speed Column-wise compression speed for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 19
  • 20. Column wise Compression speed (Contd …) Column-wise compression speed for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 20
  • 21. Column wise Decompression speed Column-wise decompression speed for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 21
  • 22. Column wise Decompression speed (Contd …) Column-wise decompression speed for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 22
  • 23. Effect of Row Order Histogram of compressed size (bits/int) Performance evaluation of fast integer compression techniques over tables 12/4/2013 23
  • 24. Conclusion • Sorting columns results in good compressed size. • Sorted columns can be compressed and decompressed faster than shuffled order. • Selection of compression schemes depends on the nature of database(OLPT/OLAP) and the requirement of storage and data access speed. Performance evaluation of fast integer compression techniques over tables 12/4/2013 24
  • 25. Future Work • Incorporating a query engine to asses real world performance. • Comparing on processor-level metrics. • Using multiple threads in compression algorithm. • Query in compressed form Performance evaluation of fast integer compression techniques over tables 12/4/2013 25
  • 26. Thank You Performance evaluation of fast integer compression techniques over tables 12/4/2013 26
  • 27. Backup Performance evaluation of fast integer compression techniques over tables 12/4/2013 27
  • 28. Key Issues • Data access latency The time it takes between the request sent and the data is found on disk to start processing. • Disk bandwidth The amount of data can be sent per second from the disk. Performance evaluation of fast integer compression techniques over tables 12/4/2013 28
  • 29. Experimental Setup • Hardware o o o o Intel Core i5-2400 RAM: 8 GB Cache: 6MB L3 Memory Clock Speed: 1333 MHz • Software o Java SDK version 1.7.0 o https://github.com/lemire/JavaFastPFOR o Single-threaded • More Info o http://hdl.handle.net/1882/45703 Performance evaluation of fast integer compression techniques over tables 12/4/2013 29
  • 30. Compressed Size Coding Scheme Original Shuffled High Card. Low Card. Variable-Byte 15.00 15.00 15.00 15.00 Binary Packing 11.37 11.42 11.15 11.37 NewPFD 13.06 13.19 12.32 13.14 OptPFD 11.84 11.85 11.80 11.80 FastPFOR 11.27 11.29 11.06 11.24 Simple9 15.75 15.90 15.72 15.84 Result of compression (bits per integer) on SSB with frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 30
  • 31. Compression Speed Coding Scheme Original Shuffled High Card. Low Card. Variable-Byte 33 31 33 31 Binary Packing 729 711 746 732 NewPFD 52 36 40 34 OptPFD 6 3 5 4 FastPFOR 104 76 89 84 Simple9 78 60 69 64 Result of compression speed (mis) on Census1881 with frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 31
  • 32. Decompression Speed Coding Scheme Original Shuffled High Card. Low Card. Variable-Byte 165 197 214 186 Binary Packing 1151 1089 1151 1135 NewPFD 709 615 729 689 OptPFD 421 357 482 381 FastPFOR 776 707 763 730 Simple9 488 377 447 398 Result of decompression speed (mis) on Census1881 with frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 32
  • 33. Column wise Compressed size Original Shuffled Column-wise compressed size for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 33
  • 34. Column wise Compressed size Sort High Cardinality Column (column 1) Sort Low Cardinality Column(column 3) Column-wise compressed size for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 34
  • 35. Column wise Compression speed Column-wise compression speed for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 35
  • 36. Column wise Decompression speed Column-wise decompression speed for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 36
  • 37. Effect of CPU family on compression speed Compression speed (mis) on different processor Performance evaluation of fast integer compression techniques over tables 12/4/2013 37
  • 38. Effect of CPU family on decompression speed Decompression speed (mis) on different processor Performance evaluation of fast integer compression techniques over tables 12/4/2013 38