Accelerating Database Performance with Compression

•Als PPTX, PDF herunterladen•

0 gefällt mir•1,037 views

Presentation about the potential benefits and costs of using data compression within SQL Server. Presented to NJ SQL Group 19 March 2012

Technologie

Accelerating Database
Performance Using
Compression
Joey D’Antoni
New Jersey SQL Server Users Group
19 March 2013

 Principal Architect SQL Server, Comcast Cable
 @jdanton – Twitter
About Me
 Joedantoni.wordpress.com
 jdanton1@yahoo.com

Compression—What Does It Really
Mean?
Overview Deduplication—What is That?
What Should I Compress?
Columnstore—How Is It Different?

 Specialized compression to eliminate duplicate copies of
repeating data
 Real Example—In VMWare, you may have 10 copies of Windows
Deduplication running on same physical machine. Memory blocks (Common .dlls
for example) may be deduplicated.
 Backups

Faster performance on selects
Benefits of Less I/O is required to return data
Compression Better Space Utilization on Disk
More Pages In Memory

Expenses of  Added CPU Cycles
Compression  Slower bulk inserts and updates

Row Compression
SQL Server Page Compression
Compression  Prefix Compression
Types  Dictionary Compression
 Backup Compression

 In all editions of SQL Server, starting with 2008
R2
 Always use Backup Compression (even when
Backup your storage team says no)
Compression  Space is by default pre-allocated for estimated
size of uncompressed backup
 Trace Flag 3042

SQL Data Compression gives a great deal
of flexibility
Can compress tables, indexes and/or
So What To partitioning
Compress
Can use different methods of
compression
How to Decide?

 sp_estimate_data_compression_savings
What Won’t Compress Well
 Columns with numeric or fixed-length character data types where most
values require all the bytes allocated for the specific data type
Space Savings  Not much repeating data
 Repeating data with non-repeating prefixes
 Data stored out of the row
 FILESTREAM data

 Microsoft advises using for Row Compression for EVERYTHING if
Application you can spare 10% CPU. Don’t do this!
Workloads  Page Compression has higher overhead

Savings Savings
Table Scans Updates Decision Notes
ROW % PAGE %
T1 80% 90% 3.80% 57.27% ROW Low S, very high
U. ROW savings
close to PAGE

T2 15% 89% 92.46% 0% PAGE Very high S

T3 30% 81% 27.14% 4.17% ROW Low S

Example T4 38% 83% 89.16% 10.54% ROW High U

T5 21% 87% 0.00% 0% PAGE Append ONLY
table

T6 28% 87% 87.54% 0% PAGE High S, low U

T7 99% appends
-
source Microsoft Compression White Paper
T8
29% 88% 0.50% 0% PAGE
85% appends
30% 90% 11.44% 0.06% PAGE
T9 84% 92% 0.02% 0.00% ROW ROW savings ~=
PAGE

T10 15% 89% 100.00% 0.00% PAGE Read ONLY table

 Tables and Indexes are rebuilt using ALTER TABLE…REBUILD and
ALTER INDEX..REBUILD
 Requires workspace, CPU and I/O
What Happens  Same mechanism as rebuilding an index
with  Free workspace required in
 User Database
Compression  Transaction Log
 Temp B

 Online vs Offline
How and
 Concurrent vs Serial
When
 Order of Compressiong—start small and work
Compress up
Data  SORT_IN_TEMPDB

Table organization Table compression setting
ROW PAGE
Heap The newly inserted row is The newly inserted row is
row-compressed. page-compressed:
· if new row goes to an
existing page with page
compression
· if the new row is inserted

Managing through BULK INSERT with
TABLOCK
· if the new row is inserted
Inserts and through INSERT INTO ...
(TABLOCK) SELECT ... FROM
Updates Otherwise, the row is row-
compressed.*
Clustered index The newly inserted row is The newly inserted row is
row-compressed. page-compressed if new row
goes to an existing page with
page compression Otherwise,
it is row compressed until the
page fills up. Page
compression is attempted
before a page split.**

Mapping
Version store
index for
Table Transaction Sort pages for (with SI or
rebuilding the
compression log queries RCSI isolation
Underlying clustered
index
level)

Data St ROW ROW NONE NONE ROW

PAGE ROW NONE NONE ROW

 Non Updateable
 Limited Data Types
 Can only be nonclustered index
ColumnStore  No computed columns
Limiations  No sparse columns
 No indexed views
 One Per Table

Weitere ähnliche Inhalte

Mehr von Joseph D'Antoni

Pass 2013 dantoni azure a gsJoseph D'Antoni

Pass bac jd_smJoseph D'Antoni

Sql server 2012 ha and dr sql saturday bostonJoseph D'Antoni

Sql Server 2012 HA and DR -- SQL Saturday RichmondJoseph D'Antoni

Windows server 2012 failover clustering new featuresJoseph D'Antoni

San presentation nov 2012 central paJoseph D'Antoni

Always on availability groups way too deepJoseph D'Antoni

South jersey sql virtualizationJoseph D'Antoni

Virtualization for DBAJoseph D'Antoni

Sql server 2012 ha dr 24_hop_finalJoseph D'Antoni

Sql server 2012 ha dr novaJoseph D'Antoni

Sql server 2012 ha drJoseph D'Antoni

Sql saturday powerpoint dc_sanJoseph D'Antoni

Sql saturday dc vm wareJoseph D'Antoni

Deploying your Application to SQLRallyJoseph D'Antoni

Building your first sql server clusterJoseph D'Antoni

Deploying data tier applications sql saturday dcJoseph D'Antoni

Server virtualization and cloud computingJoseph D'Antoni

Management data warehouseJoseph D'Antoni

Mehr von Joseph D'Antoni (20)

Pass 2013 dantoni azure a gs

Pass bac jd_sm

Sql server 2012 ha and dr sql saturday boston

Sql Server 2012 HA and DR -- SQL Saturday Richmond

Windows server 2012 failover clustering new features

San presentation nov 2012 central pa

Always on availability groups way too deep

South jersey sql virtualization

Virtualization for DBA

Sql server 2012 ha dr 24_hop_final

Sql server 2012 ha dr nova

Sql server 2012 ha dr

Sql saturday powerpoint dc_san

Sql saturday dc vm ware

Deploying your Application to SQLRally

Building your first sql server cluster

Deploying data tier applications sql saturday dc

Server virtualization and cloud computing

Management data warehouse

Kürzlich hochgeladen

Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited

"ML in Production",Oleksandr BaganFwdays

SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3

TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc

Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University

WordPress Websites for Engineers: Elevate Your Brandgvaughan

Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB

Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity

Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3

How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3

Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3

Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed

TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey

What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett

DevEX - reference for building teams, processes, and platformsSergiu Bodiu

Advanced Computer Architecture – An IntroductionDilum Bandara

Kürzlich hochgeladen (20)

Ensuring Technical Readiness For Copilot in Microsoft 365

"ML in Production",Oleksandr Bagan

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy

Nell’iperspazio con Rocket: il Framework Web di Rust!

WordPress Websites for Engineers: Elevate Your Brand

Developer Data Modeling Mistakes: From Postgres to NoSQL

Dev Dives: Streamline document processing with UiPath Studio Web

Moving Beyond Passwords: FIDO Paris Seminar.pdf

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx

How AI, OpenAI, and ChatGPT impact business and software.

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx

Generative AI for Technical Writer or Information Developers

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx

Scanning the Internet for External Cloud Exposures via SSL Certs

TeamStation AI System Report LATAM IT Salaries 2024

What's New in Teams Calling, Meetings and Devices March 2024

DevEX - reference for building teams, processes, and platforms

Advanced Computer Architecture – An Introduction

Accelerating Database Performance with Compression

1. Accelerating Database Performance Using Compression Joey D’Antoni New Jersey SQL Server Users Group 19 March 2013

2.  Principal Architect SQL Server, Comcast Cable  @jdanton – Twitter About Me  Joedantoni.wordpress.com  jdanton1@yahoo.com

3. Compression—What Does It Really Mean? Overview Deduplication—What is That? What Should I Compress? Columnstore—How Is It Different?

4. Compression

5.  Specialized compression to eliminate duplicate copies of repeating data  Real Example—In VMWare, you may have 10 copies of Windows Deduplication running on same physical machine. Memory blocks (Common .dlls for example) may be deduplicated.  Backups

6. Faster performance on selects Benefits of Less I/O is required to return data Compression Better Space Utilization on Disk More Pages In Memory

7. Expenses of  Added CPU Cycles Compression  Slower bulk inserts and updates

8. Row Compression SQL Server Page Compression Compression  Prefix Compression Types  Dictionary Compression  Backup Compression

9.  In all editions of SQL Server, starting with 2008 R2  Always use Backup Compression (even when Backup your storage team says no) Compression  Space is by default pre-allocated for estimated size of uncompressed backup  Trace Flag 3042

10. SQL Data Compression gives a great deal of flexibility Can compress tables, indexes and/or So What To partitioning Compress Can use different methods of compression How to Decide?

11.  sp_estimate_data_compression_savings What Won’t Compress Well  Columns with numeric or fixed-length character data types where most values require all the bytes allocated for the specific data type Space Savings  Not much repeating data  Repeating data with non-repeating prefixes  Data stored out of the row  FILESTREAM data

12.  Microsoft advises using for Row Compression for EVERYTHING if Application you can spare 10% CPU. Don’t do this! Workloads  Page Compression has higher overhead

13. Savings Savings Table Scans Updates Decision Notes ROW % PAGE % T1 80% 90% 3.80% 57.27% ROW Low S, very high U. ROW savings close to PAGE T2 15% 89% 92.46% 0% PAGE Very high S T3 30% 81% 27.14% 4.17% ROW Low S Example T4 38% 83% 89.16% 10.54% ROW High U T5 21% 87% 0.00% 0% PAGE Append ONLY table T6 28% 87% 87.54% 0% PAGE High S, low U T7 99% appends - source Microsoft Compression White Paper T8 29% 88% 0.50% 0% PAGE 85% appends 30% 90% 11.44% 0.06% PAGE T9 84% 92% 0.02% 0.00% ROW ROW savings ~= PAGE T10 15% 89% 100.00% 0.00% PAGE Read ONLY table

14.  Tables and Indexes are rebuilt using ALTER TABLE…REBUILD and ALTER INDEX..REBUILD  Requires workspace, CPU and I/O What Happens  Same mechanism as rebuilding an index with  Free workspace required in  User Database Compression  Transaction Log  Temp B

15.  Online vs Offline How and  Concurrent vs Serial When  Order of Compressiong—start small and work Compress up Data  SORT_IN_TEMPDB

16. Table organization Table compression setting ROW PAGE Heap The newly inserted row is The newly inserted row is row-compressed. page-compressed: · if new row goes to an existing page with page compression · if the new row is inserted Managing through BULK INSERT with TABLOCK · if the new row is inserted Inserts and through INSERT INTO ... (TABLOCK) SELECT ... FROM Updates Otherwise, the row is row- compressed.* Clustered index The newly inserted row is The newly inserted row is row-compressed. page-compressed if new row goes to an existing page with page compression Otherwise, it is row compressed until the page fills up. Page compression is attempted before a page split.**

17. Mapping Version store index for Table Transaction Sort pages for (with SI or rebuilding the compression log queries RCSI isolation Underlying clustered index level) Data St ROW ROW NONE NONE ROW PAGE ROW NONE NONE ROW

18. What is Columnstore?

19.  Non Updateable  Limited Data Types  Can only be nonclustered index ColumnStore  No computed columns Limiations  No sparse columns  No indexed views  One Per Table

Hinweis der Redaktion

In computer science and information theory, data compression, source coding,[1] or bit-rate reduction involves encoding information using fewer bits than the original representation.[2] Compression can be either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by identifying unnecessary information and removing it.[3] The process of reducing the size of a data file is popularly referred to as data compression, although its formal name is source coding (coding done at the source of the data before it is stored or transmitted).[4]Compression is useful because it helps reduce resources usage, such as data storage space or transmission capacity. Because compressed data must be decompressed to use, this extra processing imposes computational or other costs through decompression; this situation is far from being a free lunch. Data compression is subject to a space-time complexity trade-off. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed, and the option to decompress the video in full before watching it may be inconvenient or require additional storage. The design of data compression schemes involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (e.g., when using lossy data compression), and the computational resources required to compress and uncompress the data.[5]
n computing, data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data. Related and somewhat synonymous terms are intelligent (data) compression and single-instance (data) storage. The technique is used to improve storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent. In the deduplication process, unique chunks of data, or byte patterns, are identified and stored during a process of analysis. As the analysis continues, other chunks are compared to the stored copy and whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (the match frequency is dependent on the chunk size), the amount of data that must be stored or transferred can be greatly reduced.[1]This type of deduplication is different from that performed by standard file-compression tools, such as LZ77 and LZ78. Whereas these tools identify short repeated substrings inside individual files, the intent of storage-based data deduplication is to inspect large volumes of data and identify large sections – such as entire files or large sections of files – that are identical, in order to store only one copy of it. This copy may be additionally compressed by single-file compression techniques. For example a typical email system might contain 100 instances of the same onemegabyte (MB) file attachment. Each time the email platform is backed up, all 100 instances of the attachment are saved, requiring 100 MB storage space. With data deduplication, only one instance of the attachment is actually stored; the subsequent instances are referenced back to the saved copy for deduplication ratio of roughly 100 to 1.
SQL Server 2008 provides two levels of data compression – row compression and page compression. Row compression helps store data more efficiently in a row by storing fixed-length data types in variable-length storage format. A compressed row uses 4 bits per compressed column to store the length of the data in the column. NULL and 0 values across all data types take no additional space other than these 4 bits.Page compression is a superset of row compression. In addition to storing data efficiently inside a row, page compression optimizes storage of multiple rows in a page, by minimizing the data redundancy. Page compression uses prefix compression and dictionary compression. Prefix compression looks for common patterns in the beginning of the column values on a given column across all rows on each page. Dictionary compression looks for exact value matches across all columns and rows on each page. Both dictionary and prefix are type-agnostic and see every column value as a bag of bytes.The data compression feature is available in the Enterprise and Developer editions of SQL Server 2008. Databases with compressed tables or indexes cannot be restored, attached, or in any way used on other editions. To determine whether a database is using compression, query the dynamic management view (DMV)sys.dm_db_persisted_sku_features. To determine what is compressed, and how (row or page), query the data_compression_desc column in the catalog view sys.partition
For compressed backups, the size of the final backup file depends on how compressible the data is, and this is unknown before the backup operation finishes. Therefore, by default, when backing up a database using compression, the Database Engine uses a pre-allocation algorithm for the backup file. This algorithm pre-allocates a predefined percentage of the size of the database for the backup file. If more space is needed during the backup operation, the Database Engine grows the file. If the final size is less than the allocated space, at the end of the backup operation, the Database Engine shrinks the file to the actual final size of the backup.To allow the backup file to grow only as needed to reach its final size, use trace flag 3042. Trace flag 3042 causes the backup operation to bypass the default backup compression pre-allocation algorithm. This trace flag is useful if you need to save on space by allocating only the actual size required for the compressed backup. However, using this trace flag might cause a slight performance penalty (a possible increase in the duration of the backup operation).
A table or a partition can have three allocation units - IN_ROW_DATA, LOB_DATA, and ROW_OVERFLOW_DATA. The data stored in LOB_DATA and ROW_OVERFLOW_DATA allocation units is not compressed. Only the data that is stored in the IN_ROW_DATA allocation unit is compressed. Use Appendix B to understand how much data is stored in each of these three allocation units.FILESTREAM data is stored outside the database in a FILESTREAM data container on an NTFS volume. This data is not compressed.
Compressed pages are persisted as compressed on disk and stay compressed when read into memory. Data is decompressed (not the entire page, but only the data values of interest) when it meets one of the following conditions:It is read for filtering, sorting, joining, as part of a query response.It is updated by an application.There is no in-memory, decompressed copy of the compressed page. Decompressing data consumes CPU. However, because compressed data uses fewer data pages, it also saves:Physical I/O: Because physical I/O is expensive from a workload perspective, reduced physical I/O often results in a bigger saving than the additional CPU cost to compress and decompress the data. Note that physical I/O is saved both because a smaller volume of data is read from or written to disk, and because more data can remain cached in buffer pool memory.Logical I/O (if data is in memory): Because logical I/O consumes CPU, reduced logical I/O can sometimes compensate for the CPU cost to compress and decompress the data.
Here is an example how a customer used these measurements to decide which tables to page-compress. The customer had an OLTP database running on a server with average CPU utilization of approximately 20 percent. The large amount of available CPU, the significant amount of planned database growth, and the expense of storage provided motivation for data compression. The customer computed space savings, and the U and S measurements for the largest tables in the database, and targeted tables with S greater than 75 percent, U less than 20 percent for page compression. Table 1 shows the estimated row and page savings, the values of S and U, and the decision as to whether to row or page compress.Based on the metrics shown in Table 1, the customer decided to page-compress tables T2, T5, T6, T7, T8, and T10. All other tables in the database were row-compressed. Following this plan, the customer achieved 50 percent space savings, and approximately 10 percent increase in CPU utilization.
Workspace Required in the User DatabaseIn the user database, free workspace is required for the following:The compressed table or indexThe mapping index if you are compressing a heap or a clustered index with the ONLINE option set to ON and the SORT_IN_TEMPDB option set to OFF. (The recommendation is to set SORT_IN_TEMPDB to ON. The workspace requirement for the mapping index is discussed in the tempdb section.).While a table is being compressed, both the uncompressed table and the compressed table exist together until the compression is successful and committed. After the table or the index is compressed, the uncompressed table is dropped, and the space is released to the filegroup. To estimate the size of the compressed table, use the output of sp_estimate_data_compression_savings.Transaction Log SpaceThe amount of transaction log space needed depends on whether ONLINE is set to ON or OFF, and the recovery model used (full, bulk-logged, or simple).Workspace Required in tempdbIn the tempdb database, free workspace is required if ONLINE is set to ON:For the mapping index, an internal structure used to map old bookmarks to new bookmarks, enabling concurrent DML transactions. This is stored in tempdb if SORT_IN_TEMPDB is set to ON.For the version store. This is only used if there are concurrent DML operations. Size depends upon the volume of ongoing modifications and duration of long running DML transactions.
Side Effects of Compressing a Table or IndexWhen you compress a table or an index, you should be aware of two side effects:Compression includes a rebuild, thus removing fragmentation from the table or index.When a heap is compressed, if there are any nonclustered indexes on the heap, they are rebuilt as follows: o With ONLINE set to OFF, the nonclustered indexes are rebuilt one by one. o With ONLINE set to ON, all the nonclustered indexes are rebuilt simultaneously.You must account for the workspace required to rebuild the nonclustered indexes, because the space for the uncompressed heap is not released until the rebuild of the nonclustered indexes is complete.r
* The resulting row-compressed pages can be page-compressed by running a heap rebuild with page compression. ** With page compression, all the pages in the table may not actually be page-compressed. A page is page-compressed only if the space savings on that page exceeds an internally defined threshold.Updating or Deleting Compressed RowsAll updates to the rows in a row-compressed table or partition will maintain the rows in row-compressed format. Not every update to the rows in a page-compressed table or partition will cause the column prefix and page dictionary to be recomputed. When the number of changes on a given page-compressed page exceeds an internally defined threshold, the column prefix and page dictionary are recomputed.
When an application manipulates (INSERT, UPDATE, DELETE, CREATE/REBUILD INDEX, and so on) data in a table, some supporting data structures may be created by SQL Server, which may temporarily hold a subset of the data. Some of these data structures are:Transaction logMapping indexVersion storeSort pagesWhether or not these supporting structures are compressed if the data in a compressed table is manipulated depends upon the type of the data structure and the type of data compression used on the table. Table 4 summarizes the compression characteristics of the supporting data structures when a compressed table is manipulated.
In SQL Server, a columnstore index is data stored in column-wise fashion that can be used to answer a query just like data in any other type of index. A columnstore index appears as an index on a table when examining catalog views or the Object Explorer in Management Studio. The query optimizer considers the columnstore index as a data source for accessing data just like it considers other indexes when creating a query plan.This allows much greater compression. There isn’t a lot of open discussion on the compression algorithm being used, but it is much more instensive than page compression. In this case we are limited to non-updateable indexes right now. This is best used for warehouse type queries that return a lot of rows.
What data types cannot be used in a columnstore index?The following data types cannot be used in a columnstore index: decimal or numeric with precision > 18, datetimeoffset with precision > 2, binary, varbinary, image, text, ntext, varchar(max), nvarchar(max), cursor, hierarchyid, timestamp, uniqueidentifier, sqlvariant, xml.

Accelerating Database Performance with Compression

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Mehr von Joseph D'Antoni

Mehr von Joseph D'Antoni (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Accelerating Database Performance with Compression

Hinweis der Redaktion