Accelerating Database Performance Using Compression
Joey D’Antoni
New Jersey SQL Server Users Group
19 March 2013

About Me
• Principal Architect SQL Server, Comcast Cable
• @jdanton – Twitter
• Joedantoni.wordpress.com
• jdanton1@yahoo.com
Overview
• Compression—What Does It Really Mean?
• Deduplication—What is That?
• What Should I Compress?
• Columnstore—How Is It Different?
Compression

Deduplication
• Specialized compression to eliminate duplicate copies of repeating data
• Real example—in VMware, you may have 10 copies of Windows running on the same physical machine; memory blocks (common DLLs, for example) may be deduplicated
• Backups
Benefits of Compression
• Faster performance on selects
• Less I/O is required to return data
• Better space utilization on disk
• More pages in memory
Expenses of Compression
• Added CPU cycles
• Slower bulk inserts and updates
SQL Server Compression Types
• Row compression
• Page compression
  • Prefix compression
  • Dictionary compression
• Backup compression

Backup Compression
• Available starting with SQL Server 2008 R2 Standard Edition and higher (Enterprise-only in SQL Server 2008)
• Always use backup compression (even when your storage team says no)
• Space is pre-allocated by default for the estimated size of the uncompressed backup
• Trace flag 3042 skips the pre-allocation and lets the backup file grow only as needed
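A minimal sketch of both points; the database name and backup path are placeholders, and trace flag 3042 is the pre-allocation bypass described in the notes at the end of this deck.

-- Let the compressed backup file grow only as needed instead of being
-- pre-allocated at a percentage of the database size (optional).
DBCC TRACEON (3042, -1);

-- Hypothetical database name and path, shown only for illustration.
BACKUP DATABASE SalesDB
TO DISK = N'X:\Backup\SalesDB.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;

DBCC TRACEOFF (3042, -1);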
So What to Compress
• SQL Server data compression gives a great deal of flexibility
• Can compress tables, indexes, and/or individual partitions
• Can use different methods of compression
• How to decide?

Space Savings
• sp_estimate_data_compression_savings (example query below)
• What won’t compress well:
  • Columns with numeric or fixed-length character data types where most values require all the bytes allocated for the specific data type
  • Not much repeating data
  • Repeating data with non-repeating prefixes
  • Data stored out of the row
  • FILESTREAM data

Application Workloads
• Microsoft advises using row compression for EVERYTHING if you can spare 10% CPU. Don’t do this!
• Page compression has higher overhead
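The estimation procedure mentioned under Space Savings takes a schema, an object, an optional index and partition, and the target compression setting; the table and index below are hypothetical, purely for illustration.

-- Estimate PAGE compression savings for the clustered index (index_id = 1)
-- of a hypothetical dbo.Orders table; run again with N'ROW' to compare.
EXEC sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'Orders',
     @index_id         = 1,
     @partition_number = NULL,
     @data_compression = N'PAGE';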
Example (source: Microsoft Compression White Paper)

Table   ROW savings   PAGE savings   Scans (S)   Updates (U)   Decision   Notes
T1      80%           90%             3.80%      57.27%        ROW        Low S, very high U; ROW savings close to PAGE
T2      15%           89%            92.46%       0%           PAGE       Very high S
T3      30%           81%            27.14%       4.17%        ROW        Low S
T4      38%           83%            89.16%      10.54%        ROW        High U
T5      21%           87%             0.00%       0%           PAGE       Append-only table
T6      28%           87%            87.54%       0%           PAGE       High S, low U
T7      29%           88%             0.50%       0%           PAGE       99% appends
T8      30%           90%            11.44%       0.06%        PAGE       85% appends
T9      84%           92%             0.02%       0.00%        ROW        ROW savings ~= PAGE
T10     15%           89%           100.00%       0.00%        PAGE       Read-only table

What Happens with Compression
• Tables and indexes are rebuilt using ALTER TABLE … REBUILD and ALTER INDEX … REBUILD
• Requires workspace, CPU, and I/O
• Same mechanism as rebuilding an index
• Free workspace is required in the user database, the transaction log, and tempdb
How and When to Compress Data
• Online vs. offline
• Concurrent vs. serial
• Order of compression—start small and work up
• SORT_IN_TEMPDB
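A sketch of the rebuild statements the last two slides describe, against a hypothetical table and index; note that ONLINE = ON is an Enterprise Edition option on the SQL Server versions this deck covers.

-- Rebuild a table (heap or clustered index) with page compression.
ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Rebuild a single nonclustered index online, with row compression,
-- doing the sort work in tempdb rather than in the user database.
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
REBUILD WITH (DATA_COMPRESSION = ROW, ONLINE = ON, SORT_IN_TEMPDB = ON);

Working from the smallest tables up, as the slide suggests, keeps the workspace and transaction log impact of each rebuild manageable.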
Managing Inserts and Updates

Table organization: Heap
• ROW setting: the newly inserted row is row-compressed.
• PAGE setting: the newly inserted row is page-compressed if the new row goes to an existing page with page compression, if the new row is inserted through BULK INSERT with TABLOCK, or if the new row is inserted through INSERT INTO ... (TABLOCK) SELECT ... FROM. Otherwise, the row is row-compressed.*

Table organization: Clustered index
• ROW setting: the newly inserted row is row-compressed.
• PAGE setting: the newly inserted row is page-compressed if the new row goes to an existing page with page compression. Otherwise, it is row-compressed until the page fills up; page compression is attempted before a page split.**
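Per the heap row above, bulk-style inserts are what keep new rows page-compressed in a page-compressed heap; a sketch with hypothetical table names:

-- dbo.OrderArchive is assumed to be a heap rebuilt with DATA_COMPRESSION = PAGE.
-- The TABLOCK hint allows the newly inserted rows to be page-compressed
-- instead of only row-compressed.
INSERT INTO dbo.OrderArchive WITH (TABLOCK)
SELECT OrderID, CustomerID, OrderDate, Amount
FROM dbo.Orders
WHERE OrderDate < '20130101';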
Underlying Data Structures (compression of supporting structures when a compressed table is manipulated)

Table compression | Transaction log | Mapping index for rebuilding the clustered index | Sort pages for queries | Version store (with SI or RCSI isolation level)
ROW               | ROW             | NONE                                             | NONE                   | ROW
PAGE              | ROW             | NONE                                             | NONE                   | ROW
What is Columnstore?
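A sketch of the SQL Server 2012-era syntax; the fact table and column list are hypothetical, and, per the limitations on the next slide, the index is nonclustered and the table is effectively read-only while it exists.

-- One nonclustered columnstore index per table; the base table cannot be
-- updated while the index exists (SQL Server 2012 behavior).
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales (OrderDateKey, ProductKey, SalesAmount, Quantity);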
Columnstore Limitations
• Non-updateable
• Limited data types
• Can only be a nonclustered index
• No computed columns
• No sparse columns
• No indexed views
• One per table
Editor's Notes

  1. In computer science and information theory, data compression, source coding,[1] or bit-rate reduction involves encoding information using fewer bits than the original representation.[2] Compression can be either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by identifying unnecessary information and removing it.[3] The process of reducing the size of a data file is popularly referred to as data compression, although its formal name is source coding (coding done at the source of the data before it is stored or transmitted).[4] Compression is useful because it helps reduce resource usage, such as data storage space or transmission capacity. Because compressed data must be decompressed to be used, this extra processing imposes computational or other costs through decompression; this situation is far from being a free lunch. Data compression is subject to a space-time complexity trade-off. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed, and the option to decompress the video in full before watching it may be inconvenient or require additional storage. The design of data compression schemes involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (e.g., when using lossy data compression), and the computational resources required to compress and uncompress the data.[5]
  2. In computing, data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data. Related and somewhat synonymous terms are intelligent (data) compression and single-instance (data) storage. The technique is used to improve storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent. In the deduplication process, unique chunks of data, or byte patterns, are identified and stored during a process of analysis. As the analysis continues, other chunks are compared to the stored copy and whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (the match frequency is dependent on the chunk size), the amount of data that must be stored or transferred can be greatly reduced.[1] This type of deduplication is different from that performed by standard file-compression tools, such as LZ77 and LZ78. Whereas these tools identify short repeated substrings inside individual files, the intent of storage-based data deduplication is to inspect large volumes of data and identify large sections – such as entire files or large sections of files – that are identical, in order to store only one copy. This copy may be additionally compressed by single-file compression techniques. For example, a typical email system might contain 100 instances of the same one-megabyte (MB) file attachment. Each time the email platform is backed up, all 100 instances of the attachment are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; the subsequent instances are referenced back to the saved copy, for a deduplication ratio of roughly 100 to 1.
  3. SQL Server 2008 provides two levels of data compression – row compression and page compression. Row compression helps store data more efficiently in a row by storing fixed-length data types in variable-length storage format. A compressed row uses 4 bits per compressed column to store the length of the data in the column. NULL and 0 values across all data types take no additional space other than these 4 bits. Page compression is a superset of row compression. In addition to storing data efficiently inside a row, page compression optimizes storage of multiple rows in a page, by minimizing the data redundancy. Page compression uses prefix compression and dictionary compression. Prefix compression looks for common patterns in the beginning of the column values on a given column across all rows on each page. Dictionary compression looks for exact value matches across all columns and rows on each page. Both dictionary and prefix are type-agnostic and see every column value as a bag of bytes. The data compression feature is available in the Enterprise and Developer editions of SQL Server 2008. Databases with compressed tables or indexes cannot be restored, attached, or in any way used on other editions. To determine whether a database is using compression, query the dynamic management view (DMV) sys.dm_db_persisted_sku_features. To determine what is compressed, and how (row or page), query the data_compression_desc column in the catalog view sys.partitions.
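  The two checks that note describes might look like this; the system views are the ones named above, and the rest is plain querying.

  -- Does this database use any Enterprise-only features, such as data compression?
  SELECT feature_name
  FROM sys.dm_db_persisted_sku_features;

  -- Which partitions are compressed, and how (ROW or PAGE)?
  SELECT OBJECT_NAME(p.object_id) AS table_name,
         p.index_id,
         p.partition_number,
         p.data_compression_desc
  FROM sys.partitions AS p
  WHERE p.data_compression_desc <> 'NONE';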
  4. For compressed backups, the size of the final backup file depends on how compressible the data is, and this is unknown before the backup operation finishes. Therefore, by default, when backing up a database using compression, the Database Engine uses a pre-allocation algorithm for the backup file. This algorithm pre-allocates a predefined percentage of the size of the database for the backup file. If more space is needed during the backup operation, the Database Engine grows the file. If the final size is less than the allocated space, at the end of the backup operation, the Database Engine shrinks the file to the actual final size of the backup. To allow the backup file to grow only as needed to reach its final size, use trace flag 3042. Trace flag 3042 causes the backup operation to bypass the default backup compression pre-allocation algorithm. This trace flag is useful if you need to save on space by allocating only the actual size required for the compressed backup. However, using this trace flag might cause a slight performance penalty (a possible increase in the duration of the backup operation).
  5. A table or a partition can have three allocation units - IN_ROW_DATA, LOB_DATA, and ROW_OVERFLOW_DATA. The data stored in LOB_DATA and ROW_OVERFLOW_DATA allocation units is not compressed. Only the data that is stored in the IN_ROW_DATA allocation unit is compressed. Use Appendix B of the compression white paper to understand how much data is stored in each of these three allocation units. FILESTREAM data is stored outside the database in a FILESTREAM data container on an NTFS volume. This data is not compressed.
  6. Compressed pages are persisted as compressed on disk and stay compressed when read into memory. Data is decompressed (not the entire page, but only the data values of interest) when it meets one of the following conditions:
     - It is read for filtering, sorting, joining, as part of a query response.
     - It is updated by an application.
     There is no in-memory, decompressed copy of the compressed page. Decompressing data consumes CPU. However, because compressed data uses fewer data pages, it also saves:
     - Physical I/O: Because physical I/O is expensive from a workload perspective, reduced physical I/O often results in a bigger saving than the additional CPU cost to compress and decompress the data. Note that physical I/O is saved both because a smaller volume of data is read from or written to disk, and because more data can remain cached in buffer pool memory.
     - Logical I/O (if data is in memory): Because logical I/O consumes CPU, reduced logical I/O can sometimes compensate for the CPU cost to compress and decompress the data.
  7. Here is an example of how a customer used these measurements to decide which tables to page-compress. The customer had an OLTP database running on a server with average CPU utilization of approximately 20 percent. The large amount of available CPU, the significant amount of planned database growth, and the expense of storage provided motivation for data compression. The customer computed space savings, and the U and S measurements for the largest tables in the database, and targeted tables with S greater than 75 percent and U less than 20 percent for page compression. Table 1 shows the estimated row and page savings, the values of S and U, and the decision as to whether to row or page compress. Based on the metrics shown in Table 1, the customer decided to page-compress tables T2, T5, T6, T7, T8, and T10. All other tables in the database were row-compressed. Following this plan, the customer achieved 50 percent space savings and an approximately 10 percent increase in CPU utilization.
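  A sketch of how the S (scan) and U (update) percentages can be derived from sys.dm_db_index_operational_stats; the exact formula here is an approximation of the white paper's appendix, so treat the choice of counters as an assumption.

  -- Approximate S (scan %) and U (update %) per index in the current database.
  SELECT OBJECT_NAME(ios.object_id) AS table_name,
         ios.index_id,
         100.0 * SUM(ios.range_scan_count)
               / NULLIF(SUM(ios.range_scan_count + ios.singleton_lookup_count
                          + ios.leaf_insert_count + ios.leaf_update_count
                          + ios.leaf_delete_count + ios.leaf_page_merge_count), 0) AS scan_pct_S,
         100.0 * SUM(ios.leaf_update_count)
               / NULLIF(SUM(ios.range_scan_count + ios.singleton_lookup_count
                          + ios.leaf_insert_count + ios.leaf_update_count
                          + ios.leaf_delete_count + ios.leaf_page_merge_count), 0) AS update_pct_U
  FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
  GROUP BY ios.object_id, ios.index_id;

  Tables with high S and low U are the page-compression candidates in this example.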
  8. Workspace required in the user database: free workspace is required for the following:
     - The compressed table or index.
     - The mapping index, if you are compressing a heap or a clustered index with the ONLINE option set to ON and the SORT_IN_TEMPDB option set to OFF. (The recommendation is to set SORT_IN_TEMPDB to ON. The workspace requirement for the mapping index is discussed in the tempdb section.)
     While a table is being compressed, both the uncompressed table and the compressed table exist together until the compression is successful and committed. After the table or the index is compressed, the uncompressed table is dropped, and the space is released to the filegroup. To estimate the size of the compressed table, use the output of sp_estimate_data_compression_savings.
     Transaction log space: the amount of transaction log space needed depends on whether ONLINE is set to ON or OFF, and the recovery model used (full, bulk-logged, or simple).
     Workspace required in tempdb: in the tempdb database, free workspace is required if ONLINE is set to ON:
     - For the mapping index, an internal structure used to map old bookmarks to new bookmarks, enabling concurrent DML transactions. This is stored in tempdb if SORT_IN_TEMPDB is set to ON.
     - For the version store. This is only used if there are concurrent DML operations. Size depends upon the volume of ongoing modifications and the duration of long-running DML transactions.
  9. Side effects of compressing a table or index: when you compress a table or an index, you should be aware of two side effects:
     - Compression includes a rebuild, thus removing fragmentation from the table or index.
     - When a heap is compressed, if there are any nonclustered indexes on the heap, they are rebuilt as follows: with ONLINE set to OFF, the nonclustered indexes are rebuilt one by one; with ONLINE set to ON, all the nonclustered indexes are rebuilt simultaneously.
     You must account for the workspace required to rebuild the nonclustered indexes, because the space for the uncompressed heap is not released until the rebuild of the nonclustered indexes is complete.
  10. * The resulting row-compressed pages can be page-compressed by running a heap rebuild with page compression.
      ** With page compression, all the pages in the table may not actually be page-compressed. A page is page-compressed only if the space savings on that page exceeds an internally defined threshold.
      Updating or deleting compressed rows: all updates to the rows in a row-compressed table or partition will maintain the rows in row-compressed format. Not every update to the rows in a page-compressed table or partition will cause the column prefix and page dictionary to be recomputed. When the number of changes on a given page-compressed page exceeds an internally defined threshold, the column prefix and page dictionary are recomputed.
  11. When an application manipulates (INSERT, UPDATE, DELETE, CREATE/REBUILD INDEX, and so on) data in a table, some supporting data structures may be created by SQL Server, which may temporarily hold a subset of the data. Some of these data structures are: the transaction log, the mapping index, the version store, and sort pages. Whether or not these supporting structures are compressed if the data in a compressed table is manipulated depends upon the type of the data structure and the type of data compression used on the table. Table 4 summarizes the compression characteristics of the supporting data structures when a compressed table is manipulated.
  12. In SQL Server, a columnstore index is data stored in column-wise fashion that can be used to answer a query just like data in any other type of index. A columnstore index appears as an index on a table when examining catalog views or the Object Explorer in Management Studio. The query optimizer considers the columnstore index as a data source for accessing data just like it considers other indexes when creating a query plan. This allows much greater compression. There isn’t a lot of open discussion on the compression algorithm being used, but it is much more intensive than page compression. In this case we are limited to non-updateable indexes right now. This is best used for warehouse-type queries that return a lot of rows.
  13. What data types cannot be used in a columnstore index? The following data types cannot be used in a columnstore index: decimal or numeric with precision > 18, datetimeoffset with precision > 2, binary, varbinary, image, text, ntext, varchar(max), nvarchar(max), cursor, hierarchyid, timestamp, uniqueidentifier, sql_variant, xml.