In memory columnstore indexes--make your data warehouse


Presentation on the columnstore index feature in SQL Server 2012 and 2014, given to the Philadelphia SQL BI user group on November 19, 2013.

In-Memory Columnstore Indexes-Make Your Data Warehouse Fly
Joe D’Antoni
Philadelphia SQL Server Business Intelligence Group
19 November 2013

About Me
Solution Architect, Anexinet
@jdanton – Twitter
jdanton1@yahoo.com
Joedantoni.wordpress.com – Blog, Slides

Agenda
Indexes—a basic overview
Columnstore—an introduction
Report Performance—Demo
2012 and 2014—What’s Changing?

2014—Demo
Questions

Indexes
• Data Structure that allows us to
speed data retrieval, by
maintaining an extra copy of
data
• Can be filtered
• Can be function based, or
ordered
• Penalty is that writes become
more expensive
• More storage required

Indexes in SQL Server
• Clustered vs Nonclustered
• Non-clustered index: "just an index"

Clustered Index
• Data is stored in pages in the
order of the clustered key
• Data in clustered index is only
stored on disk once (it’s the data
from the tables)
• Table without a clustered index is
called a heap—no order at all

Non-Clustered Index
• Duplicate copy of the indexed columns from the table
• Provides a pointer from each index entry to the table
data
• Does not determine the physical order of the table data
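
The difference between the two index types can be sketched in a few lines of Python. This is a toy model, not SQL Server's storage engine: the table, the column names, and both lookup functions are hypothetical, and sorted lists stand in for the B-tree structures a real index uses.

```python
import bisect

# Hypothetical table, stored in clustered-key order (order_id).
# In this toy model the sorted list *is* the table, as with a clustered index.
table = [
    (1, "Smith", 250.0),
    (2, "Jones", 90.0),
    (3, "Adams", 410.0),
    (4, "Jones", 35.0),
]

# Nonclustered index on customer: a separate, sorted copy of the key column,
# each entry carrying a pointer (here, the clustered key) back to the table row.
nc_index = sorted((customer, order_id) for order_id, customer, _ in table)
nc_keys = [customer for customer, _ in nc_index]


def find_by_order_id(order_id):
    """Seek on the clustered key: binary search directly over the table rows."""
    keys = [row[0] for row in table]
    pos = bisect.bisect_left(keys, order_id)
    if pos < len(table) and table[pos][0] == order_id:
        return table[pos]
    return None


def find_by_customer(customer):
    """Seek on the nonclustered index, then follow each pointer to the table."""
    lo = bisect.bisect_left(nc_keys, customer)
    hi = bisect.bisect_right(nc_keys, customer)
    return [find_by_order_id(order_id) for _, order_id in nc_index[lo:hi]]


print(find_by_order_id(3))
print(find_by_customer("Jones"))
```

The nonclustered lookup costs two steps (index seek, then a lookup into the table), and every insert has to maintain both structures, which is the write and storage penalty mentioned on the earlier Indexes slide.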

Data Warehouse Queries
• Data Warehouses have a lot of
data
• Querying lots of data can
take a really long time
• Processing data row by row
may not be the most efficient
way to perform aggregations

Traditional Approaches To Improving
Performance
• Partitioned Tables
• Indexed Views
• Data Compression

Introducing Columnstore Indexes (SQL
2012)
• Data is stored in columns, as
opposed to rows
• This allows a much higher rate of
compression
• Columns not used in a query are
simply not scanned or returned
• Recommended practice is to add
most columns in a table to the index
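
A small Python sketch of why storing data by column helps. The sample rows and column names are made up for illustration, and run-length encoding is used only as a simple stand-in for compression; the slides do not say which encodings the engine actually applies.

```python
from itertools import groupby

# Hypothetical fact rows: (sale_date, region, quantity).
rows = [
    ("2013-11-01", "East", 5),
    ("2013-11-01", "East", 3),
    ("2013-11-01", "West", 7),
    ("2013-11-02", "West", 2),
    ("2013-11-02", "West", 4),
    ("2013-11-02", "West", 1),
]

# Column-oriented layout: one list per column instead of one tuple per row.
columns = {
    "sale_date": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "quantity": [r[2] for r in rows],
}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


encoded_region = run_length_encode(columns["region"])
print(encoded_region)

# A query such as SUM(quantity) touches only the quantity column;
# sale_date and region are never read.
total_quantity = sum(columns["quantity"])
print(total_quantity)
```

Values within one column tend to repeat, so the six region values collapse to two pairs here; the same values interleaved with dates and quantities in a row layout would not form runs.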

So How Is It So Much Faster?
• Very good compression ratio for Column oriented
data
• Better use of Memory
• Segment Elimination Skips Large Chunks of Data
• Batch Mode
• Processes data in "batches" of about 1000
rows rather than row by row
• 7-40x CPU savings with batch mode
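
Segment elimination and batch processing can be modeled roughly as follows. This is a sketch under assumptions: the `Segment` class, the segment size, and the integer key column are hypothetical, and the code mirrors only the control flow (skip by min/max metadata, then work through values a batch at a time), not the engine's actual performance characteristics.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A chunk of one column plus the min/max metadata kept alongside it."""
    values: list
    min_value: int
    max_value: int


def build_segments(values, segment_size):
    """Split a column into fixed-size segments and record each one's range."""
    segments = []
    for start in range(0, len(values), segment_size):
        chunk = values[start:start + segment_size]
        segments.append(Segment(chunk, min(chunk), max(chunk)))
    return segments


def count_in_range(segments, low, high, batch_size=1000):
    """Count values in [low, high], skipping segments that cannot match."""
    matched = 0
    segments_scanned = 0
    for seg in segments:
        # Segment elimination: the metadata alone rules this segment out.
        if seg.max_value < low or seg.min_value > high:
            continue
        segments_scanned += 1
        # Batch-style processing: handle a slice of values per step
        # instead of invoking the operator once per row.
        for start in range(0, len(seg.values), batch_size):
            batch = seg.values[start:start + batch_size]
            matched += sum(1 for v in batch if low <= v <= high)
    return matched, segments_scanned


# Hypothetical integer key column, loaded in ascending order.
keys = list(range(10_000))
segments = build_segments(keys, segment_size=2_500)
matched, scanned = count_in_range(segments, 6_000, 6_999)
print(matched, scanned, len(segments))
```

Elimination only pays off when the min/max ranges of the segments do not all overlap the filter, which is why the load order of the data matters in this model.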

“The key to getting the best
performance is to make sure
your queries process the
large majority of data in

Columnstore All The Things?
• Awesome performance—so what’s
the negative?
• Can’t update/insert in 2012
• Can only be nonclustered index—
so we are storing more data on
disk
• Data types are somewhat limited
• One index per table
• Can’t be a sorted index

So Where To Use Columnstore
Indexes?
• Only on Large Tables—Fact
tables and Dimension Tables >
3 Million Rows
• Include Every Column
• Structure Queries as star joins
with grouping and aggregation
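
The query shape recommended above, one large fact table joined to small dimension tables with inner joins and then grouped and aggregated, looks roughly like this. The example runs on Python's built-in `sqlite3` module purely so that it is executable; SQLite has no columnstore indexes, and every table and column name here is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_year INTEGER);
    CREATE TABLE fact_sales (product_key INTEGER, date_key INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'Bikes'), (2, 'Helmets');
    INSERT INTO dim_date VALUES (20120101, 2012), (20130101, 2013);
    INSERT INTO fact_sales VALUES
        (1, 20120101, 500.0), (1, 20130101, 700.0),
        (2, 20130101, 40.0),  (2, 20130101, 60.0);
""")

# Star join: the fact table sits in the middle, each dimension joins to it
# on its key, and the result is grouped and aggregated.
star_query = """
    SELECT p.category, d.calendar_year, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.calendar_year
    ORDER BY p.category, d.calendar_year
"""
results = conn.execute(star_query).fetchall()
print(results)
```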

Columnstore in 2014
• Fewer Data Type Limitations
• Updateable
• Can be Clustered Index

• New Archival Compression Mode
• Batch Mode Improvements

Columnstore Updates (2014)
Updates to index -> collected until they reach 1000 rows -> tuple mover moves them into the index

Columnstore Updates (2014)
• Bulk Inserts go through
special API
• Updates are processed
as inserts and deletes,
so they are an expensive
operation
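
The update path described on these two slides can be modeled with a toy class. Everything here is an assumption made for illustration: `ToyColumnstore`, its method names, and the delete-marker set are hypothetical, and the threshold is a parameter set to a tiny value so the behavior is visible in a short demo.

```python
class ToyColumnstore:
    """Toy model: closed columnar row groups plus a row-oriented delta store."""

    def __init__(self, threshold):
        self.threshold = threshold   # rows collected before the move
        self.delta_store = []        # recent inserts, kept row by row
        self.row_groups = []         # closed groups, stored column by column
        self.deleted = set()         # (group, position) markers for deleted rows

    def insert(self, key, value):
        self.delta_store.append((key, value))
        if len(self.delta_store) >= self.threshold:
            self._tuple_mover()

    def _tuple_mover(self):
        """Move the collected rows into the index as one columnar row group."""
        keys = [k for k, _ in self.delta_store]
        values = [v for _, v in self.delta_store]
        self.row_groups.append({"key": keys, "value": values})
        self.delta_store = []

    def delete(self, key):
        # A row still in the delta store is simply removed.
        for i, (k, _) in enumerate(self.delta_store):
            if k == key:
                del self.delta_store[i]
                return
        # A row in a closed group is only marked as deleted, not rewritten.
        for g, group in enumerate(self.row_groups):
            for i, k in enumerate(group["key"]):
                if k == key and (g, i) not in self.deleted:
                    self.deleted.add((g, i))
                    return

    def update(self, key, new_value):
        """An update is a delete followed by an insert."""
        self.delete(key)
        self.insert(key, new_value)

    def scan(self):
        """Return all live rows from closed groups and the delta store."""
        live = []
        for g, group in enumerate(self.row_groups):
            for i, (k, v) in enumerate(zip(group["key"], group["value"])):
                if (g, i) not in self.deleted:
                    live.append((k, v))
        return live + self.delta_store


store = ToyColumnstore(threshold=3)
for key in range(1, 5):
    store.insert(key, key * 10)
store.update(2, 99)
print(store.scan())
```

In this model an update touches two structures (the delete markers and the delta store) and leaves a dead row behind in the closed group, which gives some intuition for why the slide calls updates expensive.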

What Do We Do Differently in 2014
• Best Practices are mostly the
same
• Batch mode gets enhanced and
gains more query types
• No need to worry about dropping
and rebuilding indexes—just
append data
• Still focus on large tables where
data is not frequently updated
• Archival Compression Good for
old unused data

Contact
jdanton1@yahoo.com
Joedantoni.wordpress.com
@jdanton


Editor's Notes

Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order. The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.
Generally, nonclustered indexes are created to improve the performance of frequently used queries not covered by the clustered index or to locate rows in a table without a clustered index (called a heap). You can create multiple nonclustered indexes on a table or indexed view.
The columnstore index in SQL Server employs Microsoft’s patented Vertipaq™ technology, which it shares with SQL Server Analysis Services and PowerPivot. SQL Server columnstore indexes don’t have to fit in main memory, but they can effectively use as much memory as is available on the server. Portions of columns are moved in and out of memory on demand.
What data types cannot be used in a columnstore index? The following data types cannot be used in a columnstore index: decimal or numeric with precision > 18, datetimeoffset with precision > 2, binary, varbinary, image, text, ntext, varchar(max), nvarchar(max), cursor, hierarchyid, timestamp, uniqueidentifier, sql_variant, xml. The SQL Server 2012 implementation did not support a number of data types such as numeric beyond precision 18, datetimeoffset beyond precision 2, GUID and binary columns. The upcoming version adds support for all the above data types. It also introduces support for storing short strings by value instead of converting all strings to a 32-bit ID within a dictionary. This removes the extra overhead associated with the dictionary and helps improve the columnstore compression even further.
Include every column of the table in the columnstore index. If you don't, then a query that references a column not included in the index will not benefit from the columnstore index much or at all. Structure your queries as star joins with grouping and aggregation as much as possible. Avoid joining pairs of large tables. Join a single large fact table to one or more smaller dimensions using standard inner joins. Use a dimensional modeling approach for your data as much as possible to allow you to structure your queries this way. Use best practices for statistics management and query design. This is independent of columnstore technology. Use good statistics and avoid query design pitfalls to get the best performance. See the white paper on SQL Server statistics for guidance. In particular, see the section "Best Practices for Managing Statistics."
