1. Introduction: Why Scale?
2. Vertical & Horizontal Partitioning
3. Partitioned Tables
4. Distributed Partitioned Views
5. Database Sharding
6. Stretch Databases (optional)
Ralph: Who am I?
• An Enterprise Architect
• at iGamingCloud, Gaming Innovation Group
• focus on Data Platforms
• A Microsoft Certified Trainer
• deliver MTA, MCSA, MCSE locally
• covering Windows, SQL Server, C#
• I’m here to explain why databases need to scale, describe a
number of possible cross-platform solutions, and demonstrate
the technologies available in MS SQL Server 2016 and Azure.
1. Introduction
Why do we need to scale databases?
Overview of possible options
Scaling Databases: Why?
• Most application environments are developed as a monolith, a single application running a single
database on a single server.
• In time, the whole application environment starts slowing down:
• increased data volumes
• increased workloads
• The simplest option is to introduce an app/web farm to balance the application across multiple
servers whilst using the same old single database.
• But this might not be enough… we need to scale the database!
Scaling Databases: Optimisations
• Unless the whole application environment is redesigned and redeveloped, one needs to look into
optimising the database layer.
• Large database problems include:
• Queries become slower, possibly giving time-outs under load
• Backups are slower to take, to ship, and to restore
• Index maintenance takes longer and has a greater impact on the workload
• Common optimisations include:
• Vertical Scaling: scale up current servers to max disk/memory/cpu, or simply migrate to a bigger server
• Read Scaling: scale out by adding a synchronous or asynchronous replica and routing the application's read-only queries to it
• Database restructuring: improved table designs, introduction of aggregation tables
• Offload data: move old transactional data to archive servers, deletion of log data
• But this might not be enough… we need to partition the database!
Scaling Databases: Data Partitioning
• Scale Up: besides adding more resources to the server,
a single database can be partitioned within itself:
• Vertical & Horizontal Partitioning
• Partitioned Tables
• Scale Out: when a single database is too big, horizontal scaling is done
using distributed databases:
• Distributed Partitioned Views
• Database Sharding
Scaling Databases: Domain Partitioning
• A different approach is to partition your data by domain.
• This is achieved by splitting data by domain and moving them into their own database.
• This could be fairly easy if tables are already grouped into their own schema by domain.
• However it could be problematic if application queries and reports span multiple schemas
• reports would now need to mesh multiple databases together
• or read from a consolidated data warehouse
• Even though this breaks the database down into smaller databases, each smaller database has the
potential to become a problem on its own.
• Refactoring a monolith application into various microservices adopts this principle with each
microservice having its own data store.
• Microservices usually adopt polyglot persistence: the appropriate data store is chosen according to the
required features and partition usage, e.g. using a mix of SQL & NoSQL data stores.
2. Partitioning
Benefits
Strategies: Horizontal & Vertical Partitioning
Updatable Views
DEMO
Partitioning Benefits
• Scalability: Scale-up will eventually reach a physical hardware limit.
• Performance: Data access takes place on smaller partitions, in parallel for multiple partitions.
• Availability: Reduce single points of failure; multiple disk drives, multiple databases, multiple servers.
• Security: Separate sensitive and non-sensitive data into different partitions.
• Flexibility: Varied operational management strategies by partition; monitoring, backups, restores,
indexing, etc.
https://docs.microsoft.com/en-us/azure/architecture/best-practices/data-partitioning
Strategy: Vertical Partitioning
Original table:
ProductID  Name                   Price  DateCreated  Stock  LastOrdered
AR-5381    Adjustable Race        50     11-Jan-2016  8      17-Nov-2016
AA-8327    Bearing Ball           100    11-Feb-2016  46     21-Nov-2017
BE-2349    BB Ball Bearing        105    11-Mar-2016  52     16-Sep-2017
CE-2908    Headset Ball Bearings  90     11-Jan-2017  13     12-Feb-2017
CL-2036    Blade                  70     11-Feb-2017  28     01-Dec-2017
DA-5965    LL Crankarm            150    11-Mar-2017  30     08-Dec-2017
Split by columns into:
ProductID  Name                   Price  DateCreated
AR-5381    Adjustable Race        50     11-Jan-2016
AA-8327    Bearing Ball           100    11-Feb-2016
BE-2349    BB Ball Bearing        105    11-Mar-2016
CE-2908    Headset Ball Bearings  90     11-Jan-2017
CL-2036    Blade                  70     11-Feb-2017
DA-5965    LL Crankarm            150    11-Mar-2017
ProductID  Stock  LastOrdered
AR-5381    8      17-Nov-2016
AA-8327    46     21-Nov-2017
BE-2349    52     16-Sep-2017
CE-2908    13     12-Feb-2017
CL-2036    28     01-Dec-2017
DA-5965    30     08-Dec-2017
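A minimal T-SQL sketch of this column-wise split. The table names, view name, and column types are assumptions made for illustration, not taken from the slides; both tables share the same primary key, and a view re-joins them for readers that still expect the original shape.

```sql
-- Hypothetical names and types; assumes the Production schema already exists.
CREATE TABLE Production.ProductDetails
(
    ProductID   VARCHAR(10)    NOT NULL PRIMARY KEY,
    Name        NVARCHAR(100)  NOT NULL,
    Price       DECIMAL(10, 2) NOT NULL,
    DateCreated DATE           NOT NULL
);

-- Frequently updated columns live in their own, narrower table.
CREATE TABLE Production.ProductStock
(
    ProductID   VARCHAR(10) NOT NULL PRIMARY KEY
        REFERENCES Production.ProductDetails (ProductID),
    Stock       INT         NOT NULL,
    LastOrdered DATE        NULL
);
GO

-- The view presents the two halves as one row per product again.
CREATE VIEW Production.ProductsCombined
AS
SELECT d.ProductID, d.Name, d.Price, d.DateCreated, s.Stock, s.LastOrdered
FROM Production.ProductDetails AS d
JOIN Production.ProductStock   AS s ON s.ProductID = d.ProductID;
GO
```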
Strategy: Horizontal Partitioning
Production.Products (original table):
ProductID  Name                   Price  Stock  DateCreated  LastOrdered
AR-5381    Adjustable Race        50     8      11-Jan-2016  17-Nov-2016
AA-8327    Bearing Ball           100    46     11-Feb-2016  21-Nov-2017
BE-2349    BB Ball Bearing        105    52     11-Mar-2016  16-Sep-2017
CE-2908    Headset Ball Bearings  90     13     11-Jan-2017  12-Feb-2017
CL-2036    Blade                  70     28     11-Feb-2017  01-Dec-2017
DA-5965    LL Crankarm            150    30     11-Mar-2017  08-Dec-2017
Production.Products_2017:
ProductID  Name                   Price  Stock  DateCreated  LastOrdered
CE-2908    Headset Ball Bearings  90     13     11-Jan-2017  12-Feb-2017
CL-2036    Blade                  70     28     11-Feb-2017  01-Dec-2017
DA-5965    LL Crankarm            150    30     11-Mar-2017  08-Dec-2017
Production.Products_2016:
ProductID  Name                   Price  Stock  DateCreated  LastOrdered
AR-5381    Adjustable Race        50     8      11-Jan-2016  17-Nov-2016
AA-8327    Bearing Ball           100    46     11-Feb-2016  21-Nov-2017
BE-2349    BB Ball Bearing        105    52     11-Mar-2016  16-Sep-2017
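As a sketch of how one of the yearly tables might be defined and loaded: the column types and constraint names are assumptions. The CHECK constraint on DateCreated documents which rows belong in the table and rejects rows for any other year.

```sql
-- Assumed column types; table names follow the slide.
CREATE TABLE Production.Products_2016
(
    ProductID   VARCHAR(10)    NOT NULL,
    Name        NVARCHAR(100)  NOT NULL,
    Price       DECIMAL(10, 2) NOT NULL,
    Stock       INT            NOT NULL,
    DateCreated DATE           NOT NULL
        CONSTRAINT CK_Products_2016_DateCreated
        CHECK (DateCreated >= '20160101' AND DateCreated < '20170101'),
    LastOrdered DATE           NULL,
    CONSTRAINT PK_Products_2016 PRIMARY KEY (ProductID, DateCreated)
);

-- Copy the 2016 rows out of the original table.
INSERT INTO Production.Products_2016
    (ProductID, Name, Price, Stock, DateCreated, LastOrdered)
SELECT ProductID, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products
WHERE DateCreated >= '20160101' AND DateCreated < '20170101';
```

Production.Products_2017 would follow the same pattern with the date range shifted by one year.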
Horizontal Partitioning: Why?
• The idea behind horizontal partitioning is to split a large table into multiple smaller tables.
• Query-wise
• One smaller table is faster to query than a larger table
• However querying multiple smaller tables is problematic
• Administration-wise, multiple tables can be placed into different file groups, which
• Can be placed on different physical disks > parallel access can be faster
• Can be backed up individually > smaller backup windows
• Can be set as read-only > protects older data from modification, back up once and forget
Horizontal Partitioning: Dynamic Queries
-- Build the statement against the yearly table that holds @FromDate
DECLARE @SQL AS NVARCHAR(MAX) = CONCAT('
SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products_', dbo.GetPartition('Production.Products', @FromDate), ' WITH(NOLOCK)
WHERE DateCreated >= @FromDate AND DateCreated <= @ToDate
')
-- The dates are passed as parameters rather than concatenated into the string
EXECUTE sp_ExecuteSql @Stmt = @SQL
, @Params = N'@FromDate AS DATETIME, @ToDate AS DATETIME'
, @FromDate = @FromDate
, @ToDate = @ToDate
Horizontal Partitioning: UNIONed Queries
;WITH products AS
(
SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products_2016 WITH(NOLOCK)
WHERE DateCreated >= @FromDate AND DateCreated <= @ToDate
UNION ALL
SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products_2017 WITH(NOLOCK)
WHERE DateCreated >= @FromDate AND DateCreated <= @ToDate
UNION ALL
...
)
SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
FROM products
Views
• Dynamic queries are a pain! No syntax checking, string concatenation, etc…
• Constantly writing CTEs to union tables is tedious for everyone.
• We usually create VIEWs to provide a unified view
• however this can be cumbersome and repetitive, e.g. when a new table is added every month
• thus we create the views dynamically using custom code and jobs
• VIEWs help to transparently replace an existing table with multiple smaller ones
• no code changes required
• however not all views are updatable
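A minimal sketch of such a unifying view over the yearly tables. The view name ProductsAll is hypothetical; to replace the original table transparently the view would take over the original table's name instead.

```sql
-- Hypothetical view name; the member tables are the yearly tables from the slides.
CREATE VIEW Production.ProductsAll
AS
SELECT ProductID, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products_2016
UNION ALL
SELECT ProductID, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products_2017;
GO

-- Callers query the view as if it were a single table.
SELECT ProductID, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.ProductsAll
WHERE DateCreated >= '20170101' AND DateCreated < '20170201';
```

If each member table carries a trusted CHECK constraint on DateCreated, the optimizer can usually skip the tables whose range cannot match the filter.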
Updatable Views
• You can modify the data of an underlying base table through a view, as long as the following
conditions are true:
• Any modifications, including UPDATE, INSERT, and DELETE statements, must reference columns from only one base
table.
• The columns being modified in the view must directly reference the underlying data in the table columns.
• The columns being modified are not affected by GROUP BY, HAVING, or DISTINCT clauses.
• TOP is not used anywhere in the select statement of the view together with the WITH CHECK OPTION clause.
• INSTEAD OF triggers can be created on a view to make it updatable. The INSTEAD OF trigger is
executed instead of the data modification statement on which the trigger is defined.
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-view-transact-sql
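The INSTEAD OF approach can be sketched as follows. All object names here are hypothetical. The view joins two base tables, so a plain INSERT through it would be rejected; the trigger splits each inserted row across both tables instead.

```sql
-- Hypothetical objects for illustration only.
CREATE TABLE dbo.ItemDetails (ItemID INT NOT NULL PRIMARY KEY, Name NVARCHAR(100) NOT NULL);
CREATE TABLE dbo.ItemStock   (ItemID INT NOT NULL PRIMARY KEY, Stock INT NOT NULL);
GO

CREATE VIEW dbo.Items
AS
SELECT d.ItemID, d.Name, s.Stock
FROM dbo.ItemDetails AS d
JOIN dbo.ItemStock   AS s ON s.ItemID = d.ItemID;
GO

-- Runs in place of the INSERT issued against the view.
CREATE TRIGGER dbo.TrItemsInsert ON dbo.Items
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.ItemDetails (ItemID, Name)
    SELECT ItemID, Name FROM inserted;

    INSERT INTO dbo.ItemStock (ItemID, Stock)
    SELECT ItemID, Stock FROM inserted;
END;
GO

-- The insert now succeeds and lands in both base tables.
INSERT INTO dbo.Items (ItemID, Name, Stock) VALUES (1, N'Blade', 28);
```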
Partitioning
3. Partitioned Tables
Defining Partition Functions & Partition Schemes
Tooling: Custom Partition framework
Myths and performance issues
DEMO
Partitioned Tables: Definition…
• Microsoft introduced Partitioned Tables in SQL Server 2005
• Partitioned tables support the use of multiple file groups
• They provide a single table to query from, irrespective of partitions
• The example in the figure partitions a table into:
• A partition per month within the current year
• A partition per year for the last two years
• A partition for all the previous years
[Figure: partitions Pre-2015 | 2015 | 2016 | Jan 2017 | Feb 2017, with further partitions marked EMPTY]
Partitioned Tables: Definition…
• A Partition Function
• A Data Type – typically DATE related
• A Range – LEFT or RIGHT
CREATE PARTITION FUNCTION PF_Name (DATETIME2)
AS RANGE RIGHT FOR VALUES ('20170101','20170201','20170301');
• A Partition Scheme – that associates file groups to the partition function
CREATE PARTITION SCHEME PS_Name
AS PARTITION PF_Name
TO (FG000000, FG201701, FG201702, FG201703);
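A table is then placed on the partition scheme rather than on a file group. The sketch below assumes a hypothetical dbo.Orders table; its OrderDate column must have the same data type as the partition function parameter (DATETIME2).

```sql
-- Hypothetical table; PS_Name and PF_Name are the objects defined above.
CREATE TABLE dbo.Orders
(
    OrderID   UNIQUEIDENTIFIER NOT NULL,
    OrderDate DATETIME2        NOT NULL,
    Amount    DECIMAL(10, 2)   NOT NULL
) ON PS_Name (OrderDate);

-- A clustered index created without an ON clause is placed on the same scheme.
CREATE CLUSTERED INDEX CIX_Orders_OrderDate ON dbo.Orders (OrderDate);

-- Row counts per partition, computed with the $PARTITION function.
SELECT $PARTITION.PF_Name(OrderDate) AS PartitionNumber,
       COUNT(*)                      AS RowCnt
FROM dbo.Orders
GROUP BY $PARTITION.PF_Name(OrderDate)
ORDER BY PartitionNumber;
```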
Partitioned Tables: Definition…
• With a RIGHT Range, the previous partitioned table example requires 4 partitions:
• A partition on the left, containing everything from beginning of time till before Jan 2017 – should be empty
• A partition from Jan 2017 till before Feb 2017
• A partition from Feb 2017 till before Mar 2017
• A partition from Mar 2017 till the end of time – should be empty
[Figure: EMPTY | Jan 2017 | Feb 2017 | Mar 2017 (EMPTY)]
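The boundary behaviour of RANGE RIGHT can be checked directly with the $PARTITION function, assuming PF_Name was created with the three boundary values shown earlier. Each boundary value belongs to the partition on its right.

```sql
-- Assumes PF_Name is RANGE RIGHT FOR VALUES ('20170101','20170201','20170301').
SELECT $PARTITION.PF_Name('20161231') AS BeforeJan,  -- 1: everything before Jan 2017
       $PARTITION.PF_Name('20170101') AS Jan01,      -- 2: the boundary itself goes right
       $PARTITION.PF_Name('20170131') AS Jan31,      -- 2
       $PARTITION.PF_Name('20170201') AS Feb01,      -- 3
       $PARTITION.PF_Name('20170301') AS Mar01;      -- 4: Mar 2017 until the end of time
```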
Partitioned Tables: Splitting…
• A partitioned table can be extended by splitting an existing partition
• We first add a new file group to the partition scheme
ALTER PARTITION SCHEME PS_Name
NEXT USED [FG201704]
• We then split the partition function to the right
ALTER PARTITION FUNCTION PF_Name()
SPLIT RANGE ('20170401')
[Figure, before the split: EMPTY | Jan 2017 | Feb 2017 | Mar 2017 (EMPTY)]
[Figure, after the split: EMPTY | Jan 2017 | Feb 2017 | Mar 2017 | Apr 2017 (EMPTY)]
Summary: Required steps
• On setup:
1. Create file group for non-partitioned indexes (if required)
2. Create file group for left hand side (to remain empty)
3. Create Partition Function (partitioning key datatype, range direction)
4. Create Partition Scheme (with empty file group)
• Regularly (e.g. monthly)
1. Create file group
2. Split partition
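The regular monthly steps could look roughly like the sketch below. The database name SalesDb, the logical file name, the file path, and the file size are assumptions; PS_Name and PF_Name are the objects from the earlier slides.

```sql
-- 1. Create the file group and give it a data file (names and path are hypothetical).
ALTER DATABASE SalesDb ADD FILEGROUP FG201704;
ALTER DATABASE SalesDb
    ADD FILE (NAME = N'SalesDb_201704',
              FILENAME = N'D:\Data\SalesDb_201704.ndf',
              SIZE = 512MB)
    TO FILEGROUP FG201704;

-- 2. Tell the scheme which file group the next partition will use, then split.
ALTER PARTITION SCHEME PS_Name NEXT USED FG201704;
ALTER PARTITION FUNCTION PF_Name() SPLIT RANGE ('20170401');

-- Verify the resulting boundaries.
SELECT prv.boundary_id, prv.value
FROM sys.partition_functions    AS pf
JOIN sys.partition_range_values AS prv ON prv.function_id = pf.function_id
WHERE pf.name = N'PF_Name'
ORDER BY prv.boundary_id;
```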
Tooling: Custom partitioning framework
• We created a number of stored procedures to handle these steps:
• Maintenance.UspCreateFileGroup – used to create files and file groups
• Maintenance.UspCreatePartition – used once to create the partition function and partition scheme
• Maintenance.UspCreatePartitionView – used to create a monthly view per partition by date range
• Maintenance.UspSplitPartition – used monthly to create a new file group, split partition, create view
• Maintenance.UspSplitPartitionAllTables – used monthly to split all partitioned tables via agent job
Partitioned Tables
Partitioned Tables: Merging…
• A partitioned table can have multiple partitions merged into one
ALTER PARTITION FUNCTION PF_Name()
MERGE RANGE('20170201');
• Note: Merging partitions with data movement across file groups will be slow
[Figure, before the merge: EMPTY | Jan 2017 | Feb 2017 | Mar 2017 | Apr 2017 (EMPTY)]
[Figure, after the merge: EMPTY | Jan & Feb 2017 | Mar 2017 | Apr 2017 (EMPTY)]
Partitioned Tables: Switching…
• Partition switching reduces locking whilst:
• Loading data into a warehouse
• Deleting old data during archival
• Moving data between tiered storage
• Both partitions need to be in the same file group
• Re-create the staging table's indexes when the data has to move physically
ALTER TABLE schema.StgTable
SWITCH PARTITION $PARTITION.PF_Name('20170201')
TO schema.PrdTable PARTITION $PARTITION.PF_Name('20170201')
[Figure: production table and staging table on the same partition function: Empty | Jan 2017 | Feb 2017 | Mar 2017 | Apr 2017 (EMPTY)]
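Switching also works in the other direction, which is how old data can be removed during archival. The sketch assumes a hypothetical dbo.Orders table with columns (OrderID, OrderDate, Amount), clustered on OrderDate and partitioned on PS_Name (OrderDate), so that the Jan 2017 partition lives on FG201701. The target table must be empty, have the same structure and indexes, and sit on the same file group.

```sql
-- Hypothetical target table on the file group that holds the Jan 2017 partition.
CREATE TABLE dbo.Orders_Jan2017
(
    OrderID   UNIQUEIDENTIFIER NOT NULL,
    OrderDate DATETIME2        NOT NULL,
    Amount    DECIMAL(10, 2)   NOT NULL
) ON FG201701;

CREATE CLUSTERED INDEX CIX_Orders_Jan2017_OrderDate
    ON dbo.Orders_Jan2017 (OrderDate) ON FG201701;

-- Move the whole partition out of the production table.
ALTER TABLE dbo.Orders
    SWITCH PARTITION $PARTITION.PF_Name('20170101')
    TO dbo.Orders_Jan2017;

-- The rows can now be archived elsewhere and removed without touching dbo.Orders.
TRUNCATE TABLE dbo.Orders_Jan2017;
```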
Partitioned Tables
https://support.microsoft.com/en-us/help/2965553/decreased-performance-for-sql-server-when-you-run-a-top--max-or-min-ag
Decreased performance: TOP, MAX or MIN
• Test results show that TOP is 10% slower on partitioned tables
• SET ROWCOUNT can be used instead
Increased performance: SELECT using non-clustered PK
• When using ROWCOUNT, SELECTs on partitioned tables via the non-clustered primary key show:
• 22% higher throughput
• a 3% improvement in response time
Increased performance: SELECT using Partitioning Key
• When using ROWCOUNT, SELECTs on partitioned tables via the clustered partitioning date key show:
• 6% higher throughput
• a 7% improvement in response time
Increased performance: Inserts
• Combined INSERT & SELECT tests found partitioned tables to be faster:
• SELECT – 9% Throughput benefit / 11% improvement in response times
• INSERT – 4% Throughput benefit / 9% improvement in response times
https://blogs.msdn.microsoft.com/sqlmeditation/2013/04/02/dealing-with-unique-columns-when-using-table-partitioning
Unique columns
• Traditionally developers create an IDENTITY(1,1) PRIMARY KEY to provide uniqueness
• On its own this cannot be used as the key of a partitioned table: an aligned unique index must include the partitioning column
• Should be replaced with a UNIQUEIDENTIFIER generated at application level (also in preparation for distributed
tables…)
• A PRIMARY KEY is by default CLUSTERED and stored with the data
• In partitioned tables, the Partitioning Key has to be CLUSTERED to split the data
• Thus if the PRIMARY KEY does not contain the Partitioning Key this cannot be CLUSTERED
• An un-partitioned NONCLUSTERED PRIMARY KEY can be used to enforce uniqueness
• However this prohibits SWITCHING of partitions due to unaligned indexes
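The two options can be sketched as follows, assuming a hypothetical dbo.Payments table on the PS_Name scheme from the earlier slides. The composite primary key includes the partitioning key and stays aligned; the commented-out index enforces uniqueness on PaymentID alone but is unaligned.

```sql
-- Hypothetical table; PaymentID is assumed to be generated by the application.
CREATE TABLE dbo.Payments
(
    PaymentID   UNIQUEIDENTIFIER NOT NULL,
    PaymentDate DATETIME2        NOT NULL,
    Amount      DECIMAL(10, 2)   NOT NULL,
    -- Aligned: the key contains the partitioning column, so SWITCH keeps working.
    CONSTRAINT PK_Payments PRIMARY KEY CLUSTERED (PaymentDate, PaymentID)
) ON PS_Name (PaymentDate);

-- Unaligned alternative: unique on PaymentID alone, stored on a single file group.
-- Creating it would block partition switching on this table.
-- CREATE UNIQUE NONCLUSTERED INDEX UX_Payments_PaymentID
--     ON dbo.Payments (PaymentID) ON [PRIMARY];
```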
https://www.mssqltips.com/sqlservertip/1914/sql-server-database-partitioning-myths-and-truths
Myth: Metadata only operations
• Switching partitions in & out
• Requires schema lock on both source and destination tables
• Usually the command is run with a lock timeout and retried later if it fails
• Splitting & merging partitions
• Altering the partition function is an offline operation
• Splitting a partition which contains data requires data movement
• If the range split introduces a different file group, data needs to physically move between files
• This is why we keep an empty partition on the left and right, and we always split the empty partition
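One possible shape for the timeout-and-retry approach is sketched below. The table names, the number of attempts, and the delays are assumptions; error 1222 is the lock request time-out error.

```sql
-- Hypothetical staging and production tables partitioned on PF_Name.
SET LOCK_TIMEOUT 5000;  -- give up waiting for the schema lock after 5 seconds

DECLARE @Attempt INT = 1;
WHILE @Attempt <= 3
BEGIN
    BEGIN TRY
        ALTER TABLE dbo.StgTable
            SWITCH PARTITION $PARTITION.PF_Name('20170201')
            TO dbo.PrdTable PARTITION $PARTITION.PF_Name('20170201');
        BREAK;  -- switch succeeded
    END TRY
    BEGIN CATCH
        -- Re-raise anything that is not a lock time-out, or the final failure.
        IF ERROR_NUMBER() <> 1222 OR @Attempt = 3
        BEGIN
            THROW;
        END;
        SET @Attempt += 1;
        WAITFOR DELAY '00:00:30';  -- wait before trying again
    END CATCH
END;
```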
4. Distributed Partitioned Views
Definition
Requirements… loads!
DEMO
Distributed Partitioned Views: Definition
• Basically a view which unions data from multiple databases hosted on different servers.
• Also referred to as Federated Databases.
• Used when applications are unaware of such partitioning.
• Requires Linked Servers.
• Performance improves when the lazy schema validation option is enabled on the linked servers.
• Read-only views work everywhere.
• Updatable views require Enterprise Edition.
• INSTEAD OF triggers can be used to make views updatable on Standard Edition.
https://docs.microsoft.com/en-us/sql/sql-server/editions-and-components-of-sql-server-2016
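A sketch of what such a view might look like on the first server. The linked server names Server2 and Server3, the database SalesDb, and the member tables are all hypothetical; the same view would be created on each participating server with the local and remote names swapped accordingly.

```sql
-- Defer schema checks on the remote tables until a query actually needs them.
EXEC sp_serveroption @server = N'Server2', @optname = N'lazy schema validation', @optvalue = N'true';
EXEC sp_serveroption @server = N'Server3', @optname = N'lazy schema validation', @optvalue = N'true';
GO

-- One local member table plus two remote ones reached through linked servers.
CREATE VIEW dbo.Customers
AS
SELECT CustomerID, Name FROM SalesDb.dbo.Customers_1
UNION ALL
SELECT CustomerID, Name FROM Server2.SalesDb.dbo.Customers_2
UNION ALL
SELECT CustomerID, Name FROM Server3.SalesDb.dbo.Customers_3;
GO
```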
Distributed Partitioned Views
Distributed Partitioned Views: Requirements
• Tables Rules
• Member tables cannot be referenced more than one time in the view.
• Member tables cannot have indexes created on any computed columns.
• Member tables must have all PRIMARY KEY constraints on the same number of columns.
• Member tables must have the same ANSI padding setting.
• Column Rules
• All columns in each member table must be included in the same ordinal position in the select list.
• Columns cannot be referenced more than one time in the select list.
• The columns in the select list of each SELECT statement must be of the same type.
• The key ranges of the CHECK constraints in each table cannot overlap with the ranges of any other table.
• Partitioning Column Rules
• The partitioning column cannot be an identity, default, timestamp, or computed column.
• The partitioning column must be in the same ordinal location in the select list of each SELECT statement in the view.
• The partitioning column cannot allow for nulls.
• The partitioning column must be a part of the primary key of the table.
• There must be only one constraint on the partitioning column.
• There are no restrictions on the updatability of the partitioning column.
https://technet.microsoft.com/en-us/library/ms188299(v=sql.105).aspx
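A member table that follows these rules might be declared as below. The table name and key range are hypothetical; the partitioning column is NOT NULL, is part of the primary key, and carries exactly one CHECK constraint whose range does not overlap with any other member table.

```sql
-- Hypothetical first member table, holding customers 1 to 99999.
CREATE TABLE dbo.Customers_1
(
    CustomerID INT           NOT NULL
        CONSTRAINT CK_Customers_1_Range CHECK (CustomerID BETWEEN 1 AND 99999),
    Name       NVARCHAR(100) NOT NULL,
    CONSTRAINT PK_Customers_1 PRIMARY KEY (CustomerID)
);
```

The next member table would use the same columns with a range such as 100000 to 199999.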
Distributed Partitioned Views: Updatable
• INSERT Statements
• All columns must be included in the INSERT statement even if the column can be NULL in the base table or has a DEFAULT constraint defined in
the base table.
• The DEFAULT keyword cannot be specified in the VALUES clause of the INSERT statement.
• INSERT statements must supply a value that satisfies the logic of the CHECK constraint defined on the partitioning column for one of the
member tables.
• INSERT statements are not allowed if a member table contains a column with an identity property.
• INSERT statements are not allowed if a member table contains a timestamp column.
• INSERT statements are not allowed if there is a self-join with the same view or any one of the member tables.
• UPDATE Statements
• UPDATE statements cannot specify the DEFAULT keyword as a value in the SET clause even if the column has a DEFAULT value defined in the
corresponding member table
• The value of a column with an identity property cannot be changed: however, the other columns can be updated.
• The value of a PRIMARY KEY cannot be changed if the column contains text, image, or ntext data.
• Updates are not allowed if a base table contains a timestamp column.
• Updates are not allowed if there is a self-join with the same view or any one of the member tables.
• DELETE Statements
• DELETE statements are not allowed when there is a self-join with the same view or any one of the member tables.
https://technet.microsoft.com/en-us/library/ms187067(v=sql.105).aspx
Distributed Partitioned Views
5. Database Sharding
Definition
Sharding Strategies
Database Sharding: Definition
• A form of horizontal partitioning in which partitions are distributed on commodity servers.
• An individual partition is referred to as a shard.
• The application is shard-aware and can route connection requests autonomously without the
need of distributed partitioned views.
• Sharding circumvents the limits of a single monolithic database, or a single entry point,
in terms of storage space, computing resources, network bandwidth, and geography.
https://docs.microsoft.com/en-us/azure/architecture/patterns/sharding
Database Sharding: Problems
• Queries that JOIN shards together are problematic and would need to be meshed together via the
application.
• Multiple shards can be queried in parallel and merged together either in memory or client-side.
• Referential integrity might be non-existent.
• Shards are usually used with domain-based partitioning and thus referenced tables could be in different databases.
• Un-partitioned reference tables would also be placed outside the shards.
• However, static reference tables could be treated as global tables, thus copied and replicated into all shards.
• Rebalancing sharded data is problematic. This might be required when
• a shard key changes and thus data needs to move between shards
• a new shard is added and data needs to be redistributed
https://docs.microsoft.com/en-us/azure/architecture/patterns/sharding
Database Sharding: Strategies
• The Lookup strategy
• A map is used to route a request for data to the shard that contains such data using the shard key.
• Multi-tenant applications can store all the data for a tenant together in a shard using the tenant ID as
shard key.
• Multiple tenants can share the same shard, but the data for a single tenant cannot spread across
multiple shards.
• The Range strategy
• Sequential shard keys are ordered and grouped together.
• Useful for applications that frequently retrieve sets of items using range queries.
• The Hash strategy
• This is used to reduce the chance of hotspots (shards that receive a disproportionate amount of load).
• The chosen hashing function should distribute data evenly across the shards, possibly by introducing
some random element into the computation.
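The three routing strategies can be illustrated with a few throwaway queries. In practice the routing usually lives in the shard-aware application; the shard names, key values, and the use of CHECKSUM as a stand-in hash are assumptions made for this sketch.

```sql
-- Lookup strategy: an explicit map from tenant to shard.
DECLARE @ShardMap TABLE (TenantID INT PRIMARY KEY, ShardName SYSNAME NOT NULL);
INSERT INTO @ShardMap (TenantID, ShardName)
VALUES (1, N'Shard01'), (2, N'Shard01'), (3, N'Shard02');

SELECT ShardName FROM @ShardMap WHERE TenantID = 3;  -- Shard02

-- Range strategy: contiguous key ranges are kept together on one shard.
DECLARE @ShardRange TABLE (LowKey INT NOT NULL, HighKey INT NOT NULL, ShardName SYSNAME NOT NULL);
INSERT INTO @ShardRange (LowKey, HighKey, ShardName)
VALUES (1, 1000, N'Shard01'), (1001, 2000, N'Shard02');

SELECT ShardName FROM @ShardRange WHERE 1500 BETWEEN LowKey AND HighKey;  -- Shard02

-- Hash strategy: a hash of the key picks the shard, spreading load across shards.
-- CHECKSUM is only a placeholder here; a real system would choose a stable hash.
DECLARE @ShardCount INT = 4, @ShardKey NVARCHAR(50) = N'customer-42';
SELECT ABS(CAST(CHECKSUM(@ShardKey) AS BIGINT)) % @ShardCount AS ShardIndex;
```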
6. Stretch Database
Definition
Demo
Stretch Database: Definition
• Stretch Database is a feature of SQL Server 2016.
• It moves cold data from on-premises instances
directly into the cloud with only a few clicks.
• Eliminates the need to manually create archiving procedures
that move data out of the production database and into an archive database.
• Requires an Azure subscription.
• Download “Data Migration Assistant” to identify candidate tables to stretch.
https://docs.microsoft.com/en-us/sql/sql-server/stretch-database/stretch-database
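Besides the wizard, Stretch can be enabled with T-SQL. The sketch below assumes a hypothetical database SalesDb, table dbo.OrderHistory, Azure server name, and an existing database scoped credential called StretchCredential; treat it as an outline of the steps rather than a complete setup script.

```sql
-- 1. Allow Stretch on the instance.
EXEC sp_configure 'remote data archive', 1;
RECONFIGURE;
GO

-- 2. Link the database to the Azure server (server and credential are hypothetical).
ALTER DATABASE SalesDb
    SET REMOTE_DATA_ARCHIVE = ON
        (SERVER = N'myserver.database.windows.net',
         CREDENTIAL = [StretchCredential]);
GO

-- 3. Start migrating the table's rows to Azure.
ALTER TABLE dbo.OrderHistory
    SET (REMOTE_DATA_ARCHIVE = ON (MIGRATION_STATE = OUTBOUND));
```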
Stretch Database: Limitations
• Limitations for Stretch-enabled tables
• Uniqueness is not enforced for UNIQUE constraints and PRIMARY KEY constraints in the Azure table that contains the
migrated data.
• You can't UPDATE or DELETE rows that have been migrated, or rows that are eligible for migration.
• You can't INSERT rows into a Stretch-enabled table on a linked server.
• You can't create an index for a view that includes Stretch-enabled tables.
• Filters on SQL Server indexes are not propagated to the remote table.
• Limitations that currently prevent you from enabling Stretch for a table
• Tables that have more than 1,023 columns or more than 998 indexes
• FileTables or tables that contain FILESTREAM data
• Tables that are replicated, or that are actively using Change Tracking or Change Data Capture
• Memory-optimized tables
• Data types: text, ntext, image, timestamp, sql_variant, XML, and CLR data types including geometry, geography, hierarchyid
• Computed columns
• Default constraints and check constraints
• Foreign key constraints that reference the table.
• Full text indexes, XML indexes, Spatial indexes, Indexed views
https://docs.microsoft.com/en-us/sql/sql-server/stretch-database/limitations-for-stretch-database
Stretch Database
• Today’s event was sponsored by:
Microsoft Malta : location and refreshments
Gaming Innovation Group : Parking vouchers
• The Tech-Spark community requires your help. Sponsor an event by providing a meeting place,
refreshments, and why not, deliver a session! Feel free to contact us should you want to help.
Contact Us
Ralph Attard
raland@raland.net
Tech Spark
http://www.tech-spark.com
https://www.facebook.com/techsparkmalta

Weitere ähnliche Inhalte

Was ist angesagt?

SQL Server 2016: Just a Few of Our DBA's Favorite Things
SQL Server 2016: Just a Few of Our DBA's Favorite ThingsSQL Server 2016: Just a Few of Our DBA's Favorite Things
SQL Server 2016: Just a Few of Our DBA's Favorite ThingsHostway|HOSTING
 
SQL Server 2016 New Features and Enhancements
SQL Server 2016 New Features and EnhancementsSQL Server 2016 New Features and Enhancements
SQL Server 2016 New Features and EnhancementsJohn Martin
 
Modernizing your database with SQL Server 2019
Modernizing your database with SQL Server 2019Modernizing your database with SQL Server 2019
Modernizing your database with SQL Server 2019Antonios Chatzipavlis
 
Data warehouse 21 snowflake schema
Data warehouse 21 snowflake schemaData warehouse 21 snowflake schema
Data warehouse 21 snowflake schemaVaibhav Khanna
 
ABCs of CDC with SSIS 2012
ABCs of CDC with SSIS 2012ABCs of CDC with SSIS 2012
ABCs of CDC with SSIS 2012Steve Wake
 
SQL server 2016 New Features
SQL server 2016 New FeaturesSQL server 2016 New Features
SQL server 2016 New Featuresaminmesbahi
 
Real Time Operational Analytics with Microsoft Sql Server 2016 [Liviu Ieran]
Real Time Operational Analytics with Microsoft Sql Server 2016 [Liviu Ieran]Real Time Operational Analytics with Microsoft Sql Server 2016 [Liviu Ieran]
Real Time Operational Analytics with Microsoft Sql Server 2016 [Liviu Ieran]ITCamp
 
Exploring Scalability, Performance And Deployment
Exploring Scalability, Performance And DeploymentExploring Scalability, Performance And Deployment
Exploring Scalability, Performance And Deploymentrsnarayanan
 
A to z for sql azure databases
A to z for sql azure databasesA to z for sql azure databases
A to z for sql azure databasesAntonios Chatzipavlis
 
How to install Vertica in a single node.
How to install Vertica in a single node.How to install Vertica in a single node.
How to install Vertica in a single node.Anil Maharjan
 
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLAdding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLPiotr Pruski
 
Big SQL 3.0: Datawarehouse-grade Performance on Hadoop - At last!
Big SQL 3.0: Datawarehouse-grade Performance on Hadoop - At last!Big SQL 3.0: Datawarehouse-grade Performance on Hadoop - At last!
Big SQL 3.0: Datawarehouse-grade Performance on Hadoop - At last!Nicolas Morales
 
Big Data: HBase and Big SQL self-study lab
Big Data:  HBase and Big SQL self-study lab Big Data:  HBase and Big SQL self-study lab
Big Data: HBase and Big SQL self-study lab Cynthia Saracco
 
Big Data: Working with Big SQL data from Spark
Big Data:  Working with Big SQL data from Spark Big Data:  Working with Big SQL data from Spark
Big Data: Working with Big SQL data from Spark Cynthia Saracco
 
Advanced integration services on microsoft ssis 1
Advanced integration services on microsoft ssis 1Advanced integration services on microsoft ssis 1
Advanced integration services on microsoft ssis 1Skillwise Group
 
Oracle 12.2 sharded database management
Oracle 12.2 sharded database managementOracle 12.2 sharded database management
Oracle 12.2 sharded database managementLeyi (Kamus) Zhang
 
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLHands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLPiotr Pruski
 

Was ist angesagt? (18)

SQL Server 2016: Just a Few of Our DBA's Favorite Things
SQL Server 2016: Just a Few of Our DBA's Favorite ThingsSQL Server 2016: Just a Few of Our DBA's Favorite Things
SQL Server 2016: Just a Few of Our DBA's Favorite Things
 
SQL Server 2016 New Features and Enhancements
SQL Server 2016 New Features and EnhancementsSQL Server 2016 New Features and Enhancements
SQL Server 2016 New Features and Enhancements
 
Modernizing your database with SQL Server 2019
Modernizing your database with SQL Server 2019Modernizing your database with SQL Server 2019
Modernizing your database with SQL Server 2019
 
Data warehouse 21 snowflake schema
Data warehouse 21 snowflake schemaData warehouse 21 snowflake schema
Data warehouse 21 snowflake schema
 
Azure SQL
Azure SQLAzure SQL
Azure SQL
 
ABCs of CDC with SSIS 2012
ABCs of CDC with SSIS 2012ABCs of CDC with SSIS 2012
ABCs of CDC with SSIS 2012
 
SQL server 2016 New Features
SQL server 2016 New FeaturesSQL server 2016 New Features
SQL server 2016 New Features
 
Real Time Operational Analytics with Microsoft Sql Server 2016 [Liviu Ieran]
Real Time Operational Analytics with Microsoft Sql Server 2016 [Liviu Ieran]Real Time Operational Analytics with Microsoft Sql Server 2016 [Liviu Ieran]
Real Time Operational Analytics with Microsoft Sql Server 2016 [Liviu Ieran]
 
Exploring Scalability, Performance And Deployment
Exploring Scalability, Performance And DeploymentExploring Scalability, Performance And Deployment
Exploring Scalability, Performance And Deployment
 
A to z for sql azure databases
A to z for sql azure databasesA to z for sql azure databases
A to z for sql azure databases
 
How to install Vertica in a single node.
How to install Vertica in a single node.How to install Vertica in a single node.
How to install Vertica in a single node.
 
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLAdding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
 
Big SQL 3.0: Datawarehouse-grade Performance on Hadoop - At last!
Big SQL 3.0: Datawarehouse-grade Performance on Hadoop - At last!Big SQL 3.0: Datawarehouse-grade Performance on Hadoop - At last!
Big SQL 3.0: Datawarehouse-grade Performance on Hadoop - At last!
 
Big Data: HBase and Big SQL self-study lab
Big Data:  HBase and Big SQL self-study lab Big Data:  HBase and Big SQL self-study lab
Big Data: HBase and Big SQL self-study lab
 
Big Data: Working with Big SQL data from Spark
Big Data:  Working with Big SQL data from Spark Big Data:  Working with Big SQL data from Spark
Big Data: Working with Big SQL data from Spark
 
Advanced integration services on microsoft ssis 1
Advanced integration services on microsoft ssis 1Advanced integration services on microsoft ssis 1
Advanced integration services on microsoft ssis 1
 
Oracle 12.2 sharded database management
Oracle 12.2 sharded database managementOracle 12.2 sharded database management
Oracle 12.2 sharded database management
 
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLHands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
 

Ähnlich wie Tech-Spark: Scaling Databases

Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Michael Rys
 
Tech-Spark: Scaling Databases

  • 1. 1. Introduction: Why Scale? 2. Vertical & Horizontal Partitioning 3. Partitioned Tables 4. Distributed Partitioned Views 5. Database Sharding 6. Stretch Databases (optional)
  • 2. Ralph: Who am I? • An Enterprise Architect • at iGamingCloud, Gaming Innovation Group • focus on Data Platforms • A Microsoft Certified Trainer • deliver MTA, MCSA, MCSE locally • covering Windows, SQL Server, C# • I’m here to describe the need for database scalability, describe a number of possible cross platform solutions, and demonstrate technologies available in MS SQL Server 2016 and Azure.
  • 3. 1. Introduction Why do we need to scale databases? Overview of possible options
  • 4. Scaling Databases: Why? • Most application environments are developed as a monolith, a single application running a single database on a single server. • In time, the whole application environment starts slowing down: • increased data volumes • increased work loads • The simplest option is to introduce an app/web farm to balance the application across multiple servers whilst using the same old single database. • But this might not be enough… we need to scale the database!
  • 5. Scaling Databases: Optimisations • Unless the whole application environment is redesigned and redeveloped, one needs to look into optimising the database layer. • Large database problems include: • Queries become slower, possibly giving time-outs under load • Backups are slower to take, to ship, and to restore • Index maintenance has an even greater impact • Common optimisations include: • Vertical Scaling: scale up current servers to max disk/memory/cpu, or simply migrate to a bigger server • Read Scaling: scale out to introduce an (a)sync server to split read-only queries from the application • Database restructuring: improved table designs, introduction of aggregation tables • Offload data: move old transactional data to archive servers, deletion of log data • But this might not be enough… we need to partition the database!
  • 6. Scaling Databases: Data Partitioning • Even though we can scale vertically by adding more resources, a single database would need to be scaled within itself: • Vertical & Horizontal Partitioning • Partitioned Tables • When a single database is too big, horizontal scaling is done using distributed databases: • Distributed Partitioned Views • Database Sharding Scale Up Scale Out
  • 7. Scaling Databases: Domain Partitioning • A different approach is to partition your data by domain. • This is achieved by splitting data by domain and moving them into their own database. • This could be fairly easy if tables are already grouped into their own schema by domain. • However it could be problematic if application queries and reports span multiple schemas • reports would now need to mesh multiple databases together • or read from a consolidated data warehouse • Even though this breaks the database down into smaller databases, each smaller database has the potential to become a problem on its own. • Refactoring a monolith application into various microservices adopts this principle with each microservice having its own data store. • Microservices are usually polyglot persistent. The appropriate data store is chosen according to the required features and partition usage: e.g. using a mix of SQL & NoSQL datastores.
  • 8. 2. Partitioning Benefits Strategies: Horizontal & Vertical Partitioning Updatable Views DEMO
  • 9. Partitioning Benefits • Scalability: Scale-up will eventually reach a physical hardware limit. • Performance: Data access takes place on smaller partitions, in parallel for multiple partitions. • Availability: Reduce single point of failures; multiple disk drives, multiple databases, multiple servers. • Security: Separate sensitive and non-sensitive data into different partitions. • Flexibility: Varied operational management strategies by partition; monitoring, backups, restores, indexing, etc. https://docs.microsoft.com/en-us/azure/architecture/best-practices/data-partitioning
  • 10. Strategy: Vertical Partitioning

      Original table:
      ProductID  Name                   Price  DateCreated  Stock  LastOrdered
      AR-5381    Adjustable Race        50     11-Jan-2016  8      17-Nov-2016
      AA-8327    Bearing Ball           100    11-Feb-2016  46     21-Nov-2017
      BE-2349    BB Ball Bearing        105    11-Mar-2016  52     16-Sep-2017
      CE-2908    Headset Ball Bearings  90     11-Jan-2017  13     12-Feb-2017
      CL-2036    Blade                  70     11-Feb-2017  28     01-Dec-2017
      DA-5965    LL Crankarm            150    11-Mar-2017  30     08-Dec-2017

      Vertical partition 1:
      ProductID  Name                   Price  DateCreated
      AR-5381    Adjustable Race        50     11-Jan-2016
      AA-8327    Bearing Ball           100    11-Feb-2016
      BE-2349    BB Ball Bearing        105    11-Mar-2016
      CE-2908    Headset Ball Bearings  90     11-Jan-2017
      CL-2036    Blade                  70     11-Feb-2017
      DA-5965    LL Crankarm            150    11-Mar-2017

      Vertical partition 2:
      ProductID  Stock  LastOrdered
      AR-5381    8      17-Nov-2016
      AA-8327    46     21-Nov-2017
      BE-2349    52     16-Sep-2017
      CE-2908    13     12-Feb-2017
      CL-2036    28     01-Dec-2017
      DA-5965    30     08-Dec-2017
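The vertical split shown on this slide could be expressed in T-SQL roughly as follows. This is only a sketch: the table names `Production.ProductDetails` and `Production.ProductStock`, the view name `Production.ProductsCombined`, and all column types are assumptions made for illustration, and the `Production` schema is assumed to exist already.

```sql
-- Descriptive columns (hypothetical table name and column types).
CREATE TABLE Production.ProductDetails (
    ProductID   CHAR(7)      NOT NULL PRIMARY KEY,
    Name        NVARCHAR(50) NOT NULL,
    Price       MONEY        NOT NULL,
    DateCreated DATE         NOT NULL
);

-- Stock-related columns, keyed by the same ProductID.
CREATE TABLE Production.ProductStock (
    ProductID   CHAR(7) NOT NULL PRIMARY KEY
        REFERENCES Production.ProductDetails (ProductID),
    Stock       INT     NOT NULL,
    LastOrdered DATE    NULL
);
GO

-- Optional view that re-joins both halves into the original six-column shape.
CREATE VIEW Production.ProductsCombined
AS
SELECT d.ProductID, d.Name, d.Price, d.DateCreated, s.Stock, s.LastOrdered
FROM Production.ProductDetails AS d
JOIN Production.ProductStock   AS s ON s.ProductID = d.ProductID;
GO
```

Sharing `ProductID` as the primary key of both tables keeps the relationship one-to-one, so the join in the view returns exactly one row per product.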
  • 11. Strategy: Horizontal Partitioning

      Production.Products (original table):
      ProductID  Name                   Price  Stock  DateCreated  LastOrdered
      AR-5381    Adjustable Race        50     8      11-Jan-2016  17-Nov-2016
      AA-8327    Bearing Ball           100    46     11-Feb-2016  21-Nov-2017
      BE-2349    BB Ball Bearing        105    52     11-Mar-2016  16-Sep-2017
      CE-2908    Headset Ball Bearings  90     13     11-Jan-2017  12-Feb-2017
      CL-2036    Blade                  70     28     11-Feb-2017  01-Dec-2017
      DA-5965    LL Crankarm            150    30     11-Mar-2017  08-Dec-2017

      Production.Products_2017:
      ProductID  Name                   Price  Stock  DateCreated  LastOrdered
      CE-2908    Headset Ball Bearings  90     13     11-Jan-2017  12-Feb-2017
      CL-2036    Blade                  70     28     11-Feb-2017  01-Dec-2017
      DA-5965    LL Crankarm            150    30     11-Mar-2017  08-Dec-2017

      Production.Products_2016:
      ProductID  Name                   Price  Stock  DateCreated  LastOrdered
      AR-5381    Adjustable Race        50     8      11-Jan-2016  17-Nov-2016
      AA-8327    Bearing Ball           100    46     11-Feb-2016  21-Nov-2017
      BE-2349    BB Ball Bearing        105    52     11-Mar-2016  16-Sep-2017
  • 12. Horizontal Partitioning: Why? • The idea behind horizontal partitioning is to split a large table into multiple smaller tables. • Query-wise • One smaller table is faster to query than a larger table • However, querying multiple smaller tables is problematic • Administration-wise, multiple tables can be placed into different file groups, which • Can be placed on different physical disks > parallelism can be faster • Can be backed up individually > smaller backup windows • Can be set as read-only > protect older data from modifications, backup once and forget
  • 13. Horizontal Partitioning: Dynamic Queries

      -- Build the statement against the yearly table that holds @FromDate.
      DECLARE @SQL AS NVARCHAR(MAX) = CONCAT('
          SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
          FROM Production.Products_', dbo.GetPartition('Production.Products', @FromDate), ' WITH(NOLOCK)
          WHERE DateCreated >= @FromDate AND DateCreated <= @ToDate
      ')

      -- Pass the date range as parameters rather than concatenating it.
      EXECUTE sp_ExecuteSql
            @Stmt = @SQL
          , @Params = N'@FromDate AS DATETIME, @ToDate AS DATETIME'
          , @FromDate = @FromDate
          , @ToDate = @ToDate
  • 14. Horizontal Partitioning: UNIONed Queries

      ;WITH products AS (
          SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
          FROM Production.Products_2016 WITH(NOLOCK)
          WHERE DateCreated >= @FromDate AND DateCreated <= @ToDate
          UNION ALL
          SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
          FROM Production.Products_2017 WITH(NOLOCK)
          WHERE DateCreated >= @FromDate AND DateCreated <= @ToDate
          UNION ALL
          ...
      )
      SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
      FROM products
  • 15. Views • Dynamic Queries are a pain! No syntax checking, string concatenation, etc… • Constantly creating CTEs to union tables is heavy for everyone. • We usually create VIEWs to provide a unified view • however, this could be cumbersome and repetitive, e.g. every month • thus we dynamically create them using custom code and jobs • VIEWs help transparently replace an existing table with multiple smaller ones • no code changes required • however, not all views are updatable
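A unifying view over the yearly tables might look like the sketch below. The table names come from the earlier slides; the column types, constraint names, and the composite primary key are assumptions for illustration. The CHECK constraints are also an addition: they record which rows belong in each member table, which can allow the optimizer to skip member tables that cannot match a date filter.

```sql
-- One member table per year (hypothetical column types).
CREATE TABLE Production.Products_2016 (
    ProductId   CHAR(7)      NOT NULL,
    Name        NVARCHAR(50) NOT NULL,
    Price       MONEY        NOT NULL,
    Stock       INT          NOT NULL,
    DateCreated DATE         NOT NULL
        CHECK (DateCreated >= '20160101' AND DateCreated < '20170101'),
    LastOrdered DATE         NULL,
    CONSTRAINT PK_Products_2016 PRIMARY KEY (ProductId, DateCreated)
);

CREATE TABLE Production.Products_2017 (
    ProductId   CHAR(7)      NOT NULL,
    Name        NVARCHAR(50) NOT NULL,
    Price       MONEY        NOT NULL,
    Stock       INT          NOT NULL,
    DateCreated DATE         NOT NULL
        CHECK (DateCreated >= '20170101' AND DateCreated < '20180101'),
    LastOrdered DATE         NULL,
    CONSTRAINT PK_Products_2017 PRIMARY KEY (ProductId, DateCreated)
);
GO

-- The view presents the member tables as if they were one table.
CREATE VIEW Production.Products
AS
SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products_2016
UNION ALL
SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products_2017;
GO
```

Existing queries against `Production.Products` would then keep working unchanged, which is the "no code changes required" point made on the slide.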
  • 16. Updatable Views • You can modify the data of an underlying base table through a view, as long as the following conditions are true: • Any modifications, including UPDATE, INSERT, and DELETE statements, must reference columns from only one base table. • The columns being modified in the view must directly reference the underlying data in the table columns. • The columns being modified are not affected by GROUP BY, HAVING, or DISTINCT clauses. • TOP is not used anywhere in the select statement of the view together with the WITH CHECK OPTION clause. • INSTEAD OF triggers can be created on a view to make it updatable. The INSTEAD OF trigger is executed instead of the data modification statement on which the trigger is defined. https://docs.microsoft.com/en-us/sql/t-sql/statements/create-view-transact-sql
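As an illustration of the INSTEAD OF approach, the sketch below routes inserts made through a unifying view to the matching yearly table. It assumes a view named `Production.Products` over `Production.Products_2016` and `Production.Products_2017` already exists; the trigger name is hypothetical.

```sql
CREATE TRIGGER Production.TR_Products_Insert
ON Production.Products
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Rows created in 2016 go to the 2016 member table.
    INSERT INTO Production.Products_2016
        (ProductId, Name, Price, Stock, DateCreated, LastOrdered)
    SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
    FROM inserted
    WHERE DateCreated >= '20160101' AND DateCreated < '20170101';

    -- Rows created in 2017 go to the 2017 member table.
    INSERT INTO Production.Products_2017
        (ProductId, Name, Price, Stock, DateCreated, LastOrdered)
    SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
    FROM inserted
    WHERE DateCreated >= '20170101' AND DateCreated < '20180101';

    -- Note: rows outside both ranges are silently ignored in this sketch;
    -- a real trigger would probably raise an error for them.
END;
GO
```

The `inserted` pseudo-table holds every row of the original statement, so a single multi-row INSERT can be spread across several member tables.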
  • 18. 3. Partitioned Tables Defining Partition Functions & Partition Schemes Tooling: Custom Partition framework Myths and performance issues DEMO
  • 19. Partitioned Tables: Definition… • Microsoft introduced Partitioned Tables in SQL Server 2005 • It supports the use of multiple file groups • It provides a single table to query from irrespective of partitions • The example in the slide diagram partitions a table into: • A partition per month within the current year • A partition per year for the last two years • A partition for all the previous years (Diagram labels: Pre-2015, 2015, 2016, Jan 2017, Feb 2017, EMPTY, EMPTY)
  • 20. Partitioned Tables: Definition…
      • A Partition Function
          • A Data Type – typically DATE related
          • A Range – LEFT or RIGHT

          CREATE PARTITION FUNCTION PF_Name (DATETIME2)
          AS RANGE RIGHT FOR VALUES ('20170101','20170201','20170301');

      • A Partition Scheme – that associates file groups to the partition function

          CREATE PARTITION SCHEME PS_Name
          AS PARTITION PF_Name TO (FG000000, FG201701, FG201702, FG201703);
  • 21. Partitioned Tables: Definition… • With a RIGHT Range, the previous partitioned table example requires 4 partitions: • A partition on the left, containing everything from beginning of time till before Jan 2017 – should be empty • A partition from Jan 2017 till before Feb 2017 • A partition from Feb 2017 till before Mar 2017 • A partition from Mar 2017 till the end of time – should be empty (Diagram: EMPTY | Jan 2017 | Feb 2017 | Mar 2017 (EMPTY))
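The mapping of dates to partitions can be checked with the built-in `$PARTITION` function. The sketch below assumes `PF_Name` was created as RANGE RIGHT with the three boundaries from the previous slide; the sample dates are arbitrary.

```sql
-- Assumes: CREATE PARTITION FUNCTION PF_Name (DATETIME2)
--          AS RANGE RIGHT FOR VALUES ('20170101','20170201','20170301');
SELECT v.SampleDate,
       $PARTITION.PF_Name(v.SampleDate) AS PartitionNumber
FROM (VALUES
        (CAST('20161231' AS DATETIME2)),  -- before the first boundary
        (CAST('20170101' AS DATETIME2)),  -- exactly on the first boundary
        (CAST('20170215' AS DATETIME2)),  -- between the second and third boundary
        (CAST('20170301' AS DATETIME2))   -- exactly on the last boundary
     ) AS v (SampleDate);
```

With RANGE RIGHT each boundary value belongs to the partition on its right, so a row dated exactly 1 Jan 2017 lands in the January partition rather than in the empty left-hand one.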
  • 22. Partitioned Tables: Splitting…
      • A partitioned table can be extended by splitting an existing partition
      • We first add a new file group to the partition scheme

          ALTER PARTITION SCHEME PS_Name NEXT USED [FG201704]

      • We then split the partition function to the right

          ALTER PARTITION FUNCTION PF_Name() SPLIT RANGE ('20170401')

      (Diagram, before: EMPTY | Jan 2017 | Feb 2017 | Mar 2017 (EMPTY))
      (Diagram, after:  EMPTY | Jan 2017 | Feb 2017 | Mar 2017 | Apr 2017 (EMPTY))
  • 23. Summary: Required steps
      • On setup:
          1. Create file group for non-partitioned indexes (if required)
          2. Create file group for left hand side (to remain empty)
          3. Create Partition Function (partitioning key datatype, range direction)
          4. Create Partition Scheme (with empty file group)
      • Regularly (e.g. monthly):
          1. Create file group
          2. Split partition
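The regular (monthly) steps could be scripted along these lines. This is a sketch, not the deck's actual tooling: the database name `SalesDb` and the file path are hypothetical placeholders, while `PS_Name`, `PF_Name`, and `FG201704` follow the naming used on the earlier slides.

```sql
-- 1. Create the file group and give it a data file
--    (hypothetical database name and path; adjust to the environment).
ALTER DATABASE SalesDb ADD FILEGROUP FG201704;
ALTER DATABASE SalesDb ADD FILE
    (NAME = N'SalesDb_FG201704',
     FILENAME = N'D:\Data\SalesDb_FG201704.ndf')
    TO FILEGROUP FG201704;

-- 2. Split the (empty) right-most partition at the new boundary.
ALTER PARTITION SCHEME PS_Name NEXT USED FG201704;
ALTER PARTITION FUNCTION PF_Name() SPLIT RANGE ('20170401');

-- Optional check: list the boundaries now defined on the function.
SELECT prv.boundary_id, prv.value
FROM sys.partition_functions    AS pf
JOIN sys.partition_range_values AS prv
    ON prv.function_id = pf.function_id
WHERE pf.name = N'PF_Name'
ORDER BY prv.boundary_id;
```

Running the final query before and after the split is a cheap way to confirm that the new boundary was added.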
  • 24. Tooling: Custom partitioning framework • We created a number of stored procedures to handle these steps: • Maintenance.UspCreateFileGroup – used to create files and file groups • Maintenance.UspCreatePartition – used once to create the partition function and partition scheme • Maintenance.UspCreatePartitionView – used to create a monthly view per partition by date range • Maintenance.UspSplitPartition – used monthly to create a new file group, split partition, create view • Maintenance.UspSplitPartitionAllTables – used monthly to split all partition tables via agent job
  • 26. Partitioned Tables: Merging…
      • A partitioned table can have multiple partitions merged into one

          ALTER PARTITION FUNCTION PF_Name() MERGE RANGE('20170201');

      • Note: Merging partitions with data movement across file groups will be slow

      (Diagram, before: EMPTY | Jan 2017 | Feb 2017 | Mar 2017 | Apr 2017 (EMPTY))
      (Diagram, after:  EMPTY | Jan & Feb 2017 | Mar 2017 | Apr 2017 (EMPTY))
  • 27. Partitioned Tables: Switching…
      • Partition switching reduces locks whilst:
          • Loading data into a warehouse
          • Deleting old data during archival
          • Moving data between tiered storage
      • Partitions need to be in the same file group
      • Re-create the staging indexes to move physical data

          ALTER TABLE schema.StgTable
              SWITCH PARTITION $PARTITION.PF_Name('20170201')
              TO schema.PrdTable PARTITION $PARTITION.PF_Name('20170201')

      (Diagram: a production table and a staging table built on the same partition function, with partitions Jan 2017 | Feb 2017 | Mar 2017 | Apr 2017 (EMPTY); the receiving partition is empty)
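Before switching, it can help to confirm that the receiving partition holds no rows, since SQL Server requires the target of a SWITCH to be empty. The sketch below reads the row counts from the catalog; the table name `dbo.PrdTable` is a hypothetical stand-in for the production table.

```sql
DECLARE @Target NVARCHAR(300) = N'dbo.PrdTable';  -- hypothetical table name

-- Which partition number does the February 2017 boundary map to?
SELECT $PARTITION.PF_Name('20170201') AS TargetPartitionNumber;

-- Row count per partition of the target table.
SELECT p.partition_number,
       p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(@Target)
  AND p.index_id IN (0, 1)          -- heap (0) or clustered index (1) only
ORDER BY p.partition_number;
```

Filtering on `index_id IN (0, 1)` avoids counting the same rows once per non-clustered index.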
  • 30. Decreased performance: TOP, MAX or MIN • Test results show that TOP is slower on partitioned tables by 10% • ROWCOUNT can be used instead
  • 31. Increased performance: SELECT using non-clustered PK • When using ROWCOUNT with the non-clustered primary key, partitioned tables are faster: • 22% higher throughput • 3% improvement in response time
  • 32. Increased performance: SELECT using Partitioning Key • When using ROWCOUNT with the clustered partitioning date key, partitioned tables are faster: • 6% higher throughput • 7% improvement in response time
  • 33. Increased performance: Inserts • Combined INSERT & SELECT tests found partitioned tables to be faster: • SELECT – 9% Throughput benefit / 11% improvement in response times • INSERT – 4% Throughput benefit / 9% improvement in response times
  • 35. Unique columns • Traditionally developers create an IDENTITY(1,1) PRIMARY KEY to provide uniqueness • This cannot be used with partitioned tables • Should be replaced with a UNIQUEIDENTIFIER generated at application level (also in preparation for distributed tables…) • A PRIMARY KEY is by default CLUSTERED and stored with the data • In partitioned tables, the Partitioning Key has to be CLUSTERED to split the data • Thus if the PRIMARY KEY does not contain the Partitioning Key this cannot be CLUSTERED • An un-partitioned NONCLUSTERED PRIMARY KEY can be used to enforce uniqueness • However this prohibits SWITCHING of partitions due to unaligned indexes
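One possible table layout that follows these constraints is sketched below. It is an alternative to the un-partitioned NONCLUSTERED PRIMARY KEY mentioned on the slide, not the deck's own design: the partitioning key is added to the primary key so that the index stays aligned and partitions remain switchable. The table and column names are hypothetical, and `PS_Name` is assumed to exist as defined on the earlier slides.

```sql
CREATE TABLE dbo.Orders (
    OrderId   UNIQUEIDENTIFIER NOT NULL,  -- generated by the application
    OrderDate DATETIME2        NOT NULL,  -- partitioning key
    Amount    MONEY            NOT NULL,
    -- Including OrderDate keeps the primary key aligned with the scheme.
    -- Trade-off: uniqueness is enforced on the pair (OrderId, OrderDate),
    -- not on OrderId alone.
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderId, OrderDate)
        ON PS_Name (OrderDate)
) ON PS_Name (OrderDate);

-- The clustered index on the partitioning key physically splits the data.
CREATE CLUSTERED INDEX CIX_Orders_OrderDate
    ON dbo.Orders (OrderDate)
    ON PS_Name (OrderDate);
```

Because GUIDs generated at application level are effectively unique on their own, the weaker database-level guarantee may be acceptable in practice, but that is a judgment call for each system.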
  • 37. Myth: Metadata only operations • Switching partitions in & out • Requires schema lock on both source and destination tables • Usually the command is set with a timeout; and try again later • Splitting & merging partitions • Altering the partition function is an offline operation • Splitting a partition which contains data requires data movement • If the range split introduces a different file group, data needs to physically move between files • This is why we keep an empty partition on the left and right, and we always split the empty partition
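The "set a timeout and try again later" approach for switching could be sketched as follows. The retry count, timeout, and delay are arbitrary assumptions, and `dbo.StgTable` / `dbo.PrdTable` are hypothetical table names; error 1222 is SQL Server's "lock request time out period exceeded" error.

```sql
DECLARE @Attempt     INT = 1,
        @MaxAttempts INT = 5,   -- assumption: give up after five tries
        @Done        BIT = 0;

SET LOCK_TIMEOUT 5000;          -- wait at most 5 seconds for the schema lock

WHILE @Done = 0 AND @Attempt <= @MaxAttempts
BEGIN
    BEGIN TRY
        ALTER TABLE dbo.StgTable
            SWITCH PARTITION $PARTITION.PF_Name('20170201')
            TO dbo.PrdTable PARTITION $PARTITION.PF_Name('20170201');
        SET @Done = 1;
    END TRY
    BEGIN CATCH
        -- Re-raise anything that is not a lock timeout, or the final failure.
        IF ERROR_NUMBER() <> 1222 OR @Attempt = @MaxAttempts
        BEGIN
            THROW;
        END;

        SET @Attempt += 1;
        WAITFOR DELAY '00:00:30';   -- back off before trying again
    END CATCH
END;
```

Keeping the timeout short means the switch never queues behind long-running queries for long, which in turn avoids blocking the sessions that would otherwise pile up behind the pending schema lock.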
  • 38. 4. Distributed Partitioned Views Definition Requirements… loads! DEMO
  • 39. Distributed Partitioned Views: Definition • Basically a view which unions data from multiple databases hosted on different servers. • Also referred to as Federated Databases. • Used when applications are unaware of such partitioning. • Requires Linked Servers. • Performance improves with lazy schema validation option. • Read-only views work everywhere. • Updatable views require Enterprise Edition. • INSTEAD OF triggers can be used to make views updatable on Standard Edition.
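A distributed partitioned view could be declared roughly as below. The linked server name `Server2`, the database name `SalesDb`, and the view name `Production.ProductsDistributed` are hypothetical; the member tables reuse the yearly tables from the horizontal partitioning slides and are assumed to carry non-overlapping CHECK constraints on `DateCreated`.

```sql
-- Run on the first server. Server2 must already be defined as a linked server.
EXEC sp_serveroption @server   = N'Server2',
                     @optname  = N'lazy schema validation',
                     @optvalue = N'true';
GO

CREATE VIEW Production.ProductsDistributed
AS
SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
FROM SalesDb.Production.Products_2016           -- local member table
UNION ALL
SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
FROM Server2.SalesDb.Production.Products_2017;  -- remote member table
GO
```

A matching view would normally be created on each participating server, with the local and remote roles swapped, so that applications can connect to any of them.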
  • 41. Distributed Partitioned Views: Requirements • Tables Rules • Member tables cannot be referenced more than one time in the view. • Member tables cannot have indexes created on any computed columns. • Member tables must have all PRIMARY KEY constraints on the same number of columns. • Member tables must have the same ANSI padding setting. • Column Rules • All columns in each member table must be included in the same ordinal position in the select list. • Columns cannot be referenced more than one time in the select list. • The columns in the select list of each SELECT statement must be of the same type. • The key ranges of the CHECK constraints in each table cannot overlap with the ranges of any other table. • Partitioning Column Rules • The partitioning column cannot be an identity, default, timestamp, or computed column. • The partitioning column must be in the same ordinal location in the select list of each SELECT statement in the view. • The partitioning column cannot allow for nulls. • The partitioning column must be a part of the primary key of the table. • There must be only one constraint on the partitioning column. • There are no restrictions on the updatability of the partitioning column. https://technet.microsoft.com/en-us/library/ms188299(v=sql.105).aspx
  • 42. Distributed Partitioned Views: Updatable • INSERT Statements • All columns must be included in the INSERT statement even if the column can be NULL in the base table or has a DEFAULT constraint defined in the base table. • The DEFAULT keyword cannot be specified in the VALUES clause of the INSERT statement. • INSERT statements must supply a value that satisfies the logic of the CHECK constraint defined on the partitioning column for one of the member tables. • INSERT statements are not allowed if a member table contains a column with an identity property. • INSERT statements are not allowed if a member table contains a timestamp column. • INSERT statements are not allowed if there is a self-join with the same view or any one of the member tables. • UPDATE Statements • UPDATE statements cannot specify the DEFAULT keyword as a value in the SET clause even if the column has a DEFAULT value defined in the corresponding member table • The value of a column with an identity property cannot be changed: however, the other columns can be updated. • The value of a PRIMARY KEY cannot be changed if the column contains text, image, or ntext data. • Updates are not allowed if a base table contains a timestamp column. • Updates are not allowed if there is a self-join with the same view or any one of the member tables. • DELETE Statements • DELETE statements are not allowed when there is a self-join with the same view or any one of the member tables. https://technet.microsoft.com/en-us/library/ms187067(v=sql.105).aspx
  • 45. Database Sharding: Definition • A form of horizontal partitioning in which partitions are distributed on commodity servers. • An individual partition is referred to as a shard. • The application is shard-aware and can route connection requests autonomously without the need for distributed partitioned views. • Sharding is used to circumvent the limits of a single monolithic database, or a single entry point, in terms of storage space, computing resources, network bandwidth, and geography. https://docs.microsoft.com/en-us/azure/architecture/patterns/sharding
  • 46. Database Sharding: Problems • Queries that JOIN shards together are problematic and would need to be merged together by the application. • Multiple shards can be queried in parallel and the results merged together either in memory or client-side. • Referential integrity might be nonexistent. • Shards are usually used with domain-based partitioning and thus referenced tables could be in different databases. • Un-partitioned reference tables would also be placed outside the shards. • However, static reference tables could be treated as global tables, thus copied and replicated into all shards. • Rebalancing sharded data is problematic. This might be required when: • a shard key changes and thus data needs to move between shards • a new shard is added and data needs to be redistributed https://docs.microsoft.com/en-us/azure/architecture/patterns/sharding
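The point about querying shards in parallel and merging client-side can be sketched roughly as follows. This is a minimal illustration, not the approach shown in the session: the shard names, the `player_id` and `stake` columns, and the in-memory `SHARD_DATA` dictionary are all hypothetical stand-ins for real database connections.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

# Hypothetical in-memory stand-in for three shard databases.
SHARD_DATA = {
    "shard0": [{"player_id": 1, "stake": 50}, {"player_id": 4, "stake": 10}],
    "shard1": [{"player_id": 2, "stake": 75}],
    "shard2": [{"player_id": 3, "stake": 20}, {"player_id": 5, "stake": 90}],
}


def query_shard(shard_name, min_stake):
    """Run the same filter on one shard and return its rows sorted by stake, descending."""
    rows = [r for r in SHARD_DATA[shard_name] if r["stake"] >= min_stake]
    return sorted(rows, key=lambda r: r["stake"], reverse=True)


def fan_out(min_stake):
    """Query every shard in parallel, then merge the sorted partial results client-side."""
    with ThreadPoolExecutor(max_workers=len(SHARD_DATA)) as pool:
        partials = list(pool.map(lambda name: query_shard(name, min_stake), SHARD_DATA))
    # heapq.merge keeps the global ordering because each partial list is already sorted.
    return list(heapq.merge(*partials, key=lambda r: r["stake"], reverse=True))


if __name__ == "__main__":
    for row in fan_out(min_stake=20):
        print(row)
```

Each shard sorts only its own rows, so the application pays just for the final merge; a cross-shard JOIN would need a similar application-side step.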
  • 47. Database Sharding: Strategies • The Lookup strategy • A map is used to route a request for data to the shard that contains such data using the shard key. • Multi-tenant applications can store all the data for a tenant together in a shard using the tenant ID as shard key. • Multiple tenants can share the same shard, but the data for a single tenant cannot spread across multiple shards. • The Range strategy • Sequential shard keys are ordered and grouped together. • Useful for applications that frequently retrieve sets of items using range queries. • The Hash strategy • This is used to reduce the chance of hotspots (shards that receive a disproportionate amount of load). • The chosen hashing function should distribute data evenly across the shards, possibly by introducing some random element into the computation.
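The three strategies above could be sketched as routing functions along these lines. This is an assumption-laden illustration: the shard names, the tenant map, the range boundaries, and the choice of SHA-256 as the hashing function are all hypothetical and are not taken from the slides.

```python
import bisect
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names

# Lookup strategy: an explicit map from shard key (tenant ID) to shard.
# Several tenants may share a shard, but each tenant lives in exactly one.
TENANT_MAP = {"tenant-a": "shard0", "tenant-b": "shard1", "tenant-c": "shard0"}


def lookup_shard(tenant_id):
    return TENANT_MAP[tenant_id]


# Range strategy: sequential keys are grouped by boundary.
# Keys below 1000 go to shard0, 1000..1999 to shard1, 2000 and above to shard2.
RANGE_UPPER_BOUNDS = [1000, 2000]


def range_shard(key):
    return SHARDS[bisect.bisect_right(RANGE_UPPER_BOUNDS, key)]


# Hash strategy: a stable hash spreads keys across shards to reduce hotspots.
def hash_shard(key):
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


if __name__ == "__main__":
    print(lookup_shard("tenant-b"))
    print(range_shard(1500))
    print(hash_shard("order-42"))
```

A plain modulo over the shard count, as used here, forces most keys to move when a shard is added, which is one form of the rebalancing problem described on the previous slide.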
  • 49. Stretch Database: Definition • Stretch Database is a feature of SQL Server 2016. • It is used to move cold data from on-premises instances directly into the cloud with only a few clicks. • Eliminates the need to manually create archiving procedures that move data out of the production database and into an archive database. • Requires an Azure subscription. • Download “Data Migration Assistant” to identify candidate tables to stretch. https://docs.microsoft.com/en-us/sql/sql-server/stretch-database/stretch-database
  • 50. Stretch Database: Limitations • Limitations for Stretch-enabled tables • Uniqueness is not enforced for UNIQUE constraints and PRIMARY KEY constraints in the Azure table that contains the migrated data. • You can't UPDATE or DELETE rows that have been migrated, or rows that are eligible for migration. • You can't INSERT rows into a Stretch-enabled table on a linked server. • You can't create an index for a view that includes Stretch-enabled tables. • Filters on SQL Server indexes are not propagated to the remote table. • Limitations that currently prevent you from enabling Stretch for a table • Tables that have more than 1,023 columns or more than 998 indexes • FileTables or tables that contain FILESTREAM data • Tables that are replicated, or that are actively using Change Tracking or Change Data Capture • Memory-optimized tables • Data types: text, ntext, image, timestamp, sql_variant, XML, and CLR data types including geometry, geography, hierarchyid • Computed columns • Default constraints and check constraints • Foreign key constraints that reference the table. • Full text indexes, XML indexes, Spatial indexes, Indexed views https://docs.microsoft.com/en-us/sql/sql-server/stretch-database/limitations-for-stretch-database
  • 52. • Today’s event was sponsored by: • Microsoft Malta: location and refreshments • Gaming Innovation Group: parking vouchers • The Tech-Spark community requires your help. Sponsor an event by providing a meeting place or refreshments, or why not deliver a session! Feel free to contact us should you want to help.
  • 53. Contact Us Ralph Attard raland@raland.net Tech Spark http://www.tech-spark.com https://www.facebook.com/techsparkmalta