Azure SQL DWH
Big data-as-a-service by
About me
4/4/2016 Azure SQL DWH
• Shy Engelberg, CTO @
• Email: Shy@Valinor.co.il
• Phone: 054-771-711-5
• Twitter: @ShyEngelberg
Agenda
• SQL DWH introduction
• Architecture
• Creating a DWH
• Loading data
• Tools
• Summary
Objectives
• Know what Azure SQL Data Warehouse is
• Know how Azure SQL Data Warehouse works
• Know how to create and connect to Azure SQL Data Warehouse
• Know the basic tools and methods to get started with developing
• Identify scenarios that this solution might suit
The data warehouse fairytale
• Once upon a time, a data warehouse was an appliance that required fixed combinations of storage and compute, often underutilizing expensive resources.
Meaning: monstrous hardware was lying unused.
What is Azure SQL Data warehouse
• An enterprise-class, distributed database capable of processing massive volumes of relational and non-relational data.
• It is the industry's first cloud data warehouse that combines proven SQL capabilities with the ability to grow, shrink, and pause in seconds.
• Now in GA! (July 14th)
What is Azure SQL Data warehouse
– Azure PaaS
– an MPP
– up to PBs
– a relational DB that can also query non-relational data
– based on the product we know and love
– use what you need, when you need it.
What is Azure SQL Data warehouse
• Easily deploys in seconds.
• Pay for query performance only when you need it (or you can pause it completely).
• Fully managed service that removes the hassle of software patching, maintenance, and backups.
What is Azure SQL Data warehouse
SQL Data Warehouse uses Microsoft’s
massively parallel processing (MPP)
architecture. You pay for time-to-insight, not
hardware. (details are a few slides ahead)
What is Azure SQL Data warehouse
Using PolyBase, leverage Transact-SQL to
query seamlessly across both relational data in
a relational database and non-relational data in
common Hadoop formats.
What is Azure SQL Data warehouse
SQL Data Warehouse is based on the proven relational database engine of SQL Server and includes the features you expect, such as stored procedures, UDFs, partitioning, indexes, and collations.
If you already know Transact-SQL, it’s easy to transfer your knowledge to SQL Data Warehouse.
What is Azure SQL Data warehouse
You can grow or shrink compute power in
minutes. Take full advantage of storage at
cloud scale, and apply query compute based
on changing performance needs.
When compute is paused, you pay only for
storage.
Architecture
• At its core, SQL Data Warehouse uses Microsoft’s massively parallel processing (MPP) architecture, originally designed to run some of the largest on-premises enterprise data warehouses.
Architecture – MPP
The coordinated processing of a program by
multiple processors working on different parts
of the program.
Each processor has its own operating system
and memory.
[Figure: "The SMP way" vs. "The MPP way". Mission: process a lot of data.]
[Figure: "The SMP way" vs. "The MPP way". Scale for better performance.]
Architecture – MPP
• Breaks large queries across nodes for simultaneous processing.
• Every node is “working” on a local subset of the data.
• Capable of higher data ingestion rates through parallelization.
• Scales horizontally by adding nodes, rather than moving to a server with more CPUs or higher storage capacity.
• Unlike SMP, there is no single bottleneck.
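A toy sketch of the idea in Python, not the real engine: the rows are split so that every "node" owns a local subset, each node sums only its own rows, and the partial results are combined at the end. The function names and the use of threads as stand-ins for nodes are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_across_nodes(rows, node_count):
    """Give each node its own local subset of the rows."""
    return [rows[i::node_count] for i in range(node_count)]


def node_partial_sum(local_rows):
    """Work done by one node, using only its local data."""
    return sum(local_rows)


def mpp_sum(rows, node_count):
    """Run the same step on every node at once, then combine the partials."""
    subsets = split_across_nodes(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(node_partial_sum, subsets))
    return sum(partials)


sales = list(range(1, 1001))
print(mpp_sum(sales, node_count=4))  # 500500, the same answer as sum(sales)
```

Adding nodes here only changes how many subsets exist; the per-node step stays the same, which is the point of scaling horizontally.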
Architecture – Azure SQL Data warehouse
• SQL Data Warehouse independently scales compute and storage.
• This is what lets us pause compute, scale performance in seconds, and pay only for the performance we need.
Architecture – Azure SQL Data warehouse
[Diagram: a Control node (MPP engine) running SQL Server with the Control DBs, TempDB, and Master; Compute node 1 and Compute node 2, each running SQL Server with user data; a data management service on every node; Azure blob storage underneath.]
Architecture – Azure SQL Data warehouse
Control node (MPP engine)
• “Controls” the system.
• It is the front end that interacts with all applications and connections.
• It is powered by SQL Database, and connecting to it looks and feels the same.
• Under the surface, the Control node coordinates all of the data movement and computation required to run parallel queries on your distributed data.
• When you submit a T-SQL query to SQL Data Warehouse, the Control node transforms it into separate queries that run on each Compute node in parallel.
Architecture – Azure SQL Data warehouse
Compute nodes
• SQL Databases that process your query steps and manage your data.
• The Compute nodes are the workers that run the parallel queries on your data.
• After processing, they pass the results back to the Control node.
• To finish the query, the Control node aggregates the results and returns the final result.
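A minimal sketch of that last step, assuming an average is being computed. Each Compute node returns a partial (sum, count) for its own rows, and the Control node combines the partials. The function names are hypothetical and the real engine's query plans are not shown in these slides.

```python
def compute_node_step(local_amounts):
    """Each Compute node returns a partial (sum, count) for its local rows."""
    return (sum(local_amounts), len(local_amounts))


def control_node_finish(partials):
    """The Control node combines the partials into the final average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


node_data = [[10, 20, 30], [40], [50, 60]]
partials = [compute_node_step(rows) for rows in node_data]
print(control_node_finish(partials))  # 35.0
```

Averaging the three per-node averages (20, 40, 55) would give a different, wrong number, which is why the final aggregation has to happen in one place.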
Architecture – Azure SQL Data warehouse
• Data Movement Service (DMS) is the technology for moving data between the nodes.
• DMS gives the Compute nodes access to data they need for joins and aggregations.
• DMS is not an Azure service. It is a Windows service that runs alongside SQL Database on all the nodes.
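Why a join needs data movement can be sketched in a few lines of Python. This is a simplified model, not how DMS is implemented: rows of both tables are re-sent to the node chosen by hashing the join key, after which every node can join using only local rows. All names and the sample data are made up for illustration.

```python
def shuffle_by_key(rows_per_node, key_index, node_count):
    """Move every row to the node chosen by hashing its join key."""
    moved = [[] for _ in range(node_count)]
    for rows in rows_per_node:
        for row in rows:
            # hash() of a small int is the int itself, so this is deterministic
            moved[hash(row[key_index]) % node_count].append(row)
    return moved


def local_join(orders, customers):
    """Join on one node, using only rows that are local to it."""
    names = {cust_id: name for cust_id, name in customers}
    return [(order_id, names[cust_id]) for order_id, cust_id in orders if cust_id in names]


# orders are (order_id, customer_id); customers are (customer_id, name)
orders_per_node = [[(1, 101), (2, 102)], [(3, 101), (4, 103)]]
customers_per_node = [[(103, "Cara")], [(101, "Avi"), (102, "Ben")]]

orders_moved = shuffle_by_key(orders_per_node, key_index=1, node_count=2)
customers_moved = shuffle_by_key(customers_per_node, key_index=0, node_count=2)
result = sorted(
    pair
    for node in range(2)
    for pair in local_join(orders_moved[node], customers_moved[node])
)
print(result)  # [(1, 'Avi'), (2, 'Ben'), (3, 'Avi'), (4, 'Cara')]
```

Before the shuffle, node 0 holds orders for customers 101 and 102 but only the customer row for 103, so a purely local join there would return nothing.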
Architecture – Azure SQL Data warehouse
Azure blob storage
• Data is stored in Azure Storage Blobs.
• When Compute nodes interact with data, they write and read directly to and from blob storage.
• Since Azure storage expands transparently and limitlessly, SQL Data Warehouse can do the same.
• Since compute and storage are independent, SQL Data Warehouse can automatically scale storage separately from scaling compute, and vice versa.
• Azure Storage is fully fault tolerant.
Architecture – scaling
• Since each compute node only works on a subset of the data, if we want to scale, all we need to do is add more compute nodes.
• The “magic” is that we can add more compute nodes without moving (redistributing) the data.
• The scaling takes only a couple of minutes (initializing the compute node).
Architecture – scaling
• Changing the amount of compute is as simple as moving a slider to the left or right, but it can also be scheduled using T-SQL or PowerShell.
• Compute usage in SQL Data Warehouse is measured using SQL Data Warehouse Units (DWUs).
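For the T-SQL route, a small Python helper can build the scaling statement that a scheduled job would send. This is a sketch under assumptions: the database name, the DWU values, and the business-hours rule are invented for illustration, and the set of DWU levels actually offered should be checked against the service.

```python
def scale_statement(database_name, dwu):
    """Build the T-SQL that sets a database's service objective, e.g. 'DW400'."""
    if dwu <= 0:
        raise ValueError("DWU must be positive")
    return (
        f"ALTER DATABASE [{database_name}] "
        f"MODIFY (SERVICE_OBJECTIVE = 'DW{dwu}');"
    )


def dwu_for_hour(hour):
    """Hypothetical schedule: more compute during working hours, less at night."""
    return 400 if 8 <= hour < 18 else 100


print(scale_statement("MySqlDw", dwu_for_hour(9)))
# ALTER DATABASE [MySqlDw] MODIFY (SERVICE_OBJECTIVE = 'DW400');
```

The statement would typically be run while connected to the logical server's master database; that detail is outside what the slides cover.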
Architecture – data distribution
• All tables are distributed.
• Each distribution is like a bucket, storing a unique subset of the data.
• For now, SQL DW has 60 distributions. Each table is divided into 60 distributions from the moment it’s created.
• When there’s only one compute node, it holds all distributions; when there are more, the distributions are spread among them.
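The spreading of the 60 distributions can be pictured with a short Python sketch. The round-robin assignment used here is an assumption for illustration; the slides do not say how the service actually maps distributions to nodes.

```python
DISTRIBUTION_COUNT = 60  # fixed number of distributions, per the slide


def assign_distributions(node_count):
    """Map each of the 60 distributions to a compute node, in turn."""
    mapping = {node: [] for node in range(node_count)}
    for dist in range(DISTRIBUTION_COUNT):
        mapping[dist % node_count].append(dist)
    return mapping


for nodes in (1, 2, 6):
    mapping = assign_distributions(nodes)
    print(nodes, "node(s):", [len(d) for d in mapping.values()])
```

In this model, going from 2 nodes to 6 only changes which node owns each distribution. The rows inside a distribution never change, which fits the claim that scaling needs no redistribution of the data.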
Architecture – finally
• Bring all the data you want, pay only for the storage.
• If you want to query your data (duh), pay only for the compute you need, when you need it.
• Classic MPP design: scaling is almost linear.
• We don’t really need to know how many nodes or distributions there are under the covers. It’s a PaaS; we are guaranteed a certain amount of performance.
Creating a DWH
• Creating one is as simple as 1, 2, 3…
• A DWH is defined in a “Server”, just like an Azure SQL database.
• Pause and scale are a button away.
DEMO
Creating a table – distribution
• When a table is created, it is spread across all of the distributions.
• We need to choose how to distribute the data:
  • Hash
  • Round-robin (evenly but randomly, default behavior)
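The difference between the two options can be sketched in Python. Both functions are stand-ins: CRC32 is used only as an example hash (the real hash function is not named in the slides), and the round-robin here deals rows out in order rather than randomly.

```python
from itertools import cycle
import zlib

DISTRIBUTIONS = 60


def hash_distribution(key):
    """Same key always lands in the same distribution."""
    return zlib.crc32(str(key).encode("utf-8")) % DISTRIBUTIONS


def round_robin_distributions(rows):
    """Rows are dealt out in turn, regardless of their content."""
    slots = cycle(range(DISTRIBUTIONS))
    return [next(slots) for _ in rows]


customer_ids = [7, 7, 7, 42, 42, 99]
print([hash_distribution(c) for c in customer_ids])  # equal ids share a distribution
print(round_robin_distributions(customer_ids))       # [0, 1, 2, 3, 4, 5]
```

Hash keeps all rows for one key together, which helps joins and aggregations on that key but can skew if a few keys dominate. Round-robin stays even but scatters related rows.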
Creating a table – type
• The default behavior is that a table is created with a clustered columnstore index (which makes Azure SQL DWH a columnar database).
• Choose the type of table during creation.
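As a rough illustration of where these choices go, the Python helper below assembles a CREATE TABLE statement with a distribution option and a table type. The table and column names are hypothetical, and HEAP is shown as the alternative table type from general T-SQL knowledge rather than from the slides.

```python
def create_table_statement(table, columns, distribution_column=None, columnstore=True):
    """Build a CREATE TABLE statement with distribution and table-type options."""
    column_list = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    distribution = (
        f"HASH({distribution_column})" if distribution_column else "ROUND_ROBIN"
    )
    table_type = "CLUSTERED COLUMNSTORE INDEX" if columnstore else "HEAP"
    return (
        f"CREATE TABLE {table} (\n    {column_list}\n)\n"
        f"WITH (DISTRIBUTION = {distribution}, {table_type});"
    )


print(create_table_statement(
    "dbo.FactSales",
    [("SaleId", "INT NOT NULL"), ("CustomerId", "INT NOT NULL"), ("Amount", "DECIMAL(18, 2)")],
    distribution_column="CustomerId",
))
```

Leaving out `distribution_column` yields a round-robin table, matching the default behavior described on the distribution slide.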
Creating a table – statistics
• Statistics are not created automatically; we have to create them ourselves!
• Statistics are not updated automatically!
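Since both steps are manual, it can help to generate the statements rather than type them. A minimal sketch, assuming single-column statistics and a made-up naming scheme (`stats_<table>_<column>`):

```python
def statistics_statements(table, columns):
    """Build one CREATE STATISTICS statement per column, plus a refresh statement."""
    short_name = table.split(".")[-1]
    create = [
        f"CREATE STATISTICS stats_{short_name}_{col} ON {table} ({col});"
        for col in columns
    ]
    refresh = f"UPDATE STATISTICS {table};"
    return create, refresh


create, refresh = statistics_statements("dbo.FactSales", ["CustomerId", "SaleDate"])
for statement in create:
    print(statement)
print(refresh)  # run again after each significant load
```

Which columns deserve statistics is a judgment call; columns used in joins, filters, and GROUP BY are the usual candidates.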
Connecting and creating a table
DEMO
• Add firewall rule
• Connect using SSMS
• Create a table
• Create statistics
Loading data
• SQL Data Warehouse supports many loading methods:
  • SSIS
  • BCP
  • SQLBulkCopy API
  • Azure Data Factory
  • PolyBase
• The “push” methods: a query that goes through the Control node, which becomes a bottleneck (single-client gated).
• PolyBase: by far the fastest and most scalable SQL Data Warehouse loading method to date.
Loading data – PolyBase
• PolyBase is a scalable query processing framework, compatible with Transact-SQL, that can be used to combine and bridge data across relational database management systems and Azure Blob Storage.
• Currently PolyBase can load data from UTF-8 encoded delimited text files as well as the popular Hadoop file formats RCFile, ORC, and Parquet.
• PolyBase can load data from gzip, zlib, and Snappy compressed files.
• A “pull” method.
• Every compute node has an HDFS bridge in the DMS service. Every bridge can connect to external resources in parallel.
Loading data – PolyBase
• PolyBase data loading is not limited by the Control node, so as you scale out your DWU, your data transfer throughput also increases.
• A recommended way of loading data:
1. Write your source to CSV files.
2. Put the files in Azure Blob Storage.
3. Load using PolyBase.
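Step 1 can be sketched with Python's standard `csv` module. The helper below writes rows as UTF-8, comma-delimited files with no header row, split into several parts; the file naming and the idea of splitting so that several readers can each take a file are assumptions for illustration. Uploading to Blob Storage (step 2) is not shown.

```python
import csv
import os
import tempfile


def write_csv_parts(rows, out_dir, rows_per_file):
    """Write rows to UTF-8, comma-delimited files of at most rows_per_file rows each."""
    paths = []
    for start in range(0, len(rows), rows_per_file):
        path = os.path.join(out_dir, f"part_{start // rows_per_file:04d}.csv")
        with open(path, "w", encoding="utf-8", newline="") as handle:
            csv.writer(handle).writerows(rows[start:start + rows_per_file])
        paths.append(path)
    return paths


rows = [(1, "Avi", 10.5), (2, "Ben", 20.0), (3, "Cara", 7.25)]
with tempfile.TemporaryDirectory() as out_dir:
    parts = write_csv_parts(rows, out_dir, rows_per_file=2)
    print([os.path.basename(p) for p in parts])  # ['part_0000.csv', 'part_0001.csv']
```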
Loading data – PolyBase
DEMO
• Query using PolyBase
• CTAS to load data using PolyBase
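The demo itself is not in the transcript, so here is a hedged sketch of what the two steps usually look like, written as a Python helper that builds the T-SQL. It assumes an external data source and an external file format have already been created; every object name, the folder path, and the choice of ROUND_ROBIN for the target table are hypothetical.

```python
def polybase_load_script(target_table, external_table, columns,
                         data_source, file_format, folder):
    """Build T-SQL to expose files as an external table, query it, and load it with CTAS."""
    column_list = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    external = (
        f"CREATE EXTERNAL TABLE {external_table} (\n    {column_list}\n)\n"
        f"WITH (LOCATION = '{folder}', DATA_SOURCE = {data_source}, "
        f"FILE_FORMAT = {file_format});"
    )
    query = f"SELECT COUNT(*) FROM {external_table};"  # query the files in place
    ctas = (
        f"CREATE TABLE {target_table}\n"
        f"WITH (DISTRIBUTION = ROUND_ROBIN)\n"
        f"AS SELECT * FROM {external_table};"  # load into a distributed table
    )
    return [external, query, ctas]


for statement in polybase_load_script(
    "dbo.Sales", "ext.Sales",
    [("SaleId", "INT"), ("CustomerName", "NVARCHAR(50)"), ("Amount", "DECIMAL(18, 2)")],
    data_source="SalesBlobStorage", file_format="CsvFormat", folder="/sales/",
):
    print(statement, end="\n\n")
```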
Azure SQL Data warehouse – features
This is SQL Server inside – you know it all.
• Uses almost the same T-SQL.
• Supports views, stored procedures, partitions, and many other known and loved features.
• Built-in HADR (it’s a PaaS, remember?)
• Out-of-the-box backup and restore service.
Azure SQL Data warehouse – tools
• SSMS was not yet fully supported (but very soon). Update: SSMS is supported (July 14th)!
• Visual Studio (SSDT) is supported.
• Integrates easily with Azure ML, Power BI, and Data Factory.
• Many 3rd party solutions are available.
Azure SQL Data warehouse – usage
• Best used for data processing scenarios, not as a data store all users can query, because:
  • Concurrency is very limited (like all MPPs)
• Use it to load data, process it, and move it to the next step, or for individual queries that need a massive amount of processing.
Azure SQL Data warehouse – Summary
• A relational columnar DWH.
• An MPP service that allows us to scale compute separately from storage.
• We can pause the compute whenever needed.
• Using the infinite power of the cloud, we can process as much data as we want and use as much power as we want.
• We don’t need to buy expensive hardware.
Azure SQL Data warehouse – Summary
Best for:
• Your data is already in Azure.
• You need scheduled computing power for data processing.
• You have a lot of data, but don’t want to spend a lot of money.
• Not for concurrent querying!
THANK YOU!
Questions?

How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 

Kürzlich hochgeladen (20)

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 

Azure SQL DWH

  • 1. Azure SQL DWH Big data-as-a-service by
  • 2. About me (4/4/2016, Azure SQL DWH): Shy Engelberg, CTO @; Email: Shy@Valinor.co.il; Phone: 054-771-711-5; Twitter: @ShyEngelberg
  • 3. Agenda: SQL DWH introduction; Architecture; Creating a DWH; Loading data; Tools; Summary.
  • 4. Objectives: Know what Azure SQL Data Warehouse is. Know how Azure SQL Data Warehouse works. Know how to create and connect to Azure SQL Data Warehouse. Know the basic tools and methods to get started with developing. Identify scenarios that this solution might suit.
  • 5. The data warehouse fairytale: Once upon a time, a data warehouse was an appliance that required fixed combinations of storage and compute, often underutilizing expensive resources, meaning monstrous hardware was lying unused.
  • 7. What is Azure SQL Data Warehouse: An enterprise-class, distributed database capable of processing massive volumes of relational and non-relational data. It is the industry's first cloud data warehouse that combines proven SQL capabilities with the ability to grow, shrink, and pause in seconds. Now GA! (July 14th)
  • 9. What is Azure SQL Data Warehouse: Azure PaaS; an MPP; up to PBs; a relational DB that can also query non-relational data; based on the product we know and love; use what you need, when you need it.
  • 10. What is Azure SQL Data Warehouse: Easily deploys in seconds. Pay for query performance only when you need it (or you can pause it completely). A fully managed service that removes the hassle of software patching, maintenance, and backups.
  • 11. What is Azure SQL Data Warehouse: SQL Data Warehouse uses Microsoft’s massively parallel processing (MPP) architecture. You pay for time-to-insight, not hardware. (Details are a few slides ahead.)
  • 12. What is Azure SQL Data Warehouse: Using PolyBase, leverage Transact-SQL to query seamlessly across both relational data in a relational database and non-relational data in common Hadoop formats.
  • 13. What is Azure SQL Data Warehouse: SQL Data Warehouse is based on the proven relational database engine of SQL Server and includes the features you expect, including stored procedures, UDFs, partitioning, indexes, and collations. If you already know Transact-SQL, it's easy to transfer your knowledge to SQL Data Warehouse.
  • 14. What is Azure SQL Data Warehouse: You can grow or shrink compute power in minutes. Take full advantage of storage at cloud scale, and apply query compute based on changing performance needs. When compute is paused, you pay only for storage.
  • 15. Architecture: At its core, SQL Data Warehouse uses Microsoft’s massively parallel processing (MPP) architecture, originally designed to run some of the largest on-premises enterprise data warehouses.
  • 17. Architecture – MPP: The coordinated processing of a program by multiple processors working on different parts of the program. Each processor has its own operating system and memory.
  • 18. Mission: process a lot of data. The SMP way vs. the MPP way.
  • 21. Architecture – MPP: Breaks large queries up across nodes for simultaneous processing. Every node is “working” on a local subset of the data. Capable of higher data ingestion rates through parallelization. Scales horizontally by adding nodes, rather than moving to a server with more CPUs or higher storage capacity. Unlike SMP, there is no single bottleneck.
  • 22. Architecture – Azure SQL Data Warehouse: SQL Data Warehouse scales compute and storage independently. This is what lets us pause compute, scale performance in seconds, and pay only for the performance we need.
  • 24. Architecture – Azure SQL Data Warehouse (diagram): Control node (MPP engine): SQL Server with the control DBs, TempDB and Master, plus the Data Movement Service. Compute node 1 and Compute node 2: each a SQL Server with user data, plus the Data Movement Service. Azure blob storage underneath.
  • 25. Architecture – Azure SQL Data Warehouse, Control node (MPP engine): “Controls” the system. It is the front end that interacts with all applications and connections. It is powered by SQL Database, and connecting to it looks and feels the same. Under the surface, the Control node coordinates all of the data movement and computation required to run parallel queries on your distributed data. When you submit a T-SQL query to SQL Data Warehouse, the Control node transforms it into separate queries that run on each Compute node in parallel.
  • 27. Architecture – Azure SQL Data Warehouse, Compute nodes: SQL Databases that process your query steps and manage your data. The Compute nodes are the workers that run the parallel queries on your data. After processing, they pass the results back to the Control node. To finish the query, the Control node aggregates the results and returns the final result.
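The Control node / Compute node flow described above can be sketched as a small scatter-gather simulation. This is only an illustration under stated assumptions: `node_data`, `compute_node_step` and `control_node_avg` are hypothetical names, threads stand in for Compute nodes, and the real engine's query plans are not described in the slides.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the subset of rows held by one Compute node.
node_data = [
    [10.0, 20.0, 30.0],  # Compute node 1
    [40.0, 50.0],        # Compute node 2
]


def compute_node_step(rows):
    """Per-node query step: return a partial aggregate (sum, count) over local rows."""
    return sum(rows), len(rows)


def control_node_avg(nodes):
    """Fan the step out to every node in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(compute_node_step, nodes))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count


print(control_node_avg(node_data))
```

The nodes return (sum, count) pairs rather than local averages because averaging the two local averages (20 and 45) would give 32.5 instead of the true average of 30. That is one reason a final aggregation step on the Control node is needed.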
  • 29. Architecture – Azure SQL Data Warehouse, Data Movement Service: Data Movement Service (DMS) is our technology for moving data between the nodes. DMS gives the Compute nodes access to data they need for joins and aggregations. DMS is not an Azure service. It is a Windows service that runs alongside SQL Database on all the nodes.
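To see why joins need data movement at all, here is a minimal sketch of a shuffle join: rows from both tables are moved to the node that owns their join key, after which every node can join locally. The slides do not describe how DMS actually does this, so the hashing scheme, `NUM_NODES`, and all function and table names below are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 2  # hypothetical cluster size


def node_for(key):
    """Deterministically map a join key to a node (illustrative hash, not the real one)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES


def shuffle(rows, key_index):
    """Move each row to the node that owns its join key."""
    placed = defaultdict(list)
    for row in rows:
        placed[node_for(row[key_index])].append(row)
    return placed


def distributed_join(orders, customers):
    """Shuffle both inputs on customer id, then join locally on every node."""
    orders_by_node = shuffle(orders, key_index=1)
    customers_by_node = shuffle(customers, key_index=0)
    result = []
    for node in range(NUM_NODES):
        names = {cid: name for cid, name in customers_by_node[node]}
        for order_id, cid in orders_by_node[node]:
            if cid in names:
                result.append((order_id, cid, names[cid]))
    return sorted(result)


orders = [(1, "c1"), (2, "c2"), (3, "c1"), (4, "c3")]
customers = [("c1", "Ada"), ("c2", "Grace"), ("c3", "Edsger")]
print(distributed_join(orders, customers))
```

Because both tables are placed with the same function, matching rows always land on the same node, so no node needs to see the whole data set.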
  • 31. Architecture – Azure SQL Data Warehouse, Azure blob storage: Data is stored in Azure Storage Blobs. When Compute nodes interact with data, they write and read directly to and from blob storage. Since Azure storage expands transparently and limitlessly, SQL Data Warehouse can do the same. Since compute and storage are independent, SQL Data Warehouse can automatically scale storage separately from scaling compute, and vice versa. Azure Storage is fully fault tolerant.
  • 32. Architecture – scaling: Since each compute node only works on a subset of the data, if we want to scale, all we need to do is add more compute nodes. The “magic” is that we can add more compute nodes without moving (redistributing) the data. Scaling takes only a couple of minutes (the time to initialize the compute nodes).
  • 33. Architecture – scaling: Changing the amount of compute is as simple as moving a slider to the left or right, but it can also be scripted and scheduled using T-SQL or PowerShell. Compute usage in SQL Data Warehouse is measured in SQL Data Warehouse Units (DWUs).
  • 34. Architecture – data distribution: All tables are distributed. Each distribution is like a bucket, storing a unique subset of the data. For now, SQL DW has 60 distributions. Each table is divided into 60 distributions from the moment it’s created. When there’s only one compute node, it holds all the distributions; when there are more, the distributions are spread among them.
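A small sketch of how a fixed set of 60 distributions might be spread over a changing number of compute nodes. The figure 60 comes from the slide; the rotation used to assign distributions to nodes and the name `assign_distributions` are assumptions, since the slides do not say how the service maps them.

```python
NUM_DISTRIBUTIONS = 60  # fixed count stated in the slides


def assign_distributions(num_nodes):
    """Spread the 60 distributions over the nodes (simple rotation, an assumption)."""
    if num_nodes < 1:
        raise ValueError("need at least one compute node")
    layout = {node: [] for node in range(num_nodes)}
    for dist in range(NUM_DISTRIBUTIONS):
        layout[dist % num_nodes].append(dist)
    return layout


for nodes in (1, 2, 6):
    layout = assign_distributions(nodes)
    print(nodes, "node(s):", len(layout[0]), "distributions on node 0")
```

Only the mapping from distribution to node changes when the node count changes; the contents of each distribution stay where they are in storage, which is why scaling does not require redistributing the data.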
  • 35. Architecture – finally: Bring all the data you want, pay only for the storage. If you want to query your data (duh), pay only for the compute you need, when you need it. Classic MPP design: scaling is almost linear. We don’t really need to know how many nodes or distributions there are under the covers. It’s a PaaS, so we are guaranteed a certain amount of performance.
  • 36. Creating a DWH: Creating one is as simple as 1, 2, 3… A DWH is defined in a “Server”, just like an Azure SQL database. Pause and scale are a button away. DEMO
  • 37. Creating a table – distribution: When a table is created, it is spread across all of the distributions. We need to choose how to distribute the data: hash, or round-robin (evenly but randomly; the default behavior).
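The difference between the two distribution choices can be illustrated with a short simulation. It is a sketch under assumptions: the service's real hash function is internal, so SHA-256 stands in for it, round-robin is modelled as a plain rotation, and the skewed sample data is made up.

```python
import hashlib
from collections import Counter

NUM_DISTRIBUTIONS = 60


def hash_distribution(key):
    """Map a key to a distribution; the same key always lands in the same bucket."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS


def place_rows(rows, key_index=None):
    """Count rows per distribution: hash on a column if key_index is given, else round-robin."""
    placement = Counter()
    for i, row in enumerate(rows):
        if key_index is None:
            dist = i % NUM_DISTRIBUTIONS  # round-robin ignores the row contents
        else:
            dist = hash_distribution(row[key_index])
        placement[dist] += 1
    return placement


# Hypothetical skewed data: 600 rows, 500 of them for a single customer.
rows = [("big_customer", n) for n in range(500)] + [(f"c{n}", n) for n in range(100)]
round_robin = place_rows(rows)
hashed = place_rows(rows, key_index=0)
print("round-robin max rows per distribution:", max(round_robin.values()))
print("hash max rows per distribution:", max(hashed.values()))
```

Round-robin spreads rows evenly regardless of content. Hashing keeps all rows for one key together, which helps joins and aggregations on that key but concentrates rows in one distribution when a single key value dominates.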
  • 38. Creating a table – type: By default, a table is created with a clustered columnstore index (which makes Azure SQL DWH a columnar database). You choose the table type during creation.
  • 39. Creating a table – statistics: Statistics are not created automatically; we have to create them ourselves! Statistics are not updated automatically either!
  • 40. Connecting and creating a table (DEMO): Add a firewall rule. Connect using SSMS. Create a table. Create statistics.
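Since statistics have to be created and refreshed by hand, one option is to script the statements. The helper below is a minimal sketch that only builds T-SQL text; the table and column names are hypothetical, the `stats_<table>_<column>` naming pattern is an assumption, and the identifiers are assumed to be trusted (nothing is quoted or escaped).

```python
def create_statistics_statements(table, columns):
    """Build one single-column CREATE STATISTICS statement per column."""
    statements = []
    for column in columns:
        stats_name = f"stats_{table.replace('.', '_')}_{column}"
        statements.append(f"CREATE STATISTICS {stats_name} ON {table} ({column});")
    return statements


def update_statistics_statement(table):
    """Build the statement that refreshes all statistics on a table."""
    return f"UPDATE STATISTICS {table};"


# Hypothetical fact table and the columns used in joins and filters.
for statement in create_statistics_statements("dbo.FactSales", ["CustomerKey", "OrderDate"]):
    print(statement)
print(update_statistics_statement("dbo.FactSales"))
```

The generated statements would then be run against the warehouse, for example after each load.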
  • 41. Loading data: SQL Data Warehouse supports many loading methods: SSIS, BCP, the SQLBulkCopy API, Azure Data Factory, and PolyBase. The “push” methods run as a query that goes through the Control node, which becomes a bottleneck (single-client gated). PolyBase is by far the fastest and most scalable SQL Data Warehouse loading method to date.
  • 42. Loading data – PolyBase: PolyBase is a scalable query processing framework, compatible with Transact-SQL, that can be used to combine and bridge data across relational database management systems and Azure Blob Storage. Currently PolyBase can load data from UTF-8 encoded delimited text files as well as the popular Hadoop file formats RC File, ORC, and Parquet. PolyBase can load data from gzip, zlib and Snappy compressed files.
  • 43. Loading data – PolyBase: A “pull” method. Every compute node has an HDFS bridge in the DMS service. Every bridge can connect to external resources in parallel.
  • 44. Loading data – PolyBase: PolyBase data loading is not limited by the Control node, so as you scale out your DWU, your data transfer throughput also increases. A recommended way of loading data: 1. Write your source to CSV files. 2. Put the files in Azure Blob Storage. 3. Load using PolyBase.
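Step 1 of the recommended flow can be sketched with the Python standard library: write the source rows as UTF-8, comma-delimited, gzip-compressed files, matching the formats the slides list for PolyBase. Splitting the output into several files is an assumption made here so that parallel readers have more than one file to pull; the file naming, `rows_per_file` value and sample rows are hypothetical. Uploading to Blob Storage and the PolyBase load itself (steps 2 and 3) are not shown.

```python
import csv
import gzip
import os
import tempfile


def write_csv_chunks(rows, out_dir, rows_per_file=2):
    """Write rows into several gzip-compressed, UTF-8, comma-delimited files."""
    paths = []
    for start in range(0, len(rows), rows_per_file):
        path = os.path.join(out_dir, f"part-{start // rows_per_file:05d}.csv.gz")
        # newline="" lets the csv module control line endings itself.
        with gzip.open(path, "wt", encoding="utf-8", newline="") as handle:
            csv.writer(handle).writerows(rows[start:start + rows_per_file])
        paths.append(path)
    return paths


rows = [(1, "Ada"), (2, "Grace"), (3, "Edsger")]
out_dir = tempfile.mkdtemp()
paths = write_csv_chunks(rows, out_dir)
print([os.path.basename(p) for p in paths])
```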
  • 45. Loading data – PolyBase (DEMO): Query using PolyBase. Use CTAS to load data using PolyBase.
  • 46. Azure SQL Data Warehouse – features: This is SQL Server inside, so you know it all. Uses almost the same T-SQL. Supports views, stored procedures, partitions and many other known and loved features. Built-in HADR (it’s a PaaS, remember?). Out-of-the-box backup and restore service.
  • 47. Azure SQL Data Warehouse – tools: SSMS was not yet fully supported when this deck was first written; it is supported as of July 14th. Visual Studio (SSDT) is supported. Integrates easily with Azure ML, Power BI and Data Factory. Many 3rd-party solutions are available.
  • 48. Azure SQL Data Warehouse – usage: Best used for data processing scenarios, not as a data store all users can query, because concurrency is very limited (as with all MPPs). Use it to load data, process it, and move it to the next step, or for individual queries that need a massive amount of processing.
  • 49. Azure SQL Data Warehouse – Summary: A relational columnar DWH. An MPP service that allows us to scale compute separately from storage. We can pause the compute whenever needed. Using the infinite power of the cloud, we can process as much data as we want, and use as much power as we want. We don’t need to buy expensive hardware.
  • 50. Azure SQL Data Warehouse – Summary: Best when your data is already in Azure, when you need scheduled computing power for data processing, or when you have a lot of data but don’t want to spend a lot of money. Not for concurrent querying!