This document provides an overview of Microsoft Azure Data Services and Azure SQL Database. It discusses Infrastructure as a Service (IaaS) versus Platform as a Service (PaaS), and highlights the opportunities in the Linux database market. It also discusses Microsoft's commitment to customer choice and partnerships with companies like Red Hat. The remainder of the document focuses on features of Azure SQL Database, including an overview of the DTU and vCore purchasing models, managed instances, backup and recovery, high availability options, elastic scalability, and data sync capabilities.
7. There’s big
opportunity
$15B+
Linux DB
market by 2019
Source: Cloud Market Intelligence, FY16 H1 LRF (Nov 2015)
Windows
Linux
Relational DB market
growth through 2019
New server shipments of Linux
expected to be 2.4x that of
Windows by FY 2021
6.6%
per year
Microsoft is the only
Gartner RDBMS
Magic Quadrant
vendor without
support for Linux
8. Committed
to choice
Azure and Red Hat partnership
HDInsight for Linux
R Server on Linux
SQL Server on Linux
So for the first time
now, we have the
ability to go to an
enterprise and talk
about that entire data
estate across Windows
and Linux.
11. Windows Linux
Developer, Express, Web, Standard, Enterprise
Database Engine, Integration Services
R Services, Analysis Services, Reporting Services, MDS, DQS
Maximum number of cores Unlimited Unlimited
Maximum memory utilized per instance 24 TB 12 TB
Maximum database size 524 PB 524 PB
Basic OLTP (Basic In-Memory OLTP, Basic operational analytics)
Advanced OLTP (Advanced In-Memory OLTP, Advanced operational analytics)
Basic high availability (2-node single database failover, non-readable secondary)
Advanced HA (Always On - multi-node, multi-db failover, readable secondaries)
Security
Basic security (Basic auditing, Row-level security, Data masking, Always Encrypted)
Advanced security (Transparent Data Encryption)
Data
warehousing
PolyBase2
Basic data warehousing/data marts (Basic In-Memory ColumnStore, Partitioning, Compression)
Advanced data warehousing (Advanced In-Memory ColumnStore)
Advanced data integration (Fuzzy grouping and look ups)
Tools
Windows ecosystem: Full-fidelity Management & Dev Tool (SSMS & SSDT), command line tools
Linux/OSX/Windows ecosystem: Dev tools (VS Code), DB Admin GUI tool, command line tools
Developer
Programmability (T-SQL, CLR, Data Types, JSON)
Windows Filesystem Integration - FileTable
Business
intelligence
Basic reporting, analytics & data integration
Basic Corporate Business Intelligence (Multi-dimensional models, Basic tabular model)
Advanced Corporate Business Intelligence (Advanced tabular model, DirectQuery, advanced data mining)
Mobile BI (Datazen)
Advanced analytics
Basic “R” integration (Connectivity to R Open, Limited parallelism for ScaleR)
Advanced “R” integration (Full parallelism for ScaleR)
Hybrid cloud Stretch Database
What’s coming in
SQL Server on
Linux
12. 12
Azure SQL Database (PaaS)
Fully managed database-as-a-service that lets you focus on your business
Database provisioning on-demand
Scalable and elastic performance for all workloads
99.99% availability, zero maintenance
Intelligent: learns and adapts to optimize performance
Secure and compliant to protect sensitive data
Geo-replication and restore-from-backup for data protection
Compatible with SQL Server 2014, 2016
14. Seamless and compatible, Intelligent DBaaS, Competitive TCO
(2017) AZURE SQL DATABASE
Privacy and Trust
OPERATIONAL ANALYTICS
Columnstore
Hekaton (in-memory
OLTP)
PREDICTABLE PERFORMANCE
Query Store
Index Optimization
AUTOMATIC TUNING
AUTO QUERY PLAN
CORRECTION
PERFORMANCE INSIGHT IN
OMS
ADAPTIVE QUERY
PROCESSING
SQL GRAPH
ADVANCED ANALYTICS
NATIVE PREDICT
R SERVICES
ACTIVITY MONITORING
Engine Audit
Threat Detection (NEW
SCENARIOS)
CENTRALIZED DASHBOARD
OMS INTEGRATION
ACCESS CONTROL
SQL Firewall
RLS, Dyn. Data Masking
AAD WITH MFA
DATA PROTECTION
Encrypt in motion (TLS)
TDE & BYK
Always Encrypted (S/W)
SERVICE ENDPOINT
ALWAYS ENCRYPTED (SECURE
H/W)
DISCOVERY & ASSESSMENT
VULNERABILITY ASSESSMENT
HA-DR BUILT-IN
99.99% SLA
Geo-restore
ACTIVE GEO REPLICAS (4)
MULTI-AZ
BACKUP AND RESTORE
Backup with health
check
35 days PITR
10 YEARS DATA RETENTION
DISTRIBUTED APPLICATION
Change Tracking
TRANSACTION REPLICATION
DATA SYNC
SSIS SERVICE
BIZ MODEL & SKUS
DTU/eDTU
<=1TB
BIGGER STD: S4-S12
SEPARATE COMPUTE AND
STORAGE
AZURE HYBRID BENEFIT
COST OPTIMIZATION
INTELLIGENT PAAS
16. 16
Azure SQL Database (PaaS)
You need a logical server before you can create your first database. A logical server is the entry point
for its databases and controls logins, firewall rules, auditing rules, threat detection policies, and
failover groups. Do not confuse an Azure SQL Database logical server with an on-premises SQL
Server. The logical server is a logical structure that does not provide any way to connect at the instance
or feature level.
Because of how Azure provides high availability to the databases, the logical server does not need
to be in the same region as the databases it manages. Azure SQL Database does not guarantee that
the logical server and its related databases will be in the same region.
This first account is a SQL login account. You can use only SQL login and Azure Active Directory login
accounts. Windows authentication is not supported with a SQL logical server.
21. 21
vCore-based model
Each 100 DTU in Standard tier requires at least 1 vCore in General Purpose tier; each
125 DTU in Premium tier requires at least 1 vCore in Business Critical tier.
In the vCore-based purchasing model, you can exchange your existing licenses for
discounted rates on SQL Database using the Azure Hybrid Use Benefit for SQL Server.
This Azure benefit allows you to save more than 40% on Azure SQL Database by using
your on-premises SQL Server licenses with Software Assurance.
If your database or elastic pool consumes more than 300 DTU, conversion to vCore
may reduce your cost.
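The sizing rule of thumb above can be sketched in a few lines of Python. This is a minimal illustration, not an official calculator: the function name `estimate_min_vcores` and the tier labels are assumptions made for the example, and only the 100 and 125 DTU-per-vCore ratios come from the slide.

```python
import math

# DTUs covered by one vCore, per the rule of thumb on this slide.
DTU_PER_VCORE = {
    "standard": 100,  # Standard (DTU) -> General Purpose (vCore)
    "premium": 125,   # Premium (DTU)  -> Business Critical (vCore)
}


def estimate_min_vcores(dtu: int, tier: str) -> int:
    """Return the minimum vCore count suggested for a DTU-based database."""
    if dtu <= 0:
        raise ValueError("dtu must be positive")
    return math.ceil(dtu / DTU_PER_VCORE[tier.lower()])


print(estimate_min_vcores(400, "standard"))  # 4
print(estimate_min_vcores(500, "premium"))   # 4
```

Rounding up keeps the estimate on the safe side when the DTU count is not an exact multiple of the ratio.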
28. 28
Elastic pools
You can configure resources for the pool based
on either the DTU-based purchasing model or the
vCore-based purchasing model. The resource
requirement for a pool is determined by the
aggregate utilization of its databases. The
amount of resources available to the pool is
capped by the developer's budget.
The user adds databases to the pool, sets the
minimum and maximum eDTUs for each
database, and sets the eDTU limit of the pool
based on their budget. This means that within the
pool, each database can auto-scale within a set
range.
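A rough way to reason about these settings is to check that the per-database range fits inside the pool limit. The sketch below is illustrative only: `validate_pool` is a hypothetical helper, and the three checks are assumptions about what a sensible configuration looks like, not the service's own validation logic.

```python
def validate_pool(pool_edtu: int, db_min_edtu: int, db_max_edtu: int, db_count: int) -> list:
    """Return a list of problems with an elastic pool configuration (empty if none)."""
    problems = []
    if db_min_edtu > db_max_edtu:
        problems.append("per-database minimum exceeds per-database maximum")
    if db_max_edtu > pool_edtu:
        problems.append("per-database maximum exceeds the pool eDTU limit")
    # Assumption: guaranteed minimums for all databases must fit in the pool together.
    if db_min_edtu * db_count > pool_edtu:
        problems.append("guaranteed minimums add up to more than the pool eDTU limit")
    return problems


print(validate_pool(pool_edtu=100, db_min_edtu=10, db_max_edtu=50, db_count=5))  # []
```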
30. 30
Managed Instance
• Are your customers
interested in moving to
cloud?
• Want to close your data center
• Current hosting solution is high
maintenance
• You’re asked to do more with less
• Want to expand your reach globally
Managed Instance brings
PaaS closer to you!
• Do your customers want to
avoid app rewrites but still
benefit from PaaS?
34. 34
Backup
Configuring and performing point-in-time recovery
Azure SQL Database takes a full backup every week, a differential
backup each day, and a transaction log backup every five minutes. If you want to extend the default retention period,
you need to configure long-term retention (LTR). This feature depends on Azure Recovery Services, and you can extend the
retention time up to 10 years.
SQL Database automatically creates database backups and uses Azure read-access geo-redundant storage (RA-GRS) to
provide geo-redundancy. These backups are created automatically and at no additional charge.
If you delete the Azure SQL server that hosts SQL databases, all elastic pools and databases that belong to the server
are also deleted and cannot be recovered. You cannot restore a deleted server. But if you configured long-term
retention, the backups for the databases with LTR will not be deleted and these databases can be restored.
If your database is encrypted with TDE, the backups are automatically encrypted at rest, including LTR backups.
Backup storage up to 100% of the maximum database size is included, beyond which you will be billed in GB/month
consumed.
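The billing rule in the last sentence amounts to simple arithmetic. A minimal sketch, assuming sizes are measured in GB and using a hypothetical helper name `billable_backup_gb`:

```python
def billable_backup_gb(max_db_size_gb: float, backup_used_gb: float) -> float:
    """Backup storage billed beyond the included allowance, in GB."""
    included_gb = max_db_size_gb  # 100% of the maximum database size is included
    return max(0.0, backup_used_gb - included_gb)


print(billable_backup_gb(250, 200))  # 0.0
print(billable_backup_gb(250, 310))  # 60.0
```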
35. 35
Backup
When you need to recover a database from an automatic backup you can
restore it to:
A new database in the same logical server from a point-in-time within
the retention period.
A database in the same logical server from a deleted database.
A new database from the most recent daily backup to any logical server
in any region.
37. 37
Backup
*If you need faster recovery, use active geo-replication. If you need to be able to recover data
from a period older than 35 days, use Long-term retention.
43. 43
Business Continuity
Every Azure SQL Database subscription has built-in redundancy. Three copies of your
data are stored across fault domains in the datacenter to protect against server and
hardware failure. This is built in to the subscription price and is not configurable.
Standard/general purpose model, which provides 99.99% availability but with some
potential performance degradation during maintenance activities.
Premium/business critical model, which also provides 99.99% availability with
minimal performance impact on your workload, even during maintenance activities.
Although high availability is a great feature, it does not protect against a catastrophic
failure of the entire Azure region. For those cases, you need to put in place a disaster
recovery plan. Azure SQL Database provides two features that make it easier
to implement this type of plan: active geo-replication and auto-failover groups.
44. 44
Failover groups and active geo-replication
Active geo-replication has the following benefits:
Database-level disaster recovery is quick when you have replicated transactions to
databases on different SQL Database servers in the same or different regions.
You can fail over to a different data center in the event of a natural disaster or a
malicious act.
Online secondary databases are readable, and they can be used to offload
read-only workloads such as reporting.
With automatic asynchronous replication, after an online secondary database has
been seeded, updates to the primary database are automatically copied to the
secondary database.
45. 45
Failover groups and active geo-replication
With active geo-replication you can configure up to four readable
secondary databases in the same or different regions. In case of a region
outage, your application needs to manually fail over the database. If you
require the failover to happen automatically, you
need to use auto-failover groups.
Secondary active geo-replication databases are priced at 100 percent of
primary database prices. The cost of geo-replication traffic between the
primary and the online secondary is included in the cost of the online
secondary. Active geo-replication is available for all database tiers.
46. 46
Failover groups and active geo-replication
Before you create an online secondary, the following requirements must be
met:
The secondary database must have the same name as the primary.
They must be on separate servers.
They both must be on the same subscription.
The secondary server cannot be a lower performance tier than the
primary.
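The four requirements above can be expressed as a pre-flight check. This is a sketch under stated assumptions: the `Database` record, the `secondary_problems` helper, and the Basic < Standard < Premium ordering in `TIER_RANK` are all introduced for illustration and are not part of any Azure SDK.

```python
from dataclasses import dataclass

# Assumed ordering of performance tiers, lowest to highest.
TIER_RANK = {"basic": 0, "standard": 1, "premium": 2}


@dataclass(frozen=True)
class Database:
    name: str
    server: str
    subscription: str
    tier: str


def secondary_problems(primary: Database, secondary: Database) -> list:
    """Return the requirements from the slide that a proposed secondary violates."""
    problems = []
    if secondary.name != primary.name:
        problems.append("secondary must have the same name as the primary")
    if secondary.server == primary.server:
        problems.append("primary and secondary must be on separate servers")
    if secondary.subscription != primary.subscription:
        problems.append("primary and secondary must be in the same subscription")
    if TIER_RANK[secondary.tier] < TIER_RANK[primary.tier]:
        problems.append("secondary cannot be a lower performance tier than the primary")
    return problems


primary = Database("sales", "server-east", "sub-1", "standard")
print(secondary_problems(primary, Database("sales", "server-west", "sub-1", "premium")))  # []
```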
51. 51
Elastic scalability
If you reach 80% of your performance metrics, it’s time to consider
increasing your service tier or performance level. If you’re consistently
below 10 percent of the DTU, you might consider decreasing your service
tier or performance level.
We can scale up. This means that we will add CPU, memory, and better
disk I/O to handle the load. In Azure SQL Database, scaling up is very
simple: we just move the slider bar over to the right or choose a new
pricing tier. This will give us the ability to handle more DTUs.
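The 80% and 10% guidelines above can be turned into a simple advisory check. A minimal sketch, assuming you already have a list of DTU utilization percentages from monitoring; `scaling_advice` and its return strings are hypothetical names for the example.

```python
def scaling_advice(dtu_percent_samples, high=80.0, low=10.0) -> str:
    """Suggest a scaling action from recent DTU utilization samples (0-100)."""
    if not dtu_percent_samples:
        raise ValueError("need at least one utilization sample")
    if max(dtu_percent_samples) >= high:
        return "consider scaling up"      # reached the 80% guideline
    if max(dtu_percent_samples) < low:
        return "consider scaling down"    # consistently below 10%
    return "keep current tier"


print(scaling_advice([35, 62, 85]))  # consider scaling up
print(scaling_advice([3, 5, 8]))     # consider scaling down
```

Scaling down is suggested only when every sample is below the low threshold, which is one reading of "consistently" in the guideline.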
52. 52
Elastic scalability
In some cases, even the highest performance tiers and performance optimizations might
not handle your workload in a successful and cost-effective way. We might not even be able
to scale up much further. In those cases you have other options to scale your database:
Read scale-out is a feature that gives you one read-only replica of
your data where you can execute demanding read-only queries such as reports. The read-only
replica handles your read-only workload without affecting resource usage on your
primary database.
Database sharding is a set of techniques that enables you to split your data into several
databases and scale them independently.
53. 53
Read scale-out
Each database in the Premium tier (DTU-based purchasing model)
or in the Business Critical tier (vCore-based purchasing model) is
automatically provisioned with several Always On replicas to
support the availability SLA.
These replicas are provisioned with the same performance level as
the read-write replica used by the regular database connections.
The Read Scale-Out feature allows you to load balance SQL
Database read-only workloads using the capacity of one of the
read-only replicas instead of sharing the read-write replica.
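Clients opt in to the read-only replica through the `ApplicationIntent=ReadOnly` connection-string keyword. The sketch below only builds an ODBC-style connection string; the helper name, server name, and database name are hypothetical, credentials are deliberately left out, and it assumes Read Scale-Out is enabled on the database.

```python
def build_connection_string(server: str, database: str, read_only: bool = False) -> str:
    """Build an ODBC connection string, optionally targeting a read-only replica."""
    parts = [
        "Driver={ODBC Driver 17 for SQL Server}",
        f"Server=tcp:{server},1433",
        f"Database={database}",
        "Encrypt=yes",
    ]
    if read_only:
        # Routes the session to a read-only replica when read scale-out is enabled.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"


# Hypothetical server and database names.
print(build_connection_string("example.database.windows.net", "reports", read_only=True))
```

Reporting jobs would use the read-only string, while regular transactional code keeps the default read-write intent.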
54. 54
Sharding
We may shard a database because:
It is too large to be stored in a single Azure SQL Database.
It is too much data to back up and restore in a reasonable amount of time.
Our customers require that their data is stored away from other customers.
Sharding involves rewriting a significant portion of our applications to
handle multiple databases.
Sharding is easily implemented in Azure Table Storage and Azure Cosmos
DB, but is significantly more difficult in a relational database like Azure SQL
Database. The complexity comes from being transactionally consistent while
having data available and spread throughout several databases.
55. 55
Sharding
Microsoft has released a set of tools called Elastic Database Tools that
are compatible with Azure SQL Database. This client library can be used in
your application to create sharded databases.
The main power of the Elastic Database Tools is the ability to fan out
queries across multiple shards without a lot of code changes.
56. 56
Sharding
When you use the Elastic client library, you deal with
shards, each of which is conceptually equivalent to a database.
This client library helps you with:
Shard map management creates a shard map
database for storing metadata about the mapping of
each tenant to its database, allowing you to register
each database as a shard.
Data-dependent routing allows you to select the
correct database based on the information that you
provide in the query for accessing the tenant’s data.
Multi-shard queries (MSQ) execute the same T-SQL
on all shards that participate in the query and return
the resultant data as the result of a UNION ALL.
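The three ideas above can be illustrated with a toy model. This is not the Elastic Database client library (which is a .NET and Java library); it is a sketch that uses in-memory SQLite databases as stand-in shards, and the `ShardMap` class, table, and shard names are all invented for the example.

```python
import sqlite3


class ShardMap:
    """Toy shard map: tracks which shard holds each tenant's data."""

    def __init__(self):
        self._shards = {}           # shard name -> database connection
        self._tenant_to_shard = {}  # tenant id  -> shard name

    def register_shard(self, name: str) -> None:
        """Shard map management: register a database as a shard."""
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE orders (tenant_id INTEGER, amount REAL)")
        self._shards[name] = conn

    def map_tenant(self, tenant_id: int, shard_name: str) -> None:
        if shard_name not in self._shards:
            raise KeyError(f"unknown shard: {shard_name}")
        self._tenant_to_shard[tenant_id] = shard_name

    def connection_for(self, tenant_id: int) -> sqlite3.Connection:
        """Data-dependent routing: pick the shard from the tenant key."""
        return self._shards[self._tenant_to_shard[tenant_id]]

    def multi_shard_query(self, sql: str, params=()) -> list:
        """Run the same query on every shard and concatenate rows (UNION ALL)."""
        rows = []
        for name in sorted(self._shards):
            rows.extend(self._shards[name].execute(sql, params).fetchall())
        return rows


shard_map = ShardMap()
shard_map.register_shard("shard_a")
shard_map.register_shard("shard_b")
shard_map.map_tenant(1, "shard_a")
shard_map.map_tenant(2, "shard_b")

shard_map.connection_for(1).execute("INSERT INTO orders VALUES (?, ?)", (1, 10.0))
shard_map.connection_for(2).execute("INSERT INTO orders VALUES (?, ?)", (2, 25.5))

all_orders = shard_map.multi_shard_query("SELECT tenant_id, amount FROM orders")
print(all_orders)  # [(1, 10.0), (2, 25.5)]
```

The application code never names a shard directly; it supplies a tenant key for single-tenant work and uses the fan-out query for cross-tenant reporting.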
57. 57
Azure SQL Data Sync
Synchronize data across multiple Azure SQL databases and
SQL Server instances, in one direction or bi-directionally.
Keep data up-to-date across all SQL databases
[Diagram: distributed applications, with three cloud apps and one on-prem app sharing synchronized data]
58. 58
Azure SQL Data Sync
SQL Data Sync is a new service for Azure SQL Database. It allows you to bi-directionally
replicate data between two Azure SQL Databases or between an Azure SQL Database and
an on-premises SQL Server.
A Sync Group is a group of databases that you want to synchronize using Azure SQL Data
Sync.
A Sync Schema is the data you want to synchronize.
Sync Direction allows you to synchronize data in either one direction or bi-directionally.
Sync Interval controls how often synchronization occurs.
Finally, a Conflict Resolution Policy determines which side wins when changes conflict.
The hub database must always be an Azure SQL Database. A member database can be either
an Azure SQL Database or an on-premises SQL Server.
This can be used to populate a read-only version of the database for reporting, but only if
the schema will be 100% consistent.
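The conflict resolution policy is easiest to picture as a small decision function. The sketch below assumes two policies, "hub wins" and "member wins"; the function name, policy strings, and row shape are hypothetical and only illustrate the idea.

```python
def resolve_conflict(hub_row: dict, member_row: dict, policy: str) -> dict:
    """Pick the surviving version of a row that changed on both sides."""
    if policy == "hub_wins":
        return hub_row
    if policy == "member_wins":
        return member_row
    raise ValueError(f"unknown conflict resolution policy: {policy}")


hub = {"id": 7, "status": "shipped"}
member = {"id": 7, "status": "cancelled"}
print(resolve_conflict(hub, member, "hub_wins"))     # {'id': 7, 'status': 'shipped'}
print(resolve_conflict(hub, member, "member_wins"))  # {'id': 7, 'status': 'cancelled'}
```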
59. 59
Azure SQL Data Sync
• All SQL databases supported
(SQL Server, SQL IaaS & Azure SQL
Database)
• Zero code required to enable data
synchronization among SQL databases
• Hub-and-Spoke Synchronization
technology
• One-way or bi-directional synchronization
• Table-level synchronization with
Column Filter
• Minute-level latency
62. 62
Azure SQL Data Sync
Data Sync vs. Active Geo-Replication
Data Sync
Pros
• Active-active support
• Sync selected tables and columns
• Sync between on-prem and Azure SQL Database
Cons
• 5 min or more latency
• No transactional consistency
• Higher performance impact
Active Geo-Replication
Pros
• Seconds-level latency
• Transactional consistency
• Auto failover with failover group
• Designed for DR or read-only scaling
Cons
• Non-writeable secondaries
• Replicates the entire database
• Secondary must use same edition
63. 63
Azure SQL Data Sync
Data Sync vs. Transactional Replication
Data Sync
Pros
• Active-active support
• Bi-directional between on-prem and Azure SQL Database
Cons
• 5 min or more latency
• No transactional consistency
• Higher performance impact
Transactional Replication
Pros
• Lower latency
• Transactional consistency
• Designed for on-prem to Azure DB replication or migration
Cons
• On-prem/Azure SQL VM to Azure SQL Database only
• High maintenance cost
64. 64
Azure SQL Data Sync
Data Sync vs. SSIS
Data Sync
Pros
• Easy configuration
Cons
• Transformation is not supported
SSIS
Pros
• Supports transformation
• Supports more types of sources and destinations
• Designed for ETL
Cons
• Domain knowledge required
• Needs extra hosted services (VM or SSIS PaaS)
• Needs additional change tracking technologies
65. 65
SQL Server Stretch Database
SQL Server Stretch Database migrates your cold data securely and
transparently to Azure.
The main advantage of this solution is that your data is always online, and
you do not need to change any query, configuration, or line of code in
your application to work with SQL Server Stretch Database.
Since you are moving your cold data to the cloud, you reduce your need
for high-performance storage for the on-premises database servers.
You can migrate full tables or just parts of online tables by using a filtering
function.
66. 66
SQL Server Stretch Database
Creates a secure connection between the
source SQL Server and Azure
Provisions a remote instance and begins
migration
Apps and queries continue to run for both
the local database and remote endpoint
Security controls and maintenance remain
local
Available in all editions of SQL Server 2016
[Diagram: SQL Server 2016 on the on-premises network holds hot and cold data; Stretch Database moves cold data to Azure PaaS]
67. 67
SQL Server Stretch Database
Compute is billed in Database Stretch Units (DSUs); storage is billed at Standard Disk rates.
71. 71
Migration to Azure SQL Database
Migration with downtime during the migration
*Rather than using DMA, you can also use a BACPAC file.
See Import a BACPAC file to a new Azure SQL Database.
78. SEAMLESS CLOUD INTEGRATION
Easy lift-and-shift, integrate and
distribute
Active Geo-replicas “data CDN” for your edge
deployments
SQL Azure Data Sync v2 synchronize data
across distributed and occasionally connected
applications
Azure SQL Database Managed Instance
facilitates lift and shift migration from on-
premises SQL Server to cloud
Azure Hybrid Benefit for SQL Server
maximizes current on-premises license
investments to facilitate migration
Database Migration Service (DMS)
provides seamless and reliable migration at scale
with minimal downtime
Most consistent data platform
[Diagram labels: Database Migration Service (DMS), Azure SQL Database Managed Instance, Azure Hybrid Benefit (AHB) for SQL Server, SQL Server, Managed SSIS in Azure, Azure SQL Database]
79. 79
Graph Database
SQL Server 2017 introduces a new graph database feature.
Graph databases are yet another NoSQL solution.
Graph databases introduce two new vocabulary words: nodes and relationships.
Nodes are entities in relational database terms. Each node is typically a noun, like a person, an
event, an employee, a product, or a car. A relationship is similar to a relationship in SQL Server in
that it defines that a connection exists between nouns.
A key difference between a relational storage engine and a graph database storage engine is
that in a graph engine, the performance cost stays the same as the number of nodes increases.
Graph databases are commonly traversed through a domain-specific language (DSL) called
Gremlin. In Azure SQL Database, graph-like capabilities are implemented through T-SQL.
DDL Extensions – create node/edge tables
Query Language Extensions – New built-in: MATCH, to support pattern matching and
traversals
80. 80
What is a Graph?
[Diagram: an Attendee node connected to a Session node by an "attends" edge]
• A graph is a collection of Nodes and Edges
– Nodes: Entities – for example
customer, supplier, product
– Edges: Relationships that various
entities share with each other
– Properties: Node or Edge attributes
81. 81
Why Graph Databases?
Hierarchical or interconnected
data, entities with multiple
parents.
Analyze interconnected data,
materialize new information
from existing facts. Identify non-
obvious connections
Complex many-to-many
relationships. One relation
flexibly connecting multiple
entities.
[Diagram: organization graph of people (John, Mary, Alice, Shaun, Jacob, Jerry, Natalie, Bob) connected by "leads" and "manages" edges]
82. 82
Our approach – Embrace and Extend
Backed by Research
References
J. Fan, A. Gerald, S. Raj and J. M. Patel,
"The case against specialized graph
analytics engines," in CIDR, Asilomar,
CA, 2015.
A. Jindal, S. Madden, M. Castellanos
and M. Hsu, "Graph analytics using
vertica relational database," in IEEE
BigData, Santa Clara, CA, 2015
Matured Product
40+ years of academic and
industry research.
Highly evolved ecosystem,
including tooling and
community support
Build on-prem, cloud,
Hybrid Solutions
Best of both relational
and graph database on a
single platform
Trusted
Used and trusted by
millions of customers for
enterprise and mission
critical workloads.
83. 83
DDL Extensions
CREATE NODE
CREATE TABLE [dbo].[Attendee](
[Attendee_Id] [uniqueidentifier] PRIMARY KEY,
[Attendee_FName] varchar(100),
[Attendee_LName] varchar(100)
) AS NODE
GO
SELECT TOP 5 * FROM Attendee;
84. 84
DDL Extensions
CREATE TABLE attends (Rating integer) AS EDGE;
CREATE TABLE [from] AS EDGE;
CREATE EDGE
SELECT TOP 5 * FROM [from];
85. 85
Query Language Extensions
• Multi-hop navigation and join-free pattern matching using MATCH
predicate
• ASCII-art syntax to facilitate graph traversal
SELECT
    att.Attendee_FName AS AttendeeName,
    s.Session_Name AS SessionName
FROM
    attends a,
    Attendee att,
    Session s
WHERE
    MATCH (att-(a)->s)
    AND s.Session_Name = 'Graph extensions in Microsoft SQL Server 2017 and Azure SQL Database'
86. 86
Relational vs. Graph
Graph and relational designs can answer the same questions.
But if traversal of relationships defines the primary application requirements,
Graph can solve this more intuitively and with less code.
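To make the comparison concrete, here is the relational side of the attendee and session question from the Query Language Extensions slide, written as explicit joins. It is a sketch that uses Python's built-in SQLite rather than SQL Server; the sample rows are invented, and the edge is modeled as an ordinary junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Attendee (Attendee_Id INTEGER PRIMARY KEY, Attendee_FName TEXT);
    CREATE TABLE Session  (Session_Id  INTEGER PRIMARY KEY, Session_Name  TEXT);
    -- The "attends" edge becomes a junction table with two foreign keys.
    CREATE TABLE attends  (Attendee_Id INTEGER, Session_Id INTEGER, Rating INTEGER);
    INSERT INTO Attendee VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO Session  VALUES (10, 'Graph extensions'), (11, 'Backups');
    INSERT INTO attends  VALUES (1, 10, 5), (2, 11, 4), (2, 10, 3);
""")


def attendees_of(session_name: str) -> list:
    """Relational equivalent of matching Attendee-(attends)->Session."""
    rows = conn.execute(
        """
        SELECT a.Attendee_FName
        FROM Attendee AS a
        JOIN attends  AS e ON e.Attendee_Id = a.Attendee_Id
        JOIN Session  AS s ON s.Session_Id  = e.Session_Id
        WHERE s.Session_Name = ?
        ORDER BY a.Attendee_FName
        """,
        (session_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(attendees_of("Graph extensions"))  # ['Ana', 'Ben']
```

Each additional hop in a traversal adds another pair of joins in this style, which is where a pattern-matching syntax tends to read more naturally.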
87. 87
Graph Database Scenarios
Recommendation Systems
Fraud Detection
Content Management
Bill of Materials, product hierarchy
CRM
88. 88
Automatic Tuning
• One-click to enable
• Prevent and mitigate
performance issues
• No app changes needed
• Tuning actions
Create missing indexes
Drop unused/duplicate indexes
Force last good plan
94. 94
Intelligent Insights
• Continuous monitoring
• Disruptive event detection
• Root cause analysis
• Available as diagnostic log
Azure SQL Analytics solution
Stream to Event Hub
Archive to Storage
Root-cause: Hitting resource limits caused by new ad-hoc query 0X9001RTYU. Impacted query 0X9002FGJR started
timing out. Consider stopping the ad-hoc query or increasing your pricing tier.
Disruptive
event
Queries:
0X9003HA4J OK
0X9002FGJR Regressed query
0X901119GI OK
0X900044RJ OK
100. 100
Query Performance Insight
Query Performance Insight allows you to spend less time troubleshooting database
performance by providing the following:
Deeper insight into your database's resource (DTU) consumption.
The top queries by CPU/Duration/Execution count, which can potentially be tuned
for improved performance.
The ability to drill down into the details of a query, view its text and history of
resource utilization.
Performance tuning annotations that show actions performed by SQL
Database Advisor.
*Query Performance Insight requires that Query Store is active
on your database. If Query Store is not running, the portal
prompts you to turn it on.
107. 107
Automated discovery and
classification of sensitive data
Labeling (tagging) sensitive data on
column level with persistency
Audit access to sensitive data
Visibility through dashboards and
reports
Hybrid cloud + on-premises
115. 115
Detects suspicious database activities
Just turn it ON
Detects potential
vulnerabilities and SQL
injection attacks
Detects unusual behavior
activities
Actionable alerts which
recommend how to
investigate & remediate
[Diagram: apps connect to Azure SQL Database; the audit log feeds Threat Detection]
(1) Turn on Threat Detection
(2) Possible threat to access / breach data
(3) Real-time actionable alerts
*It costs $15/server/month; the first 60 days are free.
121. 121
Service Endpoint
Restrict Access to the DB
from VMs in a given
VNET/Subnet
Separation of duties between network
admin and DB admin
Simplify management of VIPs and
firewall rules;
Server-level configuration
available for SQL Database, SQL Data
Warehouse
126. 126
[Diagram: Azure data platform architecture]
Orchestration: AZURE DATA FACTORY
Key Management: AZURE KEY VAULT
Private Connections: AZURE EXPRESSROUTE
Monitoring: OPERATIONS MANAGEMENT SUITE
Other services shown: AZURE SQL DATA WAREHOUSE, AZURE MACHINE LEARNING & MACHINE LEARNING SERVER, AZURE DATA LAKE STORE, AZURE DATA LAKE ANALYTICS, COSMOS DB, WEB & MOBILE APPS, AZURE STREAM ANALYTICS, Power BI, COGNITIVE SERVICES, BOT SERVICE, Logic App, AZURE ANALYSIS SERVICES
127. 127
SMP vs. MPP Architecture
VS
Scale-up Scale-out
Symmetric Multi-Processing (SMP) vs. Massively Parallel Processing (MPP)
133. 133
Azure SQL Data Warehouse
Azure SQL Data Warehouse offers two different performance tiers:
Optimized for Elasticity: On this performance tier, storage and compute are in
separate architectural layers. This tier is ideal for workloads with heavy peaks of
activity, allowing you to scale the compute and storage tiers separately
depending on your needs.
Optimized for Compute: Microsoft provides you with the latest hardware for
this performance tier, using NVMe Solid State Disk cache. This way, the most
recently accessed data is kept as close as possible to the CPU. This tier provides
the highest level of scalability, with up to 30,000 compute Data
Warehouse Units (cDWU).
139. 139
How to choose your performance tier
                        Elasticity                Compute
Current status          Generally available       Preview in fall
Regional availability   33                        6 (growing over time)
Entry pricing           $1.21 / hour              $6.05 / hour (preview rate)
Starting scale point    100 DWUs                  1000 cDWUs
Max compute scale       6,000 DWUs                30,000 cDWUs
Max storage             240 TB (compressed)       Unlimited (columnar)
Use of elasticity       Dynamic “burst” scaling   Incremental scaling
Min memory per query    6 GB                      15 GB
Language surface area   Same                      Same
140. 140
Hash-distributed tables
A hash distributed table can deliver the highest
query performance for joins and aggregations on
large tables.
141. 141
Round-robin distributed tables
A round-robin table is the simplest table to create and delivers
fast performance when used as a staging table for loads.
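The difference between the two distribution styles can be sketched as two placement functions. This is an illustration only: SQL Data Warehouse spreads each table across 60 distributions, but the CRC32 hash here is a stand-in chosen for determinism, not the hash function the service uses, and the function names are invented.

```python
import zlib
from collections import Counter

DISTRIBUTIONS = 60  # each table is spread across 60 distributions


def hash_distribution(key: str, n: int = DISTRIBUTIONS) -> int:
    """Hash-distributed: rows with the same key always land together."""
    return zlib.crc32(key.encode("utf-8")) % n  # stand-in hash for illustration


def round_robin_distribution(row_number: int, n: int = DISTRIBUTIONS) -> int:
    """Round-robin: rows are dealt out evenly, regardless of their content."""
    return row_number % n


# Same customer key, same distribution: joins and aggregations on it stay local.
print(hash_distribution("customer-42") == hash_distribution("customer-42"))  # True

# Round-robin gives an even spread, which suits fast staging loads.
spread = Counter(round_robin_distribution(i) for i in range(120))
print(set(spread.values()))  # {2}
```

Hash distribution pays off when large tables are joined or aggregated on the distribution column, while round-robin avoids choosing a column at all and so is the simplest option for staging.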
144. 144
Data Migration Recommendations
Data Format Conversion
• Date Format, Field delimiters, escaping, field order, encoding
Compression
• Use Gzip, ORC, Parquet
• 7-Zip utility, .NET/JAVA libraries
Export
• BCP for fast export
• Multiple files per large table, one folder per table
Copy
• AZCopy
• Data Movement Library
Tips
• Incorrect format means migration
needs to be entirely repeated
• Exploit bcp options, hints, parallelism
• Multiple compressed files, Split files
• Parallel import, reliable transfer
• Don’t put multiple files in the same
gzipped file
• Efficient Copy
• Parallel, Async, Resumable
• Limit concurrent copies if low
bandwidth
• Very Large Data transfer
• Express Route, Import/Export Service
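The export tips above (multiple files per large table, one folder per table, one file per gzip archive) can be sketched as a small helper. This is an assumption-laden illustration: `split_and_compress`, the file-naming pattern, and the sample rows are invented, and real exports would typically stream from bcp output rather than an in-memory list.

```python
import gzip
import os
import tempfile


def split_and_compress(lines, out_dir, table, rows_per_file):
    """Write rows for one table as several gzip files inside a per-table folder."""
    table_dir = os.path.join(out_dir, table)  # one folder per table
    os.makedirs(table_dir, exist_ok=True)
    paths = []
    for start in range(0, len(lines), rows_per_file):
        # Each gzip archive holds exactly one file's worth of rows.
        path = os.path.join(table_dir, f"{table}_{len(paths):04d}.txt.gz")
        with gzip.open(path, "wt", encoding="utf-8", newline="\n") as f:
            f.writelines(line + "\n" for line in lines[start:start + rows_per_file])
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as export_dir:
    rows = ["1|2017-01-06|widget", "2|2017-01-07|gadget", "3|2017-01-08|gizmo"]
    files = split_and_compress(rows, export_dir, "sales", rows_per_file=2)
    print([os.path.basename(p) for p in files])  # ['sales_0000.txt.gz', 'sales_0001.txt.gz']
```

Splitting a table into several compressed files lets the copy and import steps run in parallel instead of waiting on one large archive.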
145. 145
Data Loading Recommendations
PolyBase and SSIS (with the 2017 Azure Feature Pack) are the fastest loading methods
• Upload to Blob storage via AzCopy or the PowerShell library
• Historical load – use CTAS
• Incremental – use INSERT…SELECT
Use the highest resource class (without sacrificing concurrency)
Increase DWU during load, decrease when done
PolyBase now supports UTF-16 file types. ADLS as a source and target is also supported
Known Issues:
• Does not support extended ASCII
• Does not support custom multi-date formats, e.g. 2000-1-6
• No reject files/reason for rejected rows.
147. 147
Azure SQL Data Warehouse
Target workload: Analytics (OLAP)
Store large volumes of data
Consolidate disparate data into a single location
Shape, model, transform and aggregate data
Perform query analysis across large datasets
Ad-hoc reporting across large data volumes
All using simple SQL constructs
148. 148
Azure SQL Data Warehouse
Unsuitable workloads
Operational workloads (OLTP)
High frequency reads & writes
Large numbers of singleton selects
High volume of single row inserts
Data Preparation
Row by row processing needs
Incompatible formats (JSON, XML)
Sourced from General vNext goals slide “2% of Linux on-premises DB market ~$150M”
http://www.bloomberg.com/news/articles/2016-03-07/microsoft-plans-linux-database-in-bid-to-win-sales-from-oracle
Mark R. Murphy
Satya, regarding the announcement that you will release your SQL Server database on the Linux platform, I was wondering if you can walk us through your decision tree just in terms of what you think the potential risks are and what you think the potential rewards are of reaching for that level of openness, if you will. And just how impactful do you think that, that product can be in enhancing Microsoft's share of the database market?
Satya Nadella
Thanks for the question. So the decision logic was driven primarily by what I'd say the increased competitiveness of SQL Server. If you think about where SQL Server is now with this new release, SQL Server 2016, it's become a fantastic database for many, many of the workloads, everything from OLTP to data warehousing to BI to advanced analytics. For the Tier 1, this is a capability that's been multiple decades in the work, but here we are with very competitive total cost of ownership, price competitiveness but with a technology that is, in many cases, as Gartner talks about, at the top of the charts when it comes to all of these workloads. So now that we find ourselves with that capability, we're saying, "Look, what's the way to think about market -- all the markets that we can, in fact, take this product to." And the Linux operating system database market, which is mostly primarily a Tier 1 segment, is something that we never worked in. And so, therefore, we look at that as an expansion opportunity so we take that. We've already made the call that in Azure, Linux is first class. We already have 20-plus points of -- or 20-plus percent of VMs in Azure are Linux and we'll increasingly have Linux be a big share of percentage of what is happening in Azure. So for the first time now, we have the ability to go to an enterprise and talk about that entire data estate across Windows and Linux. People don't really move between operating systems. Those choices have been made. But at the same time, now they have a choice around database. And so we think that, that's a very good incremental opportunity for us.
Next steps: create SQL Server vNext slide once messaging finalized
Current status: messaging workstream with Sydney Davis
Planned pillars: new "platform of choice" pillar to supplement existing pillars
Notes: “Any data” may be overselling; won’t have some capabilities at Public Preview but will at GA
Title: SQL Server - The platform of choice
Any data
Access diverse data, including video, streaming, documents, relational, both external data and data internal to your org
Use PolyBase to access Hadoop big data and Azure Blob storage with the simplicity of T-SQL
You can use Azure DocumentDB, a NoSQL document database service, for native JSON support and JavaScript built directly inside the database engine
Any application
Leverage the t-SQL skills of your talent base to run advanced analytics through R models, and to access structured and unstructured data
Take advantage of Microsoft–created database connectivity drivers and open-source drivers that enable developers to build any application using the platforms and tools of their choice, including Python, Ruby, and Node.js
Anywhere
Flexible on-premises and cloud
Easily back up to the cloud
You can now migrate a SQL Server workload to Azure SQL DB. The parity is there, and the notion that SQL Server doesn’t map to Azure SQL DB no longer holds
Keep more historical data at your fingertips by dynamically stretching tables to the cloud with Stretch Database.
Choice of platform
Aligns to your operating system environment. Today, SQL Server is on Windows/Windows Server; it will also be on Ubuntu Linux, and we are targeting additional platforms, including Red Hat Linux
Benefit from continued integration with Windows Server for industry-leading performance, scale and virtualization on Windows.
Note: Tux penguin image created by Larry Ewing
Brand new feature – that we’re announcing is in Public Preview *today*!!
Beginnings we saw in VA - expanding to more comprehensive solution
This is a VITAL element of GDPR/ data privacy story - data discovery + classification –
We help you by automatically discovering sensitive data.
You can label it with classifications – and the metadata is persisted in the DB!
This enables management, visibility. Audit access. Track sensitive data when it leaves DB boundaries. The persistent label will be identified by external apps to handle accordingly, e.g. encrypt.
Manage the policy ACROSS Azure – for all your data! In ASC! Classification framework integrated with MIP for holistic MS data classification story.
It can serve as infrastructure for:
Helping meet data privacy standards and regulatory compliance requirements.
Various security scenarios, such as monitoring (auditing) and alerting on anomalous access to sensitive data.
Controlling access to and hardening the security of databases containing highly sensitive data.
Data Discovery & Classification introduces a set of advanced services and new SQL capabilities, forming a new SQL Information Protection paradigm aimed at protecting the data, not just the database:
Discovery & recommendations – The classification engine scans your database and identifies columns containing potentially sensitive data. It then provides you an easy way to review and apply the appropriate classification recommendations via the Azure portal.
Labeling – Sensitivity classification labels can be persistently tagged on columns using new classification metadata attributes introduced into the SQL Engine. This metadata can then be utilized for advanced sensitivity-based auditing and protection scenarios.
Query result set sensitivity – The sensitivity of query result set is calculated in real time for auditing purposes.
Visibility - The database classification state can be viewed in a detailed dashboard in the portal. Additionally, you can download a report (in Excel format) to be used for compliance & auditing purposes, as well as other needs.
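As a rough illustration of the labeling and result-set sensitivity ideas above, the sketch below tags columns with labels and derives a sensitivity for a query result from the columns it returns. The notes do not say how the engine calculates result-set sensitivity; taking the highest-ranked label among the returned columns is an assumption made here, and the label names, their ordering, and the table and column names are all hypothetical.

```python
# Hypothetical label names and ordering (not taken from the notes).
LABEL_RANK = {"Public": 0, "General": 1, "Confidential": 2, "Highly Confidential": 3}

# Hypothetical classification metadata: (table, column) -> label.
COLUMN_LABELS = {
    ("Customers", "Email"): "Confidential",
    ("Customers", "CreditCardNumber"): "Highly Confidential",
    ("Customers", "Country"): "General",
}


def result_set_sensitivity(returned_columns, labels=COLUMN_LABELS):
    """Return the highest-ranked label among the returned columns.

    Assumption: the result-set sensitivity is the maximum label of any
    classified column in the result. Unclassified columns are ignored,
    and None is returned when no returned column carries a label.
    """
    found = [labels[col] for col in returned_columns if col in labels]
    return max(found, key=LABEL_RANK.__getitem__, default=None)


if __name__ == "__main__":
    query_columns = [("Customers", "Country"), ("Customers", "Email")]
    print(result_set_sensitivity(query_columns))
```

An audit record could then carry this value so that access to the most sensitive data can be filtered or alerted on.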
RON
SQL Vulnerability Assessment is our newest intelligent security feature, which was just released to Public Preview
It provides you visibility into the security state of your databases and allows you to constantly track and improve it over time
It is a built-in security feature in Azure SQL Database and it is also available using the latest SQL Server Management Studio (for SQL OnPrem or SQL on VM)
2) In short, SQL Vulnerability Assessment runs a set of security checks which
Discover sensitive data which is not protected
Identify security misconfigurations that leave your database vulnerable to attack
In addition, it provides a clear report which is very helpful for security audits.
It can help you:
Meet compliance requirements that require database scan reports.
Meet data privacy standards.
Monitor a dynamic database environment where changes are difficult to track.
RON
The second intelligent security feature that I would like to share with you is SQL Threat Detection
It is also a built-in feature in Azure SQL Database, which detects anomalous database activities indicating unusual and potentially harmful attempts to breach the database
1) It is super simple to enable it using Azure portal or standard API and requires no modifications to your application code
2) It provides you a set of world-class algorithms that learn, profile and detect potential SQL injections and unusual behavior patterns
3) It triggers an immediate email & portal alert upon detection, which includes a clear description and actionable investigation and remediation steps
Vulnerability to SQL Injection: This alert is triggered when an application generates a faulty SQL statement in the database. This may indicate a possible vulnerability to SQL injection attacks. There are two possible reasons for the generation of a faulty statement:
A defect in application code that constructs the faulty SQL statement
Application code or stored procedures don't sanitize user input when constructing the faulty SQL statement, which may be exploited for SQL Injection
Potential SQL injection: This alert is triggered when an active exploit happens against an identified application vulnerability to SQL injection. This means the attacker is trying to inject malicious SQL statements using the vulnerable application code or stored procedures.
Access from unusual location: This alert is triggered when there is a change in the access pattern to SQL server, where someone has logged on to the SQL server from an unusual geographical location. In some cases, the alert detects a legitimate action (a new application or developer maintenance). In other cases, the alert detects a malicious action (former employee, external attacker).
Access from unusual Azure data center: This alert is triggered when there is a change in the access pattern to SQL server, where someone has logged on to the SQL server from an unusual Azure data center that was seen on this server during the recent period. In some cases, the alert detects a legitimate action (your new application in Azure, Power BI, Azure SQL Query Editor). In other cases, the alert detects a malicious action from an Azure resource/service (former employee, external attacker).
Access from unfamiliar principal: This alert is triggered when there is a change in the access pattern to SQL server, where someone has logged on to the SQL server using an unusual principal (SQL user). In some cases, the alert detects a legitimate action (new application, developer maintenance). In other cases, the alert detects a malicious action (former employee, external attacker).
Access from a potentially harmful application: This alert is triggered when a potentially harmful application is used to access the database. In some cases, the alert detects penetration testing in action. In other cases, the alert detects an attack using common attack tools.
Brute force SQL credentials: This alert is triggered when there is an abnormally high number of failed logins with different credentials. In some cases, the alert detects penetration testing in action. In other cases, the alert detects a brute force attack.
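To make the two SQL injection alerts above concrete, here is a minimal sketch of the application-side defect they describe: a statement built by concatenating user input, next to a parameterized version. It uses Python's built-in sqlite3 module as a stand-in for any SQL database; the table, the function names, and the sample inputs are hypothetical and are not part of SQL Threat Detection itself.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a small hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: user input is concatenated straight into the statement text.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: the value is passed separately from the SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "x' OR '1'='1"

    # The injected condition is always true, so every row comes back.
    print(find_user_unsafe(conn, payload))
    # The same input treated as a plain value matches nothing.
    print(find_user_safe(conn, payload))

    # A stray quote produces a faulty statement, the kind of error the
    # "Vulnerability to SQL Injection" alert is described as reacting to.
    try:
        find_user_unsafe(conn, "o'brien")
    except sqlite3.OperationalError as exc:
        print("faulty statement:", exc)
```

Parameterized statements remove the defect at its source, which is why the alert text points at application code and stored procedures rather than at the database.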
Only one geographic region
Server-level, not database-level
Add key for the colours
De-coupled storage from compute & control
Completely elastic
Pay for the data you store and the compute you provision
Data storage and snapshots
Data storage is charged based on Azure Premium Storage rates of €125.39/1 TB/month (€0.18/1 TB/hour). Data storage includes the size of your data warehouse and 7 days of incremental snapshot storage.
Note—Storage transactions are not billed. You only pay for stored data and not storage transactions.
Geo-redundant disaster recovery
Your data warehouse is copied to geo-redundant storage for disaster recovery. Storage for geo-redundant copies is billed at the Azure Standard Disk read-access geo-redundant storage rate of €0.102/GB/month.
Compute is billed at €930.87/100 DWUs/month, unless the data warehouse is paused. Storage is billed at €125.39/1 TB/month.
You cannot opt out of snapshots, as this capability provides your data warehouse with data loss and corruption protection.
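The pricing figures above can be combined into a rough monthly estimate. The sketch below is only an illustration using the rates quoted in these notes; the function name and the `paused_fraction` parameter are hypothetical, and treating compute as prorated by the share of the month the warehouse is paused is an assumption, since the notes only say compute is not billed while paused.

```python
# Rates quoted in the notes (EUR).
COMPUTE_EUR_PER_100_DWU_MONTH = 930.87
STORAGE_EUR_PER_TB_MONTH = 125.39
GEO_EUR_PER_GB_MONTH = 0.102


def monthly_cost_eur(dwu, data_tb, geo_copy_gb, paused_fraction=0.0):
    """Estimate a monthly bill from compute, storage, and geo-redundant copies.

    paused_fraction is the share of the month (0.0 to 1.0) the warehouse is
    paused. Assumption: compute scales linearly with DWU and with unpaused time.
    """
    if not 0.0 <= paused_fraction <= 1.0:
        raise ValueError("paused_fraction must be between 0 and 1")
    compute = COMPUTE_EUR_PER_100_DWU_MONTH * (dwu / 100) * (1 - paused_fraction)
    storage = STORAGE_EUR_PER_TB_MONTH * data_tb
    geo = GEO_EUR_PER_GB_MONTH * geo_copy_gb
    return compute + storage + geo


if __name__ == "__main__":
    # 200 DWU, 2 TB of data, a 1000 GB geo-redundant copy, never paused.
    print(round(monthly_cost_eur(200, 2, 1000), 2))
    # Same warehouse paused for half the month: only compute shrinks.
    print(round(monthly_cost_eur(200, 2, 1000, paused_fraction=0.5), 2))
```

Storage and the geo-redundant copy stay on the bill even when compute is paused, which is the practical effect of decoupling storage from compute.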
DWU: In essence, DWU is a function of memory, CPU and concurrency. The basic DWU level, DW100, can have up to 24 GB of RAM, with lower concurrency
1 DWU is approximately 7.5 DTU (Database Throughput Unit, used to express the horsepower of an OLTP Azure SQL Database) in capacity, although they are not exactly comparable.
To calculate your DTU needs, multiply 7.5 by the total DWU needed, or multiply 9.0 by the total cDWU needed.
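The conversion rule above is simple arithmetic; a minimal sketch follows. The function name is hypothetical, and the 7.5 and 9.0 factors are the approximate ratios given in these notes, so the result is a rough estimate rather than an exact equivalence.

```python
# Approximate conversion factors from the notes.
DTU_PER_DWU = 7.5
DTU_PER_CDWU = 9.0


def approx_dtu(dwu=0, cdwu=0):
    """Approximate DTU equivalent of a given number of DWU and/or cDWU."""
    return DTU_PER_DWU * dwu + DTU_PER_CDWU * cdwu


if __name__ == "__main__":
    print(approx_dtu(dwu=100))    # 7.5 * 100
    print(approx_dtu(cdwu=100))   # 9.0 * 100
```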