SQL and NoSQL in the Context of SQL Server: Understanding Data Models and Scaling Techniques

Michael Rys
Program Manager, Microsoft Corp.
@SQLServerMike

Key Session Takeaways

 Scaling your Business is important
 What are the NoSQL paradigms
 You can use NoSQL Paradigms with SQL Server and SQL Azure
 We are working on moving the paradigms into SQL Server

The Web 2.0 Business Architecture

Online Business Application

Attract Individual Consumers:
- Provide interesting service
- Provide mobility
- Provide social

Monetize Individual:
- Upsell service
- VIP
- Speed
- Extra Capabilities

Monetize the Social:
- Improve individual experience
- Re-sell Aggregate Data (e.g., Advertisers)

Social Networking: the Business Problem
 100s of millions of users
 10s of millions of users concurrently
 Terabytes to petabytes of data
 Structured and unstructured
 Required (eventual) data consistency across users
 E.g. show your updated state in your friends’ profile pages

Solution
 Shard/Partition user data across hundreds to thousands of SQL Databases
 Propagate data changes using a reliable, async Message Service
 No global transactions! They hinder scale and availability!
 Provide a caching layer for performance
 The Message Service is also used to:
   Clean up state (e.g. on account close)
   Deploy business logic (stored procedures)
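
The solution above can be sketched in a few lines of Python. This is a minimal illustration, not the production design: the six shard ranges mirror the user-ID ranges in the example architecture, plain dictionaries stand in for SQL databases, and an in-process `queue.Queue` stands in for the reliable async message service (a real one would be durable). The names `shard_for`, `update_status` and `dispatch_pending` are hypothetical.

```python
import bisect
import queue

# Hypothetical layout: six shards, each owning a contiguous range of 1000 user IDs.
SHARD_UPPER_BOUNDS = [1000, 2000, 3000, 4000, 5000, 6000]
shards = [dict() for _ in SHARD_UPPER_BOUNDS]  # each dict stands in for one database
outbox = queue.Queue()  # stand-in for the message service (not durable here)


def shard_for(user_id):
    """Return the index of the shard that owns user_id."""
    if not 1 <= user_id <= SHARD_UPPER_BOUNDS[-1]:
        raise ValueError(f"user_id {user_id} is outside every shard range")
    return bisect.bisect_left(SHARD_UPPER_BOUNDS, user_id)


def update_status(user_id, status, friend_ids):
    """Write to the owner's shard only, then queue one message per friend."""
    shards[shard_for(user_id)][("status", user_id)] = status
    for friend_id in friend_ids:
        outbox.put((friend_id, user_id, status))


def dispatch_pending():
    """Deliver queued messages; each delivery is a separate local write."""
    delivered = 0
    while not outbox.empty():
        friend_id, user_id, status = outbox.get()
        shards[shard_for(friend_id)][("friend_status", friend_id, user_id)] = status
        delivered += 1
    return delivered
```

No write ever spans two shards, so no global transaction is needed. The friends' shards see the new status only after the dispatcher runs, which is the eventual consistency the business problem allows.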

Example Architecture (MySpace.com)

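[Diagram: Web Tier and Data Tier. The Data Tier is sharded by user ID range (1-1000, 1001-2000, 2001-3000, 3001-4000, 4001-5000, 5001-6000). In the Web Tier a user (userId=1024) changes their status and their own DB gets updated. An Async Message Service with a Dispatcher then sends async messages to the other shards, each applied in its own transaction (TX1 to TX5).]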

Many Large Scale Customers using Similar Patterns

 Patterns
 Sharding and reliable messaging
 Sharding and fan-out query layer
 Caching layer

 Customer Examples
 Social Networking: Facebook, MySpace, etc
 Online electronic stores (cannot give names)
 Travel reservation systems (e.g. Choice International)
 MSN Casual Gaming
 etc.

Lessons Learned from these Scenarios

 Require high availability
 Be able to scale out
 Functional and Data Partitioning Architecture
 Provide scale-out processing
 Be able to deal with failures
 Be able to quickly grow and change
 Elastic scale
 Flexible, open schema
 Multi-version schema support

Move better support for these patterns into the Data Platform!

What is NoSQL about?
 NoSQL = operational and developer agility at low CapEx and OpEx!
 Low Cost
 Free Software and Support
 Scale CapEx cost below customer growth rate
 Web friendly developer model and tool chain, Easy to use
 Processing Paradigms
 High Availability
 Data and Processing Scale-out
 Performance
 Tunable/Eventual Consistency
 Data Model Paradigms
 Data first: Flexible Schema
 Low-impedance mismatch between programming and data model
From devices, over OLTP Web 2.0 applications to BigData Analytics

Data Models
Data Model | Example Stores
Simple Key-Value Pairs | Memcache, Redis, Dynamo, Voldemort, LevelDB, Azure Caching
Wide Sparse Column Sets | HyperTable, Big Table, Cassandra, HBase, Hyperbase, Amazon DynamoDB, Windows Azure Tables, SQL Server/Azure Sparse columns
BLOBs | Amazon S3, Oracle Berkeley NoSQL, Windows Azure Blob Store, SQL Server RBS/FileTable
JSON Documents | MongoDB, CouchBase, Riak, RavenDB
Graph | Neo4J, GraphDB, HypergraphDB, Stig, Intellidimension
Objects and XML Documents | Versant, Oracle Berkeley NoSQL, MarkLogic, existDB, EMC HiveDB, SQL Server/Azure, Oracle, IBM DB2
Extended Relational | Oracle, EMC SQLFire, IBM DB2, MySQL, Postgres, SQL Server/Azure/Parallel DW

Operational Agility
 You want:
 Availability of service (scalability)
 Global consistency
 Network Partition Tolerance
 You can only get 2 of 3 (CAP Theorem)
 In Brave New World:
 Online businesses need availability
 It is distributed, because it is big
 thus Network Partitioning is unavoidable
 Hence global consistency must be relaxed
→ BASE vs ACID

BASE vs ACID Consistency
 ACID: Atomicity, Consistency, Isolation, Durability
 Full Serializability provides all 4
 Distributed transactions providing all 4 limit service availability, throughput and scalability
 BASE: Basically Available, Soft state, Eventual consistency
 Relaxes ACID properties to increase availability, throughput and scalability
 Replica consistency:
 Impacts recoverability
 Cross-node consistency:
 Impacts globally consistent view of the world
[Diagram: two Primaries, each with Replicas]
Operational Agility
 Performance and Scale
 Automate management lifecycle (or fail)
 Simple deployment lifecycle
 No DB or OS Admin telling me what to do

Developer Agility

 Code First and revise quickly
 Application-model first (before database)
 Flexible open data models
 You don’t know exactly what you are looking for
 Lower Pain of adoption and maintenance
 No DB or OS Admin telling me what to do

NoSQL and BigData: Two sides of the same coin

 BigData:
 Origin: large unstructured data processing (sensor data, scientific research, web stream analysis)
 Analytics focused (“new” OLAP, Map-Reduce, Hadoop)
 Scale-out data and processing paradigm at low cost
 NoSQL:
 Origin: developing agile, scalable web applications
 Realtime customer transaction focused (“new” OLTP)
 Scale-out data and processing paradigm with flexible data model at low cost
 Both use many of the same paradigms

Scale-Out Data PLATFORM Architecture

[Diagram: data is partitioned into Primary Shards, each with Readable Replicas, and a Shard Query layer. Three kinds of workload run on the platform:]
 OLTP Workloads: Highly Available, High Scale, High Flexibility, mostly touching 1 to a low number of shards
 Traditional OLAP Workloads: known schema, Data warehouse, “Star joins”
 Dynamic OLAP Workloads: 3Vs (Volume, Velocity, Variety), Exploratory, Scale-out queries, often using eventually consistent scale-out frameworks like Hadoop

What does SQL Server provide today?
 Scale-programming models
 Service Broker provides:
 Functional, service-oriented architecture
 Scale out on demand
 Async reliable messaging provides for true eventual consistency
 SQL Azure Federations provides Sharding support
 Distributed Queries
 SQL Server Parallel Data Warehouse
 Programmer Agility
 XML, XQuery for XML documents
 FileTable for documents (but what is the equivalent solution in the cloud?)
 Open Schema: Sparse Columns and column sets (but still schema first)
 CLR extensibility, but
 No indexing, bad cost-models
 Difficult to deploy (and DB Admins often do not allow it!)
 Failure Resilience
 SQL Azure has local automatic HA, self-healing
 Rich Services
 Semantic Extraction and Similarity Search in SQL Server 2012
 DB/OS Admin “interference”
 SQL Azure: Self-maintaining and Self-provisioning

Introducing SQL Azure Federations

 Provides Data Partitioning/Sharding at the Data Platform
 Enables applications to build elastic scale-out applications
 Provides non-blocking SPLIT/DROP for shards (MERGE to come later)
 Auto-connect to right shard based on sharding key value
 Provides SPLIT resilient query mode
SQL Azure Federation Concepts
 Federation
Represents the data being sharded
 Federation Root
Database that logically houses federations, contains federation meta data
 Federation Key
Value that determines the routing of a piece of data (defines a Federation Distribution)
 Federation Member (aka Shard)
Physical container for a set of federated tables of a specific key range and reference tables
 Atomic Unit
All rows with the same federation key value: always together!
 Federated Table
Table that contains only atomic units for the member’s key range
 Reference Table
Non-sharded table

[Diagram: an Application connects through the Connection Gateway to an Azure DB with Federation Root (Federation Directories, Federation Users, Federation Distributions, …). Federation “Orders_Fed” (Federation Key: CustomerID) is sharded into three members: Member PK [min, 100) with atomic units PK=5, PK=25, PK=35; Member PK [100, 488) with atomic units PK=105, PK=235, PK=365; Member PK [488, max) with atomic units PK=555, PK=2545, PK=3565.]
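
The concepts above can be illustrated with a toy range-partitioned federation in Python. This is a sketch under stated assumptions, not the SQL Azure implementation: the `Federation` class and its methods are hypothetical, members are plain dictionaries, and `split_at` copies rows synchronously, whereas the platform's SPLIT is described as non-blocking. The key ranges and key values are taken from the diagram.

```python
import bisect


class Federation:
    """Toy federation: member i holds keys in [low_i, low_(i+1))."""

    def __init__(self, split_points):
        # split_points are the lower bounds of every member except the first.
        self.split_points = sorted(split_points)
        self.members = [dict() for _ in range(len(self.split_points) + 1)]

    def member_index(self, key):
        # bisect_right makes lower bounds inclusive: key 100 routes to [100, 488).
        return bisect.bisect_right(self.split_points, key)

    def insert(self, key, row):
        # All rows with one key value (an atomic unit) land in the same member.
        self.members[self.member_index(key)].setdefault(key, []).append(row)

    def split_at(self, boundary):
        """Split the member containing boundary into two members at boundary."""
        if boundary in self.split_points:
            raise ValueError("boundary already exists")
        i = self.member_index(boundary)
        old = self.members[i]
        lower = {k: v for k, v in old.items() if k < boundary}
        upper = {k: v for k, v in old.items() if k >= boundary}
        self.members[i:i + 1] = [lower, upper]
        bisect.insort(self.split_points, boundary)
```

For example, `Federation([100, 488])` reproduces the three members in the diagram, and `split_at(250)` would turn the middle member into `[100, 250)` and `[250, 488)` without ever separating the rows of one atomic unit.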

Demo
Map-Reduce scale-out
over SQL Azure Federations
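
The slides do not include the demo itself. As a rough idea of what map-reduce over shards looks like, the following Python sketch fans a map step out to every shard in parallel and then reduces the partial results. Everything here is an assumption for illustration: the `SHARDS` data, the `map_shard` and `total_per_customer` names, and the use of in-memory lists in place of federation members.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each list stands in for the rows of one federation member.
SHARDS = [
    [{"customer_id": 5, "amount": 10}, {"customer_id": 25, "amount": 7}],
    [{"customer_id": 105, "amount": 3}, {"customer_id": 105, "amount": 4}],
    [{"customer_id": 555, "amount": 20}],
]


def map_shard(rows):
    """Map step: runs against one shard and returns a partial total per customer."""
    partial = Counter()
    for row in rows:
        partial[row["customer_id"]] += row["amount"]
    return partial


def total_per_customer(shards):
    """Fan the map step out to all shards, then reduce the partial totals."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(map_shard, shards))
    total = Counter()
    for partial in partials:  # reduce step: merge per-shard results
        total.update(partial)
    return total
```

Because all rows for one customer live in a single shard, each partial result is already complete for its customers and the reduce step is a simple merge.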

SQL Azure: A Not Only SQL Data Platform
SQL Azure adds support for NoSQL paradigms in the data platform:
 No CapEx, Low OpEx (which should/will be even lower)
 High-Availability (each DB has two replicas)
 Sharding support with federations:
 Data platform provides online SPLIT/DROP
 Filtered connection to provide split resilient programming model
 Flexible Data Models:
 XML support
 Sparse columns/Column sets
 More to come in the future…
 More scale and tunable HA (to support OLTP/OLAP model)
 Taking Federations further (orthogonality, merge, fanout)
 Integration with Hadoop eco-system
 More data-first (data-driven columnsets, JSON)

Call to Action

 Download the Presentation from:
http://www.slideshare.net/MichaelRys/presentations
 Fill out SQL Azure Federation Survey:
http://connect.microsoft.com/BusinessPlatform/Survey/Survey.aspx?SurveyID=13625

Related Content
 Related Whitepapers and Presentations:
 CACM: Scalable SQL: http://cacm.acm.org/magazines/2011/6/108663-scalable-sql
 NoSQL and the Windows Azure Platform:
http://download.microsoft.com/download/9/E/9/9E9F240D-0EB6-472E-B4DE-6D9FCBB505DD/Windows%20Azure%20No%20SQL%20White%20Paper.pdf
 SQL Federation blog: http://blogs.msdn.com/b/cbiyikoglu/archive/2011/03/03/nosql-genes-in-sql-azure-federations.aspx
 Windows Gaming Experience Case Study:
http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=4000008310
 NoSQL Presentations: http://www.slideshare.net/MichaelRys/presentations

 Contact me:
 mrys@microsoft.com
 @SQLServerMike
 http://sqlblog.com/blogs/michael_rys/default.aspx
