This document provides an overview of NoSQL databases in Azure. It discusses the main NoSQL database types (key-value, column family, document, Hadoop and graph) and contrasts them with relational databases. For each type it describes what it is, example use cases, and how to query or model the data. It encourages attendees to explore these databases and stresses the importance of choosing the right database for the job.
2. Lara Rubbelke
Technical Architect at Microsoft
Primary focus on data solutions in the cloud
@sqlgal
www.linkedin.com/in/lararubbelke/
3. Karen López #TEAMDATA
Karen has 20+ years of data and information architecture experience on large, multi-project programs.
She is a frequent speaker on data modeling, data-driven methodologies and pattern data models.
She wants you to love your data.
4. The only reason for time is so that
everything doesn’t happen at once.
- Albert Einstein*
Session inspired by the book
Seven Databases in Seven Weeks
5. Outcomes
We want you to leave here understanding:
• key concepts for hybrid database architectures
• database / datastore types
• reasons to go explore
6. This is NOT…
• a deep dive on any technology
• a comprehensive list
• a roadmap discussion
7. What We’ll Cover
NoSQL 101
• Comparison to relational
• Not Only SQL (but really “Not SQL”)
• Terminology
Categories
• What they are
• Why you use them
• When you use them
• A little of how to use them
Key terms: CAP, ACID, BASE, SCHEMA, Cloud, Scale
8. Distributed Systems and the CAP Theorem
Consistency, Availability, Partition Tolerance
Eric Brewer’s CAP Theorem
and even better: CAP Twelve Years Later
Myth: Eric Brewer On Why Banks Are BASE Not ACID - Availability Is Revenue
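To make the CAP trade-off concrete, here is a toy two-replica store written in Python. It is an illustrative sketch, not how any Azure service is implemented: the `TwoReplicaStore` class, its `mode` flag and the `partitioned` switch are all hypothetical. When the network is healthy, writes replicate to both nodes; during a partition, a "CP" store refuses writes (consistent but unavailable) while an "AP" store accepts them locally (available but the replicas diverge).

```python
class Replica:
    """One copy of the data, held on one node."""
    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoReplicaStore:
    """Toy key-value store with two replicas.

    mode="CP": during a partition, refuse writes so replicas never disagree.
    mode="AP": during a partition, accept writes locally; replicas may diverge.
    """
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {"a": Replica("a"), "b": Replica("b")}
        self.partitioned = False  # True simulates a network split between a and b

    def write(self, node, key, value):
        """Write via `node`; returns True if the write was accepted."""
        if not self.partitioned:
            # Healthy network: replicate synchronously to every node.
            for replica in self.replicas.values():
                replica.data[key] = value
            return True
        if self.mode == "CP":
            return False  # give up availability to preserve consistency
        self.replicas[node].data[key] = value  # give up consistency to stay available
        return True

    def read(self, node, key):
        """Read the value stored on `node` (None if absent)."""
        return self.replicas[node].data.get(key)


cp = TwoReplicaStore("CP")
cp.write("a", "balance", 100)
cp.partitioned = True
print(cp.write("a", "balance", 50), cp.read("b", "balance"))  # False 100

ap = TwoReplicaStore("AP")
ap.write("a", "balance", 100)
ap.partitioned = True
ap.write("a", "balance", 50)
print(ap.read("a", "balance"), ap.read("b", "balance"))  # 50 100
```

A real AP system would also need a way to reconcile the diverged replicas once the partition heals (for example last-writer-wins or application-level merges); the toy leaves them divergent to keep the trade-off visible.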
11. Polyschematic
• Multiple schemas over the same data
• Schema on read, not on write
• Data integrity may be managed elsewhere
The Why
* ALL DATA HAS STRUCTURE!
** EMBRACE DENORMALIZATION
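Schema on read can be sketched in a few lines of Python: the raw records are stored once, untouched, and each reader applies its own schema when it reads them. The record layout (timestamp, sensor id, value, unit) and both reader functions are hypothetical, chosen only to show two different schemas over the same data.

```python
# Raw records stored as-is; no schema was enforced when they were written.
RAW_EVENTS = [
    "2014-10-01T10:00:00,sensor-7,21.5,C",
    "2014-10-01T10:05:00,sensor-7,22.0,C",
    "2014-10-01T10:05:00,sensor-9,71.6,F",
]


def read_as_temperatures(lines):
    """Schema A: (sensor_id, celsius); normalizes units at read time."""
    for line in lines:
        _timestamp, sensor, value, unit = line.split(",")
        celsius = float(value)
        if unit == "F":
            celsius = (celsius - 32) * 5 / 9
        yield sensor, round(celsius, 1)


def read_as_activity(lines):
    """Schema B: number of readings per sensor; ignores the values entirely."""
    counts = {}
    for line in lines:
        sensor = line.split(",")[1]
        counts[sensor] = counts.get(sensor, 0) + 1
    return counts


print(list(read_as_temperatures(RAW_EVENTS)))
print(read_as_activity(RAW_EVENTS))  # {'sensor-7': 2, 'sensor-9': 1}
```

Because neither reader owns the data, a malformed record only surfaces when someone reads it, which is why the slide notes that data integrity may have to be managed elsewhere.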
12. Kinect Telemetry Retail Application
• Reporting/Analysis: Hadoop Batch Processing
• Sensor Data: Column Family
• Price Check: Key-Value
• Product Catalog: Document Store
14. Data-Intensive Applications in the Cloud Computing World
• Web Application: Web Roles x 180
• Web Service/API: Web Roles x 2
• Back Office: Web Roles x 2
• Activity Processors: Worker Roles x 2
• Cache Tasks: Worker Roles x 4
• Cache (Users and Friends Feed; Games and Leader Boards; Resources and References): Distributed Cache x 32
• User Profiles: SQL Azure x 400
• Email DBs: SQL Azure x 16
• Username DBs: SQL Azure x 16
• Background Tasks DB, Utility DB, Content DB, Taxonomy DB: SQL Azure
• Activity Queue: Azure Storage
• Activity Table (x 50 partitions): Azure Storage
• Logs: Azure Storage
• IIS Logs: Azure Storage
• Data Analysis: Staging (Virtual Machine), Data Warehouse, Reporting Services
• Google Analytics
• Moderation Service/Appliance
• CRISP/3rd Party
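The diagram spreads User Profiles across 400 SQL Azure databases but does not say how a user is assigned to a database. One common approach is hash-based sharding, sketched here in Python as an assumption; the `userprofiles_NNN` database naming and both function names are hypothetical.

```python
import hashlib

USER_PROFILE_SHARDS = 400  # "User Profiles: SQL Azure x 400" in the diagram


def shard_for(user_id: str, shard_count: int = USER_PROFILE_SHARDS) -> int:
    """Map a user id to a shard index in [0, shard_count) deterministically.

    Uses SHA-256 rather than Python's built-in hash(), which is randomized
    per process for strings and would route the same user differently.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def profile_database(user_id: str) -> str:
    """Hypothetical naming scheme for the per-shard databases."""
    return f"userprofiles_{shard_for(user_id):03d}"


# Deterministic: the same user always maps to the same database.
print(profile_database("alice"))
```

Plain modulo routing is simple, but changing the shard count remaps most users; consistent hashing or a lookup directory is the usual alternative when shards are added over time.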
15. NoSQL, Not Only SQL
Relational · Key Value · Column Family · Document · Hadoop · Graph
16. Relational
…Lots of other sessions to learn about this…
19. Why Key-Value
Patterns/What Works
• Low cost, scalable, highly available and geo-redundant
• Flexible schema
• Fast reads and writes on single key values or partitioned key values
• Log data and cache
Anti-Pattern/Danger
Anything that requires:
• Joins
• Custom sorting
• Non-key filters
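The pattern/anti-pattern split above follows from how a partitioned key-value store is addressed. This Python sketch is a simplified in-memory stand-in, not the Azure Tables API; the `PartitionedTable` class and the sample price entities are hypothetical. A lookup on the full (partition key, row key) is two dictionary lookups, while a filter on any other property has to visit every entity.

```python
from collections import defaultdict


class PartitionedTable:
    """In-memory stand-in for a key-value table addressed by (partition key, row key)."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # partition key -> {row key: entity}

    def upsert(self, partition_key, row_key, entity):
        self._partitions[partition_key][row_key] = dict(entity)

    def get(self, partition_key, row_key):
        """Point lookup on the full key: no scanning."""
        return self._partitions.get(partition_key, {}).get(row_key)

    def partition(self, partition_key):
        """Read one partition: touches only that partition's rows."""
        return list(self._partitions.get(partition_key, {}).values())

    def filter_by(self, prop, value):
        """Non-key filter (the anti-pattern): must visit every entity."""
        matches, visited = [], 0
        for rows in self._partitions.values():
            for entity in rows.values():
                visited += 1
                if entity.get(prop) == value:
                    matches.append(entity)
        return matches, visited


prices = PartitionedTable()
prices.upsert("123", "013803204131", {"store": "north", "price": 4.99})
prices.upsert("123", "013803204148", {"store": "north", "price": 2.49})
prices.upsert("456", "013803204131", {"store": "south", "price": 5.29})

print(prices.get("123", "013803204131"))   # {'store': 'north', 'price': 4.99}
print(prices.filter_by("price", 2.49)[1])  # 3
```

The visited count grows with the whole table, which is why non-key filters, custom sorting and joins become expensive as the data grows.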
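20. Azure Tables: LINQ Query
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Connect to the storage account (connectionString is assumed to be defined).
CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
// Create a table client.
CloudTableClient tableKinect = account.CreateCloudTableClient();
CloudTable tableKinectTelemetry =
    tableKinect.GetTableReference("pricecompare");
// Point query for one entity. PartitionKey and RowKey are strings,
// so compare them with string literals.
IQueryable<DynamicTableEntity> query =
    from q in tableKinectTelemetry.CreateQuery<DynamicTableEntity>()
    where q.PartitionKey == "123"
       && q.RowKey == "013803204131"
    select q;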
21. Learn More: Azure Tables and Redis Cache
Introduction to Windows Azure Tables
Azure Redis Cache 101 on Channel9
26. Why Document
Patterns/What Works
• Variable data structures for the same type of entity
• Fast reads and writes on a complete entity set
• Highly nested data stories
• Partially completed workflows
• You love JavaScript
Anti-Pattern/Danger
Anything that requires:
• Joins
• Complex transactional needs
• Lots of aggregation
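A document store keeps each entity as one self-describing document, so entities of the same type can have different shapes and the whole entity is read or written at once. The Python sketch below uses a plain dictionary of JSON strings as a hypothetical stand-in for a document database (it is not the DocumentDB API); the `save`, `load` and `find` helpers and the sample documents are illustrative.

```python
import json

catalog = {}  # document id -> serialized JSON document


def save(doc):
    """Store the whole entity as one JSON document keyed by its id."""
    catalog[doc["id"]] = json.dumps(doc)


def load(doc_id):
    """Read the complete entity back in a single lookup."""
    return json.loads(catalog[doc_id])


def find(predicate):
    """Query on arbitrary fields by scanning (real stores would use an index)."""
    return [doc for doc in map(json.loads, catalog.values()) if predicate(doc)]


# Two products of the same entity type with different, nested shapes.
save({"id": "tv-1", "type": "product", "name": "55-inch TV",
      "specs": {"display": {"size_in": 55, "panel": "LED"}, "ports": ["HDMI", "USB"]}})
save({"id": "book-1", "type": "product", "name": "Seven Databases in Seven Weeks",
      "authors": ["Eric Redmond", "Jim R. Wilson"]})
# A partially completed workflow can be saved as-is and finished later.
save({"id": "order-42", "type": "order", "status": "draft",
      "lines": [{"sku": "tv-1", "qty": 1}]})

print(load("tv-1")["specs"]["display"]["size_in"])  # 55
print([d["id"] for d in find(lambda d: d.get("status") == "draft")])  # ['order-42']
```

Nothing here joins `order-42` to `tv-1`; following that reference means a second lookup in application code, which is the "joins" danger listed above.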
28. Learn More: Azure DocumentDB
Azure DocumentDB .NET Code Samples
Azure DocumentDB 101 on Channel9
Azure DocumentDB 102 on Channel9
Build a web application with ASP.NET MVC using DocumentDB
41. Apache Phoenix: SQL Skin over HBase
CREATE TABLE IF NOT EXISTS "kinecttelemetry" (
    "k" VARCHAR PRIMARY KEY,
    "age" VARCHAR,
    "gender" VARCHAR)
    DEFAULT_COLUMN_FAMILY='demographics';
Phoenix in 15 Minutes or Less
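In the Phoenix table above, `age` and `gender` live in the `demographics` column family of each HBase row. The Python sketch below models that row key → column family → column layout in memory; it is a simplified illustration, not the HBase or Phoenix API, and the extra `readings` family is a hypothetical addition. Families are declared up front, columns within a family are not, and rows can be sparse.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy model: row key -> column family -> column qualifier -> value."""

    def __init__(self, families):
        self.families = set(families)  # families are declared up front, as in HBase
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Columns inside a family are not declared: any qualifier can be added per row.
        self._rows[row_key][family][qualifier] = value

    def get_family(self, row_key, family):
        """Read one family for one row without touching the other families."""
        return dict(self._rows.get(row_key, {}).get(family, {}))


telemetry = ColumnFamilyTable(["demographics", "readings"])
telemetry.put("k1", "demographics", "age", "34")
telemetry.put("k1", "demographics", "gender", "F")
telemetry.put("k1", "readings", "heart_rate", "72")
telemetry.put("k2", "demographics", "age", "51")  # sparse: k2 has no gender or readings

print(telemetry.get_family("k1", "demographics"))  # {'age': '34', 'gender': 'F'}
print(telemetry.get_family("k2", "readings"))      # {}
```

Grouping columns that are read together into one family is the main modeling decision; in HBase each family is stored separately, so reads of one family avoid loading the others.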
42. Learn More: HBase on Azure
Get started using HBase with Hadoop in HDInsight
Analyze Real-Time Twitter Sentiment with HBase in HDInsight
44. Hadoop On Your Terms
• Cloudera Selects Microsoft Azure as a Preferred Cloud Platform
• Hortonworks Data Platform is now Microsoft Azure Certified
• Microsoft Azure HDInsight: 100% Apache Hadoop-based Service in the Cloud
• Qubole Partners with Microsoft Azure
46. Hadoop Hive: External Table
Create Table Query
CREATE EXTERNAL TABLE irs_data_20082 (
    state string,
    zipcode string,
    agi_class int,
    n1 int,
    mars2 int,
    prep int,
    n2 int,
    numdep int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION
'wasb://$containerName@$storageAccountName.blob.core.windows.net/all/data/';

SELECT state, zipcode, agi_class
FROM irs_data_20082;
47. Why Hadoop
Patterns/What Works
• Batch processing
• Map…and reduce
• Lots of aggregation
• Multiple schemas on same data
• Fast
Anti-Pattern/Danger
Anything that requires:
• Joins
• Complex transactional needs
• Granular security requirements
• Not a relational database replacement
• Not fast
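The "map…and reduce" pattern can be shown without a cluster. This Python sketch runs the three MapReduce phases (map, shuffle, reduce) in a single process to total the `n1` column per state, using the same column order as the Hive table above. The sample rows are made up for illustration, and a real Hadoop job would distribute each phase across many machines rather than run them in one loop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: parse one CSV record and emit a (state, n1) pair."""
    state, _zipcode, _agi_class, n1 = line.strip().split(",")
    yield state, int(n1)


def shuffle(pairs):
    """Shuffle: group every emitted value by its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values for one key."""
    return key, sum(values)


def run_job(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


# Made-up sample rows in a (state, zipcode, agi_class, n1) layout.
sample = ["WA,98001,1,120", "WA,98002,2,80", "OR,97001,1,60"]
print(run_job(sample))  # {'WA': 200, 'OR': 60}
```

Each phase only needs its own slice of the data, which is what lets Hadoop parallelize large aggregations; the cost is batch latency, hence "not fast" for interactive queries.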
51. Why Graph
Patterns/What Works
• Highly connected data
• Relationships make the data story
• Paths through data
• Finding shortest/longest path
Anti-Pattern/Danger
• Sparsely connected data (e.g., log data)
• A very high volume of updates on a regular basis
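Finding a shortest path is where graph-shaped data pays off. This Python sketch runs breadth-first search over a hypothetical friendship graph stored as an adjacency list; it is a generic illustration of the technique, not the query language of any particular graph database. BFS gives the shortest path by hop count in an unweighted graph; weighted edges would call for Dijkstra's algorithm instead.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the fewest-hops path as a list, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue  # already discovered by an equal or shorter path
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical friendship graph as an adjacency list.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob", "erin"],
    "erin": ["dave"],
    "frank": [],
}

print(shortest_path(friends, "alice", "erin"))  # ['alice', 'bob', 'dave', 'erin']
print(shortest_path(friends, "alice", "frank"))  # None
```

Longest simple paths are a different matter: in general graphs that problem is NP-hard, so it is usually only practical on small or specially structured graphs.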
53. Learn More: Graph Databases
Free Graph Databases E-Book
Project Naiad from Microsoft Research
54. 7 Reasons to Go Explore
• It’s fun
• Database technologies aren’t YES/NO decisions
• It’s inexpensive to learn
• It’s fast to spin up a learning environment
• A data professional needs to know more than one tool
• Using the right tool for the right job is key
• It’s fun