IoT and business don't revolve around data, but around processes.
So a relational database is not always the right choice. In an IoT scenario it is often better to find a data solution that stores data with higher performance: NoSQL databases. We'll look at DocumentDB, Microsoft's NoSQL database in Azure. But there are other alternatives too!
3. IoT day 2015
Speaker info/Marco Parenzan
www.slideshare.net/marco.parenzan
www.github.com/marcoparenzan
marco [dot] parenzan [at] 1nn0va [dot] it
www.1nnova.it
@marco_parenzan
Training, outreach and consulting with 1nn0va
Microsoft MVP 2014 for Microsoft Azure
Cloud Architect, .NET developer
Loves Functional Programming, Html5 Game Programming and Internet of Things
Microservices Saturday 2015: a journey with NServiceBus
Live Azure Community Bootcamp 2015
5. IoT day 2015
Data Ecosystem
Where do I put data received in Event Hub?
6. From private to public Cloud
A Continuous offering
Microsoft Relational Storage Options
7. IoT day 2015
SQL Server database technology “as a Service”
Fully managed database-as-a-service built on SQL with near zero administration
Enterprise-ready with automatic support for HA, DR, Backups, replication and more
Highly available and elastically scalable for unpredictable SaaS workloads
Uptime SLA of 99.99%
Predictable performance & Pricing
Built-in regional database geo-replication for additional protection
All core search capabilities - faceting, suggestions, geospatial
Secure and compliant for your sensitive data
Fully compatible with SQL Server 2014 databases
SQL Azure features
10. IoT day 2015
Business, no longer data, is the foundation of software design
DDD!=OOP
Don’t start from Data
Data is not unique
No more ACID… ACID transactions are not useful in a
distributed model spread over different stores
Paradigm Shift
11. IoT day 2015
How many queries can be determined at the analysis level?
“A repository should offer an explicit and well-defined contract
and avoid arbitrary queries”
In business… don't delete anything (the Repository doesn't
delete anything)
From theory to practice
14. CQRS for IoT (Service Bus Powered)
Diagram: the Device sends messages through a Queue and the Event Hub to a
Command Handler, which updates the Write Model; Events flow through
Topics/Subscriptions to an Event Handler, which updates the Read/Search
Model that serves the UI.
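The flow in the diagram can be sketched as a minimal in-memory CQRS loop. All names below are hypothetical and for illustration only; in the Azure version the event log role is played by Event Hub and the event distribution by Topics/Subscriptions.

```javascript
// Minimal in-memory CQRS sketch (hypothetical names, illustration only).
// Write side: commands become events appended to an event log.
// Read side: an event handler projects events into a query-friendly model.

var eventLog = [];            // write model: append-only log (the "Event Hub" role)
var readModel = {};           // read model: latest value per sensor

function handleCommand(cmd) { // the "Command Handler" in the diagram
  var event = { sensorId: cmd.sensorId, value: cmd.value, at: cmd.at };
  eventLog.push(event);       // append to the write model
  handleEvent(event);         // in Azure this would flow via Topics/Subscriptions
}

function handleEvent(event) { // the "Event Handler": update the read/search model
  readModel[event.sensorId] = { value: event.value, at: event.at };
}

handleCommand({ sensorId: "s1", value: 21.5, at: 1 });
handleCommand({ sensorId: "s1", value: 22.0, at: 2 });
```

The key point is the separation: the write model keeps every event ("don't delete anything"), while the read model is a disposable projection optimized for queries.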
15. IoT day 2015
No longer built on data… but on “what happens”
No more one single data store
Data store types
Logs
Persistence
Saga (long transactions)
Search
Event-based systems
23. What is a document database?
Definitely NOT this
kind of document !
24. What is a document database?
Not ideal, but it can work -
{
"id": "13244_post",
"text": "Lorizzle ghetto dolor tellivizzle boofron, stuff pimpin' elizzle. Nullam sapizzle
velizzle, my shizz tellivizzle, suscipizzle funky fresh, shizzle my nizzle crocodizzle
vizzle, arcu. […several more paragraphs of filler text…]"
}
26. IoT day 2015
JSON can represent complex containment relationships that are
difficult to represent in RDBMS
Schema-less – great for growing requirements during dev, unlike
an RDBMS where you must know the structure up front and it's
painful to modify
Native notation for JavaScript
Why JSON?
27. IoT day 2015
Try to treat your entities as self-contained documents represented in JSON
When working with relational databases, we've been taught for years to normalize, normalize,
normalize.
There are contains relationships between entities.
There are one-to-few relationships between entities.
There is embedded data that changes infrequently.
There is embedded data that won't grow without bound.
There is embedded data that is integral to data in a document.
Embedding
Embedding typically provides better read performance
28. IoT day 2015
Representing one-to-many relationships.
Representing many-to-many relationships.
Related data changes frequently.
Referenced data could be unbounded
Provides more flexibility than embedding
More round trips to read data
Referencing
Normalizing typically provides better write performance
29. No magic bullet
Think about how your data is going to be written and read,
and model accordingly
Hybrid models ~ denormalize + reference + aggregate
{
"id": "1",
"firstName": "Thomas",
"lastName": "Andersen",
"countOfBooks": 3,
"books": [1, 2, 3],
"images": [
{"thumbnail": "http://....png"},
{"profile": "http://....png"}
]
}
{
"id": 1,
"name": "DocumentDB 101",
"authors": [
{"id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png"},
{"id": 2, "name": "William Wakefield", "thumbnail": "http://....png"}
]
}
30. IoT day 2015
Promotes code-first development (mapping objects to JSON)
Resilient to iterative schema changes
Richer query and indexing (compared to KV stores)
Low impedance as object / JSON store; no ORM required
It just works
It’s fast
Developer Appeal
32. IoT day 2015
Store schema-less JSON documents
Excels at search w/ SQL syntax
JavaScript for Stored Procs, Triggers and UDFs
Elastic capacity (not in specific Azure sense, up to now)
Multi-document transaction (Batch)
Tweak everything (read/write performance vs. consistency, index
performance, security)
Designed for massive scale
What is DocumentDb?
33. IoT day 2015
Applications that need managed elastic scale
Customer does not want to add additional IT resources for
support and maintenance
Avoiding CAPEX and OPEX
Built-for-the-cloud database technology
Access via RESTful HTTP API or client library
DocumentDB: DbaaS
34. IoT day 2015
Catalog data
Preferences and state
Event store
User generated content
Data exchange
Typical usage
42. IoT day 2015
A container of JSON documents and the associated JavaScript
application logic
JSON docs inside of a collection can vary dramatically
A unit of scale for transaction and query throughput (capacity
units allocated uniformly across all collections)
A unit of scale for capacity
A unit of replication
What is a collection?
43. IoT day 2015
Collections in DocumentDB are not just logical containers, but
also physical containers
They are the transaction boundary for stored procedures and
triggers
entry point to queries and CRUD operations
Each collection is assigned a reserved amount of throughput
which is not shared with other collections in the same account
Collections do not enforce schema
Collections
45. Design: Partitioning
Why Partition?
• Data Size
A single collection (currently*) holds 10GB
• Throughput
3 Performance tiers with a max of 2,500 RU/sec
46. IoT day 2015
In hash partitioning, partitions are assigned based on the value
of a hash function, allowing you to evenly distribute requests
and data across a number of partitions. This is commonly used
to partition data produced or consumed from a large number of
distinct clients, and is useful for storing user profiles, catalog
items, and IoT ("Internet of Things") telemetry data.
Hash Partitioning
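The idea can be sketched in a few lines: hash the partition key and take the remainder modulo the partition count, so requests and data spread evenly. This is a toy string hash for illustration; real systems use stronger hash functions.

```javascript
// Hash partitioning sketch: route a document to one of N partitions by
// hashing its partition key (e.g. a device id in IoT telemetry).
function hashCode(s) {
  var h = 0;
  for (var i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) | 0;  // keep it a 32-bit integer
  }
  return Math.abs(h);
}

function hashPartition(partitionKey, partitionCount) {
  return hashCode(partitionKey) % partitionCount;
}
```

The same key always lands on the same partition, which is what makes routing deterministic for both writes and reads.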
47. IoT day 2015
In range partitioning, partitions are assigned based on whether
the partition key is within a certain range
This is commonly used for partitioning with time
stamp properties
Keep current data hot, Warm historical data, Scale-down older
data, Purge / Archive
Range partitioning
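A minimal sketch of range partitioning by timestamp, assuming one partition per calendar month: the current month stays "hot" while older partitions can be scaled down, archived or purged.

```javascript
// Range partitioning sketch: assign a document to a partition based on
// which time range its timestamp falls into (here, one partition per month).
function rangePartition(timestampMs) {
  var d = new Date(timestampMs);
  // partition id like "2015-03"; older ids map to "warm"/archived partitions
  return d.getUTCFullYear() + "-" + ("0" + (d.getUTCMonth() + 1)).slice(-2);
}
```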
48. IoT day 2015
In lookup partitioning, partitions are assigned based on a
lookup map that assigns discrete partition values to specific
partitions a.k.a. a partition or shard map
This is commonly used for partitioning by region
Lookup partitioning
Tenant         Partition Id
Customer       1
Big Customer   2
Another        3
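The shard map above is just a dictionary from discrete key values to partition ids, as this sketch shows:

```javascript
// Lookup partitioning sketch: a shard map assigns discrete partition key
// values (here tenants, as in the table above) to specific partitions.
var shardMap = {
  "Customer": 1,
  "Big Customer": 2,
  "Another": 3
};

function lookupPartition(tenant) {
  if (!(tenant in shardMap)) {
    throw new Error("No partition mapped for tenant: " + tenant);
  }
  return shardMap[tenant];
}
```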
51. IoT day 2015
Query / transaction throughput (and reliability – i.e., hardware failure) depend on
replication!
All writes to the primary are replicated across two secondary replicas
All reads are distributed across three copies
“Scalability of throughput” – allowing different clients to read from different replicas helps prevent
bottlenecks
BUT replication takes time!
Potential scenario: some clients are
reading while another is writing
Now, the data is out-of-date, inconsistent!
Why worry about consistency?
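The stale-read scenario can be simulated in a few lines: a write is acknowledged by the primary before the secondaries have caught up, so a client reading from a lagging replica sees old data. This is a toy model, not how the service is implemented.

```javascript
// Replication-lag sketch: the primary acknowledges a write immediately,
// the secondaries catch up later, so a read from a lagging replica is stale.
var replicas = [{ value: "v1" }, { value: "v1" }, { value: "v1" }]; // primary + 2 secondaries

function write(newValue) {
  replicas[0].value = newValue;      // primary acknowledges immediately...
}

function replicate() {               // ...secondaries catch up later
  replicas[1].value = replicas[0].value;
  replicas[2].value = replicas[0].value;
}

write("v2");
var staleRead = replicas[2].value;   // read before replication finished
replicate();
var freshRead = replicas[2].value;   // read after replication finished
```

The consistency levels on the following slides are different answers to the question: how much of this window of staleness is a client allowed to observe?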
52. IoT day 2015
Trade-off: speed (performance & availability) or consistency
(data correctness)?
“Does every read need the MOST current data?”
“Or do I need every request to be handled and handled quickly?”
No “one size fits all” answer … so it’s up to you!
4 options …
For the entire Db…
…In a future release, we intend to support overriding the default consistency level on
a per collection basis.
Tweakable Consistency
53. IoT day 2015
client always sees completely consistent data
Slowest reads / writes
Mission critical: e.g. stock market, banking, airline reservations
Strong
54. IoT day 2015
Default – even trade-off between performance & availability vs.
data correctness
client reads its own writes, but other clients reading this same
data might see older values
Session
55. IoT day 2015
client might see old data, but it can specify a limit for how old
that data can be (ex. 2 seconds)
Updates happen in order received
similar to Session consistency, but speeds up reads while still
preserving the order of updates
Bounded Staleness
56. IoT day 2015
client might see old data for as long as it takes a write to
propagate to all replicas
High performance & availability, but a client might sometimes
read out-of-date information or see updates out of order
Eventual
57. IoT day 2015
At the database level (see preview portal)
On a per-read or per-query basis (optional parameter on
CreateDocumentQuery method)
Setting Consistency
58. IoT day 2015
Use Weaker Consistency Levels for better Read latencies
• IoT
• Data Analysis
http://azure.microsoft.com/blog/2015/01/27/performance-tips-
for-azure-documentdb-part-2/
Consistency Tips
60. IoT day 2015
Efficient, rich hierarchical and relational queries without any schema or
index definitions.
Consistent query results while handling a sustained volume of writes. For
high write throughput workloads with consistent queries, the index is
updated incrementally, efficiently, and online while handling a sustained
volume of writes.
Storage efficiency. For cost effectiveness, the on-disk storage overhead of
the index is bounded and predictable.
Indexing
61. var collection = new DocumentCollection
{
Id = "lazyCollection"
};
collection.IndexingPolicy.IndexingMode = IndexingMode.Lazy;
client.CreateDocumentCollectionAsync(databaseLink, collection);
Indexing modes
Consistent
Default mode
Index updated synchronously on writes
Lazy
Useful for bulk ingestion scenarios
Indexing policies
Automatic
Default
Manual
Can choose to index documents via
RequestOptions
Can read non-indexed documents
via selflink
Indexing – Modes and policies
Set indexing mode
Set indexing policy
var collection = new DocumentCollection
{
Id = "manualCollection"
};
collection.IndexingPolicy.Automatic = false;
client.CreateDocumentCollectionAsync(databaseLink, collection);
62. Setting paths, types, and precision
var collection = new DocumentCollection
{
Id = "Orders"
};
collection.IndexingPolicy.ExcludedPaths.Add(@"/""metaData""/*");
collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath
{
IndexType = IndexType.Hash,
Path = "/",
});
collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath
{
IndexType = IndexType.Range,
Path = @"/""shippedTimestamp""/?",
NumericPrecision = 7
});
client.CreateDocumentCollectionAsync(databaseLink, collection);
Index paths
Include and/or Exclude paths
Index types
Hash
Supported for strings and numbers
Optimized for equality matches
Range
Supported for numbers
Optimized for comparison queries
Index precision
String precision
Default is 3
Numeric precision
Default is 3
Increase for larger number fields
Indexing – Paths and types
63. IoT day 2015
Use lazy indexing for faster peak time ingestion rates
Exclude unused paths from indexing for faster writes
Specify range index path type for all paths used in range queries
Vary index precision for write vs query performance vs storage
tradeoffs
http://azure.microsoft.com/blog/2015/01/27/performance-tips-
for-azure-documentdb-part-2/
Indexing tips
65. IoT day 2015
Optimize for queries with small result sets for scalability
Limit use of scans (no range index, NOT, UDFs in WHERE)
Use page size (MaxItemCount) and continuation tokens
For large result sets, use a larger page size (1000)
Querying
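The page size and continuation token tip can be sketched as a drain loop. `fetchPage` below is a stand-in for a real query call (hypothetical, for illustration): it returns one page of at most `pageSize` items plus a token for the next page, or `null` when the result set is exhausted.

```javascript
// Paging sketch: drain a large result set page by page using a page size
// (the MaxItemCount role) and a continuation token.
function fetchPage(allResults, pageSize, continuationToken) {
  var start = continuationToken || 0;
  var items = allResults.slice(start, start + pageSize);
  var next = start + items.length;
  return {
    items: items,
    continuationToken: next < allResults.length ? next : null
  };
}

function readAll(allResults, pageSize) {
  var collected = [];
  var token = null;
  do {
    var page = fetchPage(allResults, pageSize, token);
    collected = collected.concat(page.items);
    token = page.continuationToken;
  } while (token !== null);
  return collected;
}
```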
66. Query over heterogeneous documents without defining
schema or managing indexes
Query arbitrary paths, properties and values without
specifying secondary indexes or indexing hints
Execute queries with consistent results
Supported SQL features: predicates, iterations (arrays),
sub-queries, logical operators, UDFs, intra-document
JOINs, JSON transforms
In general, more predicates result in a larger request
charge.
Additional predicates can help if they result in narrowing
the overall result set.
from book in client.CreateDocumentQuery<Book>(collectionSelfLink)
where book.Title == "War and Peace"
select book;
from book in client.CreateDocumentQuery<Book>(collectionSelfLink)
where book.Author.Name == "Leo Tolstoy"
select book.Author;
-- Nested lookup against index
SELECT B.Author
FROM Books B
WHERE B.Author.Name = "Leo Tolstoy"
-- Transformation, Filters, Array access
SELECT { Name: B.Title, Author: B.Author.Name }
FROM Books B
WHERE B.Price > 10 AND B.Language[0] = "English"
-- Joins, User Defined Functions (UDF)
SELECT udf.CalculateRegionalTax(B.Price, "USA", "WA")
FROM Books B
JOIN L IN B.Languages
WHERE L.Language = "Russian"
LINQ Query
SQL Query Grammar
Query
68. function region(doc)
{
switch (doc.Location.Region)
{
case 0:
return "North";
case 1:
return "Middle";
case 2:
return "South";
}
}
The complexity of a query impacts the
request units consumed for an operation:
Use of user-defined functions (UDFs)
SELECT or WHERE clauses
To take advantage of indexing, try and have at least
one filter against an indexed property when
leveraging a UDF in the WHERE clause
Query with user-defined function
69. function count(filterQuery, continuationToken) {
var collection = getContext().getCollection();
// MAX number of docs to process in one batch; when reached, return to
// the client and request continuation. Intentionally set low to
// demonstrate the concept; this can be much higher. Try experimenting:
// we've had it in the high thousands before seeing the stored procedure
// timing out.
var maxResult = 25;
// The number of documents counted.
var result = 0;
tryQuery(continuationToken);
}
Execute “explicit” Javascript
code on collection
Executing Stored Procedures
70. function normalize() {
var collection = getContext().getCollection();
var collectionLink = collection.getSelfLink();
var doc = getContext().getRequest().getBody();
var newDoc = {
"Sensor": {
"Id": doc.sensorId,
"Class": 0
},
"Degree": {
"Value": doc.degreeValue,
"Type": 0
},
"Location": {
"Name": doc.locationName,
"Region": doc.locationRegion,
"Longitude": doc.locationLong,
"Latitude": doc.locationLat
},
"id": doc.id
};
// Update the request -- this is what is going to be inserted.
getContext().getRequest().setBody(newDoc);
}
Execute “implicit” Javascript
code on CRUD operations
(Insert, Update, Delete) on
collections
Triggers!
72. IoT day 2015
Data is saved on SSD
All writes to the primary are replicated across two secondary
replicas
(Replicas are spread on different hardware in same region to protect
against failures)
All reads are distributed across the
three copies (when and how depend
on consistency level for db account
and query)
DocumentDb Performance
73. IoT day 2015
Measure and Tune for lower request units/second usage
DocumentDB offers a rich set of database operations including relational and hierarchical queries with UDFs, stored procedures and triggers – all operating on the
documents within a database collection. The cost associated with each of these operations will vary based on the CPU, IO and memory required to complete the operation.
Instead of thinking about and managing hardware resources, you can think of a request unit (RU) as a single measure for the resources required to perform various database
operations and service an application request.
Handle Server throttles/request rate too large
When a client attempts to exceed the reserved throughput for an account, there will be no performance degradation at the server and no use of throughput capacity beyond the reserved
level. The server will preemptively end the request with RequestRateTooLarge (HTTP status code 429) and return the x-ms-retry-after-ms header indicating the amount of time, in
milliseconds, that the user must wait before reattempting the request.
Delete empty collections to utilize all provisioned throughput
Every document collection created in a DocumentDB account is allocated reserved throughput capacity based on the number of Capacity Units (CUs) provisioned, and the number of
collections created. A single CU makes available 2,000 request units (RUs) and supports up to 3 collections
Design for smaller documents for higher throughput
The Request Charge (i.e. request processing cost) of a given operation is directly correlated to the size of the document
http://azure.microsoft.com/blog/2015/01/27/performance-tips-for-azure-documentdb-part-2/
Performance Tips
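The throttling tip above can be sketched as a retry loop: on a 429 response, wait for the interval the server announces in the `x-ms-retry-after-ms` header and try again. `sendRequest` is a stand-in for the real HTTP call (hypothetical, for illustration); real code would use an async timer instead of the busy-wait placeholder.

```javascript
// Retry sketch for RequestRateTooLarge (HTTP 429): back off for the
// server-announced interval, then retry, up to a bounded number of attempts.
function executeWithRetry(sendRequest, maxRetries) {
  var attempts = 0;
  while (true) {
    var response = sendRequest();
    if (response.statusCode !== 429) {
      return response;                          // success or a non-throttle error
    }
    if (++attempts > maxRetries) {
      throw new Error("Throttled too many times");
    }
    var waitMs = response.headers["x-ms-retry-after-ms"];
    sleep(waitMs);                              // back off as instructed
  }
}

function sleep(ms) {
  // busy-wait placeholder so the sketch stays synchronous
  var end = Date.now() + Number(ms);
  while (Date.now() < end) {}
}
```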
75. IoT day 2015
User generated content
Many specific data (varbinary(MAX) in SQL)
Catalog data
Log data
User preferences data
Device sensor data
IoT use cases commonly share some patterns in how they ingest, process and store data. First, these
systems allow for data intake that can ingest bursts of data from device sensors in various locales. Next,
these systems process and analyze streaming data to derive real-time insights. And last but not least,
most if not all data will eventually land in a data store for ad hoc querying and offline analytics.
Usage: what is DocumentDb for?
76. IoT day 2015
Maturity: Balancing embedding (ok) and relating (limits)
Searching and Denormalizing
Opportunity
Storing transient Data
Better Opportunities
Storing Files
Append Only
(Table) Storage
Limits from DocumentDb
78. IoT day 2015
Targeted at streaming workloads (E.g. files read from beginning
to end like media files)
Each blob consists of a sequence of blocks
Each block is identified by a Block ID
Each block can be a maximum of 64 MB in size
Size limit 200GB per blob
Azure Storage Blob: Block Blob
79. IoT day 2015
Targeted at random read/write workloads (E.g. backing storage
for the VHDs used in Azure VMs)
Each blob consists of an array of pages
Each page is identified by its offset from the start of the blob
Size limit 1TB per blob
Azure Storage Blob: Page Blob
80. IoT day 2015
Not an RDBMS Table!
The mental picture is ‘Entities’
Entity can have up to 255 properties
Up to 1MB per entity
Partitioning
PartitionKey & RowKey are mandatory properties
Composite key which uniquely identifies an entity
They are the only indexed properties
Defines the sort order
Purpose of the PartitionKey:
Entity Locality
Entities in the same partition will be stored together
Efficient querying and cache locality
Entity Group Transactions
Target throughput – 500 tps/partition, several thousand tps/account
Microsoft Azure monitors the usage patterns of partitions
Automatically load balance partitions
Each partition can be served by a different storage node
Scale to meet the traffic needs of your table
Supports full manipulation (CRUD)
Table Scalability
Azure Table Storage Details
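The composite-key bullets can be made concrete with a small sketch: PartitionKey plus RowKey uniquely identify an entity, and sorting by that composite key groups a partition's entities together and orders them by RowKey, which is what gives entity locality.

```javascript
// Table Storage key sketch: PartitionKey + RowKey form the composite key
// that uniquely identifies an entity and defines the sort order.
function entityKey(entity) {
  return entity.PartitionKey + "|" + entity.RowKey;
}

// Sorting by the composite key groups entities of the same partition
// together (locality) and orders them by RowKey within the partition.
function sortEntities(entities) {
  return entities.slice().sort(function (a, b) {
    var ka = entityKey(a), kb = entityKey(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}
```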
81. IoT day 2015
Embed a sophisticated search experience into web and mobile
applications without having to worry about the complexities of
full-text search and without having to deploy, maintain or
manage any infrastructure.
Perfect for enterprise cloud developers, cloud software vendors,
cloud architects who need a fully-managed search solution.
Search is a natural backend for Cortana
Take a bunch of words, apply linguistics, return relevant results
Azure Search
82. IoT day 2015
“Search service”
Scope for capacity
Bound to a region
Has keys, indexes, indexers, data sources
Provisioning
Azure Portal
Azure resource management API
Elastic scale
Capacity can be changed dynamically
Replicas ~ more QPS, HA
Partitions ~ more documents, write throughput
Azure Search Service
83. IoT day 2015
Simple HTTP/JSON API for creating indexes, pushing documents, searching
Keyword search with user-friendly operators (+, -, *, “”, etc.)
Hit highlighting
Faceting (histograms over ranges, typically used in catalog browsing)
Based on ElasticSearch
Search Functionality
84. IoT day 2015
Linguistics are key in search
Support for 50 languages
Word breaking, stop words, inflections
Lucene analyzers
Well-known analyzer stack
Stemming
Microsoft analyzers
Same NLP stack used by parts of Office, Bing
Lemmatization in many languages
Linguistics
85. IoT day 2015
Suggestions (auto-complete)
Rich structured queries (filter, select, sort) that combines with search
Scoring profiles to model search result relevance
Geo-spatial support integrated in filtering, sorting and ranking (such as finding all
restaurants within 5 KM of your current location)
Search Functionality
86. IoT day 2015
Redis is an open source, BSD-licensed, networked, single-threaded,
in-memory key-value cache and store.
Key-value cache and store (value can be a couple of things)
In-memory (no persistence, but you can)
Single-threaded (atomic operations & transactions)
Networked (it’s a server and it does master/slave)
Some other stuff (scripting, pub/sub, Sentinel, snapshots)
Caching: Redis
88. IoT day 2015
Pro:
partitioning, replication and scaling at its core
self-contained documents
programmability in JavaScript
SQL-like “intra-document” queries
Cons:
No generic SQL queries
Can work alone in just a few scenarios
So DocumentDb…
89. IoT day 2015
Great storage opportunities in Azure
• Log
• Search
• Transient
• Files/Attachments
• SQL!
• And all new Data Analysis/Machine Learning opportunities
Other Not Only SQL alternatives
90. IoT day 2015
http://bit.do/documentdb-pricing
Capacity Units (CU)
Capacity
Throughput (in terms of rate of transactions / second)
• Capacity Unit (CU) = 2,000 Request Units (RU) per second
• A “request” depends on the size of the document – e.g. uploading a large JSON document
might count as more than one request unit
Pricing
91. Standard pricing tier with hourly
billing
1 hr from just $0.034!
Performance levels can be
adjusted
Each collection = 10GB of SSD
Collection* perf is set by S1, S2,
S3
Limit of 100 collections (1 TB)
Soft limit, can be lifted as
needed per account
What does DocumentDB cost?
* collection != table of homogeneous entities
collection ~ a data partition
92. IoT day 2015
NoSQL in Azure for IoT
(and Business)
Marco Parenzan
Microsoft Azure MVP
@marco_parenzan
marco [dot] parenzan [at] 1nn0va [dot] it
Editor's notes
Slide Objectives:
Show Microsoft's continuous private-to-public cloud offering; this presentation will focus on Microsoft's relational database PaaS offering.
Transition:
Microsoft provides a continuous solution from private cloud to the public cloud. No matter where you are on your technology roadmap we have a solution to fit your needs.
We are a trusted advisor and platform in the traditional enterprise and ISV space, with new IaaS offerings that make it easier to bring this same level of trust and ease of use to the public cloud. However, Microsoft Azure SQL Database extends SQL Server capabilities to the cloud by offering SQL Server as a relational database service.
Speaking Points:
SQL Database provides SQL Server as a relational service.
Slide Objectives:
Understand the overall concepts and benefits of SQL Database
Transition:
Let’s clear up any confusion and look at the basics of what SQL Database really is and some of its benefits.
Speaking Points:
The same great SQL Server database technology that you know, love, and use on-premises provided as a service
Enterprise-ready
Automatic support for High-Availability
DR = Disaster Recovery
Designed to scale on-demand to provide the same great elasticity
Notes:
High-availability – 3 copies of the database free for the cost of the one database. Always in sync. The cost to do this on-premises isn’t cheap. This is FREE in SQL Database.
Notes
A data lake is a massive, easily accessible, centralized repository of large volumes of structured and unstructured data
Slide Objective
Understand block blob
Speaker Notes
Block blobs are comprised of blocks, each of which is identified by a block ID.
You create or modify a block blob by uploading a set of blocks and committing them by their block IDs.
Each block can be a maximum of 64 MB in size. The maximum size for a block blob in version 2009-09-19 is 200 GB, or up to 50,000 blocks.
Notes
http://msdn.microsoft.com/en-us/library/dd135734.aspx
Slide Objective
Understand page blob
Speaker Notes
Page blobs are a collection of pages.
A page is a range of data that is identified by its offset from the start of the blob.
To create a page blob, you initialize the page blob by calling Put Blob and specifying its maximum size.
To add content to or update a page blob, you call the Put Page operation to modify a page or range of pages by specifying an offset and range. All pages must align to 512-byte page boundaries.
Unlike writes to block blobs, writes to page blobs happen in-place and are immediately committed to the blob.
The maximum size for a page blob is 1 TB.
A page written to a page blob may be up to 1 TB in size but will typically be much smaller
Notes
http://msdn.microsoft.com/en-us/library/dd135734.aspx
Slide Objectives
Understand Tables
Speaker Notes
Within a storage account, a developer may create named tables.
Tables store data as entities.
An entity is a collection of named properties and their values, similar to a row.
Tables are partitioned to support load balancing across storage nodes.
Each table has as its first property a partition key that specifies the partition an entity belongs to.
The second property is a row key that identifies an entity within a given partition.
The combination of the partition key and the row key forms a primary key that identifies each entity uniquely within the table.
The Table service does not enforce any schema.
A developer may choose to implement and enforce a schema on the client side
Notes
http://msdn.microsoft.com/en-us/library/dd573356.aspx
Azure Search is a fully managed search solution that allows developers to enable search experiences in applications.
What Azure Search does is that it sits right next to your data store (relational or NOSQL) which can be on-prem or on the Cloud (which may be Azure or any other public cloud) and provides the necessary index that can be used to search the operational data.
This service is used only by the application developer and saves them the overhead of developing a search function specifically for their app.
Faceted navigation is a filtering mechanism that provides self-directed drilldown navigation in search applications.