2. www.slideshare.net/marco.parenzan
www.github.com/marcoparenzan
marco [dot] parenzan [at] 1nn0va [dot] it
www.1nnova.it
@marco_parenzan
Formazione ,Divulgazione e Consulenza con 1nn0va
Microsoft MVP 2015 for Microsoft Azure
Cloud Architect, NET developer
Loves Functional Programming, Html5 Game Programming and Internet of Things AZURE
COMMUNITY
BOOTCAMP 2015
IoT Day - 08/05/2015
@1nn0va
#microservicesconf2015
9 Maggio 2015
7. IoT day 2015
Business, no longer data, is the foundation of software design
DDD!=OOP
Don’t start from Data
Data are not unique
No more ACID…ACID transactions are not useful with a
distributed model over different storages
10. IoT day 2015
try to treat your entities as self-contained documents represented in JSON
When working with relational databases, we've been taught for years to normalize, normalize,
normalize.
There are contains relationships between entities.
There are one-to-few relationships between entities.
There is embedded data that changes infrequently.
There is embedded data won't grow without bound.
There is embedded data that is integral to data in a document.
better read performance
11. IoT day 2015
Representing one-to-many relationships.
Representing many-to-many relationships.
Related data changes frequently.
Referenced data could be unbounded
Provides more flexibility than embedding
More round trips to read data
Normalizing typically provides better write performance
12. IoT day 2015
Promote code first development (mapping objects to json)
Resilient to iterative schema changes
Richer query and indexing (compared to KV stores)
Low impedance as object / JSON store; no ORM required
It just works
It’s fast
13. IoT day 2015
a container of JSON documents and the associated JavaScript
application logic
JSON docs inside of a collection can vary dramatically
A unit of scale for transaction and query throughput (capacity
units allocated uniformly across all collections)
A unit of scale for capacity
A unit of replication
14. IoT day 2015
Collections in DocumentDB are not just logical containers, but
also physical containers
They are the transaction boundary for stored procedures and triggers
entry point to queries and CRUD operations
Each collection is assigned a reserved amount of throughput which is
not shared with other collections in the same account
Collections do not enforce schema
16. IoT day 2015
In hash partitioning, partitions are assigned based on the value
of a hash function, allowing you to evenly distribute requests
and data across a number of partitions. This is commonly used
to partition data produced or consumed from a large number
of distinct clients, and is useful for storing user profiles, catalog
items, and IoT ("Internet of Things") telemetry data.
Evenly distribute across n number of partitions (algorithmic) ….
17. IoT day 2015
In range partitioning, partitions are assigned based on whether
the partition key is within a certain range
This is commonly used for partitioning with time
stamp properties
Keep current data hot, Warm historical data, Scale-down older
data, Purge / Archive
18. IoT day 2015
In lookup partitioning, partitions are assigned based on a lookup
map that assigns discrete partition values to specific partitions a.k.a. a
partition or shard map
This is commonly used for partitioning by region
Home tenant / user to a specific partition. Use "master" lookup.
Cache this shard map to avoid making the lookup the bottleneck
Tenant Partition Id
Customer 1
Big Customer 2
Another 3
20. IoT day 2015
Query / transaction throughput (and reliability – i.e., hardware
failure) depend on replication!
All writes to the primary are replicated across two secondary replicas
All reads are distributed across three copies
“Scalability of throughput” – allowing different clients to read from different replicas
helps prevent bottlenecks
BUT replication takes time!
Potential scenario: some clients are
reading while another is writing
Now, the data is out-of-date, inconsistent!
21. IoT day 2015
Trade-off: speed (performance & availability) or consistency
(data correctness)?
“Does every read need the MOST current data?”
“Or do I need every request to be handled and handled quickly?”
No “one size fits all” answer … so it’s up to you!
4 options …
For the entire Db…
…In a future release, we intend to support overriding the default consistency level
on a per collection basis.
22. IoT day 2015
client always sees completely consistent data
Slowest reads / writes
Mission critical: e.x. stock market, banking, airline reservation
23. IoT day 2015
Default – even trade-off between performance & availability vs.
data correctness
client reads its own writes, but other clients reading this same
data might see older values
24. IoT day 2015
client might see old data, but it can specify a limit for how old
that data can be (ex. 2 seconds)
Updates happen in order received
similar to Session consistency, but speeds up reads while still
preserving the order of updates
25. IoT day 2015
client might see old data for as long as it takes a write to
propagate to all replicas
High performance & availability, but a client might sometimes
read out-of-date information or see updates out of order
26. IoT day 2015
At the database level (see preview portal)
On a per-read or per-query basis (optional parameter on
CreateDocumentQuery method)
27. IoT day 2015
Use Weaker Consistency Levels for better Read latencies
IoT
Data Analysis
http://azure.microsoft.com/blog/2015/01/27/performance-tips-
for-azure-documentdb-part-2/