6. [Diagram: Service Bus Queue; Message Batch; Process Messages; Process Message, Process Message, Process Message ..]
7. [Diagram: Service Bus Queue; Message Batch; Process Messages; Process Message .. Process Message]
10. [Chart: Variation in Message Processing. Avg, Min and Max processing time for Message Types 1 to 8; time axis from 00:00.0 to 00:30.2]
17. ...
[Diagram: Azure Cloud Service (Web Role, Worker); Azure Storage Account (Blob, Queue)]
20. Query | Throughput | Latency | Reach
Every 30 seconds, each device publishes a status update (location, health, etc.) | 4k – 100k msgs/sec | 2000 – 5000 ms | Single device
Every 10 minutes, a batch job retrieves all of the status updates delivered in the past 10 minutes | 2M msgs / 10 minutes | 2 minutes | All devices
On an ad-hoc basis, a user may request the current status and recent history of all of their devices | 15 requests / second | 500 ms | Limited device set
On an ad-hoc basis, a user may request a historical time range of all of their devices | 5 requests / second | 750 ms | Limited device set
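A rough arithmetic check on the figures above, as a sketch only. The constant and function names are illustrative and do not come from the slides; the numbers are the ones quoted for the batch job and the publish interval.

```python
# Figures quoted above; all names here are illustrative assumptions.
BATCH_MESSAGES = 2_000_000   # messages retrieved per batch job
BATCH_WINDOW_S = 10 * 60     # the batch covers the past 10 minutes
BATCH_LATENCY_S = 2 * 60     # the batch should finish within 2 minutes
PUBLISH_INTERVAL_S = 30      # each device publishes every 30 seconds


def average_ingest_rate() -> float:
    """Average msgs/sec implied by 2M messages per 10-minute window."""
    return BATCH_MESSAGES / BATCH_WINDOW_S


def required_read_rate() -> float:
    """Msgs/sec the batch job must read to finish inside 2 minutes."""
    return BATCH_MESSAGES / BATCH_LATENCY_S


def devices_at(msgs_per_sec: int) -> int:
    """Device count implied by a publish rate, at one update per 30 s."""
    return msgs_per_sec * PUBLISH_INTERVAL_S


print(round(average_ingest_rate()), round(required_read_rate()))
print(devices_at(4_000), devices_at(100_000))
```

Under these assumptions the batch figure works out to roughly 3,300 msgs/sec on average, and the batch read has to run about five times faster than ingest to meet its 2-minute target.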
28. Pk={Device;Day}, Rk={Timestamp}, Payload={fields}
STB Readiness
This isn’t a relational workload
Per-device insert and lookup
Periodic batch transfer
Per-device lookup
Natural fit for table storage
Device ID = Pk
Data type = Rk
Periodic batch transfer
Natural fit for blob storage
Instance + Timestamp = blob id
Buffer and write into blocks
Roll over on time interval (10 min)
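The buffer-and-roll-over idea can be sketched as follows. This is a minimal in-memory illustration, not the presenter's implementation: the `RollingBuffer` class, the `blob_id` format and the instance name are assumptions, and the actual upload of blocks to blob storage is left out.

```python
import json
from datetime import datetime, timezone

ROLLOVER_SECONDS = 10 * 60  # roll over on a 10-minute interval


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 10-minute window (UTC)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % ROLLOVER_SECONDS, tz=timezone.utc)


def blob_id(instance: str, ts: datetime) -> str:
    """Instance + timestamp = blob id; the exact format is an assumption."""
    return f"{instance}/{window_start(ts):%Y%m%d%H%M}"


class RollingBuffer:
    """Buffers updates in memory, grouped into one block list per window."""

    def __init__(self, instance: str):
        self.instance = instance
        self.blocks: dict[str, list[bytes]] = {}

    def append(self, update: dict, ts: datetime) -> str:
        """Serialize the update and add it to the block list for its window."""
        key = blob_id(self.instance, ts)
        self.blocks.setdefault(key, []).append(json.dumps(update).encode("utf-8"))
        return key


buf = RollingBuffer("worker-0")
first = buf.append({"device": "d1"}, datetime(2014, 1, 1, 12, 7, 30, tzinfo=timezone.utc))
second = buf.append({"device": "d2"}, datetime(2014, 1, 1, 12, 10, 0, tzinfo=timezone.utc))
```

Updates that arrive in the same 10-minute window share a blob id, so a new blob starts automatically when the window changes.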
[Diagram: incoming data (0101 1101 0111 1101 0111 ...); time/space buffer; Table Storage; Blob Storage with Uri={Minute;Instance}, Payload={JSON Data}]
Querying by device
By time - direct { PkRk } lookup
By day - direct { Pk } lookup, max of 2880 records per partition (one update every 30 seconds over 24 hours)
Batch transfer by time frame
Parallel download of all blobs matching timeframe pattern
Adding scale capacity
20k operations per storage account,
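A minimal sketch of the key scheme on this slide, assuming string keys. The separator, the date formats and the helper names are hypothetical; only the Pk={Device;Day}, Rk={Timestamp} and Uri={Minute;...} shapes come from the slide.

```python
from datetime import datetime, timedelta, timezone

PUBLISH_INTERVAL_S = 30  # one status update per device every 30 seconds


def partition_key(device_id: str, ts: datetime) -> str:
    """Pk={Device;Day}: one partition per device per day."""
    return f"{device_id};{ts:%Y%m%d}"


def row_key(ts: datetime) -> str:
    """Rk={Timestamp}: fixed width, so lexical order matches time order."""
    return f"{ts:%Y%m%dT%H%M%S}"


def max_rows_per_partition() -> int:
    """Updates per device per day at one update every 30 seconds."""
    return 24 * 60 * 60 // PUBLISH_INTERVAL_S


def minute_prefixes(start: datetime, end: datetime) -> list[str]:
    """Minute-level blob name prefixes covering a time frame, inclusive."""
    prefixes = []
    current = start.replace(second=0, microsecond=0)
    while current <= end:
        prefixes.append(f"{current:%Y%m%d%H%M}")
        current += timedelta(minutes=1)
    return prefixes


ts = datetime(2014, 3, 5, 9, 15, 30, tzinfo=timezone.utc)
print(partition_key("dev42", ts), row_key(ts), max_rows_per_partition())
print(minute_prefixes(ts, datetime(2014, 3, 5, 9, 17, 0, tzinfo=timezone.utc)))
```

A lookup by time then needs both keys, a lookup by day needs only the partition key, and a batch transfer can list blobs by each prefix and download the matches in parallel.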
33. Where are the scalability bottlenecks?
Where are the availability and failure points?
Where are the key insight and instrumentation points?
[Diagram: Cloud Service with Front End Web Role (4 instances), Caching Role (2 instances), Worker Role (1 instance); Databases (4 DBs); Storage (2 Storage Accounts)]
Optimize for the most stringent case
More options for latency insensitive workflows
Simplicity is king
Use the simplest, most robust approach that fulfills needs
But not simpler…
That’s not always the obvious or familiar one…
No one, true solution
Favour composition of approaches to improve resiliency and reduce complexity
Periodic query spike on bulk reporting
Impact to online operations (30M+ rows)
Rebalancing
Moving data between partitions / databases
Distribution of reference data (relational model)
Keeping in sync
Impact of noisy neighbors (Azure SQL DB)
Variable latency, pushback under heavy load
Cost of management (SQL IaaS)
Cost of automation for patching, maintenance
Scalability bottlenecks
No longer able to add additional capacity?
How to add additional scale units?
Where are the key optimization points?
Messaging, serialization, asynchronicity
Availability and failure points
Node level
Service level
DC level
Operational and instrumentation points