What do you do with all the data you receive in an Event Hub from your IoT devices? You can do telemetry, and you can do analytics. Azure has a new tool built for this task: Azure Stream Analytics.
How it works and what it is good for.
2. www.slideshare.net/marco.parenzan
www.github.com/marcoparenzan
marco [dot] parenzan [at] 1nn0va [dot] it
www.1nnova.it
@marco_parenzan
Training, outreach and consulting with 1nn0va
Microsoft MVP 2014 for Microsoft Azure
Cloud Architect, .NET developer
Loves Functional Programming, HTML5 Game Programming and Internet of Things
AZURE COMMUNITY BOOTCAMP 2015
IoT Day - 08/05/2015
@1nn0va
#microservicesconf2015
9 May 2015
8. Real-time Analytics
• Intake millions of events per second (up to 1 GB/s)
• Low processing latency, auto adaptive (sub-second to seconds)
• Correlate between different streams, or with reference data
• Find patterns or lack of patterns in data in real-time
Fully Managed Cloud Service
• No hardware acquisition and maintenance
• No platform/infrastructure deployment and maintenance
• Easily expand your business globally leveraging Azure regions
9. Mission Critical Reliability
• Guaranteed event delivery
• Guaranteed business continuity: Automatic and fast recovery
Effective Audits
• Privacy and security properties of solutions are evident
• Azure integration for monitoring and ops alerting
Easy To Scale
• Scale from small to large on demand
10. Rapid Development with a SQL-like language
• High-level: focus on stream analytics solution
• Concise: less code to maintain
• Fast test: Rapid development and debugging
• First-class support for event streams and reference data
Built-in temporal semantics
• Built-in temporal windowing and joining
• Simple policy configuration to manage out-of-order events and late arrivals
11. • SELECT
• FROM
• WHERE
• GROUP BY
• HAVING
• CASE WHEN THEN ELSE
• INNER/LEFT OUTER JOIN
• UNION
• CROSS/OUTER APPLY
• CAST
• INTO
• ORDER BY ASC, DESC
• WITH
• PARTITION BY
• OVER
• DateName
• DatePart
• Day
• Month
• Year
• DateTimeFromParts
• DateDiff
• DateAdd
• TumblingWindow
• HoppingWindow
• SlidingWindow
• Sum
• Count
• Avg
• Min
• Max
• StDev
• StDevP
• Var
• VarP
• Len
• Concat
• CharIndex
• Substring
• PatIndex
• Lag
• IsFirst
• CollectTop
13. Filters
SELECT UserName, TimeZone
FROM InputStream
WHERE Topic = 'XBox'
Show me the user name and time zone of tweets on the topic XBox
Input events:
"Zach Dotseth", "London", "Football", (…)
"Haroon", "Eastern Time (US & Canada)", "XBox", (…)
"XO", "London", "XBox", (…)
Output events:
"Haroon", "Eastern Time (US & Canada)"
"XO", "London"
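The filter above can be modelled outside the service to see what it does to each event. The following is a minimal Python sketch, not Stream Analytics code: the `events` list and the `filter_by_topic` helper are hypothetical, and the sample rows simply mirror the tweets shown on the slide.

```python
# Hypothetical sample events mirroring the tweets on the slide.
events = [
    {"UserName": "Zach Dotseth", "TimeZone": "London", "Topic": "Football"},
    {"UserName": "Haroon", "TimeZone": "Eastern Time (US & Canada)", "Topic": "XBox"},
    {"UserName": "XO", "TimeZone": "London", "Topic": "XBox"},
]


def filter_by_topic(stream, topic):
    """Yield (UserName, TimeZone) for matching events, like SELECT ... WHERE Topic = ..."""
    for event in stream:
        if event["Topic"] == topic:
            yield event["UserName"], event["TimeZone"]


result = list(filter_by_topic(events, "XBox"))
print(result)
```

Each event is inspected once and either projected or dropped, which is why a plain filter needs no window.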
14. Windowing Concepts
• Windows can be tumbling, hopping, or sliding
• Windows are fixed length
• Must be used in a GROUP BY clause
• Output event will have the timestamp of the end of the window
[Figure: events with values 1, 5, 4, 2, 6, 8, 6, 4 on a timeline (t1 to t6), grouped into Window 1, Window 2 and Window 3. An aggregate function (Sum) over each window produces the output events 18 and 14.]
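The windowing rules above can be illustrated with a small Python model. This is a sketch under assumptions, not the service's implementation: the timestamps are invented, the `tumbling_sum` helper is hypothetical, and a window is assumed to cover the interval that excludes its start and includes its end. The values reuse those in the slide's figure.

```python
from collections import defaultdict


def tumbling_sum(events, size):
    """Sum values per fixed-length window; each output carries the window end time."""
    totals = defaultdict(int)
    for timestamp, value in events:
        # Assumption: a window covers (end - size, end], so round the timestamp up.
        window_end = -(-timestamp // size) * size
        totals[window_end] += value
    return sorted(totals.items())


# Hypothetical timestamps in seconds; values follow the slide's figure.
events = [(1, 1), (3, 5), (5, 4), (7, 2), (9, 6), (12, 8), (18, 6)]
output = tumbling_sum(events, size=10)
print(output)
```

The two output rows are stamped 10 and 20, the ends of their windows, and carry the sums 18 and 14.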
15. SELECT Topic, Count(*) AS TotalTweets
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY Topic, TumblingWindow(second, 10)
“Give me the count of tweets per topic every 10 seconds”
[Figure: a 10-second Tumbling Window. The event timeline is cut into consecutive, non-overlapping 10-second windows.]
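To make the grouping concrete, here is a rough Python model of counting per topic in tumbling windows. The `tumbling_count` helper and the sample `tweets` are hypothetical, and the window boundary rule (start excluded, end included) is an assumption of this sketch.

```python
from collections import Counter


def tumbling_count(events, size):
    """Count events per (window end, topic), like GROUP BY Topic, TumblingWindow(...)."""
    counts = Counter()
    for created_at, topic in events:
        # Assumption: each window covers (end - size, end].
        window_end = -(-created_at // size) * size
        counts[(window_end, topic)] += 1
    return dict(counts)


# Hypothetical (CreatedAt in seconds, Topic) pairs.
tweets = [(2, "XBox"), (4, "Football"), (9, "XBox"),
          (13, "XBox"), (17, "Football"), (19, "Football")]
result = tumbling_count(tweets, size=10)
print(result)
```

Every tweet lands in exactly one window, so the counts across windows add up to the number of tweets.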
16. SELECT Topic, Count(*) AS TotalTweets
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY Topic, HoppingWindow(second, 10, 5)
“Every 5 seconds give me the count of tweets per topic over the last 10 seconds”
[Figure: a 10-second Hopping Window with a 5-second “Hop”. Consecutive windows overlap, so an event can fall into more than one window.]
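A hopping window differs from a tumbling one in that windows overlap. The Python sketch below models this with a hypothetical `hopping_count` helper and invented timestamps, again assuming each window excludes its start and includes its end.

```python
from collections import Counter


def hopping_count(events, size, hop):
    """Count events per (window end, topic) for overlapping windows."""
    counts = Counter()
    for created_at, topic in events:
        # First window end at or after the event; windows end on multiples of `hop`.
        window_end = -(-created_at // hop) * hop
        # Assumption: a window covers (end - size, end], so the event belongs
        # to every window whose end is earlier than created_at + size.
        while window_end < created_at + size:
            counts[(window_end, topic)] += 1
            window_end += hop
    return dict(counts)


# Hypothetical (CreatedAt in seconds, Topic) pairs.
tweets = [(2, "XBox"), (7, "XBox"), (12, "XBox")]
result = hopping_count(tweets, size=10, hop=5)
print(result)
```

With a 10-second size and a 5-second hop, each tweet is counted in two windows, which is the overlap the slide describes.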
17. SELECT Topic, Count(*) AS TotalTweets
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY Topic, SlidingWindow(second, 10)
“Give me the count of tweets per topic for every distinct 10-second window”
[Figure: 10-second Sliding Windows. A result is produced only when the content of the window changes.]
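The sliding window is the least intuitive of the three, so a simplified Python model may help. It assumes that the window content changes when an event enters (at its timestamp) or leaves (one window length later), and that empty windows produce no output. The `sliding_count` helper and the timestamps are hypothetical, and the real service may differ in details.

```python
def sliding_count(timestamps, size):
    """Count events in (t - size, t] at every instant t where the window content changes."""
    # Assumption: content changes when an event enters or exits the window.
    change_points = sorted(set(timestamps) | {t + size for t in timestamps})
    results = []
    for end in change_points:
        count = sum(1 for t in timestamps if end - size < t <= end)
        if count:  # an empty window yields no output row in this model
            results.append((end, count))
    return results


# Hypothetical event timestamps in seconds.
result = sliding_count([1, 4, 12], size=10)
print(result)
```

Unlike tumbling and hopping windows, the output times here are driven by the data rather than by a fixed schedule.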
19. Reference Data
• Seamless correlation of event streams with reference data
• Static or slowly-changing data stored in blobs
• CSV and JSON files in Azure Blobs, scanned for new snapshots on a settable cadence
• JOIN (INNER or LEFT OUTER) between streams and reference data sources
• Reference data appears like another input:
SELECT myRefData.Name, myStream.Value
FROM myStream
JOIN myRefData
ON myStream.myKey = myRefData.myKey
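The join above behaves like a lookup of each stream event against a small table. A minimal Python sketch of an inner join follows. The `my_ref_data` contents, the device keys and the `inner_join` helper are all hypothetical, standing in for a snapshot loaded from a CSV or JSON blob.

```python
# Hypothetical reference snapshot (myKey -> Name), as if loaded from a blob.
my_ref_data = {"d1": "Boiler", "d2": "Pump"}

# Hypothetical stream events.
my_stream = [
    {"myKey": "d1", "Value": 20.5},
    {"myKey": "d3", "Value": 7.0},  # no matching reference row
    {"myKey": "d2", "Value": 3.2},
]


def inner_join(stream, ref_data):
    """Yield (Name, Value) for events whose key exists in the reference data."""
    for event in stream:
        name = ref_data.get(event["myKey"])
        if name is not None:  # an INNER join drops unmatched events
            yield name, event["Value"]


result = list(inner_join(my_stream, my_ref_data))
print(result)
```

A LEFT OUTER join would keep the unmatched `d3` event and emit it with an empty name instead of dropping it.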
20. WITH Step1 AS (
SELECT Count(*) AS CountTweets, Topic
FROM TwitterStream PARTITION BY PartitionId
GROUP BY TumblingWindow(second, 3), Topic, PartitionId
),
Step2 AS (
SELECT Avg(CountTweets)
FROM Step1
GROUP BY TumblingWindow(minute, 3)
)
SELECT * INTO Output1 FROM Step1
SELECT * INTO Output2 FROM Step2
SELECT * INTO Output3 FROM Step2
• A query can have multiple steps to enable pipeline execution
• A step is a sub-query defined using WITH (“common table expression”)
• Can be used to develop complex queries more elegantly by creating an intermediate named result
• Creates a unit of execution for scaling out when PARTITION BY is used
• Each step’s output can be sent to multiple output targets using INTO
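The two-step pipeline can be pictured as one function feeding another. The Python sketch below is a simplified model with hypothetical helpers (`window_end`, `step1`, `step2`) and invented tweets. It ignores partitions and assumes each window excludes its start and includes its end.

```python
from collections import Counter, defaultdict


def window_end(timestamp, size):
    """End of the tumbling window containing timestamp, assuming (end - size, end]."""
    return -(-timestamp // size) * size


def step1(tweets, size=3):
    """Count tweets per topic in 3-second windows; rows are (window end, topic, count)."""
    counts = Counter()
    for created_at, topic in tweets:
        counts[(window_end(created_at, size), topic)] += 1
    return [(end, topic, n) for (end, topic), n in sorted(counts.items())]


def step2(step1_rows, size=180):
    """Average the Step1 counts over 3-minute windows; rows are (window end, average)."""
    groups = defaultdict(list)
    for end, _topic, n in step1_rows:
        groups[window_end(end, size)].append(n)
    return [(end, sum(ns) / len(ns)) for end, ns in sorted(groups.items())]


# Hypothetical (CreatedAt in seconds, Topic) pairs.
tweets = [(1, "XBox"), (2, "XBox"), (2, "Football"), (5, "XBox"), (200, "XBox")]
rows1 = step1(tweets)   # would go to Output1
rows2 = step2(rows1)    # would go to Output2 and Output3
print(rows1)
print(rows2)
```

Step2 consumes the rows of Step1 rather than the raw tweets, which is what makes the WITH steps a pipeline.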
21. Partitioning allows for parallel execution over scaled-out resources
SELECT Count(*) AS Count, Topic
FROM TwitterStream PARTITION BY PartitionId
GROUP BY TumblingWindow(minute, 3), Topic, PartitionId
[Figure: Event Hub partitions processed in parallel, producing Query Result 1, Query Result 2 and Query Result 3.]
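Because the query groups by PartitionId, each partition can be aggregated without looking at the others. The following Python sketch imitates that with a thread pool. The `partitions` dictionary and the `count_partition` helper are hypothetical, and the thread pool only stands in for the scaled-out resources mentioned on the slide.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_partition(events, size=180):
    """Count events per (window end, topic) within one partition (3-minute windows)."""
    counts = Counter()
    for created_at, topic in events:
        # Assumption: each window covers (end - size, end].
        end = -(-created_at // size) * size
        counts[(end, topic)] += 1
    return dict(counts)


# Hypothetical Event Hub partitions keyed by PartitionId.
partitions = {
    0: [(10, "XBox"), (20, "XBox")],
    1: [(15, "XBox"), (30, "Football")],
    2: [(200, "Football")],
}

# Each partition is processed independently; map() keeps the input order.
with ThreadPoolExecutor() as pool:
    results = dict(zip(partitions, pool.map(count_partition, partitions.values())))
print(results)
```

No partition needs data from another, so the three results can be computed on separate workers and emitted separately.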