Being able to analyze data in real time will be a very hot topic in the near future, not only for IoT-related tasks but as a general approach to user-to-machine and machine-to-machine interaction. From product recommendations to fraud detection alarms, many things work best when they happen in real time. With Azure Event Hubs and Stream Analytics, that is now possible. In this session, Davide will demonstrate how to use Event Hubs to quickly ingest new real-time data and Stream Analytics to query that data on the fly, in order to analyze what is happening right now.
Event Hub & Azure Stream Analytics
Davide Mauri
Join the conversation on Twitter: @DevWeek // #DW2016 // #DevWeek
About Me
Microsoft SQL Server MVP
Has worked with SQL Server since 6.5 and on BI since 2003
Specialized in Data Solution Architecture, Database Design, Performance Tuning, High-Performance Data Warehousing, BI, Big Data
President of UGISS (Italian SQL Server UG)
Regular Speaker @ SQL Server events
Consulting & Training, Mentor @ SolidQ
E-mail: dmauri@solidq.com
Twitter: @mauridb
Blog: http://sqlblog.com/blogs/davide_mauri/default.aspx
Agenda
• Complex Event Processing
• The Lambda Architecture
• Azure Stream Analytics
• Data Ingestion
• Azure Stream Analytics Query Language
• Advanced Features
• Additional Resources
• Conclusions
Complex Event Processing
• Event processing is a method of tracking and analyzing (processing) streams of information (data) about things that happen (events)
• Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances
• Started to appear around 1990
• Goal: identify meaningful events (such as opportunities or threats) and respond to them as quickly as possible
Complex Event Processing Use Cases
• Network monitoring
• Intelligence and surveillance
• Risk management
• E-commerce
• Fraud detection
• Smart order routing
• Transaction cost analysis
• Pricing and analytics
• Market data management
• Algorithmic trading
• Data warehouse augmentation
Ref: http://www.infoq.com/articles/stream-processing-hadoop
The Lambda Architecture
Generic, scalable and fault-tolerant data processing architecture […]
in which low-latency reads and updates are required.
Ref: http://lambda-architecture.net/
Hadoop but not only that!
• Apache Hadoop Ecosystem is the typical solution nowadays
• “Mature” Option
• Flume (optional collector and streaming data movement system)
• Kafka (distributed messaging system)
• Storm (distributed real-time computation system)
• “Innovative” Option
• Spark + Spark Streaming
• Very powerful, but very complex
Why the Cloud? And why Azure?
• Due to the high scalability and computing power that a streaming solution may require, the cloud is a perfect environment for it
• Very cheap and very simple to start a project
• Very well integrated with all other Azure offerings
• From Monitoring to Power BI
Stream analytics
• Real-time (more or less) complex event processing engine
• Enables real-time event processing in a very simple and cheap way
• SQL-Like language
• Temporal Semantic Support
• Different from SQL Server 2016
• Specific for streaming data
• Azure Only at present time
Stream analytics
• Platform-as-a-Service
• Can handle millions of events per second
• Based on the REEF project (now Apache incubated)
• Main objects: Job, Query, Functions, Input & Outputs
• Totally manageable from a REST interface
• “Streaming Units” are the base concept for managing performance, scalability and costs
• Roughly 1 Streaming Unit = 1 MB/sec of throughput
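The rule of thumb above (roughly one Streaming Unit per 1 MB/sec) can be turned into a back-of-the-envelope estimate. The sketch below is only that: the function name, the decimal-megabyte constant and the example workload are assumptions for illustration, and real sizing also depends on how complex the query is.

```python
import math

MB = 1_000_000  # bytes; assumes decimal megabytes for this rough estimate


def estimate_streaming_units(events_per_second: float, avg_event_bytes: float) -> int:
    """Estimate Streaming Units from the ~1 MB/sec-per-unit rule of thumb."""
    throughput_mb_per_sec = events_per_second * avg_event_bytes / MB
    # Provision at least one unit and round partial units up.
    return max(1, math.ceil(throughput_mb_per_sec))


# Hypothetical workload: 20,000 events/sec at 250 bytes each is 5 MB/sec.
print(estimate_streaming_units(20_000, 250))
```

Rounding up is a deliberate choice here: an estimate that lands between two unit counts is safer provisioned at the higher one.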
Stream analytics - Data ingestion
• Inputs for Stream Analytics
• Streaming Sources (“Data in motion”)
• JSON, CSV or AVRO
• Reference Data (“Data at rest”)
• JSON or CSV
• Blob Store (max 50MB)
• Streaming Sources
• Event Hubs
• IoT Hub
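To make the "JSON streaming source" idea concrete, here is a minimal sketch of how a single event might be serialized before being sent to an ingestion endpoint. The field names (`deviceId`, `temperature`, `eventTime`) and the helper `make_event` are hypothetical, and the actual send step is deliberately left out because it depends on the client library in use.

```python
import json
from datetime import datetime, timezone


def make_event(device_id: str, temperature: float, when: datetime) -> bytes:
    """Serialize one reading as a UTF-8 JSON payload (field names are hypothetical)."""
    event = {
        "deviceId": device_id,
        "temperature": temperature,
        # ISO 8601 timestamp carried inside the event itself
        "eventTime": when.astimezone(timezone.utc).isoformat(),
    }
    return json.dumps(event).encode("utf-8")


payload = make_event("sensor-1", 21.5, datetime(2016, 4, 20, 9, 30, tzinfo=timezone.utc))
print(payload.decode("utf-8"))
```

Carrying the timestamp inside the payload means a query can reason about when the event happened rather than when it arrived.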
Stream analytics – High-Level Architecture
[Diagram: inputs (Azure Event Hubs, Azure Blob Storage, Reference Data) feed the query; outputs go to Azure SQL DB, Azure Event Hubs, Azure Blob Storage and other Azure services]
• The query runs continuously against the incoming stream of events
• Events have a defined schema and are temporal (sequenced in time)
Data ingestion
• A nice tool to monitor Event Hub is the “Service Bus Explorer”
• https://github.com/paolosalvatori/ServiceBusExplorer
DEMO
Simple Setup of Event Hubs, Source and Destination
Stream Analytics Query Engine
• Takes data from one or more inputs
• Sends the resulting data to one or more outputs
• Supports the most common data types:
• bigint, float, unicode strings, datetime
• key-value pairs
• arrays
Stream Analytics Query Language
• Stream Analytics Query Language Reference
• https://msdn.microsoft.com/library/azure/dn834998.aspx
• Subset of T-SQL
• With specific temporal extensions
• The time values to use can be set with the TIMESTAMP BY clause
Stream Analytics Query Language
DML Statements
• SELECT
• FROM
• WHERE
• GROUP BY
• HAVING
• CASE
• JOIN
• UNION
Windowing Extensions
• Tumbling Window
• Hopping Window
• Sliding Window
• Duration
Aggregate Functions
• SUM
• COUNT
• AVG
• MIN
• MAX
Scaling Functions
• WITH
• PARTITION BY
Date and Time Functions
• DATENAME
• DATEPART
• DAY
• MONTH
• YEAR
• DATETIMEFROMPARTS
• DATEDIFF
• DATEADD
String Functions
• LEN
• CONCAT
• CHARINDEX
• SUBSTRING
• PATINDEX
Statistical Functions
• VAR/VARP
• STDEV/STDEVP
DEMO
Stream Analytics Query in action
Advanced features
• Partitioning Support
• Especially useful for high scalability
• CTE-like constructs that also help with scaling out
• Temporal aggregations
• Tumbling, Hopping and Sliding Windows
• (Temporal) Join between input streams
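The idea behind a temporal join between two input streams can be illustrated with a small plain-Python simulation: two events are paired only if they share a key and occur close enough in time. This is a sketch of the concept, not Stream Analytics code; the `Event` type, the stream contents and the 10-second bound are all assumptions for illustration.

```python
from typing import NamedTuple


class Event(NamedTuple):
    time: int   # seconds since the start of the stream
    key: str
    value: str


def temporal_join(left, right, max_gap_secs):
    """Pair left/right events with the same key whose times differ by at most max_gap_secs."""
    return [
        (l, r)
        for l in left
        for r in right
        if l.key == r.key and abs(l.time - r.time) <= max_gap_secs
    ]


# Hypothetical streams: logins and purchases keyed by user name.
logins = [Event(1, "alice", "login"), Event(30, "bob", "login")]
purchases = [Event(5, "alice", "purchase"), Event(90, "bob", "purchase")]

for l, r in temporal_join(logins, purchases, max_gap_secs=10):
    print(l.key, l.value, "->", r.value)
```

Without the time bound, a join between two unbounded streams would have to remember every event forever; the bound is what keeps the state finite.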
Tumbling window
• Adjacent, non-overlapping windows
• Answers the question: “What happened in the last X seconds? And in the next X? And in the next X?” And so on…
[Figure: a 20-second Tumbling Window, showing event values along a time axis (secs) grouped into consecutive windows]
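A tumbling window can be simulated in a few lines of plain Python, which may help make the "adjacent, non-overlapping" behavior concrete. This is an illustrative sketch rather than Stream Analytics code: the event list is made up, and the choice to key each window by its end time and treat it as covering (end - size, end] is an assumption of this example.

```python
import math
from collections import defaultdict


def tumbling_window_sums(events, size_secs):
    """Sum event values per adjacent, non-overlapping window of size_secs.

    events: iterable of (time_in_seconds, value) pairs.
    Returns {window_end: sum}; each window covers (end - size_secs, end].
    """
    sums = defaultdict(int)
    for time, value in events:
        # Each event falls into exactly one window.
        window_end = math.ceil(time / size_secs) * size_secs
        sums[window_end] += value
    return dict(sorted(sums.items()))


# Hypothetical events: (second at which the event occurred, value)
events = [(2, 1), (7, 5), (12, 4), (18, 2), (25, 8), (33, 6), (41, 5), (55, 3)]
print(tumbling_window_sums(events, 20))
```

Because every event lands in exactly one window, the per-window sums add up to the sum over the whole stream.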
Hopping window
• Overlapping windows
• Answers the question: “Every X seconds, tell me what happened in the previous Y seconds”
• The same event can be in more than one window
• Think of a “moving average”
[Figure: a 20-second Hopping Window with a 10-second “Hop”, showing event values along a time axis grouped into overlapping windows]
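The overlapping behavior of a hopping window can be sketched in plain Python as follows. This is a simulation for illustration only: the event list and the `end_time` parameter are made up, and treating each window as covering (end - size, end] is an assumption of this example.

```python
def hopping_window_sums(events, size_secs, hop_secs, end_time):
    """Every hop_secs, sum the values seen in the previous size_secs.

    events: iterable of (time_in_seconds, value) pairs.
    Returns {window_end: sum}; each window covers (end - size_secs, end].
    """
    events = list(events)
    results = {}
    for window_end in range(hop_secs, end_time + 1, hop_secs):
        window_start = window_end - size_secs
        results[window_end] = sum(
            value for time, value in events if window_start < time <= window_end
        )
    return results


# Hypothetical events: (second at which the event occurred, value)
events = [(2, 1), (7, 5), (12, 4), (18, 2), (25, 8), (33, 6)]
print(hopping_window_sums(events, size_secs=20, hop_secs=10, end_time=40))
```

With a 20-second window and a 10-second hop, the events at seconds 12 and 18 are counted in both the window ending at 20 and the one ending at 30, which is exactly the "same event in more than one window" effect.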
Sliding window
• A forward-moving window. Every time something happens, you get the data for what happened in the last “X” seconds.
[Figure: a 20-second Sliding Window, showing event values along a time axis with a window ending at each event]
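A sliding window, as described here, produces a result whenever an event occurs, covering the preceding X seconds. The plain-Python sketch below simulates just that behavior; the event list is made up, and the window boundary convention (event_time - size, event_time] is an assumption of this example.

```python
def sliding_window_sums(events, size_secs):
    """For each event, sum the values seen in the size_secs up to and including it.

    events: list of (time_in_seconds, value) pairs sorted by time.
    Returns a list of (event_time, sum) pairs, one per event.
    """
    results = []
    for event_time, _ in events:
        total = sum(
            value
            for time, value in events
            if event_time - size_secs < time <= event_time
        )
        results.append((event_time, total))
    return results


# Hypothetical events: (second at which the event occurred, value)
events = [(3, 1), (9, 5), (26, 8), (31, 9), (52, 1)]
print(sliding_window_sums(events, 20))
```

Unlike the fixed schedule of a tumbling or hopping window, nothing is emitted during quiet periods: the gap between seconds 31 and 52 produces no output at all.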
DEMO
Stream Analytics Full Power!
Stream analytics and machine learning
• Apply AzureML model to streaming data
• Sample use-cases
• Fraud Detection
• Product Recommendation
• Customer Sentiment Analysis
• Maintenance Prediction
• Right now in preview and available only through the “old” portal
• https://manage.windowsazure.com/
DEMO
Stream Analytics & Machine Learning
Stream analytics alternative (on Azure)
• Apache Storm
• IaaS or PaaS (with HDInsight)
• Much more complex to manage and develop… but much more powerful
• https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-comparison-storm/
Stream analytics on-premises?
• Apache Hadoop Ecosystem
• Flume / Kafka / Storm
• StreamInsight
• CEP solution part of the SQL Server Platform
• EventStore
• Open-source JavaScript CEP
• None of them (except EventStore) has native temporal extensions
Thanks!
Questions?
Demos available on GitHub
https://github.com/yorek/devweek2016
Editor's notes
Stream + Reference Data
To get a finer granularity of time, we can use a generalized version of tumbling window, called Hopping Window. Hopping windows are windows that "hop" forward in time by a fixed period. The window is defined by two time spans: the hop size H and the window size S. For every H time unit, a new window of size S is created. The tumbling window is a special case of a hopping window where the hop size is equal to the window size.
Syntax
HOPPINGWINDOW ( timeunit , windowsize , hopsize )
HOPPINGWINDOW ( Duration( timeunit , windowsize ) , Hop( timeunit , hopsize ) )
Note: The Hopping Window can be used in the above two ways. If the windowsize and the hopsize have the same timeunit, you can use it without the Duration and Hop functions.
The Duration function can also be used with other types of windows to specify the window size
A Sliding window is a fixed-length window which moves forward by an epsilon (ε) and produces an output only when an event occurs. An epsilon is one hundredth of a nanosecond.
Syntax
SLIDINGWINDOW ( timeunit , windowsize )
SLIDINGWINDOW ( Duration( timeunit , windowsize ) )