Azure Stream Analytics
Dr. Nico Jacobs, nico@ .be, @SQLWaldorf
Tweet and win an Ignite 2016 ticket #itproceed
Why
• Traditional Business Intelligence first collects data and
analyzes it afterwards
– Typically 1 day latency
• But we live in a fast-paced world
– Social media
– Internet of Things
– Just-in-time production
• We want to monitor and analyze streams of data in
near real time
– Typically a few seconds up to a few minutes latency
A different kind of query
• Traditional querying assumes the data doesn’t
change while you are querying it:
We query a fixed state
– If the data is changing: snapshots and transactions
‘freeze’ the data while we query it
– Since we query a finite state, our query should finish
in a finite amount of time
[Figure: table → query → result table]
A different kind of query
• When analyzing a stream of data, we deal with
a potentially infinite amount of data
• As a consequence, our query never ends!
• To solve this problem, most queries use
time windows
[Figure: stream → temporal query → result stream, e.g. 12:15:00 1, 12:15:10 3, 12:15:20 2, …]
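To make the idea of a never-ending query concrete, here is a minimal Python sketch (an illustration, not how Azure Stream Analytics is implemented) of a query that counts events per 10-second tumbling window and emits one result each time a window closes, labelled with the window's end time. It assumes event times arrive in order as integer seconds; the function name tumbling_count_stream is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def tumbling_count_stream(timestamps: Iterable[int], size: int) -> Iterator[Tuple[int, int]]:
    """Yield (window_end, event_count) each time a tumbling window closes.

    timestamps: event times in seconds, in non-decreasing order; the input may be infinite.
    Windows without any events are skipped in this simplified sketch.
    """
    current_start = None
    n = 0
    for t in timestamps:
        start = (t // size) * size  # start of the window this event falls in
        if current_start is None:
            current_start = start
        if start != current_start:
            # An event from a later window proves the current window is complete.
            yield current_start + size, n
            current_start, n = start, 0
        n += 1
    # No final yield: on a real stream the last window only closes when more data arrives.


if __name__ == "__main__":
    import itertools

    # An endless source: one event every 5 seconds. We only look at the first 3 results.
    for result in itertools.islice(tumbling_count_stream(itertools.count(0, 5), 10), 3):
        print(result)
```

Because the function is a generator, it can consume an endless source and still hand out results one window at a time, which is exactly what a temporal query has to do.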
Azure Stream Analytics
• In Azure Stream Analytics we create, manage
and run jobs
• Every job has at least one input, one query and
one output
• But jobs can be more complex: a query can
read from different inputs and write to multiple
outputs
[Figure: Input → Query → Output]
Inputs
• Currently two types of input are supported
– Data Stream: an Azure Event Hub or Azure Blob
through which we receive a stream of data
– Reference Data: an Azure Blob for static reference
data (lookup ‘table’)
• No support for Azure databases or other cloud
storage (yet)
Temporal query
• Queries are written in SQL!
– No Java or .Net coding skills needed
• Mainly a subset of T-SQL
• A few extra keywords are added to deal
with temporal queries
Output
• Results can be stored in
– Azure Blob storage: creates log files with temporal query results
• Ideal for archiving
– SQL database: Stores results in Azure SQL Database table
• Ideal as source for traditional reporting and analysis
– Event hub: Sends an event to an event hub
• Ideal to generate actionable events such as alerts or notifications
– Azure Table storage:
• More structured than blob storage, easier to set up than SQL Database and
durable (in contrast to event hub)
– PowerBI.com:
• Ideal for near real time reporting!
Time for action!
• Online feedback on this talk
• Browse to itprofeed.azurewebsites.net
[Figure: Event hub → Azure Stream Analytics → PowerBI.com]
Demos
1. Create an Azure Service Bus Event Hub
2. Implement applications to send data into the
Event Hub
3. Create an Azure Stream Analytics job
4. Link the input
5. Create an output
6. Write and test a query
7. Start the job
Create Azure Event Hub
• Azure Event Hub is the newest component in
Azure Service Bus
• Typically used to collect sensor and app
data
• Event Hub collects and temporarily stores
thousands of events per second
Implement an application for sending events
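A minimal Python sketch of such a sender, assuming the azure-eventhub package (its version 5 API: EventHubProducerClient and EventData) and a hypothetical JSON event with deviceId, temperature and eventTime fields. The connection string and hub name are placeholders you would copy from the portal; the SDK is imported only inside send_readings, so the serialization part runs without it installed.

```python
import json
from datetime import datetime, timezone
from typing import Iterable


def make_reading(device_id: str, temperature: float, when: datetime) -> str:
    """Serialize one (hypothetical) sensor reading as a JSON event body."""
    return json.dumps({
        "deviceId": device_id,
        "temperature": temperature,
        "eventTime": when.isoformat(),
    })


def send_readings(conn_str: str, hub_name: str, bodies: Iterable[str]) -> None:
    """Send JSON bodies to an Event Hub in one batch (requires: pip install azure-eventhub)."""
    from azure.eventhub import EventData, EventHubProducerClient  # imported lazily

    producer = EventHubProducerClient.from_connection_string(conn_str, eventhub_name=hub_name)
    with producer:
        batch = producer.create_batch()
        for body in bodies:
            batch.add(EventData(body))  # raises ValueError if the batch is full
        producer.send_batch(batch)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    bodies = [make_reading("sensor-1", 21.5, now), make_reading("sensor-2", 19.0, now)]
    for body in bodies:
        print(body)
    # Once configured: send_readings(CONNECTION_STRING, HUB_NAME, bodies)  (placeholders)
```

Serializing each reading as JSON keeps the events readable by Stream Analytics without declaring columns upfront.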
Create Azure Stream Analytics job
• Currently only available
in the old Azure portal
• Preferably put the job in the
same region as the Event
Hub and the data storage
Link the input
• Event hub does not assume any data format
• But stream analytics needs to parse the data
• Three data formats are supported: JSON, CSV and
Apache Avro (a compact binary format whose schema is defined in JSON)
• No columns need to be specified
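Because no columns are declared, the field names come from the data itself: the keys of a JSON object or the header row of a CSV file (this sketch assumes a header row is present). A small Python sketch of that parsing step; Avro is left out because it needs a third-party library, and parse_events and the sample fields are hypothetical.

```python
import csv
import io
import json
from typing import Any, Dict, List


def parse_events(body: str, fmt: str) -> List[Dict[str, Any]]:
    """Turn a raw event body into records; column names come from the data, not a declaration."""
    if fmt == "json":
        data = json.loads(body)
        # A body may hold one JSON object or an array of objects.
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        # Assumes the first line is a header row; CSV values stay strings until cast.
        return [dict(row) for row in csv.DictReader(io.StringIO(body))]
    raise ValueError(f"unsupported format: {fmt!r}")


if __name__ == "__main__":
    print(parse_events('{"deviceId": "a", "temperature": 21.5}', "json"))
    print(parse_events("deviceId,temperature\na,21.5\n", "csv"))
```

Note that the same reading comes out typed from JSON (21.5 as a number) but as text from CSV ("21.5"), which is one reason type information on inputs matters.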
Create an output
• Five output options: Azure Table or Blob, SQL
Database, Event Hub or PowerBI.com
• Blob and event hub do not require predefined
meta-data
– Again: CSV, JSON and Avro supported
• When storing information in a SQL Database or
Azure Table storage, we need to create the table
in which we will store the results upfront
– Meta-data needed upfront
Create Query
• In a query window we can write two types of
statements:
– SELECT statement to extract a stream of results
from one or more input streams
• Required
• Can use WITH clause to write more complex constructs
or increase parallelism
– CREATE TABLE statements to specify type
information on our input stream(s)
Simple SELECT statement
• SELECT <fields> | * FROM <input> [WHERE <condition>]
– This query simply produces a filtered output
stream based on the input stream
– In the SELECT statement and WHERE clause we
can use functions such as DATEDIFF
– But many functions from T-SQL are not available
• E.g. we can use CAST but not CONVERT
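The filter-and-project behaviour of such a query can be mimicked with a Python generator that handles one event at a time. This is an illustration only: the field names (CallerId, StartTime, EndTime) are hypothetical, and datediff_seconds approximates DATEDIFF(second, start, end) only for whole-second timestamps.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Iterator

Event = Dict[str, Any]


def datediff_seconds(start: datetime, end: datetime) -> int:
    """Approximation of DATEDIFF(second, start, end) for whole-second timestamps."""
    return int((end - start).total_seconds())


def long_calls(calls: Iterable[Event], min_seconds: int) -> Iterator[Event]:
    """Filter and project a stream of call records, one record at a time.

    Mirrors a query of the shape:
        SELECT CallerId, StartTime FROM Input
        WHERE DATEDIFF(second, StartTime, EndTime) > min_seconds
    """
    for call in calls:
        if datediff_seconds(call["StartTime"], call["EndTime"]) > min_seconds:  # WHERE
            yield {"CallerId": call["CallerId"], "StartTime": call["StartTime"]}  # SELECT list
```

Since each output row depends on a single input event, this kind of query needs no window and can emit results as soon as events arrive.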
Testing a query
• Trial-and-error query development would be slow:
– Starting a Stream Analytics job takes a few minutes
– Inspecting the outcome of a job means checking
tables or blobs
– We cannot modify a query while it is running
• Luckily, when a job is stopped, we can run a query
on data from a JSON text file and see the outcome
in the browser
– There is even a ‘sample input’ option
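The same idea, a finite JSON sample standing in for the live stream, can be sketched in Python for local experiments: load the sample, run the query function, and collect every result. run_on_sample, run_on_sample_file and the example query hot_devices are hypothetical names, not part of any Azure tool.

```python
import json
from typing import Any, Callable, Dict, Iterable, List

Event = Dict[str, Any]
Query = Callable[[Iterable[Event]], Iterable[Event]]


def run_on_sample(sample_json: str, query: Query) -> List[Event]:
    """Run a query over a finite sample (a JSON array or one JSON object) and collect all results."""
    data = json.loads(sample_json)
    events = [data] if isinstance(data, dict) else data
    return list(query(events))


def run_on_sample_file(path: str, query: Query) -> List[Event]:
    """Same as run_on_sample, reading the sample from a JSON text file."""
    with open(path, encoding="utf-8") as f:
        return run_on_sample(f.read(), query)


def hot_devices(events: Iterable[Event]) -> Iterable[Event]:
    """Hypothetical query under test: SELECT * FROM Input WHERE temperature > 30."""
    return (e for e in events if e["temperature"] > 30)
```

Because the sample is finite, the run terminates and the full result can be inspected at once, unlike a running job.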
Data types
• Very simple type system:
– Bigint
– Float
– Nvarchar(max)
– Datetime
• Inputs will be cast into one of these types
• We can control these types with a CREATE TABLE
statement:
– This does not create a table, but just a data type mapping
for the inputs
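A rough Python analogue of that type mapping, assuming the four types convert with int, float, str and datetime.fromisoformat; the real casting rules in Stream Analytics may differ (for example in which date formats they accept). apply_schema and the column names are hypothetical.

```python
from datetime import datetime
from typing import Any, Callable, Dict

# The four types from the slide, each paired with a Python conversion (an assumption,
# not Stream Analytics' exact casting rules).
CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    "bigint": int,
    "float": float,
    "nvarchar(max)": str,
    "datetime": datetime.fromisoformat,
}


def apply_schema(record: Dict[str, Any], schema: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of record with each declared column cast to its declared type.

    Like CREATE TABLE in a Stream Analytics query, this only maps types; it stores nothing.
    Undeclared columns and null values pass through unchanged.
    """
    typed = dict(record)
    for column, type_name in schema.items():
        if typed.get(column) is not None:
            typed[column] = CONVERTERS[type_name.lower()](typed[column])
    return typed
```

This is most useful for CSV input, where every value arrives as text.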
Group by
• Group by returns data aggregated over a certain subset of
data
• How to define a subset in a stream?
• Windowing functions!
– Each Group By requires a windowing function
(from MSDN)
3 Windowing functions
[Figure: Tumbling, Hopping and Sliding windows]
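The three window types differ in how events are assigned to windows. A minimal Python sketch over integer-second timestamps, assuming half-open windows [start, start + size) for tumbling and hopping, and a window (t - size, t] ending at each event for sliding; the service's exact boundary rules, and the moments at which it emits sliding-window results, may differ.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def tumbling_counts(timestamps: Iterable[int], size: int) -> Dict[int, int]:
    """Count events per tumbling window [start, start + size); each event lands in exactly one window."""
    counts: Counter = Counter()
    for t in timestamps:
        counts[(t // size) * size] += 1
    return dict(sorted(counts.items()))


def hopping_counts(timestamps: Iterable[int], size: int, hop: int) -> Dict[int, int]:
    """Count events per hopping window [start, start + size), with a new window every `hop` seconds.

    When hop < size the windows overlap, so one event can be counted in several windows.
    """
    counts: Counter = Counter()
    for t in timestamps:
        start = (t // hop) * hop  # latest window start at or before t
        while start > t - size:   # walk back through every window still covering t
            counts[start] += 1
            start -= hop
    return dict(sorted(counts.items()))


def sliding_counts(timestamps: Iterable[int], size: int) -> List[Tuple[int, int]]:
    """For each event at time t, count the events in the window (t - size, t] that ends at it."""
    ts = sorted(timestamps)
    return [(t, sum(1 for u in ts if t - size < u <= t)) for t in ts]
```

With events at 0, 5, 10, 12 and 25 seconds and 10-second windows, tumbling windows give one count per non-overlapping block, hopping windows with a 5-second hop count the events at 5, 10 and 12 together in the window starting at 5, and sliding windows produce one result per event.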
Timestamp by
• A record can have multiple timestamps associated with
it
– E.g. the time a phone call starts, ends, is submitted to the
event hub, is processed by Azure Stream Analytics, …
– By default the timestamp used in the temporal SQL queries
is System.Timestamp
• Event hub arrival time
• Blob last-modified date
– But we can include an explicit timestamp in the data we
provide. In that case, we must follow the FROM clause in our
temporal query with TIMESTAMP BY <fieldname>
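A small Python sketch of that choice, with hypothetical field names: without a TIMESTAMP BY field the record's arrival time is used, otherwise the named field is read and, if it is a string, parsed as an ISO 8601 timestamp (an assumption about the data format).

```python
from datetime import datetime
from typing import Any, Dict, Optional


def event_time(record: Dict[str, Any], arrival_time: datetime,
               timestamp_by: Optional[str] = None) -> datetime:
    """Return the timestamp a temporal query would use for this record.

    Without TIMESTAMP BY, fall back to the arrival time (for an Event Hub input, when the
    event reached the hub). With it, read the named field, parsing ISO 8601 strings.
    """
    if timestamp_by is None:
        return arrival_time
    value = record[timestamp_by]
    return value if isinstance(value, datetime) else datetime.fromisoformat(value)
```

For a phone call that started at 12:15:00 but reached the hub at 12:15:07, the two choices put the record into different windows whenever a window boundary falls in between.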
JOIN
• We can combine multiple event streams or an event
stream with reference data via a join (inner join) or a left
outer join
• In the join clause we can specify the time window in
which we want the join to take place
– We use a special version of DATEDIFF for this
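A Python sketch of a time-bounded join, roughly the shape of JOIN ... ON L.key = R.key AND DATEDIFF(second, L, R) BETWEEN 0 AND n in the query language. The toll-booth style field names (plate, and ts in seconds) are hypothetical, and the nested loop is for clarity only: a streaming engine only needs to keep the events that fall inside the time bound.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

Event = Dict[str, Any]


def temporal_join(left: Iterable[Event], right: List[Event], key: str, max_seconds: int,
                  outer: bool = False) -> Iterator[Tuple[Event, Optional[Event]]]:
    """Pair left and right events that share `key` and whose times ("ts", in seconds)
    satisfy 0 <= right.ts - left.ts <= max_seconds.

    With outer=True, left events without a partner are kept, paired with None
    (LEFT OUTER JOIN); otherwise only matched pairs are returned (inner join).
    """
    for l_event in left:
        matched = False
        for r_event in right:
            if l_event[key] == r_event[key] and 0 <= r_event["ts"] - l_event["ts"] <= max_seconds:
                matched = True
                yield l_event, r_event
        if outer and not matched:
            yield l_event, None
```

The time bound is what keeps the join finite on an endless stream: an event only has to wait max_seconds for its partner.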
INTO clause
• We can have multiple outputs
• Without an INTO clause, we write to the destination
named ‘output’
• With an INTO clause, we can choose the appropriate
destination for every SELECT
– E.g. send events to blob storage for big data
analysis, but send special events to event hub for
alerting
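Routing one input to several destinations can be sketched in Python as a list of (target, predicate) rules, each standing for one SELECT ... INTO ... WHERE ... statement. The rule format and the route function are hypothetical; a rule with no target falls back to the destination named 'output', as described above.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Event = Dict[str, Any]
Rule = Tuple[Optional[str], Callable[[Event], bool]]  # (INTO target or None, WHERE predicate)


def route(events: Iterable[Event], rules: List[Rule]) -> Dict[str, List[Event]]:
    """Evaluate several SELECT ... [INTO target] ... WHERE ... statements over one input.

    Every rule sees every event, so one event can reach several destinations.
    """
    outputs: Dict[str, List[Event]] = {}
    for event in events:  # one pass, event by event, as on a stream
        for target, predicate in rules:
            if predicate(event):
                name = target if target is not None else "output"
                outputs.setdefault(name, []).append(event)
    return outputs
```

For example, rules [("archive", lambda e: True), ("alerts", lambda e: e["temperature"] > 40)] send everything to the archive and only the hot readings to the alert destination.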
Out of order inputs
• What if event 6:54:32 arrives after event
6:55:55?
– Trick: buffer your data for n minutes: all
events that arrive less than n minutes late
will be processed (tolerance window)
– What do we do with everything that arrives
more than n minutes late? Do we skip them
(drop) or do we pretend they happened just
now (adjust)?
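A minimal Python sketch of such a tolerance window, under the assumption that lateness is measured against the newest event time seen so far: events within tolerance seconds of it are buffered and released in timestamp order, and later ones are either dropped or adjusted. Adjusting here moves the event to the earliest time still accepted, which is this sketch's reading of 'pretend they happened just now'; reorder is a hypothetical name.

```python
import heapq
from typing import Iterable, List, Optional, Tuple


def reorder(events: Iterable[Tuple[int, str]], tolerance: int,
            policy: str = "drop") -> List[Tuple[int, str]]:
    """Re-sequence (event_time, payload) pairs, given in arrival order."""
    if policy not in ("drop", "adjust"):
        raise ValueError("policy must be 'drop' or 'adjust'")
    buffer: List[Tuple[int, int, str]] = []  # min-heap of (event_time, arrival_seq, payload)
    out: List[Tuple[int, str]] = []
    max_seen: Optional[int] = None
    for seq, (t, payload) in enumerate(events):
        max_seen = t if max_seen is None else max(max_seen, t)
        watermark = max_seen - tolerance  # earliest event time still accepted
        if t < watermark:  # arrived too late
            if policy == "drop":
                continue
            t = watermark  # adjust: pretend it happened at the earliest accepted time
        heapq.heappush(buffer, (t, seq, payload))
        # Nothing older than the watermark can still arrive, so those events are final.
        while buffer and buffer[0][0] <= watermark:
            et, _, p = heapq.heappop(buffer)
            out.append((et, p))
    while buffer:  # end of input: flush the rest in timestamp order
        et, _, p = heapq.heappop(buffer)
        out.append((et, p))
    return out
```

The trade-off is visible in the tolerance parameter: a larger value accepts later events but delays every result by up to that many seconds.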
Scaling
• By default every job consists of 1 streaming unit
• A streaming unit can process up to 1 MB/second
• When higher throughput is needed we can activate
up to 6 streaming units per regular query
• If your input is a partitioned event hub, we can
write partitioned queries and partitioned
subqueries (WITH clause)
• A non-partitioned query with a 3-fold partitioned
subquery can have (1+3) * 6 = 24 streaming units!
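The arithmetic behind the 24 streaming units can be written as a small Python formula, assuming (as the slide's numbers suggest) that every non-partitioned step and every partition of a partitioned step may use up to 6 streaming units, so (1 + 3) × 6 = 24. This reflects the limits quoted in the talk, which may have changed since; max_streaming_units is a hypothetical name.

```python
from typing import List

MAX_SU_PER_STEP = 6  # per-step ceiling quoted on the slide


def max_streaming_units(non_partitioned_steps: int, partition_counts: List[int]) -> int:
    """Upper bound on streaming units: 6 per non-partitioned step plus 6 per partition
    of each partitioned step (partition_counts lists the partitions of each such step)."""
    return MAX_SU_PER_STEP * (non_partitioned_steps + sum(partition_counts))
```

For the slide's example, max_streaming_units(1, [3]) gives 24; a plain single-step query stays at 6.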
Pricing
• Azure Stream Analytics
• 0.55 € per streaming unit per day (about 17 €/month)
• 0.0008 € per GB throughput
• So, when processing about 10 million
events at a max. rate of 1 MB/sec, this
costs less than 18 € a month
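The estimate can be reproduced in Python from the two prices on the slide. The event size of 1,000 bytes and the 31-day month are assumptions; with them, 10 million events amount to 10 GB, so the throughput charge (under one cent) is negligible next to the streaming-unit charge.

```python
SU_PRICE_PER_DAY_EUR = 0.55  # from the slide
PRICE_PER_GB_EUR = 0.0008    # from the slide


def volume_gb(events: int, bytes_per_event: int) -> float:
    """Data volume in (decimal) gigabytes."""
    return events * bytes_per_event / 1e9


def monthly_cost_eur(streaming_units: int, gigabytes: float, days: int = 31) -> float:
    """Streaming-unit charge for the month plus the per-GB throughput charge."""
    return streaming_units * SU_PRICE_PER_DAY_EUR * days + gigabytes * PRICE_PER_GB_EUR


if __name__ == "__main__":
    gb = volume_gb(10_000_000, 1_000)  # assumed 1,000 bytes per event
    print(f"{gb:.1f} GB -> {monthly_cost_eur(1, gb):.2f} EUR per month")
```

One streaming unit for a month accounts for about 17.05 €, which is why the total stays below 18 €.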
Machine Learning
• Sensor thresholds are not always constant
• But with Azure Machine Learning, Azure can ‘learn’ which values
preceded issues
Summary
• Azure Stream Analytics is a PaaS version of
StreamInsight
– Processes streams of events via temporal queries
• Supports multiple input and output formats
• Scales to large volumes of events
• Temporal queries are written in a SQL variant
Give me (more) feedback
And win a Lumia 635
Feedback form will be sent to you by email
Follow Technet Belgium
@technetbelux
Subscribe to the TechNet newsletter
aka.ms/benews
Be the first to know
Thank you!
Belgium’s biggest IT PRO Conference


Azure Stream Analytics by Nico Jacobs

  • 1. Azure Stream Analytics • Dr. Nico Jacobs, nico@ .be, @SQLWaldorf • Tweet and win an Ignite 2016 ticket #itproceed
  • 2. Why • Traditional Business Intelligence first collects data and analyzes it afterwards – Typically 1 day latency • But we live in a fast-paced world – Social media – Internet of Things – Just-in-time production • We want to monitor and analyze streams of data in near real time – Typically a few seconds up to a few minutes latency
  • 3. A different kind of query • Traditional querying assumes the data doesn’t change while you are querying it: we query a fixed state – If the data is changing, snapshots and transactions ‘freeze’ the data while we query it – Since we query a finite state, our query should finish in a finite amount of time • [Diagram: table → query → result table]
  • 4. A different kind of query • When analyzing a stream of data, we deal with a potentially infinite amount of data • As a consequence, our query will never end! • To solve this problem, most queries use time windows • [Diagram: stream → temporal query → result stream, e.g. 12:15:00 1, 12:15:10 3, 12:15:20 2, …]
  • 5. Azure Stream Analytics • In Azure Stream Analytics we create, manage and run jobs • Every job has at least one input, one query and one output • But jobs can be more complex: a query can read from different inputs and write to multiple outputs • [Diagram: input → query → output]
  • 6. Inputs • Currently two types of input are supported – Data Stream: an Azure Event Hub or Azure Blob through which we receive a stream of data – Reference Data: an Azure Blob for static reference data (lookup ‘table’) • No support for Azure databases or other cloud storage (yet)
  • 7. Temporal query • Query is written in SQL! – No Java or .NET coding skills needed • Mainly a subset of T-SQL • A few extra keywords are added to deal with temporal queries
  • 8. Output • Results are stored in one of the following – Azure Blob storage: creates log files with temporal query results • Ideal for archiving – SQL database: stores results in an Azure SQL Database table • Ideal as a source for traditional reporting and analysis – Event hub: sends an event to an event hub • Ideal to generate actionable events such as alerts or notifications – Azure Table storage: • More structured than blob storage, easier to set up than a SQL database and durable (in contrast to an event hub) – PowerBI.com: • Ideal for near real time reporting!
  • 9. Time for action! • Online feedback on this talk • Browse to itprofeed.azurewebsites.net • [Diagram: Event hub → Azure Stream Analytics → PowerBI.com]
  • 10. Demos 1. Create an Azure Service Bus Event Hub 2. Implement applications to send data into the Event Hub 3. Create an Azure Stream Analytics job 4. Link the input 5. Create an output 6. Write and test a query 7. Start the job
  • 11. Create Azure Event Hub • Azure Event Hub is the newest component in Azure Service Bus • Typically used to collect sensor and app data • An event hub collects and temporarily stores thousands of events per second
  • 12. Implement application for sending events
  • 13. Create Azure Stream Analytics job • Currently only available in the old Azure portal • Preferably put it in the same region as Event Hub and data storage
  • 14. Link the input • An event hub does not assume any data format • But Stream Analytics needs to parse the data • Three data formats are supported: JSON, CSV and Apache Avro (a compact binary format with JSON-defined schemas) • No columns are specified at this point
  • 15. Create an output • Five output options: Azure Table or Blob, SQL Database, Event Hub or PowerBI.com • Blob and event hub do not require predefined metadata – Again: CSV, JSON and Avro supported • When storing information in a SQL Database or Azure Table storage, we need to create the table that will hold the results upfront – Metadata needed upfront
  • 16. Create Query • In a query window we can write two types of statements: – SELECT statement to extract a stream of results from one or more input streams • Required • Can use a WITH clause to write more complex constructs or increase parallelism – CREATE TABLE statements to specify type information for our input stream(s)
  • 17. Simple SELECT statement • SELECT <fields> | * FROM <input> [WHERE <condition>] – This query simply produces a filtered output stream based on the input stream – In the SELECT statement and WHERE clause we can use functions such as DATEDIFF – But many functions from T-SQL are not available • E.g. we can use CAST but not CONVERT
  • 18. Testing a query • Trial-and-error query development would be slow: – Starting a Stream Analytics job takes a few minutes – Inspecting the outcome of a job means checking tables or blobs – We cannot modify a query while it is running • Luckily, when a job is stopped, we can run a query on data from a JSON text file and see the outcome in the browser – There is even a ‘sample input’ option
  • 19. Data types • Very simple type system: – Bigint – Float – Nvarchar(max) – Datetime • Inputs will be cast into one of these types • We can control these types with a CREATE TABLE statement: – This does not create a table, but just a data type mapping for the inputs
  • 20. Group by • Group by returns data aggregated over a certain subset of data • How do we define a subset in a stream? • Windowing functions! – Each Group By requires a windowing function (from MSDN)
  • 22. Timestamp by • A record can have multiple timestamps associated with it – E.g. the time a phone call starts, ends, is submitted to the event hub, is processed by Azure Stream Analytics, … – By default the timestamp used in temporal SQL queries is System.Timestamp • Event hub arrival time • Blob last-modified date – But we can include an explicit timestamp in the data we provide. In that case we must follow the FROM clause in our temporal query with TIMESTAMP BY <fieldname>
  • 23. JOIN • We can combine multiple event streams, or an event stream with reference data, via a join (inner join) or a left outer join • In the join clause we can specify the time window in which we want the join to take place – We use a special version of DATEDIFF for this
  • 24. INTO clause • We can have multiple outputs • Without an INTO clause we write to the destination named ‘output’ • With an INTO clause we can choose the appropriate destination for every SELECT – E.g. send events to blob storage for big data analysis, but send special events to an event hub for alerting
  • 25. Out of order inputs • What if event 6:54:32 arrives after event 6:55:55? – Trick: buffer your data for n minutes: all events that arrive less than n minutes late will be processed (tolerance window) – What do we do with everything that arrives more than n minutes late? Do we skip them (drop) or do we pretend they happened just now (adjust)?
  • 26. Scaling • By default every job consists of 1 streaming unit • A streaming unit can process up to 1 MB/s • When higher throughput is needed, we can activate up to 6 streaming units per regular query • If your input is a partitioned event hub, we can write partitioned queries and partitioned subqueries (WITH clause) • A non-partitioned query with a 3-fold partitioned subquery can have (1 + 3) × 6 = 24 streaming units!
  • 27. Pricing • Azure Stream Analytics • 0.55 € per streaming unit per day (about 17 €/month) • 0.0008 € per GB throughput • So, when processing about 10 million events at a max. rate of 1 MB/s, this costs less than 18 € a month
  • 28. Machine Learning • Sensor thresholds are not always constant • But Azure Machine Learning can ‘learn’ which values preceded issues
  • 30. Summary • Azure Stream Analytics is a PaaS version of StreamInsight – Process streams of events via temporal queries • Supports multiple input and output formats • Scales to large volumes of events • Temporal queries are written in a SQL variant
  • 31. Give me (more) feedback • Feedback form will be sent to you by email • And win a Lumia 635
  • 32. Be the first to know • Follow TechNet Belgium @technetbelux • Subscribe to the TechNet newsletter aka.ms/benews
  • 34. Belgium’s biggest IT PRO Conference
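Slides 4 and 20 say a stream query needs a time window before it can aggregate. The Python sketch below assumes fixed, non-overlapping (tumbling) 10-second windows, which is one such windowing function. It is a local illustration, not how the service runs internally; the service's query language expresses the same idea by grouping on a windowing function such as TumblingWindow. The function name and the timestamps are hypothetical. The timestamps were picked so that the counts reproduce the numbers in the slide 4 diagram.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def tumbling_window_counts(timestamps, window=timedelta(seconds=10)):
    """Count events per fixed-size, non-overlapping (tumbling) window.

    Each window is keyed by its start time; every event lands in exactly one window.
    """
    counts = Counter()
    for ts in timestamps:
        index = (ts - EPOCH) // window       # whole windows elapsed since the epoch
        counts[EPOCH + index * window] += 1  # start time of this event's window
    return dict(sorted(counts.items()))

# Hypothetical arrival times on an arbitrary date.
day = datetime(2016, 1, 1)
arrivals = [day.replace(hour=12, minute=15, second=s) for s in (3, 12, 14, 19, 21, 28)]

for start, count in tumbling_window_counts(arrivals).items():
    print(start.time(), count)
# 12:15:00 1
# 12:15:10 3
# 12:15:20 2
```

This sketch keys each result by the window's start time. A real engine emits each window's result once the window has closed, rather than after reading a finite list.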
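Slide 14 notes that an event hub carries raw bytes and that Stream Analytics must parse them as JSON, CSV or Avro. The sketch below shows that parsing step for JSON and CSV, using only the Python standard library. The function name and the sample payloads are hypothetical. Avro is omitted because decoding it needs a third-party library.

```python
import csv
import io
import json

def parse_payload(payload: bytes, fmt: str) -> list:
    """Turn a raw event body into a list of records (dicts).

    The event hub itself only carries bytes; the consumer must know the format.
    """
    text = payload.decode("utf-8")
    if fmt == "json":
        data = json.loads(text)
        # A body may hold a single object or an array of objects.
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        # The header row supplies field names; every value arrives as a string.
        return list(csv.DictReader(io.StringIO(text)))
    # Avro would need a third-party decoder, so it is left out of this sketch.
    raise ValueError(f"unsupported format: {fmt}")

print(parse_payload(b'{"DeviceId": "sensor-1", "Temperature": 21.5}', "json"))
print(parse_payload(b"DeviceId,Temperature\nsensor-1,21.5\n", "csv"))
```

In the CSV case, the temperature comes back as the string "21.5". That is one reason slide 19's type mapping matters.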
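Slides 4 and 17 make two points: a stream query never ends, and a plain SELECT … WHERE simply produces a filtered output stream. A lazy Python generator models both. The sketch below is a conceptual analogy only. The sensor source and all names are hypothetical, and a seeded random generator stands in for the event hub.

```python
import itertools
import random

def sensor_stream(seed=0):
    """Hypothetical, never-ending source of readings (stands in for an event hub)."""
    rng = random.Random(seed)
    for n in itertools.count():
        yield {"id": n, "temperature": round(rng.uniform(15.0, 35.0), 1)}

def select_where(stream, predicate):
    """Lazily pass on matching events, like SELECT * FROM input WHERE <condition>.

    Equivalent to the built-in filter(); the loop never ends on its own.
    """
    for event in stream:
        if predicate(event):
            yield event

hot = select_where(sensor_stream(), lambda e: e["temperature"] > 30.0)
# The "query" is infinite, so the consumer decides how much output it wants.
for event in itertools.islice(hot, 3):
    print(event)
```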
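Slide 19 explains that a CREATE TABLE statement stores nothing: it only declares which of the four types each input field gets. Below is a minimal Python sketch of such a type mapping. It maps bigint, float, nvarchar(max) and datetime to Python converters, which is an assumption of this sketch. The schema, the field names and the ISO 8601 datetime format are all hypothetical.

```python
from datetime import datetime

# Hypothetical converters for the four Stream Analytics types listed on slide 19.
CONVERTERS = {
    "bigint": int,
    "float": float,
    "nvarchar(max)": str,
    "datetime": datetime.fromisoformat,
}

def cast_record(record: dict, schema: dict) -> dict:
    """Apply a declared type to each field, like a CREATE TABLE type mapping.

    Nothing is stored; fields missing from the schema pass through unchanged.
    """
    typed = dict(record)
    for field, sql_type in schema.items():
        if typed.get(field) is not None:
            typed[field] = CONVERTERS[sql_type.lower()](typed[field])
    return typed

# Hypothetical schema and a CSV-style record in which every value is a string.
schema = {"DeviceId": "nvarchar(max)", "Temperature": "float",
          "Reading": "bigint", "EventTime": "datetime"}
raw = {"DeviceId": "sensor-1", "Temperature": "21.5",
       "Reading": "42", "EventTime": "2016-01-01T12:15:03"}
print(cast_record(raw, schema))
```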
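Slide 23 describes joining two streams only when the matching events fall within a time window, optionally as a left outer join. The nested-loop sketch below assumes a one-directional window: the right-hand event must occur between 0 and `max_gap` after the left-hand one. This is only loosely analogous to the slide's special DATEDIFF condition and does not reproduce its exact syntax or semantics. The phone-call events and all names are hypothetical.

```python
from datetime import datetime, timedelta

def windowed_join(left, right, key, max_gap, outer=False):
    """Join two event lists on `key` when the right event follows the left one
    by at most `max_gap`. With outer=True, unmatched left events are kept and
    paired with None, as in a LEFT OUTER JOIN.
    """
    joined = []
    for lhs in left:
        matches = [rhs for rhs in right
                   if rhs[key] == lhs[key]
                   and timedelta(0) <= rhs["ts"] - lhs["ts"] <= max_gap]
        if matches:
            joined.extend((lhs, rhs) for rhs in matches)
        elif outer:
            joined.append((lhs, None))
    return joined

# Hypothetical phone-call events: call 2 ends far outside the 10-minute window.
noon = datetime(2016, 1, 1, 12, 0)
starts = [{"call": 1, "ts": noon}, {"call": 2, "ts": noon + timedelta(minutes=1)}]
ends = [{"call": 1, "ts": noon + timedelta(minutes=3)},
        {"call": 2, "ts": noon + timedelta(minutes=30)}]

inner = windowed_join(starts, ends, "call", timedelta(minutes=10))
outer = windowed_join(starts, ends, "call", timedelta(minutes=10), outer=True)
print(len(inner), len(outer))  # 1 2
```

This sketch scans whole lists. A streaming engine only needs to keep events that still fit inside the window, which is what makes the time bound on a join useful.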
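Slide 24 describes several SELECT … INTO statements, each sending its own subset of the input to its own destination. The Python sketch below models each statement as a destination name plus a predicate. Each statement sees every event, so one event can reach several outputs. The destination names and readings are hypothetical.

```python
def run_selects(events, selects):
    """Evaluate several SELECT ... INTO statements over the same input.

    `selects` maps a destination name to a predicate; each statement sees every
    event independently, so one event can land in several outputs.
    """
    outputs = {destination: [] for destination in selects}
    for event in events:
        for destination, predicate in selects.items():
            if predicate(event):
                outputs[destination].append(event)
    return outputs

# Hypothetical destinations: archive everything, alert only on hot readings.
selects = {
    "blobarchive": lambda e: True,
    "alerthub": lambda e: e["temperature"] > 30.0,
}
readings = [{"id": 1, "temperature": 21.5}, {"id": 2, "temperature": 31.0}]
routed = run_selects(readings, selects)
print({name: [e["id"] for e in batch] for name, batch in routed.items()})
# {'blobarchive': [1, 2], 'alerthub': [2]}
```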
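Slides 22 and 25 cover two related ideas. First, an event can carry its own timestamp, selected with TIMESTAMP BY. Second, events that arrive late are processed if they fall within a tolerance window, and otherwise are dropped or "adjusted" to the current time. The sketch below follows the slide's simplified description: lateness is arrival time minus event time, and "adjust" re-stamps the event with its arrival time. The service's own late-arrival and out-of-order policies may differ in detail. The `Event` type, `apply_tolerance`, and the `at` helper are hypothetical. The 6:54:32 / 6:55:55 pair is taken from slide 25.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Event:
    event_time: datetime    # timestamp in the payload (what TIMESTAMP BY would pick)
    arrival_time: datetime  # when the event reached the event hub
    value: int

def apply_tolerance(events, tolerance, policy="drop"):
    """Handle late events, then emit the survivors in event-time order.

    Events at most `tolerance` late are kept as-is. Later ones are dropped or,
    with policy="adjust", re-stamped with their arrival time ("it happened just now").
    """
    if policy not in ("drop", "adjust"):
        raise ValueError("policy must be 'drop' or 'adjust'")
    kept = []
    for event in events:
        if event.arrival_time - event.event_time <= tolerance:
            kept.append(event)
        elif policy == "adjust":
            kept.append(replace(event, event_time=event.arrival_time))
    return sorted(kept, key=lambda e: e.event_time)

def at(h, m, s):
    """Hypothetical helper: a time on an arbitrary date."""
    return datetime(2016, 1, 1, h, m, s)

# Arrival order: the 6:54:32 event shows up after the 6:55:55 one (slide 25).
arrivals_in_order = [
    Event(at(6, 55, 55), at(6, 55, 56), 1),
    Event(at(6, 54, 32), at(6, 56, 0), 2),   # 88 s late: within tolerance
    Event(at(6, 40, 0), at(6, 56, 5), 3),    # 16 min late: outside tolerance
]
five_minutes = timedelta(minutes=5)
print([e.value for e in apply_tolerance(arrivals_in_order, five_minutes, "drop")])    # [2, 1]
print([e.value for e in apply_tolerance(arrivals_in_order, five_minutes, "adjust")])  # [2, 1, 3]
```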
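Slide 26's scaling rule can be written as a small formula. Count one slot per non-partitioned step plus one slot per partition of each partitioned step, then multiply by the limit of 6 streaming units. The helper below encodes that rule of thumb exactly as the slide states it. The helper name is hypothetical, and the current service limits may differ.

```python
SU_PER_STEP = 6             # slide 26: at most 6 streaming units per step
MB_PER_SECOND_PER_SU = 1.0  # slide 26: roughly 1 MB/s per streaming unit

def max_streaming_units(non_partitioned_steps, partitions_per_step):
    """Upper bound on streaming units under the slide's rule of thumb.

    Every non-partitioned step counts once; a partitioned step counts once per
    partition; each count can use up to SU_PER_STEP units.
    """
    slots = non_partitioned_steps + sum(partitions_per_step)
    return slots * SU_PER_STEP

# Slide 26's example: one non-partitioned query plus a 3-way partitioned subquery.
units = max_streaming_units(1, [3])
print(units, units * MB_PER_SECOND_PER_SU)  # 24 24.0
```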
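The estimate on slide 27 ("less than 18 € a month") can be checked with a short calculation. The sketch below uses the prices exactly as quoted on the slide, which may be outdated, and assumes a 31-day month. It also makes one loud assumption the slide does not state: each event is 1 KB, so 10 million events come to about 10 GB. One streaming unit is enough, because 10 million events spread over a month average only a few events per second.

```python
SU_PRICE_EUR_PER_DAY = 0.55          # price as quoted on slide 27
THROUGHPUT_PRICE_EUR_PER_GB = 0.0008  # price as quoted on slide 27

def monthly_cost_eur(streaming_units, events, event_size_kb, days=31):
    """Streaming-unit fee for the month plus the per-GB throughput fee."""
    gigabytes = events * event_size_kb / 1_000_000   # decimal units: 1 GB = 10^6 KB
    return (streaming_units * SU_PRICE_EUR_PER_DAY * days
            + gigabytes * THROUGHPUT_PRICE_EUR_PER_GB)

# Assumption: 1 KB per event, so 10 million events are about 10 GB.
cost = monthly_cost_eur(streaming_units=1, events=10_000_000, event_size_kb=1.0)
print(f"{cost:.3f} EUR")  # 17.058 EUR
```

Under these assumptions, the throughput fee is under one cent. The streaming-unit fee dominates the bill.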