TOPIC
Azure Data
Explorer
Time series analytics with Azure ADX
Who I am
@ZamanaRiccardo
@riccardozamana
zama202
https://www.linkedin.com/in/riccardozamana/
RICCARDO ZAMANA
Questions
What about TIME SERIES DATABASES?
When should I use one?
What are the possible market choices?
OpenTSDB? Kairos over Scylla/Cassandra? Influx?
Why do I have to learn yet another DB!?
Why not SQL? Why not COSMOS?
1. Intro
2. Service
3. Trust
4. Basics
5. Techniques
6. Dive into Scalar Functions
7. Real Use Cases
Summary
Example Scenario: Multi-temperature data processing paths

Hot (in-mem cube, stream analytics, …)
• seconds freshness, days retention
• in-mem aggregated data
• pre-defined standing queries
• split-second query performance
• data viewing

Warm (column store, indexing, …)
• minutes freshness, months retention
• raw data
• ad-hoc queries
• seconds-minutes query perf
• data exploration

Cold (distributed file system, map reduce, …)
• hours freshness, years retention
• raw data
• programmatic batch processing
• minutes-hours query perf
• data manipulation
Where is located the ADX Process?
What is Azure Data Explorer
Any append-only stream of records
Relational query model: filter, aggregate, join, calculated columns, …
Fully managed
Rapid iterations to explore the data
High volume, high velocity, high variance (structured, semi-structured, free-text)
PaaS, vanilla, database
Purposely built
Fully managed big data analytics service
• Fully managed for efficiency: focus on insights, not the infrastructure, for fast time to value. No infrastructure to manage; provision the service, choose the SKU for your workload, and create a database.
• Optimized for streaming data: get near-instant insights from fast-flowing data. Scale linearly up to 200 MB per second per node with highly performant, low-latency ingestion.
• Designed for data exploration: run ad-hoc queries using the intuitive query language; it returns results from 1 billion records in under 1 second without modifying the data or metadata.
Use cases & Best Scenario
When is it useful?
• Analyze Telemetry data
• Retrieve trends/Series from clustered data
• Make regression over Big Data
• Summarize and export ordered streams
Which kind of scenario is suitable for?
• IoT
• Troubleshooting and diagnostics
• Monitoring
• Security research
• Usage analytics
Azure Data Explorer Architecture
Ingestion (STREAM and BATCH): Spark, ADF, Apps (via API), Logstash plugin, Kafka sink, IoT Hub, Event Hub, Event Grid
Core: Data Management + Engine; ingested data on SSD cache and Blob / ADLS storage
Consumption: ODBC, Power BI, ADX UI, MS Flow, Logic Apps, Notebooks, Grafana, Spark
Azure Data Explorer SLA
SLA: at least 99.9% availability (last updated: Feb 2019)
Maximum Available Minutes: the total number of minutes for a given Cluster deployed by Customer in a Microsoft Azure subscription during a billing month.
Downtime: the total number of minutes within Maximum Available Minutes during which a Cluster is unavailable.
Monthly Uptime Percentage: for Azure Data Explorer, calculated as Maximum Available Minutes less Downtime, divided by Maximum Available Minutes:
Monthly Uptime % = (Maximum Available Minutes - Downtime) / Maximum Available Minutes x 100
Allowed downtime at 99.9%: Daily: 1m 26.4s · Weekly: 10m 04.8s · Monthly: 43m 49.7s
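Those daily/weekly/monthly figures follow directly from the 99.9% number; a quick illustrative check (the helper is made up for this slide, not part of any SDK):

```python
# Illustrative helper: how much downtime a 99.9% SLA allows per period.
def allowed_downtime_minutes(sla_pct: float, period_minutes: float) -> float:
    """Minutes of downtime permitted before the SLA is breached."""
    return period_minutes * (1 - sla_pct / 100)

DAY = 24 * 60                 # 1440 minutes
WEEK = 7 * DAY                # 10080 minutes
MONTH = 365.2425 / 12 * DAY   # average month, ~43829 minutes

for name, period in [("Daily", DAY), ("Weekly", WEEK), ("Monthly", MONTH)]:
    m = allowed_downtime_minutes(99.9, period)
    # e.g. Daily -> 1.44 minutes -> "1m 26.4s", matching the slide
    print(f"{name}: {int(m)}m {m % 1 * 60:04.1f}s")
```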
How about the pricing model?
1) ADX is a cluster (N engine VMs + 1 data-management VM + classic IaaS network)
2) Billed per minute (if you don't use it, stop it!)
3) Honest price (if you stop the VMs, the ADX markup charge stops too)
4) Markup is proportional to the number of engine VMs (only one "naked" DM VM)

• Developer VM (QPU): only for query tests or analytics-automation development
• Compute Optimized (vCPU): production workloads that need a high rate of queries over a smaller data size
• Storage Optimized (vCPU): production workloads that need fewer queries over a large volume of data
(For production, Azure Data Explorer will also charge for storage and networking.)
How about the pricing «REAL» model?
Pricing considerations:
• More RI savings on the D series
• More space on the L series
• DM price is doubled on the L series (2x VMs, with 2x resources)
Estimator: https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html

If you want to pay little money: think of ADX as an ANALYSIS TOOL in a multi-tenant environment. Your spending will be a mix between ADX and ADF / Power BI costs.
If you can afford a space shuttle*: think of ADX as an INGESTION & RESILIENCY TOOL to break through your traditional «live DWH». Your spending will be a mix between ADX, compute & event services.
* Space shuttle definition: ADX can scale up to 500 VMs => 8000 cores x 64 TB RAM… and 1M euro/month :D
Available SKUs
Attribute | D SKU | L SKU
Small SKUs | minimal size is D11 with two cores | minimal size is L4 with four cores
Availability | available in all regions (the DS+PS version has more limited availability) | available in a few regions
Cost per GB cache per core | high with the D SKU, low with the DS+PS version | lowest with the pay-as-you-go option
Reserved Instances (RI) pricing | high discount (over 55 percent for a three-year commitment) | lower discount (20 percent for a three-year commitment)
• D v2: the D SKU is compute-optimized (optional Premium Storage disks)
• LS: the L SKU is storage-optimized (greater SSD size than the equivalent D SKU)
First questions about ADX
It is normal to evaluate a new Azure service, in terms of Maturity, Applicability and
Affordability. So:
A) Are we sure that ADX is a mature Service?
B) Which are the correct use cases where it is really useful?
C) Which are the OSS Alternatives that we should compare with?
A) Are we sure that ADX is a mature Service?
From telemetry analytics for internal use, to an analytics data platform for products (AI, OMS, ASC, Defender, IoT), to an interactive analytics big data platform:
• 2015–2016: starting with 1st-party validation; building modern analytics; vision of an analytics platform for MSFT
• 2017–2019: analytics engine for 3rd-party offers; unified platform across OMS/AI; expanded scenarios for IoT time series; bridged across client/server security
• GA: February 2019
B) Which are the correct use cases where it is really useful?
SCENARIO | USE CASE | USER STORY
Asset management | Troubleshooting scenario | You need a telemetry analytics platform, in order to retrieve aggregations or statistical calculations on historical series. («As an IT Manager» I want a platform to load logs from various file types, in order to analyze them and graphically pinpoint the problem over time.)
IoT SaaS solution | Logistics SaaS solution | You want to offer multi-tenant SaaS solutions. («As a Product Lead Engineer» I want to manage the backend of my multi-tenant SaaS solution using a unique, fat, huge backend service.)
IIoT | Quality management | Within an Industrial IoT solution development, you need a common backend to handle process variables, to run correlation analysis using continuous stream queries. («As a Quality Manager» I need a prebuilt backend solution to dynamically configure time-based queries on data in order to find out correlations from process variables.)
C) Which are the OSS Alternatives that we should compare with?
From db-engines.com
Azure Data Explorer
Fully managed big data
interactive analytics platform
Elastic Search
A distributed, RESTful modern
search and analytics engine
ADX can be a replacement for search and log analytics
engines such as Elasticsearch, Splunk, InfluxDB.
Splunk
real-time insights Engine to
boost productivity & security.
InfluxDB
DBMS for storing time series,
events and metrics
Vs
This is a comparison chart:
Attribute | Elasticsearch (Elastic) | InfluxDB (InfluxData Inc.) | Azure Data Explorer (Microsoft) | Splunk (Splunk Inc.)
Description | a distributed, RESTful modern search and analytics engine based on Apache Lucene | DBMS for storing time series, events and metrics | fully managed big data interactive analytics platform | analytics platform for big data
Database models | search engine, document store | Time Series DBMS | Time Series DBMS, search engine, document store, event store, relational DBMS | search engine
Initial release | 2010 | 2013 | 2019 | 2003
License | open source | open source | commercial | commercial
Cloud-based only | no | no | yes | no
Implementation language | Java | Go | n/a | n/a
Server operating systems | all OS with a Java VM | Linux, OS X | hosted | Linux, OS X, Solaris, Windows
Data scheme | schema-free | schema-free | fixed schema with schema-less datatypes (dynamic) | yes
Typing | yes | numeric data and strings | yes | yes
XML support | no | no | yes | yes
Secondary indexes | yes | no | all fields are automatically indexed | yes
SQL | SQL-like query language | SQL-like query language | Kusto Query Language (KQL), SQL subset | no
APIs and other access methods | RESTful HTTP/JSON API, Java API | HTTP API, JSON over UDP | RESTful HTTP API, Microsoft SQL Server communication protocol (MS-TDS) | HTTP REST
Supported programming languages | .Net, Java, JavaScript, Python, Ruby, PHP, Perl, Groovy, community-contributed clients | .Net, Java, JavaScript, Python, R, Ruby, PHP, Perl, Haskell, Clojure, Erlang, Go, Lisp, Rust, Scala | .Net, Java, JavaScript, Python, R, PowerShell | .Net, Java, JavaScript, Python, Ruby, PHP
Server-side scripts | yes | no | yes (KQL, Python, R) | yes
Triggers | yes | no | yes | yes
Partitioning methods | sharding | sharding | sharding | sharding
Replication methods | yes | selectable replication factor | yes | master-master replication
MapReduce | ES-Hadoop connector | no | no | yes
Consistency concepts | Eventual Consistency | Eventual Consistency | Eventual Consistency, Immediate Consistency | n/a
Foreign keys | no | no | no | no
Transaction concepts | no | no | no | no
Concurrency | yes | yes | yes | yes
Durability | yes | yes | yes | yes
In-memory capabilities | Memcached and Redis integration | yes | no | no
User concepts | simple rights management via user accounts | n/a | Azure Active Directory authentication | access rights for users and roles
Why ADX is Unique
Simplified costs: VM costs + ADX service add-on cost
Many prebuilt inputs: ADF, Spark, Logstash, Kafka, IoT Hub, Event Hub
Many prebuilt outputs: TDS/SQL, Power BI, ODBC connector, Spark, Jupyter, Grafana
Azure services with ADX usage
Azure Monitor
• Log Analytics
• Application Insights
Security Products
• Windows Defender
• Azure Security Center
• Azure Sentinel
IoT
• Time Series Insights
• Azure IoT Central
ADX QUICK START
1. Sign in to the Azure Portal, search for 'Azure Data Explorer cluster' and click Create.
2. Fill in the name of the database, the retention and cache periods in days, and hit the Create button.
3. Click Create data connection to load/ingest data into the database you just created, from Event Hub, Blob Storage, IoT Hub, the Kafka connector, or Azure Data Factory.
Then go to https://dataexplorer.azure.com/
The first command creates a table; the second command ingests data from the csv file into this table (example).
How to set up ADX «really»?
Like a Newbie: create a database, use the database to link ingestion sources, create a DataConnection.
Like a Pro (Azure CLI):
Login: az login
Select subscription: az account set --subscription MyAzureSub
Cluster creation: az kusto cluster create --name azureclitest --sku D11_V2 --capacity 2 --resource-group testrg
Database creation: az kusto database create --cluster-name azureclitest --name clidatabase --resource-group testrg --soft-delete-period P365D --hot-cache-period P31D
HOT-CACHE-PERIOD: amount of time that data should be kept in cache. Duration in ISO 8601 format (for example, 100 days would be P100D).
SOFT-DELETE-PERIOD: amount of time that data should be kept so it is available to query. Duration in ISO 8601 format (for example, 100 days would be P100D).
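If you script those CLI calls, the ISO 8601 period strings can be generated rather than hand-typed; a minimal sketch (the helper name is made up, and it covers only the whole-day form used above):

```python
from datetime import timedelta

# Illustrative helper: render a timedelta as the ISO 8601 duration string
# expected by --soft-delete-period / --hot-cache-period (days-only form).
def iso8601_days(td: timedelta) -> str:
    if td.seconds or td.microseconds:
        raise ValueError("only whole-day durations are rendered here")
    return f"P{td.days}D"

print(iso8601_days(timedelta(days=100)))  # P100D
print(iso8601_days(timedelta(days=365)))  # P365D
```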
How to set and use ADX «really»? Or like a BOSS (SDK):
FIRST: after retrieving an Azure identity, create the cluster (with the async method)
SECOND: check for a Succeeded provisioning state
THIRD: create the database
..LAST: then do the job and remove all traces [like a Killer]
How to script with ADX «really»?
• Open VSC, create a file, save it and then edit
• Use Log Analytics or KUSTO/KQL extensions (
.csl | .kusto | .kql)
• Use Kuskus polugins (4) to highlight grammar
(Kuskus use Textmate grammar compliancy)
Like a Newbie Like a Pro
• Open https://dataexplorer.azure.com/
• Connect your cluster
• Use Online editor to test queries
• Save KQL Files using
How to script with ADX «really»?
Or like a BOSS: make ADX part of a bigger, collaborative workflow.
Make your own web application; build an analysis workflow with:
• Creation of temporary clusters
• Collaborative workspaces
• History tracking of the analyses done
• Scoring of people's usage
Make your application secure, make your results exportable natively to your databases, and serve Power BI «for humans».
<iframe src="https://dataexplorer.azure.com/clusters/<cluster>?ibizaPortal=true"></iframe>
https://microsoft.github.io/monaco-editor/index.html
How to ingest data in ADX «really» ?
Like a Newbie Like a Pro
• Open https://dataexplorer.azure.com/
• Right click on DB, Ingest new data (preview)
Use LIGHTINGEST
• A command-line utility for ad-hoc data ingestion into Kusto
• Pulls source data from a local folder or an Azure Blob Storage container
[Ingest JSON data from blobs]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
-database:db001
-table:LAB
-sourcePath:"<blob storage account with container SAS token>"
-prefix:MyDir1/MySubDir2
-format:json
-mappingRef:DefaultJsonMapping
-pattern:*.json
-limit:100
[Ingest CSV data with headers from local files]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
-database:MyDb
-table:MyTable
-sourcePath:"D:\MyFolder\Data"
-format:csv
-ignoreFirstRecord:true
-mappingPath:"D:\MyFolder\CsvMapping.txt"
-pattern:*.csv.gz
-limit:100
https://docs.microsoft.com/en-us/azure/kusto/tools/lightingest
How to ingest data in ADX «really» ?
Or like a BOSS
• Use a DataLake to store files
• Use EventGrid in order to have a Trigger Platform
• Make a Function … and That’s it!
How to visualize data in ADX «really»?
Like a Newbie Like a Pro
• Open https://dataexplorer.azure.com/ using «RENDER»
• Download Kusto Explorer and click the right Chart
• Use PowerBI Connector (Direct or Import Modes)
• Use Excel connector
How to visualize data in ADX «really»?
Use Python as your Swiss Army knife for everything you don't wanna be «Properly Developed».
Or like a BOSS: use VSC as a Jupyter Notebook IDE with «KQLMagic».
KQLMagic extends the capabilities of the Python kernel in Jupyter, so you can run Kusto language queries natively, combining Python and the Kusto query language.
https://github.com/microsoft/jupyter-Kqlmagic
Welcome to Level 2
ADVANCED INGESTION CAPABILITIES
Connect Edge to Cloud through a Secure Chain
Path no. 1 (local log streaming): a KAFKA cluster as a log streamer, equipped with a KUSTO SINK that writes a direct stream
Path no. 2 (remote log streaming): connect KAFKA directly to Azure Event Hub, and then ingest with the built-in process
Path no. 3 (local pipeline): LogStash as a local dynamic data-collection pipeline with an extensible plugin ecosystem
Path no. 4 (remote pipeline): use the integration runtime to link on-premises data using Azure Data Factory Dataflow
Ingestion Techniques
Batch ingestion (provided by SDK): for high-volume, reliable, and cheap data ingestion. The client uploads the data to Azure Blob storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure Queue. Batch ingestion is the recommended technique.
Inline ingestion (provided by query tools): most appropriate for exploration and prototyping.
• Inline ingestion: a control command (.ingest inline) containing in-band data, intended for ad-hoc testing purposes.
• Ingest from query: control commands (.set, .set-or-append, .set-or-replace) that point to query results, used for generating reports or small temporary tables.
• Ingest from storage: a control command (.ingest into) with data stored externally (for example, Azure Blob Storage), allowing efficient bulk ingestion of data.
Supported data formats
The supported data formats are:
- CSV, TSV, TSVE, PSV, SCSV, SOH
- JSON (line-separated, multi-line), Avro
- ZIP and GZIP
• CSV mapping (optional): works with all ordinal-based formats. It can be performed using the ingest command parameter or pre-created on the table and referenced from the ingest command parameter.
• JSON mapping (mandatory) and Avro mapping (mandatory): can be performed using the ingest command parameter. They can also be pre-created on the table and referenced from the ingest command parameter.
For all ingestion methods other than ingest from query, YOU HAVE TO FORMAT THE DATA so that Azure Data Explorer can parse it. Schema mapping helps bind source data fields to destination table columns.
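A JSON mapping is essentially a JSON array binding destination columns to source paths. A hedged sketch that builds one (the column names and paths are illustrative only; check the exact mapping schema against the Kusto docs before use):

```python
import json

# Illustrative: compose a Kusto-style JSON ingestion mapping as a list of
# {"column": ..., "path": ..., "datatype": ...} entries.
def json_mapping(columns: dict) -> str:
    """columns: column name -> (json_path, datatype)."""
    return json.dumps(
        [{"column": c, "path": p, "datatype": t} for c, (p, t) in columns.items()],
        indent=2,
    )

mapping = json_mapping({
    "Timestamp": ("$.timestamp", "datetime"),  # made-up source fields
    "DeviceId":  ("$.device.id", "string"),
    "Value":     ("$.value", "real"),
})
print(mapping)
```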
How about orchestration?
Three use cases in which FLOW + KUSTO are the solution:
• Push data to a Power BI dataset: periodically run queries and push the results to a Power BI dataset
• Conditional queries: make data checks and send notifications with no code
• Email multiple ADX Flow charts: send incredible emails with an HTML5 chart as the query result
Use ADX with applications not natively supported by a connector
Use ODBC to connect to Azure Data Explorer from applications that don't have a dedicated connector. Azure Data Explorer supports a subset of the SQL Server communication protocol. Then you can use your preferred tool, as long as it supports ODBC connections.
Use ODBC Driver 17 → connect to the ADX cluster → select the database → test the connection.
IMPORTANT NOTES:
• T-SQL SELECT statements are supported (sub-queries, however, are not)
• Since ADX looks like a SQL Server with AD authentication, you can use LINQPad, Azure Data Studio, Power BI Desktop, Tableau, Excel, DBeaver and SSMS (they're all MS-TDS clients).
• SQL Server on-premises allows you to attach a linked server, so you can use "ADX as SQL".
From T-SQL to KQL and vice versa
Kusto supports translating T-SQL queries into the Kusto query language (for example, prefixing a T-SQL query with the explain keyword returns its KQL translation).
Some examples follow.
NOTE: remember the dashes ( !#£$%!! )
Now, you can go straight to the third level.
ADX Functions
Functions are reusable queries or query parts. Kusto supports several kinds of functions:
• Stored functions: user-defined functions that are stored and managed as one kind of the database's schema entities. See Stored functions.
• Query-defined functions: user-defined functions that are defined and used within the scope of a single query. Such functions are defined through a let statement. See User-defined functions.
• Built-in functions: functions that are hard-coded (defined by Kusto and cannot be modified by users).
Language examples
Alias
database["wiki"] = cluster("https://somecluster.kusto.windows.net:443").database("somedatabase");
database("wiki").PageViews | count
Let:
let start = ago(5h);
let period = 2h;
T | where Time > start and Time < start + period | ...
Bin:
T | summarize Hits=count() by bin(Duration, 1s)
Batch:
let m = materialize(StormEvents | summarize n=count() by State);
m | where n > 2000; m | where n < 10
Tabular expression:
Logs
| where Timestamp > ago(1d)
| join ( Events | where continent == 'Europe' ) on RequestId
Time Series Analysis – Bin Operator
T | summarize Hits=count() by bin(Duration, 1s)
bin(value,roundTo)
bin operator
Rounds values down to an integer multiple of a given bin size. If you have a scattered set of values, they will be
grouped into a smaller set of specific values.
[Rule]
[Example]
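The rounding rule is easy to emulate outside KQL; a sketch of the same semantics in Python (kql_bin is a made-up name; datetimes follow the same idea once expressed as an offset from a fixed origin):

```python
# Emulate KQL's bin(value, roundTo): round DOWN to an integer multiple
# of the bin size, so scattered values collapse onto a small set of bins.
def kql_bin(value: float, round_to: float) -> float:
    return (value // round_to) * round_to

print(kql_bin(4.7, 1))   # 4.0
print(kql_bin(92, 10))   # 90
print(kql_bin(-0.3, 1))  # -1.0 (rounds down, like KQL, not toward zero)
```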
Time Series Analysis – Make Series Operator
T | make-series sum(amount) default=0, avg(price)
default=0 on timestamp from datetime(2016-01-01) to
datetime(2016-01-10) step 1d by supplier
T | make-series [MakeSeriesParameters] [Column =] Aggregation [default = DefaultValue] [, ...] on AxisColumn from start to end step step [by [Column =] GroupExpression [, ...]]
make-series operator
[Rule]
[Example]
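The essence of make-series (a regular time axis, with the declared default filling empty bins) can be sketched in a few lines of Python; the function and data below are illustrative only:

```python
from datetime import date, timedelta

# Illustrative core of make-series: aggregate points onto a fixed axis
# from `start` to `end` with the given `step`, filling empty bins.
def make_series(points, start: date, end: date, step: timedelta, default=0):
    axis, series = [], []
    d = start
    while d < end:
        bucket = [v for (t, v) in points if d <= t < d + step]
        axis.append(d)
        series.append(sum(bucket) if bucket else default)
        d += step
    return axis, series

pts = [(date(2016, 1, 1), 5), (date(2016, 1, 1), 2), (date(2016, 1, 3), 7)]
axis, series = make_series(pts, date(2016, 1, 1), date(2016, 1, 4), timedelta(days=1))
print(series)  # [7, 0, 7] (day 2 has no data, so the default fills it)
```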
Time Series Analysis – Basket Operator
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2)
basket operator
Basket finds all frequent patterns of discrete attributes (dimensions) in the data and will return all frequent patterns
that passed the frequency threshold in the original query.
[Rule]
[Example]
T | evaluate basket([Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, ...])
Time Series Analysis – Autocluster Operator
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.6)
autocluster operator
AutoCluster finds common patterns of discrete attributes (dimensions) in the data and will reduce the results of the
original query (whether it's 100 or 100k rows) to a small number of patterns.
[Rule]
[Example]
T | evaluate autocluster([SizeWeight, WeightColumn, NumSeeds, CustomWildcard, CustomWildcard, ...])
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.2, '~', '~', '*')
Export
To Storage
.export async compressed to csv (
h@"https://storage1.blob.core.windows.net/containerName;secretKey",
h@"https://storage1.blob.core.windows.net/containerName2;secretKey" ) with (
sizeLimit=100000, namePrefix=export, includeHeaders=all, encoding =UTF8NoBOM
) <| myLogs | where id == "moshe" | limit 10000
To Sql
.export async to sql ['dbo.MySqlTable']
h@"Server=tcp:myserver.database.windows.net,1433;Database=MyDatabase;Authentication=Active Directory Integrated;Connection Timeout=30;" with
(createifnotexists="true", primarykey="Id") <| print Message = "Hello World!", Timestamp = now(), Id=12345678
1. DEFINE COMMAND: define the ADX command and try your recurrent export strategy
2. TRY IN EDITOR: use an editor to try the command, verifying connection strings and parameterizing them
3. BUILD A JOB: build a notebook or a C# job using the command as a SQL query in your code
External tables & Continuous Export
It’s an external endpoint:
Azure Storage
Azure Datalake Store
SQL Server
You need to define:
Destination
Continuous-Export Strategy
EXT TABLE CREATION
.create external table ExternalAdlsGen2 (Timestamp:datetime, x:long,
s:string) kind=adl partition by bin(Timestamp, 1d) dataformat=csv (
h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey
' ) with ( docstring = "Docs", folder = "ExternalTables", namePrefix="Prefix" )
EXPORT to EXT TABLE
.create-or-alter continuous-export MyExport over (T) to table
ExternalAdlsGen2 with (intervalBetweenRuns=1h, forcedLatency=10m,
sizeLimit=104857600) <| T
FACTS:
A) Kusto stores its ingested data in reliable storage (most commonly Azure Blob Storage).
B) To speed up queries on that data, Kusto caches this data (or parts of it) on its processing nodes.
Cache policy
The Kusto cache provides a granular cache policy that customers can use to differentiate between two data cache policies: hot data cache and cold data cache.
YOU CAN SPECIFY WHICH LOCATION MUST BE USED:
set query_datascope="hotcache";
T | union U | join (T datascope=all | where Timestamp < ago(365d)) on X
Cache policy is independent of retention policy!
Retention policy
Two parameters, applicable to a DB or a table:
• Soft Delete Period (number): data is available for query (ts is the ADX IngestionDate); default is set to 100 YEARS
• Recoverability (enabled/disabled): default is set to ENABLED; recoverable for 14 days after deletion
Use KUSTO to set KUSTO:
.alter database DatabaseName policy retention "{}"
.alter table TableName policy retention "{}"
EXAMPLE:
{ "SoftDeletePeriod": "36500.00:00:00", "Recoverability": "Enabled" }
.delete database DatabaseName policy retention
.delete table TableName policy retention
.alter-merge table MyTable1 policy retention softdelete = 7d
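The policy JSON uses the d.hh:mm:ss timespan convention (36500 days is roughly the 100-year default). A small sketch that composes it (the helper name is made up):

```python
import json

# Illustrative: build the JSON body passed to
# `.alter table ... policy retention "{...}"`.
def retention_policy(soft_delete_days: int, recoverable: bool = True) -> str:
    return json.dumps({
        "SoftDeletePeriod": f"{soft_delete_days}.00:00:00",  # d.hh:mm:ss timespan
        "Recoverability": "Enabled" if recoverable else "Disabled",
    })

print(retention_policy(36500))
# {"SoftDeletePeriod": "36500.00:00:00", "Recoverability": "Enabled"}
```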
Data Purge
The purge process is final and irreversible.
PURGE PROCESS:
1. It requires database admin permissions.
2. Prior to purging, you have to be ENABLED by opening a SUPPORT TICKET.
3. Run the purge QUERY to identify SIZE and EXECUTION TIME; it returns a VerificationToken.
4. Run the REAL purge QUERY, passing the VerificationToken.
2-STEP PROCESS:
.purge table MyTable records in database MyDatabase <| where CustomerId in ('X', 'Y')
NumRecordsToPurge: 1,596 | EstimatedPurgeExecutionTime: 00:00:02 | VerificationToken: e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b
.purge table MyTable records in database MyDatabase with (verificationtoken='e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b') <| where CustomerId in ('X', 'Y')
1-STEP PROCESS (with no regrets!!!):
.purge table MyTable records in database MyDatabase with (noregrets='true')
KUSTO: Do and Don’t
1. DO analytics over Big Data.
2. DO support entities such as databases, tables, and columns.
3. DO support complex analytics query operators (calculated columns, filtering, group by, joins).
BUT DO NOT perform in-place updates!
UC1: Event Correlation
Input events:
Name | City | SessionId | Timestamp
Start | London | 2817330 | 2015-12-09T10:12:02.32
Game | London | 2817330 | 2015-12-09T10:12:52.45
Start | Manchester | 4267667 | 2015-12-09T10:14:02.23
Stop | London | 2817330 | 2015-12-09T10:23:43.18
Cancel | Manchester | 4267667 | 2015-12-09T10:27:26.29
Stop | Manchester | 4267667 | 2015-12-09T10:28:31.72
Desired result:
City | SessionId | StartTime | StopTime | Duration
London | 2817330 | 2015-12-09T10:12:02.32 | 2015-12-09T10:23:43.18 | 00:11:40.86
Manchester | 4267667 | 2015-12-09T10:14:02.23 | 2015-12-09T10:28:31.72 | 00:14:29.49
Get sessions from start and stop events
Let's suppose we have a log of events, in which some events mark the start or end of an extended activity or session. Every event has a SessionId, so the problem is to match up the start and stop events with the same id.
Kusto
let Events = MyLogTable
| where ... ;
Events
| where Name == "Start"
| project Name, City, SessionId, StartTime=timestamp
| join (Events | where Name == "Stop" | project StopTime=timestamp, SessionId)
on SessionId
| project City, SessionId, StartTime, StopTime, Duration = StopTime - StartTime
Use let to name a projection of the table that is pared down as far as possible
before going into the join.
Project is used to change the names of the timestamps so that both the start
and stop times can appear in the result. It also selects the other columns we
want to see in the result.
join matches up the start and stop entries for the same activity, creating a row
for each activity. Finally, project again adds a column to show the duration of
the activity.
UC2: In Place Enrichment
Creating and using query-time dimension tables
In many cases one wants to join the results of a query with some ad-hoc dimension table that is not stored in the
database. It is possible to define an expression whose result is a table scoped to a single query by doing something
like this:
Kusto
// Create a query-time dimension table using datatable
let DimTable = datatable(EventType:string, Code:string) [ "Heavy Rain", "HR", "Tornado", "T" ] ;
DimTable
| join StormEvents on EventType
| summarize count() by Code
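The datatable join has a direct analogue in plain Python: a small in-memory dimension table matched against fact rows with inner-join-style semantics (unmatched rows are dropped). The fact rows below are illustrative, not real StormEvents data:

```python
# Query-time dimension table, mirroring the datatable in the KQL above.
dim_table = {"Heavy Rain": "HR", "Tornado": "T"}

storm_events = [  # illustrative fact rows
    {"EventType": "Tornado"},
    {"EventType": "Heavy Rain"},
    {"EventType": "Tornado"},
    {"EventType": "Hail"},  # no matching dimension row -> dropped by the join
]

# Equivalent of: DimTable | join StormEvents on EventType | summarize count() by Code
counts = {}
for row in storm_events:
    code = dim_table.get(row["EventType"])
    if code is not None:
        counts[code] = counts.get(code, 0) + 1

print(counts)  # {'T': 2, 'HR': 1}
```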
Brief Summary: Azure Data Explorer
Easy to ingest the data and easy to query the data.
Ingestion APIs (queued ingestion, streaming, bulk, direct): Blob & Azure Queue, Blob & Event Grid, Event Hub, IoT Hub, REST API, Python SDK, .NET SDK, Java SDK.
UX: Web UI, Desktop App, Jupyter Magic, Azure Notebooks, Monaco IDE.
Connectors and protocols: Power BI Direct Query, Microsoft Flow, Azure Logic App connectors, Grafana, ADF, MS-TDS, JavaScript, Java SDK.
DEMO
Who I am
@ZamanaRiccardo
@riccardozamana
zama202
riccardozamana
RICCARDO ZAMANA
Thanks
Questions?
zama202
@ZamanaRiccardo
@riccardozamana
riccardozamana
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Amazon Aurora
Amazon AuroraAmazon Aurora
Amazon Aurora
 

Similar to Time Series Analytics Azure ADX

Azure satpn19 time series analytics with azure adx
Azure satpn19   time series analytics with azure adxAzure satpn19   time series analytics with azure adx
Azure satpn19 time series analytics with azure adxRiccardo Zamana
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Amazon Web Services
 
Microsoft Azure News - February 2018
Microsoft Azure News - February 2018Microsoft Azure News - February 2018
Microsoft Azure News - February 2018Daniel Toomey
 
Build 2017 - P4010 - A lap around Azure HDInsight and Cosmos DB Open Source A...
Build 2017 - P4010 - A lap around Azure HDInsight and Cosmos DB Open Source A...Build 2017 - P4010 - A lap around Azure HDInsight and Cosmos DB Open Source A...
Build 2017 - P4010 - A lap around Azure HDInsight and Cosmos DB Open Source A...Windows Developer
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventTrivadis
 
Azure CosmosDb - Where we are
Azure CosmosDb - Where we areAzure CosmosDb - Where we are
Azure CosmosDb - Where we areMarco Parenzan
 
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...Cisco DevNet
 
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...Gary Arora
 
Building a Real-Time IoT monitoring application with Azure
Building a Real-Time IoT monitoring application with AzureBuilding a Real-Time IoT monitoring application with Azure
Building a Real-Time IoT monitoring application with AzureDavide Mauri
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionJames Serra
 
Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you...
Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you...Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you...
Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you...Andre Essing
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Innovations of .NET and Azure (Recaps of Build 2017 selected sessions)
Innovations of .NET and Azure (Recaps of Build 2017 selected sessions)Innovations of .NET and Azure (Recaps of Build 2017 selected sessions)
Innovations of .NET and Azure (Recaps of Build 2017 selected sessions)Jeff Chu
 

Similar to Time Series Analytics Azure ADX (20)

Azure satpn19 time series analytics with azure adx
Azure satpn19   time series analytics with azure adxAzure satpn19   time series analytics with azure adx
Azure satpn19 time series analytics with azure adx
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
The Best of re:invent 2016
The Best of re:invent 2016The Best of re:invent 2016
The Best of re:invent 2016
 
Microsoft Azure News - February 2018
Microsoft Azure News - February 2018Microsoft Azure News - February 2018
Microsoft Azure News - February 2018
 
Build 2017 - P4010 - A lap around Azure HDInsight and Cosmos DB Open Source A...
Build 2017 - P4010 - A lap around Azure HDInsight and Cosmos DB Open Source A...Build 2017 - P4010 - A lap around Azure HDInsight and Cosmos DB Open Source A...
Build 2017 - P4010 - A lap around Azure HDInsight and Cosmos DB Open Source A...
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
 
Azure CosmosDb - Where we are
Azure CosmosDb - Where we areAzure CosmosDb - Where we are
Azure CosmosDb - Where we are
 
Best of re:Invent
Best of re:InventBest of re:Invent
Best of re:Invent
 
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
 
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...
 
Building a Real-Time IoT monitoring application with Azure
Building a Real-Time IoT monitoring application with AzureBuilding a Real-Time IoT monitoring application with Azure
Building a Real-Time IoT monitoring application with Azure
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you...
Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you...Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you...
Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you...
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Innovations of .NET and Azure (Recaps of Build 2017 selected sessions)
Innovations of .NET and Azure (Recaps of Build 2017 selected sessions)Innovations of .NET and Azure (Recaps of Build 2017 selected sessions)
Innovations of .NET and Azure (Recaps of Build 2017 selected sessions)
 

More from Riccardo Zamana

Copilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdfCopilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdfRiccardo Zamana
 
At the core you will have KUSTO
At the core you will have KUSTOAt the core you will have KUSTO
At the core you will have KUSTORiccardo Zamana
 
Data saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewData saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewRiccardo Zamana
 
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Riccardo Zamana
 
Azure Industrial Iot Edge
Azure Industrial Iot EdgeAzure Industrial Iot Edge
Azure Industrial Iot EdgeRiccardo Zamana
 
Industrial iot: dalle parole ai fatti
Industrial iot: dalle parole ai fatti Industrial iot: dalle parole ai fatti
Industrial iot: dalle parole ai fatti Riccardo Zamana
 
Azure dayroma java, il lato oscuro del cloud
Azure dayroma   java, il lato oscuro del cloudAzure dayroma   java, il lato oscuro del cloud
Azure dayroma java, il lato oscuro del cloudRiccardo Zamana
 
Industrial Iot - IotSaturday
Industrial Iot - IotSaturday Industrial Iot - IotSaturday
Industrial Iot - IotSaturday Riccardo Zamana
 

More from Riccardo Zamana (11)

Copilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdfCopilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdf
 
At the core you will have KUSTO
At the core you will have KUSTOAt the core you will have KUSTO
At the core you will have KUSTO
 
Data saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewData saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overview
 
MCT Virtual Summit 2021
MCT Virtual Summit 2021MCT Virtual Summit 2021
MCT Virtual Summit 2021
 
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
 
Azure Industrial Iot Edge
Azure Industrial Iot EdgeAzure Industrial Iot Edge
Azure Industrial Iot Edge
 
Industrial iot: dalle parole ai fatti
Industrial iot: dalle parole ai fatti Industrial iot: dalle parole ai fatti
Industrial iot: dalle parole ai fatti
 
Azure dayroma java, il lato oscuro del cloud
Azure dayroma   java, il lato oscuro del cloudAzure dayroma   java, il lato oscuro del cloud
Azure dayroma java, il lato oscuro del cloud
 
Industrial Iot - IotSaturday
Industrial Iot - IotSaturday Industrial Iot - IotSaturday
Industrial Iot - IotSaturday
 
Azure reactive systems
Azure reactive systemsAzure reactive systems
Azure reactive systems
 
Industrial IoT on azure
Industrial IoT on azureIndustrial IoT on azure
Industrial IoT on azure
 

Recently uploaded

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 

Recently uploaded (20)

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 

Time Series Analytics Azure ADX

  • 1. TOPIC: Azure Data Explorer. Time series analytics with Azure ADX
  • 3. Questions: What about TIME SERIES DATABASES? When should I use one? What are the possible market choices? OpenTSDB? Kairos over Scylla/Cassandra? InfluxDB? Why do I have to learn yet another DB!? Why not SQL? Why not Cosmos DB?
  • 4. Summary: 1. Intro 2. Service 3. Trust 4. Basics 5. Techniques 6. Dive into Scalar Functions 7. Real Use Cases
  • 5. Example scenario: multi-temperature data processing paths. Where is the ADX process located? Hot: seconds freshness, days retention; in-memory aggregated data; pre-defined standing queries; split-second query performance; data viewing (in-mem cube, stream analytics, …). Warm: minutes freshness, months retention; raw data; ad-hoc queries; seconds-to-minutes query performance; data exploration (column store, indexing, …). Cold: hours freshness, years retention; raw data; programmatic batch processing; minutes-to-hours query performance; data manipulation (distributed file system, map reduce, …).
  • 6. What is Azure Data Explorer? A fully managed, purposely built service for any append-only stream of records: high volume, high velocity, high variance (structured, semi-structured, free-text). Relational query model (filter, aggregate, join, calculated columns, …) with rapid iterations to explore the data. PaaS, vanilla, database.
  • 7. Fully managed big data analytics service. • Fully managed for efficiency: focus on insights, not the infrastructure, for fast time to value; no infrastructure to manage: provision the service, choose the SKU for your workload, and create a database. • Optimized for streaming data: get near-instant insights from fast-flowing data; scale linearly up to 200 MB per second per node with highly performant, low-latency ingestion. • Designed for data exploration: run ad-hoc queries using the intuitive query language; returns results from 1 billion records in under a second without modifying the data or metadata.
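As a flavor of such an ad-hoc query, here is a minimal KQL sketch against the StormEvents table from Microsoft's public samples database (the table and column names are taken from that sample, not from the slides):

```kusto
StormEvents
| where StartTime > ago(365d)              // filter: only recent records
| summarize EventCount = count() by State  // aggregate per state
| top 5 by EventCount                      // keep the five largest groups
| render barchart                          // visualize directly in the ADX UI
```

Each pipe stage narrows or reshapes the stream, which is what makes iterative exploration fast: you refine one stage at a time.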
  • 8. Use cases & best scenarios. When is it useful? • Analyze telemetry data • Retrieve trends/series from clustered data • Run regressions over big data • Summarize and export ordered streams. Which kinds of scenario is it suitable for? • IoT • Troubleshooting and diagnostics • Monitoring • Security research • Usage analytics
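For the trend/series use cases above, KQL provides the `make-series` operator plus a family of series functions. A hedged sketch, assuming a hypothetical Telemetry table with Timestamp, DeviceId and Temperature columns:

```kusto
Telemetry
| make-series AvgTemp = avg(Temperature) default = 0.0
    on Timestamp from ago(7d) to now() step 1h by DeviceId
// decompose each per-device series and flag anomalous points
| extend (Flags, Score, Baseline) = series_decompose_anomalies(AvgTemp)
```

`make-series` turns row-level telemetry into one regular, gap-filled array per device, which is the shape the series_* analytics functions expect.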
  • 9. Azure Data Explorer architecture. Ingestion sources: Spark, ADF, apps (via API), Logstash plugin, Kafka sink, IoT Hub, Event Hub, Event Grid. The Data Management and Engine nodes (SSD cache, Blob / ADLS storage) accept both STREAM and BATCH ingestion. Consumers of the ingested data: ODBC, Power BI, ADX UI, MS Flow, Logic Apps, notebooks, Grafana, Spark.
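Besides the managed connectors listed above (Event Hub, IoT Hub, Event Grid, …), small batches can also be pushed with ingestion control commands. A sketch with a hypothetical RawLogs table (the table name and schema are illustrative):

```kusto
// define the target table
.create table RawLogs (Timestamp: datetime, Device: string, Value: real)

// push a tiny inline batch (CSV rows)
.ingest inline into table RawLogs <|
2019-06-01T10:00:00Z,dev-01,21.5
2019-06-01T10:00:05Z,dev-02,21.7
```

`.ingest inline` is meant for tests and quick experiments, not for production throughput; sustained loads should go through the queued connectors in the diagram.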
  • 10. Azure Data Explorer SLA. SLA: at least 99.9% availability (last updated: Feb 2019). Maximum Available Minutes: the total number of minutes for a given Cluster deployed by Customer in a Microsoft Azure subscription during a billing month. Downtime: the total number of minutes within Maximum Available Minutes during which a Cluster is unavailable. Monthly Uptime Percentage for Azure Data Explorer is calculated as Maximum Available Minutes less Downtime, divided by Maximum Available Minutes: Monthly Uptime % = (Maximum Available Minutes - Downtime) / Maximum Available Minutes x 100. Allowed downtime at 99.9%: daily 1m 26.4s, weekly 10m 04.8s, monthly 43m 49.7s.
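The downtime budgets on the slide follow directly from the formula: budget = period x (1 - 0.999). As a sanity check, KQL itself can do the arithmetic, since multiplying a timespan by a scalar yields a timespan (the 30.4375 factor is the average number of days in a month, 365.25 / 12):

```kusto
print Daily   = 1d * (1 - 0.999),            // ~1m 26.4s
      Weekly  = 7d * (1 - 0.999),            // ~10m 04.8s
      Monthly = 1d * 30.4375 * (1 - 0.999)   // ~43m 49.7s
```

The expected values in the comments are the figures from the slide, reproduced here rather than recomputed to the last decimal.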
  • 11. How about the pricing model? 1) It is a cluster (N engine VMs + 1 data management VM + classic IaaS network) 2) Billed per minute (if you don't use it, stop it!) 3) Honest price (if you stop the VMs, the ADX markup charge stops too) 4) Markup is proportional to the number of engine VMs (only 1 DM VM, «naked»). SKUs: Developer VM (QPU): only for query tests or analytics automation development. Compute Optimized (vCPU): production workloads that need a high rate of queries over a smaller data size (Azure Data Explorer will also charge for the storage and networking incurred). Storage Optimized (vCPU): workloads that need fewer queries over a large volume of data.
  • 12. How about the «REAL» pricing model? Pricing considerations: • More RI savings on the D series • More space on the L series • Doubled DM price on the L series (2x VMs, with 2x resources). https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html
  • 13. Pricing https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html If you want to pay little money: think of ADX as an ANALYSIS TOOL in a multi-tenant environment; your spending will be a mix of ADX and ADF / Power BI costs. If you can afford a space shuttle: think of ADX as an INGESTION & RESILIENCY TOOL to break through your traditional «live DWH»; your spending will be a mix of ADX, compute & event services. * Space shuttle definition: ADX can scale up to 500 VMs => 8000 cores x 64 TB RAM… and 1M euro/month :D
  • 14. Available SKUs. • D v2: the D SKU is compute-optimized (optional Premium Storage disk) • LS: the L SKU is storage-optimized (greater SSD size than the equivalent D SKU). Comparison (D SKU vs L SKU): Small SKUs: minimal size is D11 with two cores vs minimal size L4 with four cores. Availability: available in all regions (the DS+PS version has more limited availability) vs available in a few regions. Cost per GB cache per core: high with the D SKU, low with the DS+PS version vs lowest with the Pay-As-You-Go option. Reserved Instances (RI) pricing: high discount (over 55 percent for a three-year commitment) vs lower discount (20 percent for a three-year commitment).
  • 15. First questions about ADX. It is normal to evaluate a new Azure service in terms of maturity, applicability and affordability. So: A) Are we sure that ADX is a mature service? B) What are the correct use cases where it is really useful? C) Which OSS alternatives should we compare it with?
  • 16. A) Are we sure that ADX is a mature service? 2015-2016: telemetry analytics for internal use; starting with 1st-party validation; building modern analytics; vision of an analytics platform for MSFT. 2017: analytics data platform for products (AI, OMS, ASC, Defender, IoT); analytics engine for 3rd-party offers; unified platform across OMS/AI; expanded scenarios for IoT time series; bridged across client/server security. 2019: interactive analytics big data platform; GA in February 2019.
  • 17. B) What are the correct use cases where it is really useful? SCENARIO / USE CASE / USER STORY. Asset management / troubleshooting scenario: you need a telemetry analytics platform in order to retrieve aggregations or statistical calculations on historical series («As an IT manager, I want a platform to load logs from various file types, in order to analyze them and graphically pinpoint the problem over time»). IoT SaaS solution / logistics SaaS solution: you want to offer multi-tenant SaaS solutions («As a product lead engineer, I want to manage the backend of my multi-tenant SaaS solution using a unique, fat, huge backend service»). IIoT / quality management: within the development of an Industrial IoT solution, you need a common backend to handle process variables and run correlation analysis using continuous stream queries («As a quality manager, I need a prebuilt backend solution to dynamically configure time-based queries on data in order to find out correlations from process variables»).
  • 18. C) Which OSS alternatives should we compare it with? From db-engines.com: Azure Data Explorer (fully managed big data interactive analytics platform) vs Elasticsearch (a distributed, RESTful modern search and analytics engine), Splunk (real-time insights engine to boost productivity & security) and InfluxDB (DBMS for storing time series, events and metrics). ADX can be a replacement for search and log analytics engines such as Elasticsearch, Splunk and InfluxDB.
  • 19. This is a comparison chart (columns: Elasticsearch (Elastic) | InfluxDB (InfluxData Inc.) | Azure Data Explorer (Microsoft) | Splunk (Splunk Inc.))
Description: a distributed, RESTful modern search and analytics engine based on Apache Lucene | DBMS for storing time series, events and metrics | fully managed big data interactive analytics platform | analytics platform for Big Data
Database models: Search engine, Document store | Time Series DBMS | Time Series DBMS, Search engine, Document store, Event Store, Relational DBMS | Search engine
Initial release: 2010 | 2013 | 2019 | 2003
License: Open Source | Open Source | commercial | commercial
Cloud-based only: no | no | yes | no
Implementation language: Java | Go | (not listed) | (not listed)
Server operating systems: all OS with a Java VM | Linux, OS X | hosted | Linux, OS X, Solaris, Windows
Data scheme: schema-free | schema-free | fixed schema with schema-less datatypes (dynamic) | yes
Typing: yes | numeric data and strings | yes | yes
XML support: no | no | yes | yes
Secondary indexes: yes | no | all fields are automatically indexed | yes
SQL: SQL-like query language | SQL-like query language | Kusto Query Language (KQL), SQL subset | no
APIs and other access methods: RESTful HTTP/JSON API, Java API | HTTP API, JSON over UDP | RESTful HTTP API, Microsoft SQL Server communication protocol (MS-TDS) | HTTP REST
Supported programming languages (.Net, Java, JavaScript, Python for all), plus: Ruby, PHP, Perl, Groovy, community-contributed clients | R, Ruby, PHP, Perl, Haskell, Clojure, Erlang, Go, Lisp, Rust, Scala | R, PowerShell | Ruby, PHP
Server-side scripts: yes | no | yes (possible languages: KQL, Python, R) | yes
Triggers: yes | no | yes | yes
Partitioning methods: Sharding | Sharding | Sharding | Sharding
Replication methods: yes | selectable replication factor | yes | master-master replication
MapReduce: ES-Hadoop Connector | no | no | yes
Consistency concepts: Eventual Consistency | Eventual Consistency | Eventual Consistency | Immediate Consistency
Foreign keys: no | no | no | no
Transaction concepts: no | no | no | no
Concurrency: yes | yes | yes | yes
Durability: yes | yes | yes | yes
In-memory capabilities: Memcached and Redis integration | yes | no | no
User concepts: simple rights management via user accounts | (not listed) | Azure Active Directory Authentication | access rights for users and roles
  • 20. Why ADX is Unique Simplified costs • VM costs • ADX service add-on cost Many Prebuilt Inputs • ADF • Spark • Logstash • Kafka • IoT Hub • Event Hub Many Prebuilt Outputs • TDS/SQL • Power BI • ODBC Connector • Spark • Jupyter • Grafana
  • 21. Azure services with ADX usage Azure Monitor • Log Analytics • Application Insights Security Products • Windows Defender • Azure Security Center • Azure Sentinel IoT • Time Series Insights • Azure IoT Central
  • 22. ADX QUICK START Sign in to the Azure Portal, search for 'Azure Data Explorer cluster' and click on Create. Then go to https://dataexplorer.azure.com/ Fill in the name of the database, the retention and cache periods in days, and hit the Create button. Click on Create data connection to load/ingest data into the database you just created from Event Hub, IoT Hub, Blob Storage, the Kafka connector, or Azure Data Factory. The first command creates a table; the second command ingests data from the csv file into this table (example).
  • 23.
  • 24. How to set ADX «really»? Create a database Use the database to link ingestion sources (DataConnection) Login: az login Select subscription: az account set --subscription MyAzureSub Cluster creation: az kusto cluster create --name azureclitest --sku D11_V2 --capacity 2 --resource-group testrg Database creation: az kusto database create --cluster-name azureclitest --name clidatabase --resource-group testrg --soft-delete-period P365D --hot-cache-period P31D HOT-CACHE-PERIOD: Amount of time that data should be kept in cache. Duration in ISO8601 format (for example, 100 days would be P100D). SOFT-DELETE-PERIOD: Amount of time that data should be kept so it is available to query. Duration in ISO8601 format (for example, 100 days would be P100D) Like a Newbie Like a Pro Configuration
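The --soft-delete-period and --hot-cache-period values above are ISO 8601 durations. A minimal Python sketch of that day-only "PnD" format (helper names are illustrative, not part of any SDK):

```python
from datetime import timedelta

def days_to_iso8601(days: int) -> str:
    # Format whole days as an ISO 8601 duration, e.g. 100 -> "P100D"
    return f"P{days}D"

def iso8601_days(period: str) -> timedelta:
    # Parse the simple day-only "PnD" form used above back into a timedelta
    if not (period.startswith("P") and period.endswith("D")):
        raise ValueError(f"unsupported duration: {period!r}")
    return timedelta(days=int(period[1:-1]))

print(days_to_iso8601(365))        # P365D
print(iso8601_days("P31D").days)   # 31
```

So a 365-day soft-delete period is P365D and a 31-day hot cache is P31D, exactly as passed to the az commands.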
  • 25. How to set and use ADX «really»? Or like a BOSS After retrieving the Azure identity, create the cluster (with the async method) Check for a Succeeded provisioning state Create the database Then do the job and remove all traces [like a Killer] FIRST SECOND THIRD ..LAST!
  • 26. How to script with ADX «really»? • Open VSC, create a file, save it and then edit it • Use the Log Analytics or KUSTO/KQL extensions ( .csl | .kusto | .kql) • Use the Kuskus plugins (4) to highlight the grammar (Kuskus uses TextMate grammar compliance) Like a Newbie Like a Pro • Open https://dataexplorer.azure.com/ • Connect your cluster • Use the online editor to test queries • Save KQL Files using
  • 27. How to script with ADX «really»? Make ADX part of a bigger, collaborative workflow. Make your own web application Build an Analysis Workflow with • Creation of temporary clusters • Collaborative workspaces • Track history of analyses done • Score people usage Make your application secure Make your results exportable natively to your databases Serve Power BI «for humans» <iframe src="https://dataexplorer.azure.com/clusters/<cluster>?ibizaPortal=true" ></iframe> https://microsoft.github.io/monaco-editor/index.html Or like a BOSS
  • 28. How to ingest data in ADX «really» ? Like a Newbie Like a Pro • Open https://dataexplorer.azure.com/ • Right click on the DB, Ingest new data (preview) Use LIGHT INGEST • command-line utility for ad-hoc data ingestion into Kusto • pulls source data from a local folder or an Azure Blob Storage container [Ingest JSON data from blobs] LightIngest "https://adxclu001.kusto.windows.net;Federated=true" -database:db001 -table:LAB -sourcePath:"Blob storage account with container SAS Token" -prefix:MyDir1/MySubDir2 -format:json -mappingRef:DefaultJsonMapping -pattern:*.json -limit:100 [Ingest CSV data with headers from local files] LightIngest "https://adxclu001.kusto.windows.net;Federated=true" -database:MyDb -table:MyTable -sourcePath:"D:\MyFolder\Data" -format:csv -ignoreFirstRecord:true -mappingPath:"D:\MyFolder\CsvMapping.txt" -pattern:*.csv.gz -limit:100 https://docs.microsoft.com/en-us/azure/kusto/tools/lightingest
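When scripting LightIngest invocations like the two above, it helps to assemble the "-name:value" arguments programmatically. A minimal sketch (the `lightingest_args` helper is hypothetical, not part of the tool):

```python
def lightingest_args(connection, database, table, source_path, fmt, **options):
    # Hypothetical helper: build the LightIngest argument list so every
    # "-name:value" pair is formatted consistently.
    args = [
        "LightIngest", connection,
        f"-database:{database}", f"-table:{table}",
        f"-sourcePath:{source_path}", f"-format:{fmt}",
    ]
    args += [f"-{name}:{value}" for name, value in options.items()]
    return args

cmd = lightingest_args(
    "https://adxclu001.kusto.windows.net;Federated=true",
    "db001", "LAB", "<SAS URL>", "json",
    mappingRef="DefaultJsonMapping", pattern="*.json", limit=100)
print(cmd[2])  # -database:db001
```

The resulting list can be handed to a process launcher, avoiding quoting mistakes in the SAS-token source path.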
  • 29. How to ingest data in ADX «really» ? Or like a BOSS • Use a DataLake to store files • Use EventGrid in order to have a Trigger Platform • Make a Function … and That’s it!
  • 30. How to visualize data in ADX «really»? Like a Newbie Like a Pro • Open https://dataexplorer.azure.com/ using «RENDER» • Download Kusto Explorer and click the right Chart • Use PowerBI Connector (Direct or Import Modes) • Use Excel connector
  • 31. How to visualize data in ADX «really»? Use Python as your Swiss Army knife for everything you don’t want to have «Properly Developed» Or like a BOSS Use VSC as a Jupyter Notebook IDE with «KQLMagic» KQLMagic extends the capabilities of the Python kernel in Jupyter, so you can run Kusto language queries natively, combining Python and the Kusto query language. https://github.com/microsoft/jupyter-Kqlmagic
  • 33. ADVANCED INGESTION CAPABILITIES Connect Edge to Cloud through a Secure Chain Path no.1: Local Log Streaming A KAFKA cluster as a log streamer, equipped with a KUSTO SINK that writes a direct stream Path no.2: Remote Log Streaming Connect KAFKA directly to Azure Event Hub, and then ingest with the built-in process Path no.3: Local Pipeline LogStash as a local dynamic data collection pipeline with an extensible plugin ecosystem Path no.4: Remote Pipeline Use the integration runtime to link on-premise data using an Azure Data Factory Dataflow
  • 34. Ingestion Techniques For high-volume, reliable, and cheap data ingestion: Batch ingestion (provided by the SDK): the client uploads the data to Azure Blob storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure Queue. Batch ingestion is the recommended technique. Inline ingestion (provided by query tools) is most appropriate for exploration and prototyping. Inline ingestion: a control command (.ingest inline) containing in-band data is intended for ad hoc testing purposes. Ingest from query: a control command (.set, .set-or-append, .set-or-replace) that points to query results is used for generating reports or small temporary tables. Ingest from storage: a control command (.ingest into) with data stored externally (for example, Azure Blob Storage) allows efficient bulk ingestion of data.
  • 35. Supported data formats The supported data formats are: - CSV, TSV, TSVE, PSV, SCSV, SOH - JSON (line-separated, multi-line), Avro - ZIP and GZIP • CSV Mapping (optional): works with all ordinal-based formats. It can be performed using the ingest command parameter or pre-created on the table and referenced from the ingest command parameter. • JSON Mapping (mandatory) and Avro mapping (mandatory) can be performed using the ingest command parameter. They can also be pre-created on the table and referenced from the ingest command parameter. For all ingestion methods other than ingest from query, YOU HAVE TO FORMAT THE DATA so that Azure Data Explorer can parse it. Schema mapping helps bind source data fields to destination table columns.
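The difference between the two mapping kinds above can be sketched in a few lines of Python: CSV mappings bind columns by ordinal position, JSON mappings bind them by path. This is only a simulation of what the service does at ingestion time:

```python
import csv
import io
import json

# CSV mapping: ordinal-based (column name -> position in the record)
csv_mapping = [("Timestamp", 0), ("Value", 1)]
row = next(csv.reader(io.StringIO("2016-01-01T00:00:00Z,42")))
csv_record = {name: row[pos] for name, pos in csv_mapping}

# JSON mapping: path-based (column name -> dotted JSON path)
json_mapping = {"Timestamp": "ts", "Value": "payload.value"}
doc = json.loads('{"ts": "2016-01-01T00:00:00Z", "payload": {"value": 42}}')

def resolve(obj, path):
    # Walk a dotted path ("payload.value") down into a parsed JSON document
    for part in path.split("."):
        obj = obj[part]
    return obj

json_record = {col: resolve(doc, path) for col, path in json_mapping.items()}
print(csv_record["Value"], json_record["Value"])  # 42 42
```

This is why CSV mapping is optional (ordinals can be inferred) while JSON and Avro mappings are mandatory: there is no implicit ordering of fields to fall back on.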
  • 36. How about orchestration? Three use cases in which FLOW + KUSTO are the solution Push data to Power BI dataset Periodically do queries, and push to PowerBI dataset Conditional queries Make data checks, and send notifications with no code Email multiple ADX Flow charts Send incredible emails with HTML5 Chart as query result
  • 37. Use ADX with applications not natively supported by a connector Use ODBC to connect to Azure Data Explorer from applications that don't have a dedicated connector. Azure Data Explorer supports a subset of the SQL Server communication protocol. Then you can use your preferred tool, compliant with an ODBC connection. Use ODBC Driver 17 Connect to the ADX cluster Select the database Test the connection IMPORTANT NOTES: • T-SQL SELECT statements are supported (sub-queries are not supported) • Considering it a SQL Server with AD authentication, you can use LINQPad, Azure Data Studio, Power BI Desktop, Tableau, Excel, DBeaver and SSMS (they're all MS-TDS clients). • SQL Server on-premises allows attaching a linked server, so you can use "ADX as SQL"
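Because ADX speaks a subset of MS-TDS, the connection string is an ordinary SQL Server ODBC string. A minimal sketch that builds one (the `adx_odbc_connstr` helper and cluster/database names are illustrative; verify the driver name and auth keyword against your environment):

```python
def adx_odbc_connstr(cluster: str, database: str) -> str:
    # Assemble a SQL Server-style ODBC connection string for an ADX cluster,
    # using ODBC Driver 17 and Azure AD integrated authentication.
    return (
        "Driver={ODBC Driver 17 for SQL Server};"
        f"Server={cluster}.kusto.windows.net;"
        f"Database={database};"
        "Authentication=ActiveDirectoryIntegrated;"
    )

print(adx_odbc_connstr("adxclu001", "db001"))
```

Any MS-TDS client listed above can then consume this string unchanged.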
  • 38. From T-SQL to KQL and vice versa Kusto supports translating T-SQL queries to the Kusto query language. Is translated into Some examples NOTE: remember the dashes ( !#£$%!! )
  • 39. Now, you can go straight to the third level.
  • 40. ADX Functions Functions are reusable queries or query parts. Kusto supports several kinds of functions: • Stored functions, which are user-defined functions that are stored and managed as one kind of the database's schema entities. See Stored functions. • Query-defined functions, which are user-defined functions that are defined and used within the scope of a single query. The definition of such functions is done through a let statement. See User-defined functions. • Built-in functions, which are hard-coded (defined by Kusto and cannot be modified by users).
  • 41. Language examples Alias database["wiki"] = cluster("https://somecluster.kusto.windows.net:443").database("somedatabase"); database("wiki").PageViews | count let start = ago(5h); let period = 2h; T | where Time > start and Time < start + period | ... Bin: T | summarize Hits=count() by bin(Duration, 1s) Batch: let m = materialize(StormEvents | summarize n=count() by State); m | where n > 2000; m | where n < 10 Tabular expression: Logs | where Timestamp > ago(1d) | join ( Events | where continent == 'Europe' ) on RequestId
  • 42. Time Series Analysis – Bin Operator T | summarize Hits=count() by bin(Duration, 1s) bin(value,roundTo) bin operator Rounds values down to an integer multiple of a given bin size. If you have a scattered set of values, they will be grouped into a smaller set of specific values. [Rule] [Example]
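The bin() semantics above — rounding each value down to an integer multiple of the bin size — can be sketched in one line of Python (the `kql_bin` name is just for illustration):

```python
import math

def kql_bin(value, round_to):
    # Round down to the nearest multiple of round_to, like KQL's bin()
    return math.floor(value / round_to) * round_to

print(kql_bin(4.7, 1))    # 4
print(kql_bin(25, 10))    # 20
print(kql_bin(-0.3, 1))   # -1
```

Note that bin() floors rather than truncates, so negative values round toward minus infinity — which is what makes bucket boundaries consistent across zero.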
  • 43. Time Series Analysis – Make Series Operator T | make-series sum(amount) default=0, avg(price) default=0 on timestamp from datetime(2016-01-01) to datetime(2016-01-10) step 1d by supplier T | make-series [MakeSeriesParameters] [Column =] Aggregation [default = DefaultValue] [, ...] on AxisColumn from start to end step step [by [Column =] GroupExpression [, ...]] make-series operator [Rule] [Example]
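What make-series does — one aggregated bucket per step over a fixed axis, with empty buckets filled by the default — can be sketched in plain Python (the `make_series` function is a simplified, single-aggregation model of the operator, not its real implementation):

```python
from datetime import datetime, timedelta

def make_series(rows, start, end, step, default=0, agg=sum):
    # rows: (timestamp, value) pairs; produce one bucket per `step`
    # in [start, end), filling empty buckets with `default`.
    n = int((end - start) / step)
    series = [default] * n
    buckets = {}
    for ts, value in rows:
        if start <= ts < end:
            buckets.setdefault(int((ts - start) / step), []).append(value)
    for i, values in buckets.items():
        series[i] = agg(values)
    return series

rows = [(datetime(2016, 1, 1, 8), 10),
        (datetime(2016, 1, 1, 14), 5),
        (datetime(2016, 1, 3), 7)]
print(make_series(rows, datetime(2016, 1, 1), datetime(2016, 1, 5),
                  timedelta(days=1)))
# [15, 0, 7, 0]
```

The fixed-length, gap-filled output is exactly what makes the result usable as input to the series_* analysis functions.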
  • 44. Time Series Analysis – Basket Operator StormEvents | where monthofyear(StartTime) == 5 | extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO") | project State, EventType, Damage, DamageCrops | evaluate basket(0.2) basket operator Basket finds all frequent patterns of discrete attributes (dimensions) in the data and will return all frequent patterns that passed the frequency threshold in the original query. [Rule] [Example] T | evaluate basket([Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, ...])
  • 45. Time Series Analysis – Autocluster Operator StormEvents | where monthofyear(StartTime) == 5 | extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO") | project State , EventType , Damage | evaluate autocluster(0.6) autocluster operator AutoCluster finds common patterns of discrete attributes (dimensions) in the data and will reduce the results of the original query (whether it's 100 or 100k rows) to a small number of patterns. [Rule] [Example] T | evaluate autocluster([SizeWeight, WeightColumn, NumSeeds, CustomWildcard, CustomWildcard, ...]) StormEvents | where monthofyear(StartTime) == 5 | extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO") | project State , EventType , Damage | evaluate autocluster(0.2, '~', '~', '*')
  • 46. Export To Storage .export async compressed to csv ( h@"https://storage1.blob.core.windows.net/containerName;secretKey", h@"https://storage1.blob.core.windows.net/containerName2;secretKey" ) with ( sizeLimit=100000, namePrefix=export, includeHeaders=all, encoding=UTF8NoBOM ) <| myLogs | where id == "moshe" | limit 10000 To Sql .export async to sql ['dbo.MySqlTable'] h@"Server=tcp:myserver.database.windows.net,1433;Database=MyDatabase;Authentication=Active Directory Integrated;Connection Timeout=30;" with (createifnotexists="true", primarykey="Id") <| print Message = "Hello World!", Timestamp = now(), Id=12345678 1. DEFINE COMMAND Define the ADX command and try your recurrent export strategy 2. TRY IN EDITOR Use an editor to try the command, verifying connection strings and parametrizing them 3. BUILD A JOB Build a Notebook or a C# JOB using the command as a SQL QUERY in your CODE
  • 47. External tables & Continuous Export It’s an external endpoint: Azure Storage Azure Datalake Store SQL Server You need to define: Destination Continuous-Export Strategy EXT TABLE CREATION .create external table ExternalAdlsGen2 (Timestamp:datetime, x:long, s:string) kind=adl partition by bin(Timestamp, 1d) dataformat=csv ( h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey ' ) with ( docstring = "Docs", folder = "ExternalTables", namePrefix="Prefix" ) EXPORT to EXT TABLE .create-or-alter continuous-export MyExport over (T) to table ExternalAdlsGen2 with (intervalBetweenRuns=1h, forcedLatency=10m, sizeLimit=104857600) <| T
  • 48. FACTS: A) Kusto stores its ingested data in reliable storage (most commonly Azure Blob Storage). B) To speed up queries on that data, Kusto caches it (or parts of it) on its processing nodes. Cache policy The Kusto cache provides a granular cache policy that customers can use to differentiate between two data cache policies: hot data cache and cold data cache. set query_datascope="hotcache"; T | union U | join (T datascope=all | where Timestamp < ago(365d)) on X YOU CAN SPECIFY WHICH LOCATION MUST BE USED Cache policy is independent of retention policy !
  • 49. Retention policy • Soft Delete Period (number) • Data is available for query; the timestamp is the ADX IngestionDate • Default is set to 100 YEARS • Recoverability (enabled/disabled) • Default is set to ENABLED • Recoverable for 14 days after deletion .alter database DatabaseName policy retention "{}" .alter table TableName policy retention "{}" EXAMPLE: { "SoftDeletePeriod": "36500.00:00:00", "Recoverability":"Enabled" } .delete database DatabaseName policy retention .delete table TableName policy retention .alter-merge table MyTable1 policy retention softdelete = 7d 2 parameters, applicable to a DB or a Table Use KUSTO to set KUSTO
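The "36500.00:00:00" value in the example above is a Kusto timespan literal (d.hh:mm:ss). A small sketch that builds the retention-policy JSON body from a number of days (the `retention_policy` helper is illustrative, not an SDK function):

```python
import json
from datetime import timedelta

def retention_policy(days: int, recoverable: bool = True) -> str:
    # Build the JSON body for ".alter database/table policy retention".
    # Kusto timespan literal is "d.hh:mm:ss"; 100 years ~ "36500.00:00:00".
    span = f"{timedelta(days=days).days}.00:00:00"
    return json.dumps({"SoftDeletePeriod": span,
                       "Recoverability": "Enabled" if recoverable else "Disabled"})

print(retention_policy(36500))
# {"SoftDeletePeriod": "36500.00:00:00", "Recoverability": "Enabled"}
```

Pasting the resulting string into the .alter command reproduces the slide's example exactly.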
  • 50. Data Purge The purge process is final and irreversible PURGE PROCESS: 1. It requires database admin permissions 2. Prior to purging, purge support has to be ENABLED by opening a SUPPORT TICKET 3. Run the purge QUERY to identify SIZE and EXEC.TIME, and obtain a VerificationToken 4. Run the REAL purge QUERY passing the VerificationToken .purge table MyTable records in database MyDatabase <| where CustomerId in ('X', 'Y') NumRecordsToPurge EstimatedPurgeExecutionTime VerificationToken 1,596 00:00:02 e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b .purge table MyTable records in database MyDatabase with (verificationtoken='e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b') <| where CustomerId in ('X', 'Y') .purge table MyTable records in database MyDatabase with (noregrets='true') 2 STEP PROCESS 1 STEP PROCESS With No Regrets !!!!
  • 51. KUSTO: Do and Don’t 1. DO analytics over Big Data. 2. DO support entities such as databases, tables, and columns 3. DO support complex analytics query operators (calculated columns, filtering, group by, joins). BUT DO NOT perform in-place updates !
  • 52. UC1: Event Correlation Name City SessionId Timestamp Start London 2817330 2015-12-09T10:12:02.32 Game London 2817330 2015-12-09T10:12:52.45 Start Manchester 4267667 2015-12-09T10:14:02.23 Stop London 2817330 2015-12-09T10:23:43.18 Cancel Manchester 4267667 2015-12-09T10:27:26.29 Stop Manchester 4267667 2015-12-09T10:28:31.72 City SessionId StartTime StopTime Duration London 2817330 2015-12-09T10:12:02.32 2015-12-09T10:23:43.18 00:11:40.46 Manchester 4267667 2015-12-09T10:14:02.23 2015-12-09T10:28:31.72 00:14:29.49 Get sessions from start and stop events Let's suppose we have a log of events, in which some events mark the start or end of an extended activity or session. Every event has a SessionId, so the problem is to match up the start and stop events with the same id. Kusto let Events = MyLogTable | where ... ; Events | where Name == "Start" | project Name, City, SessionId, StartTime=timestamp | join (Events | where Name == "Stop" | project StopTime=timestamp, SessionId) on SessionId | project City, SessionId, StartTime, StopTime, Duration = StopTime - StartTime Use let to name a projection of the table that is pared down as far as possible before going into the join. project is used to change the names of the timestamps so that both the start and stop times can appear in the result. It also selects the other columns we want to see in the result. join matches up the start and stop entries for the same activity, creating a row for each activity. Finally, project again adds a column to show the duration of the activity.
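The start/stop join above can be modeled in a few lines of Python: index the Start events by SessionId, then pair each Stop with its Start and compute the duration (the `sessions` function is a simplified sketch of the KQL join, not a general implementation):

```python
from datetime import datetime

def sessions(events):
    # Pair each Start with the Stop sharing its SessionId, like the KQL
    # join above, and derive Duration = StopTime - StartTime.
    starts = {e["SessionId"]: e for e in events if e["Name"] == "Start"}
    result = []
    for e in events:
        if e["Name"] == "Stop" and e["SessionId"] in starts:
            s = starts[e["SessionId"]]
            result.append({"City": s["City"], "SessionId": e["SessionId"],
                           "StartTime": s["Timestamp"],
                           "StopTime": e["Timestamp"],
                           "Duration": e["Timestamp"] - s["Timestamp"]})
    return result

log = [
    {"Name": "Start", "City": "London", "SessionId": 2817330,
     "Timestamp": datetime(2015, 12, 9, 10, 12, 2)},
    {"Name": "Stop", "City": "London", "SessionId": 2817330,
     "Timestamp": datetime(2015, 12, 9, 10, 23, 43)},
]
print(sessions(log)[0]["Duration"])  # 0:11:41
```

Events without a matching pair (like the Game and Cancel rows) simply drop out, mirroring the inner join's behavior.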
  • 53. UC2: In Place Enrichment Creating and using query-time dimension tables In many cases one wants to join the results of a query with some ad-hoc dimension table that is not stored in the database. It is possible to define an expression whose result is a table scoped to a single query by doing something like this: Kusto // Create a query-time dimension table using datatable let DimTable = datatable(EventType:string, Code:string) [ "Heavy Rain", "HR", "Tornado", "T" ] ; DimTable | join StormEvents on EventType | summarize count() by Code
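The query-time dimension-table pattern above amounts to an inner join against an in-memory lookup followed by a count-by-code. A minimal Python sketch of the same enrichment (the sample `storm_events` rows are made up for illustration):

```python
from collections import Counter

# Query-time dimension table as a plain dict (EventType -> Code),
# playing the role of the datatable in the KQL above
dim_table = {"Heavy Rain": "HR", "Tornado": "T"}

storm_events = [{"EventType": "Tornado"}, {"EventType": "Heavy Rain"},
                {"EventType": "Tornado"}, {"EventType": "Hail"}]

# Inner join on EventType, then summarize count() by Code;
# events with no dimension row (Hail) are dropped, as in an inner join
counts = Counter(dim_table[e["EventType"]]
                 for e in storm_events if e["EventType"] in dim_table)
print(counts["T"], counts["HR"])  # 2 1
```

The point of the KQL version is that the dimension table never has to be ingested: it lives only for the duration of the query.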
  • 54. Brief Summary: Azure Data Explorer Easy to ingest the data and easy to query the data Blob & Azure Queue Python SDK IoT Hub .NET SDK Azure Data Explorer REST API Event Hub .NET SDK Python SDK Web UI Desktop App Jupyter Magic APIs UX Power BI Direct Query Microsoft Flow Azure App Logic Connectors Grafana ADF MS-TDS Java SDK Java Script Monaco IDE Azure Notebooks Protocols Streaming Bulk APIs Blob & Event Grid Queued Ingestion Direct Java SDK
  • 55. DEMO

Editor's Notes

  1. - Today's intent is - what we will NOT say - why the examples are from MS
  2. Not for Updates? Append only? Yes, of course.
  3. Approfondimento da inserire qui https://www.linkedin.com/pulse/azure-data-explorer-business-continuity-henning-rauch/?trackingId=h2RAxJ2JNJ%2BYSyTHGSDLbw%3D%3D
  4. An Azure Data Explorer cluster is a pair of engine and data management clusters which uses several Azure resources such as Azure Linux VMs and Storage. The applicable VM, Azure Storage, Azure Networking and Azure Load Balancer costs are billed directly to the customer subscription. STORAGE => where I put data for point-in-time retrieval, always off => long retention. COMPUTE => where I put temporary data to scan frequently => short retention.
  5. It is like an IaaS… based on VM size, Storage and Network, and NOT based on DATABASE numbers.
  6. IF YOU ARE AHEAD OF SCHEDULE, TRY IT!!!!
  7. TRY IT IN VSC: CTRL+P => kuskus Then: cluster(adxclu001).database('db001').table('TBL_LAB01') | count
  8. VELCOME TO LEVEL TVO, MY DEARS (Willkommen auf Level 2, meine Lieben — Welcome to Level 2, my dears)
  9. Queued ingestion: https://docs.microsoft.com/en-us/azure/kusto/api/netfx/kusto-ingest-queued-ingest-sample Inline ingestion: https://docs.microsoft.com/it-it/azure/kusto/management/data-ingestion/ingest-inline
  10. OOOK, go straight to the third
  11. .create-or-alter function with (folder = "AzureSaturday2019", docstring = "Func1", skipvalidation = "true") MyFunction1(i:long) {TBL_LAB0X | limit 100 | where minimum_nights > i} MyFunction1(80); explain SELECT name, minimum_nights from TBL_LAB0X .create-or-alter function with (folder = "AzureSaturday2019", docstring = "Func1", skipvalidation = "true") MyFunction1(i:long) {TBL_LAB0X | project name, minimum_nights | limit 100 | where minimum_nights > i | render columnchart} MyFunction1(80);
  12. T | summarize Hits=count() by bin(Duration, 1s)