This document summarizes a presentation on Azure Data Explorer (ADX). It discusses ingesting data into ADX from various sources using different techniques like LightIngest and batch ingestion. It also covers visualizing data using tools like notebooks, querying data using the Kusto Query Language (KQL), and orchestrating workflows with Logic Apps. Examples of querying techniques like filtering, extending, and binning data are also provided.
4. MCT Summit 2021
What is ADX for me, today
• A telemetry data search engine => an ELK replacement
• A TSDB involved in Lambda-architecture replacements (as the WARM path) => an OSS Lambda stack (MinIO + Kafka) replacement
• A tool to materialize data into ADLS & SQL
• A tool for monitoring, summarizing information, and sending notifications
7. MCT Summit 2021
What is Azure Data Explorer
• Any append-only stream of records
• Relational query model: filter, aggregate, join, calculated columns, …
• Fully managed (PaaS, vanilla, database)
• Rapid iterations to explore the data
• High volume, high velocity, high variance (structured, semi-structured, free-text)
• Purposely built
8. MCT Summit 2021
The role of ADX
ADX sits at the center: raw data flows into the DWH as refined data, while ADX serves real-time derived data, data comparison, and fast KPIs.
THREE KEY USERS IN ONE TOOL:
• IoT Developer (data check, rule engine for insights)
• Data engineer (data comparison)
• Data scientist (data exploration)
9. MCT Summit 2021
How ADX is Organized
INSTANCE → DATABASE → SOURCES: a cluster exposes an ingestion URL and a querying URL; DB users/apps connect to query; data lives in cache storage and blob storage. External sources and destinations include IoT Hub, Event Hub, Storage, ADLS, SQL Server, and many more.
11. MCT Summit 2021
FIRST PHASE: Ingestion
• Many connectors & plugins
• Many SDKs
• Many managed pipelines
• Many tools to ingest rapidly
Managed pipelines:
• Ingest blobs using Event Grid
• Ingest an Event Hub stream
• Ingest an IoT Hub stream
• Ingest data from ADF
Connectors & plugins:
• Logstash plugin
• Kafka connector
• Apache Spark connector
Many SDKs:
• Python SDK
• .NET SDK
• Java SDK
• Node SDK
• REST API
• Go API
Tools:
• One-click ingestion
• LightIngest
12. MCT Summit 2021
Ingestion Types:
• Streaming ingestion: optimized for low volume of data per table, across thousands of tables
• Operation completes in under 10 seconds
• Data is available for query after completion
• Batching ingestion: optimized for high ingestion throughput
• Default batch parameters: 5 minutes, 500 items, or 1000 MB
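As a sketch, both behaviors are governed by policies that can be set with control commands; the database and table names below are placeholders, and the batching values are illustrative:

```kusto
// Enable streaming ingestion on a database
.alter database MyDatabase policy streamingingestion enable

// Tune the batching policy on a table (time / item count / size triggers)
.alter table MyTable policy ingestionbatching
@'{"MaximumBatchingTimeSpan":"00:01:00","MaximumNumberOfItems":100,"MaximumRawDataSizeMB":256}'
```

Whichever of the three batching thresholds is reached first seals the batch.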
13. MCT Summit 2021
Ingestion Techniques
Batch ingestion (provided by the SDKs): for high-volume, reliable, and cheap data ingestion. The client uploads the data to Azure Blob Storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure Queue. Batch ingestion is the recommended technique.
Inline ingestion (provided by query tools): most appropriate for exploration and prototyping.
• Inline ingestion: a control command (.ingest inline) containing in-band data, intended for ad hoc testing purposes.
• Ingest from query: control commands (.set, .set-or-append, .set-or-replace) that point to query results, used for generating reports or small temporary tables.
• Ingest from storage: a control command (.ingest into) with data stored externally (for example, Azure Blob Storage), allowing efficient bulk ingestion of data.
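A minimal sketch of the three techniques; MyTable, MySummary, and the column names are placeholders, and the blob URL follows the placeholder convention used elsewhere in this deck:

```kusto
// .ingest inline: in-band data, ad hoc testing only
.ingest inline into table MyTable <|
"foo",1
"bar",2

// Ingest from query: materialize a query result into a table
.set-or-append MySummary <| MyTable | summarize Count=count() by Name

// Ingest from storage: efficient bulk ingestion from a blob
.ingest into table MyTable (h'https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME/file.csv?SAS_TOKEN')
```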
14. MCT Summit 2021
What is LightIngest
• A command-line utility for ad hoc data ingestion into Kusto
• Can pull source data from a local folder or from an Azure Blob Storage container
• Useful for ingesting data quickly and experimenting with ADX
• Most useful when you want to ingest a large amount of data, because there is no time constraint on ingestion duration
[Ingest JSON data from blobs]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
-database:db001
-table:LAB
-sourcePath:"https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME?SAS_TOKEN"
-prefix:MyDir1/MySubDir2
-format:json
-mappingRef:DefaultJsonMapping
-pattern:*.json
-limit:100
[Ingest CSV data with headers from local files]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
-database:MyDb
-table:MyTable
-sourcePath:"D:\MyFolder\Data"
-format:csv
-ignoreFirstRecord:true
-mappingPath:"D:\MyFolder\CsvMapping.txt"
-pattern:*.csv.gz
-limit:100
REFERENCE:
https://docs.microsoft.com/en-us/azure/kusto/tools/lightingest
15. MCT Summit 2021
LightIngest: pay attention!
Queued ingestion
Direct ingestion
IMPORTANT: the -creationTimePattern argument allows users to partition the data by creation time, not ingestion time
16. MCT Summit 2021
LightIngest: pay attention!
IMPORTANT: all the data is indexed, but how is it partitioned? By ingestion TIME! The -creationTimePattern argument allows users to partition the data by creation time instead of ingestion time.
17. MCT Summit 2021
One Click ingestion GA
• One Click provides an intuitive ingestion UX
• Start ingesting data, creating tables, and mapping structures
• Supports different data formats
• One-time or continuous ingestion
FIRST: check your data; create and destroy tons of test tables
18. MCT Summit 2021
Kafka Gold certified connector
• From an Apache Kafka cluster (in the cloud or on-premises)
• Use Kafka to ingest data into ADX at scale
• GOLD (partner-supported, with Microsoft)
What's the VISION behind it?
19. MCT Summit 2021
What is FluentBIT
• Collaboration with the CNCF Fluent Bit project
• A multi-platform log processor and forwarder that collects data/logs from different sources
• Unifies them and sends them to Block Blob storage
• Ingests them into ADX using Event Grid
• Can use Azurite as a storage endpoint for simulation
https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azurite?toc=/azure/storage/blobs/toc.json
20. MCT Summit 2021
Ingestion: Format & UseCases
• Ingest data using native formats: Apache Avro, CSV (RFC 4180), JSON, MultiJSON (JSON Lines), ORC, Parquet, PSV, SCSV, TSV, TXT
• Files/blobs can be compressed: ZIP, GZIP
• Prefer declarative names: MyData.csv.zip, MyData.json.gz
21. MCT Summit 2021
Supported data formats
For all ingestion methods other than ingest from query, format the data so that Azure Data Explorer can parse it. The
supported data formats are:
• CSV, TSV, TSVE, PSV, SCSV, SOH
• JSON (line-separated, multi-line), Avro
• ZIP and GZIP
Schema mapping helps bind source data fields to destination table columns.
• CSV Mapping (optional) works with all ordinal-based formats. It can be performed using the ingest
command parameter or pre-created on the table and referenced from the ingest command
parameter.
• JSON Mapping (mandatory) and Avro mapping (mandatory) can be performed using the ingest
command parameter. They can also be pre-created on the table and referenced from the ingest
command parameter.
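For example, a JSON mapping can be pre-created on the table and then referenced by name at ingest time; the table, columns, and mapping name here are illustrative:

```kusto
.create table MyTable ingestion json mapping "DefaultJsonMapping"
'[{"column":"Timestamp","path":"$.ts","datatype":"datetime"},{"column":"Message","path":"$.msg","datatype":"string"}]'
```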
22. MCT Summit 2021
My ingestion best experience
Open points:
• Why an Event Hub after the IoT Hub?
• Why the second Event Hub?
24. MCT Summit 2021
How about the Tools?
1. LOAD
• LightIngest
• Azure Data Factory
2. QUERY
• Kusto.Explorer
• Web UI
3. VISUALIZE
• Notebooks
• Power BI
• Grafana
• ADX Web UI
4. ORCHESTRATE
• Microsoft Flow
• Microsoft Logic Apps
Load → Query → Visualize → Orchestrate (BI people, IT people, ML people)
25. MCT Summit 2021
Azure data studio plugins:
Plugins: Cluster Manager, Notebooks
1. Select New connection from the Connections pane.
2. Fill in the Connection Details information.
3. For Connection type, select Kusto.
4. For Cluster, enter your Azure Data Explorer cluster name (don't include the https:// prefix or a trailing /).
5. For Authentication type, use the default: Azure Active Directory - Universal with MFA account.
6. For Account, use your account information.
7. For Database, use Default.
8. For Server group, use Default.
9. For Name (optional), leave blank.
26. MCT Summit 2021
Azure data studio plugins:
• Filter/view data
• Build 3D charts
• Take a snapshot as a declarative JSON file
27. MCT Summit 2021
Notebooks + ADX = KQL Magic
KQL magic:
https://github.com/microsoft/jupyter-Kqlmagic
• extends the capabilities of the Python kernel in Jupyter
• can run Kusto language queries natively
• combine Python and Kusto query language
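A minimal notebook sketch, following the Kqlmagic README; the 'help' cluster and 'Samples' database are the public demo, and the exact connection-string form may vary by version:

```
%reload_ext Kqlmagic
%kql AzureDataExplorer://code;cluster='help';database='Samples'
%kql StormEvents | summarize Count=count() by State | top 5 by Count
```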
29. MCT Summit 2021
Which are the OSS Alternatives that we should compare with?
From db-engines.com:
• Azure Data Explorer: fully managed big data interactive analytics platform
• Elasticsearch: a distributed, RESTful modern search and analytics engine
• Splunk: real-time insights engine to boost productivity & security
• InfluxDB: DBMS for storing time series, events and metrics
ADX can be a replacement for search and log analytics engines such as Elasticsearch, Splunk, and InfluxDB.
30. MCT Summit 2021
Comparison chart
| | Elasticsearch (Elastic) | InfluxDB (InfluxData Inc.) | Azure Data Explorer (Microsoft) | Splunk (Splunk Inc.) |
|---|---|---|---|---|
| Description | A distributed, RESTful modern search and analytics engine based on Apache Lucene | DBMS for storing time series, events and metrics | Fully managed big data interactive analytics platform | Analytics platform for big data |
| Database models | Search engine, document store | Time Series DBMS | Time Series DBMS, search engine, document store, event store, relational DBMS | Search engine |
| Initial release | 2010 | 2013 | 2019 | 2003 |
| License | Open source | Open source | Commercial | Commercial |
| Cloud-based only | no | no | yes | no |
| Implementation language | Java | Go | | |
| Server operating systems | All OS with a Java VM | Linux, OS X | hosted | Linux, OS X, Solaris, Windows |
| Data scheme | schema-free | schema-free | Fixed schema with schema-less datatypes (dynamic) | yes |
| Typing | yes | Numeric data and strings | yes | yes |
| XML support | no | no | yes | yes |
| Secondary indexes | yes | no | all fields are automatically indexed | yes |
| SQL | SQL-like query language | SQL-like query language | Kusto Query Language (KQL), SQL subset | no |
| APIs and other access methods | RESTful HTTP/JSON API, Java API | HTTP API, JSON over UDP | RESTful HTTP API, Microsoft SQL Server communication protocol (MS-TDS) | HTTP REST |
| Supported programming languages | .Net, Java, JavaScript, Python, Ruby, PHP, Perl, Groovy, community-contributed clients | .Net, Java, JavaScript, Python, R, Ruby, PHP, Perl, Haskell, Clojure, Erlang, Go, Lisp, Rust, Scala | .Net, Java, JavaScript, Python, R, PowerShell | .Net, Java, JavaScript, Python, Ruby, PHP |
| Server-side scripts | yes | no | Yes (KQL, Python, R) | yes |
| Triggers | yes | no | yes | yes |
| Partitioning methods | Sharding | Sharding | Sharding | Sharding |
| Replication methods | yes | selectable replication factor | yes | Master-master replication |
| MapReduce | ES-Hadoop connector | no | no | yes |
| Consistency concepts | Eventual consistency | Eventual consistency | Eventual consistency, immediate consistency | |
| Foreign keys | no | no | no | no |
| Transaction concepts | no | no | no | no |
| Concurrency | yes | yes | yes | yes |
| Durability | yes | yes | yes | yes |
| In-memory capabilities | Memcached and Redis integration | yes | no | no |
| User concepts | | simple rights management via user accounts | Azure Active Directory authentication | Access rights for users and roles |
31. MCT Summit 2021
Update Policy
Automatically append data to a target table whenever new data is inserted into the source table, based on a
transformation query that runs on the data inserted into the source table.
USE IT IF:
• The source table is «free-text column» based
• The target table accepts only a specific morphology
Cascading updates are allowed (TableA → TableB → TableC → ...).
Raw table → Refined table
32. MCT Summit 2021
How to use Update Policy
// Create a function that will be used for update
.create function
MyUpdateFunction()
{
MyTableX
| where ColumnA == 'some-string'
| summarize MyCount=count() by ColumnB, Key=ColumnC
| join (OtherTable | project OtherColumnZ, Key=OtherColumnC) on Key
| project ColumnB, ColumnZ=OtherColumnZ, Key, MyCount
}
// Create the target table (if it doesn't already exist)
.set-or-append DerivedTableX <| MyUpdateFunction() | limit 0
// Use update policy on table DerivedTableX
.alter table DerivedTableX policy update
@'[{"IsEnabled": true, "Source": "MyTableX", "Query": "MyUpdateFunction()", "IsTransactional": false, "PropagateIngestionProperties": false}]'
33. MCT Summit 2021
Pay attention to failures!
Evaluate resource usage
.show table MySourceTable extents;
// The following provides the extent ID for the not-yet-merged extent in the source table which has the most records
let extentId = $command_results
| where MaxCreatedOn > ago(1hr) and MinCreatedOn == MaxCreatedOn
| top 1 by RowCount desc
| project ExtentId;
let MySourceTable = MySourceTable | where extent_id() == toscalar(extentId);
MyFunction()
Failures
.show ingestion failures
| where FailedOn > ago(1hr) and OriginatesFromUpdatePolicy == true
• Non-transactional policy: failures are ignored
• Transactional policy: if the ingestion method is pull, the entire ingestion operation is automatically retried (up to a maximum time)
SO: you should check failures to catch «BROKEN FILES» … but HOW?
34. MCT Summit 2021
Use this pattern
The first table is NEVER wide!! … but the second one is!
• First table schema: K, V, TS, Metadata (telemetry oriented)
• Second table schema: WT, Wide Table (ML oriented)
35. MCT Summit 2021
My personal approach
DATA → FUNCTION1 → FUNCTION2 → FUNCTION3 (split into FUNCTION3.1 / FUNCTION3.2 / FUNCTION3.3) → KPI DEFINITIONS → DASHBOARD (use the KPIs to embed and filter them)
37. MCT Summit 2021
Some code Examples
• Query with between
• Function with parameters; «toscalar» expression
• «extend» usage
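A sketch of those techniques against the standard StormEvents sample table (its StartTime, EndTime, and DamageProperty columns):

```kusto
StormEvents
// between: closed-range filter
| where StartTime between (datetime(2007-11-01) .. datetime(2007-12-01))
// extend: add a calculated column
| extend DurationHours = (EndTime - StartTime) / 1h
// toscalar: collapse a single-value query into a scalar
| where DamageProperty > toscalar(StormEvents | summarize avg(DamageProperty))
| take 10

// A query-defined function with a parameter (MinDamage is an illustrative name)
let MinDamage = (threshold:long) { StormEvents | where DamageProperty > threshold };
MinDamage(1000) | count
```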
38. MCT Summit 2021
Kusto for SQL Users
• Perform SQL SELECT (no DDL, only SELECT)
• Use KQL (Kusto Query Language)
• Supports translating T-SQL queries to Kusto query language
explain
select top(10) * from StormEvents
order by DamageProperty desc

translates to:

StormEvents
| sort by DamageProperty desc nulls first
| take 10
39. MCT Summit 2021
Language examples
Alias:
database["wiki"] = cluster("https://somecluster.kusto.windows.net:443").database("somedatabase");
database("wiki").PageViews | count
Let:
let start = ago(5h);
let period = 2h;
T | where Time > start and Time < start + period | ...
Bin:
T | summarize Hits=count() by bin(Duration, 1s)
Batch:
let m = materialize(StormEvents | summarize n=count() by State);
m | where n > 2000;
m | where n < 10
Tabular expression:
Logs
| where Timestamp > ago(1d)
| join ( Events | where continent == 'Europe' ) on RequestId
40. MCT Summit 2021
Time Series Analysis – Bin Operator
bin operator: bin(value, roundTo)
Rounds values down to an integer multiple of the given bin size. If you have a scattered set of values, they are grouped into a smaller set of specific values.
Example:
T | summarize Hits=count() by bin(Duration, 1s)
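For instance (both cases follow the documented bin() behavior of rounding down):

```kusto
print bin(4.5, 1)                               // 4
print bin(datetime(1970-05-11 13:45:07), 1d)    // 1970-05-11T00:00:00
```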
41. MCT Summit 2021
Time Series Analysis – Make Series Operator
make-series operator:
T | make-series [MakeSeriesParameters] [Column =] Aggregation [default = DefaultValue] [, ...] on AxisColumn from start to end step step [by [Column =] GroupExpression [, ...]]
Example:
T | make-series sum(amount) default=0, avg(price) default=0 on timestamp from datetime(2016-01-01) to datetime(2016-01-10) step 1d by supplier
42. MCT Summit 2021
Time Series Analysis – Basket Operator
basket operator: finds all frequent patterns of discrete attributes (dimensions) in the data and returns all frequent patterns that pass the frequency threshold in the original query.
T | evaluate basket([Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, ...])
Example:
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0, "YES", "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2)
43. MCT Summit 2021
Time Series Analysis – Autocluster Operator
autocluster operator: finds common patterns of discrete attributes (dimensions) in the data and reduces the results of the original query (whether it's 100 or 100k rows) to a small number of patterns.
T | evaluate autocluster([SizeWeight, WeightColumn, NumSeeds, CustomWildcard, CustomWildcard, ...])
Examples:
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0, "YES", "NO")
| project State, EventType, Damage
| evaluate autocluster(0.6)

StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0, "YES", "NO")
| project State, EventType, Damage
| evaluate autocluster(0.2, '~', '~', '*')
44. MCT Summit 2021
ADX Functions
Functions are reusable queries or query parts. Kusto supports several kinds of functions:
• Stored functions: user-defined functions that are stored and managed as one kind of database schema entity. See Stored functions.
• Query-defined functions: user-defined functions that are defined and used within the scope of a single query, via a let statement. See User-defined functions.
• Built-in functions: hard-coded functions (defined by Kusto; they cannot be modified by users).
45. MCT Summit 2021
Materialized views
The view exposes an always up-to-date view of the defined aggregation.
Advantages:
• Performance improvement
• Freshness
• Cost reduction
Behind the scenes:
• The source table is periodically materialized into the view table
• At query time, the view combines the materialized part with the DELTA in the raw table since the last materialization, returning complete results
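A minimal sketch; the Telemetry table, its columns, and the view name are illustrative:

```kusto
// Always-fresh "latest reading per device" aggregation
.create materialized-view DeviceLastReading on table Telemetry
{
    Telemetry
    | summarize arg_max(Timestamp, *) by DeviceId
}
```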
46. MCT Summit 2021
QUERY AND PERFORMANCE OPTIMIZATION
• Materialized views
• Partitioning
• Query result caching
• Near real time scoring of AML and ONNX models
• FFT functions
• Geospatial
47. MCT Summit 2021
Query result caching
• Better query performance
• Lower resource consumption
• The queries need to be identical
• The cache policy is defined by MAX AGE
• Common use case: DASHBOARDS
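A client opts in per query by declaring the maximum acceptable age of a cached result:

```kusto
// Serve from the results cache if a result no older than 5 minutes exists
set query_results_cache_max_age = time(5m);
StormEvents
| summarize Count=count() by State
```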
48. MCT Summit 2021
Geospatial joins
• Use cases
• Connected mobility solutions
• Geospatial risk analysis
• Agriculture optimization using weather data
• Technical background
• Join of polygon reference data and geospatial time-series data
• Based on three-dimensional S2 geometry
• Consists of a coarse-grained join using S2 cell coverage, plus exact validation using the geo_point_in_polygon function
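The exact-validation step can be sketched like this; the polygon is an illustrative GeoJSON rectangle around Seattle:

```kusto
print inside = geo_point_in_polygon(
    -122.33, 47.61,    // longitude, latitude
    dynamic({"type":"Polygon","coordinates":[[
        [-122.40, 47.55], [-122.25, 47.55],
        [-122.25, 47.70], [-122.40, 47.70],
        [-122.40, 47.55]]]}))
// inside == true
```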
50. MCT Summit 2021
ADX Dashboards
• Integrated into the Kusto Web Explorer
• Optimized for big data
• Uses powerful KQL to retrieve visual data
• Makes dynamic views or widgets
51. MCT Summit 2021
Grafana query builder
• Create Grafana panels with no KQL knowledge
• Select values/filters/grouping using simple UI dropdowns
• Switch to raw mode to enhance queries with KQL
52. MCT Summit 2021
How to use Grafana easily
Go to https://grafana.com/
Sign up and get an account
53. MCT Summit 2021
How to use Grafana easily
Go to the All Plugins section, search for the ADX datasource, and install the plugin
54. MCT Summit 2021
How to use Grafana easily
Go to your Grafana instance at https://<workbenchname>.grafana.net/datasources and configure the ADX datasource.
Then start building dashboards!
56. MCT Summit 2021
How about orchestration?
Three use cases in which Flow + Kusto are the solution:
• Push data to a Power BI dataset: periodically run queries and push the results to a Power BI dataset
• Conditional queries: run data checks and send notifications with no code
• Email multiple ADX Flow charts: send rich HTML5 emails with a chart as the query result
57. MCT Summit 2021
Orchestration?
• Manage costs: start and stop the cluster by evaluating a condition
• Query sets to check data: plan a set of queries in order to say «IT'S OK, even today!»
• Manage data retention: based on a dynamic condition
58. MCT Summit 2021
An Example of:
1. Set a trigger
2. Connect and test the ADX block
3. Configure the email block with dynamic params
61. MCT Summit 2021
Export
• To Storage
.export async compressed to csv (
  h@"https://storage1.blob.core.windows.net/containerName;secretKey",
  h@"https://storage1.blob.core.windows.net/containerName2;secretKey"
) with (
  sizeLimit=100000, namePrefix=export, includeHeaders=all, encoding=UTF8NoBOM
) <| myLogs | where id == "moshe" | limit 10000
• To SQL
.export async to sql ['dbo.MySqlTable']
  h@"Server=tcp:myserver.database.windows.net,1433;Database=MyDatabase;Authentication=Active Directory Integrated;Connection Timeout=30;"
  with (createifnotexists="true", primarykey="Id")
  <| print Message = "Hello World!", Timestamp = now(), Id=12345678
1. DEFINE COMMAND
Define ADX command and try your
recurrent export strategy
2. TRY IN EDITOR
Use an editor to try the command, verifying connection strings and parametrizing them
3. BUILD A JOB
Build a Notebook or a C# JOB using
the command as a SQL QUERY in
your CODE
62. MCT Summit 2021
External tables & Continuous Export
• It’s an external
endpoint:
• Azure Storage
• Azure Datalake Store
• SQL Server
• You need to define:
• Destination
• Continuous-Export
Strategy
EXT TABLE CREATION
.create external table ExternalAdlsGen2 (Timestamp:datetime, x:long, s:string)
  kind=adl
  partition by bin(Timestamp, 1d)
  dataformat=csv
  ( h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey' )
  with ( docstring = "Docs", folder = "ExternalTables", namePrefix="Prefix" )
EXPORT to EXT TABLE
.create-or-alter continuous-export MyExport over (T) to table ExternalAdlsGen2
  with (intervalBetweenRuns=1h, forcedLatency=10m, sizeLimit=104857600)
  <| T
63. MCT Summit 2021
My best experience
Open points:
• How to extract insights using a dynamic and codeless approach?
• How to integrate ADX with low-cost DB solutions?
66. MCT Summit 2021
Data encryption in ADX
• Encryption at rest (using Azure Storage encryption)
• A Microsoft-managed key is used
• Customer-managed keys can be enabled
• Key rotation, temporary disable, and revoke-access controls can be implemented
• Soft Delete and Purge Protection will be enabled on the Key Vault and cannot be disabled
67. MCT Summit 2021
Extents, policies and Partition
• What are data shards or extents
• Column, segments, and blocks
• merge policy and sharding policy
• Data partitioning policy (post-ingestion)
68. MCT Summit 2021
FACTS:
A) Kusto stores its ingested data in reliable storage (most commonly Azure Blob Storage).
B) To speed up queries on that data, Kusto caches this data (or parts of it) on its processing nodes.
The Kusto cache provides a granular cache policy that customers can use to differentiate between two data cache policies: hot data cache and cold data cache.
YOU CAN SPECIFY WHICH LOCATION MUST BE USED:
set query_datascope="hotcache";
T | union U | join (T datascope=all | where Timestamp < ago(365d)) on X
The cache policy is independent from the retention policy!
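The hot-cache window itself is set with the caching policy; the table name is a placeholder:

```kusto
// Keep the most recent 7 days of data in the hot cache
.alter table MyTable policy caching hot = 7d
```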
69. MCT Summit 2021
Retention policy
2 parameters, applicable to a DB or a table:
• SoftDeletePeriod (number)
  • Data is available for query (ts is the ADX ingestion date)
  • Default is set to 100 YEARS
• Recoverability (Enabled/Disabled)
  • Default is set to ENABLED
  • Recoverable for 14 days after deletion
.alter database DatabaseName policy retention "{}"
.alter table TableName policy retention "{}"
EXAMPLE:
{ "SoftDeletePeriod": "36500.00:00:00", "Recoverability": "Enabled" }
.delete database DatabaseName policy retention
.delete table TableName policy retention
.alter-merge table MyTable1 policy retention softdelete = 7d
70. MCT Summit 2021
Data Purge
The purge process is final and irreversible.
PURGE PROCESS:
1. It requires database admin permissions.
2. Prior to purging, you have to be ENABLED by opening a SUPPORT TICKET.
3. Run the purge QUERY; it reports the SIZE and EXECUTION TIME and returns a VerificationToken.
4. Run the REAL purge QUERY, passing the verification token.
2-STEP PROCESS:
.purge table MyTable records in database MyDatabase <| where CustomerId in ('X', 'Y')
NumRecordsToPurge: 1,596
EstimatedPurgeExecutionTime: 00:00:02
VerificationToken: e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b
.purge table MyTable records in database MyDatabase with (verificationtoken='e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b') <| where CustomerId in ('X', 'Y')
1-STEP PROCESS (with no regrets!!!!):
.purge table MyTable records in database MyDatabase with (noregrets='true')
71. MCT Summit 2021
Virtual Network
BENEFITS
• Use NSG rules to limit traffic.
• Connect your on-premises network to the Azure Data Explorer cluster's subnet.
• Secure your data connection sources (Event Hub and Event Grid) with service endpoints.
A VNet gives you TWO independent IPs:
• Private IP: access the cluster inside the VNet.
• Public IP: access the cluster from outside the VNet (management and monitoring), and as a source address for outbound connections initiated from the cluster.
73. MCT Summit 2021
Enterprise readiness
• RLS (Row Level Security)
• Provides fine-grained control of access to table data by different users
• Allows specifying user access to specific rows in tables
• Provides mechanics to mask PII data in tables
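A sketch of wiring RLS to a filtering function; the Sales table, TenantId column, and SalesFilter function are illustrative:

```kusto
// A function that returns only the rows the calling principal may see
.create-or-alter function SalesFilter() {
    Sales
    | where TenantId == tostring(current_principal_details()["UserPrincipalName"])
}
// Attach it as the table's row-level-security policy
.alter table Sales policy row_level_security enable "SalesFilter"
```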
74. MCT Summit 2021
Leader and Follower
• Azure Data Share creates a symbolic link between two ADX clusters.
• Sharing occurs in near-real-time (no data pipeline).
• ADX decouples storage and compute.
• This allows customers to run multiple compute (read-only) instances on the same underlying storage.
• You can attach a database as a follower database, which is a read-only database on a remote cluster.
• You can share the data at the database level or at the cluster level.
The cluster sharing the database is the leader cluster and the
cluster receiving the share is the follower cluster.
A follower cluster can follow one or more leader cluster
databases. The follower cluster periodically synchronizes to
check for changes.
The queries running on the follower cluster use local cache
and don't use the resources of the leader cluster.
EXAMPLE of QUEUED INGESTION
https://docs.microsoft.com/en-us/azure/kusto/api/netfx/kusto-ingest-queued-ingest-sample
Example of INLINE INGESTION
https://docs.microsoft.com/it-it/azure/kusto/management/data-ingestion/ingest-inline
DEMO:
LightIngest is a command-line utility for ad-hoc data ingestion into Azure Data Explorer (ADX). The utility can pull source data from a local folder or from an Azure blob storage container. LightIngest is most useful when you want to ingest a large amount of data, because there is no time constraint on ingestion duration. When historical data is loaded from an existing system to ADX, all records receive the same ingestion date. The -creationTimePattern argument allows users to partition the data by creation time, not ingestion time. It extracts the CreationTime property from the file or blob path; the pattern doesn't need to reflect the entire item path, just the section enclosing the timestamp you want.
Show the NOTEBOOKS
Try it in VS Code
CTRL+P => kuskus
Then:
cluster(adxclu001).database('db001').table('TBL_LAB01')
| count
An update policy instructs Kusto to automatically append data to a target table whenever new data is inserted into the source table, based on a transformation query that runs on the data inserted into the source table.
The query can invoke stored functions, but can't include cross-database or cross-cluster queries.
Update policy is initiated following ingestion
Update policies take effect when data is ingested or moved to (extents are created in) a defined source table by any of the ingestion commands.
The update policy will behave like regular ingestion when the following conditions are met:
The source table is a high-rate trace table with interesting data formatted as a free-text column.
The target table on which the update policy is defined accepts only specific trace lines.
The table has a well-structured schema that is a transformation of the original free-text data created by the parse operator.
If you use `has`, indexes are used; `contains` does not use indexes.
Explain how to do `between`.
DO SOME TESTS
Do a `distinct` to introduce SUMMARIZE
T | summarize Hits=count() by bin(Duration, 1s)
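The has/contains distinction and the distinct-to-summarize step, sketched against the StormEvents sample table:

```kusto
// `has` matches whole terms and can use the term index (fast)
StormEvents | where EventType has "Flood" | count

// `contains` matches arbitrary substrings and cannot use the index (slower)
StormEvents | where EventType contains "loo" | count

// distinct, and the equivalent summarize
StormEvents | distinct State
StormEvents | summarize by State
```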
.create-or-alter function with (folder = "AzureSaturday2019", docstring = "Func1", skipvalidation = "true") MyFunction1(i:long) {TBL_LAB0X | limit 100 | where minimum_nights > i}
MyFunction1(80);
explain SELECT name, minimum_nights from TBL_LAB0X
.create-or-alter function with (folder = "AzureSaturday2019", docstring = "Func1", skipvalidation = "true") MyFunction1(i:long) {TBL_LAB0X | project name, minimum_nights | limit 100 | where minimum_nights > i | render columnchart}
MyFunction1(80);
DO AN EXPORT EXAMPLE
Kusto is built to support tables with a huge number of records (rows) and large amounts of data. To handle such large tables, each table's data is divided into smaller "tablets" called data shards or extents (the two terms are synonymous). The union of all the table's extents holds the table's data. Individual extents are kept smaller than a single node's capacity, and the extents are spread over the cluster's nodes, achieving scale-out.
An extent is like a type of mini-table. It contains data and metadata, such as its creation time and optional tags that are associated with its data. Additionally, the extent usually holds information that lets Kusto query the data efficiently: for example, an index for each column of data in the extent, and an encoding dictionary if column data is encoded. As a result, the table's data is the union of all the data in the table's extents.
Extents are immutable and can never be modified. An extent may only be queried, reassigned to a different node, or dropped out of the table. Data modification happens by creating one or more new extents and transactionally swapping old extents with new ones.
Extents hold a collection of records that are physically arranged in columns. This technique is called columnar store. It enables efficient encoding and compression of the data, because different values from the same column often "resemble" each other. It also makes querying large spans of data more efficient, because only the columns used by the query need to be loaded. Internally, each column of data in the extent is subdivided into segments, and the segments into blocks. This division isn't observable to queries, and lets Kusto optimize column compression and indexing.
To maintain query efficiency, smaller extents are merged into larger extents. The merge is done automatically, as a background process, according to the configured merge policy and sharding policy. Merging extents reduces the management overhead of having a large number of extents to track. More importantly, it allows Kusto to optimize its indexes and improve compression.
Extent merging stops once an extent reaches certain limits, such as size, since beyond a certain point, merging reduces rather than increases efficiency.
When a Data partitioning policy is defined on a table, extents go through another background process after they're created (post-ingestion). This process reingests the data from the source extents and creates homogeneous extents, in which the values of the column that is the table's partition key all belong to the same partition. If the policy includes a hash partition key, all homogeneous extents that belong to the same partition will be assigned to the same data node in the cluster.