Agenda
• ADX Basics: Service Goal, pricing,
capabilities
• ADX Data Flow: Ingestion, Querying,
Visualization
• ADX Ecosystem: Integration &
Orchestration
• ADX Tools: Monitoring and Management
• ADX Use Cases & Best Practices
Thank you to our Sponsor
Everything our User
Group Has To Offer
Get involved in
our Meetup
Join the conversation on
our Facebook group
Follow our page
on Facebook
Follow our Videos
on Youtube
Explore
https://bit.ly/2P9sqLy https://bit.ly/2QqAWX4
https://bit.ly/3auvRnD
https://bit.ly/3n8l5bP
ADX - Basics
4
Azure Data Explorer in a sentence
The Platform
Any append-only stream of records
Relational query model: filter, aggregate, join, calculated columns, …
Fully managed
Rapid iterations to explore the data
High volume, high velocity, high variance (structured, semi-structured, free-text)
PaaS, vanilla, database
Purposely built
ADX in a sentence
© Microsoft Corporation
Azure Data Explorer use cases
IoT applications
Discover and address performance issues with machines,
equipment, and devices in real-time to optimize
production quality and productivity.
Big data logging platform
Enhance customer experiences using digital platforms.
Spot trends, patterns, or anomalies within billions of lines
of log data to make near instant corrections to improve
performance.
SaaS applications
Build multi-tenant SaaS applications embedded with
interactive analytics. Monitor the performance of the
application, improve products, and provide business
owners insights to boost business outcomes.
Nik Shampur
Software Development Lead
“Azure Data Explorer has improved
our analysis capabilities for our
product tremendously…The
scalability and performance allow
us to deeply analyze our collected
data and retrieve valuable insights.”
© Microsoft Corporation
Fast and fully managed data
analytics service
Fully managed
for efficiency
Focus on insights, not the infrastructure, for fast time to value.
No infrastructure to manage; provision the service, choose the
SKU for your workload, and create a database.
Optimized for
streaming data
Get near-instant insights
from fast-flowing data
Scale linearly up to 200 MB per second
per node with highly performant, low
latency ingestion.
Designed for
data exploration
Run ad-hoc queries using the
intuitive query language
Returns results from over 1 billion records in under a second,
without modifying the data or metadata
© Microsoft Corporation
Azure Data Explorer overview
1. Capability for many data types,
formats, and sources
Structured (numbers), semi-structured (JSON/XML), and free text
2. Batch or streaming ingestion
Use managed ingestion pipeline or
queue a request for pull ingestion
3. Compute and storage isolation
• Independent scale out / scale in
• Persistent data in Azure Blob Storage
• Caching for low-latency on compute
4. Multiple options to support
data consumption
Use out-of-the box tools and connectors
or use APIs/SDKs for custom solution
Data Lake
/ Blob
IoT
Ingested Data
Engine
Data
Management
Azure Data Explorer
Azure Storage
Event Hub
IoT Hub
Customer Data
Lake
Kafka Sink
Logstash Plugin
Event Grid
Azure Portal
Power BI
ADX Web UI
ODBC / JDBC Apps
Apps (Via API)
Logstash Plugin
Apps (Via API)
Create,
Manage
Stream
Batch
Grafana
Query,
Control Commands
Azure OSS Applications
Active Data
Connections
© Microsoft Corporation
Intuitive querying
Designed for data exploration
Simple and powerful
• Rich relational query language (filter, aggregate, join,
calculated columns, and more)
• Built-in full-text search, time series, user analytics, and
machine learning operators
• Out-of-the box visualization (render)
• Easy-to-use syntax + Microsoft IntelliSense
• Highly recognizable hierarchical schema entities
Comprehensive
• Built for querying over structured, semi-structured and
unstructured data simultaneously
Extensible
• In-line Python
• SQL
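For instance, a minimal KQL sketch (assuming the StormEvents sample table) that combines full-text search, time-series bucketing, and the render operator:

StormEvents
| where EventNarrative has "tornado"               // built-in full-text search
| summarize Events = count() by bin(StartTime, 7d) // time-series bucketing
| render timechart                                 // out-of-the-box visualization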
© Microsoft Corporation
Easy provisioning
• No infrastructure to manage: Azure PaaS
• Use Azure Portal, APIs, or PowerShell to provision
• Storage Optimize/Compute Optimize SKUs
• Flexible data caching and retention options at
database and table level
Rapid elasticity
• Buy only what you need
• Scale out/in manually or use autoscale
• Dedicated resources
Maintenance-free
• All columns are compressed and indexed
during ingestion
• No index maintenance required
Simple provisioning
Fully managed for efficiency
Multi-temperature data processing paths

Hot (e.g., in-mem cube, stream analytics, …):
• seconds freshness, days retention
• in-mem aggregated data
• pre-defined standing queries
• split-second query performance
• data viewing

Warm (e.g., column store, indexing, …):
• minutes freshness, months retention
• raw data
• ad-hoc queries
• seconds-to-minutes query performance
• data exploration

Cold (e.g., distributed file system, map-reduce, …):
• hours freshness, years retention
• raw data
• programmatic batch processing
• minutes-to-hours query performance
• data manipulation
The role of ADX
12
Diagram labels: raw data, DWH, refined data, real-time derived data, data comparison and fast KPIs, ADX
THREE KEY USERS IN ONE TOOL:
• IoT Developer (data check, rule engine for insights)
• Data engineer (data comparison)
• Data scientist (data exploration)
How ADX is Organized
13
Diagram: INSTANCE → DATABASE → SOURCES. DB users/apps reach the cluster through an ingestion URL and a querying URL; storage is split between cache storage and blob storage; external sources and destinations include IoT Hub, Event Hub, Storage, ADLS, SQL Server, and many more.
ADX – Ingest data
14
FIRST PHASE: Ingestion
15
• Many connectors & plugins
• Many SDKs
• Many managed pipelines
• Many tools for rapid ingestion

Managed pipelines:
• Ingest blobs using Event Grid
• Ingest Event Hub streams
• Ingest IoT Hub streams
• Ingest data from ADF

Connectors & plugins:
• Logstash plugin
• Kafka connector
• Apache Spark connector

SDKs:
• Python SDK
• .NET SDK
• Java SDK
• Node SDK
• REST API
• Go API

Tools:
• One Click Ingestion
• LightIngest
Ingestion Types:
16
• Streaming ingestion: optimized for a low volume of data per table,
across thousands of tables
• The operation completes in under 10 seconds
• Data is available for query after completion
• Batching ingestion: optimized for high ingestion throughput
• Default batch parameters: 5 minutes, 500 items, or 1000 MB
Both behaviors map to table-level policies; see the sketch below.
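A minimal sketch of the corresponding control commands (table name MyTable is hypothetical):

// Enable streaming ingestion on a table
.alter table MyTable policy streamingingestion enable

// Tune the batching policy (time span, item count, size)
.alter table MyTable policy ingestionbatching
@'{"MaximumBatchingTimeSpan":"00:02:00", "MaximumNumberOfItems":500, "MaximumRawDataSizeMB":1024}'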
Ingestion Techniques
17

Batch ingestion (provided by SDKs)
For high-volume, reliable, and cheap data ingestion: the client uploads the data to Azure Blob storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure Queue. Batch ingestion is the recommended technique.

Inline ingestion (provided by query tools)
Most appropriate for exploration and prototyping:
• Inline ingestion: a control command (.ingest inline) containing in-band data, intended for ad-hoc testing purposes.
• Ingest from query: control commands (.set, .set-or-append, .set-or-replace) that point to query results, used for generating reports or small temporary tables.
• Ingest from storage: a control command (.ingest into) with data stored externally (for example, in Azure Blob Storage), allowing efficient bulk ingestion of data.
Minimal examples of the three commands follow.
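Hedged examples of the three commands (table and blob names are hypothetical):

// Inline ingestion: ad-hoc testing
.ingest inline into table MyTable <|
1,foo
2,bar

// Ingest from query
.set-or-append MySummary <| MyTable | summarize Rows = count() by bin(ingestion_time(), 1h)

// Ingest from storage
.ingest into table MyTable (h'https://ACCOUNT.blob.core.windows.net/container/file.csv;SAS_TOKEN') with (format='csv')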
Ingestion: Formats & Use Cases
18
For all ingestion methods other than ingest-from-query, format the data so that Azure
Data Explorer can parse it. The supported data formats are:
• CSV, TSV, TSVE, PSV, SCSV, SOH
• JSON (line-separated, multi-line), MultiJSON, Avro, ORC, Parquet
• Files/blobs can be compressed: ZIP, GZIP
• Prefer self-describing names: MyData.csv.zip, MyData.json.gz
Supported data formats
19
Schema mapping binds source data fields to destination table columns.
• CSV mapping (optional) works with all ordinal-based formats. It can be passed as an ingest command parameter, or pre-created on the table and referenced from the ingest command parameter.
• JSON mapping and Avro mapping (both mandatory) can be passed as an ingest command parameter. They can also be pre-created on the table and referenced from the ingest command parameter.
[
{ "column" : "rownumber", "Properties":{"Ordinal":"0"}},
{ "column" : "rowguid", "Properties":{"Ordinal":"1"}},
{ "column" : "xdouble", "Properties":{"Ordinal":"2"}},
{ "column" : "xbool", "Properties":{"Ordinal":"3"}},
{ "column" : "xint32", "Properties":{"Ordinal":"4"}},
{ "column" : "xint64", "Properties":{"Ordinal":"5"}},
{ "column" : "xdate", "Properties":{"Ordinal":"6"}},
{ "column" : "xtext", "Properties":{"Ordinal":"7"}},
{ "column" : "const_val", "Properties":{"ConstValue":"Sample: constant value"}}
]
[
{ "column" : "rownumber", "Properties":{"Path":"$.rownumber"}},
{ "column" : "rowguid", "Properties":{"Path":"$.rowguid"}},
{ "column" : "xdouble", "Properties":{"Path":"$.xdouble"}},
{ "column" : "xbool", "Properties":{"Path":"$.xbool"}},
{ "column" : "xint32", "Properties":{"Path":"$.xint32"}},
{ "column" : "xint64", "Properties":{"Path":"$.xint64"}},
{ "column" : "xdate", "Properties":{"Path":"$.xdate"}},
{ "column" : "xtext", "Properties":{"Path":"$.xtext"}},
{ "column" : "location", "Properties":{"transform":"SourceLocation"}},
{ "column" : "lineNumber", "Properties":{"transform":"SourceLineNumber"}},
{ "column" : "timestamp", "Properties":{"Path":"$.unix_ms", "transform":"DateTimeFromUnixMilliseconds"}},
{ "column" : "full_record", "Properties":{"Path":"$"}}
]
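A hedged sketch of pre-creating a mapping on the table and referencing it at ingest time (table, mapping, and blob names are hypothetical):

// Pre-create a CSV mapping on the table
.create table MyTable ingestion csv mapping "Mapping1"
'[{ "column": "rownumber", "Properties": {"Ordinal": "0"}}, { "column": "rowguid", "Properties": {"Ordinal": "1"}}]'

// Reference it from the ingest command
.ingest into table MyTable (h'https://ACCOUNT.blob.core.windows.net/container/file.csv;SAS_TOKEN')
with (format='csv', ingestionMappingReference='Mapping1')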
Demo
What is LightIngest
23
• A command-line utility for ad-hoc data ingestion into Kusto
• Pulls source data from a local folder
• Pulls source data from an Azure Blob Storage container
• Useful for ingesting quickly when playing with ADX
• Most useful when you want to ingest a large amount of data (there is no time constraint on ingestion duration)
[Ingest JSON data from blobs]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
  -database:db001
  -table:LAB
  -sourcePath:"https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME?SAS_TOKEN"
  -prefix:MyDir1/MySubDir2
  -format:json
  -mappingRef:DefaultJsonMapping
  -pattern:*.json
  -limit:100
[Ingest CSV data with headers from local files]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
  -database:MyDb
  -table:MyTable
  -sourcePath:"D:\MyFolder\Data"
  -format:csv
  -ignoreFirstRecord:true
  -mappingPath:"D:\MyFolder\CsvMapping.txt"
  -pattern:*.csv.gz
  -limit:100

REFERENCE:
https://docs.microsoft.com/en-us/azure/kusto/tools/lightingest
LightIngest: pay attention to IngestionTime!
24
IMPORTANT:
All the data is indexed, but how is it partitioned? By ingestion time!
The -creationTimePattern argument allows users to partition the data by creation time instead of ingestion time.
[Ingest CSV data with headers from local files]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
  -database:MyDb
  -table:MyTable
  -sourcePath:"D:\MyFolder\Data"
  -format:csv
  -ignoreFirstRecord:true
  -mappingPath:"D:\MyFolder\CsvMapping.txt"
  -pattern:*.csv.gz
  -limit:100

[Ingest JSON data from blobs]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
  -database:db001
  -table:LAB
  -sourcePath:"https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME?SAS_TOKEN"
  -prefix:MyDir1/MySubDir2
  -format:json
  -mappingRef:DefaultJsonMapping
  -pattern:*.json
  -limit:100
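A hedged sketch of -creationTimePattern, assuming the blob names embed a date (the pattern and names are hypothetical):

LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
  -database:MyDb
  -table:MyTable
  -sourcePath:"https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME?SAS_TOKEN"
  -creationTimePattern:"'historicalvalues'yyyyMMdd'.csv'"
  -format:csv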
One Click ingestion GA
25
• One Click makes ingestion simple (an intuitive UX)
• Start ingesting data, creating tables and mapping structures
• Supports different data formats

STEPS:
1. Check your data
2. Study the best format/compression
3. Create and destroy tons of test tables
4. Derive the mapping
5. SCRIPT ALL and version it
My ingestion best experience
26
Open points:
• Why an Event Hub after the IoT Hub?
• Why the second Event Hub?
Update Policy
27
Automatically append data to a target table whenever new data is inserted into the source table, based on a transformation query that runs over the data inserted into the source table.
USE IT IF:
• The source table is a «free-text column» based table
• The target table accepts only a specific morphology
Cascading updates are allowed (TableA → TableB → TableC → ...).
Raw table Refined table
How to use Update Policy
28
// Create a function that will be used for the update
.create function MyUpdateFunction()
{
    MyTableX
    | where ColumnA == 'some-string'
    | summarize MyCount = count() by ColumnB, Key = ColumnC
    | join (OtherTable | project OtherColumnZ, Key = OtherColumnC) on Key
    | project ColumnB, ColumnZ = OtherColumnZ, Key, MyCount
}

// Create the target table (if it doesn't already exist)
.set-or-append DerivedTableX <| MyUpdateFunction() | limit 0

// Enable the update policy on table DerivedTableX
.alter table DerivedTableX policy update
@'[{"IsEnabled": true, "Source": "MyTableX", "Query": "MyUpdateFunction()", "IsTransactional": false, "PropagateIngestionProperties": false}]'

// Remove the update policy
.delete table DerivedTableX policy update
Pay attention to failures!
29
Evaluate resource usage

.show table MySourceTable extents;
// The following provides the extent ID for the not-yet-merged extent in the source table which has the most records
let extentId = $command_results
    | where MaxCreatedOn > ago(1hr) and MinCreatedOn == MaxCreatedOn
    | top 1 by RowCount desc
    | project ExtentId;
let MySourceTable = MySourceTable | where extent_id() == toscalar(extentId);
MyFunction()
Failures

.show ingestion failures
| where FailedOn > ago(1hr) and OriginatesFromUpdatePolicy == true

• Non-transactional policy: failures are ignored
• Transactional policy: if the ingestion method is pull, the entire ingestion operation is automatically retried (up to a maximum period)

SO: you should check failures to trigger «BROKEN FILES» handling… but HOW?
Use this pattern
30
The first table is NEVER wide… but the second one is!
• First table schema: K, V, TS, Metadata (telemetry oriented)
• Second table schema: WT, a Wide Table (ML oriented)
Demo
ADX – Query data
32
Kusto for SQL Users
33
• Perform SQL SELECT (no DDL, only SELECT)
• Use KQL (Kusto Query Language)
• Supports translating T-SQL queries to Kusto query
language
explain
select top(10) * from StormEvents
order by DamageProperty desc
StormEvents
| sort by DamageProperty desc nulls first
| take 10
Some code examples
34
• Query with «between»
• Function with parameters and a «toscalar» expression
• «extend» usage
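The slide shows screenshots; a minimal KQL sketch of the three ideas, assuming the StormEvents sample table:

// «between»: inclusive range filter
StormEvents | where DamageProperty between (1000 .. 100000)

// «extend»: add calculated columns
StormEvents | extend TotalDamage = DamageProperty + DamageCrops

// function with a parameter, using «toscalar»
let DamageAbove = (threshold: long) {
    toscalar(StormEvents | where DamageProperty > threshold | count)
};
print DamageAbove(1000)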
Language examples
35
Alias:
alias database["wiki"] = cluster("https://somecluster.kusto.windows.net:443").database("somedatabase");
database("wiki").PageViews | count

Let:
let start = ago(5h);
let period = 2h;
T | where Time > start and Time < start + period | ...

Bin:
T | summarize Hits=count() by bin(Duration, 1s)

Batch:
let m = materialize(StormEvents | summarize n=count() by State);
m | where n > 2000; m | where n < 10

Tabular expression:
Logs
| where Timestamp > ago(1d)
| join ( Events | where continent == 'Europe' ) on RequestId
Time Series Analysis – bin operator
36
bin operator: rounds values down to an integer multiple of a given bin size. If you have a scattered set of values, they will be grouped into a smaller set of specific values.

[Rule]
bin(value, roundTo)

[Example]
T | summarize Hits=count() by bin(Duration, 1s)
Time Series Analysis – make-series operator
37
make-series operator

[Rule]
T | make-series [MakeSeriesParameters] [Column =] Aggregation [default = DefaultValue] [, ...] on AxisColumn from start to end step step [by [Column =] GroupExpression [, ...]]

[Example]
T | make-series sum(amount) default=0, avg(price) default=0 on timestamp from datetime(2016-01-01) to datetime(2016-01-10) step 1d by supplier
Time Series Analysis – basket operator
38
basket operator: finds all frequent patterns of discrete attributes (dimensions) in the data, and returns all frequent patterns that passed the frequency threshold in the original query.

[Rule]
T | evaluate basket([Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, ...])

[Example]
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2)
Time Series Analysis – autocluster operator
39
autocluster operator: finds common patterns of discrete attributes (dimensions) in the data, and reduces the results of the original query (whether it's 100 or 100k rows) to a small number of patterns.

[Rule]
T | evaluate autocluster([SizeWeight, WeightColumn, NumSeeds, CustomWildcard, CustomWildcard, ...])

[Example]
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage
| evaluate autocluster(0.6)

StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage
| evaluate autocluster(0.2, '~', '~', '*')
Demo
ADX Functions
41
Functions are reusable queries or query parts. Kusto supports several kinds of functions:
• Stored functions: user-defined functions that are stored and managed as one kind of database schema entity. See Stored functions.
• Query-defined functions: user-defined functions that are defined and used within the scope of a single query, through a let statement. See User-defined functions.
• Built-in functions: hard-coded functions, defined by Kusto and not modifiable by users.
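A minimal sketch of the first two kinds (function names are hypothetical):

// Query-defined function, scoped to a single query
let Add = (a: long, b: long) { a + b };
print Add(1, 2)

// Stored function, persisted as a database schema entity
.create-or-alter function with (docstring = "Adds two numbers", folder = "Demo")
Add(a: long, b: long) { a + b }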
Materialized views
42
The view exposes an always up-to-date view of the defined aggregation.
Advantages:
• Performance improvement
• Freshness
• Cost reduction
Behind the scenes:
• The source table is periodically materialized into the view table
• At query time, the view combines the materialized part with the delta in the source table since the last materialization, to return complete results
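A minimal sketch of creating one (view and table names hypothetical, assuming a Timestamp column):

.create materialized-view MyHourlyStats on table MyTable
{
    MyTable
    | summarize EventCount = count() by bin(Timestamp, 1h)
}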
Demo
ADX – Export data
44
Export
45
• To Storage

.export async compressed to csv (
    h@"https://storage1.blob.core.windows.net/containerName;secretKey",
    h@"https://storage1.blob.core.windows.net/containerName2;secretKey"
) with ( sizeLimit=100000, namePrefix=export, includeHeaders=all, encoding=UTF8NoBOM )
<| myLogs | where id == "moshe" | limit 10000

• To SQL

.export async to sql ['dbo.MySqlTable']
h@"Server=tcp:myserver.database.windows.net,1433;Database=MyDatabase;Authentication=Active Directory Integrated;Connection Timeout=30;"
with (createifnotexists="true", primarykey="Id")
<| print Message = "Hello World!", Timestamp = now(), Id=12345678
1. DEFINE COMMAND
Define the ADX command and try your recurrent export strategy
2. TRY IN EDITOR
Use an editor to try the command, verifying connection strings and parameterizing them
3. BUILD A JOB
Build a notebook or a C# job using the command as a SQL query in your code
External tables & Continuous Export
46
• An external table points to an external endpoint:
  • Azure Storage
  • Azure Data Lake Store
  • SQL Server
• You need to define:
  • The destination
  • The continuous-export strategy

EXTERNAL TABLE CREATION

.create external table ExternalAdlsGen2 (Timestamp:datetime, x:long, s:string)
kind=adl partition by bin(Timestamp, 1d) dataformat=csv (
    h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey'
) with ( docstring = "Docs", folder = "ExternalTables", namePrefix="Prefix" )

EXPORT TO EXTERNAL TABLE

.create-or-alter continuous-export MyExport over (T) to table ExternalAdlsGen2
with (intervalBetweenRuns=1h, forcedLatency=10m, sizeLimit=104857600)
<| T
My best experience
47
Open points
• How to extract insights using a dynamic, codeless approach?
• How to integrate ADX with low-cost DB solutions?
My final ADX recipe
48
Diagram: Blob Storage → (batch ingestion) → raw tables → (update policy) → refined tables; triggered dynamic check queries and materialized views run on top; external tables feed the Data Lake (long-term buckets) and the SQL DWH.
ADX – View data
49
ADX Dashboards
50
• Integrated in the Kusto Web Explorer
• Optimized for big data
• Uses powerful KQL to retrieve the visual data
• Make dynamic views or widgets
My personal approach
51
Diagram: DATA → FUNCTION1 / FUNCTION2 / FUNCTION3 → FUNCTION3.1 / FUNCTION3.2 / FUNCTION3.3 → KPI DEFINITIONs → DASHBOARD (use the KPIs to embed and filter them)
Demo
Grafana query builder
53
• Create Grafana panels with no KQL knowledge
• Select values/filters/grouping using simple UI dropdowns
• Switch to raw mode to enhance queries with KQL
How to use Grafana easily
54
Go to the All Plugins section, search for the ADX datasource, and install the plugin.
How to use Grafana easily
55
Go to https://grafana.com/
Sign up and get an account.
How to use Grafana easily
56
Go to your Grafana instance at
https://<workbenchname>.grafana.net/datasources
and configure the ADX datasource.
Then start building dashboards!
ADX – Use Data
57
How about orchestration?
Three use cases in which Flow + Kusto are the solution:

Push data to a Power BI dataset
Periodically run queries and push the results to a Power BI dataset

Conditional queries
Run data checks and send notifications with no code

Email multiple ADX Flow charts
Send incredible emails with an HTML5 chart as the query result
Orchestration?

Manage costs
Start and stop the cluster by evaluating a condition

Query sets to check data
Plan a set of queries in order to say «IT'S OK, even today!»

Manage data retention
Based on a dynamic condition
An Example of:
60
1. Set trigger 2. Connect and test ADX BLOCK 3. Configure Email BLOCK with dynamic params
And the result is:
61
ADX – Security & Management
62
Data encryption in ADX
• Encryption at rest (using Azure Storage encryption)
• A Microsoft-managed key is used by default
• Customer-managed keys can be enabled
• Key rotation, temporary disable, and revoke access controls can be implemented
• Soft Delete and Purge Protection will be enabled on the Key Vault and cannot be disabled
63
Extents, policies and Partition
• What are data shards or extents
• Column, segments, and blocks
• merge policy and sharding policy
• Data partitioning policy (post-ingestion)
64
FACTS:
A) Kusto stores its ingested data in reliable storage (most commonly Azure Blob Storage).
B) To speed up queries on that data, Kusto caches it (or parts of it) on its processing nodes.

The Kusto cache provides a granular cache policy that customers can use to differentiate between two data cache policies: hot data cache and cold data cache.

YOU CAN SPECIFY WHICH LOCATION MUST BE USED:

set query_datascope="hotcache";
T | union U | join (T datascope=all | where Timestamp < ago(365d)) on X
Cache policy is independent from retention policy!
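A minimal sketch of setting it (table name hypothetical): keep 7 days of data in the hot cache.

.alter table MyTable policy caching hot = 7d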
Retention policy
65
Retention policy
66
2 parameters, applicable at database or table level:

• Soft Delete Period (number)
  • Data is available for query; ts is the ADX ingestion date
  • Default is set to 100 YEARS
• Recoverability (enabled/disabled)
  • Default is set to ENABLED
  • Data is recoverable for 14 days after deletion

.alter database DatabaseName policy retention "{}"
.alter table TableName policy retention "{}"

EXAMPLE:
{ "SoftDeletePeriod": "36500.00:00:00", "Recoverability": "Enabled" }

.delete database DatabaseName policy retention
.delete table TableName policy retention

.alter-merge table MyTable1 policy retention softdelete = 7d
Data Purge
67
PURGE PROCESS:
1. It requires database admin permissions
2. Prior to purging, you have to be ENABLED for it by opening a SUPPORT TICKET
3. Run the purge query to identify the SIZE and EXECUTION TIME, and obtain a VerificationToken
4. Run the real purge query, passing the VerificationToken

2-STEP PROCESS:

.purge table MyTable records in database MyDatabase <| where CustomerId in ('X', 'Y')

Returns: NumRecordsToPurge = 1,596 | EstimatedPurgeExecutionTime = 00:00:02 | VerificationToken = e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b

.purge table MyTable records in database MyDatabase with
(verificationtoken='e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b')
<| where CustomerId in ('X', 'Y')

1-STEP PROCESS (with no regrets!):

.purge table MyTable records in database MyDatabase with (noregrets='true')
Virtual Network
BENEFITS
• Use NSG rules to limit traffic
• Connect your on-premises network to the Azure Data Explorer cluster's subnet
• Secure your data connection sources (Event Hub and Event Grid) with service endpoints
68
A VNet gives you TWO independent IPs:
• Private IP: used to access the cluster inside the VNet
• Public IP: used to access the cluster from outside the VNet (management and monitoring), and as a source address for outbound connections initiated from the cluster
Row level security
• Provides fine control of access to table data by different users
• Allow specifying user access to specific rows in tables
• Provides mechanics to mask PII data in tables
69
.create-or-alter function with () TrimCreditCardNumbers() {
let UserCanSeeFullNumbers = current_principal_is_member_of('aadgroup=super_group@domain.com');
let AllData = Customers | where UserCanSeeFullNumbers;
let PartialData = Customers | where not(UserCanSeeFullNumbers) | extend CreditCardNumber = "****";
union AllData, PartialData
}
.alter table Customers policy row_level_security enable "TrimCreditCardNumbers"
Leader and Follower
• Azure Data Share creates a symbolic link between two ADX clusters
• Sharing occurs in near-real-time (no data pipeline)
• ADX decouples storage and compute
• This allows customers to run multiple compute (read-only) instances on the same underlying storage
• You can attach a database as a follower database, which is a read-only database on a remote cluster
• You can share the data at the database level or at the cluster level
70
The cluster sharing the database is the leader cluster and the
cluster receiving the share is the follower cluster.
A follower cluster can follow one or more leader cluster
databases. The follower cluster periodically synchronizes to
check for changes.
The queries running on the follower cluster use local cache
and don't use the resources of the leader cluster.
Azure Data Share
My experience
71
ADX – A critical perspective
72
What is ADX for me, today
• A telemetry-data search engine => an ELK replacement
• A TSDB involved in lambda-architecture replacements (as the WARM path) => an OSS lambda (MinIO + Kafka) replacement
• A tool to materialize data into ADLS & SQL
• A tool for monitoring, summarizing information, and sending notifications
73
Which OSS alternatives should we compare ADX with?
74
From db-engines.com
Azure Data Explorer
Fully managed big data
interactive analytics platform
Elasticsearch
A distributed, RESTful modern
search and analytics engine
ADX can be a replacement for search and log analytics engines such as Elasticsearch, Splunk, InfluxDB.
Splunk
real-time insights Engine to
boost productivity & security.
InfluxDB
DBMS for storing time series,
events and metrics
Vs
Comparison chart
75
Feature | Elasticsearch (Elastic) | InfluxDB (InfluxData Inc.) | Azure Data Explorer (Microsoft) | Splunk (Splunk Inc.)
Description | A distributed, RESTful modern search and analytics engine based on Apache Lucene | DBMS for storing time series, events and metrics | Fully managed big data interactive analytics platform | Analytics platform for big data
Database models | Search engine, Document store | Time Series DBMS | Time Series DBMS, Search engine, Document store, Event store, Relational DBMS | Search engine
Initial release | 2010 | 2013 | 2019 | 2003
License | Open Source | Open Source | Commercial | Commercial
Cloud-based only | no | no | yes | no
Implementation language | Java | Go | – | –
Server operating systems | All OS with a Java VM | Linux, OS X | hosted | Linux, OS X, Solaris, Windows
Data scheme | schema-free | schema-free | Fixed schema with schema-less datatypes (dynamic) | yes
Typing | yes | Numeric data and Strings | yes | yes
XML support | no | no | yes | yes
Secondary indexes | yes | no | all fields are automatically indexed | yes
SQL | SQL-like query language | SQL-like query language | Kusto Query Language (KQL), SQL subset | no
APIs and other access methods | RESTful HTTP/JSON API, Java API | HTTP API, JSON over UDP | RESTful HTTP API, Microsoft SQL Server communication protocol (MS-TDS) | HTTP REST
Supported programming languages | .Net, Java, JavaScript, Python, Ruby, PHP, Perl, Groovy, community-contributed clients | .Net, Java, JavaScript, Python, R, Ruby, PHP, Perl, Haskell, Clojure, Erlang, Go, Lisp, Rust, Scala | .Net, Java, JavaScript, Python, R, PowerShell | .Net, Java, JavaScript, Python, Ruby, PHP
Server-side scripts | yes | no | yes (KQL, Python, R) | yes
Triggers | yes | no | yes | yes
Partitioning methods | Sharding | Sharding | Sharding | Sharding
Replication methods | yes | selectable replication factor | yes | Master-master replication
MapReduce | ES-Hadoop Connector | no | no | yes
Consistency concepts | Eventual Consistency | Eventual Consistency | Eventual Consistency, Immediate Consistency | –
Foreign keys | no | no | no | no
Transaction concepts | no | no | no | no
Concurrency | yes | yes | yes | yes
Durability | yes | yes | yes | yes
In-memory capabilities | Memcached and Redis integration | yes | no | no
ADX – Wrap up
76
Why ADX is Unique
77
Simplified costs
• VM costs
• ADX service add-on cost

Many prebuilt inputs
• ADF
• IoT Hub
• Event Hub
• Storage
• Logstash
• Kafka
• Fluent Bit

Many prebuilt outputs
• Power BI
• ODBC connector
• Jupyter
• Grafana
Azure Data Explorer
78
Diagram: ingestion sources (Blob, IoT Hub, Event Hub) feed Azure Data Explorer through the Python/.NET/Java SDKs and the REST API, using queued or direct ingestion, streaming or bulk. Consumption happens through UX (Web UI, desktop app, Jupyter magic, Monaco IDE, Azure Notebooks), connectors (Power BI Direct Query, Microsoft Flow, Azure Logic Apps, Grafana, ADF), APIs (.NET, Python, Java, JavaScript SDKs) and protocols (MS-TDS).
Comprehensive strength
• Metrics and time-series data
• Text search and text analytics
• Multi-dimensional/relational analysis

Analytics query language
• Simple and powerful
• Publicly available
• Data exploration
• Rich relational query language
• Full-text search
• ML extensibility

Control: high performance over large data sets
• Scale out in hardware
• Scale out across geos
• Granular resource utilization
• Cross-geo queries

Data ingestion and management
• Low-latency ingestion
• Schema management
• Compression and indexing
• Retention
• Hot/cold resource allocation
Everything our User
Group Has To Offer
Get involved in
our Meetup
Join the conversation on
our Facebook group
Follow our page
on Facebook
Follow our Videos
on Youtube
Explore
https://bit.ly/2P9sqLy https://bit.ly/2QqAWX4
https://bit.ly/3auvRnD
https://bit.ly/3n8l5bP
Thank you to our Sponsor
Summary
• Use ADX to:
  • Understand data
  • Visualize KPIs and refine them
  • Manage a long-term storage strategy
  • Feed a facts table in the DWH
  • Trigger daily auto checks
Riccardo.zamana@gmail.com
Weitere ähnliche Inhalte

Was ist angesagt?

Secrets of the DSpace Submission Form
Secrets of the DSpace Submission FormSecrets of the DSpace Submission Form
Secrets of the DSpace Submission FormBram Luyten
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Vinoth Chandar
 
Elastic search overview
Elastic search overviewElastic search overview
Elastic search overviewABC Talks
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkSamy Dindane
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceDatabricks
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudDatabricks
 
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Riccardo Zamana
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic IntroductionMayur Rathod
 
Care and Feeding of Catalyst Optimizer
Care and Feeding of Catalyst OptimizerCare and Feeding of Catalyst Optimizer
Care and Feeding of Catalyst OptimizerDatabricks
 
Interactive Analytics in Human Time
Interactive Analytics in Human TimeInteractive Analytics in Human Time
Interactive Analytics in Human TimeDataWorks Summit
 
Log management with ELK
Log management with ELKLog management with ELK
Log management with ELKGeert Pante
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Edureka!
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesDatabricks
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneRahul Jain
 
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...Edureka!
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearchhypto
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsDatabricks
 
Elasticsearch From the Bottom Up
Elasticsearch From the Bottom UpElasticsearch From the Bottom Up
Elasticsearch From the Bottom Upfoundsearch
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineJason Terpko
 

Was ist angesagt? (20)

Secrets of the DSpace Submission Form
Secrets of the DSpace Submission FormSecrets of the DSpace Submission Form
Secrets of the DSpace Submission Form
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017
 
Elastic search overview
Elastic search overviewElastic search overview
Elastic search overview
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
ELK Stack
ELK StackELK Stack
ELK Stack
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
 
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic Introduction
 
Care and Feeding of Catalyst Optimizer
Care and Feeding of Catalyst OptimizerCare and Feeding of Catalyst Optimizer
Care and Feeding of Catalyst Optimizer
 
Interactive Analytics in Human Time
Interactive Analytics in Human TimeInteractive Analytics in Human Time
Interactive Analytics in Human Time
 
Log management with ELK
Log management with ELKLog management with ELK
Log management with ELK
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFrames
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Elasticsearch From the Bottom Up
Elasticsearch From the Bottom UpElasticsearch From the Bottom Up
Elasticsearch From the Bottom Up
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation Pipeline
 

Ähnlich wie Data saturday malta - ADX Azure Data Explorer overview

Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Michael Rys
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventTrivadis
 
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaSelf-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaGuido Schmutz
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFAmazon Web Services
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About Jesus Rodriguez
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming VisualizationGuido Schmutz
 
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)Amazon Web Services Korea
 
Elasticsearch + Cascading for Scalable Log Processing
Elasticsearch + Cascading for Scalable Log ProcessingElasticsearch + Cascading for Scalable Log Processing
Elasticsearch + Cascading for Scalable Log ProcessingCascading
 
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...WSO2
 
aip_developer_overview_icar_2014
aip_developer_overview_icar_2014aip_developer_overview_icar_2014
aip_developer_overview_icar_2014Matthew Vaughn
 
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)Trivadis
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarRTTS
 
AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS Amazon Web Services
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWSAmazon Web Services
 
Time Series Analytics Azure ADX
Time Series Analytics Azure ADXTime Series Analytics Azure ADX
Time Series Analytics Azure ADXRiccardo Zamana
 

Ähnlich wie Data saturday malta - ADX Azure Data Explorer overview (20)

Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaSelf-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
MCT Virtual Summit 2021
MCT Virtual Summit 2021MCT Virtual Summit 2021
MCT Virtual Summit 2021
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
 
Elasticsearch + Cascading for Scalable Log Processing
Elasticsearch + Cascading for Scalable Log ProcessingElasticsearch + Cascading for Scalable Log Processing
Elasticsearch + Cascading for Scalable Log Processing
 
CC -Unit4.pptx
CC -Unit4.pptxCC -Unit4.pptx
CC -Unit4.pptx
 
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
 
aip_developer_overview_icar_2014
aip_developer_overview_icar_2014aip_developer_overview_icar_2014
aip_developer_overview_icar_2014
 
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
 
AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS
 
Time Series Analytics Azure ADX
Time Series Analytics Azure ADXTime Series Analytics Azure ADX
Time Series Analytics Azure ADX
 

Mehr von Riccardo Zamana

Copilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdfCopilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdfRiccardo Zamana
 
At the core you will have KUSTO
At the core you will have KUSTOAt the core you will have KUSTO
At the core you will have KUSTORiccardo Zamana
 
Azure Industrial Iot Edge
Azure Industrial Iot EdgeAzure Industrial Iot Edge
Azure Industrial Iot EdgeRiccardo Zamana
 
Azure satpn19 time series analytics with azure adx
Azure satpn19   time series analytics with azure adxAzure satpn19   time series analytics with azure adx
Azure satpn19 time series analytics with azure adxRiccardo Zamana
 
Industrial iot: dalle parole ai fatti
Industrial iot: dalle parole ai fatti Industrial iot: dalle parole ai fatti
Industrial iot: dalle parole ai fatti Riccardo Zamana
 
Azure dayroma java, il lato oscuro del cloud
Azure dayroma   java, il lato oscuro del cloudAzure dayroma   java, il lato oscuro del cloud
Azure dayroma java, il lato oscuro del cloudRiccardo Zamana
 
Industrial Iot - IotSaturday
Industrial Iot - IotSaturday Industrial Iot - IotSaturday
Industrial Iot - IotSaturday Riccardo Zamana
 

Mehr von Riccardo Zamana (9)

Copilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdfCopilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdf
 
At the core you will have KUSTO
At the core you will have KUSTOAt the core you will have KUSTO
At the core you will have KUSTO
 
Azure Industrial Iot Edge
Azure Industrial Iot EdgeAzure Industrial Iot Edge
Azure Industrial Iot Edge
 
Azure satpn19 time series analytics with azure adx
Azure satpn19   time series analytics with azure adxAzure satpn19   time series analytics with azure adx
Azure satpn19 time series analytics with azure adx
 
Industrial iot: dalle parole ai fatti
Industrial iot: dalle parole ai fatti Industrial iot: dalle parole ai fatti
Industrial iot: dalle parole ai fatti
 
Azure dayroma java, il lato oscuro del cloud
Azure dayroma   java, il lato oscuro del cloudAzure dayroma   java, il lato oscuro del cloud
Azure dayroma java, il lato oscuro del cloud
 
Industrial Iot - IotSaturday
Industrial Iot - IotSaturday Industrial Iot - IotSaturday
Industrial Iot - IotSaturday
 
Azure reactive systems
Azure reactive systemsAzure reactive systems
Azure reactive systems
 
Industrial IoT on azure
Industrial IoT on azureIndustrial IoT on azure
Industrial IoT on azure
 

Kürzlich hochgeladen

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 

Kürzlich hochgeladen (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Data saturday malta - ADX Azure Data Explorer overview

• 1. Agenda • ADX Basics: Service Goal, Pricing, Capabilities • ADX Data Flow: Ingestion, Querying, Visualization • ADX Ecosystem: Integration & Orchestration • ADX Tools: Monitoring and Management • ADX Use Cases & Best Practices
  • 2. Thank you to our Sponsor
• 3. Everything our User Group Has To Offer Get involved in our Meetup Join the conversation on our Facebook group Follow our page on Facebook Follow our videos on YouTube Explore https://bit.ly/2P9sqLy https://bit.ly/2QqAWX4 https://bit.ly/3auvRnD https://bit.ly/3n8l5bP
• 5. Azure Data Explorer in a sentence The Platform Any append-only stream of records Relational query model: Filter, aggregate, join, calculated columns, … Fully-managed Rapid iterations to explore the data High volume High velocity High variance (structured, semi-structured, free-text) PaaS, Vanilla, Database Purposely built ADX in a sentence
  • 6. © Microsoft Corporation Azure Data Explorer use cases IoT applications Discover and address performance issues with machines, equipment, and devices in real-time to optimize production quality and productivity. Big data logging platform Enhance customer experiences using digital platforms. Spot trends, patterns, or anomalies within billions of lines of log data to make near instant corrections to improve performance. SaaS applications Build multi-tenant SaaS applications embedded with interactive analytics. Monitor the performance of the application, improve products, and provide business owners insights to boost business outcomes. Nik Shampur Software Development Lead “Azure Data Explorer has improved our analysis capabilities for our product tremendously…The scalability and performance allow us to deeply analyze our collected data and retrieve valuable insights.”
• 7. © Microsoft Corporation Fast and fully managed data analytics service Fully managed for efficiency Focus on insights, not the infrastructure, for fast time to value No infrastructure to manage; provision the service, choose the SKU for your workload, and create a database. Optimized for streaming data Get near-instant insights from fast-flowing data Scale linearly up to 200 MB per second per node with highly performant, low-latency ingestion. Designed for data exploration Run ad-hoc queries using the intuitive query language Returns results from 1 billion records in under 1 second without modifying the data or metadata
• 8. © Microsoft Corporation Azure Data Explorer overview 1. Capability for many data types, formats, and sources Structured (numbers), semi-structured (JSON, XML), and free text 2. Batch or streaming ingestion Use the managed ingestion pipeline or queue a request for pull ingestion 3. Compute and storage isolation • Independent scale out / scale in • Persistent data in Azure Blob Storage • Caching for low latency on compute 4. Multiple options to support data consumption Use out-of-the-box tools and connectors or use APIs/SDKs for a custom solution Data Lake / Blob IoT Ingested Data Engine Data Management Azure Data Explorer Azure Storage Event Hub IoT Hub Customer Data Lake Kafka Sink Logstash Plugin Event Grid Azure Portal Power BI ADX Web UI ODBC / JDBC Apps Apps (via API) Logstash Plugin Apps (via API) Create, Manage Stream Batch Grafana Query, Control Commands Azure OSS Applications Active Data Connections
• 9. © Microsoft Corporation Intuitive querying Designed for data exploration Simple and powerful • Rich relational query language (filter, aggregate, join, calculated columns, and more) • Built-in full-text search, time series, user analytics, and machine learning operators • Out-of-the-box visualization (render) • Easy-to-use syntax + Microsoft IntelliSense • Highly recognizable hierarchical schema entities Comprehensive • Built for querying structured, semi-structured, and unstructured data simultaneously Extensible • In-line Python • SQL
• 10. © Microsoft Corporation Easy provisioning • No infrastructure to manage: Azure PaaS • Use the Azure Portal, APIs, or PowerShell to provision • Storage Optimized / Compute Optimized SKUs • Flexible data caching and retention options at database and table level Rapid elasticity • Buy only what you need • Scale out/in manually or use autoscale • Dedicated resources Maintenance-free • All columns are compressed and indexed during ingestion • No index maintenance required Simple provisioning Fully managed for efficiency
• 11. Multi-temperature data processing paths: Hot • seconds freshness, days retention • in-memory aggregated data • pre-defined standing queries • split-second query performance • data viewing Warm • minutes freshness, months retention • raw data • ad-hoc queries • seconds-to-minutes query performance • data exploration Cold • hours freshness, years retention • raw data • programmatic batch processing • minutes-to-hours query performance • data manipulation (Hot: in-memory cube, stream analytics, …; Warm: column store, indexing, …; Cold: distributed file system, map reduce, …)
• 12. The role of ADX 12 [diagram: raw data, DWH refined data, real-time derived data; data comparison and fast KPIs with ADX] THREE KEY USERS IN ONE TOOL: • IoT developer (data check, rule engine for insights) • Data engineer (data comparison) • Data scientist (data exploration)
• 13. How ADX is Organized 13 INSTANCE DATABASE SOURCES DB Users/Apps Ingestion URL Querying URL Cache storage Blob storage EXTERNAL SOURCES EXTERNAL DESTINATIONS IoT Hub, Event Hub, Storage, ADLS, SQL Server, and many more
  • 14. ADX – Ingest data 14
• 15. FIRST PHASE: Ingestion 15 • Many connections & plugins • Many SDKs • Many managed pipelines • Many tools to ingest rapidly Managed pipelines: • Ingest blobs using Event Grid • Ingest Event Hub streams • Ingest IoT Hub streams • Ingest data from ADF Connections & plugins: • Logstash plugin • Kafka connector • Apache Spark connector Many SDKs: • Python SDK • .NET SDK • Java SDK • Node SDK • REST API • GO API Tools: • One Click ingestion • LightIngest
• 16. Ingestion Types: 16 • Streaming ingestion: optimized for low volumes of data per table, across thousands of tables • Operation completes in under 10 seconds • Data is available for query after completion • Batching ingestion: optimized for high ingestion throughput • Default batch params: 5 minutes, 500 items, or 1000 MB (tunable per table, as sketched below)
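The batching defaults can be tuned with the ingestion batching policy. A minimal sketch, assuming a hypothetical table MyTable and illustrative thresholds:

.alter table MyTable policy ingestionbatching
'{"MaximumBatchingTimeSpan": "00:01:00", "MaximumNumberOfItems": 100, "MaximumRawDataSizeMB": 300}'
// Lower thresholds reduce ingestion latency at the cost of more, smaller extents.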
• 17. Ingestion Techniques 17 Batch ingestion (provided by SDKs): for high-volume, reliable, and cheap data ingestion. The client uploads the data to Azure Blob storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure Queue. Batch ingestion is the recommended technique. Inline ingestion (provided by query tools): most appropriate for exploration and prototyping. Inline ingestion: a control command (.ingest inline) containing in-band data, intended for ad hoc testing purposes. Ingest from query: control commands (.set, .set-or-append, .set-or-replace) that point to query results, used for generating reports or small temporary tables. Ingest from storage: a control command (.ingest into) with data stored externally (for example, Azure Blob Storage) that allows efficient bulk ingestion of data.
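Minimal sketches of the direct-ingestion commands named above; the table names, columns, and blob URL are hypothetical placeholders:

// Ad hoc testing: in-band data
.ingest inline into table MyTestTable <|
1,hello,2021-01-01T00:00:00Z

// Ingest from query: persist query results into a table
.set-or-append MySummaryTable <|
MyTestTable | summarize Rows=count() by bin(Timestamp, 1h)

// Bulk ingestion from external storage
.ingest into table MyTestTable (
  h@'https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME/data.csv?SAS_TOKEN'
) with (format='csv')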
• 18. Ingestion: Format & Use Cases 18 For all ingestion methods other than ingest from query, format the data so that Azure Data Explorer can parse it. The supported data formats are: • CSV, TSV, TSVE, PSV, SCSV, SOH • JSON (line-separated, multi-line), Avro, MultiJSON (jsonLine), ORC, Parquet • Files/blobs can be compressed: ZIP, GZIP • Better to use declarative names: MyData.csv.zip, MyData.json.gz
  • 19. Supported data formats 19 Schema mapping helps bind source data fields to destination table columns. • CSV Mapping (optional) works with all ordinal-based formats. It can be performed using the ingest command parameter or pre-created on the table and referenced from the ingest command parameter. • JSON Mapping (mandatory) and Avro mapping (mandatory) can be performed using the ingest command parameter. They can also be pre-created on the table and referenced from the ingest command parameter.
  • 20. [ { "column" : "rownumber", "Properties":{"Ordinal":"0"}}, { "column" : "rowguid", "Properties":{"Ordinal":"1"}}, { "column" : "xdouble", "Properties":{"Ordinal":"2"}}, { "column" : "xbool", "Properties":{"Ordinal":"3"}}, { "column" : "xint32", "Properties":{"Ordinal":"4"}}, { "column" : "xint64", "Properties":{"Ordinal":"5"}}, { "column" : "xdate", "Properties":{"Ordinal":"6"}}, { "column" : "xtext", "Properties":{"Ordinal":"7"}}, { "column" : "const_val", "Properties":{"ConstValue":"Sample: constant value"}} ]
  • 21. [ { "column" : "rownumber", "Properties":{"Path":"$.rownumber"}}, { "column" : "rowguid", "Properties":{"Path":"$.rowguid"}}, { "column" : "xdouble", "Properties":{"Path":"$.xdouble"}}, { "column" : "xbool", "Properties":{"Path":"$.xbool"}}, { "column" : "xint32", "Properties":{"Path":"$.xint32"}}, { "column" : "xint64", "Properties":{"Path":"$.xint64"}}, { "column" : "xdate", "Properties":{"Path":"$.xdate"}}, { "column" : "xtext", "Properties":{"Path":"$.xtext"}}, { "column" : "location", "Properties":{"transform":"SourceLocation"}}, { "column" : "lineNumber", "Properties":{"transform":"SourceLineNumber"}}, { "column" : "timestamp", "Properties":{"Path":"$.unix_ms", "transform":"DateTimeFromUnixMilliseconds"}}, { "column" : "full_record", "Properties":{"Path":"$"}} ]
  • 22. Demo
• 23. What is LightIngest 23 • A command-line utility for ad-hoc data ingestion into Kusto • Pulls source data from a local folder • Pulls source data from an Azure Blob Storage container • Useful for ingesting data quickly and experimenting with ADX • Most useful when you want to ingest a large amount of data (there is no time constraint on ingestion duration) [Ingest JSON data from blobs] LightIngest "https://adxclu001.kusto.windows.net;Federated=true" -database:db001 -table:LAB -sourcePath:"https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME?SAS_TOKEN" -prefix:MyDir1/MySubDir2 -format:json -mappingRef:DefaultJsonMapping -pattern:*.json -limit:100 [Ingest CSV data with headers from local files] LightIngest "https://adxclu001.kusto.windows.net;Federated=true" -database:MyDb -table:MyTable -sourcePath:"D:\MyFolder\Data" -format:csv -ignoreFirstRecord:true -mappingPath:"D:\MyFolder\CsvMapping.txt" -pattern:*.csv.gz -limit:100 REFERENCE: https://docs.microsoft.com/en-us/azure/kusto/tools/lightingest
• 24. LightIngest: pay attention to IngestionTime! 24 IMPORTANT: all the data is indexed, but how is it partitioned? By ingestion TIME! The -creationTimePattern argument allows users to partition the data by creation time, not ingestion time. [Ingest CSV data with headers from local files] LightIngest "https://adxclu001.kusto.windows.net;Federated=true" -database:MyDb -table:MyTable -sourcePath:"D:\MyFolder\Data" -format:csv -ignoreFirstRecord:true -mappingPath:"D:\MyFolder\CsvMapping.txt" -pattern:*.csv.gz -limit:100 [Ingest JSON data from blobs] LightIngest "https://adxclu001.kusto.windows.net;Federated=true" -database:db001 -table:LAB -sourcePath:"https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME?SAS_TOKEN" -prefix:MyDir1/MySubDir2 -format:json -mappingRef:DefaultJsonMapping -pattern:*.json -limit:100
• 25. One Click ingestion GA 25 • One Click ingestion offers an intuitive UX • Start ingesting data, creating tables, and mapping structures • Supports different data formats STEPS: 1. Check your data 2. Find the best format and compression 3. Create and destroy tons of test tables 4. Derive the mapping 5. SCRIPT EVERYTHING and version it
• 26. My best ingestion experience 26 Open points: • Why an Event Hub after IoT Hub? • Why the second Event Hub?
• 27. Update Policy 27 Automatically appends data to a target table whenever new data is inserted into the source table, based on a transformation query that runs over the newly inserted data. USE IT IF: • The source table is a «free-text column» based raw table • The target table accepts only a specific shape Cascading updates are allowed (TableA → TableB → TableC → ...). Raw table → Refined table
• 28. How to use Update Policy 28
// Create a function that will be used for update
.create function MyUpdateFunction() {
    MyTableX
    | where ColumnA == 'some-string'
    | summarize MyCount=count() by ColumnB, Key=ColumnC
    | join (OtherTable | project OtherColumnZ, Key=OtherColumnC) on Key
    | project ColumnB, ColumnZ=OtherColumnZ, Key, MyCount
}
// Create the target table (if it doesn't already exist)
.set-or-append DerivedTableX <| MyUpdateFunction() | limit 0
// Enable the update policy on table DerivedTableX
.alter table DerivedTableX policy update @'[{"IsEnabled": true, "Source": "MyTableX", "Query": "MyUpdateFunction()", "IsTransactional": false, "PropagateIngestionProperties": false}]'
// Remove the update policy
.delete table DerivedTableX policy update
• 29. Pay attention to failures! 29 Evaluate resource usage:
.show table MySourceTable extents;
// The following provides the extent ID of the not-yet-merged extent in the source table that has the most records
let extentId = $command_results
| where MaxCreatedOn > ago(1hr) and MinCreatedOn == MaxCreatedOn
| top 1 by RowCount desc
| project ExtentId;
let MySourceTable = MySourceTable | where extent_id() == toscalar(extentId);
MyFunction()
Failures:
.show ingestion failures
| where FailedOn > ago(1hr) and OriginatesFromUpdatePolicy == true
• Non-transactional policy: failures are ignored • Transactional policy: if the ingestion method is pull, the entire ingestion operation is retried automatically (up to a maximum time) SO: you should check failures to track «BROKEN FILES» … but HOW?
• 30. Use this pattern 30 The first table is NEVER wide!! … but the second one is! First table schema: K, V, TS, Metadata (telemetry oriented). Second table schema: WT, a Wide Table (ML oriented).
  • 31. Demo
  • 32. ADX – Query data 32
• 33. Kusto for SQL Users 33 • Perform SQL SELECT (no DDL, only SELECT) • Use KQL (Kusto Query Language) • Supports translating T-SQL queries to Kusto Query Language: explain select top(10) * from StormEvents order by DamageProperty desc → StormEvents | sort by DamageProperty desc nulls first | take 10
• 34. Some code examples 34 • Query with between • Function with parameters • «ToScalar» expression • «Extend» usage
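The original slide showed these as screenshots; as a stand-in, here is a minimal sketch of the same constructs over the StormEvents sample table (thresholds are illustrative):

// between: filter on a closed range; extend: add a calculated column
StormEvents
| where StartTime between (datetime(2007-07-01) .. datetime(2007-07-31))
| extend DurationHours = (EndTime - StartTime) / 1h
| take 10

// function with parameters + toscalar: reuse a scalar sub-query result
let damageAbove = (threshold: long) { StormEvents | where DamageProperty > threshold };
let maxCrops = toscalar(StormEvents | summarize max(DamageCrops));
damageAbove(maxCrops) | count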
• 35. Language examples 35
Alias: alias database["wiki"] = cluster("https://somecluster.kusto.windows.net:443").database("somedatabase"); database("wiki").PageViews | count
Let: let start = ago(5h); let period = 2h; T | where Time > start and Time < start + period | ...
Bin: T | summarize Hits=count() by bin(Duration, 1s)
Batch: let m = materialize(StormEvents | summarize n=count() by State); m | where n > 2000; m | where n < 10
Tabular expression: Logs | where Timestamp > ago(1d) | join ( Events | where continent == 'Europe' ) on RequestId
• 36. Time Series Analysis – Bin Operator 36 bin operator: rounds values down to an integer multiple of a given bin size. If you have a scattered set of values, they will be grouped into a smaller set of specific values. [Rule] bin(value, roundTo) [Example] T | summarize Hits=count() by bin(Duration, 1s)
• 37. Time Series Analysis – Make Series Operator 37 make-series operator [Rule] T | make-series [MakeSeriesParameters] [Column =] Aggregation [default = DefaultValue] [, ...] on AxisColumn from start to end step step [by [Column =] GroupExpression [, ...]] [Example] T | make-series sum(amount) default=0, avg(price) default=0 on timestamp from datetime(2016-01-01) to datetime(2016-01-10) step 1d by supplier
• 38. Time Series Analysis – Basket Operator 38 basket operator: finds all frequent patterns of discrete attributes (dimensions) in the data and returns the patterns that pass the frequency threshold in the original query. [Rule] T | evaluate basket([Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, ...]) [Example] StormEvents | where monthofyear(StartTime) == 5 | extend Damage = iff(DamageCrops + DamageProperty > 0, "YES", "NO") | project State, EventType, Damage, DamageCrops | evaluate basket(0.2)
• 39. Time Series Analysis – Autocluster Operator 39 autocluster operator: finds common patterns of discrete attributes (dimensions) in the data and reduces the results of the original query (whether it's 100 or 100k rows) to a small number of patterns. [Rule] T | evaluate autocluster([SizeWeight, WeightColumn, NumSeeds, CustomWildcard, CustomWildcard, ...]) [Example] StormEvents | where monthofyear(StartTime) == 5 | extend Damage = iff(DamageCrops + DamageProperty > 0, "YES", "NO") | project State, EventType, Damage | evaluate autocluster(0.6) [Example with wildcards] StormEvents | where monthofyear(StartTime) == 5 | extend Damage = iff(DamageCrops + DamageProperty > 0, "YES", "NO") | project State, EventType, Damage | evaluate autocluster(0.2, '~', '~', '*')
  • 40. Demo
• 41. ADX Functions 41 Functions are reusable queries or query parts. Kusto supports several kinds of functions: • Stored functions: user-defined functions that are stored and managed as one kind of the database's schema entities. See Stored functions. • Query-defined functions: user-defined functions that are defined and used within the scope of a single query. Such functions are defined through a let statement. See User-defined functions. • Built-in functions: hard-coded functions (defined by Kusto and not modifiable by users).
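Minimal sketches of the first two kinds (function names and thresholds are hypothetical):

// Stored function: persisted as a database schema entity
.create function HighDamageEvents() {
    StormEvents | where DamageProperty > 1000000
}

// Query-defined function: scoped to a single query via a let statement
let highDamageEvents = (threshold: long) {
    StormEvents | where DamageProperty > threshold
};
highDamageEvents(1000000) | count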
• 42. Materialized views 42 A materialized view exposes an always up-to-date result of the defined aggregation. Advantages: • Performance improvement • Freshness • Cost reduction Behind the scenes: • The source table is periodically materialized into the view table • At query time, the view combines the materialized part with the delta in the raw table since the last materialization to return complete results
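A minimal sketch of creating one (view, table, and column names are hypothetical; the view definition must be a summarize over the source table):

.create materialized-view DailyEventCounts on table MyEvents
{
    MyEvents | summarize Count=count() by bin(Timestamp, 1d)
}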
  • 43. Demo
  • 44. ADX – Export data 44
• 45. Export 45
• To Storage:
.export async compressed to csv (
    h@"https://storage1.blob.core.windows.net/containerName;secretKey",
    h@"https://storage1.blob.core.windows.net/containerName2;secretKey"
) with (
    sizeLimit=100000, namePrefix=export, includeHeaders=all, encoding=UTF8NoBOM
) <| myLogs | where id == "moshe" | limit 10000
• To SQL:
.export async to sql ['dbo.MySqlTable'] h@"Server=tcp:myserver.database.windows.net,1433;Database=MyDatabase;Authentication=Active Directory Integrated;Connection Timeout=30;" with (createifnotexists="true", primarykey="Id") <| print Message = "Hello World!", Timestamp = now(), Id=12345678
1. DEFINE COMMAND: define the ADX command and try your recurring export strategy 2. TRY IN EDITOR: use an editor to try the command, verifying connection strings and parameterizing them 3. BUILD A JOB: build a notebook or a C# job that runs the command like a SQL query in your code
• 46. External tables & Continuous Export 46 • It's an external endpoint: • Azure Storage • Azure Data Lake Store • SQL Server • You need to define: • Destination • Continuous-export strategy
EXT TABLE CREATION:
.create external table ExternalAdlsGen2 (Timestamp:datetime, x:long, s:string)
kind=adl
partition by bin(Timestamp, 1d)
dataformat=csv
( h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey' )
with ( docstring = "Docs", folder = "ExternalTables", namePrefix="Prefix" )
EXPORT TO EXT TABLE:
.create-or-alter continuous-export MyExport over (T) to table ExternalAdlsGen2 with (intervalBetweenRuns=1h, forcedLatency=10m, sizeLimit=104857600) <| T
• 47. My best experience 47 Open points: • How to extract insights using a dynamic, codeless approach? • How to integrate ADX with low-cost DB solutions?
• 48. My final ADX recipe 48 [diagram] Components: batch ingestion from Blob Storage into raw tables; update policy producing refined tables; materialized views; triggered dynamic check queries; external tables exporting to a data lake (long-term buckets) and a SQL DWH.
  • 49. ADX – View data 49
• 50. ADX Dashboards 50 • Integrated into the Kusto Web Explorer • Optimized for big data • Uses the powerful KQL to retrieve visual data • Build dynamic views and widgets
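Dashboards build on the same KQL used everywhere else, including the render operator; a minimal sketch over the StormEvents sample table:

StormEvents
| summarize Events=count() by bin(StartTime, 1d)
| render timechart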
  • 52. Demo
  • 53. Grafana query builder 53 • Create Grafana panels with no KQL knowledge • Select values/filter/grouping using simple UI dropdowns • Switch to RawMode to enhance queries with KQL
• 54. How to use Grafana easily 54 Go to the All Plugins section, search for the ADX data source, and install the plugin
• 55. How to use Grafana easily 55 Go to https://grafana.com/ Sign up and get an account
• 56. How to use Grafana easily 56 Go to your Grafana instance at https://<workbenchname>.grafana.net/datasources and configure the ADX data source. Then start building dashboards!
  • 57. ADX – Use Data 57
• 58. How about orchestration? Three use cases in which Flow + Kusto are the solution: Push data to a Power BI dataset — periodically run queries and push the results to a Power BI dataset. Conditional queries — run data checks and send notifications with no code. Email multiple ADX Flow charts — send rich emails with an HTML5 chart as the query result.
• 59. Orchestration? Manage costs — start and stop the cluster based on a condition. Query sets to check data — schedule a set of queries in order to say «IT'S OK, even today!». Manage data retention — based on a dynamic condition.
• 60. An example flow: 60 1. Set a trigger 2. Connect and test the ADX block 3. Configure the Email block with dynamic params
  • 61. And the result is: 61
  • 62. ADX – Security & Management 62
• 63. Data encryption in ADX • Encryption at rest (using Azure Storage encryption) • A Microsoft-managed key is used by default • Customer-managed keys can be enabled • Key rotation, temporary disable, and revoke-access controls can be implemented • Soft Delete and Purge Protection will be enabled on the Key Vault and cannot be disabled 63
• 64. Extents, policies, and partitioning • What data shards (extents) are • Columns, segments, and blocks • Merge policy and sharding policy • Data partitioning policy (post-ingestion) — a sketch follows 64
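A minimal sketch of a post-ingestion partitioning policy (the table name, key column, and property values are hypothetical; check the current docs for the exact JSON schema):

.alter table MyTable policy partitioning
'{"PartitionKeys": [ {"ColumnName": "CustomerId", "Kind": "Hash", "Properties": {"Function": "XxHash64", "MaxPartitionCount": 128}} ]}'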
• 65. Retention policy 65 FACTS: A) Kusto stores its ingested data in reliable storage (most commonly Azure Blob Storage). B) To speed up queries on that data, Kusto caches it (or parts of it) on its processing nodes. The Kusto cache provides a granular cache policy that customers can use to differentiate between two data cache policies: hot data cache and cold data cache. You can specify which location must be used:
set query_datascope="hotcache";
T | union U | join (T datascope=all | where Timestamp < ago(365d)) on X
The cache policy is independent from the retention policy!
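The hot-cache window itself is set with the caching policy; a minimal sketch, assuming a hypothetical table and a 7-day hot window:

.alter table MyTable policy caching hot = 7d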
• 66. Retention policy 66 Two parameters, applicable to a DB or a table: • Soft Delete Period (timespan): data is available for query, measured from the ADX ingestion date; default is 100 years • Recoverability (enabled/disabled): default is Enabled; data is recoverable for 14 days after deletion
.alter database DatabaseName policy retention "{}"
.alter table TableName policy retention "{}"
EXAMPLE: { "SoftDeletePeriod": "36500.00:00:00", "Recoverability": "Enabled" }
.delete database DatabaseName policy retention
.delete table TableName policy retention
.alter-merge table MyTable1 policy retention softdelete = 7d
• 67. Data Purge 67 PURGE PROCESS: 1. Requires database admin permissions 2. Prior to purging, the feature must be ENABLED by opening a SUPPORT TICKET 3. Run the purge query to see the SIZE and EXEC.TIME and obtain a VerificationToken 4. Actually run the purge, passing the verification token
2-STEP PROCESS:
.purge table MyTable records in database MyDatabase <| where CustomerId in ('X', 'Y')
→ returns: NumRecordsToPurge = 1,596 | EstimatedPurgeExecutionTime = 00:00:02 | VerificationToken = e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b
.purge table MyTable records in database MyDatabase with (verificationtoken='e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b') <| where CustomerId in ('X', 'Y')
1-STEP PROCESS (with no regrets!):
.purge table MyTable records in database MyDatabase with (noregrets='true')
• 68. Virtual Network BENEFITS • Use NSG rules to limit traffic • Connect your on-premises network to the Azure Data Explorer cluster's subnet • Secure your data connection sources (Event Hub and Event Grid) with service endpoints 68 A VNet gives you TWO independent IPs: • Private IP: access the cluster inside the VNet • Public IP: access the cluster from outside the VNet (management and monitoring) and as a source address for outbound connections initiated from the cluster
• 69. Row level security • Provides fine control of access to table data by different users • Allows specifying user access to specific rows in tables • Provides mechanics to mask PII data in tables 69
.create-or-alter function with () TrimCreditCardNumbers() {
    let UserCanSeeFullNumbers = current_principal_is_member_of('aadgroup=super_group@domain.com');
    let AllData = Customers | where UserCanSeeFullNumbers;
    let PartialData = Customers | where not(UserCanSeeFullNumbers) | extend CreditCardNumber = "****";
    union AllData, PartialData
}
.alter table Customers policy row_level_security enable "TrimCreditCardNumbers"
• 70. Leader and Follower • Azure Data Share creates a symbolic link between two ADX clusters • Sharing occurs in near-real-time (no data pipeline) • ADX decouples storage and compute • Allows customers to run multiple compute (read-only) instances on the same underlying storage • You can attach a database as a follower database, which is a read-only database on a remote cluster • You can share the data at the database level or at the cluster level 70 The cluster sharing the database is the leader cluster and the cluster receiving the share is the follower cluster. A follower cluster can follow one or more leader-cluster databases. The follower cluster periodically synchronizes to check for changes. Queries running on the follower cluster use the local cache and don't consume the resources of the leader cluster.
  • 72. ADX – A critical perspective 72
• 73. What is ADX for me, today • A telemetry data search engine => ELK replacement • A TSDB involved in LAMBDA replacements (as the WARM path) => OSS LAMBDA (MinIO + Kafka) replacement • A tool to materialize data into ADLS & SQL • A tool for monitoring, summarizing information, and sending notifications 73
• 74. Which OSS alternatives should we compare with? 74 From db-engines.com: Azure Data Explorer — fully managed big data interactive analytics platform. Vs: Elasticsearch — a distributed, RESTful modern search and analytics engine. Splunk — a real-time insights engine to boost productivity & security. InfluxDB — a DBMS for storing time series, events and metrics. ADX can be a replacement for search and log analytics engines such as Elasticsearch, Splunk, InfluxDB.
• 75. Comparison chart 75 (each row: Elasticsearch | InfluxDB | Azure Data Explorer | Splunk)
Name: Elasticsearch (Elastic) | InfluxDB (InfluxData Inc.) | Azure Data Explorer (Microsoft) | Splunk (Splunk Inc.)
Description: A distributed, RESTful modern search and analytics engine based on Apache Lucene | DBMS for storing time series, events and metrics | Fully managed big data interactive analytics platform | Analytics Platform for Big Data
Database models: Search engine, Document store | Time Series DBMS | Time Series DBMS, Search engine, Document store, Event Store, Relational DBMS | Search engine
Initial release: 2010 | 2013 | 2019 | 2003
License: Open Source | Open Source | commercial | commercial
Cloud-based only: no | no | yes | no
Implementation language: Java | Go | — | —
Server operating systems: All OS with a Java VM | Linux, OS X | hosted | Linux, OS X, Solaris, Windows
Data scheme: schema-free | schema-free | Fixed schema with schema-less datatypes (dynamic) | yes
Typing: yes | Numeric data and Strings | yes | yes
XML support: no | no | yes | yes
Secondary indexes: yes | no | all fields are automatically indexed | yes
SQL: SQL-like query language | SQL-like query language | Kusto Query Language (KQL), SQL subset | no
APIs and other access methods: RESTful HTTP/JSON API, Java API | HTTP API, JSON over UDP | RESTful HTTP API, Microsoft SQL Server communication protocol (MS-TDS) | HTTP REST
Supported programming languages: .Net, Java, JavaScript, Python, Ruby, PHP, Perl, Groovy, Community Contributed Clients | .Net, Java, JavaScript, Python, R, Ruby, PHP, Perl, Haskell, Clojure, Erlang, Go, Lisp, Rust, Scala | .Net, Java, JavaScript, Python, R, PowerShell | .Net, Java, JavaScript, Python, Ruby, PHP
Server-side scripts: yes | no | Yes, possible languages: KQL, Python, R | yes
Triggers: yes | no | yes | yes
Partitioning methods: Sharding | Sharding | Sharding | Sharding
Replication methods: yes | selectable replication factor | yes | Master-master replication
MapReduce: ES-Hadoop Connector | no | no | yes
Consistency concepts: Eventual Consistency | Eventual Consistency | Eventual Consistency | Immediate Consistency
Foreign keys: no | no | no | no
Transaction concepts: no | no | no | no
Concurrency: yes | yes | yes | yes
Durability: yes | yes | yes | yes
In-memory capabilities: Memcached and Redis integration | yes | no | no
  • 76. ADX – Wrap up 76
• 77. Why ADX is Unique 77 Simplified costs • VM costs • ADX service add-on cost Many prebuilt inputs • ADF • IoT Hub • Event Hub • Storage • Logstash • Kafka • Fluent Bit Many prebuilt outputs • Power BI • ODBC connector • Jupyter • Grafana
• 78. Azure Data Explorer 78 [integration diagram] Ingestion sources: Blob, IoT Hub, Event Hub, ADF. SDKs: .NET SDK, Python SDK, Java SDK, JavaScript. APIs/protocols: REST API, MS-TDS. UX: Web UI, Desktop App, Jupyter Magic, Monaco IDE, Azure Notebooks. Connectors: Power BI Direct Query, Microsoft Flow, Azure Logic Apps, Grafana. Ingestion modes: Streaming, Bulk APIs, Queued Ingestion, Direct.
• 79. Strengths: Comprehensive • Metrics and time-series data • Text search and text analytics • Multi-dimensional/relational analysis Analytics query language • Simple and powerful • Publicly available • Data exploration • Rich relational query language • Full-text search • ML extensibility High performance over large data sets • Scale out in hardware • Scale out across geos • Granular resource utilization control • Cross-geo queries Data ingestion and management • Low-latency ingestion • Schema management • Compression and indexing • Retention • Hot/cold resource allocation
• 80. Everything our User Group Has To Offer Get involved in our Meetup Join the conversation on our Facebook group Follow our page on Facebook Follow our videos on YouTube Explore https://bit.ly/2P9sqLy https://bit.ly/2QqAWX4 https://bit.ly/3auvRnD https://bit.ly/3n8l5bP
  • 81. Thank you to our Sponsor
• 82. Summary • Use ADX to: • Understand data • Visualize KPIs and refine them • Manage a long-term storage strategy • Feed a facts table in the DWH • Trigger daily auto checks Riccardo.zamana@gmail.com