SlideShare ist ein Scribd-Unternehmen logo
1 von 59
Ilyas F
Azure Solution Architect @ 8KMiles
Twitter: @ilyas_tweets
Linkedin: https://in.linkedin.com/in/ilyasf
Azure Data Lake Analytics Deep Dive
2016/05/17
Agenda
Origins
• Cosmos
• Futures
Layers & Components
• Storage
• Parallelization
• Job Scheduling
• Query Execution
• Performance
• Demo
Quick Recap
The 3 Azure Data Lake Services
HDInsight Analytics Store
Clusters as a service Big data queries as a
service
Hyper-scale Storage
optimized for analytics
Currently in PREVIEW. General
Availability later in 2016.
Familiar syntax to millions of SQL & .NET developers
Unifies declarative nature of SQL with the imperative
power of C#
Unifies structured, semi-structured and unstructured data
Distributed query support over all data
U-SQL
A new language for Big Data
History
History
Bing needed to…
• Understand user behavior
And do it…
• At massive scale
• With agility and speed
• At low cost
So they built …
• Cosmos
Cosmos
• Batch Jobs
• Interactive
• Machine Learning
• Streaming
Thousands of Developers
Pricing
Key ADL Analytics Components
ADL Account Configuration
ADL Analytics Account
Links to ADL Stores
ADL Store Account
(the default one)
Job Queue
Key Settings:
- Max Concurrent Jobs
- Max ADLAUs per Job
- Max Queue Length
An ADL Store IS REQUIRED for ADL Analytics to
function.
Key Settings:
• Max Concurrent Jobs = 3
• Max ADLAUs per job = 20
• Max Queue Length = 200
If you want to change the defaults, open a Support
ticket
Links to Azure Blob Stores
U-SQL Catalog
Metadata
U-SQL Catalog
Data
Simplified Workflow
Job Front End
Job Scheduler Compiler Service
Job Queue
Job Manager
U-SQL Catalog
YARN
Job submission
Job execution
U-SQL Runtime Vertex execution
Goal: Understanding a U-SQL
(Batch) Job
Azure Data Lake Analytics
(ADLA) Demo
Job Properties
Job Graph
Job Scheduling
States, Queue, Priority
Job Status in
Visual Studio
Preparing
Queued
Running
Finalizing
Ended
(Succeeded, Failed, Cancelled)
New
Compiling
Queued
Scheduling
Starting
Running
Ended
UX Job State
The script is being compiled by the Compiler Service
All jobs enter the queue.
Are there enough ADLAUs to start the job?
If yes, then allocate those ADLAUs for the job
The U-SQL runtime is now executing the code on 1
or more ADLAUs or finalizing the outputs
The job has concluded.
Why does a Job get Queued?
Local Cause
Conditions:
• Queue already at
Max Concurrency
Global Cause (very rare)
Conditions:
• System-wide shortage of
ADLAUs
• System-wide shortage of
Bandwidth
* If these conditions are met, a job
will be queued even if the queue is
not at its Max Concurrency
State History
The Job Queue
The queue is ordered by
job priority.
Lower numbers -> higher
priority.
1 = highest.
Running jobs
When a job is at the top
of the queue, it will start
running.
Defaults:
Max Running Jobs = 3
Max Tokens per job = 20
Max Queue Size = 200
Priority Doesn’t Preempt Running Jobs
X has Pri=1.
X
A
B
C
X will NOT preempt running jobs. X will have to wait.
These are all running
and have very low
priority (pri=1000)
U-SQL Job Compilation
U-SQL Compilation Process
C#
C++
Algebra
Other files
(system files, deployed resources)
managed dll
Unmanaged dll
Compilation output (in job folder)
Compiler & Optimizer
U-SQL Metadata
Service
Deployed to Vertices
The Job Folder
Inside the Default ADL Store:
/system/jobservice/jobs/Usql/YYYY/MM/DD/hh/mm/JOBID
/system/jobservice/jobs/Usql/2016/01/20/00/00/17972fc2-4737-48f7-81fb-
49af9a784f64
C# code generated by the U-SQL
Compiler
C++ code generated by the U-SQL
Compiler
Cluster Plan a.ka. “Job Graph”
generated by U-SQL Compiler
User-provided .NET Assemblies
User-provided USQL script
Job Folder Contents
Resources
Blue items: the output of the
compiler
Grey items: U-SQL runtime bits
Download all the resources
Download a specific resource
Query Execution
Plans, Vertices, Stages, Parallelism, ADLAUs
Job
Scheduler
& Queue
Front-EndService
30
Optimizer
Vertex Scheduling
Compiler
Runtime
Visual Studio
Portal / API
Query Life
How does the Parallelism
number relate to Vertices
What does Vertices mean?
What is this?
Logical -> Physical Plan
Each square = “a vertex”
represents a fraction of the
total
Vertexes in each SuperVertex
(aka “Stage) are doing the
same operation on different
parts of the same data.
Vertexes in a later stages may
depend on a vertex in an
earlier stage
Visualized like this
Stage Details
252 Pieces of work
AVG Vertex execution time
4.3 Billion rows
Data Read & Written
Automatic Vertex retry
A vertex failed … but was retried
automatically
Overall Stage Completed
Successfully
A vertex might fail because:
- Router congested
- Hardware failure (ex: hard drive failed)
- VM had to be rebooted
U-SQL job will automatically schedule a
vertex on another VM.
ADLAUs
Azure
Data
Lake
Analytics
Unit
Parallelism N = N ADLAUs
1 ADLAU ~=
A VM with 2 cores and 6
GB of memory
Efficiency
Cost vs Latency
Profile isn’t loaded
Profile is loaded now
Click Resource usage
Blue: Allocation
Red: Actual running
Smallest estimated time when
given 2425 ADLAUs
1410 seconds
= 23.5 minutes
Model with 100 ADLAUs
8709 seconds
= 145.5 minutes
𝐽𝑜𝑏𝐶𝑜𝑠𝑡 = 5𝑐 + 𝑚𝑖𝑛𝑢𝑡𝑒𝑠 × 𝐴𝐷𝐿𝑈𝐴𝑠 × 𝐴𝐷𝐿𝐴𝑈𝑐𝑜𝑠𝑡𝑝𝑒𝑟𝑚𝑖𝑛
Allocation
Allocating 10 ADLAUs
for a 10 minute job.
Cost
= 10 min * 10 ADLAUs
= 100 ADLAU minutes
Time
Blue line: Allocated
Over Allocation Consider using fewer ADLAUs
You are paying for the area under the
blue line
You are only using the area under the
red line
Time
Vertex Execution
Store Basics
A VERY BIG FILE
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
Files are split apart into Extents.
Extents can be up to 250MB in size.
For availability and reliability, extents
are replicated (3 copies).
Enables parallelized read
Parallel writing
Front-end
machines for a
web serviceLog files
Simultaneous
uploads
Azure Data lake
Extent
As file size increases, more opportunities for
parallelism
Vertex
Extent Vertex
Extent Vertex
Extent Vertex
The importance of partitioning
input data
Search engine clicks data set
A log of how many clicks a certain domain got within a session
SessionID Domain Clicks
3 cnn.com 9
1 whitehouse.gov 14
2 facebook.com 8
3 reddit.com 78
2 microsoft.com 1
1 facebook.com 5
3 microsoft.com 11
Data Partitioning Compared
FB
WH
CNN
Extent 2
FB
WH
CNN
Extent 3
FB
WH
CNN
Extent 1
File:
Keys (Domain) are scattered across
the extents
WH
WH
WH
Extent 2
CNN
CNN
CNN
Extent 3
FB
FB
FB
Extent 1
U-SQL Table partitioned on Domain
The keys are now “close together” also
the index tells U-SQL exactly which
extents contain the key
CREATE TABLE SampleDBTutorials.dbo.ClickData
(
SessionId int,
Domain string,
Clinks int,
INDEX idx1 //Name of index
CLUSTERED (Domain ASC) //Column to cluster by
// PARTITIONED BY HASH (Region) //Column to partition by
);
INSERT INTO SampleDBTutorials.dbo.ClickData
SELECT *
FROM @clickdata;
How did we create and fill that table?
Find all the rows for cnn.com
// Using a File
@ClickData =
SELECT
Session int,
Domain string,
Clicks int
FROM “/clickdata.tsv”
USING Extractors.Tsv();
@rows = SELECT *
FROM @ClickData
WHERE Domain == “cnn.com”;
OUTPUT @rows
TO “/output.tsv”
USING Outputters.tsv();
// Using a U-SQL Table partitioned by Domain
@ClickData =
SELECT *
FROM MyDB.dbo.ClickData;
@rows = SELECT *
FROM @ClickData
WHERE Domain == “cnn.com”;
OUTPUT @rows
TO “/output.tsv”
USING Outputters.tsv();
Read Read
Write Write Write
Read
Filter Filter Filter
CNN,
FB,
WH
EXTENT 1 EXTENT 2 EXTENT 3
CNN,
FB,
WH
CNN,
FB,
WH
Because “CNN” could be anywhere,
all extents must be read.
Read
Write
Filter
FB
EXTENT 1 EXTENT 2 EXTENT 3
WH CNN
Thanks to “Partition Elimination” and
the U-SQL Table, the job only reads
from the extent that is known to
have the relevant key
File U-SQL Table Partitioned by Domain
How many clicks per domain?
@rows = SELECT
Domain,
SUM(Clicks) AS TotalClicks
FROM @ClickData
GROUP BY Domain;
File
Read Read
Partition Partition
Full Agg
Write
Full Agg
Write
Full Agg
Write
Read
Partition
Partial Agg Partial Agg Partial Agg
CNN,
FB,
WH
EXTENT 1 EXTENT 2 EXTENT 3
CNN,
FB,
WH
CNN,
FB,
WH
U-SQL Table Partitioned by Domain
Read Read
Full Agg Full Agg
Write Write
Read
Full Agg
Write
FB
EXTENT 1
WH
EXTENT 2
CNN
EXTENT 3
Expensive!
High-Level Performance Advice
Learn U-SQL
Leverage Native U-SQL Constructs first
UDOs are Evil 
Can’t optimize UDOs like pure U-SQL
code.
Understand your Data
Volume, Distribution, Partitioning,
Growth
Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Leveraging Azure Databricks to minimize time to insight by combining Batch an...
Leveraging Azure Databricks to minimize time to insight by combining Batch an...Leveraging Azure Databricks to minimize time to insight by combining Batch an...
Leveraging Azure Databricks to minimize time to insight by combining Batch an...Microsoft Tech Community
 
Building Advanced Analytics Pipelines with Azure Databricks
Building Advanced Analytics Pipelines with Azure DatabricksBuilding Advanced Analytics Pipelines with Azure Databricks
Building Advanced Analytics Pipelines with Azure DatabricksLace Lofranco
 
Azure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake AnalyticsAzure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake AnalyticsWaqas Idrees
 
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)Jason L Brugger
 
201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine LearningMark Tabladillo
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Michael Rys
 
Deep Dive into Azure Data Factory v2
Deep Dive into Azure Data Factory v2Deep Dive into Azure Data Factory v2
Deep Dive into Azure Data Factory v2Eric Bragas
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
Azure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsAzure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsThomas Sykes
 
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...Michael Rys
 
Azure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkAzure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkIke Ellis
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Michael Rys
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Lace Lofranco
 
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu GantaAzure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu GantaDatabricks
 
Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021Mark Kromer
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure DatabricksJames Serra
 
A lap around Azure Data Factory
A lap around Azure Data FactoryA lap around Azure Data Factory
A lap around Azure Data FactoryBizTalk360
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage CCG
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Carole Gunst
 

Was ist angesagt? (20)

Leveraging Azure Databricks to minimize time to insight by combining Batch an...
Leveraging Azure Databricks to minimize time to insight by combining Batch an...Leveraging Azure Databricks to minimize time to insight by combining Batch an...
Leveraging Azure Databricks to minimize time to insight by combining Batch an...
 
Building Advanced Analytics Pipelines with Azure Databricks
Building Advanced Analytics Pipelines with Azure DatabricksBuilding Advanced Analytics Pipelines with Azure Databricks
Building Advanced Analytics Pipelines with Azure Databricks
 
Azure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake AnalyticsAzure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake Analytics
 
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
 
201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
Deep Dive into Azure Data Factory v2
Deep Dive into Azure Data Factory v2Deep Dive into Azure Data Factory v2
Deep Dive into Azure Data Factory v2
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Azure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsAzure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data Flows
 
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
 
Azure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkAzure Databricks is Easier Than You Think
Azure Databricks is Easier Than You Think
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
 
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu GantaAzure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
 
Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
A lap around Azure Data Factory
A lap around Azure Data FactoryA lap around Azure Data Factory
A lap around Azure Data Factory
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2
 

Andere mochten auch

Getting started with microsoft azure in 30 mins
Getting started with microsoft azure in 30 minsGetting started with microsoft azure in 30 mins
Getting started with microsoft azure in 30 minsIlyas F ☁☁☁
 
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature MappingMicrosoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature MappingIlyas F ☁☁☁
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersMichael Rys
 
Cortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeCortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeMSAdvAnalytics
 
AWS vs Azure - Cloud Services Comparison
AWS vs Azure - Cloud Services ComparisonAWS vs Azure - Cloud Services Comparison
AWS vs Azure - Cloud Services ComparisonAniket Kanitkar
 
Azure vs AWS Best Practices: What You Need to Know
Azure vs AWS Best Practices: What You Need to KnowAzure vs AWS Best Practices: What You Need to Know
Azure vs AWS Best Practices: What You Need to KnowRightScale
 
Azure Operational Insightsについて
Azure Operational InsightsについてAzure Operational Insightsについて
Azure Operational InsightsについてNorio Sashizaki
 
15 Compelling Reasons to Consider Multi Tenancy
15 Compelling Reasons to Consider Multi Tenancy15 Compelling Reasons to Consider Multi Tenancy
15 Compelling Reasons to Consider Multi TenancyIlyas F ☁☁☁
 
Business Transformation with Microsoft Azure IoT
Business Transformation with Microsoft Azure IoTBusiness Transformation with Microsoft Azure IoT
Business Transformation with Microsoft Azure IoTIlyas F ☁☁☁
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
U-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance TuningU-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance TuningMichael Rys
 
Open stack design 2012 applications targeting openstack-final
Open stack design 2012   applications targeting openstack-finalOpen stack design 2012   applications targeting openstack-final
Open stack design 2012 applications targeting openstack-finalrhirschfeld
 
Inevitability of Multi-Tenancy & SAAS in Product Engineering
Inevitability of Multi-Tenancy & SAAS in Product EngineeringInevitability of Multi-Tenancy & SAAS in Product Engineering
Inevitability of Multi-Tenancy & SAAS in Product EngineeringPrashanth Panduranga
 
Data Migration and Data-Tier Applications with SQL Azure
Data Migration and Data-Tier Applications with SQL AzureData Migration and Data-Tier Applications with SQL Azure
Data Migration and Data-Tier Applications with SQL AzureMark Kromer
 
OpenStack Preso: DevOps on Hybrid Infrastructure
OpenStack Preso: DevOps on Hybrid InfrastructureOpenStack Preso: DevOps on Hybrid Infrastructure
OpenStack Preso: DevOps on Hybrid Infrastructurerhirschfeld
 
SaaS and Multi-Tenancy – Foundational Concepts
SaaS and Multi-Tenancy – Foundational ConceptsSaaS and Multi-Tenancy – Foundational Concepts
SaaS and Multi-Tenancy – Foundational ConceptsJeelani Shaik
 

Andere mochten auch (20)

Getting started with microsoft azure in 30 mins
Getting started with microsoft azure in 30 minsGetting started with microsoft azure in 30 mins
Getting started with microsoft azure in 30 mins
 
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature MappingMicrosoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for Developers
 
Azure operational insights
Azure operational insightsAzure operational insights
Azure operational insights
 
Ilyas visual resume
Ilyas  visual resumeIlyas  visual resume
Ilyas visual resume
 
Cortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeCortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data Lake
 
AWS vs. Azure
AWS vs. AzureAWS vs. Azure
AWS vs. Azure
 
AWS vs Azure - Cloud Services Comparison
AWS vs Azure - Cloud Services ComparisonAWS vs Azure - Cloud Services Comparison
AWS vs Azure - Cloud Services Comparison
 
Blogging for beginners
Blogging for beginnersBlogging for beginners
Blogging for beginners
 
Azure vs AWS Best Practices: What You Need to Know
Azure vs AWS Best Practices: What You Need to KnowAzure vs AWS Best Practices: What You Need to Know
Azure vs AWS Best Practices: What You Need to Know
 
Azure Operational Insightsについて
Azure Operational InsightsについてAzure Operational Insightsについて
Azure Operational Insightsについて
 
15 Compelling Reasons to Consider Multi Tenancy
15 Compelling Reasons to Consider Multi Tenancy15 Compelling Reasons to Consider Multi Tenancy
15 Compelling Reasons to Consider Multi Tenancy
 
Business Transformation with Microsoft Azure IoT
Business Transformation with Microsoft Azure IoTBusiness Transformation with Microsoft Azure IoT
Business Transformation with Microsoft Azure IoT
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
U-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance TuningU-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance Tuning
 
Open stack design 2012 applications targeting openstack-final
Open stack design 2012   applications targeting openstack-finalOpen stack design 2012   applications targeting openstack-final
Open stack design 2012 applications targeting openstack-final
 
Inevitability of Multi-Tenancy & SAAS in Product Engineering
Inevitability of Multi-Tenancy & SAAS in Product EngineeringInevitability of Multi-Tenancy & SAAS in Product Engineering
Inevitability of Multi-Tenancy & SAAS in Product Engineering
 
Data Migration and Data-Tier Applications with SQL Azure
Data Migration and Data-Tier Applications with SQL AzureData Migration and Data-Tier Applications with SQL Azure
Data Migration and Data-Tier Applications with SQL Azure
 
OpenStack Preso: DevOps on Hybrid Infrastructure
OpenStack Preso: DevOps on Hybrid InfrastructureOpenStack Preso: DevOps on Hybrid Infrastructure
OpenStack Preso: DevOps on Hybrid Infrastructure
 
SaaS and Multi-Tenancy – Foundational Concepts
SaaS and Multi-Tenancy – Foundational ConceptsSaaS and Multi-Tenancy – Foundational Concepts
SaaS and Multi-Tenancy – Foundational Concepts
 

Ähnlich wie Azure Data Lake Analytics Deep Dive

Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Michael Rys
 
SPL_ALL_EN.pptx
SPL_ALL_EN.pptxSPL_ALL_EN.pptx
SPL_ALL_EN.pptx政宏 张
 
U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)Michael Rys
 
NewSQL - Deliverance from BASE and back to SQL and ACID
NewSQL - Deliverance from BASE and back to SQL and ACIDNewSQL - Deliverance from BASE and back to SQL and ACID
NewSQL - Deliverance from BASE and back to SQL and ACIDTony Rogerson
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 
Cloud architectural patterns and Microsoft Azure tools
Cloud architectural patterns and Microsoft Azure toolsCloud architectural patterns and Microsoft Azure tools
Cloud architectural patterns and Microsoft Azure toolsPushkar Chivate
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
High Throughput Analytics with Cassandra & Azure
High Throughput Analytics with Cassandra & AzureHigh Throughput Analytics with Cassandra & Azure
High Throughput Analytics with Cassandra & AzureDataStax Academy
 
Azure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep DiveAzure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep DiveAndre Essing
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Migrating on premises workload to azure sql database
Migrating on premises workload to azure sql databaseMigrating on premises workload to azure sql database
Migrating on premises workload to azure sql databasePARIKSHIT SAVJANI
 
Building a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for AnalystsBuilding a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for AnalystsAvere Systems
 
Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseAmazon Web Services
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Alluxio, Inc.
 
Ms sql server architecture
Ms sql server architectureMs sql server architecture
Ms sql server architectureAjeet Singh
 
SQL Server 2008 Development for Programmers
SQL Server 2008 Development for ProgrammersSQL Server 2008 Development for Programmers
SQL Server 2008 Development for ProgrammersAdam Hutson
 

Ähnlich wie Azure Data Lake Analytics Deep Dive (20)

Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
 
SPL_ALL_EN.pptx
SPL_ALL_EN.pptxSPL_ALL_EN.pptx
SPL_ALL_EN.pptx
 
Redshift overview
Redshift overviewRedshift overview
Redshift overview
 
U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)
 
NewSQL - Deliverance from BASE and back to SQL and ACID
NewSQL - Deliverance from BASE and back to SQL and ACIDNewSQL - Deliverance from BASE and back to SQL and ACID
NewSQL - Deliverance from BASE and back to SQL and ACID
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Cloud architectural patterns and Microsoft Azure tools
Cloud architectural patterns and Microsoft Azure toolsCloud architectural patterns and Microsoft Azure tools
Cloud architectural patterns and Microsoft Azure tools
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
High Throughput Analytics with Cassandra & Azure
High Throughput Analytics with Cassandra & AzureHigh Throughput Analytics with Cassandra & Azure
High Throughput Analytics with Cassandra & Azure
 
Azure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep DiveAzure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep Dive
 
AWS glue technical enablement training
AWS glue technical enablement trainingAWS glue technical enablement training
AWS glue technical enablement training
 
Amazon Redshift Deep Dive
Amazon Redshift Deep Dive Amazon Redshift Deep Dive
Amazon Redshift Deep Dive
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Migrating on premises workload to azure sql database
Migrating on premises workload to azure sql databaseMigrating on premises workload to azure sql database
Migrating on premises workload to azure sql database
 
Building a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for AnalystsBuilding a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for Analysts
 
Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data Warehouse
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
Ms sql server architecture
Ms sql server architectureMs sql server architecture
Ms sql server architecture
 
SQL Server 2008 Development for Programmers
SQL Server 2008 Development for ProgrammersSQL Server 2008 Development for Programmers
SQL Server 2008 Development for Programmers
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 

Kürzlich hochgeladen

『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书rnrncn29
 
ETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptxETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptxNIMMANAGANTI RAMAKRISHNA
 
Cybersecurity Threats and Cybersecurity Best Practices
Cybersecurity Threats and Cybersecurity Best PracticesCybersecurity Threats and Cybersecurity Best Practices
Cybersecurity Threats and Cybersecurity Best PracticesLumiverse Solutions Pvt Ltd
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predieusebiomeyer
 
TRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptxTRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptxAndrieCagasanAkio
 
Company Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptxCompany Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptxMario
 
IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119APNIC
 
Unidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxUnidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxmibuzondetrabajo
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书rnrncn29
 

Kürzlich hochgeladen (9)

『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
 
ETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptxETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptx
 
Cybersecurity Threats and Cybersecurity Best Practices
Cybersecurity Threats and Cybersecurity Best PracticesCybersecurity Threats and Cybersecurity Best Practices
Cybersecurity Threats and Cybersecurity Best Practices
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predi
 
TRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptxTRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptx
 
Company Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptxCompany Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptx
 
IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119
 
Unidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxUnidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptx
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
 

Azure Data Lake Analytics Deep Dive

  • 1. Ilyas F Azure Solution Architect @ 8KMiles Twitter: @ilyas_tweets Linkedin: https://in.linkedin.com/in/ilyasf Azure Data Lake Analytics Deep Dive 2016/05/17
  • 2. Agenda Origins • Cosmos • Futures Layers & Components • Storage • Parallelization • Job Scheduling • Query Execution • Performance • Demo
  • 4. The 3 Azure Data Lake Services HDInsight Analytics Store Clusters as a service Big data queries as a service Hyper-scale Storage optimized for analytics Currently in PREVIEW. General Availability later in 2016.
  • 5. Familiar syntax to millions of SQL & .NET developers Unifies declarative nature of SQL with the imperative power of C# Unifies structured, semi-structured and unstructured data Distributed query support over all data U-SQL A new language for Big Data
  • 7. History Bing needed to… • Understand user behavior And do it… • At massive scale • With agility and speed • At low cost So they built … • Cosmos Cosmos • Batch Jobs • Interactive • Machine Learning • Streaming Thousands of Developers
  • 9. Key ADL Analytics Components
  • 10. ADL Account Configuration ADL Analytics Account Links to ADL Stores ADL Store Account (the default one) Job Queue Key Settings: - Max Concurrent Jobs - Max ADLAUs per Job - Max Queue Length An ADL Store IS REQUIRED for ADL Analytics to function. Key Settings: • Max Concurrent Jobs = 3 • Max ADLAUs per job = 20 • Max Queue Length = 200 If you want to change the defaults, open a Support ticket Links to Azure Blob Stores U-SQL Catalog Metadata U-SQL Catalog Data
  • 11. Simplified Workflow Job Front End Job Scheduler Compiler Service Job Queue Job Manager U-SQL Catalog YARN Job submission Job execution U-SQL Runtime Vertex execution
  • 12. Goal: Understanding a U-SQL (Batch) Job
  • 13. Azure Data Lake Analytics (ADLA) Demo
  • 14.
  • 18. Preparing Queued Running Finalizing Ended (Succeeded, Failed, Cancelled) New Compiling Queued Scheduling Starting Running Ended UX Job State The script is being compiled by the Compiler Service All jobs enter the queue. Are there enough ADLAUs to start the job? If yes, then allocate those ADLAUs for the job The U-SQL runtime is now executing the code on 1 or more ADLAUs or finalizing the outputs The job has concluded.
  • 19. Why does a Job get Queued? Local Cause Conditions: • Queue already at Max Concurrency Global Cause (very rare) Conditions: • System-wide shortage of ADLAUs • System-wide shortage of Bandwidth * If these conditions are met, a job will be queued even if the queue is not at its Max Concurrency
  • 21. The Job Queue The queue is ordered by job priority. Lower numbers -> higher priority. 1 = highest. Running jobs When a job is at the top of the queue, it will start running. Defaults: Max Running Jobs = 3 Max Tokens per job = 20 Max Queue Size = 200
  • 22. Priority Doesn’t Preempt Running Jobs X has Pri=1. X A B C X will NOT preempt running jobs. X will have to wait. These are all running and have very low priority (pri=1000)
  • 24. U-SQL Compilation Process C# C++ Algebra Other files (system files, deployed resources) managed dll Unmanaged dll Compilation output (in job folder) Compiler & Optimizer U-SQL Metadata Service Deployed to Vertices
  • 25. The Job Folder Inside the Default ADL Store: /system/jobservice/jobs/Usql/YYYY/MM/DD/hh/mm/JOBID /system/jobservice/jobs/Usql/2016/01/20/00/00/17972fc2-4737-48f7-81fb- 49af9a784f64
  • 26. C# code generated by the U-SQL Compiler C++ code generated by the U-SQL Compiler Cluster Plan a.ka. “Job Graph” generated by U-SQL Compiler User-provided .NET Assemblies User-provided USQL script Job Folder Contents
  • 28. Blue items: the output of the compiler Grey items: U-SQL runtime bits Download all the resources Download a specific resource
  • 29. Query Execution Plans, Vertices, Stages, Parallelism, ADLAUs
  • 31. How does the Parallelism number relate to Vertices What does Vertices mean? What is this?
  • 32. Logical -> Physical Plan Each square = “a vertex” represents a fraction of the total Vertexes in each SuperVertex (aka “Stage) are doing the same operation on different parts of the same data. Vertexes in a later stages may depend on a vertex in an earlier stage Visualized like this
  • 33. Stage Details 252 Pieces of work AVG Vertex execution time 4.3 Billion rows Data Read & Written
  • 34. Automatic Vertex retry A vertex failed … but was retried automatically Overall Stage Completed Successfully A vertex might fail because: - Router congested - Hardware failure (ex: hard drive failed) - VM had to be rebooted U-SQL job will automatically schedule a vertex on another VM.
  • 35. ADLAUs Azure Data Lake Analytics Unit Parallelism N = N ADLAUs 1 ADLAU ~= A VM with 2 cores and 6 GB of memory
  • 38. Profile is loaded now Click Resource usage
  • 40. Smallest estimated time when given 2425 ADLAUs 1410 seconds = 23.5 minutes
  • 41. Model with 100 ADLAUs 8709 seconds = 145.5 minutes
  • 42. 𝐽𝑜𝑏𝐶𝑜𝑠𝑡 = 5𝑐 + 𝑚𝑖𝑛𝑢𝑡𝑒𝑠 × 𝐴𝐷𝐿𝑈𝐴𝑠 × 𝐴𝐷𝐿𝐴𝑈𝑐𝑜𝑠𝑡𝑝𝑒𝑟𝑚𝑖𝑛
  • 43. Allocation Allocating 10 ADLAUs for a 10 minute job. Cost = 10 min * 10 ADLAUs = 100 ADLAU minutes Time Blue line: Allocated
  • 44. Over Allocation Consider using fewer ADLAUs You are paying for the area under the blue line You are only using the area under the red line Time
  • 46. Store Basics A VERY BIG FILE 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 Files are split apart into Extents. Extents can be up to 250MB in size. For availability and reliability, extents are replicated (3 copies). Enables parallelized read
  • 47. Parallel writing Front-end machines for a web serviceLog files Simultaneous uploads Azure Data lake
  • 48. Extent As file size increases, more opportunities for parallelism Vertex Extent Vertex Extent Vertex Extent Vertex
  • 49. The importance of partitioning input data
  • 50. Search engine clicks data set A log of how many clicks a certain domain got within a session SessionID Domain Clicks 3 cnn.com 9 1 whitehouse.gov 14 2 facebook.com 8 3 reddit.com 78 2 microsoft.com 1 1 facebook.com 5 3 microsoft.com 11
  • 51. Data Partitioning Compared FB WH CNN Extent 2 FB WH CNN Extent 3 FB WH CNN Extent 1 File: Keys (Domain) are scattered across the extents WH WH WH Extent 2 CNN CNN CNN Extent 3 FB FB FB Extent 1 U-SQL Table partitioned on Domain The keys are now “close together” also the index tells U-SQL exactly which extents contain the key
  • 52. CREATE TABLE SampleDBTutorials.dbo.ClickData ( SessionId int, Domain string, Clinks int, INDEX idx1 //Name of index CLUSTERED (Domain ASC) //Column to cluster by // PARTITIONED BY HASH (Region) //Column to partition by ); INSERT INTO SampleDBTutorials.dbo.ClickData SELECT * FROM @clickdata; How did we create and fill that table?
  • 53. Find all the rows for cnn.com // Using a File @ClickData = SELECT Session int, Domain string, Clicks int FROM “/clickdata.tsv” USING Extractors.Tsv(); @rows = SELECT * FROM @ClickData WHERE Domain == “cnn.com”; OUTPUT @rows TO “/output.tsv” USING Outputters.tsv(); // Using a U-SQL Table partitioned by Domain @ClickData = SELECT * FROM MyDB.dbo.ClickData; @rows = SELECT * FROM @ClickData WHERE Domain == “cnn.com”; OUTPUT @rows TO “/output.tsv” USING Outputters.tsv();
  • 54. Read Read Write Write Write Read Filter Filter Filter CNN, FB, WH EXTENT 1 EXTENT 2 EXTENT 3 CNN, FB, WH CNN, FB, WH Because “CNN” could be anywhere, all extents must be read. Read Write Filter FB EXTENT 1 EXTENT 2 EXTENT 3 WH CNN Thanks to “Partition Elimination” and the U-SQL Table, the job only reads from the extent that is known to have the relevant key File U-SQL Table Partitioned by Domain
  • 55. How many clicks per domain? @rows = SELECT Domain, SUM(Clicks) AS TotalClicks FROM @ClickData GROUP BY Domain;
  • 56. File Read Read Partition Partition Full Agg Write Full Agg Write Full Agg Write Read Partition Partial Agg Partial Agg Partial Agg CNN, FB, WH EXTENT 1 EXTENT 2 EXTENT 3 CNN, FB, WH CNN, FB, WH U-SQL Table Partitioned by Domain Read Read Full Agg Full Agg Write Write Read Full Agg Write FB EXTENT 1 WH EXTENT 2 CNN EXTENT 3 Expensive!
  • 58. Learn U-SQL Leverage Native U-SQL Constructs first UDOs are Evil  Can’t optimize UDOs like pure U-SQL code. Understand your Data Volume, Distribution, Partitioning, Growth