© Copyright 2016 Hitachi Consulting
Microsoft Azure Batch
High Performance Computing with an Application of
Scalable Files Processing
Khalid M. Salama, Ph.D.
Business Insights & Analytics
Hitachi Consulting UK
We Make it Happen. Better.
Outline
‱ What is Azure Batch and High Performance Computing?
‱ When to Use Azure Batch?
‱ Azure Batch Constructs
‱ Scalable Data Loading Solution with Azure Batch
‱ .NET Code Walk-through & Demo
‱ Useful Resources
High Performance Computing
What is Azure Batch?
Yet another Azure service
High Performance Computing (HPC) environment on Azure.
Used to scale/parallelize compute-intensive workloads on a managed cluster of VMs.
The computation on the cluster is managed using Azure Batch APIs.
‱ On-demand: pay as you use
‱ Elastic: scale up/down or shut down
‱ PaaS: no infrastructure configuration is needed
Computing Example
Sequential Processing (Single Compute Unit)
Job: Task 1, Task 2, Task 3, Task 4, Task 5, Task 6
Start T = 0
Task 1 T = 1X
Task 2 T = 2X
Task 3 T = 3X
Task 4 T = 4X
Task 5 T = 5X
Task 6 T = 6X
End T = 6X+
High Performance Computing
Refers to the use of parallel processing for running compute-intensive job programs efficiently by aggregating compute power
‱ Scale out: using multiple compute units
‱ Divide: a job is decomposed into multiple independent tasks
‱ Distribute: tasks are processed on separate compute nodes, simultaneously
Computing Example
Parallel Processing (Compute Cluster)
Job: Task 1, Task 2, Task 3, Task 4, Task 5, Task 6
Start T = 0
Task 1 T = 1X
Task 2 T = 1X
Task 3 T = 1X
Task 4 T = 1X
Task 5 T = 1X
Task 6 T = 1X
End T = 1X+
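The sequential and parallel timelines (6X+ on a single compute unit, 1X+ on a cluster) can be reproduced with a small Python sketch. It assumes six independent tasks of equal duration X; the function name `makespan` and the "next free unit" assignment are illustrative assumptions, not Azure Batch behaviour, and the "+" overhead is ignored.

```python
import heapq

def makespan(task_durations, compute_units):
    """Return the finish time when tasks are handed, in order,
    to whichever compute unit becomes free first."""
    # Each heap entry is the time at which one compute unit becomes free.
    free_at = [0.0] * compute_units
    heapq.heapify(free_at)
    for duration in task_durations:
        start = heapq.heappop(free_at)          # earliest available unit
        heapq.heappush(free_at, start + duration)
    return max(free_at)

X = 1.0                  # duration of one task, in arbitrary time units
tasks = [X] * 6          # Task 1 .. Task 6

sequential = makespan(tasks, compute_units=1)   # single compute unit
parallel = makespan(tasks, compute_units=6)     # one compute unit per task
print(f"sequential: {sequential}X, parallel: {parallel}X")
```

With fewer units than tasks (for example 3), the same function shows the intermediate case where tasks queue for a free unit.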
Big Data vs. Big Compute
The big brothers
Big Data
‱ Data centric
‱ Increase of data Volume + Velocity + Variety = technologies to store and process the data efficiently
‱ Azure HDInsight
Big Compute
‱ CPU & memory intensive
‱ Increase of computation and algorithm complexity = technologies to parallelize/distribute the workload
‱ Azure Batch
Big Data processing is a subset of Big Compute; the latter covers a wider spectrum of computing problems.
When to use Azure Batch
Use cases for Big Compute
Intrinsically parallel (also known as "embarrassingly parallel") applications
‱ Image rendering and graphics processing
‱ Search and optimization problems
‱ Various experimental/simulation computing applications
‱ Massively parallel data file processing & loading
‱ Executing thousands of DB stored procedures simultaneously: NO! Remember where the computation occurs (on the database server, not on the Batch compute nodes).
For applications that need task-to-task interaction, Message Passing Interface (MPI) applications are supported in Azure Batch (distributed processing).
In some cases, communication between tasks can be managed via a shared data store (parallel processing).
Azure Batch
Azure Batch Constructs
Putting together the pieces of the picture
Azure Batch Account
‱ Pool
− Number of VMs
− VM Size
− VM OS Family
‱ Job
− Set of Tasks
− Priority
− Max. Execution time
‱ Task
− Parent Job
− Resources (.config, .dlls)
− Cmd Executable (.exe)
− Cmd Parameters
Azure Storage Account
‱ Hosts all the task resources (.dlls & .exe)
[Diagram: an Azure Batch Account containing Pool 1 and Pool 2 (number of nodes, osFamily, node size); three jobs (Job 1, Job 2, Job 3), each with a priority and max execution time; and their tasks (Task 1, Task 2, Task 3, Task A, Task B, Task X, Task Y), each with its job, exe and resources.]
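The relationship between the constructs can be written down as a small Python data model. This is only a sketch: the class and field names (`Pool`, `Job`, `Task`, `max_execution_minutes`, `LoaderTask.exe` and so on) are hypothetical and are not the Azure Batch SDK types.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Pool:
    pool_id: str
    node_count: int          # number of VMs
    vm_size: str             # VM size
    os_family: str           # VM OS family

@dataclass
class Task:
    task_id: str
    command_line: str        # executable plus its parameters
    resource_files: List[str] = field(default_factory=list)  # .exe, .dll, .config

@dataclass
class Job:
    job_id: str
    pool_id: str             # the pool whose nodes run this job's tasks
    priority: int = 0
    max_execution_minutes: int = 60
    tasks: List[Task] = field(default_factory=list)

    def add_task(self, task: Task) -> None:
        self.tasks.append(task)

# Hypothetical values, for illustration only.
pool = Pool("pool1", node_count=4, vm_size="small", os_family="windows")
job = Job("job1", pool_id=pool.pool_id, priority=10)
for n in (1, 2, 3):
    job.add_task(Task(f"task{n}", f"LoaderTask.exe --feed feed{n}",
                      ["LoaderTask.exe", "Processing.dll"]))
print(job.job_id, "->", [t.task_id for t in job.tasks])
```

The point of the model is the ownership chain: an account holds pools and jobs, a job targets one pool, and each task belongs to exactly one job.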
Compute Size
Resource                   Default   Maximum Limit
Azure Batch Account        1         50
Pools per Batch Account    20        5000
Cores per Batch Account    20        N/A
Tasks per Compute Node     1         4 x node cores
Number of Nodes vs Node Size:
‱ Many small nodes → many tasks, not compute/memory intensive
‱ Few big nodes → few tasks, compute/memory intensive (potential multi-threading per task)
‱ Task queueing is automatically managed by Azure Batch
Compute Size
What if:
‱ Pool size = 10 nodes
‱ Node size = Small (1 core)
‱ Total cores = 10
And you have:
‱ 2 jobs
‱ Each job has 7 tasks
‱ Total tasks = 14
By default:
‱ 1 core can process only 1 task
Then:
‱ The 7 tasks of the higher-priority job will be executed (status = “Running”)
‱ The first 3 tasks added to the lower-priority job will be executed (status = “Running”)
‱ The remaining 4 tasks of the lower-priority job will be queued (status = “Active”)
‱ As soon as a “Running” task finishes (status = “Completed”), an “Active” task will be assigned to the freed compute node
‱ If a job is executing (status = “Running”) and a higher-priority job is then submitted to the same pool:
− Azure Batch will “pause” tasks of the lower-priority job (status = “Suspended”) to free resources (cores) for the higher-priority job,
− then resume them when resources become available
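The 10-core, 14-task scenario can be checked with a short Python sketch. It models only the initial assignment (priority first, then the order in which tasks were added); the function `schedule` and the job names are assumptions for illustration, and pausing or resuming of running tasks is not modelled.

```python
from collections import namedtuple

TaskRef = namedtuple("TaskRef", "job priority index")

def schedule(jobs, total_cores, tasks_per_core=1):
    """jobs maps job name -> (priority, task_count); higher priority runs first.
    Returns a dict mapping (job, task index) -> "Running" or "Active"."""
    queue = [TaskRef(name, prio, i)
             for name, (prio, count) in jobs.items()
             for i in range(1, count + 1)]
    # Higher priority first; within a job, tasks keep the order they were added.
    queue.sort(key=lambda t: (-t.priority, t.index))
    slots = total_cores * tasks_per_core
    return {(t.job, t.index): ("Running" if pos < slots else "Active")
            for pos, t in enumerate(queue)}

states = schedule({"high": (10, 7), "low": (1, 7)}, total_cores=10)
running = [key for key, state in states.items() if state == "Running"]
active = [key for key, state in states.items() if state == "Active"]
print(len(running), "running;", len(active), "active")
```

Raising `tasks_per_core` (the table above allows up to 4 tasks per node core) increases the number of slots without adding nodes.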
Use Case: Parallel Data Files Loading
Parallel Data Loading with Azure Batch
Problem Context
‱ Source data is a set of files, with different formats (fixed width, delimited, XML, JSON, mainframe, other), in Azure Blob Storage
‱ Blob Storage structure: “<DataDomain><DataFeed><DataFeed>_<Timestamp>.<ext>”
‱ 200+ data feeds, each producing 1-3 files daily
‱ Data feed formats (columns, data types, file format) are described in MetadataDB (Azure SQL DB)
‱ The objective is to build a Data Loading Solution to:
− Parse the files and load them into a database (Azure SQL DW)
− Be scalable: used for ongoing data loading and history data migration
− Be metadata driven: new data feeds can be handled by the solution by adding metadata
− Log execution history and errors
Parallel Data Loading with Azure Batch
Parallelism Level
The task (unit of parallelization, or granule) can be:
‱ Processing a feed
− balanced number of files/file sizes in each feed
− loading files in sequence
− files can be processed simultaneously on the same node using multithreading (CPU/memory implications)
‱ Processing a file
− no file sequence is needed
− fine grain, more control, better utilization of resources
− less manageable (many tasks per job)
‱ Processing a file line
− multithreading on the same node
Parallel Data Loading with Azure Batch
Solution Architecture
Components:
‱ Azure Batch Runner <Host>
‱ Metadata <Azure SQL DB>
‱ Source <Azure Blob Storage>: Feed 1, Feed 2, ..., Feed N
‱ Compute Cluster <Azure Batch Pool>
‱ Destination <Azure SQL DW>
Runner steps:
1 - Get list of feeds to process
2 - Create a job
3 - Create a task for each feed
4 - Add the tasks to the job
5 - Submit the job
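The five runner steps can be sketched in Python against an in-memory stand-in. Everything here is an assumption made for illustration: `InMemoryBatchClient`, `submit_job`, the `LoaderTask.exe --feed` command line and the metadata dictionary are hypothetical, and no real Azure Batch API is called.

```python
class InMemoryBatchClient:
    """Hypothetical stand-in for a Batch client; it only records what is submitted."""
    def __init__(self):
        self.submitted = {}

    def submit_job(self, job_id, tasks):
        self.submitted[job_id] = list(tasks)

def get_feeds_to_process(metadata):
    # Step 1: the real solution would query the metadata database instead.
    return [feed for feed, enabled in metadata.items() if enabled]

def run(metadata, client, job_id="load-job"):
    feeds = get_feeds_to_process(metadata)       # 1 - get list of feeds to process
    tasks = []                                   # 2 - create a job (here: its task list)
    for feed in feeds:                           # 3 - create a task for each feed
        task = {"id": f"task-{feed}",
                "command_line": f"LoaderTask.exe --feed {feed}"}
        tasks.append(task)                       # 4 - add the task to the job
    client.submit_job(job_id, tasks)             # 5 - submit the job
    return tasks

client = InMemoryBatchClient()
run({"sales": True, "stock": True, "legacy": False}, client)
print([t["id"] for t in client.submitted["load-job"]])
```

Keeping the client behind a small interface like this also makes the runner testable without a Batch account.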
Parallel Data Loading with Azure Batch
Solution Architecture
Components:
‱ Azure Batch Runner <Host>
‱ Metadata <Azure SQL DB>
‱ Source <Azure Blob Storage>: Feed 1, Feed 2, ..., Feed N, each with File 1, File 2, ...
‱ Compute Cluster <Azure Batch Pool>: Task 1, Task 2, ..., Task N
‱ Destination <Azure SQL DW>: DS 1, DS 2, ...
Parallel Data Loading with Azure Batch
Task Processing Steps
‱ Get feed format info from Metadata
‱ Create destination tables
‱ Get list of files to process
‱ Load parser class to use
‱ For each file to process:
− Load file content from Blob Storage
− Parse file content to DataTable
− Dump DataTable content to destination (DW)
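The task processing steps can be sketched as a metadata-driven loop in Python. The metadata keys (`table`, `format`, `columns`), the parser registry and the in-memory `destination` dictionary are assumptions standing in for MetadataDB, the parser classes and the data warehouse.

```python
import csv
import io
import json

def parse_delimited(text, columns):
    """Parse comma-delimited text into one dict per row."""
    return [dict(zip(columns, row)) for row in csv.reader(io.StringIO(text))]

def parse_json_lines(text, columns):
    """Parse one JSON object per line, keeping only the known columns."""
    records = (json.loads(line) for line in text.splitlines() if line.strip())
    return [{c: record.get(c) for c in columns} for record in records]

# Hypothetical registry: file format name -> parser function.
PARSERS = {"delimited": parse_delimited, "json": parse_json_lines}

def process_feed(feed_meta, files, destination):
    columns = feed_meta["columns"]                          # feed format info from metadata
    table = destination.setdefault(feed_meta["table"], [])  # "create" the destination table
    parser = PARSERS[feed_meta["format"]]                   # pick the parser to use
    for name, content in files.items():                     # for each file to process
        rows = parser(content, columns)                     # parse content into rows
        table.extend(rows)                                  # dump rows to the destination
    return len(table)

destination = {}
meta = {"table": "sales", "format": "delimited", "columns": ["id", "amount"]}
count = process_feed(meta, {"sales_1.csv": "1,10\n2,20\n"}, destination)
print(count, destination["sales"])
```

Supporting a new feed format then means adding one parser function and one metadata entry, which is the "metadata driven" goal stated in the problem context.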
.NET Solution Structure
Processing Logic (Class Library)
‱ Model
‱ Database Services
‱ Blob Storage Services
‱ Parsers
Task (Console App)
‱ Receives command-line parameters
‱ Performs the operation according to the supplied parameters
Runner (Console App)
‱ Azure Batch Services
‱ Creates Pools/Jobs/Tasks
Deployment: the Processing Logic library and the Task app are hosted in Azure Blob Storage; the Runner runs on a host.
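The Task console app's contract (receive command-line parameters, act on them) can be sketched in Python with `argparse`. The parameter names `--feed` and `--operation` and the two operations are hypothetical; the original app is a .NET console application whose actual parameters are not shown in the slides.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Data loading task (sketch)")
    parser.add_argument("--feed", required=True, help="name of the data feed to load")
    parser.add_argument("--operation", choices=["load", "validate"], default="load")
    return parser

def main(argv):
    args = build_parser().parse_args(argv)
    # Dispatch on the supplied parameters; real work would call the processing library.
    return f"{args.operation}:{args.feed}"

print(main(["--feed", "sales"]))
```

Because each Batch task is just a command line, the runner only has to vary these parameters to fan the same executable out over all feeds.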
Hosting Azure Batch Runner
‱ None! One-off execution
‱ SQL Server Agent Job (VM + SQL Server)
‱ SQL Server Integration Services (VM + SQL Server)
‱ Azure WebJob + Azure Scheduler (or on-demand)
‱ Azure Data Factory
‱ Azure Orchestration???
Code Walk-through
Code Walk-through
This is how we do it
‱ Solution Structure
‱ Azure Batch Bits
‱ Azure Blob Storage Bits
‱ Text File Processing
‱ XML & JSON (Quick and Dirty)
‱ SQL Bulk Copy with Retry Pattern
Code Walk-through
Solution Structure
Code Walk-through
Azure Batch Bits
Very useful if you want to sync with subsequent processing steps, i.e., start a subsequent step only when the job finishes.
Code Walk-through
Azure Blob Storage
For large files, streaming the content is far more efficient than downloading the whole file before processing it.
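The streaming idea can be illustrated in Python with a chunked read loop. An `io.BytesIO` object stands in for the blob's read stream, and `count_records_streaming` is a hypothetical helper; the original solution uses the .NET storage client, which is not reproduced here.

```python
import io

def count_records_streaming(stream, chunk_size=64 * 1024):
    """Count newline-terminated records while holding only one chunk in memory."""
    records = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:                 # an empty read means end of stream
            break
        records += chunk.count(b"\n")
    return records

blob = io.BytesIO(b"a,1\nb,2\nc,3\n")   # stands in for a blob read stream
print(count_records_streaming(blob, chunk_size=4))
```

Memory use is bounded by `chunk_size` rather than by the file size, which is what makes the approach suitable for large files.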
Code Walk-through
Text File Parsing: FileHelpers Library
Parallel processing at the file level (a separate thread per line to parse)
Code Walk-through
XML & JSON Files Parsing: Quick & Dirty
‱ The content of the whole file is loaded into a dataset
‱ Data cannot be flushed in batches
‱ Unlike streaming, this is a more memory-intensive approach
Code Walk-through
SQL Bulk Copy: Loading in Batches
Batch size < (available memory / record size)
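The batch-size rule can be turned into a small Python sketch. The helper names and the `safety_factor` of 0.5 are assumptions chosen for illustration; the slide only states the upper bound (available memory / record size).

```python
def batch_size_for(available_memory_bytes, record_size_bytes, safety_factor=0.5):
    """Pick a batch size that stays below available memory / record size."""
    limit = available_memory_bytes // record_size_bytes
    return max(1, int(limit * safety_factor))

def batches(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

# Example numbers are made up: 1 MB of memory, 200-byte records.
size = batch_size_for(available_memory_bytes=1_000_000, record_size_bytes=200)
rows = list(range(6000))
print(size, [len(b) for b in batches(rows, size)])
```

Each yielded slice would be handed to one bulk-copy call, so only one batch of rows needs to be held in memory at a time.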
Code Walk-through
SQL Bulk Copy: Asynchronous
Code Walk-through
SQL Bulk Copy: Retry Pattern
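A generic form of the retry pattern, sketched in Python. The exception type `TransientError`, the exponential backoff and the simulated `flaky_bulk_copy` operation are assumptions for illustration; the slides do not show how the original .NET code classifies errors or spaces its retries.

```python
import time

class TransientError(Exception):
    """Raised for failures that are worth retrying (e.g. a dropped connection)."""

def with_retry(operation, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call operation(); on TransientError wait and retry with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise                              # give up after the last attempt
            sleep(base_delay * 2 ** (attempt - 1))

calls = {"count": 0}

def flaky_bulk_copy():
    """Simulated load that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("simulated transient failure")
    return "loaded"

print(with_retry(flaky_bulk_copy), "after", calls["count"], "attempts")
```

Only errors classified as transient are retried; anything else propagates immediately so that genuine data or schema problems are not masked.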
Some Important Notes - Polybase
‱ Since the destination database is an Azure SQL DW, Polybase (a Big Data technology) is the best option to load data from Blob Storage into it, by creating external tables that define the format of the data file.
‱ However, to use Polybase, the Blob Storage needs to be locally redundant, and each folder should have only one data file type.
‱ A pre-processing step is to move the data files from the original Blob Storage (which might be geo-redundant) to a temporary locally redundant Blob Storage, in a proper folder structure.
‱ Parsing data files with complex formats (e.g., parent-child, mainframe, JSON, XML) is not possible in Polybase (yet), but Polybase can load each line of the file into a one-column table, which is then parsed with T-SQL.
‱ If the source is not Blob Storage (e.g., a file system), or your destination is not Azure SQL DW (e.g., Azure SQL DB, DocumentDB, or another Azure Blob Storage/Data Lake), or your file processing does not only involve loading data to a database (e.g., processing requests to initiate a workflow), Azure Batch is the right tool.
Useful Resources
Check these out
‱ Azure Batch Documentation
https://azure.microsoft.com/en-us/documentation/articles/batch-technical-overview
‱ Azure Batch Explorer
https://github.com/Azure/azure-batch-samples/tree/master/CSharp/BatchExplorer
‱ HPC and data orchestration using Azure Batch and Data Factory
https://azure.microsoft.com/en-us/documentation/articles/data-factory-data-processing-using-batch
‱ FileHelpers Library
http://www.filehelpers.net
‱ Retry Pattern
https://msdn.microsoft.com/en-us/library/dn589788.aspx
‱ Spinning up 16,000 A1 Virtual Machines on Azure Batch
https://blogs.endjin.com/2015/07/spinning-up-16000-a1-virtual-machines-on-azure-batch
‱ Parallel Computing
https://en.wikipedia.org/wiki/Parallel_computing
Acknowledgement
These guys are awesome
Thanks to James Fox and Alessandro Aeberli for their efforts in building the awesome Data Landing Solution for Argos.
Nirav is currently the master of the landing solution.
My Background
Applying Computational Intelligence in Data Mining
‱ Honorary Research Fellow, School of Computing, University of Kent.
‱ Ph.D. Computer Science, University of Kent, Canterbury, UK.
‱ M.Sc. Computer Science, The American University in Cairo, Egypt.
‱ 25+ published journal and conference papers, focusing on:
– classification rules induction,
– decision trees construction,
– Bayesian classification modelling,
– data reduction,
– instance-based learning,
– evolving neural networks, and
– data clustering
‱ Journals: Swarm Intelligence, Swarm & Evolutionary Computation, Applied Soft Computing, and Memetic Computing.
‱ Conferences: ANTS, IEEE CEC, IEEE SIS, EvoBio, ECTA, IEEE WCCI and INNS-BigData.
ResearchGate.org

Machine learning with Spark
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
Introducing Azure SQL Database
Introducing Azure SQL DatabaseIntroducing Azure SQL Database
Introducing Azure SQL Database
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft Azure
 
Cortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeCortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data Lake
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat Sheet
 

Ähnlich wie Microsoft Azure Batch

Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big Data
Stylight
 
Cost Optimization as Major Architectural Consideration for Cloud Application
Cost Optimization as Major Architectural Consideration for Cloud ApplicationCost Optimization as Major Architectural Consideration for Cloud Application
Cost Optimization as Major Architectural Consideration for Cloud Application
Udayan Banerjee
 

Ähnlich wie Microsoft Azure Batch (20)

Easy and Efficient Batch Computing on AWS
Easy and Efficient Batch Computing on AWSEasy and Efficient Batch Computing on AWS
Easy and Efficient Batch Computing on AWS
 
Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big Data
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWS
 
Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...
Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...
Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...
 
Code for the earth OCP APAC Tokyo 2013-05
Code for the earth OCP APAC Tokyo 2013-05Code for the earth OCP APAC Tokyo 2013-05
Code for the earth OCP APAC Tokyo 2013-05
 
Azure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User StoreAzure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User Store
 
Peteris Arajs - Where is my data
Peteris Arajs - Where is my dataPeteris Arajs - Where is my data
Peteris Arajs - Where is my data
 
Software Defined Infrastructure
Software Defined InfrastructureSoftware Defined Infrastructure
Software Defined Infrastructure
 
Amazon WorkSpaces-Virtual Desktops in Cloud
Amazon WorkSpaces-Virtual Desktops in CloudAmazon WorkSpaces-Virtual Desktops in Cloud
Amazon WorkSpaces-Virtual Desktops in Cloud
 
How Edmodo Uses Splunk For Real-Time Tag-Based Reporting of AWS Billing and U...
How Edmodo Uses Splunk For Real-Time Tag-Based Reporting of AWS Billing and U...How Edmodo Uses Splunk For Real-Time Tag-Based Reporting of AWS Billing and U...
How Edmodo Uses Splunk For Real-Time Tag-Based Reporting of AWS Billing and U...
 
Design Choices for Cloud Data Platforms
Design Choices for Cloud Data PlatformsDesign Choices for Cloud Data Platforms
Design Choices for Cloud Data Platforms
 
Architectures for HPC/HTC Workloads on AWS - CMP306 - re:Invent 2017
Architectures for HPC/HTC Workloads on AWS - CMP306 - re:Invent 2017Architectures for HPC/HTC Workloads on AWS - CMP306 - re:Invent 2017
Architectures for HPC/HTC Workloads on AWS - CMP306 - re:Invent 2017
 
Cost Optimization as Major Architectural Consideration for Cloud Application
Cost Optimization as Major Architectural Consideration for Cloud ApplicationCost Optimization as Major Architectural Consideration for Cloud Application
Cost Optimization as Major Architectural Consideration for Cloud Application
 
ExpertsLive Asia Pacific 2017 - Planning and Deploying SharePoint Server 2016...
ExpertsLive Asia Pacific 2017 - Planning and Deploying SharePoint Server 2016...ExpertsLive Asia Pacific 2017 - Planning and Deploying SharePoint Server 2016...
ExpertsLive Asia Pacific 2017 - Planning and Deploying SharePoint Server 2016...
 
Linux on Azure Pitch Deck
Linux on Azure Pitch DeckLinux on Azure Pitch Deck
Linux on Azure Pitch Deck
 
Task programming
Task programmingTask programming
Task programming
 
EFFICIENT TRUSTED CLOUD STORAGE USING PARALLEL CLOUD COMPUTING
EFFICIENT TRUSTED CLOUD STORAGE USING PARALLEL CLOUD COMPUTINGEFFICIENT TRUSTED CLOUD STORAGE USING PARALLEL CLOUD COMPUTING
EFFICIENT TRUSTED CLOUD STORAGE USING PARALLEL CLOUD COMPUTING
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
Cloud Economics, from Genesis to Scale
Cloud Economics, from Genesis to ScaleCloud Economics, from Genesis to Scale
Cloud Economics, from Genesis to Scale
 
ENT307 Move your Desktops and Apps to AWS with Amazon WorkSpaces and AppStre...
 ENT307 Move your Desktops and Apps to AWS with Amazon WorkSpaces and AppStre... ENT307 Move your Desktops and Apps to AWS with Amazon WorkSpaces and AppStre...
ENT307 Move your Desktops and Apps to AWS with Amazon WorkSpaces and AppStre...
 

Mehr von Khalid Salama

Mehr von Khalid Salama (8)

Microsoft R - ScaleR Overview
Microsoft R - ScaleR OverviewMicrosoft R - ScaleR Overview
Microsoft R - ScaleR Overview
 
Operational Machine Learning: Using Microsoft Technologies for Applied Data S...
Operational Machine Learning: Using Microsoft Technologies for Applied Data S...Operational Machine Learning: Using Microsoft Technologies for Applied Data S...
Operational Machine Learning: Using Microsoft Technologies for Applied Data S...
 
Microservices, DevOps, and Continuous Delivery
Microservices, DevOps, and Continuous DeliveryMicroservices, DevOps, and Continuous Delivery
Microservices, DevOps, and Continuous Delivery
 
Graph Analytics
Graph AnalyticsGraph Analytics
Graph Analytics
 
NoSQL with Microsoft Azure
NoSQL with Microsoft AzureNoSQL with Microsoft Azure
NoSQL with Microsoft Azure
 
Hive with HDInsight
Hive with HDInsightHive with HDInsight
Hive with HDInsight
 
Real-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS AzureReal-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS Azure
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!
 

KĂŒrzlich hochgeladen

âž„đŸ” 7737669865 đŸ”â–» Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
âž„đŸ” 7737669865 đŸ”â–» Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...âž„đŸ” 7737669865 đŸ”â–» Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
âž„đŸ” 7737669865 đŸ”â–» Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
amitlee9823
 
Just Call Vip call girls Bellary Escorts ☎9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎9352988975 Two shot with one girl ...
gajnagarg
 
Call Girls In Hsr Layout ☎ 7737669865 đŸ„” Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 đŸ„” Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 đŸ„” Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 đŸ„” Book Your One night Stand
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
âž„đŸ” 7737669865 đŸ”â–» Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
âž„đŸ” 7737669865 đŸ”â–» Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...âž„đŸ” 7737669865 đŸ”â–» Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
âž„đŸ” 7737669865 đŸ”â–» Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Just Call Vip call girls roorkee Escorts ☎9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎9352988975 Two shot with one girl ...
gajnagarg
 
âž„đŸ” 7737669865 đŸ”â–» malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
âž„đŸ” 7737669865 đŸ”â–» malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...âž„đŸ” 7737669865 đŸ”â–» malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
âž„đŸ” 7737669865 đŸ”â–» malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 JustđŸ“Č Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 JustđŸ“Č Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 JustđŸ“Č Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 JustđŸ“Č Call Ruhi Call Girl Phone No Amri...
karishmasinghjnh
 
Just Call Vip call girls Palakkad Escorts ☎9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎9352988975 Two shot with one girl...
gajnagarg
 

KĂŒrzlich hochgeladen (20)

âž„đŸ” 7737669865 đŸ”â–» Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
âž„đŸ” 7737669865 đŸ”â–» Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...âž„đŸ” 7737669865 đŸ”â–» Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
âž„đŸ” 7737669865 đŸ”â–» Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
 
Just Call Vip call girls Bellary Escorts ☎9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎9352988975 Two shot with one girl ...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls In Hsr Layout ☎ 7737669865 đŸ„” Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 đŸ„” Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 đŸ„” Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 đŸ„” Book Your One night Stand
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
âž„đŸ” 7737669865 đŸ”â–» Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
âž„đŸ” 7737669865 đŸ”â–» Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...âž„đŸ” 7737669865 đŸ”â–» Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
âž„đŸ” 7737669865 đŸ”â–» Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Just Call Vip call girls roorkee Escorts ☎9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎9352988975 Two shot with one girl ...
 
âž„đŸ” 7737669865 đŸ”â–» malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
âž„đŸ” 7737669865 đŸ”â–» malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...âž„đŸ” 7737669865 đŸ”â–» malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
âž„đŸ” 7737669865 đŸ”â–» malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 JustđŸ“Č Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 JustđŸ“Č Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 JustđŸ“Č Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 JustđŸ“Č Call Ruhi Call Girl Phone No Amri...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Just Call Vip call girls Palakkad Escorts ☎9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎9352988975 Two shot with one girl...
 

Microsoft Azure Batch

  • 1. Microsoft Azure Batch: High Performance Computing with an Application of Scalable Files Processing. Khalid M. Salama, Ph.D., Business Insights & Analytics, Hitachi Consulting UK. We Make it Happen. Better. © Copyright 2016 Hitachi Consulting.
  • 2. Outline
      - What is Azure Batch and High Performance Computing?
      - When to Use Azure Batch?
      - Azure Batch Constructs
      - Scalable Data Loading Solution with Azure Batch
      - .NET Code Walk-through & Demo
      - Useful Resources
  • 3. High Performance Computing
  • 7. What is Azure Batch? Yet another Azure service: a High Performance Computing (HPC) environment on Azure.
      - Used to scale out and parallelize compute-intensive workloads on a managed cluster of VMs.
      - The computation on the cluster is managed using the Azure Batch APIs.
      - On-demand: pay as you use.
      - Elastic: scale up/down or shut down.
      - PaaS: no infrastructure configuration is needed.
  • 14. Computing Example: Sequential Processing. A job of six tasks (Task 1 to Task 6) runs on a single compute unit, one task after another:
      - Start: T = 0
      - Task 1: T = 1X
      - Task 2: T = 2X
      - Task 3: T = 3X
      - Task 4: T = 4X
      - Task 5: T = 5X
      - Task 6: T = 6X
      - End: T = 6X+
  • 16. High Performance Computing: the use of parallel processing to run compute-intensive programs efficiently by aggregating compute power.
      - Scale out: use multiple compute units.
      - Divide: a job is decomposed into multiple independent tasks.
      - Distribute: tasks are processed on separate compute nodes, simultaneously.
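The divide-and-distribute idea can be sketched on a single machine. The following is a minimal Python illustration, not Azure Batch code: `process_task` and `run_job` are hypothetical names, and a local thread pool stands in for the compute nodes. For CPU-bound work in CPython, a process pool (`ProcessPoolExecutor`) would usually be the closer analogue.

```python
from concurrent.futures import ThreadPoolExecutor


def process_task(task_id: int) -> int:
    """Stand-in for one independent unit of work (hypothetical workload)."""
    return sum(i * i for i in range(task_id * 1000))


def run_job(task_ids: list[int], workers: int) -> list[int]:
    """Divide: one call per task. Distribute: hand the calls to a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with task_ids.
        return list(pool.map(process_task, task_ids))


if __name__ == "__main__":
    # Six independent tasks, six workers: the "scale out" case.
    print(run_job([1, 2, 3, 4, 5, 6], workers=6))
```

Because the tasks share no state, the worker count can change without touching `process_task`, which is the property that makes a job a good fit for this style of processing.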
  • 20. Computing Example: Parallel Processing. The same job (Task 1 to Task 6) runs on a compute cluster, with all six tasks processed simultaneously:
      - Start: T = 0
      - Task 1 to Task 6: T = 1X each
      - End: T = 1X+
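The timings in the two computing examples follow from simple arithmetic. The sketch below assumes every task takes the same time X and ignores the scheduling overhead that the slides mark with "+"; the function names are illustrative only.

```python
import math


def sequential_time(num_tasks: int, task_time: float) -> float:
    """One compute unit: tasks run back to back."""
    return num_tasks * task_time


def parallel_time(num_tasks: int, task_time: float, compute_units: int) -> float:
    """Several compute units: tasks run in waves of `compute_units` at a time."""
    waves = math.ceil(num_tasks / compute_units)
    return waves * task_time


if __name__ == "__main__":
    x = 1.0  # duration of one task, in arbitrary units
    print(sequential_time(6, x))   # six tasks on one unit
    print(parallel_time(6, x, 6))  # six tasks on six units
    print(parallel_time(6, x, 4))  # six tasks on four units needs two waves
```

With fewer compute units than tasks, the finish time grows in whole waves, which is why the number of nodes matters as much as their size.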
  • 23. Big Data vs. Big Compute: the big brothers. Big Data processing is a subset of Big Compute; the latter covers a wider spectrum of computing problems.
      Big Data:
      - Data-centric
      - Growth in data volume, velocity, and variety calls for technologies to store and process the data efficiently
      - Azure HDInsight
      Big Compute:
      - CPU- and memory-intensive
      - Growth in computation and algorithm complexity calls for technologies to parallelize and distribute the workload
      - Azure Batch
  • 28. When to use Azure Batch: use cases for Big Compute. Intrinsically parallel (also known as "embarrassingly parallel") applications:
      - Image rendering and graphics processing
      - Search and optimization problems
      - Various experimental/simulation computing applications
      - Massively parallel data file processing and loading
      - Executing thousands of DB stored procedures simultaneously? NO! Remember where the computation occurs (in the database, not on the Batch compute nodes).
      For applications that need task-to-task interaction, Message Passing Interface (MPI) is supported in Azure Batch (distributed processing). In some cases, communication between tasks can be managed via a shared data store (parallel processing).
  • 29. Azure Batch
  • 35. Azure Batch Constructs: putting together the pieces of the picture.
      Azure Batch Account
      - Pool: number of VMs, VM size, VM OS family
      - Job: set of tasks, priority, max. execution time
      - Task: parent job, resources (.config, .dlls), command executable (.exe), command parameters
      Azure Storage Account
      - Hosts all the task resources (.dlls and .exe)
      Diagram: an Azure Batch account containing Pool 1 and Pool 2 (each defined by number of nodes, osFamily, node size); Job 1 with Task 1, Task 2, Task 3; Job 2 with Task A, Task B; Job 3 with Task X, Task Y. Each job has a priority and a max execution time; each task has a parent job and exe resources.
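One way to keep the pool/job/task hierarchy straight is to model it as plain data. The sketch below uses Python dataclasses; the class and field names are assumptions made for illustration and are not the Azure Batch SDK's types, and the VM size, OS family, and file names in the example are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    task_id: str
    command_line: str  # executable plus its parameters
    resource_files: list[str] = field(default_factory=list)  # e.g. .dll/.config files


@dataclass
class Job:
    job_id: str
    pool_id: str  # the pool whose nodes run this job's tasks
    priority: int = 0
    max_execution_minutes: Optional[int] = None
    tasks: list[Task] = field(default_factory=list)

    def add_task(self, task: Task) -> None:
        self.tasks.append(task)


@dataclass
class Pool:
    pool_id: str
    node_count: int
    vm_size: str
    os_family: str


def build_example() -> tuple[Pool, Job]:
    """Assemble a small pool with one job of three tasks (placeholder values)."""
    pool = Pool(pool_id="pool1", node_count=3, vm_size="small", os_family="4")
    job = Job(job_id="job1", pool_id=pool.pool_id, priority=1, max_execution_minutes=60)
    for n in (1, 2, 3):
        job.add_task(Task(f"task{n}", f"loader.exe input{n}.csv", ["loader.exe", "loader.dll"]))
    return pool, job
```

The point of the model is the direction of the references: a task belongs to a job, and a job is bound to a pool, while the executables and libraries live outside all three in the storage account.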
  • 36. Compute Size
      Resource                   Default   Maximum limit
      Azure Batch Account        1         50
      Pools per Batch Account    20        5000
      Cores per Batch Account    20        N/A
      Tasks per Compute Node     1         4 x node cores
      Number of nodes vs. node size:
      - Many small nodes: many tasks, not compute/memory intensive
      - Few big nodes: few tasks, compute/memory intensive (potential multi-threading per task)
      - Task queueing is automatically managed by Azure Batch
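The "tasks per compute node" row translates into a small capacity calculation. This sketch takes the limits exactly as listed in the table above (default of 1 task per node, maximum of 4 x node cores); the function names and the node sizes in the example are assumptions.

```python
def max_tasks_per_node(cores_per_node: int) -> int:
    """Upper bound taken from the table: 4 x node cores."""
    return 4 * cores_per_node


def pool_task_slots(node_count: int, cores_per_node: int, tasks_per_node: int = 1) -> int:
    """Number of tasks a pool can run at the same time."""
    limit = max_tasks_per_node(cores_per_node)
    if not 1 <= tasks_per_node <= limit:
        raise ValueError(f"tasks_per_node must be between 1 and {limit}")
    return node_count * tasks_per_node


if __name__ == "__main__":
    # Many small nodes vs. few big nodes, one task per node in both cases.
    print(pool_task_slots(node_count=16, cores_per_node=1))
    print(pool_task_slots(node_count=2, cores_per_node=8))
    # The same two big nodes configured to run four tasks each.
    print(pool_task_slots(node_count=2, cores_per_node=8, tasks_per_node=4))
```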
  • 39. Compute Size
      What if:
      - Pool size = 10 nodes
      - Node size = Small (1 core)
      - Total cores = 10
      And you have:
      - 2 jobs
      - Each job has 7 tasks
      - Total tasks = 14
      By default:
      - 1 core can process only 1 task at a time
      Then:
      - The 7 tasks of the higher-priority job will be executed (status = “Running”)
      - The first 3 tasks added to the lower-priority job will be executed (status = “Running”)
      - The remaining 4 tasks of the lower-priority job will be queued (status = “Active”)
      - As soon as a “Running” task finishes (status = “Completed”), an “Active” task will be assigned to the freed compute node
      - If a job is already executing (status = “Running”) and a higher-priority job is then submitted to the same pool, Azure Batch will “pause” tasks of the lower-priority job (status = “Suspended”) to free resources (cores) for the higher-priority job, then resume them when resources become available
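The queueing scenario on this slide can be replayed with a small simulation. This is only a sketch in plain Python, not the Azure Batch scheduler: the `Task` class, the `schedule` function, and the priority values are hypothetical, and the sketch does not model suspending tasks that are already running.

```python
from dataclasses import dataclass


@dataclass
class Task:
    job: str
    name: str
    priority: int          # priority of the parent job (illustrative values)
    state: str = "Active"  # "Active" = queued, waiting for a free core


def schedule(tasks, free_cores):
    """Start up to `free_cores` queued tasks, higher-priority jobs first.

    sorted() is stable, so tasks of the same job keep their submission order.
    """
    waiting = [t for t in tasks if t.state == "Active"]
    for task in sorted(waiting, key=lambda t: -t.priority)[:free_cores]:
        task.state = "Running"


def count(tasks, job, state):
    """Number of tasks of `job` that are currently in `state`."""
    return sum(1 for t in tasks if t.job == job and t.state == state)


# Two jobs of 7 tasks each, submitted to a pool of 10 single-core nodes.
tasks = [Task("low", f"low-{i}", priority=1) for i in range(7)]
tasks += [Task("high", f"high-{i}", priority=10) for i in range(7)]
schedule(tasks, free_cores=10)
print(count(tasks, "high", "Running"),
      count(tasks, "low", "Running"),
      count(tasks, "low", "Active"))

# One running task completes, which frees a core for the next queued task.
tasks[0].state = "Completed"
schedule(tasks, free_cores=1)
```

The first `schedule` call leaves 7 high-priority tasks running, 3 low-priority tasks running, and 4 low-priority tasks queued, matching the slide. The second call moves one queued task to “Running”.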
  • 40. Use Case: Parallel Data Files Loading
  • 41. Parallel Data Loading with Azure Batch: Problem Context
      - Source data is a set of files in different formats (fixed width, delimited, XML, JSON, mainframe, other), stored in Azure Blob Storage
      - Blob Storage structure: “<DataDomain><DataFeed><DataFeed>_<Timestamp>.<ext>”
      - 200+ data feeds, each producing 1-3 files daily
      - Data feed formats (columns, data types, file format) are described in the MetadataDB (Azure SQL DB)
      - The objective is to build a Data Loading Solution to:
          - Parse the files and load them into a database (Azure SQL DW)
          - Be scalable: used for both ongoing data loading and history data migration
          - Be metadata driven: new data feeds can be handled by the solution just by adding metadata
          - Log execution history and errors
  • 42. Parallel Data Loading with Azure Batch: Parallelism Level
      The task (unit of parallelization, or granule) can be:
      - Processing a feed
          - balanced number of files/file sizes in each feed
          - files are loaded in sequence
          - files can be processed simultaneously on the same node using multithreading (CPU/memory implications)
      - Processing a file
          - no file ordering is needed
          - fine grained: more control, better utilization of resources
          - less manageable (many tasks per job)
      - Processing a file line
          - multithreading on the same node
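The difference between the feed-level and file-level granules can be illustrated by how the same file list is split into tasks. This is a minimal sketch: the function names and sample file names are made up, and it assumes that sorting file names approximates load order because each name carries a timestamp.

```python
from collections import defaultdict


def tasks_per_feed(files):
    """One task per feed; each task loads its files in name order."""
    by_feed = defaultdict(list)
    for feed, name in files:
        by_feed[feed].append(name)
    return [sorted(names) for _, names in sorted(by_feed.items())]


def tasks_per_file(files):
    """One task per file; no ordering between files is preserved."""
    return [[name] for _, name in files]


# (feed, file name) pairs; the names are illustrative only.
files = [
    ("orders", "orders_20160102.csv"),
    ("orders", "orders_20160101.csv"),
    ("stock", "stock_20160101.xml"),
]
print(tasks_per_feed(files))  # 2 tasks, files ordered within each feed
print(tasks_per_file(files))  # 3 tasks, one file each
```

Feed-level tasks keep the job small but can be unbalanced when feeds differ in size. File-level tasks spread the load more evenly at the cost of many more tasks per job.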
  • 43-47. Parallel Data Loading with Azure Batch: Solution Architecture (diagram built up over five slides)
      Components:
      - Azure Batch Runner <Host>
      - Metadata <Azure SQL DB>
      - Source <Azure Blob Storage>: Feed 1, Feed 2, ..., Feed N, each containing File 1, File 2, ...
      - Compute Cluster <Azure Batch Pool>: Task 1, Task 2, ..., Task N
      - Destination <Azure SQL DW>: DS 1, DS 2, ...
      Runner steps:
      - 1 - Get the list of feeds to process
      - 2 - Create a job
      - 3 - Create a task for each feed
      - 4 - Add the tasks to the job
      - 5 - Submit the job
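The runner's five steps can be sketched without any Azure SDK calls. Everything here is a stand-in: `JobSpec`, `TaskSpec`, the `LoaderTask.exe` name, and the `--feed` parameter are hypothetical, and `get_feeds` and `submit` represent the metadata query and the Azure Batch submission that the real .NET runner performs.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    task_id: str
    command_line: str  # executable plus parameters, run on a compute node


@dataclass
class JobSpec:
    job_id: str
    pool_id: str
    tasks: list = field(default_factory=list)


def build_job(feeds, job_id, pool_id, exe="LoaderTask.exe"):
    """Steps 2-4: create a job, create one task per feed, add tasks to the job."""
    job = JobSpec(job_id, pool_id)
    for number, feed in enumerate(feeds, start=1):
        command = f'{exe} --feed "{feed}"'
        job.tasks.append(TaskSpec(f"task-{number}", command))
    return job


def run(get_feeds, submit, job_id, pool_id):
    """Step 1 (get feeds), steps 2-4 (build the job), step 5 (submit it)."""
    feeds = get_feeds()
    job = build_job(feeds, job_id, pool_id)
    submit(job)
    return job


# Stand-ins for the metadata database and the Batch service.
submitted = []
job = run(lambda: ["orders", "stock"], submitted.append, "daily-load", "pool-1")
for task in job.tasks:
    print(task.task_id, task.command_line)
```

Passing `get_feeds` and `submit` in as callables keeps the orchestration logic testable without a Batch account.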
  • 55. Parallel Data Loading with Azure Batch: Task Processing Steps
      - Get feed format info from the metadata
      - Create destination tables
      - Get the list of files to process
      - Load the parser class to use
      - For each file to process:
          - Load the file content from Blob Storage
          - Parse the file content into a DataTable
          - Dump the DataTable content to the destination (DW)
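The task processing steps can be sketched as one metadata-driven function. This is an illustration under stated assumptions, not the deck's .NET implementation: the metadata layout, the parser registry, and the four injected callables are hypothetical, and plain lists of dicts stand in for a DataTable.

```python
import csv
import io


def parse_delimited(text, columns, delimiter=","):
    """Parse delimited text into a list of row dicts."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [dict(zip(columns, row)) for row in reader if row]


def parse_fixed_width(text, columns, widths):
    """Parse fixed-width text by slicing each line at the column widths."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        position, row = 0, {}
        for column, width in zip(columns, widths):
            row[column] = line[position:position + width].strip()
            position += width
        rows.append(row)
    return rows


# Parser lookup by format name, so a new feed only needs new metadata.
PARSERS = {"delimited": parse_delimited, "fixed_width": parse_fixed_width}


def process_feed(feed, metadata, *, list_files, read_file, create_table, load_rows):
    """Run the per-feed steps; returns the number of rows loaded."""
    fmt = metadata[feed]                        # feed format info
    create_table(fmt["table"], fmt["columns"])  # destination table
    parser = PARSERS[fmt["format"]]             # parser to use
    total = 0
    for name in list_files(feed):               # files to process
        text = read_file(name)                  # load file content
        rows = parser(text, fmt["columns"], **fmt.get("options", {}))
        load_rows(fmt["table"], rows)           # dump rows to the destination
        total += len(rows)
    return total


# In-memory stand-ins for the metadata DB, Blob Storage, and the warehouse.
metadata = {"orders": {"table": "stg_orders", "format": "delimited",
                       "columns": ["id", "amount"],
                       "options": {"delimiter": "|"}}}
blobs = {"orders_1.txt": "1|10\n2|20\n", "orders_2.txt": "3|30\n"}
tables = {}
loaded = process_feed(
    "orders", metadata,
    list_files=lambda feed: sorted(blobs),
    read_file=blobs.__getitem__,
    create_table=lambda table, columns: tables.setdefault(table, []),
    load_rows=lambda table, rows: tables[table].extend(rows),
)
print(loaded, tables["stg_orders"])
```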
  • 56. .NET Solution Structure
      - Processing Logic (Class Library): Model, Database Services, Blob Storage Services, Parsers
      - Task (Console App): receives command-line parameters and performs the operation according to the supplied parameters
      - Runner (Console App): Azure Batch Services; creates pools, jobs, and tasks
  • 57. .NET Solution Structure (deployment): the Processing Logic library and the Task app are grouped under Azure Blob Storage; the Runner is grouped under a host.
  • 58. Hosting Azure Batch Runner
      - None! (one-off execution)
      - SQL Agent Job (VM + SQL Server)
      - SQL Server Integration Services (VM + SQL Server)
      - Azure WebJob + Azure Scheduler (or on-demand)
      - Azure Data Factory
      - Azure Orchestration???
  • 59. Code Walk-through
  • 60. Code Walk-through: This is how we do it
      - Solution Structure
      - Azure Batch Bits
      - Azure Blob Storage Bits
      - Text File Processing
      - XML & JSON (Quick and Dirty)
      - SQL Bulk Copy with Retry Pattern
  • 61. Code Walk-through: Solution Structure
  • 62. Code Walk-through: Azure Batch Bits. Waiting for the job to finish is very useful if you want to sync with subsequent processing steps, i.e., start a subsequent step only when the job finishes.
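Since the slide's code is not in the transcript, here is one way such a wait could look: a generic polling loop. It is a sketch only. `get_task_states` is a hypothetical callable that returns the current state of every task in the job, and the “Completed” state name follows the slides.

```python
import time


def wait_for_job(get_task_states, timeout_s=3600.0, poll_s=5.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until every task is "Completed"; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        states = get_task_states()
        if all(state == "Completed" for state in states):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


# Simulated snapshots of task states, one per poll.
snapshots = iter([
    ["Running", "Active"],
    ["Completed", "Running"],
    ["Completed", "Completed"],
])
pauses = []
finished = wait_for_job(lambda: next(snapshots), poll_s=0, sleep=pauses.append)
if finished:
    print("job finished, start the next processing step")
```

Injecting `sleep` and `clock` makes the loop easy to test without real delays.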
  • 63. Code Walk-through: Azure Batch Bits
  • 64. Code Walk-through: Azure Batch Bits
  • 65. Code Walk-through: Azure Blob Storage. Streaming a large file is far more efficient than downloading the whole file before processing it.
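The streaming idea can be shown with any readable binary stream. In this sketch an in-memory `io.BytesIO` stands in for a blob download stream (an assumption; the deck's code uses the .NET storage client), and `stream_records` is a hypothetical helper name.

```python
import io


def stream_records(binary_stream, encoding="utf-8"):
    """Yield non-empty lines one at a time instead of reading the whole file."""
    # newline="" keeps line endings untranslated; they are stripped below.
    text = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    for line in text:
        line = line.rstrip("\r\n")
        if line:
            yield line


blob = io.BytesIO(b"a,1\r\nb,2\n\nc,3")  # stand-in for a blob read stream
records = list(stream_records(blob))
print(records)
```

Only one line is held in memory at a time, so memory use stays roughly constant regardless of file size.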
  • 66. Code Walk-through: Text File Parsing with the FileHelpers Library. Parallel processing at the file level (a separate thread per line to parse).
  • 67. Code Walk-through: XML & JSON Files Parsing (Quick & Dirty)
      - The content of the whole file is loaded into a dataset
      - Data cannot be flushed in batches
      - Unlike streaming, this is a more memory-intensive approach
  • 68. Code Walk-through: SQL Bulk Copy, Loading in Batches. Batch size < (available memory / record size)
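The batch-size rule can be turned into a small helper. This is a sketch: `max_batch_size`, `in_batches`, and the `safety` factor are assumptions added for illustration, not part of the deck or of SQL Bulk Copy.

```python
from itertools import islice


def max_batch_size(available_bytes, record_bytes, safety=0.5):
    """Largest batch that fits in a fraction (`safety`) of available memory."""
    return max(1, int(available_bytes * safety) // record_bytes)


def in_batches(rows, batch_size):
    """Yield lists of at most `batch_size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


size = max_batch_size(available_bytes=1000, record_bytes=100)
batches = list(in_batches(range(7), 3))
print(size, batches)
```

Each batch would then be written to the destination in a separate bulk-copy call, so a failure only affects the rows of one batch.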
  • 69. Code Walk-through: SQL Bulk Copy, Asynchronous
  • 70. Code Walk-through: SQL Bulk Copy, Retry Pattern
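A generic form of the retry pattern is sketched here, assuming exponential backoff. `TransientError`, `with_retry`, and the delay values are hypothetical; the deck's own version wraps a .NET SQL Bulk Copy call, which is not reproduced here.

```python
import time


class TransientError(Exception):
    """Stand-in for a failure that is worth retrying (e.g. a timeout)."""


def with_retry(operation, attempts=4, base_delay_s=1.0, sleep=time.sleep):
    """Call `operation`, retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay_s * 2 ** (attempt - 1))


# A load that fails twice before succeeding.
calls = {"count": 0}


def flaky_load():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("connection dropped")
    return "loaded"


delays = []
result = with_retry(flaky_load, sleep=delays.append)
print(result, delays)
```

Only errors classified as transient are retried; any other exception propagates immediately so that real bugs are not hidden.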
  • 75. Some Important Notes: Polybase
      - Since the destination database is an Azure SQL DW, Polybase - a Big Data technology - is the best option for loading data from Blob Storage into it, by creating external tables that define the format of the data files.
      - However, to use Polybase, the Blob Storage needs to be locally redundant, and each folder should contain only one data file type.
      - A pre-processing step is therefore needed to move the data files from the original Blob Storage (which might be geo-redundant) to a temporary locally redundant Blob Storage, in a proper folder structure.
      - Parsing data files with a complex format (e.g., parent-child, mainframe, JSON, XML) is not possible in Polybase (yet), but Polybase can load each line of the file into a one-column table, which is then parsed using T-SQL.
      - If the source is not Blob Storage (e.g., a file system), or your destination is not Azure SQL DW (e.g., Azure SQL DB, DocumentDB, or another Azure Blob Storage/Data Lake), or your file processing involves more than loading data into a database (e.g., processing requests to initiate a workflow), Azure Batch is the right tool.
  • 76. Useful Resources: Check these out
      - Azure Batch Documentation: https://azure.microsoft.com/en-us/documentation/articles/batch-technical-overview
      - Azure Batch Explorer: https://github.com/Azure/azure-batch-samples/tree/master/CSharp/BatchExplorer
      - HPC and data orchestration using Azure Batch and Data Factory: https://azure.microsoft.com/en-us/documentation/articles/data-factory-data-processing-using-batch
      - FileHelpers Library: http://www.filehelpers.net
      - Retry Pattern: https://msdn.microsoft.com/en-us/library/dn589788.aspx
      - Spinning up 16,000 A1 Virtual Machines on Azure Batch: https://blogs.endjin.com/2015/07/spinning-up-16000-a1-virtual-machines-on-azure-batch
      - Parallel Computing: https://en.wikipedia.org/wiki/Parallel_computing
  • 77. Acknowledgement: These guys are awesome
      Thanks to James Fox and Alessandro Aeberli for their efforts in building the awesome Data Landing Solution for Argos. Nirav is currently the master of the landing solution.
  • 78. My Background: Applying Computational Intelligence in Data Mining
      - Honorary Research Fellow, School of Computing, University of Kent.
      - Ph.D. Computer Science, University of Kent, Canterbury, UK.
      - M.Sc. Computer Science, The American University in Cairo, Egypt.
      - 25+ published journal and conference papers, focusing on: classification rules induction, decision trees construction, Bayesian classification modelling, data reduction, instance-based learning, evolving neural networks, and data clustering.
      - Journals: Swarm Intelligence, Swarm & Evolutionary Computation, Applied Soft Computing, and Memetic Computing.
      - Conferences: ANTS, IEEE CEC, IEEE SIS, EvoBio, ECTA, IEEE WCCI, and INNS-BigData.
      - ResearchGate.org