Data Solution Architect, Microsoft
AZURE DATA LAKE
Store and Analytics
Big Data for Microsoft Developers
Kenneth M. Nielsen
@doktorkermit
Kenneth M. Nielsen
• Worked with SQL Server since 1999
• Co-organizer of SQL Saturday DK
• Co-organizer of SQLNexus Nordic
• Community is Everything
• Data Solution Architect at Microsoft
• kmn@funkylab.com
• @doktorkermit
• www.funkylab.com
Agenda
• Azure Data Lake overview
• Azure Data Lake Store
• Azure Data Lake Analytics
• Azure Data Lake Analytics – Using Visual Studio
• Azure Data Lake Analytics – Using PowerShell
• Azure Data Lake Analytics – Cognitive Analysis
• Q & A
AZURE DATA LAKE
Overview
History
Bing needed to…
– Understand user behavior
And do it…
– At massive scale
– With agility and speed
– At low cost
So they built …
– Cosmos
Cosmos
• Batch Jobs
• Interactive
• Machine Learning
• Streaming
Thousands of Developers
AZURE DATA LAKE
Store and analyze data of any kind and size
Develop faster, debug and optimize smarter
Interactively explore patterns in your data
No learning curve
Managed and supported
Dynamically scales to match your business priorities
Enterprise-grade security
Built on YARN, designed for the cloud
DATA LAKE STORE
Azure Data Lake Store
A hyperscale repository for big data analytics workloads
• No limits to SCALE
• Store ANY DATA in its native format
• HADOOP FILE SYSTEM (HDFS) for the cloud
• ENTERPRISE READY: access control, encryption at rest
• Optimized for analytic workload PERFORMANCE
Azure Data Lake Store
Any Data
• Unstructured
• Semi-structured
• Structured
Azure Data Lake Store
Azure Data Lake Store
HDFS for the cloud
A new file system built from the ground up, based on the Hadoop file system
• Integrates with HDInsight, Hortonworks and Cloudera
• Supports file and folder objects and operations
Azure Data Lake Store
Unlimited storage
• File sizes can range from gigabytes to petabytes
• No limits to scale
Azure Data Lake Store
Security
• Always encrypted: in motion using SSL, and at rest using keys in Azure Key Vault
• Single sign-on, multi-factor authentication and seamless integration of on-premises identities with Active Directory
• Fine-grained POSIX-based ACLs for role-based access control
• Auditing of every access and configuration change
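The following Python sketch illustrates what "POSIX-based" permissions mean in practice. It shows how classic POSIX permission bits are evaluated for a read, write, or execute request. The sketch is deliberately simplified: it checks only the owner, owning-group, and other classes. It ignores named user and group entries and the ACL mask, which full POSIX ACLs also support. The function `has_access` and its parameters are hypothetical and not part of any Azure API.

```python
def has_access(perm: str, owner: str, group: str,
               user: str, user_groups: set, op: str) -> bool:
    """Evaluate a POSIX-style permission string such as "rwxr-x---".

    Simplified sketch: only the owner / owning-group / other classes are
    checked; named user and group entries and the ACL mask are ignored.
    """
    if len(perm) != 9:
        raise ValueError("expected a 9-character permission string")
    if op not in ("r", "w", "x"):
        raise ValueError("op must be 'r', 'w' or 'x'")
    position = "rwx".index(op)
    # POSIX picks exactly one class: the first one that matches the caller.
    if user == owner:
        triplet = perm[0:3]   # owner class
    elif group in user_groups:
        triplet = perm[3:6]   # owning-group class
    else:
        triplet = perm[6:9]   # other class
    return triplet[position] == op
```

Consider `has_access("rwxr-x---", "alice", "analysts", "bob", {"analysts"}, "w")`. It returns `False`: bob matches the owning-group class, and that class grants only read and execute.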
DATA LAKE ANALYTICS
Azure Data Lake Analytics
An elastic analytics service built on Apache YARN that processes all data, at any size
• No limits to SCALE
• Includes U-SQL, a language that unifies the
benefits of SQL with the expressive power of C#
• Optimized to work with ADL STORE
• FEDERATED QUERY across Azure data sources
• ENTERPRISE READY: role-based access control and auditing
• Pay PER JOB & Scale PER JOB
U-SQL
A new language for Big Data
• Familiar syntax to millions of SQL and .NET developers
• Unifies the declarative nature of SQL with the imperative power of C#
• Unifies structured, semi-structured and unstructured data
• Distributed query support over all data
Language Overview
U-SQL Fundamentals
• All the familiar SQL clauses
SELECT | FROM | WHERE
GROUP BY | JOIN | OVER
• Operate on unstructured and
structured data
• Relational metadata objects
.NET integration and extensibility
• U-SQL expressions are full C# expressions
• Reuse .NET code in your own assemblies
• Use C# to define your own:
Types | Functions | Joins | Aggregators | I/O (Extractors, Outputters)
U-SQL Capabilities
• Interactive: IN PROGRESS
• Batch: AVAILABLE NOW
• Streaming: FUTURE
• Machine Learning: FUTURE
U-SQL Distributed Query
[Diagram: READ and WRITE arrows between U-SQL and each of: Azure Storage Blobs, Azure Data Lake Store, Azure SQL Database, Azure SQL Data Warehouse, Azure SQL DB in Azure VM]
Develop massively parallel programs with simplicity
• U-SQL: a simple and powerful language that's familiar and easily extensible
• Unifies the declarative nature of SQL with the expressive power of C#
• Leverage existing libraries in .NET languages, R and Python
• Massively parallelize code on diverse workloads (ETL, ML, image tagging, facial detection)
Read the input and write it directly to the output (just a simple copy):

@orders =                           // @orders is a rowset
    EXTRACT                         // apply schema on read
        OrderId int,
        Customer string,
        Date DateTime,
        Amount float
    FROM "/input/orders.txt"        // from a file in a Data Lake
    USING Extractors.Tsv();         // easy delimited text handling

OUTPUT @orders                      // write out
    TO "/output/orders_copy.txt"
    USING Outputters.Tsv();
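The Python sketch below illustrates "schema on read" outside of U-SQL. It uses a local analogue of the EXTRACT/OUTPUT copy script above. The file stores only tab-separated text, and the column names and types are applied while reading. This is not how Extractors.Tsv() is implemented. The names `ORDER_SCHEMA`, `extract_tsv` and `output_tsv` are hypothetical, and the sketch assumes dates are stored in ISO-8601 form.

```python
import csv
import io
from datetime import datetime

# Hypothetical schema mirroring the EXTRACT clause: (column, converter).
ORDER_SCHEMA = [
    ("OrderId", int),
    ("Customer", str),
    ("Date", datetime.fromisoformat),  # assumes ISO-8601 dates in the file
    ("Amount", float),
]

def extract_tsv(text: str, schema):
    """Apply a schema while reading tab-separated text (schema on read)."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(fields) != len(schema):
            raise ValueError(f"expected {len(schema)} columns, got {len(fields)}")
        rows.append({name: convert(value)
                     for (name, convert), value in zip(schema, fields)})
    return rows

def output_tsv(rows, schema) -> str:
    """Write typed rows back out as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row[name] for name, _ in schema])
    return buf.getvalue()
```

Nothing about the file's contents has to change if a different schema is applied later. Only the list of converters passed to `extract_tsv` changes.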
U-SQL Compilation Process
[Diagram: the Compiler & Optimizer, backed by the U-SQL Metadata Service, produces the compilation output in the job folder: C# (managed dll), C++ (unmanaged dll), Algebra, and other files (system files, deployed resources), which are deployed to the vertices]
Logical -> Physical Plan
• Each square ("a vertex") represents a fraction of the total work.
• Vertices in each SuperVertex (aka "Stage") perform the same operation on different parts of the same data.
• Vertices in a later stage may depend on a vertex in an earlier stage.
Execution with Requested Parallelism
• Requested Parallelism = 1 (reserve enough to run 1 vertex at a time)
• Requested Parallelism = 4 (reserve enough to run 4 vertices at a time)
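The following toy model in Python shows why requested parallelism matters. It estimates how many scheduling "waves" are needed to run every vertex of a job. The simplifying assumptions are not how the real ADLA scheduler works:

- every vertex takes one time unit;
- a stage starts only after all vertices of the previous stage finish.

The function name `estimate_waves` is hypothetical.

```python
import math

def estimate_waves(stage_vertex_counts: list, parallelism: int) -> int:
    """Rough number of scheduling waves needed to run all vertices.

    Toy model: each vertex takes one time unit, at most `parallelism`
    vertices run at once, and stages run strictly one after another.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return sum(math.ceil(count / parallelism) for count in stage_vertex_counts)
```

Take a job with stages of 8, 4 and 1 vertices. It needs 13 waves at parallelism 1 and 4 waves at parallelism 4. The final single-vertex stage cannot be sped up, so in this model more parallelism does not give a proportional speed-up.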
Query Life
[Diagram: components involved in a query's life: Visual Studio, Portal / API, Front-End Service, Job Scheduler & Queue, Compiler, Optimizer, Vertex Scheduling, Runtime]
Stage Details
• 252 pieces of work
• Average vertex execution time
• 4.3 billion rows
• Data read and written
ADLAUs
ADLAU = Azure Data Lake Analytics Unit
• Parallelism N = N ADLAUs
• 1 ADLAU ≈ a VM with 2 cores and 6 GB of memory
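The small Python sketch below applies the sizing rule from this slide (1 ADLAU ≈ 2 cores and 6 GB of memory) to a requested parallelism. The `adlau_hours` helper assumes that per-job usage scales with reserved ADLAUs multiplied by job duration. That is an assumption for illustration, not a statement of Azure pricing. All names are hypothetical.

```python
from dataclasses import dataclass

# Approximate size of one ADLAU, as stated on the slide.
CORES_PER_ADLAU = 2
MEMORY_GB_PER_ADLAU = 6

@dataclass
class Reservation:
    adlaus: int
    cores: int
    memory_gb: int

def reserve(parallelism: int) -> Reservation:
    """Parallelism N reserves N ADLAUs (roughly N 2-core, 6 GB VMs)."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return Reservation(parallelism,
                       parallelism * CORES_PER_ADLAU,
                       parallelism * MEMORY_GB_PER_ADLAU)

def adlau_hours(parallelism: int, runtime_hours: float) -> float:
    """ADLAU-hours used, assuming usage = reserved ADLAUs x duration."""
    return parallelism * runtime_hours
```

For example, `reserve(4)` corresponds to roughly 8 cores and 24 GB of memory.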
UX job states: Preparing → Queued → Running → Finalizing → Ended (Succeeded, Failed, Cancelled)
Detailed job states: New → Compiling → Queued → Scheduling → Starting → Running → Ended
UX Job State
• Preparing: the script is being compiled by the Compiler Service.
• Queued: all jobs enter the queue. Are there enough ADLAUs to start the job? If yes, those ADLAUs are allocated to the job.
• Running / Finalizing: the U-SQL runtime is executing the code on one or more ADLAUs, or finalizing the outputs.
• Ended: the job has concluded.
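The Python sketch below models the UX job states from this slide as a small state machine. The happy path runs Preparing → Queued → Running → Finalizing → Ended (Succeeded). The `stop` method assumes a job can end early as Failed or Cancelled from any non-ended state; the slide does not spell this out. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class JobState(Enum):
    PREPARING = "Preparing"    # script compiled by the Compiler Service
    QUEUED = "Queued"          # waiting for enough ADLAUs
    RUNNING = "Running"        # U-SQL runtime executing on 1+ ADLAUs
    FINALIZING = "Finalizing"  # outputs being finalized
    ENDED = "Ended"            # job has concluded

class JobResult(Enum):
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    CANCELLED = "Cancelled"

HAPPY_PATH = [JobState.PREPARING, JobState.QUEUED, JobState.RUNNING,
              JobState.FINALIZING, JobState.ENDED]

@dataclass
class Job:
    name: str
    state: JobState = JobState.PREPARING
    result: Optional[JobResult] = None

    def advance(self) -> None:
        """Move to the next state; reaching Ended this way means success."""
        if self.state is JobState.ENDED:
            raise ValueError("job has already ended")
        self.state = HAPPY_PATH[HAPPY_PATH.index(self.state) + 1]
        if self.state is JobState.ENDED:
            self.result = JobResult.SUCCEEDED

    def stop(self, result: JobResult) -> None:
        """End the job early (assumed allowed from any non-ended state)."""
        if self.state is JobState.ENDED:
            raise ValueError("job has already ended")
        self.state = JobState.ENDED
        self.result = result
```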
Why does a job get queued?
Local cause
• Queue already at max concurrency
Global causes
• System-wide shortage of ADLAUs
• System-wide shortage of bandwidth
* If a global condition is met, the job is queued even if the queue is not at its max concurrency.
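The queuing rules above can be sketched as a simple Python decision function. It returns the reason a job would wait, or `None` if it could start. The local check comes first; either global shortage queues the job even when the queue has spare concurrency. The data classes and field names are hypothetical and do not reflect how ADLA exposes this information.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QueueStatus:
    running_jobs: int
    max_concurrency: int

@dataclass
class SystemStatus:
    free_adlaus: int
    bandwidth_available: bool

def queue_reason(requested_adlaus: int, queue: QueueStatus,
                 system: SystemStatus) -> Optional[str]:
    """Return why a job would be queued, or None if it can start now."""
    if queue.running_jobs >= queue.max_concurrency:
        return "local: queue at max concurrency"
    if system.free_adlaus < requested_adlaus:
        return "global: system-wide shortage of ADLAUs"
    if not system.bandwidth_available:
        return "global: system-wide shortage of bandwidth"
    return None
```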
DATA LAKE ANALYTICS
Visual Studio
Azure Data Lake – Visual Studio
Available project types
Azure Data Lake – Visual Studio
Fully integrated with Solution Explorer
Azure Data Lake – Visual Studio
• Monitor and manage jobs
• Browse and manage storage
• Browse the U-SQL catalog
CREATING U-SQL
Creating U-SQL
IntelliSense Supported
Creating U-SQL
Code-behind to enhance your code
Debug and optimize your Big Data programs with ease
• Deep integration with Visual Studio, Visual Studio Code, Eclipse, and IntelliJ
• Easy for novices to write simple queries
• Integrated with U-SQL, Hive, Storm, and Spark
• Actively offers recommendations to improve performance and reduce cost
• Playback visually displays job run
USING VISUAL STUDIO
Demo
Installing Azure PowerShell
• PowerShell Gallery (recommended approach)
  • PowerShell 5.0 supports the PowerShell Gallery
  • Windows 10 ships with PowerShell 5.0
• Web Platform Installer (WebPI)
Installing from the PowerShell Gallery
• Launch Windows PowerShell ISE as
Administrator
• Install-Module AzureRM
• Install-AzureRM
Finding the ADL cmdlets
• Option 1
• Get-Command -Module AzureRM.DataLakeStore
• Get-Command -Module AzureRM.DataLakeAnalytics
• Option 2
• Get-Command *DataLake*
Logging in to Azure
• Launch Windows PowerShell ISE
• $subname = "Your Subscription Name"
• Login-AzureRmAccount -SubscriptionName $subname
ADLS: Listing files in a store
• $adls = "mscloudsummitstore"
• Get-AzureRmDataLakeStoreChildItem -Account $adls -Path /
ADLS: Upload and download
• $adls = "mscloudsummitstore"
• Import-AzureRmDataLakeStoreItem -Account $adls `
    -Path d:\somefile.txt `
    -Destination /somefile.txt
• Export-AzureRmDataLakeStoreItem -Account $adls `
    -Path /somefile.txt `
    -Destination d:\somefile_copy.txt
ADLA: List and submit jobs
• $adla = "mscloudsummitanalytics"
• Get-AzureRmDataLakeAnalyticsJob -Account $adla
• Submit-AzureRmDataLakeAnalyticsJob -Account $adla `
    -Name myjob `
    -Script "…"   # U-SQL text
• Submit-AzureRmDataLakeAnalyticsJob -Account $adla `
    -Name myjob `
    -ScriptPath D:\test.script
ADL Store (ADLS) feature set
• Account management: create new account, list accounts, update account properties, delete account
• Transferring data: upload into the store from local disk, download from the store to local disk
• Files and folders: list contents of a folder, create, move, delete, test whether a file exists
• Security: get ACLs, update ACLs, get owner, set owner
• File content: set file content, append file content, get file content, merge files
ADL Analytics (ADLA) feature set
• Account management: create new account, list accounts, update account properties, delete account
• Data sources: add, list, update and delete data sources
• Compute: list jobs, submit job, cancel job
• Catalog items: list items in the U-SQL catalog, update item
• Catalog secrets: create, list and delete catalog secrets
USING ADL POWERSHELL
DEMO
COGNITIVE ANALYSIS OF IMAGES
Install samples and assemblies
Running sample
COGNITIVE ANALYSIS OF IMAGES
Demo
Additional capabilities and resources
Tools:
• http://aka.ms/adltoolsVS
Blogs and community page:
• http://funkylab.com/
• http://blogs.msdn.com/b/visualstudio/
• http://azure.microsoft.com/en-us/blog/topics/big-data/
• https://channel9.msdn.com/Search?term=U-SQL#ch9Search
• https://blogs.msdn.microsoft.com/azuredatalake/2016/11/22/u-sql-advanced-analytics-introducing-cognitive-scenarios-for-text-and-imaging/
Documentation and articles and slides:
• http://aka.ms/usql_reference
• https://azure.microsoft.com/en-us/documentation/services/data-lake-analytics/
• https://msdn.microsoft.com/en-us/magazine/mt614251
GitHub: Get started
• https://Github.com
ADL forums and feedback
• http://aka.ms/adlfeedback
• https://social.msdn.microsoft.com/Forums/azure/en-US/home?forum=AzureDataLake
• http://stackoverflow.com/questions/tagged/u-sql
QUESTIONS
Register for SQL Nexus 2017
Thank you to all our sponsors!
Join the conversation
#MSCloudSummit
@MSCloudSummit
Thank you!
http://bit.ly/MSCSevalJ1
Rate the sessions…
…and try to win a Surface Pro 4

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 

Kürzlich hochgeladen (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen

  • 1. Data Solution Architect, Microsoft AZURE DATA LAKE Store and Analytics Big Data for Microsoft Developers Kenneth M. Nielsen @doktorkermit
  • 2. Kenneth M. Nielsen • Worked with SQL Server since 1999 • Co-organizer of SQL Saturday DK • Co-organizer of SQLNexus Nordic • Community is Everything • Data Solution Architect at Microsoft • kmn@funkylab.com • @doktorkermit • www.funkylab.com
  • 3. Agenda • Azure Data Lake overview • Azure Data Lake Store • Azure Data Lake Analytics • Azure Data Lake Analytics – Using Visual Studio • Azure Data Lake Analytics – Using PowerShell • Azure Data Lake Analytics – Cognitive Analysis • Q & A
  • 5. History Bing needed to… – Understand user behavior And do it… – At massive scale – With agility and speed – At low cost So they built … – Cosmos Cosmos • Batch Jobs • Interactive • Machine Learning • Streaming Thousands of Developers
  • 6. AZURE DATA LAKE • Store and analyze data of any kind and size • Develop faster, debug and optimize smarter • Interactively explore patterns in your data • No learning curve • Managed and supported • Dynamically scales to match your business priorities • Enterprise-grade security • Built on YARN, designed for the cloud
  • 8. Azure Data Lake Store: a hyperscale repository for big data analytics workloads • No limits to scale • Store any data in its native format • Hadoop File System (HDFS) for the cloud • Enterprise-ready access control and encryption at rest • Performance optimized for analytic workloads
  • 9. Azure Data Lake Store Any Data • Unstructured • Semi-structured • Structured
  • 11. Azure Data Lake Store: HDFS for the cloud • A new file system built from the ground up, based on the Hadoop File System • Integrates with HDInsight, Hortonworks and Cloudera • Supports file and folder objects and operations
  • 12. Azure Data Lake Store: unlimited storage • File sizes can range from gigabytes to petabytes • No limits to scale
  • 13. Azure Data Lake Store security • Always encrypted: in motion using SSL, and at rest using keys in Azure Key Vault • Single sign-on, multi-factor authentication and seamless integration of on-premises identities with Active Directory • Fine-grained POSIX-based ACLs for role-based access control • Auditing of every access and configuration change
  • 15. Azure Data Lake Analytics: an elastic analytics service built on Apache YARN that processes all data, at any size • No limits to scale • Includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C# • Optimized to work with ADL Store • Federated query across Azure data sources • Enterprise-ready role-based access control and auditing • Pay per job and scale per job
  • 16. U-SQL A new language for Big Data • Familiar syntax to millions of SQL & .NET developers • Unifies declarative nature of SQL with the imperative power of C# • Unifies structured, semi-structured and unstructured data • Distributed query support over all data
  • 17. Language overview. U-SQL fundamentals: • All the familiar SQL clauses: SELECT | FROM | WHERE | GROUP BY | JOIN | OVER • Operate on unstructured and structured data • Relational metadata objects. .NET integration and extensibility: • U-SQL expressions are full C# expressions • Reuse .NET code in your own assemblies • Use C# to define your own types | functions | joins | aggregators | I/O (extractors, outputters)
  • 19. U-SQL distributed query (diagram): U-SQL connected by READ and WRITE arrows to Azure Storage Blobs, Azure Data Lake Store, Azure SQL Database, Azure SQL Data Warehouse, and Azure SQL DB in an Azure VM
  • 20. Develop massively parallel programs with simplicity • U-SQL: a simple and powerful language that’s familiar and easily extensible • Unifies the declarative nature of SQL with expressive power of C# • Leverage existing libraries in .NET languages, R and Python • Massively parallelize code on diverse workloads (ETL, ML, image tagging, facial detection)
  • 21. U-SQL: read the input and write it directly to the output (just a simple copy)
```
// Apply schema on read, from a file in a Data Lake; @orders is a rowset
@orders =
    EXTRACT OrderId int,
            Customer string,
            Date DateTime,
            Amount float
    FROM "/input/orders.txt"
    USING Extractors.Tsv();   // easy delimited text handling

// Write out
OUTPUT @orders
TO "/output/orders_copy.txt"
USING Outputters.Tsv();
```
  • 22. U-SQL compilation process (diagram): the Compiler & Optimizer, working with the U-SQL Metadata Service, produces C# code (compiled to a managed DLL), C++ code (compiled to an unmanaged DLL), algebra, and other files (system files, deployed resources). This compilation output is stored in the job folder and deployed to the vertices.
  • 23. Logical -> physical plan • Each square ("a vertex") represents a fraction of the total work • Vertices in each SuperVertex (aka "stage") perform the same operation on different parts of the same data • Vertices in a later stage may depend on a vertex in an earlier stage
  • 24. Execution with requested parallelism • Requested parallelism = 1: reserve enough resources to run 1 vertex at a time • Requested parallelism = 4: reserve enough resources to run 4 vertices at a time
  • 25. Query life (diagram components): Visual Studio, Portal / API, Front-End Service, Compiler, Optimizer, Job Scheduler & Queue, Vertex Scheduling, Runtime
  • 26. Stage details (diagram callouts): 252 pieces of work • Average vertex execution time • 4.3 billion rows • Data read and written
  • 27. ADLAUs (Azure Data Lake Analytics Units) • Parallelism N = N ADLAUs • 1 ADLAU ≈ a VM with 2 cores and 6 GB of memory
  • 28. Job states. UX states: Preparing, Queued, Running, Finalizing, Ended (Succeeded, Failed, Cancelled). Internal job states: New, Compiling, Queued, Scheduling, Starting, Running, Ended. • Preparing: the script is being compiled by the Compiler Service • Queued: all jobs enter the queue. Are there enough ADLAUs to start the job? If yes, those ADLAUs are allocated to the job • Running / Finalizing: the U-SQL runtime is executing the code on one or more ADLAUs, or finalizing the outputs • Ended: the job has concluded
  • 29. Why does a job get queued? Local cause: • The queue is already at max concurrency. Global causes: • System-wide shortage of ADLAUs • System-wide shortage of bandwidth. If a global cause applies, the job is queued even if its queue is not at max concurrency.
  • 31. Azure Data Lake – Visual Studio Available project types
  • 32. Azure Data Lake – Visual Studio: fully integrates with Solution Explorer
  • 33. Azure Data Lake – Visual Studio • Monitor and manage jobs • Browse and manage storage • Browse U-SQL catalog
  • 36. Creating U-SQL code-behind to enhance your code
  • 37. Debug and Optimize your Big Data programs with ease • Deep integration with Visual Studio, Visual Studio Code, Eclipse, & IntelliJ • Easy for novices to write simple queries • Integrated with U-SQL, Hive, Storm, and Spark • Actively offers recommendations to improve performance and reduce cost • Playback visually displays job run
  • 39. Installing Azure PowerShell • PowerShell Gallery (recommended approach): PowerShell 5.0 supports the PowerShell Gallery, and Windows 10 ships with PowerShell 5.0 • Web Platform Installer (WebPI)
  • 40. Installing from the PowerShell Gallery • Launch Windows PowerShell ISE as Administrator • Install-Module AzureRM • Install-AzureRM
  • 41. Finding the ADL cmdlets • Option 1 • Get-Command -Module AzureRM.DataLakeStore • Get-Command -Module AzureRM.DataLakeAnalytics • Option 2 • Get-Command *DataLake*
  • 42. Logging in to Azure • Launch Windows PowerShell ISE • $subname = "Your Subscription Name" • Login-AzureRmAccount -SubscriptionName $subname
  • 43. ADLS: Listing files in a store • $adls = "mscloudsummitstore" • Get-AzureRmDataLakeStoreChildItem -Account $adls -Path /
  • 44. ADLS: Upload and download • $adls = "mscloudsummitstore" • Import-AzureRmDataLakeStoreItem -Account $adls -Path d:\somefile.txt -Destination /somefile.txt • Export-AzureRmDataLakeStoreItem -Account $adls -Path /somefile.txt -Destination d:\somefile_copy.txt
  • 45. ADLA: List and submit jobs • $adla = "mscloudsummitanalytics" • Get-AzureRmDataLakeAnalyticsJob -Account $adla • Submit-AzureRmDataLakeAnalyticsJob -Account $adla -Name myjob -Script "…"   # inline U-SQL text • Submit-AzureRmDataLakeAnalyticsJob -Account $adla -Name myjob -ScriptPath D:\test.script
  • 46. ADL Store (ADLS) feature set • Account management: create new account, list accounts, update account properties, delete account • Transferring data: upload into store from local disk, download from store to local disk • Files and folders: list contents of folder, create, move, delete, does file exist • Security: get ACLs, update ACLs, get owner, set owner • File content: set file content, append file content, get file content, merge files
  • 47. ADL Analytics (ADLA) feature set • Account management: create new account, list accounts, update account properties, delete account • Data sources: add a data source, list data sources, update data source, delete data source • Compute: list jobs, submit job, cancel job • Catalog items: list items in U-SQL catalog, update item • Catalog secrets: create catalog secret, list catalog secrets, delete catalog secrets
  • 50. Install samples and assemblies
  • 53. COGNITIVE ANALYSIS OF IMAGES Demo
  • 54. Additional capabilities and resources. Tools: • http://aka.ms/adltoolsVS Blogs and community pages: • http://funkylab.com/ • http://blogs.msdn.com/b/visualstudio/ • http://azure.microsoft.com/en-us/blog/topics/big-data/ • https://channel9.msdn.com/Search?term=U-SQL#ch9Search • https://blogs.msdn.microsoft.com/azuredatalake/2016/11/22/u-sql-advanced-analytics-introducing-cognitive-scenarios-for-text-and-imaging/ Documentation, articles and slides: • http://aka.ms/usql_reference • https://azure.microsoft.com/en-us/documentation/services/data-lake-analytics/ • https://msdn.microsoft.com/en-us/magazine/mt614251 Get started on GitHub: • https://Github.com ADL forums and feedback: • http://aka.ms/adlfeedback • https://social.msdn.microsoft.com/Forums/azure/en-US/home?forum=AzureDataLake • http://stackoverflow.com/questions/tagged/u-sql
  • 56. Register for SQL Nexus 2017
  • 58. Thank you to all our sponsors! Join the conversation #MSCloudSummit @MSCloudSummit
  • 59. Thank you! Join the conversation #MSCloudSummit @MSCloudSummit
  • 60. http://bit.ly/MSCSevalJ1 Evaluate the sessions… …and try to win a Surface Pro 4
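Slide 11 calls the store "HDFS for the cloud", and slide 46 lists its file and folder operations. Data Lake Store exposed a WebHDFS-compatible REST endpoint of the form `https://<account>.azuredatalakestore.net/webhdfs/v1/<path>?op=<OPERATION>`. The Python sketch below only builds request lines for a few of slide 46's operations, using the standard WebHDFS operation names and HTTP methods; it sends nothing and leaves out authentication (an Azure AD bearer token) entirely. The mapping table and `build_request` are hypothetical helpers, so check the service's REST reference before relying on any particular operation.

```python
from urllib.parse import quote, urlencode

# Hypothetical mapping from slide 46 operations to standard WebHDFS
# (operation name, HTTP method) pairs; an illustration, not an official table.
WEBHDFS_OPS = {
    "list folder contents": ("LISTSTATUS", "GET"),
    "create folder": ("MKDIRS", "PUT"),
    "move": ("RENAME", "PUT"),
    "delete": ("DELETE", "DELETE"),
    "does file exist": ("GETFILESTATUS", "GET"),
    "get ACLs": ("GETACLSTATUS", "GET"),
    "get file content": ("OPEN", "GET"),
}


def build_request(account: str, path: str, action: str, **params: str) -> tuple[str, str]:
    """Return (HTTP method, URL) for a WebHDFS-style call. No request is sent
    and no authentication header is built."""
    op, method = WEBHDFS_OPS[action]
    query = urlencode({"op": op, **params})           # e.g. op=RENAME&destination=...
    encoded_path = quote(path.lstrip("/"))            # keeps "/" separators, escapes the rest
    url = f"https://{account}.azuredatalakestore.net/webhdfs/v1/{encoded_path}?{query}"
    return method, url
```

For example, `build_request("mscloudsummitstore", "/input/orders.txt", "move", destination="/archive/orders.txt")` yields a `PUT` whose query string carries `op=RENAME` and the URL-encoded destination path.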
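Slide 13 mentions fine-grained POSIX-based ACLs. As a rough illustration of the POSIX model those ACLs build on, here is a minimal Python sketch of the classic owner/group/other permission-bit check. It deliberately ignores named-user and named-group ACL entries and the ACL mask, and `can_access` is a hypothetical helper; it is not how Azure Data Lake Store evaluates ACLs internally.

```python
# Simplified POSIX permission-bit check (owner / group / other only).
# Hypothetical helper: real POSIX ACLs add named user/group entries
# and a mask, none of which are modelled here.
READ, WRITE, EXECUTE = 4, 2, 1


def can_access(mode, user, user_groups, owner, group, requested):
    """Return True if `user` may perform `requested` (a READ/WRITE/EXECUTE bitmask)
    on an item with octal permission bits `mode`, e.g. 0o750 for rwxr-x---."""
    if user == owner:
        bits = (mode >> 6) & 0o7   # owner class is used exclusively for the owner
    elif group in user_groups:
        bits = (mode >> 3) & 0o7   # owning-group class
    else:
        bits = mode & 0o7          # everyone else
    return (bits & requested) == requested
```

With mode `0o750` (`rwxr-x---`), the owner may read and write, a member of the owning group may read but not write, and anyone else gets nothing. Checking the owner class first and using it exclusively matches POSIX semantics, even when the group bits would grant more.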
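The U-SQL script on slide 21 applies a schema at read time: the file stays plain tab-separated text, and column names and types are imposed only when `EXTRACT` reads it. The Python sketch below mimics that idea with the standard `csv` module so the concept can be tried without an Azure account. The schema list, the `extract_tsv`/`output_tsv` names and the ISO 8601 date format are assumptions for illustration, not U-SQL's implementation, and the conversion is far less strict than the built-in `Extractors.Tsv()`.

```python
import csv
import io
from datetime import datetime

# Hypothetical schema mirroring the slide's EXTRACT clause:
#   OrderId int, Customer string, Date DateTime, Amount float
ORDER_SCHEMA = [
    ("OrderId", int),
    ("Customer", str),
    ("Date", datetime.fromisoformat),  # assumption: dates are stored as ISO 8601 text
    ("Amount", float),
]


def extract_tsv(text, schema):
    """Schema on read: turn each tab-separated line into a dict of typed values."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:  # skip blank lines
            continue
        if len(fields) != len(schema):
            raise ValueError(f"expected {len(schema)} columns, got {len(fields)}")
        rows.append({name: convert(value)
                     for (name, convert), value in zip(schema, fields)})
    return rows


def output_tsv(rows, schema):
    """Write typed rows back out as tab-separated text, columns in schema order."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([
            row[name].isoformat() if isinstance(row[name], datetime) else row[name]
            for name, _ in schema
        ])
    return buf.getvalue()
```

Unlike this sketch, a U-SQL job does not load the whole file into one list; the service can split the work across vertices. The point is only that the schema lives in the script, not in the stored file.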
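Slides 24, 26 and 27 relate requested parallelism, vertices and ADLAUs. The Python sketch below turns those statements into back-of-the-envelope arithmetic. It takes slide 27 at face value (parallelism N reserves N ADLAUs of roughly 2 cores and 6 GB each) and additionally assumes that every vertex in a stage takes the same time and that a stage runs in "waves" of at most N concurrent vertices. Real scheduling is more dynamic, so treat the results as idealized lower bounds rather than predictions. The function names are hypothetical.

```python
import math

CORES_PER_ADLAU = 2      # slide 27: 1 ADLAU ~= a VM with 2 cores ...
MEMORY_GB_PER_ADLAU = 6  # ... and 6 GB of memory


def reserved_resources(parallelism: int) -> tuple[int, int]:
    """Parallelism N = N ADLAUs; return the approximate (cores, memory in GB) reserved."""
    return parallelism * CORES_PER_ADLAU, parallelism * MEMORY_GB_PER_ADLAU


def stage_lower_bound_seconds(vertices: int, parallelism: int, avg_vertex_seconds: float) -> float:
    """Assumption: equal-length vertices run in waves of at most `parallelism` at a time."""
    waves = math.ceil(vertices / parallelism)
    return waves * avg_vertex_seconds


def au_hours(parallelism: int, elapsed_seconds: float) -> float:
    """AU-hours consumed if all N ADLAUs stay reserved for the whole elapsed time."""
    return parallelism * elapsed_seconds / 3600
```

With the 252 vertices from slide 26 and a made-up average of 30 seconds per vertex, parallelism 1 needs 252 waves (7,560 s) while parallelism 4 needs 63 waves (1,890 s); in this idealized model both consume 2.1 AU-hours, which is why raising parallelism can shorten a job without necessarily raising its pay-per-job cost (slide 15). When a stage has fewer vertices than reserved ADLAUs, the extra capacity sits idle and the AU-hours grow without any speed-up.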
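Slide 28 lists the internal job states New, Compiling, Queued, Scheduling, Starting, Running and Ended, with Ended split into Succeeded, Failed and Cancelled. A small Python state machine makes that ordering explicit. The rule that a job may fail or be cancelled from any state before Ended is an assumption, as are the class and method names; this is an illustration, not the service's implementation.

```python
from enum import Enum


class JobState(Enum):
    NEW = "New"
    COMPILING = "Compiling"
    QUEUED = "Queued"
    SCHEDULING = "Scheduling"
    STARTING = "Starting"
    RUNNING = "Running"
    ENDED = "Ended"


class JobResult(Enum):
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


# Happy-path order of the internal states: the definition order above, as on the slide.
HAPPY_PATH = list(JobState)


class Job:
    """Hypothetical job tracker that only allows the slide's forward order."""

    def __init__(self):
        self.state = JobState.NEW
        self.result = None  # becomes a JobResult once the job has ended

    def advance(self):
        """Move one step along the happy path; reaching ENDED counts as success."""
        if self.state is JobState.ENDED:
            raise RuntimeError("job has already ended")
        self.state = HAPPY_PATH[HAPPY_PATH.index(self.state) + 1]
        if self.state is JobState.ENDED:
            self.result = JobResult.SUCCEEDED
        return self.state

    def end(self, result):
        """Assumption: a job can fail or be cancelled from any state before ENDED."""
        if self.state is JobState.ENDED:
            raise RuntimeError("job has already ended")
        if result is JobResult.SUCCEEDED:
            raise ValueError("use advance() to finish a job successfully")
        self.state = JobState.ENDED
        self.result = result
```

The UX states on the same slide (Preparing, Queued, Running, Finalizing, Ended) group these internal states for display; the slide does not spell out the exact mapping, so the sketch leaves it out.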
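Slide 29's queueing rules can be read as a simple predicate: a job waits if its queue is already at max concurrency (local cause), or if the system as a whole is short of ADLAUs or bandwidth (global causes), regardless of the queue's own concurrency. The Python sketch below encodes exactly that reading. The parameter names, and modelling system capacity as a free-ADLAU count plus a bandwidth flag, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueueStatus:
    running_jobs: int
    max_concurrency: int


@dataclass
class SystemStatus:
    free_adlaus: int           # hypothetical: ADLAUs currently available system-wide
    bandwidth_available: bool  # hypothetical: False models a system-wide bandwidth shortage


def queue_reasons(queue: QueueStatus, system: SystemStatus, requested_adlaus: int) -> list[str]:
    """Return the reasons a job requesting `requested_adlaus` would be queued.
    An empty list means the job can start."""
    reasons = []
    if queue.running_jobs >= queue.max_concurrency:
        reasons.append("local: queue at max concurrency")
    # Global causes apply even when the queue is below its max concurrency.
    if system.free_adlaus < requested_adlaus:
        reasons.append("global: system-wide shortage of ADLAUs")
    if not system.bandwidth_available:
        reasons.append("global: system-wide shortage of bandwidth")
    return reasons
```

Returning the list of reasons, rather than a bare boolean, mirrors the slide's point that a job can be queued for more than one cause at once.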