SlideShare ist ein Scribd-Unternehmen logo
1 von 42
Agenda
3
Data sourcesNon-relational data
DESIGNED FOR THE
QUESTIONS YOU KNOW!
The Data Lake Approach
Ingest all data
regardless of
requirements
Store all data
in native format
without schema
definition
Do analysis
Hadoop, Spark, R,
Azure Data Lake
Analytics (ADLA)
Interactive queries
Batch queries
Machine Learning
Data warehouse
Real-time analytics
Devices
HDFS Compatible REST API
ADL Store
.NET, SQL, Python, R
scaled out by U-SQL
ADL Analytics
Open Source Apache
Hadoop ADL Client
Azure Databricks
HDInsight
Hive
• Performance at
scale
• Optimized for
analytics
• Multiple
analytics
engines
• Single
repository
sharing
Governance and
Master Data
Management
Azure SQL Data Warehouse
Data Quality
and Lineage
ERP,
CRM,
and
other
LOB
Data
OLTP
and
other
RDBMS
Clickstrea
m Logs
and
Events
Sensors,
Social,
Weather,
other
un-
structure
d data
ETL
Azure
Data
Lake
Analytics
(U-SQL)
Azure Data Lake Store
Spark
on HDI,
DataBri
cks
BI Models
Reports and
Dashboards
Apache
Hadoo
p on
HDInsi
ght
Polybase
Analyst
Power User
Data Engineer
Data Scientist
Big Data Warehouse
Streaming Layer
Clean,
Curate,
Aggregate
Combine
reference
data
Perform
Scoring
from ML
models
IoT
Sensors
and/or
User
activity
streams
Social,
Trends,
Weathe
r etc.
Clickstrea
m, Batch
Files,
server
logs,
Images,
videos,
and other
unstructur
ed data
Event
Broker
(Event
Hubs,
Apache
Kafka)
Azure
Data
Lake
Analytics
(U-SQL)
Azure Data Lake Store
Spark
on HDI,
Databri
cks
Event
Broker
Realtime
Dashboards
Apache
Hadoop
on
HDInsig
ht
Analyst
Data Engineer
Data Scientist
Trained
Machine
Learning
Models
Reference Data
Realtime Processing with Lambda Architecture
Automated
Systems
Data Lake Analytics Workloads
With BATCH workload, Data Lake Analytics is ideal for
• The transformation and preparation of data for use in other systems
• Analytics on VERY LARGE amounts of data
• Massively Parallel programs written in .NET, Python and R, scaled out with U-
SQL
• Performing Cognition at Scale on large collections
Scales out your custom code in .NET, Python, R over
your Data Lake
Familiar syntax to millions of SQL & .NET developers
Unifies
• Declarative nature of SQL with the imperative
power of your programming language
• Processing of structured, semi-structured and
unstructured data
• Querying multiple Azure Data Sources
(Federated Query)
Introducing U-SQL
A framework for Big Data
U-SQL can query data from multiple sources in Azure.
Where possible data transformation is pushed close to
the remote query engine to minimize data transfer and
maximize performance.
Easily query data in multiple Azure data stores without
moving it to a single store
U-SQL
Query
Query
Azure
Storage Blobs
Azure SQL
in VMs
Azure
SQL DB
Azure Data
Lake Analytics
Azure
SQL Data Warehouse
Azure
Data Lake Storage
Embedded Artificial Intelligence
Host Deep Neural Networks (DNNs)
6 Built-in Cognitive Functions
 Face API
 Image Tagging
 Emotion analysis
 OCR
 Text Key Phrase Extraction
 Text Sentiment Analysis
• Fully managed service to support orchestration of data
movement and transformation
• Connect to relational or non-relational data that is on-premises
or in the cloud
• Single pane of glass to monitor and manage data processing
pipelines
• Globally deployed service infrastructure
• Cost Effective
Compose, orchestrate & monitor data services at scale
Stored Procedures
Hadoop on Azure
Trusted data
BI & analyticsData Lake Analytics
Custom Code
Machine Learning
Category Data store Supported as source Supported as sink
Azure
Azure Data Lake Store
Azure Blob storage
Azure SQL Database
Azure SQL Data Warehouse
Azure Table storage
Azure DocumentDB
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
Databases
SQL Server*
Oracle*
MySQL*
DB2*
Teradata*
PostgreSQL*
Sybase*
Cassandra*
MongoDB*
Amazon Redshift
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
File
File System*
HDFS*
Amazon S3
✓
✓
✓
✓
Others
Salesforce
Generic ODBC*
Generic OData
Web Table (table from HTML)
GE Historian*
✓
✓
✓
✓
✓
Connects ADL Store out-of-the-box to all your stores
{ "name": "ComputeEventsByRegionPipeline",
"properties": { "description": "This is a U-SQL pipeline.",
"activities": [ { "type": "DataLakeAnalyticsU-SQL",
"typeProperties": { "scriptPath":
"scriptskonaSearchLogProcessing.txt",
"scriptLinkedService": "StorageLinkedService",
"degreeOfParallelism": 3,
"priority": 100,
"parameters": { "in": "/datalake/input/SearchLog.tsv",
"out": "/datalake/output/Result.tsv" } },
"inputs": [ { "name": "DataLakeTable" } ],
"outputs": [ { "name": "EventsByRegionTable" } ],
"policy": { "timeout": "06:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"retry": 1 },
"scheduler": { "frequency": "Day", "interval": 1 },
"name": "EventsByRegion",
"linkedServiceName":
"AzureDataLakeAnalyticsLinkedService" } ],
"start": "2015-08-08T00:00:00Z",
"end": "2015-08-08T01:00:00Z",
"isPaused": false } }
https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-usql-activity
https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-data-lake-analytics
Original Jobs View New Pipeline Jobs ViewOriginal
• List jobs submitted in the last 30 days
• Aggregate trends of jobs over 30 days
• Order and filter list of jobs
New
• Superset of original jobs view
• Adds grouping of jobs by pipelines & recurrences
• Jobs and consumption trends per pipeline
• Quickly identify pipelines and jobs to troubleshoot
• Quickly compare failed jobs with “last known good” instance
• Manage pipeline cost, improve efficiency and predict future
cost
How to use
• Create ADF v2 pipelines containing ADLA U-SQL activities
• Pipelines and Recurrences automatically appear in ADLA portal
• Submit and monitor pipeline/recurring jobs using Azure
PowerShell, ADLA SDK and REST APIs
 Automatic "in-lining"
optimized out-of-
the-box
 Per job
parallelization
visibility into execution
 Heatmap to identify
bottlenecks
EXTRACT Expression
@s = EXTRACT a string, b int
FROM "filepath/file.csv"
USING Extractors.Csv(encoding: Encoding.Unicode);
• Built-in Extractors: Csv, Tsv, Text with lots of options, Parquet
• Custom Extractors: e.g., JSON, XML, etc. (see http://usql.io)
OUTPUT Expression
OUTPUT @s
TO "filepath/file.csv"
USING Outputters.Csv();
• Built-in Outputters: Csv, Tsv, Text, Parquet
• Custom Outputters: e.g., JSON, XML, etc. (see http://usql.io)
Filepath URIs
• Relative URI to default ADL Storage account: "filepath/file.csv"
• Absolute URIs:
• ADLS: "adl://account.azuredatalakestore.net/filepath/file.csv"
• WASB: "wasb://container@account/filepath/file.csv"
Simple pattern language on filename and path
@pattern string =
"/input/{date:yyyy}/{date:MM}/{date:dd}/{*}.{suffix}";
• Binds two columns date and suffix
• Wildcards the filename
• Limits on number of files and file sizes can be improved with
SET @@FeaturePreviews =
"FileSetV2Dot5:on,InputFileGrouping:on,
AsyncCompilerStoreAccess:on";
(Will become default between now and middle of year)
Virtual columns
EXTRACT name string
, suffix string // virtual column
, date DateTime // virtual column
FROM @pattern
USING Extractors.Csv();
• Refer to virtual columns in predicates to get partition elimination
• Warning gets raised if no partition elimination was found
ADLA Account/Catalog
Database
Schema
[1,n]
[1,n]
[0,n]
tables views TVFs
C# Fns C# UDAgg
Clustered
Index
partitions
C#
Assemblies
C# Extractors
Data
Source
C# Reducers
C# Processors
C# Combiners
C# Outputters
Ext. tables
User
objects
Refers toContains Implemented
and named by
Procedures
Creden-
tials
MD
Name
C#
Name
C# Applier
Table Types
Legend
Statistics
C# UDTs
Packages
• Naming
• Discovery
• Sharing
• Securing
U-SQL Catalog
Naming
• Default Database and Schema context: master.dbo
• Quote identifiers with []: [my table]
• Stores data in ADL Storage /catalog folder
Discovery
• Visual Studio Server Explorer
• Azure Data Lake Analytics Portal
• SDKs and Azure Powershell commands
• Catalog Views: usql.databases, usql.tables etc.
Sharing
• Within an Azure Data Lake Analytics account
• Across ADLA accounts that share same Azure Active Directory:
• Referencing Assemblies
• Calling TVFs, Procedures and referencing tables and views
• Inserting into tables
Securing
• Secured with AAD principals at catalog and Database level
Views
CREATE VIEW V AS EXTRACT…
CREATE VIEW V AS SELECT …
• Cannot contain user-defined objects (e.g. UDF or UDOs)!
• Will be inlined
Table-Valued Functions (TVFs)
CREATE FUNCTION F (@arg string = "default")
RETURNS @res [TABLE ( … )]
AS BEGIN … @res = … END;
• Provides parameterization
• One or more results
• Can contain multiple statements
• Can contain user-code (needs assembly reference)
• Will always be inlined
• Infers schema or checks against specified return schema
CREATE PROCEDURE P (@arg string = "default“) AS
BEGIN
…;
OUTPUT @res TO …;
INSERT INTO T …;
END;
• Provides parameterization
• No result but writes into file or table
• Can contain multiple statements
• Can contain user-code (needs assembly reference)
• Will always be inlined
• Can contain DDL (but no CREATE, DROP FUNCTION/PROCEDURE)
DECLARE @variable SqlArray<int> =
new SqlArray<int>{1,2};
DECLARE @variable = new SqlArray<int>{1,2};
• Provides named and typed scalar expressions
• Option to infer the type of the scalar variable
DECLARE EXTERNAL @parameter = "string value";
• Provides overwriteable defaulting of a scalar variable
• Allows external parameter models (e.g., Azure Data Factory)
DECLARE CONST @const_expression = "my "+@parameter;
• Checks and guarantees that expression is evaluated at compile time,
otherwise errors.
CREATE TABLE T (col1 int
, col2 string
, col3 SQL.MAP<string,string>
, INDEX idx CLUSTERED (col2 ASC)
PARTITION BY (col1)
DISTRIBUTED BY HASH (driver_id)
);
• Structured Data, built-in Data types only (no UDTs)
• Clustered Index (needs to be specified): row-oriented
• Fine-grained distribution (needs to be specified):
• HASH, DIRECT HASH, RANGE, ROUND ROBIN
• Addressable Partitions (optional)
CREATE TABLE T (INDEX idx CLUSTERED …) AS SELECT …;
CREATE TABLE T (INDEX idx CLUSTERED …) AS EXTRACT…;
CREATE TABLE T (INDEX idx CLUSTERED …) AS myTVF(DEFAULT);
• Infer the schema from the query
• Still requires index and distribution (does not support partitioning)
Benefits of Table clustering and distribution
• Faster lookup of data provided by distribution and clustering when right
distribution/cluster is chosen
• Data distribution provides better localized scale out
• Used for filters, joins and grouping
Benefits of Table partitioning
• Provides data life cycle management (“expire” old partitions)
• Partial re-computation of data at partition level
• Query predicates can provide partition elimination
Do not use when…
• No filters, joins and grouping
• No reuse of the data for future queries
If in doubt: use sampling (e.g., SAMPLE ANY(x)) and test.
ALTER TABLE T ADD COLUMN eventName string;
ALTER TABLE T DROP COLUMN col3;
ALTER TABLE T ADD COLUMN result string, clientId string,
payload int?;
ALTER TABLE T DROP COLUMN clientId, result;
• Meta-data only operation
• Existing rows will get
• Non-nullable types: C# data type default value (e.g., int will be 0)
• Nullable types: null
U-SQL extensibility
Extend U-SQL with C#/.NET/Python/R
Built-in operators,
function, aggregates
C# expressions (in SELECT expressions)
User-defined aggregates (UDAGGs)
User-defined functions (UDFs)
User-defined operators (UDOs) in
.Net/Python/R
What are UDOs?
Custom Operator Extensions in
language of your choice
Scaled out by U-SQL
• PROCESS
• COMBINE
• REDUCE
JSON Processing
https://github.com/Azure/usql/tree/master/Examples/DataFormats
https://github.com/Azure/usql/tree/master/Examples/JSONExamples
Microsoft.Analytics.Samples.Formats
NewtonSoft.Json Apache AvroSystem.Xml
JSON
Processing
JSON
Processing
@json =
EXTRACT personid int,
name string,
addresses string
FROM @input
USING new Json.JsonExtractor(“[*].person");
@person =
SELECT personid,
name,
Json.JsonFunctions.JsonTuple(addresses)["address"] AS address_array
FROM @json;
@addresses = SELECT personid, name, Json.JsonFunctions.JsonTuple(address) AS address
FROM @person
CROSS APPLY
EXPLODE (Json.JsonFunctions.JsonTuple(address_array).Values) AS A(address);
@result =
SELECT personid,
name,
address["addressid"]AS addressid,
address["street"]AS street,
address["postcode"]AS postcode,
address["city"]AS city
FROM @addresses;
https://github.com/Azure/usql/tree/master/Examples/ImageApp
https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-cognitive
Car
Green
Parked
Outdoor
Racing
Additional
Resources
http://usql.io
http://blogs.msdn.microsoft.com/azuredatalake/
http://blogs.msdn.microsoft.com/mrys/
https://channel9.msdn.com/Search?term=U-SQL#ch9Search
http://aka.ms/usql_reference
https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-
programmability-guide
https://docs.microsoft.com/en-us/azure/data-lake-analytics/
https://msdn.microsoft.com/en-us/magazine/mt614251
https://msdn.microsoft.com/magazine/mt790200
http://www.slideshare.net/MichaelRys
Getting Started with R in U-SQL
https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-
python-extensions
https://social.msdn.microsoft.com/Forums/azure/en-US/home?forum=AzureDataLake
http://stackoverflow.com/questions/tagged/u-sql
http://aka.ms/adlfeedback
Continue your
education at
Microsoft Virtual
Academy online.
Learn more about Azure Data Lake
usql@microsoft.com@MikeDoesBigData
Thank You!
http://aka.ms/azuredatalake

Weitere ähnliche Inhalte

Was ist angesagt?

Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Michael Rys
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop StoryMichael Rys
 
U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)Michael Rys
 
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)Michael Rys
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Michael Rys
 
Azure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep DiveAzure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep DiveIlyas F ☁☁☁
 
U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)Michael Rys
 
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenJ1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenMS Cloud Summit
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersMichael Rys
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Michael Rys
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Michael Rys
 
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQLTaming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQLMichael Rys
 
SQLBits X Scaling out with SQL Azure Federations
SQLBits X Scaling out with SQL Azure FederationsSQLBits X Scaling out with SQL Azure Federations
SQLBits X Scaling out with SQL Azure FederationsMichael Rys
 
ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)Michael Rys
 
Dipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAsDipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAsBob Pusateri
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Michael Rys
 
U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)Michael Rys
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAlex Tumanoff
 

Was ist angesagt? (20)

Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop Story
 
U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)
 
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
 
Azure data lake sql konf 2016
Azure data lake   sql konf 2016Azure data lake   sql konf 2016
Azure data lake sql konf 2016
 
Azure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep DiveAzure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep Dive
 
U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)
 
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenJ1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for Developers
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
 
An intro to Azure Data Lake
An intro to Azure Data LakeAn intro to Azure Data Lake
An intro to Azure Data Lake
 
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQLTaming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
 
SQLBits X Scaling out with SQL Azure Federations
SQLBits X Scaling out with SQL Azure FederationsSQLBits X Scaling out with SQL Azure Federations
SQLBits X Scaling out with SQL Azure Federations
 
ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)
 
Dipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAsDipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAs
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene Polonichko
 

Ähnlich wie Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platform, and intelligent (SQLBits 2018)

Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage CCG
 
Using existing language skillsets to create large-scale, cloud-based analytics
Using existing language skillsets to create large-scale, cloud-based analyticsUsing existing language skillsets to create large-scale, cloud-based analytics
Using existing language skillsets to create large-scale, cloud-based analyticsMicrosoft Tech Community
 
3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sqlŁukasz Grala
 
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)Amazon Web Services Korea
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services
 
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAnnouncing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAmazon Web Services
 
Azure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeAzure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeRick van den Bosch
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventTrivadis
 
In-memory ColumnStore Index
In-memory ColumnStore IndexIn-memory ColumnStore Index
In-memory ColumnStore IndexSolidQ
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoopnvvrajesh
 
Data saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewData saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewRiccardo Zamana
 
Experience sql server on l inux and docker
Experience sql server on l inux and dockerExperience sql server on l inux and docker
Experience sql server on l inux and dockerBob Ward
 
Data quality patterns in the cloud with ADF
Data quality patterns in the cloud with ADFData quality patterns in the cloud with ADF
Data quality patterns in the cloud with ADFMark Kromer
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Michael Rys
 
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...WSO2
 
Cortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeCortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeMSAdvAnalytics
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligenceshraddha mane
 

Ähnlich wie Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platform, and intelligent (SQLBits 2018) (20)

Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage
 
Using existing language skillsets to create large-scale, cloud-based analytics
Using existing language skillsets to create large-scale, cloud-based analyticsUsing existing language skillsets to create large-scale, cloud-based analytics
Using existing language skillsets to create large-scale, cloud-based analytics
 
3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql
 
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAnnouncing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
 
Azure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeAzure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data Lake
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
 
In-memory ColumnStore Index
In-memory ColumnStore IndexIn-memory ColumnStore Index
In-memory ColumnStore Index
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
Data saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewData saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overview
 
Experience sql server on l inux and docker
Experience sql server on l inux and dockerExperience sql server on l inux and docker
Experience sql server on l inux and docker
 
Data quality patterns in the cloud with ADF
Data quality patterns in the cloud with ADFData quality patterns in the cloud with ADF
Data quality patterns in the cloud with ADF
 
ETL 2.0 Data Engineering for developers
ETL 2.0 Data Engineering for developersETL 2.0 Data Engineering for developers
ETL 2.0 Data Engineering for developers
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
 
Cortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeCortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data Lake
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 

Mehr von Michael Rys

Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Michael Rys
 
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Michael Rys
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Michael Rys
 
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Michael Rys
 
Killer Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLKiller Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLMichael Rys
 
U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)Michael Rys
 
U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)Michael Rys
 
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)Michael Rys
 
U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)Michael Rys
 
U-SQL Meta Data Catalog (SQLBits 2016)
U-SQL Meta Data Catalog (SQLBits 2016)U-SQL Meta Data Catalog (SQLBits 2016)
U-SQL Meta Data Catalog (SQLBits 2016)Michael Rys
 
U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)Michael Rys
 
Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)Michael Rys
 

Mehr von Michael Rys (12)

Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
 
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
 
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
 
Killer Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLKiller Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQL
 
U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)
 
U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)
 
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
 
U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)
 
U-SQL Meta Data Catalog (SQLBits 2016)
U-SQL Meta Data Catalog (SQLBits 2016)U-SQL Meta Data Catalog (SQLBits 2016)
U-SQL Meta Data Catalog (SQLBits 2016)
 
U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)
 
Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)
 

Kürzlich hochgeladen

Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 

Kürzlich hochgeladen (20)

Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 

Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platform, and intelligent (SQLBits 2018)

  • 1.
  • 3. 3 Data sourcesNon-relational data DESIGNED FOR THE QUESTIONS YOU KNOW!
  • 4. The Data Lake Approach Ingest all data regardless of requirements Store all data in native format without schema definition Do analysis Hadoop, Spark, R, Azure Data Lake Analytics (ADLA) Interactive queries Batch queries Machine Learning Data warehouse Real-time analytics Devices
  • 5. HDFS Compatible REST API ADL Store .NET, SQL, Python, R scaled out by U-SQL ADL Analytics Open Source Apache Hadoop ADL Client Azure Databricks HDInsight Hive • Performance at scale • Optimized for analytics • Multiple analytics engines • Single repository sharing
  • 6.
  • 7. Governance and Master Data Management Azure SQL Data Warehouse Data Quality and Lineage ERP, CRM, and other LOB Data OLTP and other RDBMS Clickstrea m Logs and Events Sensors, Social, Weather, other un- structure d data ETL Azure Data Lake Analytics (U-SQL) Azure Data Lake Store Spark on HDI, DataBri cks BI Models Reports and Dashboards Apache Hadoo p on HDInsi ght Polybase Analyst Power User Data Engineer Data Scientist Big Data Warehouse
  • 8. Streaming Layer Clean, Curate, Aggregate Combine reference data Perform Scoring from ML models IoT Sensors and/or User activity streams Social, Trends, Weathe r etc. Clickstrea m, Batch Files, server logs, Images, videos, and other unstructur ed data Event Broker (Event Hubs, Apache Kafka) Azure Data Lake Analytics (U-SQL) Azure Data Lake Store Spark on HDI, Databri cks Event Broker Realtime Dashboards Apache Hadoop on HDInsig ht Analyst Data Engineer Data Scientist Trained Machine Learning Models Reference Data Realtime Processing with Lambda Architecture Automated Systems
  • 9.
  • 10.
  • 11. Data Lake Analytics Workloads With BATCH workload, Data Lake Analytics is ideal for • The transformation and preparation of data for use in other systems • Analytics on VERY LARGE amounts of data • Massively Parallel programs written in .NET, Python and R, scaled out with U- SQL • Performing Cognition at Scale on large collections
  • 12. Scales out your custom code in .NET, Python, R over your Data Lake Familiar syntax to millions of SQL & .NET developers Unifies • Declarative nature of SQL with the imperative power of your programming language • Processing of structured, semi-structured and unstructured data • Querying multiple Azure Data Sources (Federated Query) Introducing U-SQL A framework for Big Data
  • 13. U-SQL can query data from multiple sources in Azure. Where possible data transformation is pushed close to the remote query engine to minimize data transfer and maximize performance. Easily query data in multiple Azure data stores without moving it to a single store U-SQL Query Query Azure Storage Blobs Azure SQL in VMs Azure SQL DB Azure Data Lake Analytics Azure SQL Data Warehouse Azure Data Lake Storage
  • 14. Embedded Artificial Intelligence Host Deep Neural Networks (DNNs) 6 Built-in Cognitive Functions  Face API  Image Tagging  Emotion analysis  OCR  Text Key Phrase Extraction  Text Sentiment Analysis
  • 15. • Fully managed service to support orchestration of data movement and transformation • Connect to relational or non-relational data that is on-premises or in the cloud • Single pane of glass to monitor and manage data processing pipelines • Globally deployed service infrastructure • Cost Effective Compose, orchestrate & monitor data services at scale Stored Procedures Hadoop on Azure Trusted data BI & analyticsData Lake Analytics Custom Code Machine Learning
  • 16. Category Data store Supported as source Supported as sink Azure Azure Data Lake Store Azure Blob storage Azure SQL Database Azure SQL Data Warehouse Azure Table storage Azure DocumentDB ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ Databases SQL Server* Oracle* MySQL* DB2* Teradata* PostgreSQL* Sybase* Cassandra* MongoDB* Amazon Redshift ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ File File System* HDFS* Amazon S3 ✓ ✓ ✓ ✓ Others Salesforce Generic ODBC* Generic OData Web Table (table from HTML) GE Historian* ✓ ✓ ✓ ✓ ✓ Connects ADL Store out-of-the-box to all your stores
  • 17. { "name": "ComputeEventsByRegionPipeline", "properties": { "description": "This is a U-SQL pipeline.", "activities": [ { "type": "DataLakeAnalyticsU-SQL", "typeProperties": { "scriptPath": "scriptskonaSearchLogProcessing.txt", "scriptLinkedService": "StorageLinkedService", "degreeOfParallelism": 3, "priority": 100, "parameters": { "in": "/datalake/input/SearchLog.tsv", "out": "/datalake/output/Result.tsv" } }, "inputs": [ { "name": "DataLakeTable" } ], "outputs": [ { "name": "EventsByRegionTable" } ], "policy": { "timeout": "06:00:00", "concurrency": 1, "executionPriorityOrder": "NewestFirst", "retry": 1 }, "scheduler": { "frequency": "Day", "interval": 1 }, "name": "EventsByRegion", "linkedServiceName": "AzureDataLakeAnalyticsLinkedService" } ], "start": "2015-08-08T00:00:00Z", "end": "2015-08-08T01:00:00Z", "isPaused": false } } https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-usql-activity https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-data-lake-analytics
  • 18. Original Jobs View New Pipeline Jobs ViewOriginal • List jobs submitted in the last 30 days • Aggregate trends of jobs over 30 days • Order and filter list of jobs New • Superset of original jobs view • Adds grouping of jobs by pipelines & recurrences • Jobs and consumption trends per pipeline • Quickly identify pipelines and jobs to troubleshoot • Quickly compare failed jobs with “last known good” instance • Manage pipeline cost, improve efficiency and predict future cost How to use • Create ADF v2 pipelines containing ADLA U-SQL activities • Pipelines and Recurrences automatically appear in ADLA portal • Submit and monitor pipeline/recurring jobs using Azure PowerShell, ADLA SDK and REST APIs
  • 19.
  • 20.  Automatic "in-lining" optimized out-of- the-box  Per job parallelization visibility into execution  Heatmap to identify bottlenecks
  • 21. EXTRACT Expression @s = EXTRACT a string, b int FROM "filepath/file.csv" USING Extractors.Csv(encoding: Encoding.Unicode); • Built-in Extractors: Csv, Tsv, Text with lots of options, Parquet • Custom Extractors: e.g., JSON, XML, etc. (see http://usql.io) OUTPUT Expression OUTPUT @s TO "filepath/file.csv" USING Outputters.Csv(); • Built-in Outputters: Csv, Tsv, Text, Parquet • Custom Outputters: e.g., JSON, XML, etc. (see http://usql.io) Filepath URIs • Relative URI to default ADL Storage account: "filepath/file.csv" • Absolute URIs: • ADLS: "adl://account.azuredatalakestore.net/filepath/file.csv" • WASB: "wasb://container@account/filepath/file.csv"
  • 22.
  • 23.
  • 24. Simple pattern language on filename and path @pattern string = "/input/{date:yyyy}/{date:MM}/{date:dd}/{*}.{suffix}"; • Binds two columns date and suffix • Wildcards the filename • Limits on number of files and file sizes can be improved with SET @@FeaturePreviews = "FileSetV2Dot5:on,InputFileGrouping:on, AsyncCompilerStoreAccess:on"; (Will become default between now and middle of year) Virtual columns EXTRACT name string , suffix string // virtual column , date DateTime // virtual column FROM @pattern USING Extractors.Csv(); • Refer to virtual columns in predicates to get partition elimination • Warning gets raised if no partition elimination was found
  • 25.
  • 26. ADLA Account/Catalog Database Schema [1,n] [1,n] [0,n] tables views TVFs C# Fns C# UDAgg Clustered Index partitions C# Assemblies C# Extractors Data Source C# Reducers C# Processors C# Combiners C# Outputters Ext. tables User objects Refers toContains Implemented and named by Procedures Creden- tials MD Name C# Name C# Applier Table Types Legend Statistics C# UDTs Packages
  • 27. • Naming • Discovery • Sharing • Securing U-SQL Catalog Naming • Default Database and Schema context: master.dbo • Quote identifiers with []: [my table] • Stores data in ADL Storage /catalog folder Discovery • Visual Studio Server Explorer • Azure Data Lake Analytics Portal • SDKs and Azure Powershell commands • Catalog Views: usql.databases, usql.tables etc. Sharing • Within an Azure Data Lake Analytics account • Across ADLA accounts that share same Azure Active Directory: • Referencing Assemblies • Calling TVFs, Procedures and referencing tables and views • Inserting into tables Securing • Secured with AAD principals at catalog and Database level
  • 28. Views CREATE VIEW V AS EXTRACT… CREATE VIEW V AS SELECT … • Cannot contain user-defined objects (e.g. UDF or UDOs)! • Will be inlined Table-Valued Functions (TVFs) CREATE FUNCTION F (@arg string = "default") RETURNS @res [TABLE ( … )] AS BEGIN … @res = … END; • Provides parameterization • One or more results • Can contain multiple statements • Can contain user-code (needs assembly reference) • Will always be inlined • Infers schema or checks against specified return schema
  • 29. CREATE PROCEDURE P (@arg string = "default“) AS BEGIN …; OUTPUT @res TO …; INSERT INTO T …; END; • Provides parameterization • No result but writes into file or table • Can contain multiple statements • Can contain user-code (needs assembly reference) • Will always be inlined • Can contain DDL (but no CREATE, DROP FUNCTION/PROCEDURE)
  • 30. DECLARE @variable SqlArray<int> = new SqlArray<int>{1,2}; DECLARE @variable = new SqlArray<int>{1,2}; • Provides named and typed scalar expressions • Option to infer the type of the scalar variable DECLARE EXTERNAL @parameter = "string value"; • Provides overwriteable defaulting of a scalar variable • Allows external parameter models (e.g., Azure Data Factory) DECLARE CONST @const_expression = "my "+@parameter; • Checks and guarantees that expression is evaluated at compile time, otherwise errors.
  • 31. CREATE TABLE T (col1 int , col2 string , col3 SQL.MAP<string,string> , INDEX idx CLUSTERED (col2 ASC) PARTITION BY (col1) DISTRIBUTED BY HASH (driver_id) ); • Structured Data, built-in Data types only (no UDTs) • Clustered Index (needs to be specified): row-oriented • Fine-grained distribution (needs to be specified): • HASH, DIRECT HASH, RANGE, ROUND ROBIN • Addressable Partitions (optional) CREATE TABLE T (INDEX idx CLUSTERED …) AS SELECT …; CREATE TABLE T (INDEX idx CLUSTERED …) AS EXTRACT…; CREATE TABLE T (INDEX idx CLUSTERED …) AS myTVF(DEFAULT); • Infer the schema from the query • Still requires index and distribution (does not support partitioning)
  • 32. Benefits of Table clustering and distribution • Faster lookup of data provided by distribution and clustering when right distribution/cluster is chosen • Data distribution provides better localized scale out • Used for filters, joins and grouping Benefits of Table partitioning • Provides data life cycle management (“expire” old partitions) • Partial re-computation of data at partition level • Query predicates can provide partition elimination Do not use when… • No filters, joins and grouping • No reuse of the data for future queries If in doubt: use sampling (e.g., SAMPLE ANY(x)) and test.
  • 33. ALTER TABLE T ADD COLUMN eventName string; ALTER TABLE T DROP COLUMN col3; ALTER TABLE T ADD COLUMN result string, clientId string, payload int?; ALTER TABLE T DROP COLUMN clientId, result; • Meta-data only operation • Existing rows will get • Non-nullable types: C# data type default value (e.g., int will be 0) • Nullable types: null
  • 34.
  • 35. U-SQL extensibility Extend U-SQL with C#/.NET/Python/R Built-in operators, function, aggregates C# expressions (in SELECT expressions) User-defined aggregates (UDAGGs) User-defined functions (UDFs) User-defined operators (UDOs) in .Net/Python/R
  • 36. What are UDOs? Custom Operator Extensions in language of your choice Scaled out by U-SQL • PROCESS • COMBINE • REDUCE
  • 39. JSON Processing @json = EXTRACT personid int, name string, addresses string FROM @input USING new Json.JsonExtractor(“[*].person"); @person = SELECT personid, name, Json.JsonFunctions.JsonTuple(addresses)["address"] AS address_array FROM @json; @addresses = SELECT personid, name, Json.JsonFunctions.JsonTuple(address) AS address FROM @person CROSS APPLY EXPLODE (Json.JsonFunctions.JsonTuple(address_array).Values) AS A(address); @result = SELECT personid, name, address["addressid"]AS addressid, address["street"]AS street, address["postcode"]AS postcode, address["city"]AS city FROM @addresses;
  • 41. Additional Resources http://usql.io http://blogs.msdn.microsoft.com/azuredatalake/ http://blogs.msdn.microsoft.com/mrys/ https://channel9.msdn.com/Search?term=U-SQL#ch9Search http://aka.ms/usql_reference https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql- programmability-guide https://docs.microsoft.com/en-us/azure/data-lake-analytics/ https://msdn.microsoft.com/en-us/magazine/mt614251 https://msdn.microsoft.com/magazine/mt790200 http://www.slideshare.net/MichaelRys Getting Started with R in U-SQL https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql- python-extensions https://social.msdn.microsoft.com/Forums/azure/en-US/home?forum=AzureDataLake http://stackoverflow.com/questions/tagged/u-sql http://aka.ms/adlfeedback Continue your education at Microsoft Virtual Academy online.
  • 42. Learn more about Azure Data Lake usql@microsoft.com@MikeDoesBigData Thank You! http://aka.ms/azuredatalake

Hinweis der Redaktion

  1. DATA SOURCE: Represents a remote data source such as Azure SQL Database. Have to specify all the details (connection string, credentials, etc required to connect to and issues queries. EXTERNAL TABLE: A local table, with columns defined in C# types, that redirects queries issued against it to the remote table that it is based on. U-SQL automatically does the type conversion. External tables lets you impose a specific schema against the remote data, shielding you from remote schema changes. You can issue queries that ‘join’ external and local tables. PASS THROUGH queries: These queries are issued directly against the remote data source in the syntax of the remote data source (say T-SQL for Azure SQL database). REMOTABLE_TYPES: For every external data source you have to specify the list of ‘remoteable types. This list constrains the types of queries that will be remoted. Ex: REMOTABLE_TYPES = (bool, byte, short, ushort, int, decimal); LAZY METADATA LOADING: Here the remote data schematized only when the query is actually issues to the remote data source. Your program must be able to deal with remote schema changes.
  2. Shows simple Extract, OUTPUT Then simple extensibility with string functions.
  3. Shows simple Extract, OUTPUT Then simple extensibility with string functions.
  4. Add file sets.
  5. Show Views, TVFs and Tables
  6. GROUP BY, ORDER BY, CROSS APPLY
  7. Extensions require .NET assemblies to be registered with a database