3CityNetConf - SQL + C# = U-SQL
1. SQL + C# = U-SQL
Łukasz Grala
Architect Data Platform & Advanced Analytics & BI Solutions
Data Platform MVP
Gdańsk, 18.05.2016
2. Łukasz Grala - lukasz@tidk.pl
• Architect of Data Platform, Business Intelligence, and Advanced Analytics solutions at TIDK
• Microsoft Certified Trainer and university lecturer
• Author of advanced training courses and workshops, as well as numerous publications and webcasts
• Recipient of the Microsoft Data Platform MVP award since 2010
• PhD candidate at Politechnika Poznańska (Poznań University of Technology), Faculty of Computer Science (databases, data mining, machine learning)
• Speaker at numerous conferences in Poland and abroad
• Holds numerous certifications (MCT, MCSE, MCSA, MCITP, …)
• Member of Polskie Towarzystwo Informatyczne (Polish Information Processing Society)
• Member and leader of the Polish SQL Server User Group (PLSSUG)
• Passionate about analyzing, storing, and processing data; jazz lover
3. Data (Big Data)
• 72 hours of video are uploaded per minute on YouTube (1 terabyte every 4 minutes)
• 500 terabytes of new data per day are ingested in Facebook databases
• Sensors from a Boeing jet engine create 20 terabytes of data every hour
• The proposed Square Kilometer Array telescope will generate "a few Exabytes of data per day" (single beam)
8. HADOOP FILE SYSTEM HDFS for the cloud
ENTERPRISE READY access control, encryption at rest
Store ANY DATA in its native format (csv, tsv, json tables, images, …)
No limits to SCALE
Optimization for analytic workload PERFORMANCE
Azure Data Lake Store Services
9. Data Lake Store - Filesystem
• WebHDFS API, REST
• Use: adl://
• adl://<data_lake_store_name>.azuredatalakestore.net
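The addressing scheme above can be sketched in a few lines of Python. This is a minimal illustration and not part of the original slides: the helper name `adl_uri`, the account name `mystore`, and the file path are hypothetical. Only the `adl://<data_lake_store_name>.azuredatalakestore.net` host pattern comes from the slide.

```python
def adl_uri(store_name: str, path: str = "/") -> str:
    """Build an adl:// URI for a file or folder in a Data Lake Store account.

    Only the host pattern is taken from the slide; the helper is illustrative.
    """
    if not store_name:
        raise ValueError("store_name must not be empty")
    # Normalize the path so it always starts with exactly one slash.
    normalized = "/" + path.lstrip("/")
    return f"adl://{store_name}.azuredatalakestore.net{normalized}"


# Hypothetical account name and path, for illustration only.
print(adl_uri("mystore", "input/orders.txt"))
```

A relative path and an absolute path produce the same URI, because the helper strips any leading slashes before adding one back.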
10. Built on Apache YARN
Scales dynamically with the turn of a dial
Pay by the query
Supports Azure AD for access control, roles, and integration with
on-prem identity systems
Built with U-SQL to unify the benefits of SQL with the power of C#
Processes data across Azure
Azure Data Lake Analytics Services
11. Work across all cloud data
[Diagram: Azure Data Lake Analytics alongside Azure SQL DW, Azure SQL DB, Azure Storage Blobs, Azure Data Lake Store, and SQL DB in an Azure VM]
15. Azure Data Lake Analytics
An elastic analytics service built on Apache YARN that processes all data, at any size
• No limits to SCALE
• Includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C#
• Optimized to work with ADL STORE
• FEDERATED QUERY across Azure data sources
• ENTERPRISE READY role-based access control & auditing
• Pay PER JOB & scale PER JOB
16. The origins of U-SQL
SCOPE - Microsoft's internal Big Data language
• SQL and C# integration model
• Optimization and scaling model
• Runs 100,000s of jobs daily
Hive
• Complex data types (Maps, Arrays)
• Data format alignment for text files
T-SQL/ANSI SQL
• Many of the SQL capabilities (windowing functions, metadata model, etc.)
17. U-SQL - Language Overview
U-SQL Fundamentals
• All the familiar SQL clauses: SELECT | FROM | WHERE | GROUP BY | JOIN | OVER
• Operate on unstructured and structured data
• Relational metadata objects
.NET integration and extensibility
• U-SQL expressions are full C# expressions
• Reuse .NET code in your own assemblies
• Use C# to define your own: Types | Functions | Joins | Aggregators | I/O (Extractors, Outputters)
19. U-SQL Distributed Query
[Diagram: U-SQL with READ and WRITE arrows to Azure Data Lake Store, Azure SQL Database, Azure SQL Data Warehouse, Azure SQL DB in Azure VM, and Azure Storage Blobs]
21. U-SQL Language Philosophy
Declarative Query and Transformation Language:
• Uses SQL's SELECT FROM WHERE with GROUP BY/Aggregation, Joins, SQL Analytics functions
• Optimizable, Scalable
Expression-flow programming style:
• Easy to use functional lambda composition
• Composable, globally optimizable
Operates on Unstructured & Structured Data
• Schema on read over files
• Relational metadata objects (e.g. database, table)
Extensible from ground up:
• Type system is based on C#
• Expression language IS C#
• User-defined functions (U-SQL and C#)
• User-defined Aggregators (C#)
• User-defined Operators (UDO) (C#)
U-SQL provides the Parallelization and Scale-out Framework for user code
• EXTRACTOR, OUTPUTTER, PROCESSOR, REDUCER, COMBINER, APPLIER
Federated query across distributed data sources
// Load the user code (aggregator, function, outputter) registered in the catalog.
REFERENCE ASSEMBLY MyDB.MyAssembly;

CREATE TABLE T( cid int, first_order DateTime
              , last_order DateTime, order_count int
              , order_amount float );

// Schema on read: extract typed rowsets from the input files.
@o = EXTRACT oid int, cid int, odate DateTime, amount float
     FROM "/input/orders.txt"
     USING Extractors.Csv();

@c = EXTRACT cid int, name string, city string
     FROM "/input/customers.txt"
     USING Extractors.Csv();

// Expressions are C#: == for equality, && for AND, .NET string methods.
@j = SELECT c.cid, MIN(o.odate) AS firstorder
          , MAX(o.odate) AS lastorder, COUNT(o.oid) AS ordercnt
          , AGG<MyAgg.MySum>(o.amount) AS totalamount
     FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid
     WHERE c.city.StartsWith("New")
        && MyNamespace.MyFunction(o.odate) > 10
     GROUP BY c.cid;

OUTPUT @j TO "/output/result.txt"
USING new MyData.Write();

INSERT INTO T SELECT * FROM @j;
22. Meta Data Object Model
[Diagram: ADLA Catalog > Database > Schema hierarchy, with cardinality labels [1,n], [1,n], [0,n]. Objects shown: tables (clustered index, partitions, statistics), external tables, views, TVFs, procedures, table types, data sources, credentials, C# assemblies, C# functions, C# user-defined aggregators, C# UDTs, and C# extractors, reducers, processors, combiners, outputters, and appliers. Legend: abstract objects and user objects; relationships "contains", "refers to", and "implemented and named by"; MD name and C# name.]
23. U-SQL Joins
Join operators
• INNER JOIN
• LEFT or RIGHT or FULL OUTER JOIN
• CROSS JOIN
• SEMIJOIN
  • Equivalent to IN subquery
• ANTISEMIJOIN
  • Equivalent to NOT IN subquery
Notes
• ON clause comparisons need to be of the simple form rowset.column == rowset.column, or AND conjunctions of such simple equality comparisons
• If a comparand is not a column, wrap it into a column in a previous SELECT
• If the comparison operation is not ==, put it into the WHERE clause
• Turn the join into a CROSS JOIN if there is no equality comparison
Reason: Syntax calls out which joins are efficient
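SEMIJOIN and ANTISEMIJOIN are the least familiar operators in the list above, so their row-level meaning is sketched here in Python. This is a model of the semantics only, not U-SQL and not how ADLA executes joins. The function names, the sample rowsets, and the single-column key are assumptions made for illustration, and null handling is ignored.

```python
from typing import Dict, List

Row = Dict[str, object]


def semijoin(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Keep left rows whose key value appears in the right rowset (like IN)."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]


def antisemijoin(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Keep left rows whose key value is absent from the right rowset (like NOT IN)."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] not in right_keys]


# Hypothetical sample data: customer 1 has two orders, customer 2 has none.
customers = [{"cid": 1, "name": "Ann"}, {"cid": 2, "name": "Bob"}, {"cid": 3, "name": "Eve"}]
orders = [{"cid": 1, "oid": 10}, {"cid": 1, "oid": 11}, {"cid": 3, "oid": 12}]

with_orders = semijoin(customers, orders, "cid")
without_orders = antisemijoin(customers, orders, "cid")
print([c["name"] for c in with_orders])     # ['Ann', 'Eve']
print([c["name"] for c in without_orders])  # ['Bob']
```

Unlike an INNER JOIN, the semijoin returns Ann once even though she has two matching orders, and it returns only columns from the left rowset.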
25. Views and TVFs
• Views for simple cases
• TVFs for parameterization and most cases
Table-Valued Functions (TVFs)
CREATE FUNCTION F (@arg string = "default")
RETURNS @res [TABLE ( … )]
AS BEGIN … @res = … END;
• Provides parameterization
• One or more results
• Can contain multiple statements
• Can contain user code (needs assembly reference)
• Will always be inlined
• Infers schema or checks against specified return schema
Views
CREATE VIEW V AS EXTRACT …
CREATE VIEW V AS SELECT …
• Cannot contain user-defined objects (such as UDFs or UDOs)
• Will be inlined
26. Procedures
• Allows encapsulation of non-DDL scripts
• Provides parameterization
• No result but writes into file or table
• Can contain multiple statements
• Can contain user code (needs assembly reference)
• Will always be inlined
• Cannot contain DDL (no CREATE, DROP)
CREATE PROCEDURE P (@arg string = "default")
AS
BEGIN
  …;
  OUTPUT @res TO …;
  INSERT INTO T …;
END;
27. Tables
• CREATE TABLE
• CREATE TABLE AS SELECT
CREATE TABLE T (INDEX idx CLUSTERED …) AS SELECT …;
CREATE TABLE T (INDEX idx CLUSTERED …) AS EXTRACT …;
CREATE TABLE T (INDEX idx CLUSTERED …) AS myTVF(DEFAULT);
• Infer the schema from the query
• Still requires index and partitioning
CREATE TABLE T (col1 int
              , col2 string
              , col3 SQL.MAP<string,string>
              , INDEX idx CLUSTERED (col1 ASC)
                PARTITIONED BY HASH (col1)
              );
• Structured Data
• Built-in Data types only (no UDTs)
• Clustered index (must be specified): row-oriented
• Fine-grained partitioning (must be specified):
  • HASH, DIRECT HASH, RANGE, ROUND ROBIN
28. INSERT
• INSERT constant values
• INSERT from queries
• Multiple INSERTs
Multiple INSERTs into same table
• Is supported
• Generates separate file per insert in physical storage:
  • Can lead to performance degradation
• Recommendations:
  • Try to avoid small inserts
  • Rebuild table after frequent insertions with:
    ALTER TABLE T REBUILD;
INSERT constant values
INSERT INTO T VALUES (1, "text",
    new SQL.MAP<string,string>("key","value"));
INSERT from queries
INSERT INTO T SELECT col1, col2, col3 FROM @rowset;
29. Top 5 Surprises for SQL Users
AS is not as
• C# keywords and SQL keywords overlap
• Costly to make case-insensitive -> Better build capabilities than tinker with syntax
= != ==
• Remember: C# expression language
null IS NOT NULL
• C# nulls are two-valued
PROCEDURES but no WHILE
No UPDATE nor MERGE
• Transform/Recook instead
31. Additional capabilities and resources
• Tools: http://aka.ms/adltoolsVS
• Blogs and community page:
  • http://usql.io
  • http://blogs.msdn.com/b/visualstudio/
  • http://azure.microsoft.com/en-us/blog/topics/big-data/
  • https://channel9.msdn.com/Search?term=U-SQL#ch9Search
• Documentation and articles and slides:
  • http://aka.ms/usql_reference
  • https://azure.microsoft.com/en-us/documentation/services/data-lake-analytics/
  • https://msdn.microsoft.com/en-us/magazine/mt614251
  • http://www.slideshare.net/MichaelRys
• ADL forums and feedback
  • http://aka.ms/adlfeedback
  • https://social.msdn.microsoft.com/Forums/azure/en-US/home?forum=AzureDataLake
  • http://stackoverflow.com/questions/tagged/u-sql
Michael Rys - @MikeDoesBigData
33. sqlday.pl @sqlday
• 16-18 May 2016
• Wrocław, Conference Center (Centrum Konferencyjne)
• 3 days, 6 workshops, 4 tracks, over 30 speakers, 50 sessions
• 600 attendees plus sponsors, speakers, and organizers
• Guests from the USA, England, Germany, Ukraine, Bulgaria, Slovenia, and other countries
• Technical launch of SQL Server 2016
Including the Big Data Analytics workshop by Łukasz Grala & Marcin Szeliga
34. Masterclass: Cloud Storage
23-25.05.2016, Warsaw
Azure SQL Server and Azure SQL Database, scaling a relational database in the cloud, data warehouse in the cloud, PowerShell and databases in Azure, Azure BLOB Storage, document databases, Big Data with HDInsight, Hadoop, Apache Spark, other HDInsight and Hadoop components, virtual machines
Masterclass: Cloud Analytics
20-22.06.2016, Warsaw
Data Catalog, Data Factory, Data Lake, Power BI and relational data in the cloud, Hadoop, Apache Spark, streaming data analysis, analysis of document and graph databases, machine learning, PolyBase in SQL Server 2016
Łukasz Grala
Data Platform MVP, MCT, MCSE, MCSA, MCITP, MCP, MTA
Łukasz on the courses:
"More data is being produced than ever before. It comes from the Internet, from social networks, and from devices. The rapid growth of the Internet of Things (IoT) increases the volume of this data even further. That is why we have prepared two special courses, Cloud Storage and Cloud Analytics, which present mechanisms for storing, processing, and analyzing data using the cloud."
Big Data, BI, Analytics, SQL
Standard: 25% off with the code 3CityNetConf
www.hexcode.pl
Editor's notes
A new distributed analytics service
Built on Apache YARN
Dynamically scales
Handles jobs of any scale instantly by simply setting the dial for how much power you need.
You only pay for the cost of the query
Supports Azure Active Directory for Access Control, Roles, Integration with on-premises identity systems
It also includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C#
U-SQL's scalable runtime processes data across multiple Azure data sources
ADLA allows you to compute on data anywhere and join data from multiple cloud sources.