Trivadis TechEvent 2016 Polybase challenges Hive relational access to non-relational HDFS by Olaf Nimz

In this presentation, Olaf Nimz talks about a proposed marriage between SQL Server and Hadoop, about Building Bridges to HDFS, Distributed query processing and about Sensible Hybrid Scenarios.

  1. Polybase challenges Hive relational access to non-relational HDFS, by Olaf Nimz
     BASLE BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
  2. Agenda
     - Proposed marriage between SQL Server and Hadoop
     - Building Bridges to HDFS
     - Distributed query processing
     - Sensible Hybrid Scenarios
  3. Take Home Message
     1. Access to the non-relational world is easier with Polybase
        - T-SQL only
        - Unstructured data is still complex, e.g. nested JSON structures
     2. Hybrid solutions
        - Fact Extractor - IoT
        - Staging Area for DWH: keep entire history
        - Dirty data source files
        - Near real-time
     3. Scenarios
        - Swiss Air - Flight Logs
        - Swisscom - Call Data Records
        - Archiving (c)old DWH Facts
  4. Polybase
  5. Polybase
     Requirements
     - Java (64-bit JRE, version 7.51 or later)
     - Azure storage account or Hadoop (not HDInsight)
       - Hortonworks Data Platform (HDP 1.3, 2.0-2.3)
       - Cloudera's CDH (4.3, 5.1-5.5)
     Installation check
     - SELECT SERVERPROPERTY ('IsPolybaseInstalled'); returns 1?
     Configuration of the external data source
     - sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
  6. Data Movement Services
  7. Feature support by platform

     | Feature | SQL Server 2016 | Azure SQL Data Warehouse | APS Appliance - PDW |
     |---|---|---|---|
     | Query Hadoop data with Transact-SQL | yes | no | yes |
     | Query Azure blob storage with Transact-SQL | yes | yes | yes |
     | Import data from Hadoop | yes | no | yes |
     | Import data from Azure blob storage | yes | yes | yes |
     | Export data to Hadoop | yes | no | yes |
     | Export data to Azure blob storage | yes | yes | yes |
     | Run PolyBase queries from Microsoft's BI tools | yes | yes | yes |
     | Push down query computations to Hadoop | yes | no | yes |
  8. Objects for Polybase
  9. Define external objects (2015 © Trivadis)

         CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo';

         CREATE DATABASE SCOPED CREDENTIAL HadoopUser
         WITH IDENTITY = '<hadoop_user_name>', SECRET = '<hadoop_password>';

         CREATE EXTERNAL DATA SOURCE HadoopCluster
         WITH (
             TYPE = HADOOP,
             LOCATION = 'hdfs://10.xxx.xx.xxx:xxxx',
             RESOURCE_MANAGER_LOCATION = '10.xxx.xx.xxx:xxxx',
             CREDENTIAL = HadoopUser
         );
  10. Define external objects

          CREATE EXTERNAL FILE FORMAT TextFileFormat
          WITH (
              FORMAT_TYPE = DELIMITEDTEXT,
              FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE)
          );

          CREATE EXTERNAL TABLE [dbo].[CarSensor_Data] (
              [SensorKey] int NOT NULL,
              [CustomerKey] int NOT NULL,
              [GeographyKey] int NULL,
              [Speed] float NOT NULL,
              [YearMeasured] int NOT NULL
          )
          WITH (
              LOCATION = '/Demo/',
              DATA_SOURCE = HadoopCluster,
              FILE_FORMAT = TextFileFormat
          );
  11. Query external data

          SELECT DISTINCT
              Insured_Customers.FirstName,
              Insured_Customers.LastName,
              Insured_Customers.YearlyIncome,
              CarSensor_Data.Speed
          FROM Insured_Customers, CarSensor_Data -- cross join
          WHERE Insured_Customers.CustomerKey = CarSensor_Data.CustomerKey
            AND CarSensor_Data.Speed > 35
          ORDER BY CarSensor_Data.Speed DESC
          OPTION (FORCE EXTERNALPUSHDOWN); -- or OPTION (DISABLE EXTERNALPUSHDOWN)
  12. Export Data to Hadoop
      - CREATE EXTERNAL TABLE [dbo].[FastCustomers2009] ( … );
      - Move cold data to Hadoop/Blob while keeping it queryable via an external table:

            INSERT INTO dbo.FastCustomers2009
            SELECT * FROM Insured_Customers T1
            JOIN CarSensor_Data T2 ON (T1.CustomerKey = T2.CustomerKey)
            WHERE T2.YearMeasured = 2009 AND T2.Speed > 40;
  13. Polybase Objects in SSMS
  14. Dynamic Management Views
      Monitor and troubleshoot PolyBase queries using the DMVs.
      - Longest running queries
      - Longest running step of the distributed query
      - Execution progress of the longest running step
        - of a SQL step
        - XML remote query plan
        - of a DMS step
      - Find information about external DMS operations
        - View the PolyBase query plan
        - XML remote query plan (node properties)
  15. JSON Format
      - Parse JSON text and read or modify values.
      - Transform arrays of JSON objects into table format.
      - Use any Transact-SQL query on the converted JSON objects.
      - Format the results of Transact-SQL queries in JSON format.
  16. JSON
  17. Parse «unstructured» JSON cell content stored in the jsonCol column:

          [
            { "name": "John", "skills": [ "SQL", "C#", "Azure" ] },
            { "name": "Jane", "surname": "Doe" }
          ]

          SELECT Name, Surname,
              JSON_VALUE(jsonCol, '$.info.address.PostCode') AS PostCode,
              JSON_VALUE(jsonCol, '$.info.address."Address Line 1"') + ' ' +
                  JSON_VALUE(jsonCol, '$.info.address."Address Line 2"') AS Address,
              JSON_QUERY(jsonCol, '$.info.skills') AS Skills
          FROM PeopleCollection
          WHERE ISJSON(jsonCol) > 0
            AND JSON_VALUE(jsonCol, '$.info.address.town') = 'Belgrade'
            AND Status = 'Active'
          ORDER BY JSON_VALUE(jsonCol, '$.info.address.PostCode');
  18. Convert «unstructured» JSON to table

          DECLARE @json nvarchar(max);
          SET @json = '[
            { "id": 2, "info": { "name": "John", "surname": "Smith" }, "age": 25 },
            { "id": 5, "info": { "name": "Jane", "surname": "Smith" }, "dob": "2005-11-04T12:00:00" }
          ]';

          SELECT *
          FROM OPENJSON(@json)
          WITH (
              id int 'strict $.id',
              firstName nvarchar(50) '$.info.name',
              lastName nvarchar(50) '$.info.surname',
              age int,
              dateOfBirth datetime2 '$.dob'
          );
  19. Performance Scaling
  20. Take Home Message
      1. Access to the non-relational world is easier with Polybase
         - T-SQL only
         - Unstructured data is still complex, e.g. nested JSON structures
      2. Hybrid solutions
         - Fact Extractor - IoT
         - Staging Area for DWH: keep entire history
         - Dirty data source files
         - Near real-time
      3. Scenarios
         - Swiss Air - Flight Logs
         - Swisscom - Call Data Records
         - Archiving (c)old DWH Facts
  21. Outlook
      - Table definition remains challenging
      - Push down computation
      - Scale out the SQL Server side, e.g. by using an idle failover instance
      - See blog post with code examples
  22. THANK YOU.
      Trivadis AG, Olaf Nimz, Sägereistrasse 29, 8152 Glattbrugg
      Tel. +41-44-808 70 20, Fax +41-44-808 70 21
      info@trivadis.com, www.trivadis.com
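
Slide 5 names the installation check and the connectivity setting but not the statements around them. The following is a minimal sketch of that sequence, assuming a SQL Server 2016 instance with the PolyBase feature installed. The value 7 is the one shown on the slide; whether it fits depends on the Hadoop distribution or Azure storage target in use. The `allow polybase export` step is an addition that is not on the slides; as far as I know, it is needed before the INSERT on slide 12 will work.

```sql
-- Check that the PolyBase feature is installed (1 = installed).
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Set the Hadoop connectivity level. 7 is the value from slide 5;
-- assumption: it matches your distribution, otherwise choose another value.
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;

-- Not on the slides: allow INSERT INTO an external table (export, slide 12).
EXEC sp_configure @configname = 'allow polybase export', @configvalue = 1;
RECONFIGURE;

-- Show the values currently in effect.
EXEC sp_configure @configname = 'hadoop connectivity';
EXEC sp_configure @configname = 'allow polybase export';
```

A changed `hadoop connectivity` value should only take effect after the SQL Server service, and with it the PolyBase engine and data movement services, has been restarted.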

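The monitoring steps listed on slide 14 can be sketched with the PolyBase DMVs as follows. This is a hedged outline, not the queries from the talk: the execution id `QID4547` and the step index `1` are hypothetical placeholders that would be taken from the result of the first query.

```sql
-- Hypothetical placeholders: replace with values returned by the first query.
DECLARE @execution_id nvarchar(32) = N'QID4547';
DECLARE @step_index int = 1;

-- 1. Longest running distributed queries.
SELECT dr.execution_id, st.text, dr.total_elapsed_time
FROM sys.dm_exec_distributed_requests AS dr
CROSS APPLY sys.dm_exec_sql_text(dr.sql_handle) AS st
ORDER BY dr.total_elapsed_time DESC;

-- 2. Longest running step of one distributed query.
SELECT execution_id, step_index, operation_type, location_type,
       status, total_elapsed_time, command
FROM sys.dm_exec_distributed_request_steps
WHERE execution_id = @execution_id
ORDER BY total_elapsed_time DESC;

-- 3a. Execution progress if that step is a SQL step.
SELECT execution_id, step_index, distribution_id, status,
       total_elapsed_time, row_count, command
FROM sys.dm_exec_distributed_sql_requests
WHERE execution_id = @execution_id AND step_index = @step_index;

-- 3b. Execution progress if that step is a DMS step.
SELECT execution_id, step_index, dms_step_index, status,
       type, bytes_processed, total_elapsed_time
FROM sys.dm_exec_dms_workers
WHERE execution_id = @execution_id
ORDER BY total_elapsed_time DESC;

-- 4. External DMS operations, e.g. reads of HDFS file splits.
SELECT execution_id, step_index, dms_step_index, compute_node_id,
       type, input_name, length, total_elapsed_time, status
FROM sys.dm_exec_external_work
WHERE execution_id = @execution_id
ORDER BY total_elapsed_time DESC;
```

Working from the request down to the step and then to the worker mirrors the order of the bullets on the slide: first find the slow query, then its slowest step, then the SQL or DMS activity behind that step.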