SlideShare ist ein Scribd-Unternehmen logo
1 von 25
MS SQL 2019:
Big Data Processing
Andrii Zrobok
Chief Database Developer, EPAM
azrobok@gmail.com
Agenda
 MS SQL 2019 overview
 PolyBase: History, What, Why, Demo
 Big Data Cluster
 Scenarios
About me
25 + years of experience in database development: development data-centric
applications from scratch, support of legacy databases/applications, data migration
tasks, performance tuning, SSIS/ETL tasks, consulting, database trainer, etc.
Databases: FoxPro 2.0 for DOS (Fox Software), MS SQL Server (from version 6.5,
1996), Oracle, Sybase ASE, MySQL, PostgreSQL
Co-leader of Lviv Data Platform UG (PASS Local Chapter) (http://lvivsqlug.pass.org/)
Speaker at:
• PASS SQLSaturday conferences (Lviv, Kyiv, Dnipro, Odessa, Kharkiv; since 2013)
• PASS L’viv/Vinnitsa/Virtual SQL Server User Groups;
• EPAM IT Week 2015-2017
Nowadays challenges
 Unified access to all your data with unparalleled performance
 Easily and securely manage data big and small
 Build intelligent Apps and AI with all your data
MS SQL 2019 Preview
Windows: Standard version with PolyBase
Linux: Linux version without PolyBase
Docker: Database Engine Container Image (Ubuntu, Red Hat)
Big Data Analytics: Linux container on Kubernetes
https://www.microsoft.com/en-us/sql-server/sql-server-2019#Install
PolyBase: What?
SQL Server
PolyBase external tables / external data source
T-SQLApplications Analytics
Microsoft's newest technology for connecting to remote servers.
https://docs.microsoft.com/uk-ua/sql/relational-databases/polybase/polybase-
guide?view=sqlallproducts-allversions
PolyBase: History
 Introduced in SQL Server Parallel Data Warehouse (PDW) edition, back
in 2010
 Expanded in SQL Server Analytics Platform System (APS) in 2012.
 Released to the "general public" in SQL Server 2016, with most support
being in Enterprise Edition.
 Extended support for additional technologies (like Oracle, MongoDB,
etc.) will be available in SQL Server 2019.
PolyBase: Why?
 Without PolyBase
 Transfer half your data so that all your data was in one format or the other
 Query both sources of data, then write custom query logic to join and
integrate the data at the client level.
 With PolyBase
 using T-SQL to join the data (external table, statistics for external table)
 Usage
 Querying / Import (into table) / Export (into data storage)
 Performance
 Use computation on Target server (OPTION (FORCE EXTERNALPUSHDOWN))
PolyBase: Demo - tools
1) PolyBase should be installed and enabled
2) Using Management Studio (scripts, no visibility)
OR
3) Using Azure Data Studio + SQL Server 2019 (Preview) Extension
https://docs.microsoft.com/en-us/sql/azure-data-studio/download?view=sql-
server-2017
https://docs.microsoft.com/en-us/sql/azure-data-studio/sql-server-2019-
extension?view=sqlallproducts-allversions
PolyBase: Demo - steps
 Create master key (needed for password encryption)
 Create database scoped credential (access to remote database
server)
 Create external data source (address of remote database server)
 Create schema for external data (optional)
 Create external tables / statistics on external tables
PolyBase: Demo – external tables
CREATE DATABASE SCOPED CREDENTIAL OracleCredentials
WITH IDENTITY = 'system', Secret = '0x7ORA18c';
CREATE EXTERNAL DATA SOURCE OracleInstance
WITH (
LOCATION = 'oracle://192.168.1.103:1521',
CREDENTIAL = OracleCredentials);
CREATE EXTERNAL TABLE pb_oracle.countries
( country_id CHAR(2) NOT NULL
, country_name VARCHAR(40)
, region_id INTEGER )
WITH ( LOCATION='XE.EDU.COUNTRIES',
DATA_SOURCE=OracleInstance);
PolyBase: select from remote servers
SELECT
e.employee_id,
e.first_name,
e.last_name
,d.department_name
,l.city
,c.country_name
,r.region_name
FROM dbo.employees e
INNER JOIN dbo.departments d ON e.department_id = d.department_id
INNER JOIN dbo.locations l ON d.location_id = l.location_id
INNER JOIN pb_oracle.countries c ON c.country_id = l.country_id
INNER JOIN pb_sqlserver.regions r ON r.region_id = c.region_id
PolyBase: Remote Query
PolyBase: statistics
CREATE STATISTICS
CustomerCustKeyStatistics
ON pb_sqlserver.address
(stateprovinceid) WITH FULLSCAN;
SELECT DISTINCT a.city
from [pb_sqlserver].[address] a
where a.stateprovinceid = 9
PolyBase: externalpushdown
select stateprovinceid, count(*) from
pb_sqlserver.address group by stateprovinceid
select stateprovinceid, count(*) from
pb_sqlserver.address group by stateprovinceid
OPTION (DISABLE EXTERNALPUSHDOWN)
PolyBase: Scale – out groups
One node – up to 8 readers
Polybase extends the idea of
Massively Parallel Processing
(MPP) to SQL Server.
SQL Server is a classic "scale-up"
technology: if you want more
power, add more
RAM/CPUs/resources to the
single server.
Hadoop is a great example of an
MPP system: if you want more
power, add more servers; the
system will coordinate
processing.
Kubernetes Concepts
https://medium.com/@tsuyoshiushio/kubernetes-in-three-diagrams-6aba8432541c
Big data cluster architecture
Big data cluster component
Component Description
Control Plane The control plane provides management and security for the cluster.
It contains the Kubernetes master, the SQL Server master instance,
and other cluster-level services such as the Hive Metastore and Spark Driver.
Compute plane The compute plane provides computational resources to the cluster. It contains nodes running
SQL Server on Linux pods. The pods in the compute plane are divided into compute pools for
specific processing tasks. A compute pool can act as a PolyBase scale-out group for
distributed queries over different data sources-such as HDFS, Oracle, MongoDB, or Teradata.
Data plane The data plane is used for data persistence and caching. The SQL data pool consists of one or
more pods running SQL Server on Linux. It is used to ingest data from SQL queries or Spark
jobs. SQL Server big data cluster data marts are persisted in the data pool. The storage pool
consists of storage pool pods comprised of SQL Server on Linux, Spark, and HDFS. All the
storage nodes in a SQL Server big data cluster are members of an HDFS cluster.
Management
 Easy deploy and manage because of benefits of containers and
Kubernetes
 Fast to deploy
 Self contained (no installations required, images)
 Easy upgrade – new image uploading
 Scalable, multi-tenant
Scenarios: Data virtualization
By leveraging SQL Server
PolyBase SQL Server big data
clusters can query external
data sources without moving or
copying the data
Scenarios: Data Lake
A SQL Server big data cluster includes
a scalable HDFS storage pool. This can
be used to store big data, potentially
ingested from multiple external
sources. Once the big data is stored in
HDFS in the big data cluster, you can
analyze and query the data and
combine it with your relational data.
Scenarios: Scale-out datamart
SQL Server big data clusters provide
scale-out compute and storage to
improve the performance of analyzing
any data. Data from a variety of
sources can be ingested and
distributed across data pool nodes as a
cache for further analysis.
Scenarios: Integrated AI and ML
SQL Server big data clusters enable AI and machine learning tasks on the data
stored in HDFS storage pools and the data pools. You can use Spark as well as
built-in AI tools in SQL Server, using R, Python, Scala, or Java.
MS SQL Server 2019 & Big Data Processing
The end
Q&A
THANK YOU

Weitere ähnliche Inhalte

Was ist angesagt?

Azure Data Lake and U-SQL
Azure Data Lake and U-SQLAzure Data Lake and U-SQL
Azure Data Lake and U-SQLMichael Rys
 
Qubole hadoop-summit-2013-europe
Qubole hadoop-summit-2013-europeQubole hadoop-summit-2013-europe
Qubole hadoop-summit-2013-europeJoydeep Sen Sarma
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)Sascha Dittmann
 
Migrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSMigrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSBouquet
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloudDmitry Tolpeko
 
Cloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQLCloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQLliuknag
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Michael Rys
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseGrant Fritchey
 
Azure data factory
Azure data factoryAzure data factory
Azure data factoryBizTalk360
 
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...Insight Technology, Inc.
 
Elephants vs. Dolphins: Comparing PostgreSQL and MySQL for use in the DoD
Elephants vs. Dolphins:  Comparing PostgreSQL and MySQL for use in the DoDElephants vs. Dolphins:  Comparing PostgreSQL and MySQL for use in the DoD
Elephants vs. Dolphins: Comparing PostgreSQL and MySQL for use in the DoDJamey Hanson
 
NoSQL Options Compared
NoSQL Options ComparedNoSQL Options Compared
NoSQL Options ComparedSergey Bushik
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016James Serra
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015Yousun Jeong
 
HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environmen...
HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environmen...HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environmen...
HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environmen...Cloudera, Inc.
 
Visualizing big data in the browser using spark
Visualizing big data in the browser using sparkVisualizing big data in the browser using spark
Visualizing big data in the browser using sparkDatabricks
 
Impala Architecture presentation
Impala Architecture presentationImpala Architecture presentation
Impala Architecture presentationhadooparchbook
 

Was ist angesagt? (20)

Azure Data Lake and U-SQL
Azure Data Lake and U-SQLAzure Data Lake and U-SQL
Azure Data Lake and U-SQL
 
Qubole hadoop-summit-2013-europe
Qubole hadoop-summit-2013-europeQubole hadoop-summit-2013-europe
Qubole hadoop-summit-2013-europe
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
 
Azure SQL DWH
Azure SQL DWHAzure SQL DWH
Azure SQL DWH
 
Azure data lake sql konf 2016
Azure data lake   sql konf 2016Azure data lake   sql konf 2016
Azure data lake sql konf 2016
 
Migrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSMigrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMS
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloud
 
Cloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQLCloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQL
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
SQOOP - RDBMS to Hadoop
SQOOP - RDBMS to HadoopSQOOP - RDBMS to Hadoop
SQOOP - RDBMS to Hadoop
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
 
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
 
Elephants vs. Dolphins: Comparing PostgreSQL and MySQL for use in the DoD
Elephants vs. Dolphins:  Comparing PostgreSQL and MySQL for use in the DoDElephants vs. Dolphins:  Comparing PostgreSQL and MySQL for use in the DoD
Elephants vs. Dolphins: Comparing PostgreSQL and MySQL for use in the DoD
 
NoSQL Options Compared
NoSQL Options ComparedNoSQL Options Compared
NoSQL Options Compared
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015
 
HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environmen...
HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environmen...HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environmen...
HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environmen...
 
Visualizing big data in the browser using spark
Visualizing big data in the browser using sparkVisualizing big data in the browser using spark
Visualizing big data in the browser using spark
 
Impala Architecture presentation
Impala Architecture presentationImpala Architecture presentation
Impala Architecture presentation
 

Ähnlich wie MS SQL 2019: Big Data Processing and PolyBase Overview

Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionMicrosoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionTravis Wright
 
DAC4B 2015 - Polybase
DAC4B 2015 - PolybaseDAC4B 2015 - Polybase
DAC4B 2015 - PolybaseŁukasz Grala
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 
Azure Data platform
Azure Data platformAzure Data platform
Azure Data platformMostafa
 
Big Data: SQL on Hadoop from IBM
Big Data:  SQL on Hadoop from IBM Big Data:  SQL on Hadoop from IBM
Big Data: SQL on Hadoop from IBM Cynthia Saracco
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseJames Serra
 
Whats New Sql Server 2008 R2 Cw
Whats New Sql Server 2008 R2 CwWhats New Sql Server 2008 R2 Cw
Whats New Sql Server 2008 R2 CwEduardo Castro
 
Whats New Sql Server 2008 R2
Whats New Sql Server 2008 R2Whats New Sql Server 2008 R2
Whats New Sql Server 2008 R2Eduardo Castro
 
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018  SQL Server 2019 big data clusters - intro sessionMicrosoft ignite 2018  SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro sessionTravis Wright
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptxFedoRam1
 
Exploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresExploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresCCG
 
SQL Server 2019 Big Data Cluster
SQL Server 2019 Big Data ClusterSQL Server 2019 Big Data Cluster
SQL Server 2019 Big Data ClusterMaximiliano Accotto
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSStéphane Fréchette
 
Data Integration through Data Virtualization (SQL Server Konferenz 2019)
Data Integration through Data Virtualization (SQL Server Konferenz 2019)Data Integration through Data Virtualization (SQL Server Konferenz 2019)
Data Integration through Data Virtualization (SQL Server Konferenz 2019)Cathrine Wilhelmsen
 
Introduction to microsoft sql server 2008 r2
Introduction to microsoft sql server 2008 r2Introduction to microsoft sql server 2008 r2
Introduction to microsoft sql server 2008 r2Eduardo Castro
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformHortonworks
 
Professional Portfolio
Professional PortfolioProfessional Portfolio
Professional PortfolioMoniqueO Opris
 

Ähnlich wie MS SQL 2019: Big Data Processing and PolyBase Overview (20)

Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionMicrosoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
 
DAC4B 2015 - Polybase
DAC4B 2015 - PolybaseDAC4B 2015 - Polybase
DAC4B 2015 - Polybase
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Azure Data platform
Azure Data platformAzure Data platform
Azure Data platform
 
Big Data: SQL on Hadoop from IBM
Big Data:  SQL on Hadoop from IBM Big Data:  SQL on Hadoop from IBM
Big Data: SQL on Hadoop from IBM
 
Exploring sql server 2016 bi
Exploring sql server 2016 biExploring sql server 2016 bi
Exploring sql server 2016 bi
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
Whats New Sql Server 2008 R2 Cw
Whats New Sql Server 2008 R2 CwWhats New Sql Server 2008 R2 Cw
Whats New Sql Server 2008 R2 Cw
 
Whats New Sql Server 2008 R2
Whats New Sql Server 2008 R2Whats New Sql Server 2008 R2
Whats New Sql Server 2008 R2
 
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018  SQL Server 2019 big data clusters - intro sessionMicrosoft ignite 2018  SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
 
Exploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresExploring Microsoft Azure Infrastructures
Exploring Microsoft Azure Infrastructures
 
SQL Server 2019 Big Data Cluster
SQL Server 2019 Big Data ClusterSQL Server 2019 Big Data Cluster
SQL Server 2019 Big Data Cluster
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APS
 
Data Integration through Data Virtualization (SQL Server Konferenz 2019)
Data Integration through Data Virtualization (SQL Server Konferenz 2019)Data Integration through Data Virtualization (SQL Server Konferenz 2019)
Data Integration through Data Virtualization (SQL Server Konferenz 2019)
 
Resume
ResumeResume
Resume
 
Introduction to microsoft sql server 2008 r2
Introduction to microsoft sql server 2008 r2Introduction to microsoft sql server 2008 r2
Introduction to microsoft sql server 2008 r2
 
Mysql
MysqlMysql
Mysql
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 
Professional Portfolio
Professional PortfolioProfessional Portfolio
Professional Portfolio
 

Mehr von Lviv Startup Club

Oksana Krykun: Перші 90 днів в роботі над новим продуктом (UA)
Oksana Krykun: Перші 90 днів в роботі над новим продуктом (UA)Oksana Krykun: Перші 90 днів в роботі над новим продуктом (UA)
Oksana Krykun: Перші 90 днів в роботі над новим продуктом (UA)Lviv Startup Club
 
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...Lviv Startup Club
 
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)Lviv Startup Club
 
Nikita Zahurdaiev: PMO Tools and Technologies (UA)
Nikita Zahurdaiev: PMO Tools and Technologies (UA)Nikita Zahurdaiev: PMO Tools and Technologies (UA)
Nikita Zahurdaiev: PMO Tools and Technologies (UA)Lviv Startup Club
 
Nikita Zahurdaiev: Developing PMO Services and Functions (UA)
Nikita Zahurdaiev: Developing PMO Services and Functions (UA)Nikita Zahurdaiev: Developing PMO Services and Functions (UA)
Nikita Zahurdaiev: Developing PMO Services and Functions (UA)Lviv Startup Club
 
Oleksandr Krakovetskyi: What's wrong with Generative AI? (UA)
Oleksandr Krakovetskyi: What's wrong with Generative AI? (UA)Oleksandr Krakovetskyi: What's wrong with Generative AI? (UA)
Oleksandr Krakovetskyi: What's wrong with Generative AI? (UA)Lviv Startup Club
 
Oleksandr Krakovetskyi: What's wrong with Generative AI? (UA)
Oleksandr Krakovetskyi: What's wrong with Generative AI? (UA)Oleksandr Krakovetskyi: What's wrong with Generative AI? (UA)
Oleksandr Krakovetskyi: What's wrong with Generative AI? (UA)Lviv Startup Club
 
Stanislav Podyachev: AI Agents as Role-Playing Business Modeling Tools (UA)
Stanislav Podyachev: AI Agents as Role-Playing Business Modeling Tools (UA)Stanislav Podyachev: AI Agents as Role-Playing Business Modeling Tools (UA)
Stanislav Podyachev: AI Agents as Role-Playing Business Modeling Tools (UA)Lviv Startup Club
 
Kyryl Truskovskyi: Training and Serving Open-Sourced Foundational Models (UA)
Kyryl Truskovskyi: Training and Serving Open-Sourced Foundational Models (UA)Kyryl Truskovskyi: Training and Serving Open-Sourced Foundational Models (UA)
Kyryl Truskovskyi: Training and Serving Open-Sourced Foundational Models (UA)Lviv Startup Club
 
Andrii Rodionov: What can go wrong in a distributed system – experience from ...
Andrii Rodionov: What can go wrong in a distributed system – experience from ...Andrii Rodionov: What can go wrong in a distributed system – experience from ...
Andrii Rodionov: What can go wrong in a distributed system – experience from ...Lviv Startup Club
 
Dmytro Tkachenko: Можливості АІ відео для бізнесу (UA)
Dmytro Tkachenko: Можливості АІ відео для бізнесу (UA)Dmytro Tkachenko: Можливості АІ відео для бізнесу (UA)
Dmytro Tkachenko: Можливості АІ відео для бізнесу (UA)Lviv Startup Club
 
Roman Kyslyi: Використання та побудова LLM агентів (UA)
Roman Kyslyi: Використання та побудова LLM агентів (UA)Roman Kyslyi: Використання та побудова LLM агентів (UA)
Roman Kyslyi: Використання та побудова LLM агентів (UA)Lviv Startup Club
 
Veronika Snizhko: Штучний інтелект як каталізатор інноваційної культури в ком...
Veronika Snizhko: Штучний інтелект як каталізатор інноваційної культури в ком...Veronika Snizhko: Штучний інтелект як каталізатор інноваційної культури в ком...
Veronika Snizhko: Штучний інтелект як каталізатор інноваційної культури в ком...Lviv Startup Club
 
Volodymyr Zhukov: Ключові труднощі в реальних імплементаціях AI. Досвід з пра...
Volodymyr Zhukov: Ключові труднощі в реальних імплементаціях AI. Досвід з пра...Volodymyr Zhukov: Ключові труднощі в реальних імплементаціях AI. Досвід з пра...
Volodymyr Zhukov: Ключові труднощі в реальних імплементаціях AI. Досвід з пра...Lviv Startup Club
 
Volodymyr Zhukov: Куди рухається ринок AI у 2024 році. Інсайти від Stanford H...
Volodymyr Zhukov: Куди рухається ринок AI у 2024 році. Інсайти від Stanford H...Volodymyr Zhukov: Куди рухається ринок AI у 2024 році. Інсайти від Stanford H...
Volodymyr Zhukov: Куди рухається ринок AI у 2024 році. Інсайти від Stanford H...Lviv Startup Club
 
Andrii Boichuk: The RAG is dead, long live the RAG або як сучасні LLM змінюют...
Andrii Boichuk: The RAG is dead, long live the RAG або як сучасні LLM змінюют...Andrii Boichuk: The RAG is dead, long live the RAG або як сучасні LLM змінюют...
Andrii Boichuk: The RAG is dead, long live the RAG або як сучасні LLM змінюют...Lviv Startup Club
 
Vladyslav Fliahin: Applications of Gen AI in CV (UA)
Vladyslav Fliahin: Applications of Gen AI in CV (UA)Vladyslav Fliahin: Applications of Gen AI in CV (UA)
Vladyslav Fliahin: Applications of Gen AI in CV (UA)Lviv Startup Club
 
Artem Ternov: Побудова платформи під DataEngineering та DataScience в ентерпр...
Artem Ternov: Побудова платформи під DataEngineering та DataScience в ентерпр...Artem Ternov: Побудова платформи під DataEngineering та DataScience в ентерпр...
Artem Ternov: Побудова платформи під DataEngineering та DataScience в ентерпр...Lviv Startup Club
 
Michael Vidyakin: Defining PMO Structure and Governance (UA)
Michael Vidyakin: Defining PMO Structure and Governance (UA)Michael Vidyakin: Defining PMO Structure and Governance (UA)
Michael Vidyakin: Defining PMO Structure and Governance (UA)Lviv Startup Club
 
Michael Vidyakin: Assessing Organizational Readiness (UA)
Michael Vidyakin: Assessing Organizational Readiness (UA)Michael Vidyakin: Assessing Organizational Readiness (UA)
Michael Vidyakin: Assessing Organizational Readiness (UA)Lviv Startup Club
 

Mehr von Lviv Startup Club (20)

Oksana Krykun: Перші 90 днів в роботі над новим продуктом (UA)
Oksana Krykun: Перші 90 днів в роботі над новим продуктом (UA)Oksana Krykun: Перші 90 днів в роботі над новим продуктом (UA)
Oksana Krykun: Перші 90 днів в роботі над новим продуктом (UA)
 
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
 
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
 
Nikita Zahurdaiev: PMO Tools and Technologies (UA)
Nikita Zahurdaiev: PMO Tools and Technologies (UA)Nikita Zahurdaiev: PMO Tools and Technologies (UA)
Nikita Zahurdaiev: PMO Tools and Technologies (UA)
 
Nikita Zahurdaiev: Developing PMO Services and Functions (UA)
Nikita Zahurdaiev: Developing PMO Services and Functions (UA)Nikita Zahurdaiev: Developing PMO Services and Functions (UA)
Nikita Zahurdaiev: Developing PMO Services and Functions (UA)
 
Oleksandr Krakovetskyi: What's wrong with Generative AI? (UA)
Oleksandr Krakovetskyi: What's wrong with Generative AI? (UA)Oleksandr Krakovetskyi: What's wrong with Generative AI? (UA)
Oleksandr Krakovetskyi: What's wrong with Generative AI? (UA)
 
Oleksandr Krakovetskyi: What's wrong with Generative AI? (UA)
Oleksandr Krakovetskyi: What's wrong with Generative AI? (UA)Oleksandr Krakovetskyi: What's wrong with Generative AI? (UA)
Oleksandr Krakovetskyi: What's wrong with Generative AI? (UA)
 
Stanislav Podyachev: AI Agents as Role-Playing Business Modeling Tools (UA)
Stanislav Podyachev: AI Agents as Role-Playing Business Modeling Tools (UA)Stanislav Podyachev: AI Agents as Role-Playing Business Modeling Tools (UA)
Stanislav Podyachev: AI Agents as Role-Playing Business Modeling Tools (UA)
 
Kyryl Truskovskyi: Training and Serving Open-Sourced Foundational Models (UA)
Kyryl Truskovskyi: Training and Serving Open-Sourced Foundational Models (UA)Kyryl Truskovskyi: Training and Serving Open-Sourced Foundational Models (UA)
Kyryl Truskovskyi: Training and Serving Open-Sourced Foundational Models (UA)
 
Andrii Rodionov: What can go wrong in a distributed system – experience from ...
Andrii Rodionov: What can go wrong in a distributed system – experience from ...Andrii Rodionov: What can go wrong in a distributed system – experience from ...
Andrii Rodionov: What can go wrong in a distributed system – experience from ...
 
Dmytro Tkachenko: Можливості АІ відео для бізнесу (UA)
Dmytro Tkachenko: Можливості АІ відео для бізнесу (UA)Dmytro Tkachenko: Можливості АІ відео для бізнесу (UA)
Dmytro Tkachenko: Можливості АІ відео для бізнесу (UA)
 
Roman Kyslyi: Використання та побудова LLM агентів (UA)
Roman Kyslyi: Використання та побудова LLM агентів (UA)Roman Kyslyi: Використання та побудова LLM агентів (UA)
Roman Kyslyi: Використання та побудова LLM агентів (UA)
 
Veronika Snizhko: Штучний інтелект як каталізатор інноваційної культури в ком...
Veronika Snizhko: Штучний інтелект як каталізатор інноваційної культури в ком...Veronika Snizhko: Штучний інтелект як каталізатор інноваційної культури в ком...
Veronika Snizhko: Штучний інтелект як каталізатор інноваційної культури в ком...
 
Volodymyr Zhukov: Ключові труднощі в реальних імплементаціях AI. Досвід з пра...
Volodymyr Zhukov: Ключові труднощі в реальних імплементаціях AI. Досвід з пра...Volodymyr Zhukov: Ключові труднощі в реальних імплементаціях AI. Досвід з пра...
Volodymyr Zhukov: Ключові труднощі в реальних імплементаціях AI. Досвід з пра...
 
Volodymyr Zhukov: Куди рухається ринок AI у 2024 році. Інсайти від Stanford H...
Volodymyr Zhukov: Куди рухається ринок AI у 2024 році. Інсайти від Stanford H...Volodymyr Zhukov: Куди рухається ринок AI у 2024 році. Інсайти від Stanford H...
Volodymyr Zhukov: Куди рухається ринок AI у 2024 році. Інсайти від Stanford H...
 
Andrii Boichuk: The RAG is dead, long live the RAG або як сучасні LLM змінюют...
Andrii Boichuk: The RAG is dead, long live the RAG або як сучасні LLM змінюют...Andrii Boichuk: The RAG is dead, long live the RAG або як сучасні LLM змінюют...
Andrii Boichuk: The RAG is dead, long live the RAG або як сучасні LLM змінюют...
 
Vladyslav Fliahin: Applications of Gen AI in CV (UA)
Vladyslav Fliahin: Applications of Gen AI in CV (UA)Vladyslav Fliahin: Applications of Gen AI in CV (UA)
Vladyslav Fliahin: Applications of Gen AI in CV (UA)
 
Artem Ternov: Побудова платформи під DataEngineering та DataScience в ентерпр...
Artem Ternov: Побудова платформи під DataEngineering та DataScience в ентерпр...Artem Ternov: Побудова платформи під DataEngineering та DataScience в ентерпр...
Artem Ternov: Побудова платформи під DataEngineering та DataScience в ентерпр...
 
Michael Vidyakin: Defining PMO Structure and Governance (UA)
Michael Vidyakin: Defining PMO Structure and Governance (UA)Michael Vidyakin: Defining PMO Structure and Governance (UA)
Michael Vidyakin: Defining PMO Structure and Governance (UA)
 
Michael Vidyakin: Assessing Organizational Readiness (UA)
Michael Vidyakin: Assessing Organizational Readiness (UA)Michael Vidyakin: Assessing Organizational Readiness (UA)
Michael Vidyakin: Assessing Organizational Readiness (UA)
 

Kürzlich hochgeladen

8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCRashishs7044
 
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City GurgaonCall Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaoncallgirls2057
 
Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.Anamaria Contreras
 
Cybersecurity Awareness Training Presentation v2024.03
Cybersecurity Awareness Training Presentation v2024.03Cybersecurity Awareness Training Presentation v2024.03
Cybersecurity Awareness Training Presentation v2024.03DallasHaselhorst
 
Darshan Hiranandani [News About Next CEO].pdf
Darshan Hiranandani [News About Next CEO].pdfDarshan Hiranandani [News About Next CEO].pdf
Darshan Hiranandani [News About Next CEO].pdfShashank Mehta
 
Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesKeppelCorporation
 
APRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfAPRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfRbc Rbcua
 
Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Americas Got Grants
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Pereraictsugar
 
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCRashishs7044
 
Financial-Statement-Analysis-of-Coca-cola-Company.pptx
Financial-Statement-Analysis-of-Coca-cola-Company.pptxFinancial-Statement-Analysis-of-Coca-cola-Company.pptx
Financial-Statement-Analysis-of-Coca-cola-Company.pptxsaniyaimamuddin
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africaictsugar
 
Chapter 9 PPT 4th edition.pdf internal audit
Chapter 9 PPT 4th edition.pdf internal auditChapter 9 PPT 4th edition.pdf internal audit
Chapter 9 PPT 4th edition.pdf internal auditNhtLNguyn9
 
PSCC - Capability Statement Presentation
PSCC - Capability Statement PresentationPSCC - Capability Statement Presentation
PSCC - Capability Statement PresentationAnamaria Contreras
 
8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCRashishs7044
 
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607dollysharma2066
 
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!Doge Mining Website
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfrichard876048
 
Memorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMMemorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMVoces Mineras
 

Kürzlich hochgeladen (20)

8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
 
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City GurgaonCall Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
 
Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.
 
Cybersecurity Awareness Training Presentation v2024.03
Cybersecurity Awareness Training Presentation v2024.03Cybersecurity Awareness Training Presentation v2024.03
Cybersecurity Awareness Training Presentation v2024.03
 
Darshan Hiranandani [News About Next CEO].pdf
Darshan Hiranandani [News About Next CEO].pdfDarshan Hiranandani [News About Next CEO].pdf
Darshan Hiranandani [News About Next CEO].pdf
 
Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation Slides
 
APRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfAPRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdf
 
Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Perera
 
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
 
Financial-Statement-Analysis-of-Coca-cola-Company.pptx
Financial-Statement-Analysis-of-Coca-cola-Company.pptxFinancial-Statement-Analysis-of-Coca-cola-Company.pptx
Financial-Statement-Analysis-of-Coca-cola-Company.pptx
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africa
 
Call Us ➥9319373153▻Call Girls In North Goa
Call Us ➥9319373153▻Call Girls In North GoaCall Us ➥9319373153▻Call Girls In North Goa
Call Us ➥9319373153▻Call Girls In North Goa
 
Chapter 9 PPT 4th edition.pdf internal audit
Chapter 9 PPT 4th edition.pdf internal auditChapter 9 PPT 4th edition.pdf internal audit
Chapter 9 PPT 4th edition.pdf internal audit
 
PSCC - Capability Statement Presentation
PSCC - Capability Statement PresentationPSCC - Capability Statement Presentation
PSCC - Capability Statement Presentation
 
8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR
 
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
 
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdf
 
Memorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMMemorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQM
 

MS SQL 2019: Big Data Processing and PolyBase Overview

  • 1. MS SQL 2019: Big Data Processing Andrii Zrobok Chief Database Developer, EPAM azrobok@gmail.com
  • 2. Agenda  MS SQL 2019 overview  PolyBase: History, What, Why, Demo  Big Data Cluster  Scenarios
  • 3. About me 25 + years of experience in database development: development data-centric applications from scratch, support of legacy databases/applications, data migration tasks, performance tuning, SSIS/ETL tasks, consulting, database trainer, etc. Databases: FoxPro 2.0 for DOS (Fox Software), MS SQL Server (from version 6.5, 1996), Oracle, Sybase ASE, MySQL, PostgreSQL Co-leader of Lviv Data Platform UG (PASS Local Chapter) (http://lvivsqlug.pass.org/) Speaker at: • PASS SQLSaturday conferences (Lviv, Kyiv, Dnipro, Odessa, Kharkiv; since 2013) • PASS L’viv/Vinnitsa/Virtual SQL Server User Groups; • EPAM IT Week 2015-2017
  • 4. Nowadays challenges  Unified access to all your data with unparalleled performance  Easily and securely manage data big and small  Build intelligent Apps and AI with all your data
  • 5. MS SQL 2019 Preview Windows: Standard version with PolyBase Linux: Linux version without PolyBase Docker: Database Engine Container Image (Ubuntu, Red Hat) Big Data Analytics: Linux container on Kubernetes https://www.microsoft.com/en-us/sql-server/sql-server-2019#Install
  • 6. PolyBase: What? SQL Server PolyBase external tables / external data source T-SQLApplications Analytics Microsoft's newest technology for connecting to remote servers. https://docs.microsoft.com/uk-ua/sql/relational-databases/polybase/polybase- guide?view=sqlallproducts-allversions
  • 7. PolyBase: History  Introduced in SQL Server Parallel Data Warehouse (PDW) edition, back in 2010  Expanded in SQL Server Analytics Platform System (APS) in 2012.  Released to the "general public" in SQL Server 2016, with most support being in Enterprise Edition.  Extended support for additional technologies (like Oracle, MongoDB, etc.) will be available in SQL Server 2019.
  • 8. PolyBase: Why?  Without PolyBase  Transfer half your data so that all your data was in one format or the other  Query both sources of data, then write custom query logic to join and integrate the data at the client level.  With PolyBase  using T-SQL to join the data (external table, statistics for external table)  Usage  Querying / Import (into table) / Export (into data storage)  Performance  Use computation on Target server (OPTION (FORCE EXTERNALPUSHDOWN))
  • 9. PolyBase: Demo - tools 1) PolyBase should be installed and enabled 2) Using Management Studio (scripts, no visibility) OR 3) Using Azure Data Studio + SQL Server 2019 (Preview) Extension https://docs.microsoft.com/en-us/sql/azure-data-studio/download?view=sql- server-2017 https://docs.microsoft.com/en-us/sql/azure-data-studio/sql-server-2019- extension?view=sqlallproducts-allversions
  • 10. PolyBase: Demo - steps  Create master key (needed for password encryption)  Create database scoped credential (access to remote database server)  Create external data source (address of remote database server)  Create schema for external data (optional)  Create external tables / statistics on external tables
  • 11. PolyBase: Demo – external tables CREATE DATABASE SCOPED CREDENTIAL OracleCredentials WITH IDENTITY = 'system', Secret = '0x7ORA18c'; CREATE EXTERNAL DATA SOURCE OracleInstance WITH ( LOCATION = 'oracle://192.168.1.103:1521', CREDENTIAL = OracleCredentials); CREATE EXTERNAL TABLE pb_oracle.countries ( country_id CHAR(2) NOT NULL , country_name VARCHAR(40) , region_id INTEGER ) WITH ( LOCATION='XE.EDU.COUNTRIES', DATA_SOURCE=OracleInstance);
  • 12. PolyBase: select from remote servers SELECT e.employee_id, e.first_name, e.last_name ,d.department_name ,l.city ,c.country_name ,r.region_name FROM dbo.employees e INNER JOIN dbo.departments d ON e.department_id = d.department_id INNER JOIN dbo.locations l ON d.location_id = l.location_id INNER JOIN pb_oracle.countries c ON c.country_id = l.country_id INNER JOIN pb_sqlserver.regions r ON r.region_id = c.region_id
  • 14. PolyBase: statistics CREATE STATISTICS CustomerCustKeyStatistics ON pb_sqlserver.address (stateprovinceid) WITH FULLSCAN; SELECT DISTINCT a.city from [pb_sqlserver].[address] a where a.stateprovinceid = 9
  • 15. PolyBase: externalpushdown select stateprovinceid, count(*) from pb_sqlserver.address group by stateprovinceid select stateprovinceid, count(*) from pb_sqlserver.address group by stateprovinceid OPTION (DISABLE EXTERNALPUSHDOWN)
  • 16. PolyBase: Scale – out groups One node – up to 8 readers Polybase extends the idea of Massively Parallel Processing (MPP) to SQL Server. SQL Server is a classic "scale-up" technology: if you want more power, add more RAM/CPUs/resources to the single server. Hadoop is a great example of an MPP system: if you want more power, add more servers; the system will coordinate processing.
  • 18. Big data cluster architecture
  • 19. Big data cluster component Component Description Control Plane The control plane provides management and security for the cluster. It contains the Kubernetes master, the SQL Server master instance, and other cluster-level services such as the Hive Metastore and Spark Driver. Compute plane The compute plane provides computational resources to the cluster. It contains nodes running SQL Server on Linux pods. The pods in the compute plane are divided into compute pools for specific processing tasks. A compute pool can act as a PolyBase scale-out group for distributed queries over different data sources-such as HDFS, Oracle, MongoDB, or Teradata. Data plane The data plane is used for data persistence and caching. The SQL data pool consists of one or more pods running SQL Server on Linux. It is used to ingest data from SQL queries or Spark jobs. SQL Server big data cluster data marts are persisted in the data pool. The storage pool consists of storage pool pods comprised of SQL Server on Linux, Spark, and HDFS. All the storage nodes in a SQL Server big data cluster are members of an HDFS cluster.
  • 20. Management  Easy deploy and manage because of benefits of containers and Kubernetes  Fast to deploy  Self contained (no installations required, images)  Easy upgrade – new image uploading  Scalable, multi-tenant
  • 21. Scenarios: Data virtualization By leveraging SQL Server PolyBase SQL Server big data clusters can query external data sources without moving or copying the data
  • 22. Scenarios: Data Lake A SQL Server big data cluster includes a scalable HDFS storage pool. This can be used to store big data, potentially ingested from multiple external sources. Once the big data is stored in HDFS in the big data cluster, you can analyze and query the data and combine it with your relational data.
  • 23. Scenarios: Scale-out datamart SQL Server big data clusters provide scale-out compute and storage to improve the performance of analyzing any data. Data from a variety of sources can be ingested and distributed across data pool nodes as a cache for further analysis.
  • 24. Scenarios: Integrated AI and ML SQL Server big data clusters enable AI and machine learning tasks on the data stored in HDFS storage pools and the data pools. You can use Spark as well as built-in AI tools in SQL Server, using R, Python, Scala, or Java.
  • 25. MS SQL Server 2019 & Big Data Processing The end Q&A THANK YOU

Hinweis der Redaktion

  1. Big Data Clusters The latest version simplifies big data analytics for SQL Server users. The new SQL server combines HDFS (the Hadoop Distributed File System) and Apache Spark and provides one integrated system. It provides the facility of data virtualization by integrating data without extracting , transforming and loading it. Big data clusters are difficult to deploy but if you have Kubernetes infrastructure, a single command will deploy your big data cluster in about half an hour.
  2. Polybase is Microsoft's newest technology for connecting to remote servers. It started by letting you connect to Hadoop and has expanded since then to include Azure Blob Storage. Polybase is also the best method to load data into Azure SQL Data Warehouse. The PolyBase product which was in earlier version too has been expanded. Sql server can now support queries from external sources like Oracle, Teradata, MongoDB which as a result increases the flexibility of the sql server
  3. Polybase lets SQL Server compute nodes talk directly to Hadoop data nodes, perform aggregations, and then return results to the head node. This removes the classic SQL Server single point of contention.
  4. Kubernetes enable you to use the cluster as if it is single PC. You don’t need to care the detail of the infrastructure. Just declare the what you want in yaml file, you will get what you want Cluster A Kubernetes cluster is a set of machines, known as nodes. One node controls the cluster and is designated the master node; the remaining nodes are worker nodes. The Kubernetes master is responsible for distributing work between the workers, and for monitoring the health of the cluster. Node A node runs containerized applications. It can be either a physical machine or a virtual machine. A Kubernetes cluster can contain a mixture of physical machine and virtual machine nodes. Pod A pod is the atomic deployment unit of Kubernetes. A pod is a logical group of one or more containers-and associated resources-needed to run an application. Each pod runs on a node; a node can run one or more pods. The Kubernetes master automatically assigns pods to nodes in the cluster. In SQL Server big data clusters, Kubernetes is responsible for the state of the SQL Server big data clusters; Kubernetes builds and configures the cluster nodes, assigns pods to nodes, and monitors the health of the cluster.
  5. Big Data Clusters The latest version simplifies big data analytics for SQL Server users. The new SQL server combines HDFS (the Hadoop Distributed Filing System) and Apache Spark and provides one integrated system. It provides the facility of data virtualization by integrating data without extracting , transforming and loading it. Big data clusters are difficult to deploy but if you have Kubernetes infrastructure, a single command will deploy your big data cluster in about half an hour. A SQL Server big data cluster is a cluster of Linux containers orchestrated by Kubernetes. Starting with SQL Server 2019 preview, SQL Server big data clusters allow you to deploy scalable clusters of SQL Server, Spark, and HDFS containers running on Kubernetes. These components are running side by side to enable you to read, write, and process big data from Transact-SQL or Spark, allowing you to easily combine and analyze your high-value relational data with high-volume big data. Control plane The control plane provides management and security for the cluster. It contains the Kubernetes master, the SQL Server master instance, and other cluster-level services such as the Hive Metastore and Spark Driver. Compute plane The compute plane provides computational resources to the cluster. It contains nodes running SQL Server on Linux pods. The pods in the compute plane are divided into compute pools for specific processing tasks. A compute pool can act as a PolyBase scale-out group for distributed queries over different data sources-such as HDFS, Oracle, MongoDB, or Teradata. Data plane The data plane is used for data persistence and caching. It contains the SQL data pool, and storage pool. The SQL data pool consists of one or more pods running SQL Server on Linux. It is used to ingest data from SQL queries or Spark jobs. SQL Server big data cluster data marts are persisted in the data pool. The storage pool consists of storage pool pods comprised of SQL Server on Linux, Spark, and HDFS. All the storage nodes in a SQL Server big data cluster are members of an HDFS cluster.
  6. Data virtualization: By leveraging SQL Server PolyBase, SQL Server big data clusters can query external data sources without moving or copying the data.
  7. Data lake A SQL Server big data cluster includes a scalable HDFS storage pool. This can be used to store big data, potentially ingested from multiple external sources. Once the big data is stored in HDFS in the big data cluster, you can analyze and query the data and combine it with your relational data.
  8. Data virtualization: By leveraging SQL Server PolyBase, SQL Server big data clusters can query external data sources without moving or copying the data. SQL Server 2019 preview introduces new connectors to data sources. Data lake A SQL Server big data cluster includes a scalable HDFS storage pool. This can be used to store big data, potentially ingested from multiple external sources. Once the big data is stored in HDFS in the big data cluster, you can analyze and query the data and combine it with your relational data. Scale-out data mart SQL Server big data clusters provide scale-out compute and storage to improve the performance of analyzing any data. Data from a variety of sources can be ingested and distributed across data pool nodes as a cache for further analysis.
  9. Integrated AI and Machine Learning SQL Server big data clusters enable AI and machine learning tasks on the data stored in HDFS storage pools and the data pools. You can use Spark as well as built-in AI tools in SQL Server, using R, Python, Scala, or Java.