Modernizing Your Data Warehouse using APS
Big data. Small data. All data. 
Stéphane Fréchette - SQL Server MVP - @sfrechette 
Database / Business Intelligence Solution Architect
Data sources
1. Increasing data volumes
2. Real-time data
3. New data sources and types
4. Cloud-born data
(Source: Gartner, “The State of Data Warehousing in 2012”)
 
The modern data warehouse
Insights from all your data: enrich and optimize your data from non-traditional sources.
[Figure: traditional data sources and non-relational data feeding the modern data warehouse]
Roadblocks to a modern data warehouse
Typical approaches:
• Keep legacy investment
• Buy new tier-one hardware appliance
• Acquire Big Data solution
• Acquire business intelligence
Their drawbacks:
• Limited scalability and ability to handle new data types
• Significant training and data silos
• High acquisition and migration costs
• Complex with low adoption
Introducing the Microsoft Analytics Platform System
The turnkey modern data warehouse appliance
• Relational and non-relational data in a single appliance
• Enterprise-ready Hadoop
• Integrated querying across Hadoop and PDW using T-SQL
• Direct integration with Microsoft BI tools such as Microsoft Excel
• Near real-time performance with In-Memory Columnstore
• Ability to scale out to accommodate growing data
• Removal of data warehouse bottlenecks with MPP SQL Server
• Concurrency that fuels rapid adoption
• Industry’s lowest data warehouse appliance price per terabyte
• Value through a single appliance solution
• Value with flexible hardware options using commodity hardware
Evolution in the nature and use of data in the enterprise
[Figure: value to the business plotted against data complexity (variety and velocity), with data volumes growing toward petabytes; use of data progresses from historical analysis to insight analysis, predictive analytics, and predictive forecasting.]
What is Hadoop?
[Figure: the Hadoop stack, with operational services, data services, and core services: Ambari, Oozie, Falcon, Flume, Sqoop, NFS and WebHDFS (load and extract), Hive & HCatalog, Pig, HBase, HDFS, YARN, and MapReduce, running on a Hadoop cluster of nodes that each combine compute and storage.]
Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware.
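To make "distributed data processing" concrete, here is a minimal single-process sketch of the MapReduce model (map, shuffle, reduce) that the MapReduce service in the stack runs across cluster nodes. It only simulates the three phases in memory; the function names are illustrative and are not Hadoop APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits: List[List[str]]) -> Dict[str, int]:
    """Map each input split independently (as separate nodes would), then shuffle and reduce."""
    mapped = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    splits = [["big data small data"], ["all data"]]
    print(word_count(splits))  # {'big': 1, 'data': 3, 'small': 1, 'all': 1}
```

On a real cluster each split would be mapped on the node that stores it, which is why storage and compute sit together in every node.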
Manageable, secured, and highly available Hadoop integrated into the appliance
• High performance, tuned within the appliance
• End-user authentication with Active Directory
• Accessible insights for everyone with Microsoft BI tools
• Managed and monitored using System Center
• 100-percent Apache Hadoop
[Figure: SQL Server Parallel Data Warehouse and Microsoft HDInsight connected through PolyBase]
[Figure: the appliance stack, with hardware and fabric at the base hosting a Parallel Data Warehouse workload and an HDInsight workload]
A region is a logical container within an appliance. Each workload contains the following boundaries:
• Security
• Metering
• Servicing
Bringing Hadoop point solutions and the data warehouse together for users and IT
• Provides a single T-SQL query model for PDW and Hadoop with rich features of T-SQL, including joins without ETL
• Uses the power of MPP to enhance query execution performance
• Supports Windows Azure HDInsight to enable new hybrid cloud scenarios
• Provides the ability to query non-Microsoft Hadoop distributions, such as Hortonworks and Cloudera
[Figure: a SELECT issued against SQL Server Parallel Data Warehouse reaches Microsoft HDInsight, Microsoft Azure HDInsight, Hortonworks for Windows and Linux, and Cloudera through PolyBase, and a single result set is returned.]
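The "joins without ETL" idea can be illustrated with a small sketch: relational rows (standing in for a PDW table) are joined directly against delimited lines read from a Hadoop file, without first loading that file into a table. The customer table, the pipe-delimited log layout, and the function names are hypothetical; PolyBase itself does this through external tables and T-SQL, not Python.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical relational table keyed by user name (stands in for a PDW table).
CUSTOMERS: Dict[str, str] = {"Sean": "Gold", "Suz": "Silver"}


def read_hdfs_lines(lines: Iterable[str], terminator: str = "|") -> Iterable[List[str]]:
    """Parse delimited text on the fly, the way an external table reads a Hadoop file at query time."""
    for line in lines:
        yield line.rstrip("\n").split(terminator)


def join_without_etl(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Inner-join each (user|product) log record to the relational table by user name."""
    result = []
    for user, product in read_hdfs_lines(lines):
        tier = CUSTOMERS.get(user)
        if tier is not None:  # inner join: records without a matching customer are dropped
            result.append((user, product, tier))
    return result


if __name__ == "__main__":
    log = ["Sean|xbox\n", "Tom|sqls\n", "Suz|xbox\n"]
    print(join_without_etl(log))  # [('Sean', 'xbox', 'Gold'), ('Suz', 'xbox', 'Silver')]
```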
Direct and parallelized HDFS access
The Data Movement Service (DMS) of APS is enhanced to allow direct communication between HDFS data nodes and PDW compute nodes.
[Figure: non-relational data (social apps, sensor and RFID, mobile apps, web apps) lands in Hadoop, while relational data from traditional schema-based data warehouse applications lives in PDW. Regular T-SQL runs on the enhanced PDW query engine, which uses an external table, an external data source, and an external file format, and reads Hadoop data through the HDFS bridge.]
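The benefit of letting compute nodes talk to HDFS data nodes directly is that each node reads a different part of the data at the same time instead of funnelling everything through one connection. The sketch below assumes a simple round-robin assignment of HDFS blocks to compute nodes and uses threads to stand in for nodes; the real DMS/HDFS bridge scheduling is not described in the deck and may well differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def assign_blocks(blocks: List[str], node_count: int) -> Dict[int, List[str]]:
    """Assign HDFS blocks to compute nodes round-robin (an assumed, simplified policy)."""
    assignment: Dict[int, List[str]] = {node: [] for node in range(node_count)}
    for index, block in enumerate(blocks):
        assignment[index % node_count].append(block)
    return assignment


def read_block(block: str) -> int:
    """Stand-in for reading one block from a data node: count its newline-terminated rows."""
    return block.count("\n")


def parallel_row_count(blocks: List[str], node_count: int) -> int:
    """Each 'compute node' reads its own blocks concurrently; partial counts are summed at the end."""
    assignment = assign_blocks(blocks, node_count)

    def node_work(node: int) -> int:
        return sum(read_block(block) for block in assignment[node])

    with ThreadPoolExecutor(max_workers=node_count) as pool:
        return sum(pool.map(node_work, range(node_count)))


if __name__ == "__main__":
    blocks = ["a\nb\n", "c\n", "d\ne\nf\n", "g\n"]
    print(assign_blocks(blocks, 2))
    print(parallel_row_count(blocks, 2))  # 7
```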
[Figure: reference architecture. Source systems and a Hadoop / data lake (Cloudera, Hortonworks, HDInsight) feed APS (SQL Server Parallel Data Warehouse, PolyBase, Microsoft HDInsight) with day, hour, or minute refreshes, accessed through MapReduce and T-SQL. Downstream, SQL Server data marts, SQL Server Reporting Services, and SQL Server Analysis Services serve analytics, ad hoc queries, and visualization.]
[Figure: PolyBase query example. The external table Twitter_Table is bound dynamically to the HDFS directory //hdfs/social_media/twitter (file //hdfs/social_media/twitter/Daily.log), whose records have the columns User, Location, Product, Sentiment, Rtwt, Hour, and Date; sample users are Sean, Audie, Suz, Tom, Sanjay, Roger, and Steve, and products include xbox, excel, sqls, wp8, ssas, and ssrs. Column filtering and row filtering reduce the data returned for the following query.]
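SELECT [User], Product, Sentiment
FROM Twitter_Table
WHERE [Hour] = DATEPART(HOUR, GETDATE()) - 1   -- previous hour
  AND [Date] = CAST(GETDATE() AS DATE)         -- today
  AND Sentiment >= 0;                           -- neutral or positive tweets only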
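Column filtering and row filtering can be sketched as follows: rows that fail the WHERE predicate are discarded as early as possible, and only the columns in the SELECT list are kept, so less data has to move into PDW. The column order follows the sample table on the slide; the pipe-delimited record layout, the sample values, and the function name are hypothetical.

```python
from typing import Dict, Iterable, List

# Column order of the sample Twitter_Table shown on the slide.
COLUMNS = ["User", "Location", "Product", "Sentiment", "Rtwt", "Hour", "Date"]


def filter_tweets(lines: Iterable[str], hour: int, day: str,
                  select: List[str]) -> List[Dict[str, str]]:
    """Row filtering on Hour, Date and Sentiment, then column filtering to the SELECT list."""
    result = []
    for line in lines:
        record = dict(zip(COLUMNS, line.rstrip("\n").split("|")))
        # Row filtering: drop records that fail the WHERE clause.
        if int(record["Hour"]) != hour or record["Date"] != day:
            continue
        if int(record["Sentiment"]) < 0:
            continue
        # Column filtering: return only the projected columns.
        result.append({name: record[name] for name in select})
    return result


if __name__ == "__main__":
    # Hypothetical pipe-delimited records; the numeric values are illustrative only.
    sample = [
        "Sean|CA|xbox|-1|5|2|5-15-14",
        "Audie|CO|excel|0|0|2|5-15-14",
        "Tom|IL|sqls|1|0|1|5-13-14",
    ]
    print(filter_tweets(sample, hour=2, day="5-15-14",
                        select=["User", "Product", "Sentiment"]))
    # [{'User': 'Audie', 'Product': 'excel', 'Sentiment': '0'}]
```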
Improve APS operations by extending PolyBase
PolyBase v2 in Analytics Platform System AU1 provides:
• HDFS file formats: Textfile and RCFile support
• Hadoop sources: Microsoft Azure HDInsight, HDInsight on APS, Hortonworks Data Platform 1.3 and 2.0 (Linux/Windows Server), and Cloudera Linux 4.3
• A security and permission model
• External table, external data source, and external file format syntax
• Microsoft Azure Storage Blobs
Big Data insights for anyone
New insights with familiar tools through native Microsoft BI integration
• Minimizes IT intervention for discovering data with tools such as Microsoft Excel
• Enables DBAs and power users to join relational and Hadoop data with T-SQL
• Takes advantage of the high adoption of Excel, Power View, PowerPivot, and SQL Server Analysis Services
• Offers Hadoop tools such as MapReduce, Hive, and Pig for data scientists
Audiences: everyone else using Microsoft BI tools, power users, and data scientists.
CREATE EXTERNAL TABLE table_name
    ( <column_definition> [ ,...n ] )
WITH (
    DATA_SOURCE = <data_source>,   -- 1 Referencing external data source
    FILE_FORMAT = <file_format>,   -- 2 Referencing external file format
    LOCATION = '<file_path>'       -- 3 Path of the Hadoop file/folder
    [ , REJECT_VALUE = <value> ]   -- 4 (Optional) Reject parameters
    [ , … ]
);
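The optional reject parameters decide how many malformed rows a query tolerates. The sketch below models the behaviour as we understand it: rows that cannot be converted to the column types are skipped and counted, and the query fails once the count exceeds REJECT_VALUE (check the product documentation for the exact boundary). The two-column schema, the function, and the exception class are hypothetical.

```python
from typing import Iterable, List, Tuple


class RejectThresholdExceeded(Exception):
    """Hypothetical error raised when more rows are rejected than REJECT_VALUE allows."""


def load_with_reject_value(lines: Iterable[str], reject_value: int,
                           terminator: str = "|") -> List[Tuple[str, int]]:
    """Parse (name, count) rows, skipping malformed ones until more than reject_value are rejected."""
    rows: List[Tuple[str, int]] = []
    rejected = 0
    for line in lines:
        fields = line.rstrip("\n").split(terminator)
        try:
            if len(fields) != 2:
                raise ValueError("wrong number of fields")
            rows.append((fields[0], int(fields[1])))  # int() raises ValueError on bad data
        except ValueError:
            rejected += 1
            if rejected > reject_value:
                raise RejectThresholdExceeded(
                    f"{rejected} rows rejected, REJECT_VALUE is {reject_value}")
    return rows


if __name__ == "__main__":
    print(load_with_reject_value(["a|1", "b|x", "c|3"], reject_value=1))  # [('a', 1), ('c', 3)]
```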
CREATE EXTERNAL DATA SOURCE datasource_name
WITH (
    TYPE = <data_source_type>,                              -- 1 Type of external data source
    LOCATION = '<location>'                                 -- 2 Location of external data source
    [ , JOB_TRACKER_LOCATION = '<job_tracker_location>' ]   -- 3 Enables or disables MapReduce job generation
);
CREATE EXTERNAL FILE FORMAT fileformat_name
WITH (
    FORMAT_TYPE = <type>                              -- 1 Type of file format
    [ , SERDE_METHOD = '<serde_method>' ]             -- 2 (De)serialization method [Hive RCFile]
    [ , DATA_COMPRESSION = '<compression_method>' ]   -- 3 Compression method
    [ , FORMAT_OPTIONS ( <format_options> ) ]         -- 4 (Optional) Format options [text files]
);
<format_options> ::=
    [ FIELD_TERMINATOR = 'value' ]     -- 1 Column delimiter
    [ , STRING_DELIMITER = 'value' ]   -- 2 Delimiter for string data types
    [ , DATE_FORMAT = 'value' ]        -- 3 A specific date format
    [ , USE_TYPE_DEFAULT = 'value' ]   -- 4 How missing entries are handled
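A rough model of what the four format options control when a delimited text file is read: FIELD_TERMINATOR splits columns, STRING_DELIMITER lets a terminator appear inside a quoted string, DATE_FORMAT says how to parse dates, and USE_TYPE_DEFAULT decides whether an empty field becomes the column type's default value or NULL (None here). The sketch relies on Python's csv and datetime modules; the three-column schema, the default values, and the strptime-style date pattern are assumptions (PolyBase has its own date-format notation).

```python
import csv
from datetime import date, datetime
from typing import List, Optional, Tuple

# Defaults assumed for USE_TYPE_DEFAULT = TRUE (string, integer and date columns).
DEFAULT_STRING = ""
DEFAULT_INT = 0
DEFAULT_DATE = date(1900, 1, 1)

Row = Tuple[Optional[str], Optional[int], Optional[date]]


def parse_text_file(text: str, field_terminator: str = "|", string_delimiter: str = '"',
                    date_format: str = "%m-%d-%y", use_type_default: bool = False) -> List[Row]:
    """Parse (name, count, day) records according to the four format options."""
    reader = csv.reader(text.splitlines(), delimiter=field_terminator,
                        quotechar=string_delimiter)
    rows: List[Row] = []
    for name, count, day in reader:  # this sketch assumes exactly three fields per record
        rows.append((
            name if name else (DEFAULT_STRING if use_type_default else None),
            int(count) if count else (DEFAULT_INT if use_type_default else None),
            datetime.strptime(day, date_format).date() if day
            else (DEFAULT_DATE if use_type_default else None),
        ))
    return rows


if __name__ == "__main__":
    sample = '"Smith|Jr"|3|5-15-14\n||'
    print(parse_text_file(sample))                         # missing values become None
    print(parse_text_file(sample, use_type_default=True))  # missing values become type defaults
```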
• Bringing islands of Hadoop data together
• Running high-performance queries against Hadoop data
• Archiving data warehouse data to Hadoop (move)
• Exporting relational data to Hadoop (copy)
• Importing Hadoop data into a data warehouse (copy)
Scale up: rowstore
Diminishing scale as requirements grow
• Data is queried by row
• Sub-optimal performance for many data warehouse queries
• Adding capacity requires a “forklift” replacement of the hardware
[Figure: a table with columns C1 to C4 and rows R1 to R6, stored row by row across pages 1 to 3]
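Why row-by-row storage hurts typical warehouse queries can be shown by counting how many stored values a query must touch. In the toy layout below, a query that needs a single column reads every value of every row in a row store, but only that column's values in a column store. The layouts are simplified in-memory lists, not SQL Server's page or segment format.

```python
from typing import Dict, List

Table = List[Dict[str, int]]


def to_columns(table: Table) -> Dict[str, List[int]]:
    """Pivot row-oriented data into one list per column (a columnar layout)."""
    columns: Dict[str, List[int]] = {}
    for row in table:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def values_read_rowstore(table: Table, column: str) -> int:
    """A row store reads whole rows, so every column is touched even if only `column` is needed."""
    return sum(len(row) for row in table)


def values_read_columnstore(table: Table, column: str) -> int:
    """A column store reads only the segment that holds `column`."""
    return len(to_columns(table)[column])


if __name__ == "__main__":
    # Six rows (R1..R6) by four columns (C1..C4), as in the slide's figure.
    table = [{"C1": r, "C2": 2 * r, "C3": 3 * r, "C4": 4 * r} for r in range(1, 7)]
    print(values_read_rowstore(table, "C2"))     # 24
    print(values_read_columnstore(table, "C2"))  # 6
```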
Scale out
• Multiple nodes with dedicated CPU, memory, and storage
• Ability to incrementally add hardware for near-linear scale to multiple petabytes
• Ability to handle query complexity and concurrency at scale
• No “forklift” of prior warehouse to increase capacity
• Ability to scale out HDInsight and PDW
[Figure: Scaling out your data to petabytes. Scale-out technologies in the Analytics Platform System: PDW and HDInsight units added incrementally, from 0 terabytes to 6 petabytes.]
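Scale-out rests on spreading a table's rows across compute nodes so that each node stores and scans only its share. A common way to do this, and the idea behind hash-distributed tables, is to hash a distribution column and map the hash to a node. The sketch below uses MD5 and a plain modulo purely for illustration; it does not reproduce PDW's actual hash function or its number of distributions per node.

```python
import hashlib
from typing import Dict, List


def node_for(key: str, node_count: int) -> int:
    """Map a distribution-column value to a compute node with a deterministic hash (MD5 here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(keys: List[str], node_count: int) -> Dict[int, List[str]]:
    """Place each row, identified by its distribution key, on exactly one compute node."""
    placement: Dict[int, List[str]] = {node: [] for node in range(node_count)}
    for key in keys:
        placement[node_for(key, node_count)].append(key)
    return placement


if __name__ == "__main__":
    keys = [f"customer-{i}" for i in range(10_000)]
    placement = distribute(keys, node_count=4)
    # Roughly 2,500 rows land on each node, so each node scans about a quarter of the table.
    print({node: len(rows) for node, rows in placement.items()})
```

Because the same key always hashes to the same node, rows that share a distribution value are co-located, which lets joins and aggregations on that column run locally on each node.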
Blazing-fast performance
MPP and In-Memory Columnstore for next-generation performance
• Store data in columnar format for massive compression
• Load data into or out of memory for next-generation performance, with up to 60% improvement in data loading speed
• Updateable and clustered for real-time trickle loading
Updateable clustered columnstore vs. a table with customary indexing: up to 100x faster queries and up to 15x more compression.
[Figure: columnstore index representation and parallel query execution, from query to results]
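Part of why columnar storage compresses so well is that values within one column are similar and often repeat. The sketch below applies two textbook techniques, dictionary encoding followed by run-length encoding, to a single column. It illustrates the idea only and does not reproduce SQL Server's columnstore segment format or the compression ratios quoted above.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[Dict[str, int], List[int]]:
    """Replace each distinct string with a small integer id, assigned in order of first appearance."""
    dictionary: Dict[str, int] = {}
    ids = [dictionary.setdefault(value, len(dictionary)) for value in values]
    return dictionary, ids


def run_length_encode(ids: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (id, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(ids)]


def decode(dictionary: Dict[str, int], runs: List[Tuple[int, int]]) -> List[str]:
    """Reverse both encodings to show that no information was lost."""
    reverse = {code: value for value, code in dictionary.items()}
    return [reverse[code] for code, length in runs for _ in range(length)]


if __name__ == "__main__":
    column = ["xbox"] * 4 + ["excel"] * 3 + ["xbox"] * 2
    dictionary, ids = dictionary_encode(column)
    runs = run_length_encode(ids)
    print(dictionary, runs)  # {'xbox': 0, 'excel': 1} [(0, 4), (1, 3), (0, 2)]
```

Nine stored strings shrink to a two-entry dictionary plus three runs; sorting or clustering a column before encoding makes the runs longer still.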
Why is a clustered columnstore index important?
• Saves space
• Provides easier management by eliminating maintenance of secondary indexes
• Supports all PDW data types, including high-precision decimal data types and more
[Chart: space used in GB (space used = table space + index space) for a table with 101 million rows, six bars labeled 1 to 6 on a 0 to 20 GB axis; the clustered columnstore index shows 91% savings.]
In-Memory Columnstore is featured in the storage engine in PDW AU1.
Relational query execution processing
1. SQL queries are sent to the control node.
2. The control node creates a query execution plan.
3. The query plan creates distributed queries to run on each compute node.
4. The distributed queries are sent to the compute nodes (all running in parallel).
5. The control node collects the query results and returns them to the user.
[Figure: a client sends a user query to the control node of the appliance (a management node is also shown); the control node creates the query plan, the compute nodes process query plan operations in parallel, and the control node aggregates the query results and returns them to the client.]
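The five steps above follow a scatter-gather pattern. The sketch below mimics it for an average: the "control node" rewrites AVG into per-node partial aggregates (SUM and COUNT), runs them in parallel on the "compute nodes", and combines the partial results. The node data and function names are hypothetical, threads stand in for compute nodes, and the sketch expects at least one node.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def compute_node_partial(rows: List[float]) -> Tuple[float, int]:
    """Runs on one compute node: aggregate only the rows stored locally."""
    return sum(rows), len(rows)


def control_node_avg(node_data: List[List[float]]) -> float:
    """Control node: plan per-node partial aggregates, run them in parallel, then combine."""
    # Steps 2-3: AVG becomes SUM and COUNT per node, because an average of per-node
    # averages would be wrong when nodes hold different numbers of rows.
    # Step 4: the distributed queries run on all compute nodes in parallel.
    with ThreadPoolExecutor(max_workers=len(node_data)) as pool:
        partials = list(pool.map(compute_node_partial, node_data))
    # Step 5: aggregate the partial results into the final answer.
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count


if __name__ == "__main__":
    nodes = [[10.0, 20.0], [30.0], [40.0, 50.0, 60.0]]
    print(control_node_avg(nodes))  # 35.0
```

Averaging the per-node averages here would give (15 + 30 + 50) / 3, about 31.7, which is why the plan ships sums and counts rather than averages.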
Great performance with mixed workloads
[Figure: ERP, CRM, and LOB apps and Hadoop / Big Data load into the Analytics Platform System (PDW, PolyBase, HDInsight) through ETL/ELT with SSIS, DQS, and MDS, or with DWLoader; SQL Server SMP, reporting and cubes, and BI tools consume the data. Workload labels include ad hoc queries, intra-day, near real-time, fast ad hoc, and real-time, alongside Columnstore, PolyBase, CRTAS, Link Table, ROLAP / MOLAP, DirectQuery, and SNAC.]
High performance using commodity hardware
Price per terabyte for leading vendors: significantly lower price per terabyte than the closest competitor.
[Chart: price per terabyte for user-available storage (compressed), in thousands of US dollars (axis from $0 to $30), for Oracle, EMC, IBM, Teradata, and Microsoft. Note: the orange line indicates the average price per terabyte.]
Lower storage costs with Windows Server 2012 Storage Spaces
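The chart's metric divides the appliance price by the user-available capacity after compression. A worked version of that arithmetic, with purely hypothetical inputs (none of these numbers come from the chart):

```python
def price_per_tb(appliance_price: float, raw_user_tb: float, compression_ratio: float) -> float:
    """Price per terabyte of user-available storage, counting capacity after compression."""
    if raw_user_tb <= 0 or compression_ratio <= 0:
        raise ValueError("capacity and compression ratio must be positive")
    return appliance_price / (raw_user_tb * compression_ratio)


if __name__ == "__main__":
    # Hypothetical: a $1,000,000 appliance with 50 TB of usable disk and 5:1 compression
    # holds 250 TB of user data.
    print(price_per_tb(1_000_000, 50, 5))  # 4000.0 dollars per compressed terabyte
```

Because compression sits in the denominator, a vendor's assumed compression ratio moves this figure as much as its list price does.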
Hardware and software engineered together
The ease of an appliance
• Co-engineered with HP, Dell, and Quanta best practices
• Leading performance with commodity hardware
• Integrated support plan with a single Microsoft PDW contact
• Pre-configured, built, and tuned software and hardware
[Figure: PolyBase and HDInsight integrated within the appliance]
Hardware architecture
[Figure: two racks connected by InfiniBand and Ethernet. Rack #1 holds a PDW region and an HDInsight region with a control node, a master node, failover nodes, and compute nodes with economical disk storage, plus networking. Rack #2 holds a failover node, further compute nodes with economical disk storage, HDI extension base units, and HDI active scale units (hosts HST-01, HST-02, HSA-01), all linked by InfiniBand and Ethernet.]
• Active unit: adds two or three compute nodes, depending on the OEM hardware configuration, and the related storage
• Passive unit: hosts the non-worker HDInsight nodes
• Failover node: provides high availability for the rack
• PDW engine
• DMS Manager
• SQL Server 2012 Enterprise Edition (PDW build)
[Figure: a base unit of four hosts (Host 1 to Host 4) with economical disk storage over direct-attached SAS, connected by InfiniBand and Ethernet; its virtual machines include CTL, MAD, AD, VMM, Compute 1, and Compute 2.]
Software details
• All hosts run Windows Server 2012 Standard and Windows Azure Virtual Machines
• Fabric or workload runs in Hyper-V virtual machines
• The fabric virtual machine, management server (MAD01), and control server (CTL) share one server
• A PDW agent runs on all hosts and all virtual machines
• DWConfig and Admin Console
• Windows Storage Spaces and Azure Storage blobs
[Figure: a base unit of hosts (Host 1 to Host 4) plus a passive unit (Host 5), with economical disk storage over direct-attached SAS, connected by InfiniBand and Ethernet; after a host failure, its virtual machines (such as CTL, MAD, AD, FAB, VMM, and Compute 1) are moved to another host.]
Virtual machine migration can be used to move workload nodes to new hosts after hardware failure.
Cluster Shared Volumes
• Enable all nodes to access logical unit numbers (LUNs) on economical disk storage
• Use the Server Message Block (SMB3) protocol
Failover capabilities
• Uses one cluster across the whole appliance
• Automatically migrates virtual machines on host failure
• Enforces rules with affinity and anti-affinity maps
• Uses Windows Failover Cluster Manager
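How automatic migration and anti-affinity rules interact can be sketched as a placement problem: when a host fails, each of its virtual machines moves to a surviving host, skipping any host that already runs a VM it must not share a host with. This is a simplified greedy model with hypothetical host and VM names; Windows Failover Cluster Manager's real placement logic is more involved and is not reproduced here.

```python
from typing import Dict, List, Set, Tuple


def fail_over(placement: Dict[str, List[str]], failed_host: str,
              anti_affinity: Set[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Move VMs off failed_host to surviving hosts, honouring anti-affinity pairs (greedy)."""
    def conflicts(vm: str, host_vms: List[str]) -> bool:
        return any((vm, other) in anti_affinity or (other, vm) in anti_affinity
                   for other in host_vms)

    new_placement = {host: list(vms) for host, vms in placement.items() if host != failed_host}
    for vm in placement.get(failed_host, []):
        # Try the least-loaded surviving hosts first (ties keep their original order).
        candidates = sorted(new_placement, key=lambda host: len(new_placement[host]))
        target = next((host for host in candidates
                       if not conflicts(vm, new_placement[host])), None)
        if target is None:
            raise RuntimeError(f"no valid host for {vm}")
        new_placement[target].append(vm)
    return new_placement


if __name__ == "__main__":
    placement = {"host1": ["compute1"], "host2": ["compute2"], "host5": []}
    anti_affinity = {("compute1", "compute2")}
    print(fail_over(placement, "host1", anti_affinity))
    # {'host2': ['compute2'], 'host5': ['compute1']}
```

Keeping a spare (passive) host with free capacity is what guarantees the greedy search finds a conflict-free target in the common case.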
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure
 
Hitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop Solution
 
Big data talking stories in Healthcare
Big data talking stories in Healthcare Big data talking stories in Healthcare
Big data talking stories in Healthcare
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
Hadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsHadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the Experts
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
 

Mehr von Stéphane Fréchette

Back to the future - Temporal Table in SQL Server 2016
Back to the future - Temporal Table in SQL Server 2016Back to the future - Temporal Table in SQL Server 2016
Back to the future - Temporal Table in SQL Server 2016Stéphane Fréchette
 
Self-Service Data Integration with Power Query - SQLSaturday #364 Boston
Self-Service Data Integration with Power Query - SQLSaturday #364 Boston  Self-Service Data Integration with Power Query - SQLSaturday #364 Boston
Self-Service Data Integration with Power Query - SQLSaturday #364 Boston Stéphane Fréchette
 
Power BI - Bring your data together
Power BI - Bring your data togetherPower BI - Bring your data together
Power BI - Bring your data togetherStéphane Fréchette
 
Data Analytics with R and SQL Server
Data Analytics with R and SQL ServerData Analytics with R and SQL Server
Data Analytics with R and SQL ServerStéphane Fréchette
 
Self-Service Data Integration with Power Query
Self-Service Data Integration with Power QuerySelf-Service Data Integration with Power Query
Self-Service Data Integration with Power QueryStéphane Fréchette
 
Le journalisme de données... par où commencer?
Le journalisme de données... par où commencer?Le journalisme de données... par où commencer?
Le journalisme de données... par où commencer?Stéphane Fréchette
 
Graph Databases for SQL Server Professionals - SQLSaturday #350 Winnipeg
Graph Databases for SQL Server Professionals - SQLSaturday #350 WinnipegGraph Databases for SQL Server Professionals - SQLSaturday #350 Winnipeg
Graph Databases for SQL Server Professionals - SQLSaturday #350 WinnipegStéphane Fréchette
 
Graph Databases for SQL Server Professionals
Graph Databases for SQL Server ProfessionalsGraph Databases for SQL Server Professionals
Graph Databases for SQL Server ProfessionalsStéphane Fréchette
 
SQL Server 2014 Faster Insights from Any Data
SQL Server 2014 Faster Insights from Any DataSQL Server 2014 Faster Insights from Any Data
SQL Server 2014 Faster Insights from Any DataStéphane Fréchette
 
On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)
On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)
On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)Stéphane Fréchette
 
Introduction to Master Data Services in SQL Server 2012
Introduction to Master Data Services in SQL Server 2012Introduction to Master Data Services in SQL Server 2012
Introduction to Master Data Services in SQL Server 2012Stéphane Fréchette
 
Data Quality Services in SQL Server 2012
Data Quality Services in SQL Server 2012Data Quality Services in SQL Server 2012
Data Quality Services in SQL Server 2012Stéphane Fréchette
 
Business Intelligence in Excel 2013
Business Intelligence in Excel 2013Business Intelligence in Excel 2013
Business Intelligence in Excel 2013Stéphane Fréchette
 
Gatineau Ouverte troisième rencontre publique
Gatineau Ouverte troisième rencontre publiqueGatineau Ouverte troisième rencontre publique
Gatineau Ouverte troisième rencontre publiqueStéphane Fréchette
 
Gatineau Ouverte première rencontre publique
Gatineau Ouverte première rencontre publiqueGatineau Ouverte première rencontre publique
Gatineau Ouverte première rencontre publiqueStéphane Fréchette
 

Mehr von Stéphane Fréchette (18)

Back to the future - Temporal Table in SQL Server 2016
Back to the future - Temporal Table in SQL Server 2016Back to the future - Temporal Table in SQL Server 2016
Back to the future - Temporal Table in SQL Server 2016
 
Self-Service Data Integration with Power Query - SQLSaturday #364 Boston
Self-Service Data Integration with Power Query - SQLSaturday #364 Boston  Self-Service Data Integration with Power Query - SQLSaturday #364 Boston
Self-Service Data Integration with Power Query - SQLSaturday #364 Boston
 
Power BI - Bring your data together
Power BI - Bring your data togetherPower BI - Bring your data together
Power BI - Bring your data together
 
Data Analytics with R and SQL Server
Data Analytics with R and SQL ServerData Analytics with R and SQL Server
Data Analytics with R and SQL Server
 
Self-Service Data Integration with Power Query
Self-Service Data Integration with Power QuerySelf-Service Data Integration with Power Query
Self-Service Data Integration with Power Query
 
Introduction to Azure HDInsight
Introduction to Azure HDInsightIntroduction to Azure HDInsight
Introduction to Azure HDInsight
 
Le journalisme de données... par où commencer?
Le journalisme de données... par où commencer?Le journalisme de données... par où commencer?
Le journalisme de données... par où commencer?
 
Graph Databases for SQL Server Professionals - SQLSaturday #350 Winnipeg
Graph Databases for SQL Server Professionals - SQLSaturday #350 WinnipegGraph Databases for SQL Server Professionals - SQLSaturday #350 Winnipeg
Graph Databases for SQL Server Professionals - SQLSaturday #350 Winnipeg
 
Graph Databases for SQL Server Professionals
Graph Databases for SQL Server ProfessionalsGraph Databases for SQL Server Professionals
Graph Databases for SQL Server Professionals
 
SQL Server 2014 Faster Insights from Any Data
SQL Server 2014 Faster Insights from Any DataSQL Server 2014 Faster Insights from Any Data
SQL Server 2014 Faster Insights from Any Data
 
On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)
On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)
On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)
 
TEDxGatineau
TEDxGatineau TEDxGatineau
TEDxGatineau
 
Power BI
Power BIPower BI
Power BI
 
Introduction to Master Data Services in SQL Server 2012
Introduction to Master Data Services in SQL Server 2012Introduction to Master Data Services in SQL Server 2012
Introduction to Master Data Services in SQL Server 2012
 
Data Quality Services in SQL Server 2012
Data Quality Services in SQL Server 2012Data Quality Services in SQL Server 2012
Data Quality Services in SQL Server 2012
 
Business Intelligence in Excel 2013
Business Intelligence in Excel 2013Business Intelligence in Excel 2013
Business Intelligence in Excel 2013
 
Gatineau Ouverte troisième rencontre publique
Gatineau Ouverte troisième rencontre publiqueGatineau Ouverte troisième rencontre publique
Gatineau Ouverte troisième rencontre publique
 
Gatineau Ouverte première rencontre publique
Gatineau Ouverte première rencontre publiqueGatineau Ouverte première rencontre publique
Gatineau Ouverte première rencontre publique
 

Kürzlich hochgeladen

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 

Kürzlich hochgeladen (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 

Modernizing Your Data Warehouse using APS

  • 1. Modernizing Your Data Warehouse using APS. Big data. Small data. All data. Stéphane Fréchette, SQL Server MVP (@sfrechette), Database / Business Intelligence Solution Architect
  • 2. - Gartner, “The State of Data Warehousing in 2012”
  • 3. Data sources: (1) increasing data volumes, (2) real-time data, (3) new data sources and types, (4) cloud-born data
  • 4. The modern data warehouse (diagram labels: data sources, non-relational data)
  • 5. Insights from all your data: enrich and optimize your data from non-traditional sources
  • 6. Roadblocks to a modern data warehouse. Options: keep legacy investment; buy a new tier-one hardware appliance; acquire a Big Data solution; acquire business intelligence. Drawbacks: limited scalability and ability to handle new data types; significant training and data silos; high acquisition and migration costs; complex with low adoption.
  • 7. Introducing the Microsoft Analytics Platform System The turnkey modern data warehouse appliance • Relational and non-relational data in a single appliance • Enterprise-ready Hadoop • Integrated querying across Hadoop and PDW using T-SQL • Direct integration with Microsoft BI tools such as Microsoft Excel • Near real-time performance with In-Memory Columnstore • Ability to scale out to accommodate growing data • Removal of data warehouse bottlenecks with MPP SQL Server • Concurrency that fuels rapid adoption • Industry’s lowest data warehouse appliance price per terabyte • Value through a single appliance solution • Value with flexible hardware options using commodity hardware
  • 8. Microsoft Analytics Platform System The turnkey modern data warehouse appliance
  • 9. Evolution in the nature and use of data in the enterprise (chart axes: data complexity, in variety and velocity up to petabytes, against value to the business; stages: historical analysis, insight analysis, predictive analytics, predictive forecasting)
  • 10. What is Hadoop? Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware, with every node in the cluster contributing compute and storage. Components shown: operational services (Ambari, Oozie, Falcon); core services (HDFS, YARN, MapReduce); data services (Hive & HCatalog, Pig, HBase, plus Flume, Sqoop, NFS, and WebHDFS for load and extract).
  • 11. Manageable, secured, and highly available Hadoop integrated into the appliance: high performance and tuned within the appliance; end-user authentication with Active Directory; accessible insights for everyone with Microsoft BI tools; managed and monitored using System Center; 100-percent Apache Hadoop. (Diagram: SQL Server Parallel Data Warehouse, PolyBase, Microsoft HDInsight.)
  • 12. (Diagram: an appliance built from hardware, fabric, and two workloads, a Parallel Data Warehouse workload and an HDInsight workload.) A region is a logical container within an appliance. Each workload contains the following boundaries: security, metering, and servicing.
  • 13. Bringing Hadoop point solutions and the data warehouse together for users and IT. PolyBase provides a single T-SQL query model for PDW and Hadoop with rich features of T-SQL, including joins without ETL; uses the power of MPP to enhance query execution performance; supports Windows Azure HDInsight to enable new hybrid cloud scenarios; and provides the ability to query non-Microsoft Hadoop distributions, such as Hortonworks and Cloudera. (Diagram: a SELECT sent to SQL Server Parallel Data Warehouse through PolyBase returns one result set that draws on Microsoft Azure HDInsight, Microsoft HDInsight, Hortonworks for Windows and Linux, and Cloudera.)
  • 14. Direct and parallelized HDFS access: the Data Movement Service (DMS) of APS is enhanced to allow direct communication between HDFS data nodes and PDW compute nodes. (Diagram: non-relational data from social apps, sensor and RFID, mobile apps, and web apps sits in Hadoop, and relational data from traditional schema-based data warehouse applications sits in PDW. Regular T-SQL against an external table, external data source, and external file format runs through the enhanced PDW query engine and the HDFS bridge and returns the results.)
  • 15. (Architecture diagram: source systems feed a Hadoop / data lake (Cloudera, Hortonworks, HDInsight) and APS (SQL Server Parallel Data Warehouse, PolyBase, Microsoft HDInsight) with day / hour / minute refresh, processed with MapReduce and T-SQL. Downstream are SQL Server data marts, SQL Server Reporting Services, Analysis Services, and analytics, ad hoc, and visualization tools.)
  • 16. Dynamic binding, column filtering, and row filtering over the HDFS file / directory //hdfs/social_media/twitter (for example, //hdfs/social_media/twitter/Daily.log). The sample Twitter data has the columns User, Location, Product, Sentiment, Rtwt, Hour, and Date. It includes users Sean, Audie, Suz, Tom, Sanjay, Roger, and Steve; locations CA, CO, WA, IL, MN, TX, and AL; products xbox, excel, xbox, sqls, wp8, ssas, and ssrs; and dates between 5-13-14 and 5-15-14. Query: SELECT User, Product, Sentiment FROM Twitter_Table WHERE Hour = Current - 1 AND Date = Today AND Sentiment >= 0
  • 17. PolyBase v2 in Analytics Platform System AU1 (Appliance Update 1) improves APS operations by extending PolyBase. It adds HDFS file formats (text file and RCFile support) and Hadoop sources: Microsoft Azure HDInsight, HDInsight on APS, Hortonworks Data Platform 1.3 and 2.0 (Linux/Windows Server), and Cloudera 4.3 (Linux). It also adds a security and permission model, external table, data source, and file format syntax, and Microsoft Azure Storage Blobs.
  • 18. Big Data insights for anyone: new insights with familiar tools through native Microsoft BI integration. It minimizes IT intervention for discovering data with tools such as Microsoft Excel; enables DBAs and power users to join relational and Hadoop data with T-SQL; takes advantage of high adoption of Excel, Power View, PowerPivot, and SQL Server Analysis Services; and offers Hadoop tools like MapReduce, Hive, and Pig for data scientists. Audiences: data scientists, power users, and everyone else using Microsoft BI tools.
  • 19. CREATE EXTERNAL TABLE table_name ({<column_definition>}[ ,...n ]) {WITH ( DATA_SOURCE = <data_source>, FILE_FORMAT = <file_format>, LOCATION = '<file_path>', [REJECT_VALUE = <value>], …)}; (1) referencing the external data source; (2) referencing the external file format; (3) path of the Hadoop file/folder; (4) (optional) reject parameters
  • 20. CREATE EXTERNAL DATA SOURCE datasource_name {WITH ( TYPE = <type>, LOCATION = '<location>' [, JOB_TRACKER_LOCATION = '<jt_location>'] )}; (1) type of external data source; (2) location of external data source; (3) enabling or disabling of MapReduce job generation
  • 21. CREATE EXTERNAL FILE FORMAT fileformat_name {WITH ( FORMAT_TYPE = <type> [, SERDE_METHOD = '<serde_method>'] [, DATA_COMPRESSION = '<compr_method>'] [, FORMAT_OPTIONS (<format_options>)] )}; (1) type of external file format; (2) (de)serialization method [Hive RCFile]; (3) compression method; (4) (optional) format options [text files]
  • 22. <format_options> ::= [, FIELD_TERMINATOR = 'value'] [, STRING_DELIMITER = 'value'] [, DATE_FORMAT = 'value'] [, USE_TYPE_DEFAULT = 'value']; (1) column delimiter; (2) delimiter for string data types; (3) a particular date format; (4) how missing entries are handled
  • 23. Bringing islands of Hadoop data together: running high-performance queries against Hadoop data; archiving data warehouse data to Hadoop (move); exporting relational data to Hadoop (copy); importing Hadoop data into a data warehouse (copy)
  • 24. Microsoft Analytics Platform System The turnkey modern data warehouse appliance
  • 25. Scale up with rowstore: diminishing scale as requirements grow, with a “forklift” replacement needed for more capacity. Querying data by row: rows R1 to R6, each with columns C1 to C4, are stored across pages 1 to 3, giving sub-optimal performance for many data warehouse queries.
  • 26. Scale out. It offers multiple nodes with dedicated CPU, memory, and storage, and the ability to incrementally add hardware for near-linear scale to multiple petabytes. It can handle query complexity and concurrency at scale, needs no “forklift” of the prior warehouse to increase capacity, and can scale out both HDInsight and PDW. (Diagram: scaling out your data to petabytes with the scale-out technologies in the Analytics Platform System, adding PDW and HDInsight units from 0 terabytes to 6 petabytes.)
  • 27. Blazing-fast performance: MPP and In-Memory Columnstore for next-generation performance, with up to 100x faster queries and up to 15x more compression (updateable clustered columnstore vs. a table with customary indexing). • Store data in columnar format for massive compression • Load data into or out of memory for next-generation performance with up to 60% improvement in data loading speed • Updateable and clustered for real-time trickle loading. (Diagram: columnstore index representation; parallel query execution returning query results.)
  • 28. Why is a clustered columnstore index important? • Saves space • Provides easier management by eliminating maintenance of secondary indexes • Supports all PDW data types, including high-precision decimal data types and more. (Chart: space used in GB, where space used = table space + index space, for a table with 101 million rows; 91% savings.) In-Memory Columnstore is featured in the storage engine in PDW AU1.
  • 29. Relational query execution processing: (1) SQL queries are sent to the control node; (2) the control node creates a query execution plan; (3) the query plan creates distributed queries to run on each compute node; (4) the distributed queries are sent to the compute nodes, all running in parallel; (5) the control node collects the query results and returns them to the user. (Diagram: a client sends the user query to the control node, which creates the query plan; compute nodes process query plan operations in parallel; the control node aggregates the query results; appliance management.)
  • 30. Great performance with mixed workloads. (Diagram elements: sources ERP, CRM, and LOB apps plus Hadoop / Big Data; ETL/ELT with SSIS, DQS, and MDS, or with DWLoader; the Analytics Platform System with PDW, PolyBase, and HDInsight; consumers SQL Server SMP, reporting and cubes, and BI tools. Features for ad hoc, intra-day, near real-time, and fast ad hoc queries: Columnstore, PolyBase, CRTAS (CREATE REMOTE TABLE AS SELECT), Link Table, real-time ROLAP / MOLAP, DirectQuery, SNAC (SQL Server Native Client).)
  • 31. Microsoft Analytics Platform System The turnkey modern data warehouse appliance
  • 32. High performance using commodity hardware: a significantly lower price per terabyte than the closest competitor, plus lower storage costs with Windows Server 2012 Storage Spaces. (Chart: price per terabyte of user-available storage (compressed) for leading vendors Oracle, EMC, IBM, Teradata, and Microsoft, in thousands of dollars from $0 to $30; the orange line indicates the average price per terabyte.)
  • 33. Hardware and software engineered together. It offers the ease of an appliance, co-engineered with HP, Dell, and Quanta best practices, and leading performance with commodity hardware. It comes with an integrated support plan with a single Microsoft PDW contact, and the software and hardware are pre-configured, built, and tuned. (Diagram: PolyBase, HDInsight.)
  • 34. Hardware architecture (diagram). Rack #1 holds networking (InfiniBand and Ethernet), a PDW region (control node, failover node, and compute nodes with economical disk storage), and an HDInsight region (master node, failover node, and compute nodes with economical disk storage). Rack #2 holds InfiniBand and Ethernet, a failover node, compute nodes with economical disk storage, an HDI extension base unit, and HDI active scale units (labels HST-01, HST-02, HSA-01). Legend: an active unit adds two or three compute nodes, depending on OEM hardware configuration, and related storage; a passive unit hosts non-worker HDInsight nodes; a failover node provides high availability for the rack.
  • 35. Base unit (diagram): hosts 1 to 4 carry the CTL, MAD, AD, and VMM virtual machines and Compute 1 and Compute 2, connected by InfiniBand and Ethernet and by direct attached SAS to economical disk storage. Software components: PDW engine, DMS Manager, SQL Server 2012 Enterprise Edition (PDW build). Software details: • All hosts run Windows Server 2012 Standard and Windows Azure Virtual Machines • Fabric or workload in Hyper-V Virtual Machines • Fabric virtual machine, management server (MAD01), and control server (CTL) share one server • A PDW agent runs on all hosts and all virtual machines • DWConfig and Admin Console • Windows Storage Spaces and Azure Storage blobs
  • 36. (Diagram: a base unit with hosts 1 to 4 running the CTL, MAD, FAB, AD, VMM, and compute virtual machines, plus Host 5 as a passive unit.) Virtual machine migration can be used to move workload nodes to new hosts after hardware failure. Cluster Shared Volumes • Enable all nodes to access logical unit numbers (LUNs) on economical disk storage • Use the Server Message Block (SMB3) protocol. Failover capabilities • Uses one cluster across the whole appliance • Automatically migrates virtual machines on host failure • Enforces rules with affinity and anti-affinity maps • Uses Windows Failover Cluster Manager
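Slide 10 says Hadoop clusters provide distributed data processing, and MapReduce is the classic model for it. The toy below runs the map, shuffle, and reduce phases in one Python process. It is not Hadoop's API. The function names and the in-memory strings standing in for HDFS blocks are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: combine the values collected for each key (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits: List[str]) -> Dict[str, int]:
    # On a real cluster each split would be mapped on the node that stores it.
    mapped = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    splits = ["big data small data", "all data"]  # hypothetical HDFS blocks
    print(word_count(splits))  # {'big': 1, 'data': 3, 'small': 1, 'all': 1}
```

The shuffle is the only step that needs every mapper's output. That is why it is the network-heavy part of a real MapReduce job.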
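Slide 13 promises joins between PDW and Hadoop data without ETL. In other words, the Hadoop rows are read when the query runs instead of being loaded into the warehouse first. The Python below is a hypothetical illustration of that idea, not of how the PDW engine plans the join. It uses an in-memory product table as the build side of a hash join and probes it with delimited text lines that stand in for an HDFS file. The table contents and the `|` terminator are assumptions.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical relational table in the warehouse: product code -> product name.
PRODUCTS: Dict[str, str] = {"xbox": "Xbox", "excel": "Excel", "ssrs": "Reporting Services"}


def read_external(lines: List[str], terminator: str = "|") -> Iterator[Tuple[str, str]]:
    """Parse delimited text at query time, as an external table is read on demand."""
    for line in lines:
        user, product = line.rstrip("\n").split(terminator)[:2]
        yield user, product


def join_without_etl(lines: List[str]) -> List[Tuple[str, str]]:
    """Inner hash join: probe the relational table for each external row."""
    return [(user, PRODUCTS[code]) for user, code in read_external(lines) if code in PRODUCTS]


if __name__ == "__main__":
    hdfs_lines = ["Sean|xbox", "Suz|wp8", "Steve|ssrs"]  # hypothetical file contents
    print(join_without_etl(hdfs_lines))  # [('Sean', 'Xbox'), ('Steve', 'Reporting Services')]
```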
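The query on slide 16 depends on column filtering and row filtering: only the requested columns of matching rows travel back, not the whole file. The sketch below models that scan step, assuming a comma-delimited log laid out in the slide's column order. The sample rows are hypothetical, and the `hour` and `date` parameters stand in for `Current - 1` and `Today`.

```python
import csv
import io
from typing import Dict, List

COLUMNS = ["User", "Location", "Product", "Sentiment", "Rtwt", "Hour", "Date"]


def filtered_scan(text: str, hour: int, date: str) -> List[Dict[str, str]]:
    """Apply the row filter while scanning, then keep only the projected columns."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
    result = []
    for row in reader:
        # Row filtering: WHERE Hour = <hour> AND Date = <date> AND Sentiment >= 0
        if int(row["Hour"]) == hour and row["Date"] == date and int(row["Sentiment"]) >= 0:
            # Column filtering: SELECT User, Product, Sentiment
            result.append({k: row[k] for k in ("User", "Product", "Sentiment")})
    return result


if __name__ == "__main__":
    log = "Sean,CA,xbox,-1,0,22,5-15-14\nAudie,CO,excel,1,8,22,5-15-14\nTom,IL,sqls,1,0,23,5-13-14\n"
    print(filtered_scan(log, hour=22, date="5-15-14"))
    # [{'User': 'Audie', 'Product': 'excel', 'Sentiment': '1'}]
```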
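Slide 19 lists optional reject parameters, which decide how many malformed rows are tolerated before a query fails. The model below is simplified. It treats REJECT_VALUE as a row count and fails once the number of rejected rows exceeds it, which is assumed value-based semantics. Percentage-based rejection is not modelled, integer-only rows are assumed, and the function and exception names are hypothetical.

```python
from typing import List


class RejectThresholdExceeded(Exception):
    """Hypothetical stand-in for the error raised when too many rows are rejected."""


def load_rows(lines: List[str], reject_value: int) -> List[int]:
    """Parse each line as an integer, tolerating at most `reject_value` bad rows."""
    accepted: List[int] = []
    rejected = 0
    for line in lines:
        try:
            accepted.append(int(line))
        except ValueError:
            rejected += 1
            if rejected > reject_value:
                raise RejectThresholdExceeded(f"{rejected} rows rejected (limit {reject_value})")
    return accepted
```

For example, `load_rows(["1", "x", "3"], reject_value=1)` skips the bad row and returns `[1, 3]`. With a second bad row, the call raises instead of returning partial data.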
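Each of the four format options on slide 22 corresponds to a parsing decision. This Python sketch shows how one delimited line could become typed values under FIELD_TERMINATOR, STRING_DELIMITER, DATE_FORMAT, and USE_TYPE_DEFAULT. Several details are assumptions for illustration: the column type names, the use of Python `strptime` codes instead of the SQL date notation, and the defaults for missing values (0, empty string, 1900-01-01).

```python
import csv
from datetime import date, datetime
from typing import List, Union

Value = Union[int, str, date, None]

# Assumed defaults for missing values when USE_TYPE_DEFAULT is on; otherwise they become None (NULL).
TYPE_DEFAULTS = {"int": 0, "str": "", "date": date(1900, 1, 1)}


def parse_line(
    line: str,
    types: List[str],
    field_terminator: str = "|",
    string_delimiter: str = '"',
    date_format: str = "%Y-%m-%d",
    use_type_default: bool = False,
) -> List[Value]:
    """Split one delimited line and convert each field to its column type."""
    # FIELD_TERMINATOR and STRING_DELIMITER: a quoted field may contain the terminator.
    fields = next(csv.reader([line], delimiter=field_terminator, quotechar=string_delimiter))
    # Trailing columns absent from the line count as missing entries.
    fields += [""] * (len(types) - len(fields))
    values: List[Value] = []
    for raw, kind in zip(fields, types):
        if raw == "":
            # USE_TYPE_DEFAULT: the type's default when on, NULL (None) when off.
            values.append(TYPE_DEFAULTS[kind] if use_type_default else None)
        elif kind == "int":
            values.append(int(raw))
        elif kind == "date":
            # DATE_FORMAT, expressed here with Python strptime codes.
            values.append(datetime.strptime(raw, date_format).date())
        else:
            values.append(raw)
    return values
```

For example, `parse_line('7|"a|b"||2014-05-15', ["int", "str", "int", "date"])` keeps `a|b` as a single string because it is quoted. The empty third field becomes `None`, or `0` when `use_type_default=True`.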
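Slide 26's scale-out rests on each row living on exactly one node, so adding nodes spreads both the data and the work. PDW places hash-distributed tables by hashing a distribution column. The sketch below assumes CRC-32 as the hash, a hypothetical `(customer_id, region)` table, and arbitrary distribution counts. It illustrates the idea and is not the PDW hash function.

```python
import zlib
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # hypothetical (customer_id, region) rows


def distribution_for(key: int, distributions: int) -> int:
    """Hash the distribution column value and map it to one distribution."""
    return zlib.crc32(str(key).encode("utf-8")) % distributions


def distribute(rows: List[Row], distributions: int) -> Dict[int, List[Row]]:
    """Place every row in exactly one distribution, keyed on customer_id."""
    placed: Dict[int, List[Row]] = {d: [] for d in range(distributions)}
    for row in rows:
        placed[distribution_for(row[0], distributions)].append(row)
    return placed


if __name__ == "__main__":
    rows = [(i, "east" if i % 2 else "west") for i in range(10_000)]
    for count in (2, 4, 8):
        sizes = sorted(len(r) for r in distribute(rows, count).values())
        # More distributions (nodes) means a smaller share of the data for each one.
        print(count, "distributions:", sizes)
```

Because the hash is deterministic, rows with the same key always land together. That property lets joins and aggregations on the distribution column run locally on each node.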
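Slide 27 attributes much of its compression to storing data in columnar format. Values of one column are stored together, and long runs of repeats compress well. The toy below uses run-length encoding only, on hypothetical sorted rows. A real columnstore also uses techniques such as dictionary encoding, and this ratio says nothing about the slide's 15x figure.

```python
from itertools import groupby
from typing import Any, List, Tuple


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def column_segments(rows: List[Tuple[Any, ...]]) -> List[List[Tuple[Any, int]]]:
    """Pivot row tuples into columns and run-length encode each column separately."""
    return [run_length_encode(list(column)) for column in zip(*rows)]


if __name__ == "__main__":
    # Hypothetical sales rows: (date, store, product), sorted by date then store.
    rows = [("2014-05-15", "CA", "xbox")] * 500 + [("2014-05-15", "WA", "excel")] * 500
    segments = column_segments(rows)
    stored = sum(len(seg) for seg in segments)
    print(len(rows) * 3, "values as rows ->", stored, "runs as columns")
    # 3000 values as rows -> 5 runs as columns
```

The same rows stored row by row would interleave the three columns, leaving almost no consecutive repeats to collapse.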
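The five steps on slide 29 form a scatter-gather pattern. The control node splits the work, the compute nodes process their local data in parallel, and the control node combines the partial results. The sketch below computes an average, assuming each compute node returns a partial (sum, count) so the control node can combine them correctly. The per-node data are hypothetical, and a thread pool stands in for the compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def compute_node(local_rows: List[float]) -> Tuple[float, int]:
    """Step 4: each compute node aggregates only its own rows into a partial result."""
    return sum(local_rows), len(local_rows)


def control_node_avg(distributed_data: List[List[float]]) -> float:
    # Steps 2-3: the plan rewrites AVG(x) as SUM(x) and COUNT(x) on every node.
    with ThreadPoolExecutor(max_workers=max(1, len(distributed_data))) as pool:
        partials = list(pool.map(compute_node, distributed_data))
    # Step 5: combine the partials; averaging the per-node averages would be wrong.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


if __name__ == "__main__":
    nodes = [[10.0, 20.0, 30.0], [40.0], [50.0, 60.0]]  # hypothetical per-node data
    print(control_node_avg(nodes))  # 35.0
```

Averaging the three node averages (20, 40, 55) would give about 38.3 instead of 35.0. That is why the partial results carry counts.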
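Slides 35 and 36 describe moving virtual machines off a failed host while enforcing affinity and anti-affinity rules. The sketch below models only that placement decision and does not reproduce Windows Failover Cluster Manager's logic. It assumes one anti-affinity rule (two named compute VMs must never share a host) and picks the least-loaded eligible host. The host and VM names follow the slide labels, but the placement itself is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def fail_over(
    placement: Dict[str, List[str]],
    failed_host: str,
    anti_affinity: Set[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Move every VM off `failed_host` without co-locating an anti-affine pair."""

    def conflicts(vm: str, host_vms: List[str]) -> bool:
        return any((vm, other) in anti_affinity or (other, vm) in anti_affinity for other in host_vms)

    new_placement = {host: list(vms) for host, vms in placement.items() if host != failed_host}
    for vm in placement[failed_host]:
        candidates = [h for h, vms in new_placement.items() if not conflicts(vm, vms)]
        if not candidates:
            raise RuntimeError(f"no host can take {vm} without breaking anti-affinity")
        # Prefer the least-loaded eligible host, such as an empty passive unit.
        target = min(candidates, key=lambda h: len(new_placement[h]))
        new_placement[target].append(vm)
    return new_placement


if __name__ == "__main__":
    placement = {"Host1": ["CTL", "MAD"], "Host2": ["Compute1"], "Host3": ["Compute2"], "Host5": []}
    rules = {("Compute1", "Compute2")}
    print(fail_over(placement, "Host2", rules))
    # {'Host1': ['CTL', 'MAD'], 'Host3': ['Compute2'], 'Host5': ['Compute1']}
```

Compute1 cannot move to Host3 because of the anti-affinity rule, so it goes to the empty passive host (Host5).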