Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/building-a-heterogeneous-hadoop-olap-system-with-microsoft-bi-stack/pablo-doval-and-ibon-landa
2. WHO…
… AM I?
• SQL/BI Team Lead at Plain Concepts
• e-mail: pablod@plainconcepts.com
• Blog: http://geek.ms/blogs/palvarez
• Twitter: @PabloDoval
… ARE YOU?
• Quick Poll in the Room
6. SHARP
Overview
SCADA Historical Analysis and Reporting Platform
Demonstrate the feasibility of a custom end-to-end global architecture:
• SCADA: Local, Mobile and Central
• Historical Data: High speed and High volume
• Reporting
• Analysis
7. SHARP
MAGUS
[Architecture diagram, split into Production Centers and Central. Recoverable labels:]
• Production Center A, Production Center B: MAGUS, Local Operation, Mobile Operation
• MAGUS Central
• Remote Operation
• MongoDB capped collections for each Production Center
• Retention: 2 months of 1s data; 1 year of 10m data
• Mongo Export; MAGUS DAT Files
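The retention figures on this slide imply a fixed number of samples per signal, which is what makes a capped collection (fixed size, oldest entries overwritten first) a natural fit. A minimal sketch of that arithmetic in Python follows. It assumes "1s" and "10m" mean one sample per second and one per ten minutes, that two months is 60 days, and that a year is 365 days. All names are illustrative, and `deque(maxlen=...)` only imitates capped-collection behaviour in memory; it is not MongoDB code.

```python
from collections import deque

SECONDS_PER_DAY = 24 * 60 * 60


def samples_retained(retention_days: int, sample_period_s: int) -> int:
    """Number of samples a single signal produces over the retention window."""
    return retention_days * SECONDS_PER_DAY // sample_period_s


# Assumptions (not stated in the slides): 2 months = 60 days, 1 year = 365 days.
HIGH_RES_SAMPLES = samples_retained(60, 1)        # 1 s data kept for 2 months
LOW_RES_SAMPLES = samples_retained(365, 10 * 60)  # 10 min data kept for 1 year


def make_capped_buffer(max_docs: int) -> deque:
    """In-memory stand-in for a capped collection: once full, the oldest
    entry is dropped whenever a new one is appended."""
    return deque(maxlen=max_docs)
```

Under these assumptions a single signal needs roughly 5.2 million slots at 1 s resolution but only about 52 thousand at 10 min resolution, which shows why the two resolutions get different retention windows.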
8. SHARP
Historical Data
[Data-flow diagram, split into Production Centers and Central. Recoverable labels:]
• MAGUS; MAGUS Central; Mongo Export
• Source 1 to Source 7
• Loader (×7); DAT files
• DWH
• Hadoop
9. SHARP
Analysis and Reporting
[Diagram, split into Production Centers and Central. Recoverable labels:]
• Events; StreamInsight
• DWH
• OLAP Tabular
• PowerPivot; Power View; Microsoft Office
• Reporting Services: dynamic reports, scheduled reports, automatic distribution, multiformat (PDF, XLS, etc.)
• Future: Cloud?
10. INITIAL ASSESSMENT
• Proof of Concept
• Microsoft Ecosystem
• On Premise Infrastructure
21. IMPROV. TO HIGHER RESOLUTION DATA
Sqoop with PDW…
[Diagram labels: Sqoop, Map/Reduce Job, SQL Server (×4)]
22. IMPROV. TO HIGHER RESOLUTION DATA
Sqoop refresher…
[Diagram labels: SQL Server (×4), Sqoop, Hadoop Cluster]
23. IMPROV. TO HIGHER RESOLUTION DATA
The Goal – Polybase!
Ability to work with data in DW and Hive seamlessly and in a performant way.
[Diagram labels: T-SQL Queries, SQL Server (PDW), SQL, HDF]
24. IMPROV. TO HIGHER RESOLUTION DATA
Polybase parallelism via DMS
[Diagram labels: SQL Server (×4), Hadoop Cluster]
26. IMPROV. TO HIGHER RESOLUTION DATA
That’s just the beginning…
• Uses the same T-SQL syntax to query both worlds at the same time.
• The query optimizer (QO) can decide which data to push into which environment so that processing is optimal.
27. STORIES WE COULD TELL
What went right…
Cloud Environment
Tabular Model for OLAP
SSIS for ETL via ODBC Hive Driver
28. STORIES WE COULD TELL
What was not so good…
Mappers and Reducers in C# via Hadoop Streaming
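For readers unfamiliar with Hadoop Streaming: a streaming job runs any executable as mapper or reducer, feeding it text lines on standard input and reading tab-separated key/value lines from its standard output, with the reducer's input sorted by key. The talk used C#; the sketch below uses Python purely to illustrate that contract. The record layout (`signal,epoch_seconds,value`), the 10-minute averaging, and all function names are hypothetical and not taken from the project.

```python
from itertools import groupby

BUCKET_SECONDS = 600  # assumed 10-minute aggregation window


def map_lines(lines):
    """Mapper step: turn 'signal,epoch_seconds,value' records (hypothetical
    layout) into 'signal:bucket_start<TAB>value' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        signal, ts, value = line.split(",")
        bucket = int(ts) // BUCKET_SECONDS * BUCKET_SECONDS
        yield f"{signal}:{bucket}\t{float(value)}"


def reduce_lines(lines):
    """Reducer step: input arrives sorted by key, so consecutive lines with
    the same key can be grouped and averaged."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [float(v) for _, v in group]
        yield f"{key}\t{sum(values) / len(values)}"
```

To run either step under Hadoop Streaming, a thin wrapper script would pass `sys.stdin` to the generator and print each yielded line. The same stdin/stdout contract applies whatever language the executable is written in.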
29. CALL TO ACTION
LEARN MORE
1. Microsoft Big Data Solution: www.microsoft.com/bigdata
2. Windows Azure: www.windowsazure.com/en-us/home/scenarios/big-data
TRY NOW
1. Preview of the Windows Azure HDInsight Service:
https://www.hadooponazure.com
2. Developer CTP of Microsoft HDInsight Server for Windows Server:
http://www.microsoft.com/bigdata
Editor's notes
Usual presentation and contact stuff… Greet Ibon; he couldn’t make it to Madrid.
Three questions:
- Are you engaged in any Hadoop projects?
- Have you played with Microsoft’s Hadoop distribution?
- Did you know there was a Microsoft Hadoop distribution? ;)
Microsoft’s Big Data Incubation Program.
Development as a Proof of Concept allows new scenarios to be thought through and developed in future iterations with minimum risk. We would start with a 10-minute data storage and data warehouse, and a 1-minute data storage. Then analytical processes.