BUILDING A HETEROGENEOUS
 HADOOP/OLAP SYSTEM WITH
    MICROSOFT'S BI STACK
WHO…

… AM I?
 •   SQL/BI Team Lead at Plain Concepts
 •   e-mail: pablod@plainconcepts.com
 •   Blog: http://geek.ms/blogs/palvarez
 •   Twitter: @PabloDoval

… ARE YOU?
 •   Quick Poll in the Room 
WHAT…

… ARE WE GOING TO SEE?




… AM I NOT GOING TO SHOW?
SOME PICS…
SHARP
Overview



 SCADA Historical Analysis and Reporting Platform


 Demonstrate the feasibility of a custom end-to-end global
 architecture:
 • SCADA: Local, Mobile and Central
 • Historical Data: High speed and High volume
 • Reporting
 • Analysis
SHARP
  MAGUS
[Diagram: MAGUS deployment across Production Centers and Central]
 Production Centers (A and B), each with:
 •   MAGUS: Local Operation, Mobile Operation
 •   MongoDB capped collections: 2 months of 1s data, 1 year of 10m data
 Central:
 •   MAGUS Central: Remote Operation
 •   MongoDB capped collections for each Production Center: 2 months of 1s data, 1 year of 10m data
 •   Mongo Export to DAT Files
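A rough sizing sketch for the retention windows quoted on the MAGUS slide. It assumes one sample per signal per interval, reads "10m" as 10 minutes and "2 months" as 60 days, and uses a hypothetical in-memory CappedBuffer class to mimic how a capped collection keeps a fixed-size, insertion-ordered window and discards its oldest entries first. None of these names or assumptions come from the talk.

```python
from collections import deque

SECONDS_PER_DAY = 24 * 60 * 60


def samples_retained(retention_days: int, interval_seconds: int) -> int:
    """Samples one signal produces over the retention window."""
    return retention_days * SECONDS_PER_DAY // interval_seconds


# Assumptions (not from the slides): 2 months ~ 60 days, "10m" = 10 minutes.
HIGH_RES_SAMPLES = samples_retained(60, 1)         # 1 s data
LOW_RES_SAMPLES = samples_retained(365, 10 * 60)   # 10 min data


class CappedBuffer:
    """Hypothetical stand-in for a capped collection.

    A real MongoDB capped collection is sized in bytes (with an optional
    document limit); this sketch caps by document count only.
    """

    def __init__(self, max_docs: int) -> None:
        # deque(maxlen=...) silently drops the oldest item when full.
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc: dict) -> None:
        self._docs.append(doc)

    def oldest(self) -> dict:
        return self._docs[0]

    def __len__(self) -> int:
        return len(self._docs)
```

Under these assumptions each signal needs room for about 5.2 million high-resolution samples and about 52 thousand low-resolution samples, which is why a fixed-size, self-trimming store is a natural fit at each Production Center.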
SHARP
Historical Data
[Diagram: historical data flow from Production Centers to Central]
 •   Production Centers: MAGUS; Source 1 to Source 7, with Loaders writing DAT files
 •   Central: MAGUS Central, Mongo Export
 •   DAT files, Hadoop, DWH
SHARP
    Analysis and Reporting
[Diagram: analysis and reporting components, Production Centers and Central]
 •   StreamInsight: Events
 •   DWH
 •   OLAP Tabular
 •   Microsoft Office: PowerPivot
 •   Reporting Services
     •   Dynamic reports
     •   Scheduled reports
     •   Automatic Distribution
     •   Multiformat (PDF, XLS, etc.)
 •   Power View
 •   Future: Cloud?
INITIAL ASSESSMENT

 Proof of Concept

 Microsoft Ecosystem

 On-Premises Infrastructure
TOOLS OF THE TRADE
 PowerPivot
 Power View
SO… WHAT DOES IT LOOK LIKE?
CURRENT SHARP IMPLEMENTATION
[Diagram: components of the current implementation]
 •   Load Service
 •   Azure Storage
 •   Hadoop: HDFS, Map Reduce, HIVE
 •   SSIS
 •   DWH
 •   SSRS, PowerView
LET’S TAKE A DEEPER LOOK…
FUTURE IMPROVEMENTS

New Analytical Processes


CEP Integration with StreamInsight


Improvements to the Higher-Resolution Data
COMPLEX EVENT PROCESSING
    StreamInsight
[Diagram: same components as the SHARP Analysis and Reporting slide: StreamInsight, Events, DWH, OLAP Tabular, Microsoft Office (PowerPivot), Reporting Services, Power View, Future (Cloud?)]
COMPLEX EVENT PROCESSING
    StreamInsight
[Diagram: StreamInsight and Events, Production Centers and Central]
IMPROV. TO HIGHER RESOLUTION DATA
The Goal

Ability to work with data in the DW and Hive seamlessly and with good
performance.

[Diagram: Export]
IMPROV. TO HIGHER RESOLUTION DATA
Sqoop Refresher
IMPROV. TO HIGHER RESOLUTION DATA
Sqoop with PDW…
[Diagram: Sqoop, Map/Reduce Job, and a row of SQL Server nodes]
IMPROV. TO HIGHER RESOLUTION DATA
Sqoop refresher…
[Diagram: a row of SQL Server nodes, Sqoop, and the Hadoop Cluster]
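As a reminder of what a Sqoop transfer between the Hadoop cluster and SQL Server looks like in practice, here is a minimal sketch that assembles the argument list for a `sqoop export` run. The helper name sqoop_export_args, the host, database, table, and paths are all hypothetical; only the Sqoop options themselves (`--connect`, `--username`, `--password-file`, `--table`, `--export-dir`, `--num-mappers`) are standard. The talk does not show the exact commands that were used.

```python
from typing import List


def sqoop_export_args(jdbc_url: str, table: str, export_dir: str,
                      username: str, password_file: str,
                      num_mappers: int = 4) -> List[str]:
    """Build the argv for a `sqoop export` run (HDFS files -> SQL table)."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--export-dir", export_dir,
        # Each mapper opens its own connection to the database.
        "--num-mappers", str(num_mappers),
    ]


# Hypothetical values, for illustration only.
args = sqoop_export_args(
    jdbc_url="jdbc:sqlserver://dwh.example.local:1433;databaseName=SharpDW",
    table="SignalAggregates",
    export_dir="/hive/warehouse/signal_aggregates",
    username="loader",
    password_file="/user/loader/.sqoop-password",
)
```

On a machine with Sqoop installed, the resulting list could be handed to `subprocess.run(args, check=True)`. The opposite direction uses `sqoop import` with `--target-dir` in place of `--export-dir`.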
IMPROV. TO HIGHER RESOLUTION DATA
The Goal – Polybase!
Ability to work with data in the DW and Hive seamlessly and with good
performance.
[Diagram: T-SQL Queries; SQL Server (PDW); SQL HDF]
IMPROV. TO HIGHER RESOLUTION DATA
Polybase parallelism via DMS
[Diagram: a row of SQL Server nodes and the Hadoop Cluster]
IMPROV. TO HIGHER RESOLUTION DATA
Parallelism
IMPROV. TO HIGHER RESOLUTION DATA
That’s just the beginning…

Uses the same T-SQL syntax to query both worlds at the same
time.


The query optimizer (QO) can decide which data to push into which
environment so that it is processed optimally.
STORIES WE COULD TELL
What went right…
 Cloud Environment


 Tabular Model for OLAP


 SSIS for ETL via ODBC Hive Driver
STORIES WE COULD TELL
What was not so good…
 Mappers and Reducers in C# via Hadoop Streaming
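For readers unfamiliar with Hadoop Streaming, the contract is simple: the mapper and the reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output, and the reducer receives its input sorted by key. The talk used C# for this; the sketch below uses Python instead, purely for illustration. The input layout (signal, epoch seconds, value, separated by tabs) and the job itself (10-minute averages per signal) are assumptions, not something shown in the slides.

```python
from itertools import groupby
from typing import Iterable, Iterator

BUCKET_SECONDS = 600  # assumption: 10-minute aggregation buckets


def map_line(line: str) -> str:
    """Mapper step: 'signal<TAB>epoch<TAB>value' -> 'signal|bucket<TAB>value'."""
    signal, epoch, value = line.rstrip("\n").split("\t")
    bucket = int(epoch) // BUCKET_SECONDS * BUCKET_SECONDS
    return f"{signal}|{bucket}\t{value}"


def reduce_lines(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Reducer step: average the values of each key.

    Relies on the input being sorted by key, which Hadoop Streaming
    guarantees for reducer input.
    """
    pairs = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [float(v) for _, v in group]
        yield f"{key}\t{sum(values) / len(values)}"


def run(role: str, lines: Iterable[str]) -> Iterator[str]:
    """Dispatch to the mapper or reducer; role is 'map' or 'reduce'."""
    if role == "map":
        return (map_line(line) for line in lines if line.strip())
    if role == "reduce":
        return reduce_lines(lines)
    raise ValueError(f"unknown role: {role}")
```

To use this as a streaming script, the entry point would print every line of `run(sys.argv[1], sys.stdin)`. The same file can then be passed to Hadoop Streaming as the mapper with the argument `map` and as the reducer with the argument `reduce`.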
CALL TO ACTION
LEARN MORE
 1.   Microsoft Big Data Solution: www.microsoft.com/bigdata
 2.   Windows Azure: www.windowsazure.com/en-us/home/scenarios/big-data

TRY NOW
 1. Preview of the Windows Azure HDInsight Service:
      https://www.hadooponazure.com

 2. Developer CTP of Microsoft HDInsight Server for Windows Server:
      http://www.microsoft.com/bigdata
Building a heterogeneous Hadoop Olap system with Microsoft BI stack. PABLO DOVAL & IBON LANDA at Big Data Spain 2012

Weitere ähnliche Inhalte

Was ist angesagt?

Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...Krishnan Parasuraman
 
SPS- Share Point 2010 and Windows Azure
SPS- Share Point 2010 and Windows AzureSPS- Share Point 2010 and Windows Azure
SPS- Share Point 2010 and Windows AzureShakir Majeed Khan
 
SQL Server 2008 Fast Track Data Warehouse
SQL Server 2008 Fast Track Data WarehouseSQL Server 2008 Fast Track Data Warehouse
SQL Server 2008 Fast Track Data WarehouseMark Ginnebaugh
 
Real-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQReal-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQSybase Türkiye
 
Liquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANALiquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANASAP Technology
 
Apache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondApache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondDataWorks Summit
 
SQL Server Reporting Services (SSRS) 101
 SQL Server Reporting Services (SSRS) 101 SQL Server Reporting Services (SSRS) 101
SQL Server Reporting Services (SSRS) 101Sparkhound Inc.
 
Sql server2008 r2_bi_datasheet_final
Sql server2008 r2_bi_datasheet_finalSql server2008 r2_bi_datasheet_final
Sql server2008 r2_bi_datasheet_finalKlaudiia Jacome
 

Was ist angesagt? (9)

Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
 
SPS- Share Point 2010 and Windows Azure
SPS- Share Point 2010 and Windows AzureSPS- Share Point 2010 and Windows Azure
SPS- Share Point 2010 and Windows Azure
 
Oracle BI Server By AORTA
Oracle BI Server By AORTAOracle BI Server By AORTA
Oracle BI Server By AORTA
 
SQL Server 2008 Fast Track Data Warehouse
SQL Server 2008 Fast Track Data WarehouseSQL Server 2008 Fast Track Data Warehouse
SQL Server 2008 Fast Track Data Warehouse
 
Real-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQReal-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQ
 
Liquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANALiquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANA
 
Apache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondApache Hadoop Now Next and Beyond
Apache Hadoop Now Next and Beyond
 
SQL Server Reporting Services (SSRS) 101
 SQL Server Reporting Services (SSRS) 101 SQL Server Reporting Services (SSRS) 101
SQL Server Reporting Services (SSRS) 101
 
Sql server2008 r2_bi_datasheet_final
Sql server2008 r2_bi_datasheet_finalSql server2008 r2_bi_datasheet_final
Sql server2008 r2_bi_datasheet_final
 

Ähnlich wie Building a heterogeneous Hadoop Olap system with Microsoft BI stack. PABLO DOVAL & IBON LANDA at Big Data Spain 2012

OpenSpan - A Better Way to Work, A Better Way to Manage
OpenSpan - A Better Way to Work, A Better Way to ManageOpenSpan - A Better Way to Work, A Better Way to Manage
OpenSpan - A Better Way to Work, A Better Way to ManageFrank Wagman
 
MEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftMEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftLee Stott
 
SnapLogic corporate presentation
SnapLogic corporate presentationSnapLogic corporate presentation
SnapLogic corporate presentationpbridges
 
SAP Sybase Event Streaming Processing
SAP Sybase Event Streaming ProcessingSAP Sybase Event Streaming Processing
SAP Sybase Event Streaming ProcessingSybase Türkiye
 
Hadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondHadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondTeradata Aster
 
CA Nimsoft xen desktop monitoring
CA Nimsoft xen desktop monitoring CA Nimsoft xen desktop monitoring
CA Nimsoft xen desktop monitoring CA Nimsoft
 
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...Cloudera, Inc.
 
Jaspersoft Dashboards Webinar Feb 2013
Jaspersoft Dashboards Webinar  Feb 2013Jaspersoft Dashboards Webinar  Feb 2013
Jaspersoft Dashboards Webinar Feb 2013Mike Boyarski
 
21st Century SOA
21st Century SOA21st Century SOA
21st Century SOABob Rhubart
 
Sybase Complex Event Processing
Sybase Complex Event ProcessingSybase Complex Event Processing
Sybase Complex Event ProcessingSybase Türkiye
 
Con2012 salmon impact_of_sap_hana_revolutionary_changes_for_sap_controlling_data
Con2012 salmon impact_of_sap_hana_revolutionary_changes_for_sap_controlling_dataCon2012 salmon impact_of_sap_hana_revolutionary_changes_for_sap_controlling_data
Con2012 salmon impact_of_sap_hana_revolutionary_changes_for_sap_controlling_dataaadamserpcorp
 
Couchbase Server and IBM BigInsights: One + One = Three
Couchbase Server and IBM BigInsights: One + One = ThreeCouchbase Server and IBM BigInsights: One + One = Three
Couchbase Server and IBM BigInsights: One + One = ThreeDipti Borkar
 
21st Century Service Oriented Architecture
21st Century Service Oriented Architecture21st Century Service Oriented Architecture
21st Century Service Oriented ArchitectureBob Rhubart
 
Data Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on HadoopData Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on Hadoopskaluska
 
Controlling 2012 Impact of SAP HANA
Controlling 2012 Impact of SAP HANAControlling 2012 Impact of SAP HANA
Controlling 2012 Impact of SAP HANAJohn Jordan
 
Impact of in-memory technology and SAP HANA on your business, IT, and career
Impact of in-memory technology and SAP HANA on your business, IT, and careerImpact of in-memory technology and SAP HANA on your business, IT, and career
Impact of in-memory technology and SAP HANA on your business, IT, and careerVitaliy Rudnytskiy
 
How Service Mesh Fits into the Modern Data Stack
How Service Mesh Fits into the Modern Data StackHow Service Mesh Fits into the Modern Data Stack
How Service Mesh Fits into the Modern Data StackFabian Hardt
 

Ähnlich wie Building a heterogeneous Hadoop Olap system with Microsoft BI stack. PABLO DOVAL & IBON LANDA at Big Data Spain 2012 (20)

OpenSpan - A Better Way to Work, A Better Way to Manage
OpenSpan - A Better Way to Work, A Better Way to ManageOpenSpan - A Better Way to Work, A Better Way to Manage
OpenSpan - A Better Way to Work, A Better Way to Manage
 
MEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftMEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop Microsoft
 
SnapLogic corporate presentation
SnapLogic corporate presentationSnapLogic corporate presentation
SnapLogic corporate presentation
 
hadoop @ Ibmbigdata
hadoop @ Ibmbigdatahadoop @ Ibmbigdata
hadoop @ Ibmbigdata
 
Cloud computing era
Cloud computing eraCloud computing era
Cloud computing era
 
SAP Sybase Event Streaming Processing
SAP Sybase Event Streaming ProcessingSAP Sybase Event Streaming Processing
SAP Sybase Event Streaming Processing
 
Zh tw cloud computing era
Zh tw cloud computing eraZh tw cloud computing era
Zh tw cloud computing era
 
Hadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondHadoop - Now, Next and Beyond
Hadoop - Now, Next and Beyond
 
CA Nimsoft xen desktop monitoring
CA Nimsoft xen desktop monitoring CA Nimsoft xen desktop monitoring
CA Nimsoft xen desktop monitoring
 
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
 
Jaspersoft Dashboards Webinar Feb 2013
Jaspersoft Dashboards Webinar  Feb 2013Jaspersoft Dashboards Webinar  Feb 2013
Jaspersoft Dashboards Webinar Feb 2013
 
21st Century SOA
21st Century SOA21st Century SOA
21st Century SOA
 
Sybase Complex Event Processing
Sybase Complex Event ProcessingSybase Complex Event Processing
Sybase Complex Event Processing
 
Con2012 salmon impact_of_sap_hana_revolutionary_changes_for_sap_controlling_data
Con2012 salmon impact_of_sap_hana_revolutionary_changes_for_sap_controlling_dataCon2012 salmon impact_of_sap_hana_revolutionary_changes_for_sap_controlling_data
Con2012 salmon impact_of_sap_hana_revolutionary_changes_for_sap_controlling_data
 
Couchbase Server and IBM BigInsights: One + One = Three
Couchbase Server and IBM BigInsights: One + One = ThreeCouchbase Server and IBM BigInsights: One + One = Three
Couchbase Server and IBM BigInsights: One + One = Three
 
21st Century Service Oriented Architecture
21st Century Service Oriented Architecture21st Century Service Oriented Architecture
21st Century Service Oriented Architecture
 
Data Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on HadoopData Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on Hadoop
 
Controlling 2012 Impact of SAP HANA
Controlling 2012 Impact of SAP HANAControlling 2012 Impact of SAP HANA
Controlling 2012 Impact of SAP HANA
 
Impact of in-memory technology and SAP HANA on your business, IT, and career
Impact of in-memory technology and SAP HANA on your business, IT, and careerImpact of in-memory technology and SAP HANA on your business, IT, and career
Impact of in-memory technology and SAP HANA on your business, IT, and career
 
How Service Mesh Fits into the Modern Data Stack
How Service Mesh Fits into the Modern Data StackHow Service Mesh Fits into the Modern Data Stack
How Service Mesh Fits into the Modern Data Stack
 

Mehr von Big Data Spain

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data Spain
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Big Data Spain
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017Big Data Spain
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Big Data Spain
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Big Data Spain
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Big Data Spain
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Big Data Spain
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Big Data Spain
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...Big Data Spain
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Big Data Spain
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Big Data Spain
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Big Data Spain
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Big Data Spain
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Big Data Spain
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Big Data Spain
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...Big Data Spain
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Big Data Spain
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...Big Data Spain
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Big Data Spain
 

Mehr von Big Data Spain (20)

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
 

Building a heterogeneous Hadoop/OLAP system with Microsoft BI stack. PABLO DOVAL & IBON LANDA at Big Data Spain 2012

  • 1. BUILDING A HETEROGENEOUS HADOOP/OLAP SYSTEM WITH MICROSOFT'S BI STACK
  • 2. WHO… AM I? SQL/BI Team Lead at Plain Concepts • e-mail: pablod@plainconcepts.com • Blog: http://geek.ms/blogs/palvarez • Twitter: @PabloDoval | WHO… ARE YOU? Quick poll in the room
  • 3. WHAT… ARE WE GOING TO SEE? WHAT… I'M NOT GOING TO SHOW?
  • 4.
  • 6. SHARP Overview. SCADA Historical Analysis and Reporting Platform. Demonstrate the feasibility of a custom end-to-end global architecture: • SCADA: Local, Mobile and Central • Historical Data: High speed and High volume • Reporting • Analysis
  • 7. SHARP MAGUS [diagram]. Production Centers A and B: each runs MAGUS (Local Operation, Mobile Operation) with MongoDB capped collections (2 months of 1s data, 1 year of 10m data). Central: MAGUS Central (Remote Operation) with MongoDB capped collections for each Production Center (2 months of 1s data, 1 year of 10m data); Mongo Export; DAT Files.
  • 8. SHARP Historical Data [diagram labels]. Production Centers: MAGUS. Central: MAGUS Central; Mongo Export; Source 1 to Source 7; Loader; DAT; DWH; Hadoop.
  • 9. SHARP Analysis and Reporting [diagram labels]. Events; StreamInsight; DWH; OLAP Tabular; Power Pivot; Microsoft Office; Power View; Reporting Services (• Dynamic reports • Scheduled reports • Automatic Distribution • Multiformat: PDF, XLS, etc.); Future: Cloud? Production Centers; Central.
  • 10. INITIAL ASSESSMENT: Proof of Concept • Microsoft Ecosystem • On-Premise Infrastructure
  • 11. TOOLS OF THE TRADE PowerPivot Power View
  • 12.
  • 13. SO… WHAT DOES IT LOOK LIKE?
  • 14. CURRENT SHARP IMPLEMENTATION [diagram labels]. Load Service; Hadoop (HDFS, Map Reduce, HIVE); Azure Storage; SSIS; DWH; SSRS; PowerView.
  • 15. LET’S TAKE A DEEPER LOOK…
  • 16. FUTURE IMPROVEMENTS: New Analytical Processes • CEP Integration with StreamInsight • Improvements on the Higher Resolution data
  • 17. COMPLEX EVENT PROCESSING: StreamInsight [diagram labels]. Events; StreamInsight; DWH; OLAP Tabular; Power Pivot; Microsoft Office; Power View; Reporting Services (• Dynamic reports • Scheduled reports • Automatic Distribution • Multiformat: PDF, XLS, etc.); Future: Cloud? Production Centers; Central.
  • 18. COMPLEX EVENT PROCESSING: StreamInsight [diagram labels]. Events; StreamInsight; Production Centers; Central.
  • 19. IMPROV. TO HIGHER RESOLUTION DATA. The Goal: ability to work with data in DW and Hive seamlessly and in a performant way. [diagram label: Export]
  • 20. IMPROV. TO HIGHER RESOLUTION DATA Sqoop Refresher
  • 21. IMPROV. TO HIGHER RESOLUTION DATA. Sqoop with PDW… [diagram labels: Sqoop; Map/Reduce Job; SQL Server, SQL Server, SQL Server, SQL Server …]
  • 22. IMPROV. TO HIGHER RESOLUTION DATA. Sqoop refresher… [diagram labels: SQL Server, SQL Server, SQL Server, SQL Server …; Sqoop; Hadoop Cluster]
  • 23. IMPROV. TO HIGHER RESOLUTION DATA. The Goal: Polybase! Ability to work with data in DW and Hive seamlessly and in a performant way. [diagram labels: T-SQL Queries; SQL Server (PDW); SQL; HDF]
  • 24. IMPROV. TO HIGHER RESOLUTION DATA. Polybase parallelism via DMS [diagram labels: SQL Server, SQL Server, SQL Server, SQL Server …; Hadoop Cluster]
  • 25. IMPROV. TO HIGHER RESOLUTION DATA Parallelism
  • 26. IMPROV. TO HIGHER RESOLUTION DATA. That's just the beginning… • Uses the same T-SQL syntax to query both worlds at the same time • The QO (query optimizer) is able to decide which data to push into which environment so that it is processed optimally.
  • 27. STORIES WE COULD TELL. What went right… Cloud Environment • Tabular Model for OLAP • SSIS for ETL via ODBC Hive Driver
  • 28. STORIES WE COULD TELL. What was not so good… Mappers and Reducers in C# via Hadoop Streaming
  • 29. CALL TO ACTION. LEARN MORE: 1. Microsoft Big Data Solution: www.microsoft.com/bigdata 2. Windows Azure: www.windowsazure.com/en-us/home/scenarios/big-data | TRY NOW: 1. Preview of the Windows Azure HDInsight Service: https://www.hadooponazure.com 2. Developer CTP of Microsoft HDInsight Server for Windows Server: http://www.microsoft.com/bigdata
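
The slides mention mappers and reducers written for Hadoop Streaming, and two data resolutions (1s and 10m). As a rough illustration only, the sketch below shows the map and reduce steps that would roll 1-second readings up into 10-minute averages. It is written in Python rather than the C# used in the talk, and it runs in-process instead of reading stdin and writing stdout as a real Hadoop Streaming job would. The `epoch_seconds,tag,value` record layout and the names `map_line`, `reduce_pairs` and `pump1` are assumptions made for this example; the talk does not describe the DAT file format.

```python
from collections import defaultdict

BUCKET_SECONDS = 600  # 10 minutes, matching the "10m data" resolution on the slides


def map_line(line):
    """Mapper step: turn one reading into a ((tag, bucket_start), value) pair.

    The 'epoch_seconds,tag,value' layout is an assumption for this sketch.
    """
    epoch_text, tag, value_text = line.strip().split(",")
    epoch = int(epoch_text)
    bucket_start = epoch - epoch % BUCKET_SECONDS  # floor to the 10-minute boundary
    return (tag, bucket_start), float(value_text)


def reduce_pairs(pairs):
    """Reducer step: average all values that share a (tag, bucket_start) key."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for key, value in pairs:
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


if __name__ == "__main__":
    # Hypothetical readings: two fall in the first bucket, one in the second.
    readings = ["0,pump1,1.0", "599,pump1,3.0", "600,pump1,10.0"]
    averages = reduce_pairs(map_line(r) for r in readings)
    for key, mean in sorted(averages.items()):
        print(key, mean)
```

Keying on the pair (tag, bucket start) means the shuffle phase would group every reading of one signal and one 10-minute window onto the same reducer, which is what makes the average computable in a single pass.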

Editor's notes

  1. Usual presentation and contact stuff… Greet Ibon, he couldn't make it to Madrid. Three questions: - Are you engaged in any Hadoop projects? - Have you played with Microsoft's Hadoop Distribution? - Did you know there was a Microsoft Hadoop Distribution? ;) Microsoft's Big Data Incubation Program.
  2. Development as a Proof of Concept allows for new scenarios to be thought out and developed in future iterations with minimum risk. We would start with a 10Min data storage and DataWarehouse, and 1Min data storage. Then analytical processes.
  3. Show HDInsight Service and HDInsight Server.