Big Data with Microsoft?
How HDInsight, SQL Server 2014
and Excel work together
Olivia Klose, Technical Evangelist
Georg Urban, Sr. Technology Solution Professional
Microsoft Deutschland GmbH
The Large Hadron Collider produces 15 PB/year*

* http://public.web.cern.ch/public/en/lhc/Computing-en.html
But what if I don't own a Large Hadron Collider…
- Large scale plants
- Vehicle fleets
- Smart Grids
- Green Energy
- Stock Exchanges
- Host Protocols
- Computer Centers
- Web Farms
- Twitter
- Facebook
- Google Analytics
- …
XML – but…
- polystructured
- varying
- no explicit schema
- lots of hex BLOBs

40,000 attributes & growing

"here is my data"
</meldungText><antwort>False</antwort><wert>na</wert></meldung><steuergeraet
sgbdVariante="SMG_60"><steuergeraeteFunktion zeitstempel="2013-04-30T09:00:37.9926171-04:00
endDate="2013-04-30T09:00:38.1158609-04:00" jobName="STATUS_FAHRZEUGTESTER"><datensatz
satzNr="1"><result name="JOB_STATUS">OKAY</result><result name="_TEL_ANTWORT">80 F1 18
70 02 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 82 6B 00 6D 6B 39 CD 14 00 14 00 00 0E 00 15 00
00 19 00 0C 00 12 00 15 85 57 71 88 81 C0 7D 73 C2 08 01 05 02 F7 00 FF FF 01 73 00 00 02 A8 00 C2
01 E0 00 00 00 00 00 00 3D 01 00 00 00 01 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
01 E1 02 05 01 F8 03 4F FF AD 04</result><result name="_TEL_AUFTRAG">83 18 F1 30 02
01</result><result name="STAT_KL15_ROH">0</result><result
name="STAT_KLR_EIN_ROH">0</result><result name="STAT_WAKE_UP_ROH">1</result><result
name="STAT_ISTGANG_TEXT">Neutral</result>
<sgFunktion zeitstempel="2013-04-30T10:33:37.0834084+02:00" endDate="2013-0430T10:33:37.9310504+02:00" jobName="_FLM_LESEN_BOSCH"><datensatz satzNr="1"><result
name="FLM_DATEN_1">00 00 00 03 02 08 C6 56 46 4C 4D 39 00 16 4B B2 00 00 00 32 00 00 06 99 00
00 65 00 00 18 6E 00 00 00 73 00 00 00 20 00 00 00 73 00 00 00 00 00 00 10 69 00 00 0F 53 00 00 00
00 00 00 0A 00 00 79 6D 00 00 B7 34 00 00 D3 9E 4A 4C 41 52 00 00 00 00 00 00 00 00 00 00 00 00 0
00 00 00 00 00 2C 00 00 00 00 00 00 1A 5C 00 15 4B CA 00 00 44 08 00 00 2D 39 00 00 1E 45 00 00 2
00 00 1E EB 00 00 0C 65 00 00 04 47 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 27 00 00 01 1E 00
02 AB 00 00 07 71 00 00 13 D7 00 00 36 48 00 15 91 AD 00 00 3F 97 00 00 19 C1 00 00 07 F9 00 00 02
00 00 00 BD 00 00 00 20 00 16 1C 42 00 00 18 B1 00 00 09 40 00 00 08 9F 00 00 04 3A 00 00 01 3E 00
8C D7 00 00 61 A3 00 00 37 9D 00 00 1E 78 00 00 14 96 00 00 0A 71 00 00 05 49 00 00 02 B1 00 00 0
00 00 00 1D 00 00 00 09 00 00 00 05 00 00 00 00 00 00 00 00 00 00 23 BB 00 00 2F 84 00 00 14 EF 00
09 40 00 00 04 71 00 00 03 34 00 00 02 12 00 00 01 AC 00 00 01 59 00 00 0B C4 00 00 00 06 00 00 00
00 00 00 19 00 00 00 01 00 00 00 00 00 00 00 04 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 00 00
00 03 00 00 00 00 00 00 00 00 52 4F 54 48 00 00 00 00 00 00 00 07 00 00 00 00 00 00 00 01 00 00 00
00 00 00 01 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 00 56 30 00 00 00 03 00 11 00 01 01 06 00
00 00 00 00 00 00 00 01 00 00 00 0E 00 05 00 1A 00 12 00 00 00 26 00 00 00 00 00 0B 00 00 00 01 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 44 44 00 43 00 16 00 08 00
00 04 00 02 00 00 00 02 00 11 00 20 00 1A 00 0A 00 15 00 0F 00 1B 00 13 00 08 00 08 00 00 00 00 00
00 0E 00 08 00 04 00 02 00 01 00 00 00 6D 00 03 00 02 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 0A 00 21 00 15 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0B 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 18 05 1F 00 00 00 00 00 00 00 00 00 1F 00 03 00 02 00 00 00 00 00 00 00
00 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 62 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 2E 00 00 1B 00 19 00 18 00 0D 00 00 00 00 00 00 00 01 00 01 00 02 00 00 06 00 01 E6
00 12 00 03 00 02 00 07 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 02 01 BA 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 24 00</result><result name="FLM_DATEN_2">08 00 0
00 00 00 00 00 00 00 0C 00 80 1B 00 45 10 00 A6 0D 00 51 16 00 59 44 00 00 EB 00 00 CA 00 00 49 00
17 00 10 00 0C 00 05 00 04 00 06 00 02 00 01 00 00 00 00 12 00 00 3A 00 00 26 00 00 13 00 0D 00 08
09 00 04 00 0A 00 00 00 00 00 00 00 00 03 00 00 0D 00 00 0B 00 00 07 00 02 00 05 00 01 00 01 00 04
00 00 00 00 00 00 04 00 07 00 08 00 06 00 04 00 …
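To make the "polystructured, no explicit schema" point concrete, here is a minimal sketch of how such diagnostic files could be flattened into uniform (job, record, attribute, value) rows and how hex BLOBs could be decoded. It assumes a small, hypothetical well-formed excerpt that reuses element names visible on the slide; the real files are larger, vary in structure, and the hex-detection heuristic is an illustration, not part of any Microsoft tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed excerpt reusing element names from the slide.
SAMPLE = """<steuergeraet sgbdVariante="SMG_60">
  <steuergeraeteFunktion jobName="STATUS_FAHRZEUGTESTER">
    <datensatz satzNr="1">
      <result name="JOB_STATUS">OKAY</result>
      <result name="_TEL_AUFTRAG">83 18 F1 30 02 01</result>
      <result name="STAT_ISTGANG_TEXT">Neutral</result>
    </datensatz>
  </steuergeraeteFunktion>
  <sgFunktion jobName="_FLM_LESEN_BOSCH">
    <datensatz satzNr="1">
      <result name="FLM_DATEN_2">08 00 0C 00</result>
    </datensatz>
  </sgFunktion>
</steuergeraet>"""

# The same concept shows up under different element names ("varying" structure).
FUNCTION_TAGS = {"steuergeraeteFunktion", "sgFunktion"}
HEX_DIGITS = set("0123456789ABCDEFabcdef")


def looks_like_hex(text):
    """Heuristic: True for space-separated byte strings such as '83 18 F1'."""
    tokens = text.split()
    return len(tokens) > 1 and all(
        len(tok) == 2 and set(tok) <= HEX_DIGITS for tok in tokens
    )


def to_eav_rows(xml_text):
    """Flatten every <result> into a (job, record number, attribute, value) row."""
    root = ET.fromstring(xml_text)
    rows = []
    for func in root.iter():  # document order
        if func.tag not in FUNCTION_TAGS:
            continue
        job = func.get("jobName")
        for record in func.iter("datensatz"):
            record_nr = int(record.get("satzNr"))
            for result in record.iter("result"):
                text = (result.text or "").strip()
                # Decode hex BLOBs into raw bytes; keep everything else as text.
                value = bytes.fromhex(text) if looks_like_hex(text) else text
                rows.append((job, record_nr, result.get("name"), value))
    return rows
```

Calling `to_eav_rows(SAMPLE)` yields four rows, with `_TEL_AUFTRAG` and `FLM_DATEN_2` decoded to `bytes`. Each new attribute simply becomes another row, which is why such data tends to end up in entity-attribute-value tables.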





- small data subsets are stored
- most data stays in the file system (original XML files)
- only about 3 years of history are stored at the moment
- heavily denormalized data (e.g. entity-attribute-value tables)
- TCO & performance limits (queries are slow; pivoting is expensive)

- cover the whole 15-year life cycle (incl. production data)
- more data sources: social media (motortalk)
- lower TCO for storage & flexible analysis
- …impossible with "classical" RDBMS
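Why pivoting entity-attribute-value (EAV) data is expensive can be shown with a small sketch: every query that needs a "normal" wide table must first regroup all triples per entity, and the resulting table is mostly empty when each entity carries only a few of many possible attributes. The vehicle IDs and attribute values below are hypothetical; this is an illustration of the EAV pattern, not the customer's actual schema.

```python
from collections import defaultdict


def pivot_eav(rows):
    """Turn (entity, attribute, value) triples into one dict per entity."""
    wide = defaultdict(dict)
    for entity, attribute, value in rows:
        wide[entity][attribute] = value
    return dict(wide)


def to_table(wide):
    """Build a dense table with one column per attribute seen anywhere."""
    columns = sorted({attr for attrs in wide.values() for attr in attrs})
    table = [[entity] + [wide[entity].get(col) for col in columns]
             for entity in sorted(wide)]
    return columns, table


def sparsity(table):
    """Fraction of empty (None) value cells, ignoring the entity column."""
    cells = [cell for row in table for cell in row[1:]]
    return sum(cell is None for cell in cells) / len(cells)


# Hypothetical EAV rows for two vehicles.
ROWS = [
    ("car-1", "JOB_STATUS", "OKAY"),
    ("car-1", "STAT_KL15_ROH", "0"),
    ("car-2", "JOB_STATUS", "OKAY"),
    ("car-2", "STAT_ISTGANG_TEXT", "Neutral"),
]
```

With only three attributes, a third of the pivoted cells are already empty; with 40,000 attributes, both the regrouping work and the sparsity grow accordingly.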
"Big data" is high-volume, high-velocity
and high-variety information assets that
demand cost-effective, innovative forms of
information processing for enhanced
insight and decision making.

Source: Mark Beyer, Douglas Laney, "The Importance of 'Big Data': A Definition", Gartner, G00235055
Modular Hardware Architecture
- ColumnStore v2 storage
- Hadoop regions: tight integration of "nonstructured" data
- FDR InfiniBand
- Ultra-high compression
- Direct-attached SAS
- Scale unit
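Column-oriented storage compresses well because each column is stored separately, so repeated values sit next to each other. The sketch below illustrates that general idea with dictionary encoding followed by run-length encoding on a single hypothetical gear-position column; it is a generic illustration of columnar compression techniques, not the actual ColumnStore v2 on-disk format.

```python
def dictionary_encode(column):
    """Replace each value by a small integer code; return (values, codes)."""
    dictionary = {}
    codes = []
    for value in column:
        # New values get the next free code; known values reuse theirs.
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    return list(dictionary), codes


def run_length_encode(codes):
    """Collapse consecutive identical codes into (code, run_length) pairs."""
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return [tuple(run) for run in runs]


def decode(values, runs):
    """Reverse both steps to recover the original column."""
    return [values[code] for code, count in runs for _ in range(count)]


# Hypothetical column of gear positions.
COLUMN = ["Neutral"] * 4 + ["Drive"] * 3 + ["Neutral"] * 2
```

Here nine stored values shrink to a two-entry dictionary plus three runs, and the effect grows with longer runs in large, sorted segments.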
Parallel Data Warehouse Screenshots
PDW in SQL Server Data Tools

A familiar development environment
…just counting rows

Scanning 10 billion rows…

…does not take…

…that long!
…a reporting query

…won't take…

And even complex queries…

…much longer!
Data Distribution
Data is distributed evenly over all data nodes…
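One common way to spread rows evenly, which PDW uses for hash-distributed tables, is to hash a distribution column and map the hash to a node, so rows with the same key always land on the same node. The sketch below illustrates that idea in plain Python; the SHA-256-based hash, the node count of 8 and the key values are assumptions for illustration, not the appliance's actual hash function or configuration.

```python
import hashlib
from collections import Counter


def node_for(key, node_count):
    """Map a distribution-column value to a node via a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(keys, node_count):
    """Count how many rows each node would receive."""
    return Counter(node_for(key, node_count) for key in keys)
```

For example, `distribute(range(10000), 8)` assigns roughly 1,250 rows to each of the 8 nodes. A stable hash (rather than Python's per-process randomized `hash()` for strings) keeps the placement reproducible across runs, which is what lets equal keys be joined locally on one node.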
HDInsight components (diagram; some items are marked as "good to know!"):
Azure UX, Azure SDK, HDInsight, Hive, Templeton, RDP, Pig, HCatalog, Ambari, MapReduce, Azure Blobs, HDFS, Sqoop, Oozie
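In this stack, Hive and Pig queries are executed as MapReduce jobs. The following in-process sketch shows the three phases (map, shuffle, reduce) on a few hypothetical tweets, counting hashtags; a real job runs the same phases distributed across the cluster, and this naive tokenizer deliberately ignores punctuation.

```python
from collections import defaultdict
from itertools import chain


def map_phase(tweet):
    """Emit (hashtag, 1) for every hashtag in one tweet."""
    return [(word.lower(), 1) for word in tweet.split()
            if word.startswith("#") and len(word) > 1]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key."""
    return key, sum(values)


def hashtag_counts(tweets):
    mapped = chain.from_iterable(map_phase(tweet) for tweet in tweets)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


TWEETS = ["Big data with #HDInsight and #Hive", "#hive on #Azure", "no tags here"]
```

`hashtag_counts(TWEETS)` counts `#hive` twice because keys are lower-cased in the map phase.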
Demo environment (diagram)
Stages: Extract; Analyze; Mash Up & Visualise
Components: Twitter, StreamInsight, Azure Blob Storage, …, Hive tables, SQL Azure, real-time dashboard
Solution Components: HDInsight, Virtual Machine, Twitter, Excel
Big Data Twitter Demo: Azure Management Portal
Big Data Twitter Demo: Dashboard
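In the demo, StreamInsight feeds the real-time dashboard. The slides do not show the actual StreamInsight queries, so the following is only a plain-Python sketch of one typical dashboard metric, a tumbling-window count (fixed, non-overlapping time windows). It assumes events arrive as hypothetical `(timestamp_in_seconds, text)` tuples and processes them as a batch, whereas a streaming engine emits each window's result incrementally.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per window [start, start + window_seconds)."""
    counts = Counter()
    for timestamp, _text in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical tweet events: (seconds since start, text).
EVENTS = [(0.5, "a"), (3.0, "b"), (9.99, "c"), (10.0, "d"), (25.0, "e")]
```

With 10-second windows, an event at exactly 10.0 s opens the second window, and the empty window between 10 and 20 s simply has no entry.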
Big Data Twitter Demo: SQL Azure
Big Data Twitter Demo: Azure Blob Storage
Big Data Twitter Demo: Hive
Big Data Twitter Demo: Mash Up in Excel
Polybase
Diagram: regular T-SQL in, results out; PDW & HDFS data nodes
- T-SQL query engine for RDBMS & Hadoop
- Cost-based optimizer, decides on:
  - translating operators into MapReduce jobs, or
  - moving HDFS data into RDBMS storage
- HDFS bridge for parallelized data transport
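The optimizer's choice between running a filter as a MapReduce job and moving the raw HDFS data into PDW can be illustrated with a toy cost model. Everything below is hypothetical: the parameter names, the throughput figures and the linear cost formulas are assumptions for illustration, not Polybase's actual cost model.

```python
from dataclasses import dataclass


@dataclass
class HdfsScanEstimate:
    rows: int             # rows in the external (HDFS) table
    row_bytes: int        # average row width in bytes
    selectivity: float    # fraction of rows surviving the WHERE clause
    mr_startup_s: float   # fixed MapReduce job start-up cost (hypothetical)
    mr_scan_bps: float    # aggregate Hadoop scan throughput, bytes/s
    bridge_bps: float     # HDFS-to-PDW transfer throughput, bytes/s


def cost_import(e: HdfsScanEstimate) -> float:
    """Ship every row into PDW and filter there."""
    return e.rows * e.row_bytes / e.bridge_bps


def cost_pushdown(e: HdfsScanEstimate) -> float:
    """Filter in a MapReduce job, then ship only the surviving rows."""
    scan = e.rows * e.row_bytes / e.mr_scan_bps
    ship = e.rows * e.selectivity * e.row_bytes / e.bridge_bps
    return e.mr_startup_s + scan + ship


def choose_plan(e: HdfsScanEstimate) -> str:
    return "pushdown" if cost_pushdown(e) < cost_import(e) else "import"


BIG = HdfsScanEstimate(10**9, 100, 0.01, 30.0, 1e9, 1e8)
SMALL = HdfsScanEstimate(10**5, 100, 0.01, 30.0, 1e9, 1e8)
```

Under these assumptions, a selective filter over a billion rows favors pushdown (about 140 s versus 1,000 s), while for a small table the fixed job start-up cost dominates and importing wins.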
T-SQL for Polybase
Screenshots: a distributed query; the definition of an external table.
Modern Data Warehousing (diagram): Parallel Data Warehouse, HDInsight, Polybase

Big Data Enterprise Architecture (diagram)
What's next…
Twitter Big Data Sourcecode: http://twitterbigdata.codeplex.com/
Twitter Big Data Setup: http://aka.ms/bigdatatwitter
Azure Trial: http://aka.ms/azurenow
HDInsight: www.windowsazure.com/en-us/documentation/services/hdinsight/
Hortonworks for Windows: http://hortonworks.com/products/hdp-windows/
PDW and Polybase: http://microsoft.com/pdw

Microsoft Big Data: http://microsoft.com/bigdata
Deutsche SQL Server Konferenz 2014: http://www.sqlkonferenz.de
“Big data is like teen sex.
Everybody is talking about it,
everyone thinks everyone else is doing it,
so everyone claims they are doing it.”
Dan Ariely, professor and director of the Center for Advanced Hindsight at Duke University