This document summarizes a presentation about how HDInsight, SQL Server 2014, and Excel can work together with Big Data. It provides an overview of how these Microsoft technologies integrate with Hadoop on HDInsight to extract, analyze, and visualize large, diverse datasets. Examples are given of real-time dashboards and mashups created with these tools to analyze Twitter data stored on HDInsight clusters on Azure.
Big Data with Microsoft?
1. Big Data with Microsoft?
How HDInsight, SQL Server 2014
and Excel work together
Olivia Klose, Technical Evangelist
Georg Urban, Sr. Technology Solution Professional
Microsoft Deutschland GmbH
3. The Large Hadron Collider produces 15 PB/year*
* http://public.web.cern.ch/public/en/lhc/Computing-en.html
4. But what if I don't
own a large hadron
collider …
5. Large scale plants
Vehicle fleets
Smart Grids
Green Energy
Stock Exchanges
Host Protocols
Computer Centers
Web Farms
Twitter
Facebook
Google Analytics
…
9.
small data subsets are stored
most data stays in the file system (original XML files)
only about 3 years of history are stored at the moment
heavily denormalized data
(e.g. Entity-Attribute-Value tables)
TCO & performance limits
(queries are slow, pivoting is expensive)
cover the whole life cycle of 15 years
(incl. production data)
more data sources: social media (motortalk)
lower TCO for storage & flexible analysis
…impossible with "classical" RDBMS
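
The remark on this slide that pivoting Entity-Attribute-Value data is expensive can be illustrated with a minimal Python sketch. The rows, attribute names, and the `pivot_eav` helper are hypothetical and not taken from the presentation. The point is only that every attribute lives in its own row, so rebuilding one wide record means touching and regrouping many rows.

```python
from collections import defaultdict

# Hypothetical EAV rows: (entity_id, attribute, value).
# Names and values are illustrative only, not from the presentation.
eav_rows = [
    ("vehicle-1", "mileage_km", "42000"),
    ("vehicle-1", "engine", "diesel"),
    ("vehicle-2", "mileage_km", "18000"),
    ("vehicle-2", "color", "red"),
]


def pivot_eav(rows):
    """Turn (entity, attribute, value) triples into one wide record per entity."""
    wide = defaultdict(dict)
    for entity, attribute, value in rows:
        wide[entity][attribute] = value
    # Every entity gets every attribute column; missing ones become None.
    columns = sorted({attribute for _, attribute, _ in rows})
    return {
        entity: {col: attrs.get(col) for col in columns}
        for entity, attrs in wide.items()
    }


records = pivot_eav(eav_rows)
for entity, record in records.items():
    print(entity, record)
```

The whole input has to be scanned and regrouped before a single complete record is available, which hints at why such queries become slow as the number of rows and attributes grows.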
10. "Big data" is high-volume, high-velocity
and high-variety information assets that
demand cost-effective, innovative forms of
information processing for enhanced
insight and decision making.
Source:
The Importance of 'Big Data': A Definition, Mark Beyer, Douglas Laney, G00235055
12. ...
Modular Hardware Architecture
...
ColumnStore v2 storage
Hadoop Regions
Tight integration of
“nonstructured” data
FDR Infiniband
Ultra high compression
Direct attached SAS
Scale Unit
36. Polybase
Regular T-SQL
Results
T-SQL query engine for RDBMS & Hadoop
Cost-based optimizer decides on:
Rendering operators in Map/Reduce-Jobs or
Moving HDFS data into RDBMS storage
PDW
HDFS-Bridge for parallelized Data Transport
HDFS Data Nodes
&
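
The optimizer decision named on this slide (run operators as Map/Reduce jobs, or move the HDFS data into RDBMS storage) can be sketched as a toy cost comparison. This is not how Polybase is implemented: the `HdfsScan` type, the cost constants, and the function names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class HdfsScan:
    """Hypothetical description of one query's access to an HDFS-resident table."""
    input_gb: float      # size of the data stored in HDFS
    selectivity: float   # fraction of the data that survives the filter (0..1)


# Illustrative cost constants (assumptions, not Polybase internals).
TRANSFER_COST_PER_GB = 1.0      # moving data from HDFS into the RDBMS
MAPREDUCE_STARTUP_COST = 50.0   # fixed overhead of launching a Map/Reduce job
MAPREDUCE_COST_PER_GB = 0.2     # filtering data in place on the Hadoop cluster


def cost_import(scan):
    """Cost of moving all HDFS data into RDBMS storage and filtering there."""
    return scan.input_gb * TRANSFER_COST_PER_GB


def cost_pushdown(scan):
    """Cost of filtering in Map/Reduce and transferring only the result."""
    filtered_gb = scan.input_gb * scan.selectivity
    return (MAPREDUCE_STARTUP_COST
            + scan.input_gb * MAPREDUCE_COST_PER_GB
            + filtered_gb * TRANSFER_COST_PER_GB)


def choose_plan(scan):
    """Return the cheaper of the two strategies under this toy model."""
    return "pushdown" if cost_pushdown(scan) < cost_import(scan) else "import"


small_scan = HdfsScan(input_gb=10, selectivity=0.5)
large_scan = HdfsScan(input_gb=1000, selectivity=0.01)
print(choose_plan(small_scan), choose_plan(large_scan))
```

Under these assumed constants, a small table is cheaper to import as a whole because the fixed job startup cost dominates, while a large table with a selective filter is cheaper to reduce on the Hadoop side first.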
40. What's next…
Twitter Big Data Sourcecode: http://twitterbigdata.codeplex.com/
Twitter Big Data Setup: http://aka.ms/bigdatatwitter
Azure Trial: http://aka.ms/azurenow
HDInsight: www.windowsazure.com/en-us/documentation/services/hdinsight/
Hortonworks for Windows: http://hortonworks.com/products/hdp-windows/
PDW and Polybase: http://microsoft.com/pdw
Microsoft Big Data: http://microsoft.com/bigdata
Deutsche SQL Server Konferenz 2014: http://www.sqlkonferenz.de
41. “Big data is like teen sex.
Everybody is talking about it,
everyone thinks everyone else is doing it,
so everyone claims they are doing it.”
Dan Ariely, professor and director of the Center for Advanced Hindsight at Duke University