2. What is Big Data?
Big Data is a collection of data sets so large and complex
that it becomes difficult to process using traditional
database systems.
Big Data Challenges (3Vs)
● Volume: Amount of Data
● Velocity: Speed of Data In & Out
● Variety: Range of Data Types & Sources
3. Microsoft Solution to Big Data
● Microsoft HDInsight
● Microsoft .NET SDK for Hadoop
● Microsoft ODBC Driver for Hive
● Microsoft Excel (Power View & PowerPivot)
● Microsoft SharePoint (Power View)
4. Microsoft HDInsight
● 100% Apache Hadoop compatible Big Data
implementation
● Microsoft support of HDInsight on Windows Server and
Windows Azure
● Simplified deployment and ease of manageability with
System Center 2012 or Windows Azure
● Elegant connectivity to Microsoft Office Excel 2013 and
Business Intelligence tools
5. What is Hadoop?
Apache Hadoop is an open-source software
framework that allows for the distributed processing of
large data sets across clusters of computers using
simple programming models. It is designed to scale up from
single servers to thousands of machines, each offering
local computation and storage.
6. What is Hadoop? (Cont.)
Hadoop includes two major modules:
1. Hadoop Distributed File System (HDFS)
A distributed file system that provides high-throughput
access to application data
2. Hadoop MapReduce
A programming model for parallel processing of large
data sets
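To make the MapReduce programming model concrete, here is a minimal single-process sketch of the classic word count. It only illustrates the map, shuffle, and reduce steps; it is not Hadoop code, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example. On a real cluster, the map and reduce steps would run in parallel on many machines over data stored in HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling word_count(["big data", "Big Hadoop"]) counts "big" twice and "data" and "hadoop" once each. Because each line is mapped independently and each key is reduced independently, both steps could be spread across machines.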
13. Microsoft .NET SDK for Hadoop
● HDInsight Cluster Management
● Hadoop Job Submission
● Customize Map/Reduce Job
● LINQ to Hive
14. Microsoft ODBC Driver for Hive
● Connect the following tools to Hadoop for
data insight
○ Microsoft Excel (Power View & PowerPivot)
○ Microsoft SharePoint (Power View)
○ Microsoft SQL Server
■ Database Engine
■ Analysis Services
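Any ODBC-capable client can use the same driver, not only the tools listed above. The following is a hedged sketch in Python using the third-party pyodbc package. It assumes a DSN for the Hive ODBC driver has already been configured on the machine; the DSN name and the table in the usage note are placeholders, not values from this deck.

```python
def build_connection_string(dsn: str) -> str:
    """Return an ODBC connection string that refers to a preconfigured DSN."""
    return f"DSN={dsn}"


def fetch_rows(dsn: str, query: str) -> list:
    """Run a HiveQL query through ODBC and return all result rows."""
    # Third-party package; imported here so the helper above works without it.
    import pyodbc

    # autocommit=True is commonly needed because Hive has no ODBC transactions.
    connection = pyodbc.connect(build_connection_string(dsn), autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        connection.close()
```

A call might look like fetch_rows("MyHiveDSN", "SELECT * FROM my_table LIMIT 10"), where MyHiveDSN and my_table are hypothetical names.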
15. Learning Hadoop
● Get Started with Hadoop@Hortonworks
http://hortonworks.com/get-started/
● Big Data University
http://bigdatauniversity.com/
● Getting Started with Microsoft Big Data
http://www.microsoftvirtualacademy.com/training-courses/getting-startedwith-microsoft-big-data