The document discusses visualizing big data with tools like Hadoop, Hive, and Excel 2013. It provides an overview of big data technologies and data visualization with Office 365 and Power BI. It describes what Hive is and how it works, including how Hive solves the problem of analyzing large amounts of data by providing a SQL-like language (HiveQL) to query data stored in Hadoop and translating queries to MapReduce jobs. The document demonstrates visualizing big data with Microsoft tools like Power View and Power Map in Excel.
3. Explore Everything PASS Has to Offer
Free SQL Server and BI Web Events
Free 1-day Training Events
Regional Event
This is Community
Business Analytics Training
Local User Groups Around the World
Session Recordings
PASS Newsletter
Free Online Technical Training
4. About me
Director-At-Large (Elect), PASS Board, from Jan 2014
SQL Server MVP
Blogger, data strategist, public speaker, technologist
Joint owner of Copper Blue Consulting Ltd
5. Agenda
Overview of Big Data Technologies
Data Visualisation with Office365 and PowerBI
Hive
Visualising Big Data with Microsoft
8. What is Hadoop?
“Flexible and Available Architecture for Large Scale computation and data processing on a network of highly available commodity hardware.”
10. Data Visualisation Background
We have the tools. All we’ve got to do is imagine what could be. We can reinvent the present; we can transform the world around us.
Jason Silva
11. Almost 50% of your brain is dedicated to visual processing.
David van Essen
Researchers found that colour visuals increase the willingness to read by 80%.
About 70% of your sensory receptors are in your eyes.
12. Why is Data Visualisation Important?
It’s clearly a budget. It has a lot of numbers in it. (George W Bush)
I could never figure out where the decimal point went. (Lord Randolph Churchill)
14. The Unknown Unknowns
That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know. (Donald Rumsfeld)
16. What is the purpose of Hive?
Hive is a solution to a business problem:
How do you analyse large amounts of data?
Data Scientists want to study data
Communicate with the data
Businesses want to reap the benefits of data
Results that make sense of the data
18. What is the purpose of Hive?
Hive is a data warehousing system for Hadoop
To meet the needs of businesses, data scientists, analysts and BI professionals
Data, Summarized
Fit a structure onto data
Data, Analyzed
Analysis of Large Datasets stored in Hadoop File Systems
SQL-Like language called HiveQL
Custom mappers and reducers when HiveQL isn’t enough
19. Agenda
Hive solves the business problem of analysing large amounts of data
• What is the purpose of Hive?
• Why Hive?
• A history of Hive
• What are Hive’s constituents?
20. Why Hive?
Can’t Hadoop be used to solve these problems?
Why is there a need for Hive?
Writing MapReduce (MR) jobs in Java can be difficult
You don’t know it’s wrong until it’s fallen over!
Joining Large Datasets can be difficult
Learning Curve
24. What can Hive offer you?
Hive can help with a range of business problems:
• Log Processing
• Predictive Modelling
• Hypothesis testing
• Business Intelligence
25. Hive is not a replacement for SQL
So don’t throw out your SQL Server instances!
• Hive is for processing large data sets that may span hundreds, or even thousands, of machines
• Hive has a high overhead for starting a job: it translates queries to MapReduce, so it takes time
• Unlike SQL Server, Hive does not cache data
• Hive performance tuning is mainly Hadoop performance tuning
• The query languages are similar, but the architectures differ because they serve different purposes
26. Agenda
Hive solves the business problem of analysing large amounts of data
• What is the purpose of Hive?
• Why Hive?
• A history of Hive
• What are Hive’s constituents?
   Hive as a SQL-like Language Query Tool
   Hive as a Translation Tool
   Hive as a Structuring Tool
27. HiveQL
HiveQL is a SQL-like language
It outputs naturally occurring groups for further analysis
Easy Data Summarization
Large Datasets, summarized
Fit a structure onto data
Analysis of Large Datasets stored in Hadoop file systems
SQL-Like language called HiveQL
Custom mappers and reducers when HiveQL isn’t enough
28. HiveQL Queries like SQL Queries?
Similarities in Syntax and Features
Similar features
SELECT
FROM
WHERE
GROUP BY / HAVING
Table Aliases
Computed Columns
29. HiveQL Queries like SQL Queries?
Similarities in Syntax and Features
Similar features
Aggregate Functions
Nested Select
CASE
LIKE / RLIKE
JOIN
ORDER BY / SORT BY
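ORDER BY and SORT BY in the list above look alike but behave differently in Hive: ORDER BY produces one total ordering of the result (in classic MapReduce-based Hive this means a single reducer), while SORT BY only sorts the rows that reach each reducer, so the combined output is not globally sorted. The minimal Python sketch below simulates that difference; the round-robin routing of rows to reducers and the reducer count are illustrative assumptions, not how Hive actually distributes rows.

```python
from typing import List


def order_by(rows: List[int]) -> List[int]:
    """ORDER BY: one total ordering over all rows (conceptually one reducer)."""
    return sorted(rows)


def sort_by(rows: List[int], num_reducers: int) -> List[List[int]]:
    """SORT BY: rows are split across reducers and sorted only within each one.

    Assumption: rows are dealt to reducers round-robin; real Hive routing differs.
    """
    buckets: List[List[int]] = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        buckets[i % num_reducers].append(row)
    # Each reducer sorts its own share; nothing orders rows across reducers.
    return [sorted(bucket) for bucket in buckets]


rows = [42, 7, 19, 3, 88, 25]
print(order_by(rows))                 # [3, 7, 19, 25, 42, 88]
print(sort_by(rows, num_reducers=2))  # [[19, 42, 88], [3, 7, 25]]
```

Each reducer's output is sorted, but reading the two outputs one after the other gives 19, 42, 88, 3, 7, 25, which is why SORT BY is cheaper but cannot replace ORDER BY when a global order matters.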
30. How does Hive work?
Hive as a Translation Tool
Compiles and executes queries
Hive translates each HiveQL query into one or more MapReduce jobs
Where a query needs several jobs, they are chained together
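To make the translation step concrete, the Python sketch below simulates the map, shuffle and reduce phases that a simple GROUP BY query conceptually compiles to. The `staff` table, its columns and the sample rows are hypothetical, and this illustrates the MapReduce pattern rather than Hive's actual query planner.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input: rows of a "staff" table as (name, dept) tuples.
STAFF = [("Ann", "Sales"), ("Bob", "IT"), ("Cat", "Sales"), ("Dan", "HR"), ("Eve", "IT")]


def map_phase(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Emit (group key, 1) for every row, as a mapper would for COUNT(*)."""
    for _name, dept in rows:
        yield dept, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapper output by key, as the framework's shuffle/sort step does."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]], min_count: int) -> List[Tuple[str, int]]:
    """Sum the counts per key, then apply the HAVING filter."""
    totals = [(key, sum(values)) for key, values in grouped.items()]
    return sorted((key, count) for key, count in totals if count > min_count)


# SELECT dept, COUNT(*) FROM staff GROUP BY dept HAVING COUNT(*) > 1
print(reduce_phase(shuffle_phase(map_phase(STAFF)), min_count=1))  # [('IT', 2), ('Sales', 2)]
```

Real Hive plans can add optimisations such as partial aggregation in the mappers, and more complex queries become several jobs whose outputs feed one another.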
31. How does Hive work?
Hive as a Structuring Tool
Creates a schema around the data
Tables stored in Directories
Hive Tables
Rows and columns, like SQL tables
Hive Metastore
Namespace with a set of tables
Holds table definitions
Physical Layout
Column Types
Partition Information
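Hive applies the table definition when data is read rather than when it is loaded (often called schema on read): the files stay as they are in the table's directory, and the metastore supplies the column names, types and partition layout. The Python sketch below imitates that idea. A hypothetical two-column table definition stands in for a metastore entry, raw lines use Hive's default Ctrl-A (`\x01`) field delimiter for text tables, and the partition column is read from a `key=value` directory name. The table, columns and path are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical table definition, playing the role of a metastore entry.
SCHEMA: List[Tuple[str, Callable[[str], object]]] = [
    ("page", str),
    ("hits", int),
]
FIELD_DELIMITER = "\x01"  # Hive's default field delimiter for text tables (Ctrl-A)


def partition_from_path(path: str) -> Dict[str, str]:
    """Read partition columns from key=value directory names in a file path."""
    parts: Dict[str, str] = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            parts[key] = value
    return parts


def read_rows(path: str, lines: List[str]) -> List[Dict[str, object]]:
    """Apply the schema to raw lines at read time; the files themselves are untouched."""
    partition = partition_from_path(path)
    rows: List[Dict[str, object]] = []
    for line in lines:
        fields = line.rstrip("\n").split(FIELD_DELIMITER)
        row: Dict[str, object] = {name: cast(value) for (name, cast), value in zip(SCHEMA, fields)}
        row.update(partition)  # partition values come from the directory, not the file
        rows.append(row)
    return rows


raw = ["home\x0112\n", "about\x013\n"]
print(read_rows("/warehouse/weblogs/dt=2014-01-01/000000_0", raw))
# [{'page': 'home', 'hits': 12, 'dt': '2014-01-01'}, {'page': 'about', 'hits': 3, 'dt': '2014-01-01'}]
```

Hive itself is more forgiving than this sketch: a field that fails to parse as its declared type becomes NULL rather than raising an error.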
32. Hive and SQL Data Types
Hive | SQL
Tinyint | Tinyint
SmallInt | Smallint
Int | Int
BigInt | BigInt
Boolean | Bit (setting as NOT NULL)
Float | Real
Double | Float
Decimal (Java BigDecimal) | Decimal
33. Hive and SQL Data Types
Hive | SQL
String | Char, varchar, nvarchar, ntext, text, image
Binary | Binary
Timestamp | Datetime2 (note that SQL Server's timestamp type is a deprecated synonym for rowversion, not a date/time type)
37. Different Tools for Different Jobs
Highly Visual Design Experience
Power View: an interactive, ad hoc query and visualization experience for solving business-question ‘mysteries’
Power Map: a new 3D visualization add-in for Excel that helps you analyse geographical and temporal data
Mapping, Exploring, Interacting
44. JOIN US for our second annual event to get the best learning for
analyzing, managing, and sharing business information and
insights through the Microsoft Data Platform technologies.