A talk at Data Visualization Summit 2014 in Santa Clara, CA
ABSTRACT: What is the thought process that transforms data into visualizations? In this presentation, I will talk about guidelines that will help you when starting with raw data, walk through standard techniques, and also discuss things to keep in mind when making design decisions.
9. Anscombe’s Quartet
Property                       Value
Mean of X                      9.0
Variance of X                  10.0
Mean of Y                      7.5
Variance of Y                  3.75
Correlation between X and Y    0.816
Linear regression              y = 3.0 + 0.5x
[Figure: scatter plots of datasets #1, #2, #3, #4]
Identical statistics!
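The shared summary statistics can be checked with a short script. This is a minimal sketch, not part of the talk: it uses the four datasets as published by Anscombe (1973), and the helper name `summarize` is made up for illustration. Each x column sums to 99 over 11 points, so the mean of X is 9.0. The variances of 10.0 and 3.75 correspond to the population formula (divide by n); the sample formula would give 11.0 and about 4.13.

```python
from math import sqrt
from statistics import mean, pvariance

# Anscombe's quartet as published (datasets #1 to #3 share the same x values).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "#1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "#2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "#3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "#4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics for one dataset (hypothetical helper)."""
    mx, my = mean(xs), mean(ys)
    # Population covariance, to match the population variances below.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    var_x, var_y = pvariance(xs), pvariance(ys)
    slope = cov / var_x  # least-squares slope of y on x
    return {
        "mean_x": mx,
        "var_x": var_x,
        "mean_y": my,
        "var_y": var_y,
        "corr": cov / sqrt(var_x * var_y),
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The numbers agree to the precision shown in the table, yet plotting the four datasets shows very different shapes, which is the point of the slide.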
32. The Geography of Tweets
@miguelrios
DATA
abstract dimension    real-world dimensions
tweet counts          latitude     longitude
20,000                27.174526    78.042153
9,000                 49.124093    52.201304
1,000                 12.2995      31.59592
...                   ...          ...
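One way to read this slide: the real-world dimensions (latitude, longitude) can map directly to position, while the abstract dimension (tweet counts) needs a chosen visual encoding. The sketch below is an illustration under stated assumptions, not the method used for The Geography of Tweets: the canvas size, the maximum radius, the equirectangular projection, and the square-root size scale are all hypothetical choices.

```python
import math

# Rows copied from the slide's table: (tweet count, latitude, longitude).
rows = [
    (20000, 27.174526, 78.042153),
    (9000, 49.124093, 52.201304),
    (1000, 12.2995, 31.59592),
]

WIDTH, HEIGHT = 720, 360  # hypothetical canvas size in pixels
MAX_RADIUS = 20.0         # hypothetical radius for the largest count


def project(lat, lon):
    """Equirectangular projection: longitude -> x, latitude -> y (north at top)."""
    x = (lon + 180.0) / 360.0 * WIDTH
    y = (90.0 - lat) / 180.0 * HEIGHT
    return x, y


def radius(count, max_count):
    """Square-root scale, so circle area (not radius) is proportional to count."""
    return MAX_RADIUS * math.sqrt(count / max_count)


max_count = max(count for count, _, _ in rows)
# Each mark is (x, y, r): position from the real-world dimensions,
# size from the abstract dimension.
marks = [(*project(lat, lon), radius(count, max_count)) for count, lat, lon in rows]
for mark in marks:
    print(mark)
```

Size is only one option for the abstract dimension; color or opacity would be other encodings of the same column.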
49. VISUALIZATION
visual encodings + interactions
• visual encodings (depend on data type): bar chart, line chart, matrix, node-link, treemaps, etc., or multiple views
• interactions: tooltips, animation, highlight, filter, etc.
50. DATA
1) What type of data?
[Diagram: candidate techniques vis1 through vis7]
Many options...
Which visualization technique should I use?
51. DATA
1) What type of data?
[Diagram: candidates narrowed to vis3, vis4, vis7]
Fewer options...
Still, which one should I use?
52. How to start?
• What tool should I use?
1. What type of data do I have?
2. What do I want from the data?
DATA
53. 2) What do I want from the data?
• There are many ways to visualize one type of data.
• Things to consider:
  • audience (data scientists, executives, etc.)
  • goal (storytelling, exploratory analysis)
  • tasks
58. State of the Union
http://twitter.github.io/interactive/sotu2014/#p1
59. OK, now tools.
1. What type of data do I have?
2. What do I want from the data?
60. Tools
Option 1: Programming library. You have to write code.
Option 2: Packaged software. (Mostly) no coding involved.
61. Programming libraries
• d3.js, Processing, R, etc.
• Copy and modify from examples.
• Can do custom things (if you can figure out how)
• More overhead for common tasks
62. Packaged software
• Tableau (multi-dimensional)
• Gephi (graph)
• NodeXL (graph)
• Research projects (contact authors)
• Just use the software. No hassle of coding or debugging
• Functionality is limited to what the tool can do
• Custom designs are more difficult
63. Ideal workflow
1. What type of data do I have?
2. What do I want from the data?
3. Pick appropriate techniques/tools
4. Done!
64. Ideal workflow
Not that easy!
65. Real-life workflow
1. What type of data do I have?
   If the data are dirty: pre-process data
2. What do I want from the data?
3. Pick appropriate techniques/tools
4. See results
   If unsatisfied: transform the data, change goal, or change perspective, then go back to an earlier step
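As an illustration of the "data are dirty" and "Pre-process data" steps, here is a minimal sketch that cleans rows shaped like the tweet table from The Geography of Tweets slide. It is an assumption-laden example, not code from the talk: the function name `parse_row`, the whitespace-separated input format, and the two malformed rows are hypothetical.

```python
def parse_row(raw):
    """Parse 'count lat lon' text into (int, float, float), or None if unusable."""
    try:
        count_text, lat_text, lon_text = raw.split()
        count = int(count_text.replace(",", ""))  # drop thousands separators
        lat = float(lat_text)
        lon = float(lon_text)
    except ValueError:
        # Wrong number of fields, or a field that is not a number.
        return None
    # Reject values outside the valid ranges for counts and coordinates.
    if count < 0 or not (-90 <= lat <= 90) or not (-180 <= lon <= 180):
        return None
    return count, lat, lon


raw_rows = [
    "20,000 27.174526 78.042153",  # rows from the slide
    "9,000 49.124093 52.201304",
    "1,000 12.2995 31.59592",
    "n/a 10.0 20.0",               # hypothetical dirty row: count is not a number
    "500 95.0 20.0",               # hypothetical dirty row: latitude out of range
]

clean = [row for row in map(parse_row, raw_rows) if row is not None]
dropped = len(raw_rows) - len(clean)
print(clean)
print("dropped rows:", dropped)
```

Counting the dropped rows is a deliberate choice: if many rows fail to parse, that is a signal to revisit the first question (what type of data do I have?) before choosing a technique.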