Intro to data visualization

Data Visualization - An introduction
Prof Jan Aerts
Biodata Visualization and Analysis
ESAT/SCD
University of Leuven
Belgium

twitter: @jandot
Google+: +Jan Aerts
jan.aerts@esat.kuleuven.be
http://biovizanlab.wordpress.com
http://saaientist.blogspot.com

1. What is data visualization?

“A good sketch is better than a long speech” (Napoleon)

shows: size of the army, geographical coordinates, direction that the army
was traveling, location of the army with respect to certain dates, temperature
along the path of the retreat

Shape of Songs: “Like a Prayer” (Madonna)
Martin Wattenberg

http://multimedia.mcb.harvard.edu/anim_innerlife.html

What I use as a definition:

“computer-based visualization systems providing visual representations of
datasets intended to help people carry out some task more effectively.” (T
Munzner)

cognition <=> perception
cognitive task => perceptual task

“eyes beat memory”

Why do we visualize data?
• record information

• blueprints, photographs,
seismographs, ...

• analyze data to support reasoning

• develop & assess hypotheses

• discover errors in data

• expand memory

• find patterns (see Snow’s cholera map)

• communicate information

• share & persuade

• collaborate & revise

exploration <-> explanation

pictorial superiority effect

after 72hr, “information” is remembered as “informa” (65%) or “i” (1%)

2. Exploration <-> explanation


visual analytics <-> infographics

hypothesis generation


“visual analytics”

=> identify unexpected patterns


J van Wijk

Anscombe’s quartet

• μX = 9.0
• μY = 7.5
• σX = 3.317
• σY = 2.03
• Y = 3 + 0.5X
• R² = 0.67
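
The summary statistics listed above can be checked with a few lines of code. The following is a minimal sketch, assuming the standard published values of Anscombe's quartet (the data are not on the slide itself) and using hypothetical names such as `summarize` and `QUARTET`. It computes mean, sample standard deviation, least-squares line and R² for each of the four datasets.

```python
from math import sqrt

# Anscombe's quartet: standard published values (not shown on the slide itself).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return mean, sample standard deviation, least-squares fit and R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "sd_x": sqrt(sxx / (n - 1)),
        "sd_y": sqrt(syy / (n - 1)),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "r2": sxy ** 2 / (sxx * syy),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four printed rows are nearly identical, even though the four scatter plots look completely different. That is the argument for plotting the data before trusting the numbers.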

A concrete example: hive plots

same network

Martin Krzywinski

different networks!

Martin Krzywinski

3D, anyone?

occlusion
interaction complexity
perspective distortion
text legibility

Functions in the Linux operating system:
“function A calls function B”

Gene interaction data:
“gene A regulates gene B”

regulator

workhorse
manager

3. Why specifically learn about dataviz?

Isn’t it all just about using common sense?

• huge space of design alternatives => many tradeoffs

• many possibilities known to be ineffective

• avoid random walk through parameter space

• avoid some of our past mistakes

• extensive experimentation has already been done

• guidelines continue to evolve

• we reflect on lessons learned in design studies

• iterative refinement usually wise

4. Stages of data visualization

How do we get from data to visualization? We need to understand:

• properties of the data

• properties of the image

• the rules mapping data to image

4.1. Properties of the data

S Stevens “On the theory of scales of measurement” (1946)

4.2. Properties of the image - perception

Semiology of graphics

• Jacques Bertin, Gauthier-Villars 1967, EHESS 1998

• semiology = study of signs and sign processes, likeness, analogy, metaphor,
symbolism, signification, and communication (Wikipedia)

• visual encoding:

• what - points, lines, areas (, patterns, trees/networks, grids)

• where - positional: XY (1D, 2D, 3D)

• how - retinal: Z (size, lightness, texture, colour, orientation, shape)

• when - temporal: animation

“marks” - geometric primitives

“channels” - control appearance of marks

Gestalt laws - interplay between parts and the
whole (Kurt Koffka)

series of principles

Election results Florida:

• black = Bush
• white = Gore

Gestalt - Principle of Simplicity

Every pattern we see is seen such that we see a structure that is as simple as
possible.

Gestalt - Principle of Proximity

Things that are close to each other are seen as belonging together (=>
clusters)

Gestalt - Principle of Similarity

Things that are similar in some way are perceived as belonging together.

Gestalt - Principle of Closure

You will try to complete a pattern.

Gestalt - Principle of Connectedness

Things that are connected are perceived as belonging together. This encoding
is stronger than similarity, shape, colour, and size.

Gestalt - Principle of Good Continuation

Objects that are arranged in a straight or smooth line tend to be seen as a
unit.

Gestalt - Principle of Common Fate

Objects that move in the same direction tend to be seen as a unit.

Gestalt - Principle of Familiarity

Gestalt - Principle of Symmetry

Symmetrical areas tend to be seen as figures against asymmetrical
backgrounds.

Context affects perceptual tasks

Pre-attentive vision

= ability of the low-level human visual system to rapidly identify certain basic
visual properties

• some features “pop out”

• used for:

• target detection

• boundary detection

• counting/estimation

• ...

• visual system takes over => all cognitive power available for interpreting the
figure, rather than needing part of it for processing the figure

Really fast; see http://www.csc.ncsu.edu/faculty/healey/PP/

Limitations of preattentive vision

1. Combining pre-attentive features does not always work => you would need to
resort to “serial search” (most channel pairs; all channel triplets),
e.g. “is there a red square in this picture?”

2. Speed depends on the channel (use one that works well for categorical
data; see “accuracy” further on)

4.3. Mapping data to image: visual encoding

Language of graphics

• graphics = sign system:

• each mark (point, line, area) represents a data element

• choose visual variables to encode relationships between data elements

• difference, similarity, order, proportion

• only position supports all relationships (see later)

• huge range of alternatives for data with many attributes

• find images that express & effectively convey the information

Which encoding should I use?

• From a huge list of possibilities, you have to choose the best one.

• Principle of Consistency

• properties of the representation should match properties of the data (e.g.
pie chart: area vs radius)

• Principle of Importance Ordering

• encode the most important piece of information in the most “effective”
way (i.e. spatial position)
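
The “area vs radius” remark under the Principle of Consistency can be made concrete. The following is a minimal sketch with hypothetical function names: if a value is to be read from the area of a circle, the radius has to grow with the square root of the value. Scaling the radius linearly makes the area grow with the square of the value.

```python
from math import pi, sqrt


def radius_for_area_encoding(value, scale=1.0):
    """Radius such that the circle's AREA equals scale * value."""
    return sqrt(scale * value / pi)


def radius_for_naive_encoding(value, scale=1.0):
    """Radius proportional to the value itself: area grows with value squared."""
    return scale * value


def circle_area(radius):
    return pi * radius ** 2


# Compare how a 4x larger value is drawn under both encodings.
small, large = 1.0, 4.0
consistent_ratio = (circle_area(radius_for_area_encoding(large))
                    / circle_area(radius_for_area_encoding(small)))
naive_ratio = (circle_area(radius_for_naive_encoding(large))
               / circle_area(radius_for_naive_encoding(small)))

print("area ratio, area-consistent encoding:", round(consistent_ratio, 2))
print("area ratio, radius-proportional encoding:", round(naive_ratio, 2))
```

With the area-consistent encoding, a value that is four times larger gets four times the area. With the radius-proportional encoding it gets sixteen times the area, so the picture no longer matches the data.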

Stevens’ psychophysical law

= proposed relationship between the magnitude of a physical stimulus and its
perceived intensity or strength
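
This relationship is usually written as a power law: perceived magnitude = k × (intensity ^ a), with an exponent a that depends on the kind of stimulus. The following is a minimal sketch. The exponents are rough, illustrative values (an assumption, since published estimates vary between studies), and the names `perceived_magnitude` and `EXPONENTS` are hypothetical.

```python
def perceived_magnitude(intensity, exponent, k=1.0):
    """Stevens' power law: perceived = k * intensity ** exponent."""
    return k * intensity ** exponent


# Illustrative exponents only (assumption): real estimates vary by experiment.
EXPONENTS = {
    "length": 1.0,  # perceived roughly linearly
    "area": 0.7,    # typically underestimated
}

# How much larger does a stimulus LOOK when it is physically doubled?
DOUBLING = {
    channel: perceived_magnitude(2.0, a) / perceived_magnitude(1.0, a)
    for channel, a in EXPONENTS.items()
}

for channel, ratio in DOUBLING.items():
    print(f"doubling {channel}: perceived as {ratio:.2f}x")
```

Under these assumed exponents, a doubled length looks twice as long, while a doubled area looks clearly less than twice as large. This is one reason to prefer position and length over area for quantitative data.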

Accuracy of quantitative perceptual tasks
how much (quantitative) vs. what/where (qualitative)

Mackinlay


Mackinlay
“power of the plane”


grouping: see Gestalt laws

Mackinlay

COLOUR ... is tricky, and often used wrongly

Colour space

• = mathematical model to talk about colour

• RGB (red-green-blue)

• most common, but less useful

• HSV (hue-saturation-value)

• more useful
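
One way to see why HSV is more convenient than RGB: in HSV a set of categorical colours is simply a walk around the hue axis with saturation and value held fixed. The following is a minimal sketch using Python's standard `colorsys` module. The helper names `to_hex` and `categorical_palette` and the default saturation and value are assumptions for illustration. Evenly spaced hues are not perceptually uniform, so this is not a replacement for a designed palette.

```python
import colorsys


def to_hex(rgb):
    """Convert an (r, g, b) tuple with components in 0..1 to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in rgb))


def categorical_palette(n, saturation=0.8, value=0.9):
    """n colours with evenly spaced hues and fixed saturation and value."""
    return [
        to_hex(colorsys.hsv_to_rgb(i / n, saturation, value))
        for i in range(n)
    ]


# Pure red expressed in both colour spaces.
RED_HSV = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)  # hue 0, full saturation, full value

PALETTE = categorical_palette(5)
print("red in HSV:", RED_HSV)
print("5-class palette:", PALETTE)
```

Changing only the hue argument gives a new category colour. Doing the same in RGB means adjusting three numbers at once.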

colorbrewer2.org

in R: please use RColorBrewer!

Context affects colour perception

Dangers of Depth (3D)

• We do NOT see in 3D; we see in 2.05D.

• occlusion

• interaction complexity

• perspective distortion

Lie factor

“lie factor” = (size of effect shown in graphic) / (size of effect in data)
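
The lie factor is easy to compute once “size of effect” is pinned down. The following is a minimal sketch that assumes the size of an effect is measured as a relative change between two values, which the slide does not spell out. The function names and the numbers in the example are hypothetical.

```python
def relative_change(first, second):
    """Size of effect, taken here as relative change (an assumption)."""
    return (second - first) / first


def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Ratio of the effect shown in the graphic to the effect in the data."""
    shown = relative_change(drawn_first, drawn_second)
    actual = relative_change(data_first, data_second)
    return shown / actual


# Hypothetical chart: the value rises from 10 to 12 (+20%),
# but the bar is drawn 100 px and then 200 px tall (+100%).
EXAMPLE = lie_factor(10, 12, 100, 200)
print("lie factor:", round(EXAMPLE, 2))
```

A lie factor of 1 means the graphic shows the effect at its true size. In this made-up example the graphic exaggerates the change by about a factor of five.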

3D scatter plots are better as series of 2D projections
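
Turning a 3D scatter plot into a series of 2D projections just means plotting every pair of axes separately. The following is a minimal sketch with a hypothetical helper, `axis_pair_projections`. It only prepares the point lists, and drawing them is left to whatever plotting tool is in use.

```python
from itertools import combinations


def axis_pair_projections(points, axis_names=("x", "y", "z")):
    """Map each pair of axes to the list of 2D points projected onto them."""
    projections = {}
    for i, j in combinations(range(len(axis_names)), 2):
        key = (axis_names[i], axis_names[j])
        projections[key] = [(p[i], p[j]) for p in points]
    return projections


# Hypothetical 3D data points.
POINTS = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
PROJECTIONS = axis_pair_projections(POINTS)

for axes, projected in PROJECTIONS.items():
    print(axes, projected)
```

Each of the three resulting panels (x-y, x-z, y-z) can be read without occlusion or perspective distortion.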

Dynamic data

• animation is good sometimes, but often not:

• we can only follow 3-4 visual cues simultaneously

• change in “mental map”

• change blindness (e.g. http://nivea.psycho.univ-paris5.fr/CBMovies/BarnTrackFlickerMovie.gif)

Overview, zoom and filter, details on demand
(Shneiderman’s Information Seeking Mantra)

Operations on the data

• sorting

• filtering

• browsing/exploring

• comparison

• characterizing trends & distributions

• finding anomalies & outliers

• ...

Techniques to support these operations

• re-orderable matrices

• brushing

• linked views

• overview & detail

• focus & context

• ...

Evaluate the right thing

Munzner, 2009

Slide/picture acknowledgments

• Jeffrey Heer

• Tamara Munzner

• Jessie Kennedy

• Nils Gehlenborg

• Miriah Meyer

“I think this presentation went quite well...”
