1. Introduction to ggplot2
Elegant Graphics for Data Analysis
Maik Röder
15.12.2011
RUGBCN and Barcelona Code Meetup
Friday, 16 December 2011
2. Data Analysis Steps
• Prepare data
  • e.g. using the reshape framework for restructuring data
• Plot data
  • e.g. using ggplot2 instead of base graphics and lattice
• Summarize the data and refine the plots
  • Iterative process
3. ggplot2
grammar of graphics
4. Grammar
• Oxford English Dictionary:
  • The fundamental principles or rules of an art or science
  • A book presenting these in methodical form. (Now rare; formerly common in the titles of books.)
• System of rules underlying a given language
• An abstraction which facilitates thinking, reasoning and communicating
5. The grammar of graphics
• Move beyond named graphics (e.g. “scatterplot”)
  • Gain insight into the deep structure that underlies statistical graphics
• Powerful and flexible system for
  • constructing abstract graphs (sets of points) mathematically
  • realizing physical representations as graphics by mapping aesthetic attributes (size, colour) to graphs
• Lacking an openly available implementation
6. Specification
Concise description of the components of a graphic:
• DATA: data operations that create variables from datasets (reshaping using an algebra of operations)
• TRANS: variable transformations
• SCALE: scale transformations
• ELEMENT: graphs and their aesthetic attributes
• COORD: a coordinate system
• GUIDE: one or more guides
7. Birth/Death Rate
Source: http://www.scalloway.org.uk/popu6.htm
8. Excess birth (vs. death) rates in selected countries
Source: The Grammar of Graphics, p. 13
9. Grammar of Graphics
The specification can be run in GPL, as implemented in SPSS:
DATA: source("demographics")
DATA: longitude, latitude = map(source("World"))
TRANS: bd = max(birth - death, 0)
COORD: project.mercator()
ELEMENT: point(position(lon * lat), size(bd), color(color.red))
ELEMENT: polygon(position(longitude * latitude))
Source: The Grammar of Graphics, p. 13
10. Rearrangement of Components
Grammar of Graphics: Data, Trans, Element, Scale, Guide, Coord
Layered Grammar of Graphics:
• Defaults: Data, Mapping
• Layer: Data, Mapping, Geom, Stat, Position
• Scale
• Coord
• Facet
11. Layered Grammar of Graphics
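Implementation embedded in R using ggplot2:
# world: country outlines with lon/lat columns (assumed to exist)
# demographics: per-country lon, lat, birth and death rates (assumed to exist)
w <- world
d <- demographics
# TRANS: excess of births over deaths, floored at zero
d <- transform(d,
               bd = pmax(birth - death, 0))
p <- ggplot(d, aes(lon, lat))
p <- p + geom_polygon(data = w)
p <- p + geom_point(aes(size = bd),
                    colour = "red")
p <- p + coord_map(projection = "mercator")
p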
12. ggplot2
• Author: Hadley Wickham
• Open-source implementation of the layered grammar of graphics
• High-level R package for creating publication-quality statistical graphics
• Carefully chosen defaults that follow basic graphical design rules
• Flexible set of components for creating any type of graphic
13. ggplot2 installation
• In the R console:
install.packages("ggplot2")
library(ggplot2)
14. qplot
• Quickly plot something with qplot
  • for exploring ideas interactively
• Accepts many of the same options as base plot(), translated to ggplot2
qplot(carat, price,
      data = diamonds,
      main = "Diamonds",
      asp = 1)
16. Exploring with qplot
First try:
qplot(carat, price, data = diamonds)
Log-transform by applying functions to the variables directly:
qplot(log(carat), log(price), data = diamonds)
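The specification slide separates variable transformations (TRANS) from scale transformations (SCALE). A minimal sketch of the difference, assuming a recent ggplot2 release: applying log() inside aes() changes the data itself, so the axes show log values, while scale_x_log10() and scale_y_log10() transform at the scale level and keep the axis breaks labelled in the original units.

```r
library(ggplot2)

# Variable transformation (TRANS): log() is applied to the data itself,
# so the axes show natural-log values
p_trans <- ggplot(diamonds, aes(log(carat), log(price))) +
  geom_point()

# Scale transformation (SCALE): the scales log10-transform the data,
# but breaks and labels are shown in the original units
p_scale <- ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()

p_trans
p_scale
```

Both plots show the same shape; only where the transformation happens, and therefore how the axes are labelled, differs.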
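18. From qplot to ggplot
qplot(carat, price,
      data = diamonds,
      main = "Diamonds",
      asp = 1)
# The equivalent ggplot() call, built up one component at a time
p <- ggplot(diamonds, aes(carat, price))
p <- p + geom_point()
# Title and aspect ratio (older ggplot2 releases set both with opts())
p <- p + ggtitle("Diamonds") + theme(aspect.ratio = 1)
p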
19. Data and mapping
• If you need to flexibly restructure and aggregate data beforehand, use the reshape package
  • Data is treated as an independent concern
• A mapping specifies which variables are mapped to which aesthetics
  • weight => x, height => y, age => size
• Scales define how data values are translated into aesthetic values
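The mapping weight => x, height => y, age => size can be written with aes(). A minimal sketch using a small hypothetical data frame `people`; the values are invented purely for illustration.

```r
library(ggplot2)

# Hypothetical data, made up for this example
people <- data.frame(
  weight = c(60, 72, 85, 90),
  height = c(165, 178, 183, 175),
  age    = c(25, 40, 33, 58)
)

# aes() declares which variable goes to which aesthetic;
# the default scales then turn weight/height into positions and age into point sizes
p <- ggplot(people, aes(x = weight, y = height, size = age)) +
  geom_point()
p
```

The data frame itself carries no plotting information; swapping `size = age` for `colour = age` changes only the mapping.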
20. Statistical Transformations
• A stat transforms the data
• It can add new variables to a dataset, which can then be used in aesthetic mappings
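As a sketch of a stat adding new variables: stat_bin, the default stat of geom_histogram, computes variables such as `count` and `density` for each bin. Assuming ggplot2 3.3 or later, after_stat() refers to these computed variables inside a mapping (older releases used the `..density..` notation). The binwidth of 0.1 is an arbitrary choice.

```r
library(ggplot2)

# stat_bin adds computed columns such as count and density;
# after_stat(density) maps the computed density to y instead of the count
p <- ggplot(diamonds, aes(carat)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1)
p

# Inspect the variables produced by the stat for the first layer
head(layer_data(p)[, c("x", "count", "density")])
```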
21. stat_smooth
• Fits a smoother to the data
• Displays a smooth and its standard error
ggplot(diamonds, aes(carat, price)) +
  geom_point() + geom_smooth()
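By default geom_smooth() picks the smoothing method automatically; for a dataset as large as diamonds it fits a generalized additive model. A sketch of setting the method and the standard-error band explicitly; the linear model without a band is chosen only for illustration.

```r
library(ggplot2)

# method = "lm" fits a straight line; se = FALSE hides the
# standard-error band that geom_smooth() draws by default
p <- ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
p
```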
32. Coordinate System
• Maps the position of objects onto the plane
• Affects all position variables simultaneously
• Changes the appearance of geoms (unlike scales)
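To illustrate that a coordinate system changes the appearance of a geom, a sketch that draws one stacked bar first in Cartesian and then in polar coordinates, where it becomes a pie chart. The mpg dataset ships with ggplot2; using factor(1) as a single dummy x position is just a common idiom.

```r
library(ggplot2)

# One stacked bar: the single x position is a dummy value
bar <- ggplot(mpg, aes(x = factor(1), fill = class)) +
  geom_bar(width = 1)
bar

# Same data and layers, polar coordinates: the y position becomes an angle,
# so the stacked bar is drawn as a pie chart
pie <- bar + coord_polar(theta = "y")
pie
```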
33. coord_map
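library("maps")
# Extract the x/y outline of New Zealand as a data frame
nz <- data.frame(map("nz", plot = FALSE)[c("x", "y")])
n <- qplot(x, y, data = nz, geom = "path")
n
# A single point at (0, 0); the columns are named x and y to match the mapping
d <- data.frame(x = 0, y = 0)
n + geom_point(data = d, colour = "red")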
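The NZ code on this slide draws the outline in plain Cartesian coordinates. A sketch of the same outline with coord_map() added, which projects longitude/latitude (Mercator by default). This assumes the maps package and the mapproj package, which coord_map() relies on, are both installed.

```r
library(ggplot2)
library(maps)   # provides the "nz" outline
# coord_map() additionally requires the mapproj package to be installed

nz <- data.frame(map("nz", plot = FALSE)[c("x", "y")])
n <- ggplot(nz, aes(x, y)) + geom_path()

# Same layer, now drawn with a map projection (Mercator by default)
n_map <- n + coord_map()
n_map
```

Comparing `n` and `n_map` side by side shows how the projection changes the aspect of the outline without touching the data.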
39. Faceting Formula
Formula          Layout
. ~ .            no faceting
. ~ a            single row, multiple columns
b ~ .            single column, multiple rows
a ~ b            multiple rows and columns
. ~ a + b        multiple variables in columns
a + b ~ .        multiple variables in rows
a + b ~ c + d    multiple variables in rows and columns
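A sketch of these formulas in practice, using the mpg dataset bundled with ggplot2; drv and cyl are arbitrary choices of faceting variables.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# . ~ cyl : a single row, one column per value of cyl
p + facet_grid(. ~ cyl)

# drv ~ . : a single column, one row per value of drv
p + facet_grid(drv ~ .)

# drv ~ cyl : rows by drv, columns by cyl
grid <- p + facet_grid(drv ~ cyl)
grid
```

facet_grid() lays out the full grid of row and column values, so some panels may be empty when a combination does not occur in the data.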
40. Scales in Facets
facet_grid(. ~ cyl, scales = "free_x")
scales value    Free axes
"fixed"         none
"free"          x and y
"free_x"        x
"free_y"        y
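A sketch of the free_x setting from the table, using the built-in mtcars data (an arbitrary choice): each cyl panel gets its own x range, while all panels share one y axis.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()

# One x scale per column (cyl = 4, 6, 8); the y scale stays shared
free_x <- p + facet_grid(. ~ cyl, scales = "free_x")
free_x
```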
41. Layers
• Iteratively update a plot
  • change a single feature at a time
• Think about the high-level aspects of the plot in isolation
• Instead of choosing a static type of plot, create new types of plots on the fly
  • A cure for immobility
• Developers can easily create new layers without affecting other layers
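A minimal sketch of building a plot iteratively, one layer per step, with the mpg dataset bundled with ggplot2; the particular layers are only illustrative.

```r
library(ggplot2)

# Start from data and default mappings only: no layers yet
p <- ggplot(mpg, aes(displ, hwy))
length(p$layers)

# Add one layer at a time; each `+` returns a new plot object
p <- p + geom_point(aes(colour = class))
p <- p + geom_smooth(method = "loess", se = FALSE)
p
```

Because each step returns a new object, any intermediate plot can be kept, printed or extended in a different direction without affecting the others.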
42. Hierarchy of defaults
Omitted component   Default chosen by the layer
Stat                the geom's default stat
Geom                the stat's default geom
Mapping             the plot's default mapping
Coord               Cartesian coordinates
Scale               chosen depending on the aesthetic and the type of variable
Position            linear scaling for continuous variables, integers for categorical variables
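A sketch of inspecting some of these defaults directly; it assumes a recent ggplot2 in which layers and plots are ggproto-based objects whose `stat`, `geom` and `coordinates` fields can be examined.

```r
library(ggplot2)

# Omitted stat: geom_bar() falls back to its default stat (count)
class(geom_bar()$stat)[1]

# Omitted geom: stat_bin() falls back to its default geom (bar)
class(stat_bin()$geom)[1]

# Omitted coord: the plot uses Cartesian coordinates
p <- ggplot(mpg, aes(class)) + geom_bar()
class(p$coordinates)[1]
```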
43. Thanks!
• Visit the ggplot2 homepage:
  • http://had.co.nz/ggplot2/
• Get the ggplot2 book:
  • http://amzn.com/0387981403
• Get the Grammar of Graphics book by Leland Wilkinson:
  • http://amzn.com/0387245448