The ggplot2 package, developed by Hadley Wickham of Rice University, is one of R’s most popular and widely used packages. Its exploratory graphics capabilities are driving the use of R as a complement to legacy analytics tools such as SAS. SAS is well regarded for data management and "production" statistics, where you know what you want to do and need to do it repeatedly. R, by contrast, is strong in data analysis and exploration, where figuring out what is needed is the biggest challenge. In this way, SAS and R complement each other.
This webinar will provide an all-access pass to Hadley’s latest work. He’ll discuss:
* A brief overview of ggplot2, and how it differs from other plotting systems
* A sneak peek at some of the new features coming to the next version of ggplot2
* What’s been learned about good development practices in the 5 years since first starting to develop ggplot
* Some of the internals of ggplot2, and how he is gradually making it easier for others to contribute
1. ggplot2:
A backstage tour
Hadley Wickham
Assistant Professor /
Dobelman Family Junior Chair
Department of Statistics
Rice University
February 2012
Wednesday, February 8, 12
2. 1. Why ggplot2?
2. Sneak peek and new features
3. Best practices
4. Questions
3. Poll: What graphics
system are you
currently using?
7. “Nothing is as practical as a good theory”
—Kurt Lewin
“[A good model] will bring together in a
coherent way things that previously
appeared unrelated and which also will
provide a basis for dealing systematically
with new situations”
—David Cox
8. A plot is made up of multiple layers.
A layer consists of data, a set of
mappings between variables and
aesthetics, a geometric object, and a
statistical transformation.
Scales control the details of the mapping.
All components are independent and
reusable.
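The layered structure described on this slide can be sketched in a few lines. This is only an illustration: the built-in mtcars data set, the chosen variables, and the object names (base, points, trend, colour_scale) are assumptions made for the example, not part of the talk.

```r
library(ggplot2)

# Data plus default mappings between variables and aesthetics
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Layer 1: point geom with the identity stat, plus one extra mapping
points <- geom_point(aes(colour = hp), stat = "identity")

# Layer 2: a smoothing layer that fits a linear model to the same data
trend <- geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# A scale controls the details of the hp -> colour mapping
colour_scale <- scale_colour_gradient(low = "grey70", high = "red")

# Components are independent, so they can be combined freely
p <- base + points + trend + colour_scale

# ...and reused in a different plot with different default mappings
p2 <- ggplot(mtcars, aes(x = disp, y = mpg)) + points + colour_scale
```

Printing p or p2 draws the plot. Because the layer and scale objects are ordinary R values, the same points layer serves both plots.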
9. Interesting ggplot example
Layered grammar + ggplot2
James Cheshire, http://bit.ly/xqHhAs
13. Poll: What resources are
most helpful to you when
improving your R skills?
14. Learning ggplot2
ggplot2 mailing list
http://groups.google.com/group/ggplot2
stackoverflow
http://stackoverflow.com/tags/ggplot2
Lattice to ggplot2 conversion
http://learnr.wordpress.com/?s=lattice
Cookbook for common graphics
http://wiki.stdout.org/rcookbook/Graphs/
ggplot2 book
http://amzn.com/0387981403
16. Poll: Why do you use
visualisation?
17. # Getting started
# To get the CRAN version
install.packages("ggplot2")
# To get the development version
install.packages("devtools")
library(devtools)
dev_mode() # don't overwrite your existing install
install_github("tidyverse/ggplot2") # current devtools needs the "owner/repo" form
29. qplot(x, y, data = df, colour = colour, alpha = I(1/4))
30. qplot(x, y, data = df, colour = colour, alpha = I(1/4)) +
guides(colour = guide_legend(
override.aes = list(alpha = 1, size = 2)))
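The two qplot() calls above use a data frame df that is not defined on these slides. A self-contained sketch of the same legend override, written with ggplot() and simulated data (the data frame and its column names are assumptions for illustration), might look like this:

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: 500 overplotted points in two colour groups
df <- data.frame(
  x = rnorm(500),
  y = rnorm(500),
  colour = factor(sample(1:2, 500, replace = TRUE))
)

p <- ggplot(df, aes(x, y, colour = colour)) +
  # Semi-transparent points reduce overplotting in the panel
  geom_point(alpha = 1/4) +
  # The legend keys are drawn opaque and larger so they stay readable
  guides(colour = guide_legend(
    override.aes = list(alpha = 1, size = 2)))
```

The override affects only how the legend keys are drawn. The points in the plot panel keep alpha = 1/4.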
31. # Better layout
df <- data.frame(x = 1:10, y = 10:1, colour = 1:2)
qplot(x, y, data = df) + coord_fixed()
qplot(x, y, data = df) + facet_wrap(~ colour)
# Internally, there has been a big rewrite of
# the facetting data processing and rendering
# systems. This lays the foundation for new
# features, and fixes some annoying long-standing
# bugs.
32. # Speed improvements
system.time(
print(qplot(carat, price, data = diamonds))
)
# Includes new tools for figuring out what's
# taking all the time
benchplot(qplot(carat, price, data = diamonds))
# See also geom_raster and geom_map
# Still a lot of work to do. The emphasis in
# ggplot2 is on reducing the amount of thinking
# time by making it easier to go from the plot in
# your brain to the plot on the page.
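As a small illustration of the geom_raster pointer on this slide: geom_raster() is a faster special case of geom_tile() for data on a regular grid where every cell has the same size. The sketch below assumes a ggplot2 version that ships the faithfuld data set, which is not mentioned in the talk.

```r
library(ggplot2)

# faithfuld: 2d density estimate of the Old Faithful data on a regular grid
p <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
  geom_raster()
```

Wrapping print(p) in system.time(), as in the slide above, gives a rough idea of the rendering cost on your own machine.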
36. Poll: How big
is your data?
37. # Future work: big visualisation
# (Sponsored by Revolution Analytics)
# How can you make a plot of 100 million
# observations in less than one minute?
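The slide poses the question without giving the method. One general strategy, shown here purely as an assumption and not as the approach taken in the talk, is to reduce the data to counts on a fixed grid before plotting, so that drawing cost depends on the number of bins and not on the number of observations. The simulated data, the bin count, and all object names below are hypothetical.

```r
library(ggplot2)

set.seed(42)
n <- 1e6                      # stand-in for a much larger data set
x <- rnorm(n)
y <- x + rnorm(n)

bins <- 100
# Assign every observation to a cell of a bins x bins grid
xb <- cut(x, breaks = bins, labels = FALSE)
yb <- cut(y, breaks = bins, labels = FALSE)

# Count observations per cell; at most bins^2 rows remain
counts <- as.data.frame(table(xb = xb, yb = yb),
                        responseName = "n", stringsAsFactors = FALSE)
counts$xb <- as.integer(counts$xb)
counts$yb <- as.integer(counts$yb)
counts <- counts[counts$n > 0, ]

# Plot the summary, not the raw points
p <- ggplot(counts, aes(xb, yb, fill = n)) + geom_raster()
```

The axes here are bin indices, not data units. A fuller version would map each index back to its bin midpoint.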
46. Poll: How do you learn
about new packages?
47. Package best
practices
• Namespace
• Documentation
• Unit tests
• Read the source!
• (ggplot2 is not always the best example: it
was my second R package. I have now written
around 30, and I know a lot more!)
49. # Namespaces
library(ggplot2)
ddply # fails: ddply lives in plyr, which is not attached
# Note that plyr, reshape, etc. aren't automatically
# loaded. This is good development practice -
# it's better to be explicit than implicit.
# Look at the NAMESPACE file.
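The difference between a package that is loaded through another package's NAMESPACE and one that is attached to the search path can be checked directly. The sketch below uses scales as the stand-in because ggplot2 imports it; whether plyr is still imported depends on your ggplot2 version, so it is not used here. The variable names are illustrative.

```r
library(ggplot2)

# Imported by ggplot2, so its namespace is loaded...
is_loaded <- "scales" %in% loadedNamespaces()

# ...but it is on the search path only if you attached it yourself
is_attached <- "package:scales" %in% search()

# Being explicit: call the function through its namespace
label <- scales::percent(0.25)
```

Using pkg::fun() in scripts, and import directives in a package's own NAMESPACE file, keeps dependencies explicit in the way the slide recommends.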
54. Learning ggplot2
ggplot2 mailing list
http://groups.google.com/group/ggplot2
stackoverflow
http://stackoverflow.com/tags/ggplot2
Lattice to ggplot2 conversion
http://learnr.wordpress.com/?s=lattice
Cookbook for common graphics
http://wiki.stdout.org/rcookbook/Graphs/
ggplot2 book
http://amzn.com/0387981403