Stat405   Graphics for large data


                           Hadley Wickham
Thursday, 26 August 2010
Majoring in Stat

                    • Declare early (even if you’re not sure)
                    • Weekly lunches
                    • Summer opportunities
                      (research & internships)




1. Leftovers from last lecture
                2. The diamonds data
                3. Histograms and bar charts
                4. More boxplots and scatterplots
                5. Homework



                # Remember: start with
                library(ggplot2)

                qplot(reorder(class, hwy), hwy, data = mpg, geom = "jitter")

[Plot: jittered points of hwy (y) against reorder(class, hwy) (x); classes from left to right: pickup, suv, minivan, 2seater, midsize, subcompact, compact]
                qplot(reorder(class, hwy), hwy, data = mpg, geom = "boxplot")

[Plot: boxplots of hwy (y) against reorder(class, hwy) (x); classes from left to right: pickup, suv, minivan, 2seater, midsize, subcompact, compact]
                qplot(reorder(class, hwy), hwy, data = mpg,
                  geom = c("jitter", "boxplot"))

[Plot: jittered points with boxplots drawn over them, hwy (y) against reorder(class, hwy) (x); classes from left to right: pickup, suv, minivan, 2seater, midsize, subcompact, compact]
Your turn

                    Read the help for reorder. Redraw the
                    previous plots with class ordered by
                    median hwy.
                    How would you put the jittered points on
                    top of the boxplots?
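
One possible answer, offered as a sketch rather than the official solution: reorder() accepts a summary function as its third argument (the mean is the default), and qplot() draws geoms in the order they are listed, so naming "jitter" last puts the points on top. The object names p_median and p_both are made up for this example, and it assumes a ggplot2 version in which qplot() is still available.

```r
library(ggplot2)

# Order class by median hwy instead of the default mean
p_median <- qplot(reorder(class, hwy, median), hwy, data = mpg,
  geom = "boxplot")

# Layers are drawn in the order listed, so the jittered points
# end up on top of the boxplots
p_both <- qplot(reorder(class, hwy, median), hwy, data = mpg,
  geom = c("boxplot", "jitter"))

print(p_median)
print(p_both)
```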




Diamonds



Diamonds data
                    ~54,000 round diamonds from
                    http://www.diamondse.info/
                    Carat, colour, clarity, cut
                    Total depth, table, depth,
                    width, height
                    Price


[Diagram: side view of a diamond, labelled with x, table width and z]

                    depth = z / diameter
                    table = table width / x * 100
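
The depth formula can be checked roughly against the data. This is a sketch under two assumptions that are not on the slide: that "diameter" means the average of x and y, and that the stored depth column is a percentage (hence the factor of 100). The names depth_calc, ok and depth_diff are made up for this example.

```r
library(ggplot2)

# Assumption: diameter = mean of x and y; depth is stored as a percentage
depth_calc <- with(diamonds, z / ((x + y) / 2) * 100)

# Rows with zero dimensions give non-finite results, so drop them
ok <- is.finite(depth_calc)
depth_diff <- depth_calc[ok] - diamonds$depth[ok]

# Differences close to zero suggest the assumed formula matches the data
summary(depth_diff)
```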

Recall

                    Write down five ways to inspect the
                    diamonds dataset.
                    You have one minute!
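
A few candidate answers, as a sketch: these are standard base R inspection functions, and other choices would be equally valid. The names dims and vars are made up for this example.

```r
library(ggplot2)

head(diamonds)           # first six rows
str(diamonds)            # variable types and a preview of the values
summary(diamonds)        # per-variable summaries

dims <- dim(diamonds)    # number of rows and columns
dims

vars <- names(diamonds)  # variable names
vars
```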




Your turn


                    Inspect the data and familiarise yourself
                    with the variables. If you don’t know what
                    they mean, look them up on Wikipedia.




Histograms & bar charts


Histograms and bar charts

                    Used to display the distribution of a
                    variable
                    Categorical variable → bar chart
                    Continuous variable → histogram




Always experiment with the bin width!
Examples
                # With only one variable, qplot guesses that
                # you want a bar chart or histogram
                qplot(cut, data = diamonds)

                qplot(carat, data = diamonds)
                qplot(carat, data = diamonds, binwidth = 1)
                qplot(carat, data = diamonds, binwidth = 0.1)
                qplot(carat, data = diamonds, binwidth = 0.01)
                resolution(diamonds$carat)

                last_plot() + xlim(0, 3)


                # Common ggplot2 technique: adding together plot
                # components, as in last_plot() + xlim(0, 3)
     qplot(table, data = diamonds, binwidth = 1)

     # To zoom in on a plot region use xlim() and ylim()
     qplot(table, data = diamonds, binwidth = 1) +
       xlim(50, 70)
     qplot(table, data = diamonds, binwidth = 0.1) +
       xlim(50, 70)
     qplot(table, data = diamonds, binwidth = 0.1) +
       xlim(50, 70) + ylim(0, 50)

     # Note that this type of zooming discards data
     # outside of the plot regions
     # See coord_cartesian() for an alternative
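
To make the contrast concrete, here is a minimal sketch of the coord_cartesian() alternative next to the xlim()/ylim() version. The object names p_clip and p_zoom are made up for this example, and it assumes a ggplot2 version in which qplot() is still available.

```r
library(ggplot2)

# xlim() and ylim() set scale limits: values outside them are
# discarded, so bars taller than 50 disappear from the plot
p_clip <- qplot(table, data = diamonds, binwidth = 0.1) +
  xlim(50, 70) + ylim(0, 50)

# coord_cartesian() only changes the visible window: all data are
# kept, and tall bars are cut off at the top of the panel
p_zoom <- qplot(table, data = diamonds, binwidth = 0.1) +
  coord_cartesian(xlim = c(50, 70), ylim = c(0, 50))

print(p_clip)
print(p_zoom)
```

Comparing the two plots side by side shows which bars were removed by the scale limits.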


Additional variables

                    As with scatterplots, we can use aesthetics
                    or faceting. Using aesthetics creates
                    pretty, but ineffective, plots.
                    The following examples show the
                    difference when investigating the
                    relationship between cut and depth.



                qplot(depth, data = diamonds, binwidth = 0.2)

[Plot: histogram of depth (x, ticks from 56 to 70) against count (y, 0 to 4000)]
                qplot(depth, data = diamonds, binwidth = 0.2,
                  fill = cut) + xlim(55, 70)

[Plot: histogram of depth (x, ticks from 56 to 70) against count (y, 0 to 4000), with bars filled by cut: Fair, Good, Very Good, Premium, Ideal]

                Fill is the aesthetic for fill colour.
                qplot(depth, data = diamonds, binwidth = 0.2) +
                  xlim(55, 70) + facet_wrap(~ cut)

[Plot: histograms of depth (x, ticks from 56 to 70) against count (y, 0 to 2500), one panel per cut: Fair, Good, Very Good, Premium, Ideal]
Your turn

                    Explore the distribution of price.
                    How does it vary with colour, cut, and
                    clarity?
                    Practice zooming in on regions of interest.
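
A possible starting point, as a sketch only: the bin widths and the zoom range below are arbitrary choices, not values from the lecture, and the object names are made up for this example. Note that the dataset spells the colour variable color.

```r
library(ggplot2)

# Overall distribution of price
p_price <- qplot(price, data = diamonds, binwidth = 500)

# One panel per cut (swap in color or clarity to compare)
p_by_cut <- qplot(price, data = diamonds, binwidth = 500) +
  facet_wrap(~ cut)

# Zoom in on cheaper diamonds with a finer bin width
p_cheap <- qplot(price, data = diamonds, binwidth = 50) +
  xlim(0, 2500)

print(p_price)
print(p_by_cut)
print(p_cheap)
```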




Box and whisker plots


Boxplots

                    Less information than a histogram, but
                    take up much less space.
                    Already seen them used with discrete x
                    values. Can also use with continuous x
                    values, by specifying how we want the
                    data grouped.
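
One way to specify the grouping, sketched under an assumption: the group aesthetic is set to table rounded to the nearest whole number, which is an illustrative choice rather than the lecture's own. The name p_grouped is made up for this example.

```r
library(ggplot2)

# Assumption: one boxplot per whole-number value of table
p_grouped <- qplot(table, price, data = diamonds, geom = "boxplot",
  group = round(table))

print(p_grouped)
```

Coarser or finer groups can be tried by changing the rounding, for example round(table / 2) * 2 for groups two units wide.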



qplot(table, price, data = diamonds)
                qplot(table, price, data = diamonds, geom = "boxplot")

[Plot: boxplot of price (y, ticks at 5000, 10000, 15000) against table (x, ticks from 50 to 90); the outlying points are stacked in a single column]
                                        ●   ●
                                            ●   ●
                                                ●   ●   ●   ●   ●   ●     ●
                                                                          ●     ●
                                    ●   ●   ●
                                            ●   ●
                                                ●
                                                ●   ●
                                                    ●   ●
                                                        ●   ●
                                                            ●   ●
                                                                ●   ●   ●
                               ●
                               ●    ●   ●   ●   ●   ●   ●
                                                        ●   ●   ●   ●
                                                                    ●   ● ●
                               ●    ●   ●
                                        ●   ●
                                            ●   ●   ●
                                                    ●   ●
                                                        ●
                                                        ●   ●
                                                            ●   ●   ●   ●
                           ●   ●
                             ● ●    ●   ●   ●   ●
                                                ●   ●
                                                    ●   ●   ●
                                                            ●   ●   ●
                               ●    ●   ●   ●
                                            ●
                                            ●   ●
                                                ●   ●   ●   ●   ●   ●           ●
                               ●    ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●   ●
                                                    ●   ●
                                                        ●   ●
                                                            ●   ●
                                                                ●   ●           ●●
                                                                                ●
                               ●    ●
                                    ●   ●   ●   ●   ●   ●   ●
                                                            ●   ●   ●
                               ●    ●   ●
                                        ●
                                        ●   ●
                                            ●   ●
                                                ●   ●
                                                    ●   ●
                                                        ●   ●
                                                            ●   ●
                                                                ●   ●
                                                                    ●
                                    ●   ●   ●       ●
                                                    ●   ●   ●           ● ●
         15000                 ●
                               ●
                                    ●
                                    ●
                                    ●
                                        ●
                                        ●
                                        ●
                                        ●
                                            ●
                                            ●
                                            ●
                                                ●
                                                ●
                                                ●
                                                ●
                                                    ●
                                                    ●
                                                    ●
                                                        ●
                                                        ●
                                                        ●
                                                        ●
                                                            ●
                                                            ●
                                                            ●
                                                                ●
                                                                ●
                                                                ●
                                                                    ●
                                                                    ●
                                                                    ●   ●
                                                                        ●
                                                                          ●         ●
                                                                                    ●
                                    ●   ●
                                        ●   ●
                                            ●   ●   ●   ●   ●   ●
                                                                ●   ●   ●
                             ●      ●
                                    ●   ●
                                        ●   ●
                                            ●
                                            ●   ●
                                                ●   ●
                                                    ●   ●
                                                        ●       ●   ●
                                    ●
                                    ●   ●   ●   ●   ●   ●       ●
                                                                ●
                                                                ●   ●           ●
                                    ●
                                    ●   ●   ●   ●
                                                ●   ●           ●
                                    ●   ●
                                        ●
                                        ●   ●
                                            ●   ●
                                                ●   ●
                                                    ●           ●
                                                                ●       ●
                                ●   ●
                                    ●   ●   ●   ●   ●           ●   ●   ●
                                ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●               ●
                                                                ●   ●
                                                                    ●
                                                                    ●   ●       ●●●
                                ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●               ●
                                                                ●   ●           ●
                                ●   ●
                                    ●   ●   ●   ●
                                                ●               ●
                                                                ●   ●       ●
                                    ●
                                    ●   ●
                                        ●   ●   ●
                                                ●               ●
                                                                ●   ●   ●         ●
                                        ●   ●   ●               ●
                                                                ●   ●
                                                                    ●   ●
                                ●
                                    ●
                                    ●
                                    ●
                                        ●
                                        ●   ●
                                            ●
                                            ●
                                                ●
                                                ●
                                                ●               ●
                                                                ●   ●
                                                                    ●   ●       ●●●     ●
                                    ●
                                    ●   ●
                                        ●   ●
                                            ●   ●
                                                ●               ●
                                                                ●   ●
                                                                    ●   ●
                                                                        ●
                                    ●   ●   ●
                                            ●   ●               ●
                                                                ●   ●   ●   ●           ●
                                    ●
                                    ●   ●
                                        ●   ●
                                            ●
                                            ●   ●
                                                ●                   ●
                                    ●   ●   ●
                                            ●   ●                   ●   ●           ●
                                ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●                       ●   ●   ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●                       ●
                                                                    ●   ●   ●   ●
                                ●
                                ●   ●   ●   ●                           ●   ●
                                    ●   ●   ●                           ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●                               ●
                                                                            ●   ●● ●        ●
                                    ●
                                    ●   ●   ●                           ●   ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●                               ●   ●
                                ●
                                ●   ●   ●   ●
                                            ●                           ●         ● ●
                                ●
                                ●   ●
                                    ●   ●
                                        ●   ●
                                            ●                           ●
                                ●   ●   ●
                                        ●
                                        ●   ●
                                            ●                                     ●
         10000                  ●
                                ●
                                    ●
                                    ●
                                    ●
                                        ●
                                        ●
                                        ●
                                            ●                                   ●
 price




                                ●   ●
                                    ●   ●
                                        ●
                                ●
                                ●   ●   ●                                           ●
                                ●   ●
                                    ●   ●
                                        ●
                                ●   ●
                                    ●   ●
                                        ●
                                    ●
                                    ●   ●
                                        ●                                           ●
                                    ●
                                    ●   ●
                                    ●
                                    ●
                                    ●
                                    ●




         5000




qplot(table, price, data = diamonds, geom 80 "boxplot",
               50       60         70     =         90
  group = round(table))      table
[Figure: the same boxplots of price against table, with the group argument annotated]
One boxplot for each unique value of the group aesthetic:
qplot(table, price, data = diamonds, geom = "boxplot",
  group = round(table))
Thursday, 26 August 2010
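To see why grouping by round(table) gives one box per integer, it may help to look at the grouping variable directly. The following is a minimal sketch, not part of the original slides: it uses the ggplot() form rather than qplot() (which recent ggplot2 releases deprecate), and the coarser five-unit grouping is an illustrative assumption.

```r
library(ggplot2)

# round() maps each continuous table value to the nearest integer, so
# all diamonds sharing that integer are summarised by the same box.
groups <- round(diamonds$table)
n_raw <- length(unique(diamonds$table))  # distinct raw table values
n_groups <- length(unique(groups))       # distinct boxes after rounding

# The slide's plot written with ggplot(); group is an ordinary aesthetic.
p <- ggplot(diamonds, aes(table, price, group = round(table))) +
  geom_boxplot()

# Assumption for illustration: a coarser grouping, one box per
# (roughly) five units of table.
p_coarse <- ggplot(diamonds, aes(table, price, group = round(table / 5))) +
  geom_boxplot()
```

Printing p or p_coarse draws the plot. Comparing n_raw with n_groups shows how much the rounding collapses the x values.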
Scatterplots



Interpreting a scatterplot

                    • Global patterns
                    • Local patterns
                    • Deviations




Strong linear relationship. A number of outliers.




Unusual striations. Two groups? Little relationship between table and price?




Curved (exponential?) relationship. Outliers mostly cheaper than expected.
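One way to probe the "exponential?" question is to redraw the plot on transformed scales. This is a rough sketch that assumes the plot in question is carat against price (the slide image itself is not recoverable here): an exponential relationship would look roughly straight with log(price) against carat, while a power law would look straight on log-log axes.

```r
library(ggplot2)

# Assumption: the curved plot is price against carat.
# Semi-log: straight if price grows exponentially with carat.
p_semilog <- ggplot(diamonds, aes(carat, log(price))) +
  geom_point()

# Log-log: straight if price follows a power law in carat.
p_loglog <- ggplot(diamonds, aes(log(carat), log(price))) +
  geom_point()

# Crude numeric companion to the plots: linear correlation on each scale.
r_raw <- cor(diamonds$carat, diamonds$price)
r_semilog <- cor(diamonds$carat, log(diamonds$price))
r_loglog <- cor(log(diamonds$carat), log(diamonds$price))
```

The correlations are only a summary; the plots themselves are the better guide to which scale straightens the relationship.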


But what’s the problem with all these plots?
In pairs, brainstorm solutions for 2 minutes.

qplot(carat, price, data = diamonds)
Idea                ggplot
Small points        shape = I(".")
Transparency        alpha = I(1/50)
Jittering           geom = "jitter"
Smooth curve        geom = "smooth"
2d bins             geom = "bin2d" or geom = "hex"
Density contours    geom = "density2d"
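The ideas in the table can be tried side by side on the carat and price plot. The sketch below is one possible arrangement, not the lecture's own code: it uses the ggplot() form with explicit geoms instead of the qplot() arguments listed above, and the list name plots is purely illustrative. The hex variant is left out because geom_hex() needs the separate hexbin package.

```r
library(ggplot2)

# Shared starting point: carat on x, price on y.
base <- ggplot(diamonds, aes(carat, price))

# One plot per idea from the table (names are illustrative).
plots <- list(
  small_points = base + geom_point(shape = "."),   # tiny plotting symbol
  transparency = base + geom_point(alpha = 1/50),  # 50 overlaps = opaque
  jitter       = base + geom_jitter(),             # random displacement
  smooth       = base + geom_smooth(),             # smoothed trend line
  bins_2d      = base + geom_bin2d(),              # rectangular 2d counts
  contours     = base + geom_density_2d()          # 2d density contours
)
```

Each element is drawn by printing it, for example print(plots$transparency). Building the plots once and printing them one at a time makes it easy to compare how each approach copes with roughly 54,000 overlapping points.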
Your turn

                    Practice doing these plots yourself.
                    Read the online documentation for each
                    plot type: http://had.co.nz/ggplot2




Homework

                    Practice your graphics/data exploration
                    skills with the diamonds or mpg data.
                    Due in one week.
                    Make sure to read the grading rubric, and
                    find a colour printer.



Asking questions

                    You have two minutes to write down as
                    many questions as you can come up with
                    that you might want to answer about the
                    diamonds data.
                    Write your best question on a piece of
                    paper and turn it in.


