Chapter 2
        Graphical Methods for
              Describing Data
                 Distributions
Created by Kathy Fritz / Revised by S. Miller September 2012
Variable
• any characteristic whose value may change
  from one individual to another




Data
• The values for a variable from individual
  observations
Suppose that a PE coach records the
height of each student in his class.


  This is an example of
  univariate data




Univariate - data that consist of observations on a
single variable made on individuals in a
sample or population
Suppose that the PE coach records the
height and weight of each student in his
class.


    This is an example of
    bivariate data




Bivariate - data that consist of pairs of
numbers from two variables for each
individual in a sample or population
Suppose that the PE coach records the
height, weight, number of sit-ups, and
number of push-ups for each student in
his class.


   This is an example of
   multivariate data



Multivariate - data that consist of
observations on two or more variables
Two types of variables



categorical   numerical
Categorical variables
• Qualitative

• Consist of categorical responses

1.   Car model
2.   Birth year
3.   Type of cell phone
4.   Your zip code
5.   Which club you have joined

Which of these variables are NOT categorical variables?
They are all categorical variables!
Numerical variables
• Quantitative
• Observations or measurements take on
  numerical values

It makes sense to perform math operations on these values.

1.   GPAs
2.   Height of students
3.   Codes to combination locks
4.   Number of text messages per day
5.   Weight of textbooks

Which of these variables are NOT numerical?
Does it make sense to find an average code to combination locks?

There are two types of numerical variables - discrete and continuous.
Two types of variables



categorical       numerical


              discrete   continuous
Discrete (numerical)

• Isolated points along a number line

• usually counts of items

• Example: number of textbooks purchased
Continuous (numerical)
• Variable that can be any value in a
  given interval

• usually measurements of something

• Examples: GPAs or height or weight
Are the following variables categorical
or numerical (discrete or continuous)?
1. the color of cars in the teacher’s lot
   Categorical
2. the number of calculators owned by
   students at your college
   Discrete numerical
3. the zip code of an individual
   Categorical
4. the amount of time it takes students to
   drive to school
   Continuous numerical
5. the appraised value of homes in your city
   Discrete numerical
   Is money a measurement or a count?
Use the following table to determine an appropriate
graphical display for a data set.

Graphical Display           Variable Type                     Data Type     Purpose
Bar Chart                   Univariate                        Categorical   Display data distribution
Comparative Bar Chart       Univariate for 2 or more groups   Categorical   Compare 2 or more groups
Dotplot                     Univariate                        Numerical     Display data distribution
Comparative dotplot         Univariate for 2 or more groups   Numerical     Compare 2 or more groups
Stem-and-leaf display       Univariate                        Numerical     Display data distribution
Comparative stem-and-leaf   Univariate for 2 groups           Numerical     Compare 2 or more groups
Histogram                   Univariate                        Numerical     Display data distribution
Scatterplot                 Bivariate                         Numerical     Investigate relationship between 2 variables
Time series plot            Univariate, collected over time   Numerical     Investigate trend over time

What types of graphs can be used with categorical data?

In section 2.3, we will see how the various graphical displays for
univariate, numerical data compare.
Displaying
Categorical Data

               Bar Charts
   Comparative Bar Charts
Bar Chart
When to Use:          Univariate, Categorical data
A bar chart is a graphical display for categorical data.

To comply with new standards from the U. S. Department of
Transportation, helmets should reach the bottom of the
motorcyclist’s ears. The report “Motorcycle Helmet Use in 2005 –
Overall Results” (National Highway Traffic Safety Administration,
August 2005) summarized data collected by observing 1700
motorcyclists nationwide at selected roadway locations.
Each time a motorcyclist passed by, the observer noted whether
the rider was wearing no helmet (N), a noncompliant helmet (NC),
or a compliant helmet (C).

The data are summarized in this table:

Helmet Use   Frequency
N               731
NC              153
C               816
Total          1700

This is called a frequency distribution.
A frequency distribution is a table that displays the possible
categories along with the associated frequencies or relative
frequencies. The frequency for a particular category is the number
of times that category appears in the data set.
The frequencies should sum to the total number of observations.
Bar Chart
The motorcycle helmet data, with relative frequencies added:

Helmet Use   Frequency   Relative Frequency
N               731            0.430
NC              153            0.090
C               816            0.480
Total          1700            1.000

The relative frequencies should sum to 1 (allowing for rounding).
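The relative frequencies in the helmet table can be checked with a few lines of Python. This is only a sketch: the counts come from the slide, while the names `helmet_counts` and `relative_frequencies` are made up for illustration.

```python
# Helmet-use counts taken from the frequency table on the slide.
helmet_counts = {"N": 731, "NC": 153, "C": 816}

def relative_frequencies(counts):
    """Divide each category count by the total number of observations."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

helmet_rel_freq = relative_frequencies(helmet_counts)

# Print a small table: category, frequency, relative frequency.
for category, rel in helmet_rel_freq.items():
    print(f"{category:>2}  {helmet_counts[category]:>4}  {rel:.3f}")
```

Summing the values of `helmet_rel_freq` should give 1, apart from floating-point rounding.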
Bar Chart
How to construct
1. Draw a horizontal line; write the categories or
   labels below the line at regularly spaced
   intervals

2. Draw a vertical line; label the scale using
   frequency or relative frequency

3. Place a rectangular bar above each category
   label with a height determined by its frequency
   or relative frequency

All bars should have the same width so that both the height
and the area of the bar are proportional to the frequency or
relative frequency of the corresponding categories.
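As a rough illustration of the construction steps, the sketch below draws a text bar chart of the helmet data. It lays the bars out horizontally rather than vertically to keep the code short, and the scale of one `#` per 50 riders is an arbitrary choice made for this example; `text_bar_chart` is a hypothetical helper name.

```python
# Helmet-use counts from the slide's frequency table.
helmet_counts = {"N": 731, "NC": 153, "C": 816}

def text_bar_chart(counts, units_per_mark=50):
    """Return one line per category; bar length is proportional to frequency."""
    lines = []
    for category, count in counts.items():
        # Every bar uses the same character width per mark, so length tracks frequency.
        bar = "#" * round(count / units_per_mark)
        lines.append(f"{category:>3} | {bar} {count}")
    return lines

chart_lines = text_bar_chart(helmet_counts)
print("\n".join(chart_lines))
```

Because every bar is built from the same character, both its length and its area stay proportional to the frequency, which is the point of the equal-width rule.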
Bar Chart
What to Look For
    Frequently or infrequently occurring
    categories

Here is the
completed bar chart
for the motorcycle
helmet data.

Describe this graph.
Comparative Bar Charts
When to Use      Univariate, Categorical data for
                 two or more groups
Bar charts can also be used to provide a visual
comparison of two or more groups.

How to construct
• Constructed by using the same horizontal and
  vertical axes for the bar charts of two or
  more groups
• Usually color-coded to indicate which bars
  correspond to each group
• Should use relative frequencies on the
  vertical axis

Why?
You use relative frequency rather than frequency on the
vertical axis so that you can make meaningful comparisons
even if the sample sizes are not the same.
Each year the Princeton Review conducts a survey of
students applying to college and of parents of college
applicants. In 2009, 12,715 high school students
responded to the question “Ideally how far from home
would you like the college you attend to be?”
Also, 3007 parents of students applying to college
responded to the question “how far from home would
you like the college your child attends to be?” Data
are displayed in the frequency table below.

                            Frequency
Ideal Distance         Students   Parents
Less than 250 miles      4450       1594
250 to 500 miles         3942        902
500 to 1000 miles        2416        331
More than 1000 miles     1907        180

Create a comparative bar chart with these data.
What should you do first?
                         Relative Frequency
Ideal Distance           Students   Parents
Less than 250 miles         .35         .53
250 to 500 miles            .31         .30
500 to 1000 miles           .19         .11
More than 1000 miles        .15         .06

Students column: found by dividing the frequency by the total
number of students
Parents column: found by dividing the frequency by the total
number of parents

What does this graph show about the ideal distance
college should be from home?
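The conversion from frequencies to relative frequencies for the two groups can be sketched as follows. The counts are the ones in the Princeton Review table; the function name `to_relative` and the rounding to two decimals (to match the slide) are choices made for this example.

```python
distances = ["Less than 250 miles", "250 to 500 miles",
             "500 to 1000 miles", "More than 1000 miles"]
students = [4450, 3942, 2416, 1907]   # 12,715 students in total
parents = [1594, 902, 331, 180]       # 3007 parents in total

def to_relative(freqs):
    """Divide each frequency by its own group's total, rounded to 2 decimals."""
    total = sum(freqs)
    return [round(f / total, 2) for f in freqs]

student_rel = to_relative(students)
parent_rel = to_relative(parents)

for label, s, p in zip(distances, student_rel, parent_rel):
    print(f"{label:<22} {s:.2f}   {p:.2f}")
```

Each group is divided by its own total, which is what makes the two columns comparable even though far more students than parents responded.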
Displaying
Numerical Data

                 Dotplots
   Stem-and-leaf Displays
              Histograms
Dotplot
When to Use          Univariate, Numerical data

How to construct
1. Draw a horizontal line and mark it with an
   appropriate numerical scale

2. Locate each value in the data set along the
   scale and represent it by a dot. If there are
   two or more observations with the same
   value, stack the dots vertically
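The two construction steps can be sketched as a text dotplot. This is a minimal example that assumes whole-number data; the list `sample` is a short illustrative set of values and `text_dotplot` is a hypothetical helper name.

```python
from collections import Counter

def text_dotplot(values):
    """Return the lines of a text dotplot for whole-number data."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    tallest = max(counts.values())
    lines = []
    # Build rows from the top of the tallest stack down to the scale.
    for level in range(tallest, 0, -1):
        row = "".join(" * " if counts[v] >= level else "   "
                      for v in range(lo, hi + 1))
        lines.append(row.rstrip())
    # The horizontal scale: one 3-character column per possible value.
    lines.append("".join(f"{v:^3}" for v in range(lo, hi + 1)))
    return lines

sample = [6, 8, 6, 5, 4, 7, 9, 4, 5]   # illustrative values only
dot_lines = text_dotplot(sample)
print("\n".join(dot_lines))
```

Repeated values show up as taller stacks, so the number of `*` characters always equals the number of observations.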
Dotplot
What to Look For
 • A representative or typical value (center)
   in the data set
 • The extent to which the data values
   spread out
 • The nature of the distribution (shape)
   along the number line
 • The presence of unusual values (gaps and
   outliers)

An outlier is an unusually large or small data value.
A precise rule for deciding when an observation
is an outlier is given in Chapter 3.

What we look for with univariate, numerical data
sets is similar for dotplots, stem-and-leaf
displays, and histograms.
 Professor Norm gave a 10-question quiz last
 week in his introductory statistics class. The
 number of correct answers for each student is
 recorded below.

          6    8     6       5       4       7       9        4   5
          8    5     4       6       7       7       3        8   7
          6    7     6       6       6       5       5        9

First draw a horizontal line with an appropriate scale.
The first three observations are plotted; note that you
stack the points if values are repeated.
This is the completed dotplot.

[Dotplot: Number of correct answers, scale from 2 to 10]

Write a sentence or two describing this distribution.
What to Look For
• The representative or typical value (center) in the data set
• The extent to which the data values spread out
• The nature of the distribution (shape) along the number line
• The presence of unusual values

A symmetrical distribution is one that has a vertical line of
symmetry where the left half is a mirror image of the right half.

If we draw a curve, smoothing out this dotplot, we will see that
there is ONLY one peak.
Distributions with a single peak are said to be unimodal.
Distributions with two peaks are bimodal, and with more than
two peaks are multimodal.

[Dotplot: Number of correct answers, scale from 2 to 10]

 The center for the distribution of the number of
 correct answers is about 6. There is not a lot of
 variability in the observations. The distribution
 is approximately symmetrical with no unusual
 observations.
Comparative Dotplots
When to Use    Univariate, numerical data with
               observations from 2 or more groups

How to construct
  • Constructed using the same numerical scale
    for two or more dotplots
  • Be sure to include group labels for the
    dotplots in the display

What to Look For
    Comment on the same four attributes, but
    comparing the dotplots displayed.
In another introductory statistics class,
Professor Skew also gave a 10-question quiz. The
number of correct answers for each student is
recorded below.

      6    8    10     8     8      7    9     8   10
      7    9     8     8     8      7    7     3    7
      8    7     6     6     6      5    5     9    8

Is the distribution for Prof. Skew
symmetric? Why or why not?

Notice that the left side (or lower tail) of the
distribution is longer than the right side (or
upper tail). This distribution is said to be
negatively skewed (or skewed to the left).

Distributions where the right tail is longer
than the left are said to be positively skewed
(or skewed to the right).

The direction of skewness is always in the
direction of the longer tail.

Create a comparative dotplot with the data
sets from the two statistics classes,
Professors Norm and Skew.

[Comparative dotplot: Prof. Skew (top) and Prof. Norm (bottom), Number of correct answers, scale from 2 to 10]

Write a few sentences comparing these distributions.

The center of the distribution for the number
of correct answers in Prof. Skew’s class is
larger than the center of Prof. Norm’s class.
There is also more variability in Prof. Skew’s
distribution. Prof. Skew’s distribution
appears to have an unusual observation where
one student only had 2 answers correct while
there were no unusual observations in Prof.
Norm’s class. The distribution for Prof. Skew
is negatively skewed while Prof. Norm’s
distribution is more symmetrical.
Stem-and-Leaf Displays
When to Use          Univariate, Numerical data

Stem-and-leaf displays are an effective way to
summarize univariate numerical data when the
data set is not too large.

Each observation is split into two parts:
   Stem - consists of the first digit(s)
   Leaf - consists of the final digit(s)

How to construct
  • Select one or more of the leading digits for
    the stem
  • List the possible stem values in a vertical
    column
  • Record the leaf for each observation
    beside the corresponding stem value
  • Indicate the units for stems and leaves
    someplace in the display

Be sure to list every stem from the smallest
to the largest value.
Stem-and-Leaf Displays
What to Look For
 • A representative or typical value (center)
   in the data set
 • The extent to which the data values
   spread out
 • The presence of unusual values (gaps and
   outliers)
 • The extent of symmetry in the data
   distribution
 • The number and location of peaks
The article “Going Wireless” (AARP Bulletin, June
2009) reported the estimated percentage of
households with only wireless phone service (no
landline) for the 50 U.S. states and the District of
Columbia. Data for the 19 Eastern states are given
here.

   5.6    5.7   20.0   16.8   16.5   13.4   10.8    9.3   11.6    8.0
  11.4   16.3   14.0   10.8    7.8   20.6   10.8    5.1   11.6

What is the variable of interest?
   Wireless percent

A stem-and-leaf display is an appropriate way to summarize these data.
(A dotplot would also be a reasonable choice.)

Let 5.6% be represented as 05.6% so that all numbers have two
digits in front of the decimal. If we use the 2-digits, we would
have stems from 05 to 20; that’s way too many stems!
So let’s just use the first digit (tens) as our stems.
So the leaf will be the last two digits.

With 05.6%, the leaf is 5.6 and it will be written behind the
stem 0. For the second number, 5.7 also is written behind the
stem 0 (with a comma between).
What is the leaf for 20.0% and where should that leaf be written?

The completed stem-and-leaf display is shown below.

0   5.6, 5.7, 9.3, 8.0, 7.8, 5.1
1   6.8, 6.5, 3.4, 0.8, 1.6, 1.4, 6.3, 4.0, 0.8, 0.8, 1.6
2   0.0, 0.6

However, it is somewhat difficult to read due to the 2-digit leaves.
A common practice is to drop all but the first digit in the
leaf. This makes the display easier to read, and DOES NOT
change the overall distribution of the data set.

0   5 5 9 8 7 5
1   6 6 3 0 1 1 6 4 0 0 1
2   0 0
Going Wireless continued (the 19 Eastern states)

While it is not necessary to write the leaves in order
from smallest to largest, by doing so, the center of
the distribution is more easily seen.

 0 555789
 1 00011134666
 2 00
   Stem: tens
   Leaf: ones

Write a few sentences describing this distribution.

The center of the distribution for the estimated percentage
of households with only wireless phone service is
approximately 11%. There does not appear to be much
variability. This display appears to be a unimodal,
symmetric distribution with no outliers.
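The ordered display for the Eastern states can be reproduced with a short sketch. It follows the slide's convention of tens as stems and ones as leaves, dropping the tenths digit rather than rounding; the function name `stem_and_leaf` is made up for this example.

```python
from collections import defaultdict

# Wireless-only percentages for the 19 Eastern states, from the slide.
eastern = [5.6, 5.7, 20.0, 16.8, 16.5, 13.4, 10.8, 9.3, 11.6, 8.0,
           11.4, 16.3, 14.0, 10.8, 7.8, 20.6, 10.8, 5.1, 11.6]

def stem_and_leaf(values):
    """Stem = tens digit, leaf = ones digit (tenths are dropped, not rounded)."""
    leaves = defaultdict(list)
    for value in values:
        whole = int(value)                 # truncate: 16.8 -> 16
        leaves[whole // 10].append(whole % 10)
    # List every stem from smallest to largest, even one with no leaves.
    stems = range(min(leaves), max(leaves) + 1)
    return {stem: sorted(leaves[stem]) for stem in stems}

display = stem_and_leaf(eastern)
for stem, leaf_list in display.items():
    print(f"{stem} | {''.join(str(d) for d in leaf_list)}")
print("Stem: tens")
print("Leaf: ones")
```

Sorting the leaves is optional, as the slide notes, but it makes the center easier to spot.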
Comparative Stem-and-Leaf Displays
When to Use     Univariate, numerical data with
                observations from 2 or more groups


How to construct
  • List the leaves for one data set to the right
    of the stems
  • List the leaves for the second data set to the
    left of the stems
  • Be sure to include group labels to identify
    which group is on the left and which is on the
    right
The article “Going Wireless” (AARP Bulletin, June
  2009) reported the estimated percentage of
  households with only wireless phone service (no
  landline) for the 50 U.S. states and the District of
  Columbia. Data for the 13 Western states are given
  here.

          11.7   18.9   9.0   16.7     8.0   22.1   9.2   10.8
          21.1   17.7 25.5 16.3      11.4

Create a comparative stem-and-leaf display comparing the
distributions of the Eastern and Western states.

              Western States       Eastern States
                         998   0   555789
                     8766110   1   00011134666
                         521   2   00
                                   Stem: tens
                                   Leaf: ones

Write a few sentences comparing these distributions.

 The center of the distribution of the estimated
 percentage of households with only wireless phone service
 for the Western states is a little larger than the center
 for the Eastern states. Both distributions are
 symmetrical with approximately the same amount of
 variability.
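A back-to-back display like the one for the Western and Eastern states can be sketched as below. The data are the two lists given on the slides; the helper names `leaves_by_stem` and `comparative_rows` are hypothetical, and leaves are again truncated to the ones digit.

```python
western = [11.7, 18.9, 9.0, 16.7, 8.0, 22.1, 9.2, 10.8,
           21.1, 17.7, 25.5, 16.3, 11.4]
eastern = [5.6, 5.7, 20.0, 16.8, 16.5, 13.4, 10.8, 9.3, 11.6, 8.0,
           11.4, 16.3, 14.0, 10.8, 7.8, 20.6, 10.8, 5.1, 11.6]

def leaves_by_stem(values):
    """Map each tens-digit stem to its sorted ones-digit leaves."""
    result = {}
    for value in values:
        whole = int(value)                 # drop the tenths digit
        result.setdefault(whole // 10, []).append(whole % 10)
    return {stem: sorted(leaf_list) for stem, leaf_list in result.items()}

def comparative_rows(left, right):
    """Return (left leaves, stem, right leaves) for every stem in range."""
    left_leaves, right_leaves = leaves_by_stem(left), leaves_by_stem(right)
    stems = range(min(min(left_leaves), min(right_leaves)),
                  max(max(left_leaves), max(right_leaves)) + 1)
    rows = []
    for stem in stems:
        # Left-side leaves grow outward from the stem, so they read right to left.
        left_text = "".join(str(d) for d in reversed(left_leaves.get(stem, [])))
        right_text = "".join(str(d) for d in right_leaves.get(stem, []))
        rows.append((left_text, stem, right_text))
    return rows

rows = comparative_rows(western, eastern)
print(f"{'Western States':>14}     Eastern States")
for left_text, stem, right_text in rows:
    print(f"{left_text:>14} | {stem} | {right_text}")
```

Sharing one column of stems is what lets the two groups be compared row by row.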
Histograms
When to Use             Univariate numerical data

Dotplots and stem-and-leaf displays are not effective ways to
summarize numerical data when the data set contains a large
number of data values.
Histograms are displays that don’t work well for small data
sets but do work well for larger numerical data sets.

Constructed differently for discrete versus continuous data

How to construct        Discrete data
Discrete numerical data almost always result from counting. In
such cases, each observation is a whole number.
 • Draw a horizontal scale and mark it with the possible
   values for the variable
 • Draw a vertical scale and mark it with frequency or
   relative frequency
 • Above each possible value, draw a rectangle centered
   at that value with a height corresponding to its
   frequency or relative frequency

What to look for
  Center or typical value; spread; general shape
  and location and number of peaks; and gaps or
  outliers
Queen honey bees mate shortly after they become adults.
During a mating flight, the queen usually takes multiple
partners, collecting sperm that she will store and use
throughout the rest of her life.
A paper, “The Curious Promiscuity of Queen Honey Bees”
(Annals of Zoology [2001]: 255-265), provided the
following data on the number of partners for 30 queen
bees.

12    2     4       6   6     7        8       7        8     11
8     3     5       6   7     10       1       9        7     6
9     7     5       4   7     4        6       7        8    10
Here is a dotplot of these data.

[Dotplot: Number of partners, scale from 2 to 12]

Queen honey bees continued
The variable, number of partners, is discrete. To create a
histogram: we already have a horizontal axis; we need to add
a vertical axis for frequency.
The bars should be centered over the discrete data values and
have heights corresponding to the frequency of each data value.

[Histogram: Frequency (0 to 6) versus Number of partners (2 to 12)]

We built the histogram on top of the dotplot to show that the
bars are centered over the discrete data values and that the
heights of the bars are the frequency of each data value.
In practice, histograms for discrete data ONLY show the
rectangular bars.

The distribution for the number of partners of queen
honey bees is approximately symmetric with a center
at 7 partners and a somewhat large amount of
variability. There don’t appear to be any outliers.

Here are two histograms showing the
“queen bee data set”. One uses frequency
on the vertical axis, while the other uses
relative frequency.
What do you notice about the shapes of
these two histograms?
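The frequencies behind the discrete histogram can be tallied with `collections.Counter`. This sketch uses the 30 queen bee observations from the slide and prints one text bar per possible value; the variable names are illustrative only.

```python
from collections import Counter

# Number of partners for the 30 queen bees, from the slide.
partners = [12, 2, 4, 6, 6, 7, 8, 7, 8, 11,
            8, 3, 5, 6, 7, 10, 1, 9, 7, 6,
            9, 7, 5, 4, 7, 4, 6, 7, 8, 10]

frequency = Counter(partners)
n = len(partners)

# One row per possible value, including values that never occur.
for value in range(min(partners), max(partners) + 1):
    count = frequency[value]
    print(f"{value:>2} | {'#' * count:<8} {count}  {count / n:.3f}")
```

Each relative frequency is just the frequency divided by 30, so the two versions of the histogram have the same shape and differ only in the vertical scale.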
Histograms with equal width intervals
When to Use             Univariate numerical data
How to construct        Continuous data
  • Mark the boundaries of the class intervals on the
    horizontal axis
  • Use either frequency or relative frequency on the
    vertical axis
  • Draw a rectangle for each class interval directly above
    that interval. The height of each rectangle is the
    frequency or relative frequency of the corresponding
    interval

What to look for
 Center or typical value; spread; general shape and
 location and number of peaks; and gaps or outliers
 Consider the following data on carry-on luggage
 weight for 25 airline passengers.

    25.0   17.9   10.1   27.6   30.0   18.0   28.7   28.2   27.8
    28.0   20.9   33.8   31.4   27.6   21.9   19.9   20.8   28.5
    22.4   24.9   26.4   22.0   34.5   22.7   25.3

This is a continuous numerical data set.
Here is a dotplot of this data set.
Looking at this dotplot, it is easy to see that we
could use intervals with a width of 5.

With continuous data, the rectangular bars cover
an interval of data values (not just one value).
This interval includes 10 and all values up to but not
including 15. The next interval will include 15 and
all values up to but not including 20, and so on.

The top dotplot shows all the data values in each
interval stacked in the middle of the interval.
From the dotplot, it is easy to see how the
continuous histogram is created.
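Counting how many observations fall in each equal-width interval is the core of the continuous histogram. The sketch below does this for the 25 carry-on weights as read from the slide, with intervals of width 5 starting at 10; `interval_counts` is a hypothetical helper and assumes every value lies inside the chosen range.

```python
# Carry-on luggage weights for 25 passengers, as read from the slide.
weights = [25.0, 17.9, 10.1, 27.6, 30.0, 18.0, 28.7, 28.2, 27.8,
           28.0, 20.9, 33.8, 31.4, 27.6, 21.9, 19.9, 20.8, 28.5,
           22.4, 24.9, 26.4, 22.0, 34.5, 22.7, 25.3]

def interval_counts(values, start, width, n_intervals):
    """Count values in [start, start+width), [start+width, start+2*width), ..."""
    counts = [0] * n_intervals
    for v in values:
        # A value on a boundary (e.g. 15) belongs to the interval that starts there.
        index = int((v - start) // width)
        counts[index] += 1
    return counts

luggage_counts = interval_counts(weights, start=10, width=5, n_intervals=5)
for i, count in enumerate(luggage_counts):
    lower = 10 + 5 * i
    print(f"{lower:>2} to < {lower + 5:<2} | {'#' * count} {count}")
```

The floor division implements the slide's rule that an interval includes its lower boundary but not its upper one.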
Comparative Histograms
• Must use two separate histograms with the
  same horizontal axis and relative frequency on
  the vertical axis

   The article “Early Television Exposure and
   Subsequent Attention Problems in Children”
   (Pediatrics, April 2004) investigated the television
   viewing habits of U.S. children. These graphs show
   the viewing habits of 1-year old and 3-year old
   children.

[Histograms: 1-yr-olds and 3-yr-olds]

    The biggest difference between the two histograms
    is at the low end, with a much higher proportion of
    3-year-old children falling in the 0-2 TV hours interval
    than 1-year-old children.
Histograms with unequal width intervals
When to use
    when you have a concentration of data in the
    middle with some extreme values


How to construct
     construct similar to histograms with
     continuous data, but with density on the
     vertical axis

     density = (relative frequency for interval) / (width of interval)
When people are asked for values such as age or weight,
  they sometimes shade the truth in their responses. The
  article “Self-Report of Academic Performance” (Social
  Methods and Research [November 1981]: 165-185) focused
  on SAT scores and grade point average (GPA). For each
  student in the sample, the difference between reported GPA
  and actual GPA was determined. Positive differences
  resulted from individuals reporting GPAs larger than the
  correct value.

Class Interval     Relative Frequency
-2.0 to < -0.4          0.023
-0.4 to < -0.2          0.055
-0.2 to < -0.1          0.097
-0.1 to < 0             0.210
 0 to < 0.1             0.189
 0.1 to < 0.2           0.139
 0.2 to < 0.4           0.116
 0.4 to < 2.0           0.171

When using relative frequency on the vertical axis,
the proportional area principle is violated.
Notice the relative frequency for the interval 0.4 to
< 2.0 is smaller than the relative frequency for the
interval -0.1 to < 0, but the area of the bar is MUCH
larger.
GPAs continued
  To fix this problem, we need to find the
  density of each interval.

  density = (relative frequency for interval) / (width of interval)

Class Interval     Relative Frequency   Width   Density
-2.0 to < -0.4          0.023            1.6     0.014
-0.4 to < -0.2          0.055            0.2     0.275
-0.2 to < -0.1          0.097            0.1     0.970
-0.1 to < 0             0.210            0.1     2.100
 0 to < 0.1             0.189            0.1     1.890
 0.1 to < 0.2           0.139            0.1     1.390
 0.2 to < 0.4           0.116            0.2     0.580
 0.4 to < 2.0           0.171            1.6     0.107

[Histogram with density on the vertical axis]
This is a correct histogram with unequal widths.
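The density column follows directly from the formula, as the sketch below shows. The relative frequencies are the slide's values; the third interval is taken as -0.2 to < -0.1, which is what its listed width of 0.1 implies. The tuple layout and variable names are choices made for this example.

```python
# (lower, upper, relative frequency) for each class interval on the slide.
# The third interval is taken as -0.2 to < -0.1, matching its width of 0.1.
intervals = [
    (-2.0, -0.4, 0.023),
    (-0.4, -0.2, 0.055),
    (-0.2, -0.1, 0.097),
    (-0.1, 0.0, 0.210),
    (0.0, 0.1, 0.189),
    (0.1, 0.2, 0.139),
    (0.2, 0.4, 0.116),
    (0.4, 2.0, 0.171),
]

# density = relative frequency for interval / width of interval
densities = [round(rel / (upper - lower), 3) for lower, upper, rel in intervals]

for (lower, upper, rel), density in zip(intervals, densities):
    print(f"{lower:>4} to < {upper:<4}  {rel:.3f}  {upper - lower:.1f}  {density:.3f}")
```

With density on the vertical axis, each bar's area (density times width) equals its relative frequency, so the areas add up to 1 and the proportional area principle holds.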
Displaying Bivariate
Numerical Data

            Scatterplots
        Time Series Plots
Scatterplots
When to Use             Bivariate Numerical data

How to construct
  1. Draw horizontal and vertical axes. Label the
     horizontal axis and include an appropriate scale for
     the x-variable. Label the vertical axis and include
     an appropriate scale for the y-variable.
  2. For each (x, y) pair in the data set, add a dot in
     the appropriate location in the display.


What to look for
  Relationship between x and y
The accompanying table gives the cost (in
dollars) and an overall quality rating for 10
different brands of men’s athletic shoes
(www.consumerreports.org).
Cost     65   45   45   80    110   110   30   80   110   70
Rating   71   70   62   59     58   57    56   52   51    51

Is there a relationship between x = cost and
y = quality rating?


                             A scatterplot can help
                              answer this question
Cost     65   45   45   80    110   110   30   80   110   70
Rating   71   70   62   59     58   57    56   52   51    51

First, draw and label appropriate horizontal and vertical axes.
Next, plot each (x, y) pair.
Here is the completed scatterplot.

[Scatterplot: Rating (vertical axis) versus Cost (horizontal axis)]

There appears to be a negative relationship between cost of athletic
shoes and their quality rating. Does that surprise you?
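The visual impression can be checked numerically. The sketch below pairs the shoe data into (x, y) points, as step 2 of the construction does, and computes the correlation coefficient r (the same statistic quoted on the ministers-and-rum slide later in this chapter). The function name `pearson_r` is an illustrative choice, and using r as the summary is an assumption of this sketch, not something the slide prescribes.

```python
from math import sqrt

cost = [65, 45, 45, 80, 110, 110, 30, 80, 110, 70]
rating = [71, 70, 62, 59, 58, 57, 56, 52, 51, 51]

# Each brand contributes one (x, y) point to the scatterplot.
points = list(zip(cost, rating))


def pearson_r(xs, ys):
    """Correlation coefficient r, computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(cost, rating)
print(points[:3])
print(f"r = {r:.2f}")  # a negative value agrees with the downward pattern
```

A negative r is consistent with the plot, but its size also shows that the relationship is far from perfect.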
Time Series Plots
When to Use       Bivariate data with time and
                  another variable
How to construct
  1. Draw horizontal and vertical axes. Label the
     horizontal axis and include an appropriate scale
     for the x-variable. Label the vertical axis and
     include an appropriate scale for the y-variable.
  2. For each (x, y) pair in the data set, add a dot in
     the appropriate location in the display.
  3. Connect the dots in time order.

What to look for
    trends or patterns over time
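The "connect in order" step can be sketched as follows. The observations here are hypothetical values made up for illustration; they do not come from any data set in this chapter.

```python
# Hypothetical (year, value) observations, deliberately listed out of order.
observations = [(2003, 14.2), (2001, 12.5), (2004, 13.8), (2002, 13.1)]

# The dots are connected in time order, so sort by the time variable first.
ordered = sorted(observations, key=lambda pair: pair[0])

# Each line segment joins one observation to the next one in time.
segments = list(zip(ordered, ordered[1:]))

for (t0, y0), (t1, y1) in segments:
    direction = "up" if y1 > y0 else "down" if y1 < y0 else "flat"
    print(f"{t0} -> {t1}: {direction}")
```

Reading the direction of each segment from left to right is a simple way to describe the trend over time.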
The Christmas Price Index is computed each year by
PNC Advisors. It is a humorous look at the cost of
giving all the gifts described in the popular Christmas
song “The Twelve Days of Christmas”
(www.pncchristmaspriceindex.com).


                                          Describe any
                                          trends or
                                          patterns
                                          that you see.


                             Why is there a downward
                           trend between 1993 & 1995?
Graphical Displays
in the Media

               Pie Charts
     Segmented Bar Charts
Pie (Circle) Chart
When to Use            Categorical data

How to construct
  • A circle is used to represent the whole data set.

  • “Slices” of the pie represent the categories

  • The size of a particular category’s slice is
    proportional to its frequency or relative
    frequency.

  • Most effective for summarizing data sets when
    there are not too many categories
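Because a slice's area is proportional to its central angle, slice sizes can be worked out as angles. This is a minimal sketch that reuses the motorcycle helmet counts from the bar chart example earlier in the chapter; the variable names are illustrative.

```python
# Helmet-use counts from the bar chart example earlier in the chapter.
frequencies = {"N": 731, "NC": 153, "C": 816}

total = sum(frequencies.values())

# Slice angle = relative frequency * 360 degrees.
slice_angles = {cat: count / total * 360 for cat, count in frequencies.items()}

for cat, angle in slice_angles.items():
    rel_freq = frequencies[cat] / total
    print(f"{cat}: relative frequency {rel_freq:.3f}, angle {angle:.1f} degrees")
```

The angles add up to 360 degrees, just as the relative frequencies add up to 1.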
Pie (Circle) Chart
The article “Fred Flintstone, Check Your Policy” (The Washington
Post, October 2, 2005) summarized a survey of 1014 adults
conducted by the Life and Health Insurance Foundation for
Education. Each person surveyed was asked to select which of five
fictional characters had the greatest need for life insurance:
Spider-Man, Batman, Fred Flintstone, Harry Potter, and Marge
Simpson. The data are summarized in the pie chart.
                                  The survey results were quite
                                 different from the assessment
                                     of an insurance expert.

                                The insurance expert felt that
                                Batman, a wealthy bachelor, and
                                  Spider-Man did not need life
                                   insurance as much as Fred
                                 Flintstone, a married man with
                                          dependents!
Segmented (or Stacked) Bar Charts
  A pie chart can be difficult to construct by
  hand. The circular shape sometimes makes
  it difficult to compare areas for different
  categories, particularly when the relative
  frequencies are similar.
  So, we could use a segmented bar chart.

When to Use            Categorical data

How to construct
  • Use a rectangular bar rather than a circle
     to represent the entire data set.
  • The bar is divided into segments, with
     different segments representing
     different categories.
  • The area of the segment is proportional to
     the relative frequency for the particular
     category.
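The construction above can be sketched as a calculation of where each segment starts and ends along a bar of length 1. The counts are the student responses from the "ideal distance from home" example earlier in the chapter; the bar length of 1 and the variable names are assumptions made for this illustration.

```python
# Student responses from the "ideal distance from home" example.
frequencies = {
    "Less than 250 miles": 4450,
    "250 to 500 miles": 3942,
    "500 to 1000 miles": 2416,
    "More than 1000 miles": 1907,
}

total = sum(frequencies.values())

# Walk along a bar of length 1, giving each category a segment whose
# length equals its relative frequency.
segments = []
start = 0.0
for category, count in frequencies.items():
    length = count / total
    segments.append((category, start, start + length))
    start += length

for category, left, right in segments:
    print(f"{category:22s} from {left:.2f} to {right:.2f}")
```

Since the bar has constant width, making segment length proportional to relative frequency also makes segment area proportional to it.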
Segmented (or Stacked) Bar Charts
Each year, the Higher Education Research Institute
conducts a survey of college seniors. In 2008,
approximately 23,000 seniors participated in the survey
(“Findings from the 2008 Administration of the College
Senior Survey,” Higher Education Research Institute,
June 2009).
This segmented bar
chart summarizes
student responses to
the question: “During
the past year, how much
time did you spend
studying and doing
homework in a typical
week?”
Common Mistakes
Avoid these Common Mistakes
 1. Areas should be proportional to frequency,
    relative frequency, or magnitude of the
    number being represented.

 The eye is naturally drawn to
 large areas in graphical displays.
 Sometimes, in an effort to make
 the graphical displays more
 interesting, designers lose sight
 of this important principle.
 Consider this graph (USA Today,
 October 3, 2002).

 By replacing the bars of a bar
 chart with milk buckets,
 areas are distorted.
 The two buckets for 1980
 represent 32 cows, whereas
 the one bucket for 1970
 represents 19 cows.
Avoid these Common Mistakes
1. Areas should be proportional to frequency,
   relative frequency, or magnitude of the
   number being represented.

Another common distortion
occurs when a third
dimension is added to bar
charts or pie charts. This
distorts the areas and
makes it much more
difficult to interpret.
Avoid these Common Mistakes
2. Be cautious of graphs with broken axes (axes
   that don’t start at 0).

• The use of broken axes in a scatterplot does not result
  in a misleading picture of the relationship of bivariate
  data.

• In time series plots, broken axes can sometimes
  exaggerate the magnitude of change over time.

• In bar charts and histograms, the vertical axis should
  NEVER be broken. This violates the “proportional
  area” principle.
Avoid these Common Mistakes
2. Be cautious of graphs with broken axes (axes
   that don’t start at 0).

This bar chart is similar to
one in an advertisement for
a software product designed
to raise student test scores.
Areas of the bars are not
proportional to the
magnitude of the numbers
represented: the area of
the rectangle representing
68 is more than three times
the area of the rectangle
representing 55!
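The size of this distortion can be illustrated with a small calculation. The slide does not say where the vertical axis of the advertisement starts, so the starting value of 50 below is a hypothetical choice; with bars of equal width, the ratio of drawn areas equals the ratio of drawn heights.

```python
def drawn_height_ratio(larger, smaller, axis_start):
    """Ratio of bar heights as drawn when the vertical axis begins at axis_start."""
    return (larger - axis_start) / (smaller - axis_start)


# Axis starting at 0: the drawn ratio matches the ratio of the numbers.
true_ratio = drawn_height_ratio(68, 55, axis_start=0)

# Hypothetical broken axis starting at 50: the taller bar looks far larger.
broken_ratio = drawn_height_ratio(68, 55, axis_start=50)

print(f"axis from 0:  {true_ratio:.2f}")
print(f"axis from 50: {broken_ratio:.2f}")
```

Under this assumption, a score that is about 1.24 times as large is drawn as a bar 3.6 times as tall.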
Avoid these Common Mistakes
3. Watch out for unequal time spacing in time
   series plots.

If observations over time are not made at regular
time intervals, special care must be taken in
constructing the time series plot.

Notice that the intervals between observations are
irregular, yet the points in the plot are equally spaced
along the time axis. This makes it difficult to assess
the rate of change over time.

Here is a correct time series plot.
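The effect of unequal spacing can be sketched with a few made-up observations. The data below are hypothetical and were chosen so that the true rate of change is constant; all names are illustrative.

```python
# Hypothetical observations taken at irregular times: (year, value).
observations = [(1990, 10.0), (1995, 20.0), (1996, 22.0), (2006, 42.0)]

# Correct plot: the horizontal position of each point is its actual time.
correct_x = [year for year, _ in observations]

# Misleading plot: points are placed one step apart regardless of the time gap.
misleading_x = list(range(len(observations)))

pairs = list(zip(observations, observations[1:]))

# Rate of change per year between consecutive observations.
rates = [(y1 - y0) / (t1 - t0) for (t0, y0), (t1, y1) in pairs]

# Change per plotted step, which is what the equally spaced version shows.
step_changes = [y1 - y0 for (_, y0), (_, y1) in pairs]

print("per-year rates:", rates)
print("per-step changes:", step_changes)
```

Here the value rises steadily by 2 units per year, yet the equally spaced plot would show jumps of 10, 2, and 20, suggesting changes in the rate that are not there.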
Avoid these Common Mistakes
 4. Be careful how you interpret patterns in
    scatterplots.

 Consider the following scatterplot showing the relationship between
 the number of Methodist ministers in New England and the amount
 of Cuban rum imported into Boston from 1860 to 1940
 (Education.com).

 [Scatterplot: Number of Barrels of Imported Rum (vertical axis, 0 to 35000)
 versus Number of Methodist Ministers (horizontal axis, 0 to 300)]

 r = .999973

 Does an increase in the number of Methodist
 ministers CAUSE the increase in imported rum?

 A strong pattern in a scatterplot means that
 the two variables tend to vary together in a
 predictable way, BUT it does not mean that there
 is a cause-and-effect relationship.
Avoid these Common Mistakes
5. Make sure that a graphical display creates
   the right first impression.

Consider the following graph
from USA Today (June 25,
2001). Although this graph
does not violate the
proportional area principle,
the way the “bar” for the
none category is displayed
makes this graph difficult to
read. A quick glance at this
graph may leave the reader
with an incorrect impression.

Ap stats chapter 2 miller revised

  • 1. Chapter 2 Graphical Methods for Describing Data Distributions Created by Kathy Fritz / Revised by S. Miller September 2012
  • 2. Variable • any characteristic whose value may change from one individual to another College Home
  • 3. Data • The values for a variable from individual observations
  • 4. Suppose that a PE coach records the height of each student in his class. This is an example of univariate data Univariate – consist of observations on a single variable made on individuals in a sample or population
  • 5. Suppose that the PE coach records the height and weight of each student in his class. This is an example of bivariate data Bivariate - data that consist of pairs of numbers from two variables for each individual in a sample or population
  • 6. Suppose that the PE coach records the height, weight, number of sit-ups, and number of push-ups for each student in his class. This is an example of multivariate data Multivariate - data that consist of observations on two or more variables
  • 7. Two types of variables categorical numerical
  • 8. Categorical variables • Qualitative • Consist of categorical responses 1. Car model Which of They are all these 2. Birth year categorical variables are 3. Type of cell phone variables! NOT 4. Your zip code categorical 5. Which club you have joined variables?
  • 9. Numerical variables • quantitative It makes sense to perform math There operations on these values. are two types of numerical variables - • observations or measurements take on discrete and continuous numerical values 1. GPAs Which of these Does it makes sense variables are 2. Height of students to find an average NOT numerical? 3. Codes to combination locks code to combination 4. Number of text messages per day locks? 5. Weight of textbooks
  • 10. Two types of variables categorical numerical discrete continuous
  • 11. Discrete (numerical) • Isolated points along a number line • usually counts of items • Example: number of textbooks purchased
  • 12. Continuous (numerical) • Variable that can be any value in a given interval • usually measurements of something • Examples: GPAs or height or weight
  • 13. Are the following variables categorical or numerical (discrete or continuous)? 1. the color of cars in the teacher’s lot Categorical 2. the number of calculators owned by students at your college Discrete numerical 3. the zip code of an individual Categorical Is money a measurement or a count? 4. the amount of time it takes students to drive to school Continuous numerical 5. the appraised value of homes in your city Discrete numerical
  • 14. Graphical Display Variable Type Data Type Purpose Display data Bar Chart Use the following table to Univariate Categorical distribution Comparative Bar Chart determine2an appropriate 2 or more Univariate for or more groups Categorical Compare groups Dotplot graphical display a data set. data of Univariate What types Numerical Display graphs can be distribution Numerical used with Comparative Univariate for 2 or Compare 2 or more dotplot more groups groups Stem-and-leaf categorical Display data Univariate Numerical display data? distribution Comparative stem- Univariate for 2 Compare 2 or more and-leaf groups In section Numerical 2.3, we will groups Histogram Univariate see how the various Numerical Display data distribution graphical displays for Investigate Scatterplot Bivariate univariate,relationship between Numerical numerical data compare. 2 variables Univariate, collected Investigate trend Time series plot Numerical over time over time
  • 15. Displaying Categorical Data Bar Charts Comparative Bar Charts
  • 16. Bar Chart When to Use: Univariate, Categorical data To comply with new standards from the U. S. Department of This ischart is afrequency distribution. A bar called a graphical bottom of the Transportation, helmets should reach thedisplay for motorcyclist’s ears. The report “Motorcycle Helmet Use in 2005 – categorical data. Overall Results” (National Highway Traffic Safety Administration, Augustfrequency distribution is by observing 1700 A 2005) summarized data collected a table that displays the possible categories along motorcyclists nationwide at selected roadway locations. The frequency for a particular Each time a motorcyclist passed by,frequencies or whether with the associated the observer that noted the category is thehelmet (N), a noncompliant helmet (NC), rider was wearing no number of times or a compliant helmet (C). frequencies. set. relative in the data category appears Helmet Use Frequency The data are summarized in this N 731 table: NC 153 This should equal the total number of C 816 observations. 1700
  • 17. Bar Chart To compile with new standards from the U. S. Department of Transportation, helmets should reach the bottom of the motorcyclist’s ears. The report “Motorcycle Helmet Use in 2005 – Overall Results” (National Highway Traffic Safety Administration, August 2005) summarized data collected by observing 1700 motorcyclists nationwide at selected roadway locations. Each time a motorcyclist passed by, the observer noted whether the rider was wearing no helmet (N), a noncompliant helmet (NC), or a compliant helmet (C). The data are summarized in this Relative Helmet Use Helmet Use Frequency table: N 731 0.430 This should equal 1 NC 153 0.090 816 (allowing for rounding). C 0.480 1700 1.000
  • 18. Bar Chart How to construct 1. Draw a horizontal line; write the categories or All bars should have the same width so labels below the line at regularly spaced that both the height and the area of intervals the bar are proportional to the frequency or relative frequency of the 2. Draw a vertical line; label the scale using corresponding categories. frequency or relative frequency 3. Place a rectangular bar above each category label with a height determined by its frequency or relative frequency
  • 19. Bar Chart What to Look For Frequently or infrequently occurring categories Here is the completed bar chart for the motorcycle helmet data. Describe this graph.
  • 20. Comparative Bar Charts When to Use Univariate, Categorical data for Bar charts can two or more groups also be used to provide a visual You use relative frequency rather comparison of two or more groups. than frequency on the vertical axis How to constructyou can make meaningful so that comparisons even if the sample • Constructed by using the same horizontal and sizes are not the same. vertical axes for the bar charts of two or more groups • Usually color-coded to indicate which bars Why? correspond to each group • Should use relative frequencies on the vertical axis
  • 21. Each year the Princeton Review conducts a survey of students applying to college and of parents of college applicants. In 2009, 12,715 high school students responded to the question “Ideally how far from home would you like the college you attend to be?” Also, 3007 parents of students applying to college responded to the question “how far from home would you like the college yourshould you do first?Data What child attends to be?” are displayed in the frequency table below. Frequency Ideal Distance Students Parents Create a Less than 250 miles 4450 1594 comparative 250 to 500 miles 3942 902 bar chart 500 to 1000 miles 2416 331 with these data. More than 1000 miles 1907 180
  • 22. Relative frequency table (ideal distance: students, parents): Less than 250 miles: .35, .53; 250 to 500 miles: .31, .30; 500 to 1000 miles: .19, .11; More than 1000 miles: .15, .06. The student column is found by dividing each frequency by the total number of students; the parent column is found by dividing each frequency by the total number of parents. What does this graph show about the ideal distance college should be from home?
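The conversion from frequencies to relative frequencies on the previous two slides can be checked with a short Python sketch. The variable and function names here are illustrative choices, not part of the slides; the counts are the ones in the frequency table.

```python
# Frequencies copied from the slide's table (students, parents).
distances = ["Less than 250 miles", "250 to 500 miles",
             "500 to 1000 miles", "More than 1000 miles"]
students = [4450, 3942, 2416, 1907]
parents = [1594, 902, 331, 180]


def relative_frequencies(counts, digits=2):
    """Divide each count by its group total and round for display."""
    total = sum(counts)
    return [round(c / total, digits) for c in counts]


student_rel = relative_frequencies(students)
parent_rel = relative_frequencies(parents)

# Print the two columns side by side, as in the comparative table.
for label, s, p in zip(distances, student_rel, parent_rel):
    print(f"{label:22s} students={s:.2f} parents={p:.2f}")
```

Because each group is divided by its own total, the two columns are comparable even though there are about four times as many students as parents.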
  • 23. Displaying Numerical Data: Dotplots, Stem-and-leaf Displays, Histograms
  • 24. Dotplot When to Use: Univariate, numerical data. How to construct: 1. Draw a horizontal line and mark it with an appropriate numerical scale. 2. Locate each value in the data set along the scale and represent it by a dot. If there are two or more observations with the same value, stack the dots vertically.
  • 25. Dotplot What to Look For: • A representative or typical value (center) in the data set • The extent to which the data values spread out • The nature of the distribution (shape) along the number line • The presence of unusual values (gaps and outliers). An outlier is an unusually large or small data value. A precise rule for deciding when an observation is an outlier is given in Chapter 3. What we look for with univariate, numerical data sets is similar for dotplots, stem-and-leaf displays, and histograms.
  • 26. Professor Norm gave a 10-question quiz last week in his introductory statistics class. The number of correct answers for each student is recorded below. 6 8 6 5 4 7 9 4 5 8 5 6 4 7 7 3 8 7 6 7 6 6 6 5 5 9 First draw a horizontal line with an appropriate scale. The first three observations are plotted; note that you stack the points if values are repeated. This is the completed dotplot. Write a sentence or two describing this distribution. (Dotplot axis: Number of correct answers, 2 to 10.)
  • 27. What to Look For: • The representative or typical value (center) in the data set • The extent to which the data values spread out • The nature of the distribution (shape) along the number line • The presence of unusual values. Professor Norm gave a 10-question quiz last week in his introductory statistics class. The number of correct answers for each student is recorded below. (Dotplot axis: Number of correct answers, 2 to 10.) A symmetrical distribution is one that has a vertical line of symmetry where the left half is a mirror image of the right half. If we draw a curve, smoothing out this dotplot, we will see that there is ONLY one peak. Distributions with a single peak are said to be unimodal. Distributions with two peaks are bimodal, and with more than two peaks are multimodal. The center for the distribution of the number of correct answers is about 6. There is not a lot of variability in the observations. The distribution is approximately symmetrical with no unusual observations.
  • 28. Comparative Dotplots When to Use: Univariate, numerical data with observations from 2 or more groups. How to construct: • Constructed using the same numerical scale for two or more dotplots • Be sure to include group labels for the dotplots in the display. What to Look For: Comment on the same four attributes, but comparing the dotplots displayed.
  • 29. In another introductory statistics class, Professor Skew also gave a 10-question quiz. The number of correct answers for each student is recorded below. 8 6 10 8 8 7 9 8 10 7 9 8 8 8 7 7 3 7 8 7 6 6 6 5 5 9 8 Create a comparative dotplot with the data sets from the two statistics classes, Professors Norm and Skew. (Comparative dotplot: Prof. Skew and Prof. Norm; axis: Number of correct answers, 2 to 10.) Is the distribution for Prof. Skew’s class symmetric? Why or why not? Notice that the left side (or lower tail) of the distribution is longer than the right side (or upper tail). This distribution is said to be negatively skewed (or skewed to the left). Distributions where the right tail is longer than the left are said to be positively skewed (or skewed to the right). The direction of skewness is always in the direction of the longer tail. Write a few sentences comparing these distributions. The center of the distribution for the number of correct answers in Prof. Skew’s class is larger than the center of Prof. Norm’s class. There is also more variability in Prof. Skew’s distribution. Prof. Skew’s distribution appears to have an unusual observation where one student only had 2 answers correct while there were no unusual observations in Prof. Norm’s class. The distribution for Prof. Skew is negatively skewed while Prof. Norm’s distribution is more symmetrical.
  • 30. Stem-and-Leaf Displays When to Use: Univariate, numerical data. Stem-and-leaf displays are an effective way to summarize univariate numerical data when the data set is not too large. Each observation is split into two parts: Stem – consists of the first digit(s); Leaf – consists of the final digit(s). How to construct: • Select one or more of the leading digits for the stem • List the possible stem values in a vertical column (be sure to list every stem from the smallest to the largest value) • Record the leaf for each observation beside the corresponding stem value • Indicate the units for stems and leaves someplace in the display
  • 31. Stem-and-Leaf Displays What to Look For • A representative or typical value (center) in the data set • The extent to which the data values spread out • The presence of unusual values (gaps and outliers) • The extent of symmetry in the data distribution • The number and location of peaks
  • 32. The article “Going Wireless” (AARP Bulletin, June 2009) reported the estimated percentage of households with only wireless phone service (no landline) for the 50 U.S. states and the District of Columbia. Data for the 19 Eastern states are given here. 5.6 5.7 20.0 16.8 16.5 13.4 8.0 10.8 9.3 11.6 11.4 16.3 14.0 10.8 7.8 20.6 10.8 5.1 11.6 What is the variable of interest? Wireless percent. A stem-and-leaf display is an appropriate way to summarize these data. (A dotplot would also be a reasonable choice.) Let 5.6% be represented as 05.6% so that all numbers have two digits in front of the decimal. If we use the 2-digits, we would have stems from 05 to 20; that’s way too many stems! So let’s just use the first digit (tens) as our stems. So the leaf will be the last two digits. With 05.6%, the leaf is 5.6 and it will be written behind the stem 0. For the second number, 5.7 also is written behind the stem 0 (with a comma between). What is the leaf for 20.0% and where should that leaf be written? The completed stem-and-leaf display is shown below. 0 | 5.6, 5.7, 9.3, 8.0, 7.8, 5.1; 1 | 6.8, 6.5, 3.4, 0.8, 1.6, 1.4, 6.3, 4.0, 0.8, 0.8, 1.6; 2 | 0.0, 0.6. However, it is somewhat difficult to read due to the 2-digit leaves. A common practice is to drop all but the first digit in the leaf. This makes the display easier to read, and DOES NOT change the overall distribution of the data set. 0 | 5 5 9 8 7 5; 1 | 6 6 3 0 1 1 6 4 0 0 1; 2 | 0 0.
  • 33. The article “Going Wireless” (AARP Bulletin, June 2009) reported the estimated percentage of households with only wireless phone service (no landline) for the 50 U.S. states and the District of Columbia. Data for the 19 Eastern states are given here. 0 | 5 5 9 8 7 5; 1 | 6 6 3 0 1 1 6 4 0 0 1; 2 | 0 0. While it is not necessary to write the leaves in order from smallest to largest, by doing so, the center of the distribution is more easily seen. 0 | 5 5 5 7 8 9; 1 | 0 0 0 1 1 1 3 4 6 6 6; 2 | 0 0. Stem: tens; Leaf: ones. Write a few sentences describing this distribution. The center of the distribution for the estimated percentage of households with only wireless phone service is approximately 11%. There does not appear to be much variability. This display appears to be a unimodal, symmetric distribution with no outliers.
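The construction steps for the Eastern-states display can be sketched in Python as follows. This is a minimal illustration that assumes the tens digit is the stem and the ones digit is the leaf, with decimals dropped as on the slides; the function name `stem_and_leaf` is a hypothetical helper, not something from the source.

```python
from collections import defaultdict

# Wireless-only percentages for the 19 Eastern states, from the slide.
eastern = [5.6, 5.7, 20.0, 16.8, 16.5, 13.4, 8.0, 10.8, 9.3, 11.6,
           11.4, 16.3, 14.0, 10.8, 7.8, 20.6, 10.8, 5.1, 11.6]


def stem_and_leaf(values):
    """Tens digit is the stem; ones digit is the leaf (decimals dropped)."""
    leaves_by_stem = defaultdict(list)
    for v in values:
        whole = int(v)  # truncate, e.g. 16.8 -> 16
        leaves_by_stem[whole // 10].append(whole % 10)
    # List every stem from smallest to largest, with leaves sorted.
    stems = range(min(leaves_by_stem), max(leaves_by_stem) + 1)
    return {s: sorted(leaves_by_stem[s]) for s in stems}


display = stem_and_leaf(eastern)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Sorting the leaves is optional, as the slide notes, but it makes the middle value (the tenth of 19 observations) easy to locate.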
  • 34. Comparative Stem-and-Leaf Displays When to Use: Univariate, numerical data with observations from 2 or more groups. How to construct: • List the leaves for one data set to the right of the stems • List the leaves for the second data set to the left of the stems • Be sure to include group labels to identify which group is on the left and which is on the right
  • 35. The article “Going Wireless” (AARP Bulletin, June 2009) reported the estimated percentage of households with only wireless phone service (no landline) for the 50 U.S. states and the District of Columbia. Data for the 13 Western states are given here. 11.7 18.9 9.0 16.7 8.0 22.1 9.2 10.8 21.1 17.7 25.5 16.3 11.4 Create a comparative stem-and-leaf display comparing the distributions for the Eastern and Western states. (Western States | stem | Eastern States) 998 | 0 | 555789; 8766110 | 1 | 00011134666; 521 | 2 | 00. Stem: tens; Leaf: ones. Write a few sentences comparing these distributions. The center of the distribution of the estimated percentage of households with only wireless phone service for the Western states is a little larger than the center for the Eastern states. Both distributions are symmetrical with approximately the same amount of variability.
  • 36. Histograms When to Use: Univariate numerical data. Dotplots and stem-and-leaf displays are not effective ways to summarize numerical data when the data set contains a large number of data values. Histograms are displays that don’t work well for small data sets but do work well for larger numerical data sets. Histograms are constructed differently for discrete versus continuous data. Discrete numerical data almost always result from counting. In such cases, each observation is a whole number. How to construct (discrete data): • Draw a horizontal scale and mark it with the possible values for the variable • Draw a vertical scale and mark it with frequency or relative frequency • Above each possible value, draw a rectangle centered at that value with a height corresponding to its frequency or relative frequency. What to look for: Center or typical value; spread; general shape and location and number of peaks; and gaps or outliers
  • 37. Queen honey bees mate shortly after they become adults. During a mating flight, the queen usually takes multiple partners, collecting sperm that she will store and use throughout the rest of her life. A paper, “The Curious Promiscuity of Queen Honey Bees” (Annals of Zoology [2001]: 255-265), provided the following data on the number of partners for 30 queen bees. 12 2 4 6 6 7 8 7 8 11 8 3 5 6 7 10 1 9 7 6 9 7 5 4 7 4 6 7 8 10 Here is a dotplot of these data. (Dotplot axis: Number of partners, 2 to 12.)
  • 38. Queen honey bees continued. The variable, number of partners, is discrete. To create a histogram: we already have a horizontal axis; we need to add a vertical axis for frequency. The bars should be centered over the discrete data values and have heights corresponding to the frequency of each data value. (Histogram axes: Number of partners, 2 to 12; Frequency, 0 to 6.) We built the histogram on top of the dotplot to show that the bars are centered over the discrete data values and that the heights of the bars are the frequency of each data value. In practice, histograms for discrete data ONLY show the rectangular bars. The distribution for the number of partners of queen honey bees is approximately symmetric with a center at 7 partners and a somewhat large amount of variability. There don’t appear to be any outliers.
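The bar heights for the discrete histogram are just the frequency of each value. A minimal Python sketch, using the 30 observations from the queen bee slide, tallies them and prints a rough text version of the bars; the text rendering is only an illustration, not how the slide's figure was produced.

```python
from collections import Counter

# Number of partners for 30 queen bees, from the slide.
partners = [12, 2, 4, 6, 6, 7, 8, 7, 8, 11, 8, 3, 5, 6, 7,
            10, 1, 9, 7, 6, 9, 7, 5, 4, 7, 4, 6, 7, 8, 10]

freq = Counter(partners)  # value -> how many bees had that many partners
n = len(partners)

# One row per possible value, so gaps (values with frequency 0) would show.
for value in range(min(partners), max(partners) + 1):
    count = freq[value]
    print(f"{value:2d} {'#' * count:8s} freq={count} rel={count / n:.3f}")
```

Dividing each count by `n` gives the relative frequency version; the shape of the display is unchanged because every bar is rescaled by the same factor.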
  • 39. Here are two histograms showing the “queen bee data set”. One uses frequency on the vertical axis, while the other uses relative frequency. What do you notice about the shapes of these two histograms?
  • 40. Histograms with equal width intervals When to Use Univariate numerical data How to construct Continuous data • Mark the boundaries of the class intervals on the horizontal axis • Use either frequency or relative frequency on the vertical axis • Draw a rectangle for each class interval directly above that interval. The height of each rectangle is the frequency or relative frequency of the corresponding interval What to look for Center or typical value; spread; general shape and location and number of peaks; and gaps or outliers
  • 41. Consider the following data on carry-on luggage weight for 25 airline passengers. 25.0 17.9 30.0 18.0 28.2 27.8 10.1 27.6 28.7 20.9 20.8 28.5 33.8 31.4 27.6 21.9 19.9 28.0 24.9 22.7 22.4 26.4 22.0 34.5 25.3 This is a continuous numerical data set. Here is a dotplot of this data set. Looking at this dotplot, it is easy to see that we could use intervals with a width of 5. The top dotplot shows all the data values in each interval stacked in the middle of the interval. With continuous data, the rectangular bars cover an interval of data values (not just one value). This interval includes 10 and all values up to but not including 15. The next interval will include 15 and all values up to but not including 20, and so on.
  • 42. From the dotplot, it is easy to see how the continuous histogram is created.
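The interval counts behind the continuous histogram can be sketched in Python. The sketch assumes the class intervals described on the luggage slide: width 5, starting at 10, each including its lower boundary but not its upper one. The helper name `interval_counts` is hypothetical.

```python
# Carry-on luggage weights for 25 passengers, from the slide.
weights = [25.0, 17.9, 30.0, 18.0, 28.2, 27.8, 10.1, 27.6, 28.7,
           20.9, 20.8, 28.5, 33.8, 31.4, 27.6, 21.9, 19.9, 28.0,
           24.9, 22.7, 22.4, 26.4, 22.0, 34.5, 25.3]


def interval_counts(values, start=10, width=5, n_bins=5):
    """Count values in [start, start+width), [start+width, ...), and so on.

    Assumes every value falls inside the covered range.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)  # e.g. 25.0 lands in [25, 30)
        counts[index] += 1
    return counts


counts = interval_counts(weights)
for i, c in enumerate(counts):
    low = 10 + 5 * i
    print(f"{low} to < {low + 5}: frequency {c}")
```

A value exactly on a boundary, such as 25.0 or 30.0, is counted in the interval that begins there, matching the "up to but not including" rule on the slide.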
  • 43. Comparative Histograms • Must use two separate histograms with the same horizontal axis and relative frequency on the vertical axis. The article “Early Television Exposure and Subsequent Attention Problems in Children” (Pediatrics, April 2004) investigated the television viewing habits of U.S. children. These graphs show the viewing habits of 1-year-old and 3-year-old children. (Histogram panels: 1-yr-olds, 3-yr-olds.) The biggest difference between the two histograms is at the low end, with a much higher proportion of 3-year-old children falling in the 0-2 TV hours interval than 1-year-old children.
  • 44. Histograms with unequal width intervals When to use: when you have a concentration of data in the middle with some extreme values. How to construct: construct similar to histograms with continuous data, but with density on the vertical axis, where density = (relative frequency for interval) / (width of interval)
  • 45. When people are asked for values such as age or weight, they sometimes shade the truth in their responses. The article “Self-Report of Academic Performance” (Social Methods and Research [November 1981]: 165-185) focused on SAT scores and grade point average (GPA). For each student in the sample, the difference between reported GPA and actual GPA was determined. Positive differences resulted from individuals reporting GPAs larger than the correct value. Class interval and relative frequency: -2.0 to < -0.4: 0.023; -0.4 to < -0.2: 0.055; -0.2 to < -0.1: 0.097; -0.1 to < 0: 0.210; 0 to < 0.1: 0.189; 0.1 to < 0.2: 0.139; 0.2 to < 0.4: 0.116; 0.4 to < 2.0: 0.171. When using relative frequency on the vertical axis, the proportional area principle is violated. Notice the relative frequency for the interval 0.4 to < 2.0 is smaller than the relative frequency for the interval -0.1 to < 0, but the area of the bar is MUCH larger.
  • 46. GPAs continued To fix this problem, we need to find the density of each interval: density = (relative frequency for interval) / (width of interval). Class interval: relative frequency, width, density: -2.0 to < -0.4: 0.023, 1.6, 0.014; -0.4 to < -0.2: 0.055, 0.2, 0.275; -0.2 to < -0.1: 0.097, 0.1, 0.970; -0.1 to < 0: 0.210, 0.1, 2.100; 0 to < 0.1: 0.189, 0.1, 1.890; 0.1 to < 0.2: 0.139, 0.1, 1.390; 0.2 to < 0.4: 0.116, 0.2, 0.580; 0.4 to < 2.0: 0.171, 1.6, 0.107. This is a correct histogram with unequal widths.
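The density column can be reproduced with a short Python sketch. The class boundaries and relative frequencies are those in the GPA table, with the third interval read as -0.2 to < -0.1 so that its width is 0.1; the variable names are illustrative.

```python
# (lower boundary, upper boundary, relative frequency) for each class interval.
classes = [
    (-2.0, -0.4, 0.023),
    (-0.4, -0.2, 0.055),
    (-0.2, -0.1, 0.097),
    (-0.1, 0.0, 0.210),
    (0.0, 0.1, 0.189),
    (0.1, 0.2, 0.139),
    (0.2, 0.4, 0.116),
    (0.4, 2.0, 0.171),
]

# density = relative frequency / width of interval
densities = [rel / (upper - lower) for lower, upper, rel in classes]

# With density as bar height, bar area = density * width = relative frequency,
# so the areas of all bars add up to 1.
total_area = sum(d * (upper - lower)
                 for d, (lower, upper, _) in zip(densities, classes))

for (lower, upper, rel), d in zip(classes, densities):
    print(f"{lower:5.1f} to < {upper:4.1f}  rel={rel:.3f}  density={d:.3f}")
print(f"total area = {total_area:.3f}")
```

The wide outer intervals end up with small densities (short bars), which is what restores the proportional area principle.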
  • 47. Displaying Bivariate Numerical Data: Scatterplots, Time Series Plots
  • 48. Scatterplots When to Use Bivariate Numerical data How to construct 1. Draw horizontal and vertical axes. Label the horizontal axis and include an appropriate scale for the x-variable. Label the vertical axis and include an appropriate scale for the y-variable. 2. For each (x, y) pair in the data set, add a dot in the appropriate location in the display. What to look for Relationship between x and y
  • 49. The accompanying table gives the cost (in dollars) and an overall quality rating for 10 different brands of men’s athletic shoes (www.consumerreports.org). Cost 65 45 45 80 110 110 30 80 110 70 Rating 71 70 62 59 58 57 56 52 51 51 Is there a relationship between x = cost and y = quality rating? A scatterplot can help answer this question
  • 50. Cost 65 45 45 80 110 110 30 80 110 70 Rating 71 70 62 59 58 57 56 52 51 51 Is there a relationship between x = cost and y = quality rating? First, draw and label appropriate horizontal and vertical axes. Next, plot each (x, y) pair. Here is the completed scatterplot. (Scatterplot axes: Cost, 20 to 100; Rating, 50 to 70.) There appears to be a negative relationship between cost of athletic shoes and their quality rating. Does that surprise you?
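One way to check the direction of the relationship seen in the scatterplot is to compute a correlation coefficient r from the ten (cost, rating) pairs. The slides themselves only read the pattern from the plot, so this Python sketch is a supplementary check rather than part of the original method; `pearson_r` is a hypothetical helper name.

```python
import math

# Cost (dollars) and quality rating for the 10 shoe brands, from the slide.
cost = [65, 45, 45, 80, 110, 110, 30, 80, 110, 70]
rating = [71, 70, 62, 59, 58, 57, 56, 52, 51, 51]


def pearson_r(x, y):
    """Pearson correlation computed from sums of deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(cost, rating)
print(f"r = {r:.3f}")  # the sign of r gives the direction of the relationship
```

A negative r agrees with the downward pattern in the plot, but with only ten points, and a value far from -1, the relationship is not a strong one.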
  • 51. Time Series Plots When to Use Bivariate data with time and another variable How to construct 1. Draw horizontal and vertical axes. Label the horizontal axis and include an appropriate scale for the x-variable. Label the vertical axis and include an appropriate scale for the y-variable. 2. For each (x, y) pair in the data set, add a dot in the appropriate location in the display. 3. Connect each dot in order What to look for trends or patterns over time
  • 52. The Christmas Price Index is computed each year by PNC Advisors. It is a humorous look at the cost of giving all the gifts described in the popular Christmas song “The Twelve Days of Christmas” (www.pncchristmaspriceindex.com). Describe any trends or patterns that you see. Why is there a downward trend between 1993 & 1995?
  • 53. Graphical Displays in the Media: Pie Charts, Segmented Bar Charts
  • 54. Pie (Circle) Chart When to Use Categorical data How to construct • A circle is used to represent the whole data set. • “Slices” of the pie represent the categories • The size of a particular category’s slice is proportional to its frequency or relative frequency. • Most effective for summarizing data sets when there are not too many categories
  • 55. Pie (Circle) Chart The article “Fred Flintstone, Check Your Policy” (The Washington Post, October 2, 2005) summarized a survey of 1014 adults conducted by the Life and Health Insurance Foundation for Education. Each person surveyed was asked to select which of five fictional characters had the greatest need for life insurance: Spider-Man, Batman, Fred Flintstone, Harry Potter, and Marge Simpson. The data are summarized in the pie chart. The survey results were quite different from the assessment of an insurance expert. The insurance expert felt that Batman, a wealthy bachelor, and Spider-Man did not need life insurance as much as Fred Flintstone, a married man with dependents!
  • 56. Segmented (or Stacked) Bar Charts A pie chart can be difficult to construct by hand. The circular shape sometimes makes it difficult to compare areas for different categories, particularly when the relative frequencies are similar. So, we could use a segmented bar chart. When to Use: Categorical data. How to construct: • Use a rectangular bar rather than a circle to represent the entire data set. • The bar is divided into segments, with different segments representing different categories. • The area of the segment is proportional to the relative frequency for the particular category.
  • 57. Segmented (or Stacked) Bar Charts Each year, the Higher Education Research Institute conducts a survey of college seniors. In 2008, approximately 23,000 seniors participated in the survey (“Findings from the 2008 Administration of the College Senior Survey,” Higher Education Research Institute, June 2009). This segmented bar chart summarizes student responses to the question: “During the past year, how much time did you spend studying and doing homework in a typical week?”
  • 59. Avoid these Common Mistakes 1. Areas should be proportional to frequency, relative frequency, or magnitude of the number being represented. The eye is naturally drawn to large areas in graphical displays. Sometimes, in an effort to make the graphical displays more interesting, designers lose sight of this important principle. Consider this graph (USA Today, October 3, 2002). By replacing the bars of a bar chart with milk buckets, areas are distorted. The two buckets for 1980 represent 32 cows, whereas the one bucket for 1970 represents 19 cows.
  • 60. Avoid these Common Mistakes 1. Areas should be proportional to frequency, relative frequency, or magnitude of the number being represented. Another common distortion occurs when a third dimension is added to bar charts or pie charts. This distorts the areas and makes it much more difficult to interpret.
  • 61. Avoid these Common Mistakes 2. Be cautious of graphs with broken axes (axes that don’t start at 0). • The use of broken axes in a scatterplot does not result in a misleading picture of the relationship of bivariate data. • In time series plots, broken axes can sometimes exaggerate the magnitude of change over time. • In bar charts and histograms, the vertical axis should NEVER be broken. This violates the “proportional area” principle.
  • 62. Avoid these Common Mistakes 2. Be cautious of graphs with broken axes (axes that don’t start at 0). This bar chart is similar to one in an advertisement for a software product designed to raise student test scores. Areas of the bars are not proportional to the magnitude of the numbers represented; the area of the rectangle representing 68 is more than three times the area of the rectangle representing 55!
  • 63. Avoid these Common Mistakes 3. Watch out for unequal time spacing in time series plots. Notice that the intervals between observations are irregular, yet the points in the plot are equally spaced along the time axis. This makes it difficult to assess the rate of change over time. If observations over time are not made at regular time intervals, special care must be taken in constructing the time series plot. Here is a correct time series plot.
  • 64. Avoid these Common Mistakes 4. Be careful how you interpret patterns in scatterplots. Consider the following scatterplot showing the relationship between the number of Methodist ministers in New England and the amount of Cuban rum imported into Boston from 1860 to 1940 (Education.com). (Scatterplot axes: Number of Methodist Ministers, 0 to 300; Number of Barrels of Imported Rum, 5000 to 35000; r = .999973.) Does an increase in the number of Methodist ministers CAUSE the increase in imported rum? A strong pattern in a scatterplot means that the two variables tend to vary together in a predictable way, BUT it does not mean that there is a cause-and-effect relationship.
  • 65. Avoid these Common Mistakes 5. Make sure that a graphical display creates the right first impression. Consider the following graph from USA Today (June 25, 2001). Although this graph does not violate the proportional area principle, the way the “bar” for the none category is displayed makes this graph difficult to read. A quick glance at this graph may leave the reader with an incorrect impression.