Descriptive Statistics Part II: Graphical Description

DESCRIPTIVE STATISTICS
Part II: Graphical Description
1

What are raw data and processed data?
2

3

4
Bar Chart and Pie Chart
Bar Chart:
An x-y chart in which the x-axis represents values of a quantitative
variable or labels of a qualitative variable and the y-axis represents
the corresponding values depicted by bars
Pie Chart:
A circle divided into wedge-shaped pieces that represent
areas proportional to the frequencies or relative frequencies

5
Example: The data in the Table below illustrate the average points per game for eight
NFL teams after 10 games of 2010 season. Construct a bar chart to compare the
performance of the eight teams. What teams had the highest points per game and what
teams had the lowest points per game?
Team
Average
Points/Game
Atlanta 25.6
New England 28.9
Baltimore 23.3
New York Jets 23.8
Pittsburgh 23.5
Philadelphia 28.4
Green Bay 25.2
New Orleans 23.5
Average points per game
of eight NFL teams after 10 games
in 2010 season
0
5
10
15
20
25
30
35
AveragePointsperGame
NFL Team
A Bar Chart of Average Points per Game

6
2
3
Using Excel® to Construct
a Bar Chart
4
OUTPUT
1

7
Death Cause Percent
Car Accidents 89
Fire arms 2
Poison 4
Other 5
Example: The data shown below illustrate the percent of causes of death among U.S.
residents in the age from 18 to 24. Construct a pie chart illustrating the causes of death.
What cause contributes the most to the death of residents in this age range?
Percent of death
incidents and their causes
Car Accidents
89%
Fire arms
2%
Poison
4%
Other
5%
A Pie Chart of the Main Causes of Death of U.S. Residents (Age 18-24)

8
2
3
Using Excel® to Construct
a Pie Chart
4
OUTPUT
1

12
Working Problem 3.3.b:
The bar chart shown below illustrates the Financial Aids of some U.S. Universities
(2006)
(c) Which university has the highest financial aid?
(d) Which university has the lowest financial aid?
$8,222
$6,954 $7,532 $7,861 $8,006
$10,566
$13,449
$9,501 $9,687
$7,320 $7,980
$13,462
$5,487
$8,269
$0
$2,000
$4,000
$6,000
$8,000
$10,000
$12,000
$14,000
$16,000
Financial Aid

14
Histogram or a Frequency Distribution
This is a distribution that results from grouping of data into mutually exclusive
classes of values and displaying frequencies or the number of observations in
each class
EXAMPLE:
Students’ Grades
0
2
4
6
8
10
12
14
55 65 75 85 95
NumberofStudents(or
Frequency)
Grade Values (the Variable)
Student Grade (the
Variable) Mid-Point
Number of Students
(Frequency)
50 to 60 55 4
60 to 70 65 6
70 to 80 75 12
80 to 90 85 9
90 to 100 95 4

15
Describing a Histogram or a Frequency Distribution (Basic points)
0
2
4
6
8
10
12
14
55 65 75 85 95NumberofStudents(or
Frequency)
How many students took the test? 35 students (add up the bar heights)
Approximately, what are the minimum and maximum grades, or what is the range?
Min ~55, Max ~95, Range ~40
What grade did the majority of students earn (the Mode)?
Mode ~75 (or 70% to 80%)

16
Describing a Histogram or a Frequency Distribution (More information)
0
2
4
6
8
10
12
14
55 65 75 85 95NumberofStudents(or
Frequency)
Approximately, How many students fail the test (<60%)? 4 students
Approximately, How many students passed the test with C (70%) or more?
25 students
Approximately, What percent of the students earned an “A” (or
above 90%?
~100*(4/35) = 11.429%

17
Histogram or a Frequency Distribution
EXAMPLE:
NY-College System
Teachers Salaries ($)
NY-College System
Teachers Salaries ($)
Mid-
Point
Percent
Frequency
(%)
30,000 to 40,000 35,000 6
40,000 to 50,000 45,000 17
50,000 to 60,000 55,000 26
60,000 to 70,000 65,000 18
70,000 to 80,000 75,000 12
80,000 to 90,000 85,000 8
90,000 to 100,000 95,000 5
100,000 to 110,000 105,000 4
110,000 to 120,000 115,000 4
0
5
10
15
20
25
30
35,000 45,000 55,000 65,000 75,000 85,000 95,000 105,000 115,000
AxisTitle
Axis Title
(a) Approximately, what percent of the teacher earn less than $40,000?
(b) Approximately, what percent of the teacher earn more than $100,000?
(c) How much does the majority of teachers (Mode) earn?

How do we construct a histogram-The Five Key Steps? Example
55 50 47 50 55 81 80 98
62 38 67 70 60 69 78 39
70 65 99 55 64 89 85 65
75 56 75 50 100 68 95 85
50 30 60 66 85 79 85 70
Values of driving-to-work distances
Step 1: Find the minimum and the
maximum values of the data set:
min = 30 and max = 100. This gives
you an idea about the span or the range
of the entire data set, which is 70 miles.
Step 2: Decide on the class width that you wish to use. This will depend on number of classes or categories that
you wish to use. Commonly, the minimum number of classes used is 5. Let us select a class width of 10 miles
Step 3: Form a frequency table starting
with the first column (listing classes)
Step 4: Form the second column of the
frequency table, which is the simply the
mid-point of each class.
Step 5: Count the number of observations in
each class and place the count in the third
column labeled ‘frequency’.
Classes
25 to < 35
35 to < 45
45 to < 55
55 to < 65
65 to < 75
75 to < 85
85 to < 95
95 to <105
Mid-Points
30
40
50
60
70
80
90
100
Frequency
Total = 40
Horizontal Axis
of Histogram (x)
Vertical Axis
of Histogram (f)
x
f
1
2
5
8
9
6
5
4
55 50 47 50 55 81 80 98
62 38 67 70 60 69 78 39
70 65 99 55 64 89 85 65
75 56 75 50 100 68 95 85
50 30 60 66 85 79 85 70

19
Classes Mid-Points Frequency
25 to < 35 30 1
35 to < 45 40 2
45 to < 55 50 5
55 to < 65 60 8
65 to < 75 70 9
75 to < 85 80 6
85 to < 95 90 5
95 to <105 100 4
Total = 40
How do we construct a histogram-The Five Key Steps? Example
0
1
2
3
4
5
6
7
8
9
10
30 40 50 60 70 80 90
Frequency
Distance (miles)

20
What is a relative frequency distribution?
The absolute number of observations corresponding to a certain class or interval can be converted
into a relative frequency value or a fraction of the total frequency using the following expression:
where fi is the frequency of the ith class, k is the number of classes, and
is the sum of all frequencies.
We can also obtain the percent relative frequency by multiplying the relative frequency by 100:

21
What is a relative frequency distribution?
Relative Frequency (rf)
(1/40) or 0.025
(2/40) or 0.050
0.125
0.200
0.225
0.150
0.125
0.100
1.000
Classes Mid-Points (x) Frequency (f)
25 to < 35 30 1
35 to < 45 40 2
45 to < 55 50 5
55 to < 65 60 8
65 to < 75 70 9
75 to < 85 80 6
85 to < 95 90 5
95 to <105 100 4
Total = 40
Percent Relative
Frequency (rf %)
2.5
5
12.5
20
22.5
15
12.5
10
100
0
0.05
0.1
0.15
0.2
0.25
30 40 50 60 70 80 90
RelativeFrequency
Distance
0
5
10
15
20
25
30 40 50 60 70 80 90
PercentRelative
Frequency
Distance

24
Cumulative Frequency Curve (or Ogive):
This is a curve or a line graph used to display the cumulative frequency of each
class at its upper class boundary. The horizontal axis should represent the upper
boundaries of different classes and the vertical axis should represent the cumulative
frequencies.

25
What is a cumulative frequency distribution?
Classes
Mid-
Points
(x)
Frequency
(f)
Relative
Frequency
(rf)
Percent
Relative
Frequency
(rf %)
25 to < 35 30 1 0.025 2.5
35 to < 45 40 2 0.050 5
45 to < 55 50 5 0.125 12.5
55 to < 65 60 8 0.200 20
65 to < 75 70 9 0.225 22.5
75 to < 85 80 6 0.150 15
85 to < 95 90 5 0.125 12.5
95 to <105 100 4 0.100 10
Total = 40 1 100
Cumulative
Frequency
(CF)
1
3
8
16
25
31
36
40
Percent
Cumulative
Frequency
(CF%)
2.5
7.5
20
40
62.5
77.5
90
100
0.0
25.0
50.0
75.0
100.0
15 35 55 75 95
CumulativePercent
Distance

26
0.0
25.0
50.0
75.0
100.0
15 35 55 75 95
CumulativePercent
Distance
Cumulative Frequency Distribution of Driving-to-Work Distance
%5.62)75( dp
What is the key information provided by a cumulative frequency distribution?
p(x < xo).The percent of data observations exhibiting values less than a certain value,
Q: What is the percent of employees driving less than 75 miles to work?
%5.62

27
Example: The cumulative frequency distribution of touchdowns of the 32 NFL teams after the tenth
week of 2010 season is shown below.
• How many teams scored 20 touchdowns or less?
• How many teams scored 30 touchdowns or more?
0.0
10.0
20.0
30.0
40.0
50.0
60.0
70.0
80.0
90.0
100.0
6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44
CumulativeFrequency(%)
Touchdowns
Cumulative frequency distribution of touchdowns of the NFL teams after the tenth week of 2010 season

28
Example: The cumulative frequency distribution of touchdowns of the 32 NFL teams after the tenth
week of 2010 season is shown below.
• How many teams scored 20 touchdowns or less?
• How many teams scored 30 touchdowns or more?
0.0
10.0
20.0
30.0
40.0
50.0
60.0
70.0
80.0
90.0
100.0
6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44
Touchdowns
P(x < 20) ≈16%
P(x < 30) ≈72%
P(x ≥ 30) ≈100 – 72 = 28%

31
Using Excel® for constructing frequency distributions:
Case study: Ceramic tile area
258.5 255.0 255.5 256.4 256.6 258.4 257.2 257.4 256.3 259.5
255.7 254.9 256.1 254.5 253.3 257.9 259.1 256.8 257.7 255.1
254.1 255.5 256.5 256.1 255.0 255.9 255.1 254.6 255.1 255.1
255.4 254.3 258.5 256.3 255.6 256.5 257.5 253.8 256.2 256.1
256.2 255.7 257.1 256.7 256.1 257.4 255.0 256.2 254.6 257.0
255.5 256.9 255.8 254.7 256.2 256.9 256.4 255.6 254.8 255.6
257.3 256.8 256.0 254.9 256.0 256.2 257.7 252.7 255.6 255.5
253.9 256.3 255.4 256.1 256.0 254.0 257.8 252.7 256.4 256.6
255.5 255.6 255.1 256.6 254.5 255.4 254.1 256.0 256.9 256.9
254.6 254.8 256.3 255.5 256.4 253.8 254.8 254.6 255.4 255.2
Values of ceramic tile areas
Example: The data in the Table represent a sample of 100 tiles selected
randomly from a tile operation producing tiles of consistent thickness
and nominal dimensions of 16x16 inch, or 256 square inch area.

All data should be aligned in one column
Arrangement of Ceramic Tile Data in one Column
Ceramic Tile Area (Sq. inch)
258.5
255.7
254.1
255.4
256.2
255.5
257.3
253.9
255.5
254.6
255
254.9
255.5
254.3
255.7
256.9
256.8
256.3
255.6
254.8
255.5
256.1
256.5
258.5
257.1
255.8
256
255.4
255.1
256.3
256.4
254.5
256.1
256.3
256.7
254.7
254.9
256.1

1
Using Excel® to perform Descriptive Statistics of Ceramic Tile Data

Excel® Output of Descriptive Statistics of Ceramic Tile Data
• The mean area = 255.87 square inch
• The median = 255.95 square inch
• The mode = 255.5 square inch.

Form
a Bin
Range
1
Using Excel® Histogram Tool to construct a histogram of Ceramic Tile Data-Step 1

Using Excel® Histogram Tool to construct a histogram of Ceramic Tile Data-Steps 2 & 3
Go to Data Analysis2
Select Histogram
in Data Analysis
& press Ok
3

Select Cumulative Percentage
& Chart Output and Press Ok
4
5
Using Excel® Histogram Tool to construct a histogram of Ceramic Tile Data-Steps 4 & 5

Excel® Histogram Output of Ceramic Tile Data

Closing the gap between bars in Excel® Histogram Output

40
Bin Frequency Cumulative %
252.0 0 0.00%
252.5 0 0.00%
253.0 2 2.00%
253.5 1 3.00%
254.0 4 7.00%
254.5 5 12.00%
255.0 13 25.00%
255.5 16 41.00%
256.0 13 54.00%
256.5 20 74.00%
257.0 11 85.00%
257.5 6 91.00%
258.0 4 95.00%
258.5 3 98.00%
259.0 0 98.00%
259.5 2 100.00%
260.0 0 100.00%
Example: Suppose in the tile area example discussed above, a consumer would like to purchase
tiles of the following specifications: target = 256 square inch, tolerance = 256 ± 1. In other words,
the consumer has a plan to use tiles of area of 256 square inch, but he is willing to tolerate tiles
ranging in area from 255 to 257 square inch. What percent of tiles will meet these specifications?

41
Bin Frequency Cumulative %
252.0 0 0.00%
252.5 0 0.00%
253.0 2 2.00%
253.5 1 3.00%
254.0 4 7.00%
254.5 5 12.00%
255.0 13 25.00%
255.5 16 41.00%
256.0 13 54.00%
256.5 20 74.00%
257.0 11 85.00%
257.5 6 91.00%
258.0 4 95.00%
258.5 3 98.00%
259.0 0 98.00%
259.5 2 100.00%
260.0 0 100.00%
Classes
> 251.5 to 252
> 252 to 252.5
> 252.5 to 253
> 253 to 253.5
> 253.5 to 254
> 254 to 254.5
> 254.5 to 255
> 255 to 255.5
> 255.5 to 256
> 256 to 256.5
> 256.5 to 257
> 257 to 257.5
> 257.5 to 258
> 258 to 258.5
> 258.5 to 259
> 259 to 259.5
> 259.5 to 260
Target = 256 square inch, tolerance = 256 ± 1

42
0
2
4
6
8
10
12
14
16
18
20
22
Frequency
Ceramic Tile Area (Sq. inch)
RejectReject
256 ± 1.0
Specified Area
Figure 3.21 Frequency Distribution: Tiles within the Specification Limits
Target = 256 square inch, tolerance = 256 ± 1

43
0.00%
10.00%
20.00%
30.00%
40.00%
50.00%
60.00%
70.00%
80.00%
90.00%
100.00%
110.00%
252.0 252.5 253.0 253.5 254.0 254.5 255.0 255.5 256.0 256.5 257.0 257.5 258.0 258.5 259.0 259.5 260.0
Area
P(x ≤ 255) = 25%
P(x ≤ 257) = 85%
Percent of Tiles of Area between 255 and 257 square inch

44
8000 11000
9000 4000
7000 8000
12000 9000
11000 7000
21000 12000
8000 6000
7500 21000
19000 8000
12000 7500
10050 7400
14000 6200
14000 18200
7000 14230
12000 9000
Working Problem 3.9:
The data shown here represents the prices of a sample of 30 used cars selected randomly from
a large used car lot. Using descriptive statistics, answer the following questions:
(a) Determine the mean, the mode, and the median
(b) Determine the range, the standard deviation, and the variance

The data shown here represents the prices of a sample of 30 used cars selected randomly from
a large used car lot. Construct a histogram and a Cumulative Frequency Curve
8000 11000
9000 4000
7000 8000
12000 9000
11000 7000
21000 12000
8000 6000
7500 21000
19000 8000
12000 7500
10050 7400
14000 6200
14000 18200
7000 14230
12000 9000

46
Given the Histogram of Car Prices given below, answer the following questions:
(a) What is the price of the majority of cars in the used car lot?
(b) What is the range of prices of used cars in the used car lot?
(c) What percent of cars of a price less than $10,000?
(d) What is the percent of cars of a price greater than $20,000?
0
5
10
15
20
25
30
4,000
6,000
8,000
10,000
12,000
14,000
16,000
18,000
20,000
22,000
Percent
Car Price($)
Car
Price
($) cumulative
lower upper midpoint
freq
. percent freq. percent
4,000 < 6,000 5,000 1 3.3 1 3.3
6,000 < 8,000 7,000 8 26.7 9 30.0
8,000 < 10,000 9,000 7 23.3 16 53.3
10,000 < 12,000 11,000 3 10.0 19 63.3
12,000 < 14,000 13,000 4 13.3 23 76.7
14,000 < 16,000 15,000 3 10.0 26 86.7
16,000 < 18,000 17,000 0 0.0 26 86.7
18,000 < 20,000 19,000 2 6.7 28 93.3
20,000 < 22,000 21,000 2 6.7 30 100.0
30 100.0

47
Comparing frequency distributions- The case of second-hand smokers
Example: Children represent a primary target of many studies of second-hand smoking. In this
regard, the important variable to be considered is the level of blood cotinine in the body.
The data in the Table represent values of blood cotinine level of two random samples of children age 4 to 17.
The first set (unexposed children) represents children who were not exposed to tobacco smoking on regular basis.
The second set (exposed children) represents children who were exposed to tobacco smoking on regular basis.
Blood cotinine level is measured in nanograms per millimeter (ng/ml).
Blood-cotinine
levels for two
groups of children
Cotinine levels of
unexposed
children (ng/ml)
Cotinine levels of
exposed children
(ng/ml)
Cotinine levels
of unexposed
children (ng/ml)
Cotinine levels of
exposed children
(ng/ml)
0.54 1.88 0.44 1.78
0.62 1.57 0.4 2.04
0.65 2.13 0.55 1.93
0.49 2.61 0.35 2.44
0.51 2.25 0.8 2.4
0.48 2.02 0.33 1.91
0.39 1.65 0.6 1.74
0.63 2 1.1 1.63
0.6 4 0.51 2.2
0.43 1.77 0.41 1.61
0.39 2.21 0.51 1.97
0.5 1.53 0.42 3.5
0.45 2.17 1.2 1.78
0.57 2.21 0.61 1.81
0.38 1.41 0.48 2.23

48
• Questions:
• Construct histograms and cumulative frequency curves of the two data sets
• According to the Center for Disease Control and Prevention, non-smokers exposed to low
levels of ETS typically have blood cotinine concentrations less than 1 ng/ml.
- What percent of the unexposed children has a blood cotinine level of more than 1 ng/ml?
- What percent of the exposed children has a blood cotinine level of more than 1 ng/ml?
Cotinine levels of
unexposed
children (ng/ml)
Cotinine levels of
exposed children
(ng/ml)
Cotinine levels
of unexposed
children (ng/ml)
Cotinine levels of
exposed children
(ng/ml)
0.54 1.88 0.44 1.78
0.62 1.57 0.4 2.04
0.65 2.13 0.55 1.93
0.49 2.61 0.35 2.44
0.51 2.25 0.8 2.4
0.48 2.02 0.33 1.91
0.39 1.65 0.6 1.74
0.63 2 1.1 1.63
0.6 4 0.51 2.2
0.43 1.77 0.41 1.61
0.39 2.21 0.51 1.97
0.5 1.53 0.42 3.5
0.45 2.17 1.2 1.78
0.57 2.21 0.61 1.81
0.38 1.41 0.48 2.23
Blood-cotinine
levels for two
groups of children

49
Unexposed children Exposed children
Mean 0.545 2.079
Median 0.505 1.985
Mode 0.51 2.21
Standard Deviation 0.195 0.542
Sample Variance 0.03799 0.29398
Range 0.87 2.59
Minimum 0.33 1.41
Maximum 1.2 4
Sum 16.34 62.38
Count 30 30
Comparison of blood-cotinine (ng/ml) for two groups of children

51
0
2
4
6
8
10
12
14
16
0
1
2
3
4
5
6
7
8
9
10
0 0.25 0.5 0.75 1 1.25 1.5 1.75 2 2.25 2.5 2.75 3 3.25 3.5 3.75 4 4.25
Frequency (Exposed) Frequency (Unexposed)
Blood-cotinine level (ng/ml)
Frequency(exposed)
Frequency(unexposed)
Histograms of blood-cotinine (ng/ml) for unexposed and exposed children

52
0.00%
20.00%
40.00%
60.00%
80.00%
100.00%
120.00%
0 0.25 0.5 0.75 1 1.25 1.5 1.75 2 2.25 2.5 2.75 3 3.25 3.5 3.75 4 4.25 4.5
Blood-cotinine level (ng/ml)
Unexposed
Exposed
P(x ≤ 1.0)
= 93%
P(x > 1.0) = 7%
P(x ≤ 1.0)
= 0%
P(x > 1.0) = 100%
Cumulative Frequency Curves of blood-cotinine (ng/ml) for
unexposed and exposed children

53
The shape of the frequency distribution
x
Frequency
x
Frequency
x
Frequency
x
Frequency
Mean
Median
Mode
x
Frequency
Mean
Median
Mode
x
Frequency
Mode
Median
Mean
Mode 1 Mode 2
Mean
(d) Steep Shape
( c) Skewed to the left
(negatively skewed)
(f) Bimodal or Multimodal
(b) Skewed to the right
(positively skewed)
(a) Symmetrical shape
(e) Flat Shape
Common Shapes of Frequency Distribution

54
0
5
10
15
20
25
Percent
x
Mean ≈ Mode ≈ Median (……)
Mean > Mode > Median (……)
Mean < Mode < Median (……)
None of the above (……)
Mean ≈ Median ( ….. )
Mean > Median (……)
Mean < Median (…...)
None of the above (…...)
In the following histograms,
- Determine if the shape of the distribution is symmetric, uniform, negatively skewed, positively skewed, or
bimodal.
- Determine the relationship between mean, mode, and median for each shape
0
10
20
30
40
50
60
Percent
x
0
10
20
30
40
50
60
Percent
x
0
10
20
30
40
50
60
Percent
x
0
5
10
15
20
25
30
35
40
Percent
x
(a) …………. (b) ………………….. (c) ………………….
(d) …………. (e) …………………..

55
The weighted mean and standard deviation
Example: Suppose you had a party with 20 people visiting you. You went to
McDonald’s restaurant and bought five Big Mac meals for $6 each, eight happy
meals for $2.50 each, and seven grilled chicken meals for $5.50 each. How much
did you pay and what is the average price per meal? What is the variance of the
meal price?
Price (x)
Frequency
(fi)
6.0 5
2.5 8
5.5 7
Sum n = 20
xifi
30.0
20.0
38.5
88.5
1.575
-1.925
1.075
2.48
3.71
1.16
12.40
29.65
8.09
50.14

56
In a stat test, 4 students made 75/100, 24 students made 85/100, and 12 students made
95/100. What is the mean, standard deviation, and variance of the grade?

57
The Seno car body shop pays its hourly employees $10.50, $15.00, or $20.00 per hour. There are 26 hourly employees, 14 of which are
paid at the $10.50 rate, 10 at the $15.00 rate, and 2 at the $20.00 rate.
- What is the mean hourly rate paid?
- What is the standard deviation?
- What is the variance?

58
Use the frequency distribution shown in the table below to determine the mean, standard deviation and
variance of the property tax value for the sample of 50 houses.
Tax ($) Frequency (fi)
6000 6
7000 8
8000 16
9000 11
10000 5
11000 4

59
What is Chebyshev’s theorem?
Chebyshev’s Theorem:
In a frequency distribution of data, the proportion of the values that lie within k standard
deviations of the mean is at least 1-1/k2, where k is any constant greater than one.
Example: The Figure below shows the frequency distributions of annual salary of professors of two different
colleges. As you can see in this figure, both colleges share the same average annual salary, but college B had
twice the standard deviation, which is reflected in a much wider distribution than that of college A.
• At least what percent of the salaries lie within plus and minus 2.0 standard deviations of the mean?
• At least what percent of the salaries lie within
Plus and minus 3.0 standard
deviations of the mean?
Annual Salaries of Full Professors
in two Different Colleges
College A, s = $10,000
RelativeFrequency(%)
Salary
Mean
$80,000
College B, s = $20,000

60
In remodeling your home, you used many contractors that were paid hourly an average wage of $50. The standard
deviation of wages was $15.
- At least what percent of the wages lie within plus 2 standard deviations and minus 2 standard deviations of the
average wage?
- At least what percent of the wages lie within plus 3 standard deviations and minus 3 standard deviations of the
average wage?

61
What is the empirical rule?
Empirical Rule:
For a symmetrical, bell-shaped frequency distribution, 68.26 percent of the observations
will lie within plus and minus one standard deviation of the mean; 95.44 percent of the
observations will lie within plus and minus two standard deviations of the mean; and
99.74 percent will lie within plus and minus three standard deviations of the mean
m +/- 3 s
99.74%
m +/- 2 s
95.44%
Mean
Mode
Median
m +/- 1 s
68.26%
x

62
What is the empirical rule?
Example: Assuming that College ‘A’ frequency distribution of salary shown in the previous Figure
represents a normal distribution, where the mean of professors’ salaries is $80,000, and the
standard deviation is $10,000, answer the following questions:
• What is the percent of salaries within the range of the mean plus and minus 2 standard deviation?
• What is the percent of salaries within the range of the mean plus and minus 3 standard deviation?
Annual Salaries of Full
Professors
in two Different Colleges
College A, s = $10,000
Salary
Mean
$80,000
College B, s = $20,000
or (from $50,000 to $110,000) is 99.74%).
The percent of salaries within the range of the mean plus and minus 3 standard deviation,
The percent of salaries within the range of the mean plus and minus 2 standard deviation,
or (from $60,000 to $100,000) is 95.44%.

63
A course average grade is typically 80%, and the standard deviation of grade is 5%.
Assuming that the grade value has bell-shaped symmetrical distribution:
- What percent of students will make ‘C’ grade or lower (C = 70%)
- What percent of students will make ‘B’ grade or better (B = 80%)
- What percent of students will make ‘A’ grade or better (A = 90%)
- What percent of students will make ‘C’ grade or better (C = 70%)

64
A course average grade is typically 75%, and the standard deviation of grade is 2.5%. Assuming that the grade value has bell-shaped
symmetrical distribution:
- What percent of students will make ‘C’ grade or lower (C = 70%)
- What percent of students will make ‘B’ grade or better (B = 80%)
-What percent of students will make ‘A’ grade or better (A = 90%)
- What percent of students will make ‘C’ grade or better (C = 70%)

65
Heights of men have a bell-shaped distribution with a mean of 176 cm and a standard deviation of 7 cm.
Using the empirical rule, what is the approximate percentage of men:
a. 169 cm and 183 cm?
b. 155 cm and 197 cm?
c. Taller than 190?

Other Forms of Graphical Data Description
Dot Plot
Steam and Leaf Plot
Box Plot
10 15 20 25 30 35 40
x
BoxPlot
10 15 20 25 30 35
x
DotPlot
Frequency Stem Leaf
10 1 2 4 5 6 6 8 8 8 9 9
14 2 0 2 2 2 2 2 2 2 2 2 3 4 4 5
2 3 0 2
26
66

300 350 400 450 500 550 600 650 700 750
Monthly Rent ($)
DotPlot
Dot Plot of Apartments Rents
Example: Suppose you are searching for an apartment for rent near your college and a
random sample of apartments were of the following monthly rent ($):
500, 500, 400, 400, 350, 350, 400, 400, 500, 500, 500, 500, 600, 600, 600, 700, 700
Construct a dot plot to illustrate the frequency distribution of apartment rent.
Dot Plot
67

Stem & Leaf Plot
G1 G2 G3 G4 G5 G6 G7 G8 G9 G10
119 128 133 135 139 140 143 203 149 211
120 129 133 135 139 140 143 147 150 153
122 130 133 136 139 142 144 147 150 198
123 130 134 137 139 142 144 147 150 153
124 130 134 137 139 142 144 147 152 190
126 131 135 137 139 142 144 147 152 155
126 131 135 137 139 142 198 148 152 215
128 131 135 137 140 142 145 148 152 156
128 131 135 139 140 143 211 149 152 159
128 132 135 139 140 143 147 149 152 160
Example: Area values of composite plates (mm2)
68
An engineer takes 10 sample groups of composite plates, each of 10 plates and measures
the area of each plate (mm2). The reported values are shown in the table below. Construct
a Stem & Leaf Plot

Stem & Leaf Plot
Example: Area values of composite plates (mm2)
Frequency Stem Leaf
1 11 9
11 12 0 2 3 4 6 6 8 8 8 8 9
35 13 0 0 0 1 1 1 1 2 3 3 3 4 4 5 5 5 5 5 5 5 6 7 7 7 7 7 9 9 9 9 9 9 9 9 9
31 14 0 0 0 0 0 2 2 2 2 2 2 3 3 3 3 4 4 4 4 5 7 7 7 7 7 7 8 8 9 9 9
14 15 0 0 0 2 2 2 2 2 2 3 3 5 6 9
1 16 0
0 17
0 18
3 19 0 8 8
1 20 3
3 21 1 1 5
100
69
Statistics Value
Mean 144.30
Median 140.24
Mode 198
Standard Deviation 18.68
Sample Variance 348.78
Range 96.35
Minimum 118.65
Maximum 215

Percentiles and Box Plot
6121 7748 7094
8419 7828 8390
8452 8289 7178
7141 6644 7886
8461 7963 7914
Example: Data of property tax of a sample of houses in Monmouth County, New Jersey
n Annual Property Tax ($)
1 6121
2 6644
3 7094
4 7141
5 7178
6 7748
7 7828
8 7886
9 7914
10 7963
11 8289
12 8390
13 8419
14 8452
15 8461
Sort
)1(
100
 n
p
lp
A set of data can be divided into percentiles
with the desired percentile obtained from the
following equation:
where lp represents the location of a certain percentile, p is the required
percentile, and n is the number of observations in the data set.
70

Percentiles and Box Plot
n Annual Property Tax ($)
1 6121
2 6644
3 7094
4 7141
5 7178
6 7748
7 7828
8 7886
9 7914
10 7963
11 8289
12 8390
13 8419
14 8452
15 8461
)1(
100
 n
p
lp
12)115(
100
75
8)115(
100
50
4)115(
100
25
75
50
25



l
l
l Lower quartile
Median
Upper quartile
6000 6200 6400 6600 6800 7000 7200 7400 7600 7800 8000 8200 8400 8600
Annual Property Tax ($)
Minimum
Value
Maximum
Value
Q1
Lower 25%
(Lower quartile)
Q3
Upper 75%
(Upper quartile)
Median
71

72
Listed below are the property taxes ($) paid by a sample of 15 houses in Montgomery,
Alabama, in 2009:
- Locate the median, the first quartile, and the third quartile for the property taxes.
- Construct a steam and leaf plot
- Construct a histogram
- Construct a box plot
2000 1500 1800 1050 2000 2050 1800 1670 1800 1400 2800 950 1800 1200 1300

For each of he following three data sets:
(a) Determine mean, median, mode, range, standard deviation, skewness, and kurtosis
for each data set
(b) Construct histogram, dot plot, and box plot
(c) Using these graphs, explain how each graph reveals a different type of information
(d) Compare the three data sets using histogram, dot plot, and box plot
14 17 19 18 21
14 17 19 18 21
15 17 19 18 22
15 17 19 18 22
16 18 20 18 18
16 18 20 18 18
16 18 20 18 18
20 21 23 24 25
19 21 23 24 25
18 22 23 24 25
19 22 24 25 25
18 22 24 25 25
20 23 24 25 25
19 23 24 25 25
20 23 24 25 25
Data Set A:
Data Set B:
22 16 24 19 22
21 14 23 22 19
22 16 21 20 19
19 17 24 14 16
24 14 14 14 24
14 24 24 20 22
20 21 15 21 20
19 16 19 16 18
Data Set C:

Example: Determine the skewness of the data in Table below and explain
how the result is comparable to that of the Box Plot.
6121 7748 7094
8419 7828 8390
8452 8289 7178
7141 6644 7886
8461 7963 7914
Mean 7701.87
Median 7886
Standard Deviation 717.80
Sample Variance 515241.6
Kurtosis -0.01347
Skewness -0.86623
Range 2340
Minimum 6121
Maximum 8461
Descriptive statistics of tax data
????
What does a negative Skew mean?
74

6000 6500 7000 7500 8000 8500 9000
Q1 = 7159 Q3 = 8339.5Median = 7886
0
1
2
3
4
5
6
6000 6500 7000 7500 8000 8500
Frequency
Taxes ($)
Box Plot and Histogram of Property Tax Data
75

Scatter Plot
The Number of Hours you Study Every Week
Grade(%) How many hours you study every week?
How does this impact your grade?
1 2 3 4 5 6 7 8
100
90
80
70
60
50
40
76

Number of hours of Study per
week
Grade (out of
100%)
4 81
7 72
6 85
5 75
7 90
8 90
1 60
1 45
5 89
6 88
7 88
7 75
8 81
8 95
7 87
6 97
3 75
6 65
5 85
3 80
Scatter Plot: Example
You can develop a scatter plot
manually or using Excel
77

Developing a Scatter Plot Using Excel Program-Steps 1 through 3
(1) Block the x and y
Columns of the data of
interest using the mouse
(2) Click on Insert Button
(3) Click on Charts Button
78

(4) Select XY (Scatter) and Click OK
You may also select a particular form
Of Scatter Plot from the right window
Developing a Scatter Plot Using Excel Program- Step 4
79

(5) The output Scatter Plot is shown below.
You can click twice on the graph and select to
make the scatter customization menu appear to
Allow you to customize your graph
Developing a Scatter Plot Using Excel Program- Step 5
80

Developing a Scatter Plot Using Excel Program- Step 6: Adding a Trendline
(6) Click on the points of the plot and click the right-button of the mouse
You will see the menu shown in which you select adding a trendline
81

Developing a Scatter Plot Using Excel Program- Step 7: Trendline Menu
(7) When trendline menu opens, you will find that the default
Is linear (or straight line). You can also check the equation
and R-Square boxes to obtain the equation representing the x-y
Relationship and the degree of strength between x and y, respectively.
Developing a Scatter Plot Using Excel
82

y = 3.9598x + 58.371
R² = 0.4528
r = 0.673
0
10
20
30
40
50
60
70
80
90
100
110
0 1 2 3 4 5 6 7 8 9
TestGradePercent
Hours of Study
Scatter Plot Showing the Relationship between the number of Study Hours and Student’s Grade
Developing a Scatter Plot Using Excel
83

84
The data in table below shows a random sample of houses with their square feet area and the price.
Use Excel® to develop a scatter plot relating house price to the area in square feet. Use trendline option
to obtain the equation and the coefficient of correlation r.
House Square Feet House Price ($)
3200 410,000
3600 440,000
2600 400,000
2000 260,000
3000 425,000
2800 400,000
2000 280,000
2500 380,000
2500 400,000

85
The data in table below shows the points per game and touch down s of a random sample of college teams. Use Excel® to develop a
scatter plot relating touchdowns to points/game. Use trendline option to obtain the equation and the coefficient of correlation r.
TDs Points/Game
27 23.5
26 23.3
25 21.5
21 19.2
31 26.8
28 25.7
28 24.4
25 22
42 28.9
28 23.8
27 21.3
15 17.2
34 27.4
29 24.3
24 23.8
26 21.7

In general, an outlier is a data observation that seems to not belong to the family of
data under consideration.
Examples:
• Students Grades (%):
55, 65, 99, 80, 78, 80, 80, 77, 69, 68, 89, 92, 110, 78, 86
• Adult Weight (lb):
140, 82, 122, 98, 110, 165, 138, 200, 175, 204, 290, 188, 145, 167, 389, 220, 175
• A value of $1 million in a data set of a company annual wages of plausible range
from $40,000 to $220,000
• A value of 180 in a data set of people ages
• A value of 7 years old in a data set of college student’s ages
• A value of $200 million in a data set of top 100 billionaire net worth
86
What is an outlier?

87
Example: Detect outliers in the following data set of monthly accidents on the job during a
year evaluation in a firm of 100 people.
Month Accidents/Month
January 3
February 4
March 1
April 2
May 1
June 2
July 3
August 15
September 2
October 1
November 3
December 2
Accidents on the Job

• Judgment Come First
• Another rule used for detecting outliers is based
on defining an outlier as “a value that is more than
1.5 times the inter-quartile range smaller than the
lower quartile and larger than the upper quartile”
88
Descriptive statistics
Accidents/Month
count 12
mean 3.25
sample variance 14.57
sample standard
deviation 3.82
minimum 1
maximum 15
range 14
1st quartile 1.75
median 2
3rd quartile 3
interquartile range 1.25
mode 2
low extremes 0
low outliers 0
high outliers 0
high extremes 10 2 4 6 8 10 12 14 16
Accidents/Month
BoxPlot
0 2 4 6 8 10 12 14 16
Accidents/Month
DotPlot

89
Property Tax ($)
6000 8000 9000
6800 4800 7000
5800 11000 10000
7000 10050 12000
6800 8000 9000
4000 6700 7000
28000 6800 10000
10000 4000 11000
12000 5999 10050
9000 10000 4800
7000 12000 11000
10000 9500 6800
5600 7500 5800
4800 21000 7000
11000 5600 6800
10050 4200 4000
Working problem 3.25:
The data shown in the table below indicates the property tax ($) of a random sample of houses.
- Determine the mean, the median, and the mode of property tax
- Determine the range, standard deviation, and variance
- Determine the outliers in the data set

Descriptive Statistics Part II: Graphical Description

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie Descriptive Statistics Part II: Graphical Description

Ähnlich wie Descriptive Statistics Part II: Graphical Description (20)

Mehr von getyourcheaton

Mehr von getyourcheaton (12)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Descriptive Statistics Part II: Graphical Description

Hinweis der Redaktion