CHAPTER 2
Introduction to Data Analysis in
MS Excel and SPSS
FD 502 - Basic Statistics
Christian G. Abalos
Presenter
Table
and
Graphs
Measures of
Central
Tendency
Measures
of Relative
Position
Measures of
Variability
Skewness and
Kurtosis
Introduction
to SPSS
Introduction to
Statistical Tools in MS
Excel and SPSS
Frequency
Distribution
Targets
1. Construct frequency distribution table
using SPSS.
2. Present MS Excel and SPSS results in
graphs or tables using the APA format
and;
3. Use MS Excel and SPSS in computing for
descriptive statistics and;
4. Interpret MS Excel and SPSS results.
Basic Statistics FD 502
Data Analysis with Excel is a comprehensive tutorial
that provides a good insight into the latest and
advanced features available in Microsoft Excel. It
explains in detail how to perform various data
analysis functions using the features available in MS-
Excel.
What is the Data Analysis ToolPak in Excel?
Introduction to Data Analysis FD 502
Data Analysis ToolPak in MS Excel
These instructions apply to Excel 2010 and later versions.
• Click the File tab, click Options, and then click the Add-ins category.
• In the Manage box, select Excel Add-ins and then click Go.
• In the Add-ins available box, select the Analysis ToolPak check box
and then click OK.
Tip:
If Analysis ToolPak is not listed in the Add-ins available box, click
Browse to locate it.
If you are prompted that the Analysis ToolPak is not currently
installed on your computer, click Yes to install it.
Introduction to Data Analysis FD 502
Features of Data Analysis ToolPak in MS Excel
Introduction to Data Analysis FD 502
Sample Data for Exploring the Data
Analysis ToolPak
Introduction to Data Analysis FD 502
IBM Statistical Package for Social Sciences (SPSS)
Introduction to Data Analysis FD 502
Features of IBM Statistical Package for Social
Sciences (SPSS)
• Standard Package
• Data Access and
Management
• Data Preparation
• Graphs
• Output
• Data Editor Enhancement
• Extended Programmability
• Statistics
• Multi Threaded Algorithm
• Bootstrapping
• Regression
• Advanced Statistics
Introduction to Data Analysis FD 502
Data Creation in IBM SPSS
Introduction to Data Analysis FD 502
Introduction to Data Analysis FD 502
Introduction to Data Analysis FD 502
Sample Data in Exploring IBM SPSS
Frequency Distribution
A frequency distribution is the organization of
raw data in table form, using classes and
frequencies.
There are three basic types of frequency
distributions. The three types are categorical,
ungrouped and grouped frequency distributions.
Frequency Distribution FD 502
Categorical Frequency Distribution
The categorical frequency distribution is used
for data that can be placed in specific
categories, such as nominal- or ordinal-level
data.
For example, data such as political affiliation,
religious affiliation, or major field of study.
Frequency Distribution FD 502
Twenty-five army inductees were given a blood
test to determine their blood type. The data set is as
follows:
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Example
Frequency Distribution FD 502
The categorical frequency distribution is
Blood Type Frequency Percent
A 5 20
B 7 28
O 9 36
AB 4 16
N = 25 100
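The table above can be reproduced with a short tally. This is a minimal sketch in Python rather than SPSS or Excel; the variable names (`blood_types`, `table`) are illustrative and not part of the deck.

```python
from collections import Counter

# Blood types of the 25 inductees, copied from the example above.
blood_types = """
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
""".split()

counts = Counter(blood_types)
n = len(blood_types)

# One row per category: (frequency, percent of the total).
table = {bt: (counts[bt], 100 * counts[bt] / n) for bt in ["A", "B", "O", "AB"]}

print(f"{'Blood Type':<12}{'Frequency':>10}{'Percent':>9}")
for bt, (freq, pct) in table.items():
    print(f"{bt:<12}{freq:>10}{pct:>9.0f}")
print(f"N = {n}")
```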
Frequency Distribution FD 502
Ungrouped Frequency Distribution
An ungrouped frequency
distribution is used for numerical data
and when the range (the difference between
the highest and the smallest values) is small.
Frequency Distribution FD 502
Example
Frequency Distribution FD 502
The ungrouped frequency distribution is
Class Limits (in miles) Frequency Percentage
12 6 20
13 1 3
14 3 10
15 6 20
16 8 27
17 2 7
18 3 10
19 1 3
N = 30 100
Frequency Distribution FD 502
Grouped Frequency Distribution
When the range of the data is large, the data
must be grouped into classes that are more than one
unit in width.
To construct a frequency distribution, follow these
rules:
1. There should be between 5 and 20 classes.
2. The class width should be an odd number. This
ensures that the midpoint of each class has the
same place value as the data.
Frequency Distribution FD 502
3. The classes must be mutually
exclusive. Mutually exclusive classes have
nonoverlapping class limits so that data cannot be
placed into two classes.
4. The classes must be continuous. There
should be no gaps in a frequency distribution.
5. The classes must be exhaustive. There
should be enough classes to accommodate all the data.
6. The classes must be equal in width.
This avoids a distorted view of the data.
Grouped Frequency Distribution
CHAPTER 2
Presentation of Data
FD 502 - Basic Statistics
Christian G. Abalos
Presenter
Frequency Distribution FD 502
Constructing Grouped Frequency Distribution
1. Find the range.
range = highest value – lowest value
2. Decide on the number of class intervals or
classes, which we denote by k.
• Sturges' formula: $k = 1 + \log_2 N$
• another formula:
• 5 – 20 classes
Frequency Distribution FD 502
3. Determine the class size or class width of
the interval, which we denote by c.
$$ c = \frac{\text{range}}{k} $$
(rounded to the nearest odd whole number)
4. Determine the lower limit LL and the upper
limit UL of the lowest class interval. The lowest class
interval should contain the lowest value in the data set. The value of
the UL is determined using the equation
UL = LL + (c – 1)
Constructing Grouped Frequency Distribution
Frequency Distribution FD 502
Constructing Grouped Frequency Distribution
5. Determine the upper class intervals by
consecutively adding the class size c to the
values of LL and UL of the lowest class
interval until we get the class interval with the
highest value in the data set.
6. Tally the data, find the frequencies.
Note: Other statistical information may be reflected in the table such
as class boundaries, class marks or class midpoints, less than
cumulative frequency (<cf), greater than cumulative frequency (>cf),
and the relative frequency (rf)
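The six steps can be sketched in Python as follows. The data are the 40 mathematics scores used in this deck's percentile example. Two details are assumptions of the sketch, not of the deck: k from Sturges' formula is rounded to the nearest whole number, and the helper `nearest_odd` (a hypothetical name) breaks ties toward the smaller odd number.

```python
import math

scores = [16, 26, 31, 32, 34, 37, 39, 43,
          19, 29, 31, 33, 34, 37, 39, 44,
          22, 30, 31, 33, 35, 37, 41, 45,
          25, 30, 32, 33, 35, 38, 41, 47,
          26, 31, 32, 34, 36, 38, 42, 47]

def nearest_odd(x):
    """Nearest odd whole number to x (assumed tie-break: the smaller one)."""
    lower = 2 * math.floor((x - 1) / 2) + 1
    return lower if x - lower <= (lower + 2) - x else lower + 2

n = len(scores)
data_range = max(scores) - min(scores)   # Step 1: range
k = round(1 + math.log2(n))              # Step 2: Sturges' formula
c = nearest_odd(data_range / k)          # Step 3: class width, odd

# Steps 4-5: start at the lowest value, UL = LL + (c - 1), then keep
# adding c until the class containing the highest value is reached.
classes = []
ll = min(scores)
while not classes or classes[-1][1] < max(scores):
    classes.append((ll, ll + c - 1))
    ll += c

# Step 6: tally the frequencies.
freqs = [sum(lo <= s <= hi for s in scores) for lo, hi in classes]

for (lo, hi), f in zip(classes, freqs):
    print(f"{lo} – {hi}: {f}")
```

Because c is rounded, the number of classes actually produced (seven here) can differ slightly from the k suggested by Sturges' formula.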
Frequency Distribution FD 502
• The class boundaries are used to
separate the classes so that there are
no gaps in the frequency distribution.
Other Features of Grouped Frequency Distribution
Frequency Distribution FD 502
• The class midpoint is found by adding the upper and
lower boundaries (or limits) and dividing by 2.
• The cumulative frequencies are used to determine the
number of cases falling below (for <cf) or above (for
>cf) a particular value in a distribution.
• The relative frequency (rf) of a class interval is the
proportion of observations falling within the class and
may be presented in percent.
Thus,
$$ rf = \frac{f}{n} \times 100 $$
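As an illustration, the features listed above can be computed for the five-class table used in this deck's grouped-mean example. This is a sketch only: the boundaries are taken as half a unit beyond each class limit, which assumes whole-number data, and the dictionary keys are illustrative names.

```python
# Class limits and frequencies from the grouped-mean example, lowest class first.
classes = [(7, 9), (10, 12), (13, 15), (16, 18), (19, 21)]
freqs = [6, 12, 4, 10, 3]
n = sum(freqs)

rows = []
running = 0
for (lo, hi), f in zip(classes, freqs):
    running += f
    rows.append({
        "limits": (lo, hi),
        "boundaries": (lo - 0.5, hi + 0.5),  # assumes whole-number data
        "midpoint": (lo + hi) / 2,
        "f": f,
        "less_cf": running,                  # <cf: cases at or below this class
        "greater_cf": n - running + f,       # >cf: cases at or above this class
        "rf": 100 * f / n,                   # relative frequency in percent
    })

for r in rows:
    print(r["limits"], r["boundaries"], r["midpoint"], r["f"],
          r["less_cf"], r["greater_cf"], round(r["rf"], 1))
```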
Frequency Distribution FD 502
Distribution of scores of forty students in a
Mathematics class.
Example
Frequency Distribution FD 502
Why do we construct frequency distribution?
Graphical Presentations of Data
The three most common statistical graphs are
the bar graph (histogram), the frequency
polygon, and the cumulative frequency graph, or
ogive.
The purpose of graphs in statistics is to convey
the data to the viewer in pictorial form.
Graphs are useful in getting the audience’s
attention in a publication or a presentation.
Graphs FD 502
Bar Graph
Graphs FD 502
Bar Graph
Graphs FD 502
Histogram
The histogram is a graph that displays the data
by using vertical bars of various heights to
represent the frequencies.
Graphs FD 502
Frequency Polygon
The frequency polygon is a graph that displays
the data by using lines that connect points plotted
for the frequencies at the midpoints of the classes.
Graphs FD 502
Ogive
The ogive is the graph that represents
the cumulative frequencies for the classes
in a frequency distribution.
Graphs FD 502
Other types of Graphs
Pareto Chart
A Pareto chart is used to represent a frequency
distribution for a categorical variable; the
frequencies are displayed by the heights of
vertical bars, which are arranged in order from
highest to lowest.
Graphs FD 502
Other types of Graphs
Pie Chart
A pie chart is a circle that is divided into
sections according to the percentage of
frequencies in each category of the
distribution.
Graphs FD 502
A stem-and-leaf plot is a data plot that uses part of
a data value as a stem and part of the data value as
the leaf to form groups or classes.
It has the advantage over grouped frequency
distribution of retaining the actual data while
showing them in graphic form.
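A text-only stem-and-leaf plot can be sketched in a few lines. The example below uses the 40 mathematics scores from this deck's percentile example, with the tens digit as the stem and the units digit as the leaf; that split is an assumption that suits two-digit data.

```python
from collections import defaultdict

scores = [16, 26, 31, 32, 34, 37, 39, 43,
          19, 29, 31, 33, 34, 37, 39, 44,
          22, 30, 31, 33, 35, 37, 41, 45,
          25, 30, 32, 33, 35, 38, 41, 47,
          26, 31, 32, 34, 36, 38, 42, 47]

stems = defaultdict(list)
for s in sorted(scores):
    stems[s // 10].append(s % 10)  # tens digit = stem, units digit = leaf

for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))
```

Every original score can be read back from the plot, which is the advantage over a grouped frequency distribution noted above.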
Other types of Graphs
Graphs FD 502
Short History on Stem-Leaf Diagram
Graphs FD 502
Example
CHAPTER 3
Descriptive Statistics
FD 502 - Basic Statistics
Christian G. Abalos
Presenter
Summary of Measures
• Central Tendency: Mean, Median, Mode
• Other Locations: Percentiles, Quartiles, Deciles
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Central Tendency FD 502
A measure of central tendency or measure of
central location describes the “center” of a
given set of data. This is a value about which
observations tend to cluster.
• A single value used to represent the “center” of the data or the typical
value.
• An index of the central location of a distribution.
• Precise but simple
• The most representative value of the data
Measures of Central Tendency
Measures of Central Tendency FD 502
Common measures of central
tendency are the MEAN, MEDIAN, and
MODE.
Arithmetic Mean
The arithmetic mean or simply the
mean is the average of a given set of
data. It is obtained by dividing the sum
of all the observations by the total
number of observations.
Measures of Central Tendency FD 502
Population mean for a finite population with N elements,
denoted by the Greek letter μ:
$$ \mu = \frac{\sum_{i=1}^{N} x_i}{N} $$
Sample mean for a finite sample with n elements,
denoted by x̄:
$$ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} $$
The population mean is a parameter while the sample mean is a statistic.
Arithmetic Mean
Measures of Central Tendency FD 502
A random sample of 5 BSED students about to
take their final examination were asked how
many hours they slept the night before the test.
The data given are 5, 7, 3, 4, and 6. The mean
number of hours of sleep is
$$ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5 + 7 + 3 + 4 + 6}{5} = 5 \text{ hours} $$
Example
Measures of Central Tendency FD 502
Using the previous data in the previous
example, if the student reported 14 hours of
sleep instead of 3 hours, then the new mean
is
$$ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5 + 7 + 14 + 4 + 6}{5} = 7.2 \text{ hours} $$
Remark: The mean takes into account all
observations in the data set. Thus, it is
affected by extreme values.
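The effect of the single extreme value can be checked directly. This is a minimal sketch; the helper name `mean` is illustrative.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)

hours = [5, 7, 3, 4, 6]
hours_with_extreme = [5, 7, 14, 4, 6]  # 3 hours replaced by 14 hours

print(mean(hours))               # mean of the original sample
print(mean(hours_with_extreme))  # mean after the change
```

Changing one observation out of five moves the mean by more than two hours, which is the sensitivity the remark describes.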
Example
Measures of Central Tendency FD 502
Mean for Grouped Data
$$ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} $$
where
fi = frequency of the class interval
xi = class mark of the class interval
Measures of Central Tendency FD 502
Given the frequency distribution table
below, find its mean.
Class Interval Frequency
19 – 21 3
16 – 18 10
13 – 15 4
10 – 12 12
7 – 9 6
Example
Measures of Central Tendency FD 502
We solve first the class mark and the
product of the class mark and the
frequency.
Class Interval   Frequency (f)   Class Mark (x)   fx
19 – 21          3               20               60
16 – 18          10              17               170
13 – 15          4               14               56
10 – 12          12              11               132
7 – 9            6               8                48
                 Σf = 35                          Σfx = 466
Measures of Central Tendency FD 502
$$ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{466}{35} \approx 13.3 $$
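The same computation can be written out in Python. This is a sketch; the class mark is taken as the midpoint of the class limits, as in the worked table, and the variable names are illustrative.

```python
# Class limits and frequencies from the example.
classes = [(19, 21), (16, 18), (13, 15), (10, 12), (7, 9)]
freqs = [3, 10, 4, 12, 6]

marks = [(lo + hi) / 2 for lo, hi in classes]        # class marks x
total_f = sum(freqs)                                  # sum of f
total_fx = sum(f * x for f, x in zip(freqs, marks))   # sum of f * x
grouped_mean = total_fx / total_f

print(total_f, total_fx, round(grouped_mean, 1))
```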
Measures of Central Tendency FD 502
Weighted Mean
• Used when individual values have varying
importance.
• Weights are assigned to each observed value before
the mean is computed.
Measures of Central Tendency FD 502
Find the GPA of a student with the corresponding grades below:
Example
Subjects   Grades   Units   Grades × Units
A          1.5      3       4.5
B          1.1      2       2.2
C          1.8      3       5.4
D          2.0      4       8.0
E          1.5      3       4.5
Total               15      24.6
$$ \bar{x} = \frac{24.6}{15} = 1.64 $$
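A minimal Python sketch of the weighted mean, using the units as weights (the variable names are illustrative):

```python
grades = [1.5, 1.1, 1.8, 2.0, 1.5]  # subjects A to E
units = [3, 2, 3, 4, 3]             # weights

weighted_sum = sum(g * u for g, u in zip(grades, units))
gpa = weighted_sum / sum(units)

print(round(weighted_sum, 1), sum(units), round(gpa, 2))
```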
Measures of Central Tendency FD 502
Properties of the Mean
• The most common and widely understood measure of central
tendency; it uses every observed value in its calculation.
• The mean can be computed for both ungrouped and grouped data; for
grouped data it is based on class marks rather than on the actual observed values.
• The mean is affected by extreme values.
• The mean always exists and is unique.
• The mean is used when the distribution is approximately symmetrical and
when all observed values are given equal importance; it also serves as
the basis for many other statistics.
Measures of Central Tendency FD 502
Median
• Divides an ordered set of observations into two equal parts; the
positional middle of the array.
• Half of the observations are below its value and the
other half are above its value.
Measures of Central Tendency FD 502
Steps in finding the Median
1. Arrange the set of scores in ascending order
(from lowest to highest)
2. If n is odd, there will be a middle score. This
middle score is the median.
If n is even, there will be two middle scores. The
median is taken as the arithmetic average of the
two middle scores.
Measures of Central Tendency FD 502
Below are the scores of 6 students in
their Mathematics test. Find the
median.
35 20 12 30 25 50
Arranging the scores in increasing
order, we have
12 20 25 30 35 50
Example
Measures of Central Tendency FD 502
Since n = 6, the median is the average of the (n/2)th and (n/2 + 1)th
observations. That is,
$$ \frac{n}{2} = \frac{6}{2} = 3\text{rd} \quad \text{and} \quad \frac{n}{2} + 1 = \frac{6}{2} + 1 = 4\text{th} $$
$$ Md = \frac{x_3 + x_4}{2} = \frac{25 + 30}{2} = 27.5 $$
Measures of Central Tendency FD 502
Below are the scores of 7 students in
their Mathematics test. Find the
median.
35 20 12 30 25 50 26
Arranging the scores in increasing
order, we have
12 20 25 26 30 35 50
Example
Measures of Central Tendency FD 502
Since n = 7, the median is the ((n + 1)/2)th
observation. That is,
$$ \frac{n + 1}{2} = \frac{7 + 1}{2} = 4\text{th} $$
$$ Md = x_4 = 26 $$
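Both cases of the procedure (n odd and n even) fit in one small function. This is a sketch with an illustrative function name; Python's built-in `statistics.median` gives the same results.

```python
def median(scores):
    """Middle value if n is odd, average of the two middle values if n is even."""
    ordered = sorted(scores)  # Step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]   # the ((n + 1)/2)th observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # (n/2)th and (n/2 + 1)th

six_scores = [35, 20, 12, 30, 25, 50]
seven_scores = [35, 20, 12, 30, 25, 50, 26]

print(median(six_scores))
print(median(seven_scores))
```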
Measures of Central Tendency FD 502
Characteristics of the Median
• The median is a positional measure.
• Extreme values affect the median less than the mean.
• The median is used when there are extreme observed values.
• The median is also used when the data do not have a true zero
point or when a frequency distribution has open-ended class
intervals.
Measures of Central Tendency FD 502
Mode
The observed value that occurs most often or with the greatest frequency in a
data set.
Mode can be identified by counting the frequency of each observed value and
locating the observed value with the highest frequency
Mode is a less popular measure of central tendency as compared to the mean
and the median, but the easiest and can be considered as a quick estimate for
the measure of central tendency.
Measures of Central Tendency FD 502
Find the mode of 7; 5; 5; 3; 1; 1; 3; 5
Since 5 appears more frequently than the rest of the observed values, the
mode is 5.
Find the mode of 7; 5; 5; 3; 1; 1; 3; 5; 1
Since 5 and 1 appear more frequently than the rest of the observed values,
the modes are 1 and 5.
Example
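Counting frequencies and picking the most frequent value or values can be sketched as below. The function name `modes` is illustrative, and returning an empty list when no value repeats is an assumption made here to represent a data set with no mode.

```python
from collections import Counter

def modes(values):
    """All values that share the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # assumption: no value repeats, so no mode is reported
    return sorted(v for v, c in counts.items() if c == top)

print(modes([7, 5, 5, 3, 1, 1, 3, 5]))     # one mode
print(modes([7, 5, 5, 3, 1, 1, 3, 5, 1]))  # two modes
```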
CRUDE MODE
Mo = 3 × Median − 2 × Mean
Measures of Central Tendency FD 502
The mode is the most typical value of a set of observations.
A few low or high values do not easily affect the mode.
The mode is sometimes not unique, and sometimes it does not exist.
There may be several modes for one data set.
We can get the mode for both quantitative and qualitative types of data
Characteristics of Mode
Central Tendency in Relation to Levels of
Measurement
• Variables measured categorically are either nominal or ordinal
and are best summarized using frequency counts; statistical comparisons
for such data rest on differences in proportions.
Specifically, nominal data can be summarized using the mode, and
ordinal data using the median or the mode. As a consequence, statistical
tests that compare means are not permissible for categorical
data, because the mean is not defined as a measure of centrality for
that data type.
• Both interval and ratio data accommodate the mean as a measure of central
tendency. Interval data are less powerful than ratio data because their
zero point is arbitrary.
Measures of Central Tendency FD 502
Measures of Relative Position FD 502
Measures of Relative Position FD 502
Measures of Relative Position
Percentile
• Per-centum
• Divides the ordered observations into 100 equal parts.
• There are 99 percentiles, denoted as P1, P2, P3, …, P99 with around 1% of the
observations in each group.
We interpret percentiles as follows:
P1, first percentile, is the value below which 1% of the ordered
values fall.
P2, second percentile, is the value below which 2% of the ordered
values fall.
P99, ninety-ninth percentile, is the value below which 99% of the
ordered values fall.
Measures of Relative Position FD 502
• Step 1: Arrange the data in ascending order
• Step 2: Assume that there is no missing data, and all values exist.
• Step 3: Let X1, X2, X3, … Xn be the observations arranged from lowest to
highest
• Step 4: Denote the percentile of interest with k
• Step 5: Get the percentile using the formula
(i)
(ii)
Computing the Percentile
Measures of Relative Position FD 502
Another Formula for
Computing the Percentile
Measures of Relative Position FD 502
Below are the scores of 40 students in
their Mathematics test. Find P85.
16 26 31 32 34 37 39 43
19 29 31 33 34 37 39 44
22 30 31 33 35 37 41 45
25 30 32 33 35 38 41 47
26 31 32 34 36 38 42 47
Example
Measures of Relative Position FD 502
We seek the value below which
$$ \frac{85}{100} \times 40 = 34 $$
observations fall. As seen from the table, P85 could be
any value between 41 and 42. To have
a unique value, we define
$$ P_{85} = \frac{41 + 42}{2} = 41.5 $$
Example
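The reasoning in this example can be sketched in Python. The rule coded here is an assumed textbook convention, not necessarily the formula on the deck's slide: if k × n / 100 is a whole number, average the value at that position with the next one; otherwise round the position up.

```python
scores = [16, 26, 31, 32, 34, 37, 39, 43,
          19, 29, 31, 33, 34, 37, 39, 44,
          22, 30, 31, 33, 35, 37, 41, 45,
          25, 30, 32, 33, 35, 38, 41, 47,
          26, 31, 32, 34, 36, 38, 42, 47]

def percentile(values, k):
    """k-th percentile (k = 1..99) under the assumed convention described above."""
    ordered = sorted(values)
    n = len(ordered)
    if (k * n) % 100 == 0:
        pos = k * n // 100  # whole-number position, e.g. 34 for P85 of 40 values
        return (ordered[pos - 1] + ordered[pos]) / 2
    pos = k * n // 100 + 1  # otherwise round the position up
    return ordered[pos - 1]

p85 = percentile(scores, 85)
print(p85)
```

Statistical packages implement several percentile conventions, so SPSS or Excel may report a slightly different value for the same data.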
Measures of Relative Position FD 502
Deciles are values that divide a set of ordered
observations into 10 equal parts. These values,
denoted by D1, D2, …, D9, are such that 10% of the data
falls below D1, 20% falls below D2, …, and 90% falls
below D9.
Decile
• Refers to the nine (9) values that divide an ordered data set into 10 equal parts
• The ith decile, Di, is a value below which 10 × i % of the data lie
D1, first decile, is the value below which 10% of the ordered
values fall.
Measures of Relative Position FD 502
Quartile
• Refers to the three (3) values that divide an ordered
data set into 4 equal parts
• Splits the ordered data into 4 quarters
The ith quartile, Qi, is a value below which 25 × i % of the data lie
Measures of Relative Position FD 502
The lower quartile, denoted by Q1, divides the
bottom 25% of the ordered observations from
the top 75%.
The middle quartile, denoted by Q2, divides the
bottom 50% of the ordered observations
from the top 50%; it is the median.
The upper quartile, denoted by Q3, divides the
bottom 75% of the ordered observations from the
top 25%.
Qk, k = 1, 2, 3, is a value in an ordered distribution such that 25 × k % of the ordered data in
the distribution are < Qk.
Measures of Relative Position FD 502
In a box and whisker plot:
• The ends of the box are the upper and lower quartiles, so
the box spans the interquartile range.
• The median is marked by a vertical line inside the box.
• The whiskers are the two lines outside the box that
extend to the highest and lowest observations.
Box and Whiskers Plot
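The five values that a box and whisker plot draws can be computed as sketched below, again with the 40 mathematics scores from the percentile example. The quartiles come from Python's `statistics.quantiles` with its default method, whose convention may differ from the one used on these slides or by SPSS.

```python
import statistics

scores = [16, 26, 31, 32, 34, 37, 39, 43,
          19, 29, 31, 33, 34, 37, 39, 44,
          22, 30, 31, 33, 35, 37, 41, 45,
          25, 30, 32, 33, 35, 38, 41, 47,
          26, 31, 32, 34, 36, 38, 42, 47]

# Three cut points that split the ordered data into four quarters.
q1, q2, q3 = statistics.quantiles(scores, n=4)

summary = {
    "min": min(scores),  # end of the lower whisker
    "Q1": q1,            # lower end of the box
    "median": q2,        # line inside the box
    "Q3": q3,            # upper end of the box
    "max": max(scores),  # end of the upper whisker
}
iqr = q3 - q1            # length of the box

print(summary, iqr)
```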
Measures of Relative Position FD 502
Box and Whiskers Plot
Measures of Relative Position FD 502
Measures of Variability FD 502
Measures of Variability
Absolute Dispersion: Used to compare two or
more data sets with similar means and the same unit of
measurement.
Relative Dispersion: Used to compare two or
more data sets with different means or different units of
measurement.
Consider the two sets of data below.
Set A
25, 28, 28, 30, 30, 33, 35, 40, 41, 45
Set B
10, 15, 23, 28, 28, 30, 39, 45, 52, 65
They have the same mean (33.5) but
Set A is more homogeneous than Set B.
Measures of Variability FD 502
Measures of Variability FD 502
Range
The range of a set of data is the
difference between the largest and
smallest number in the set.
Example:
In Set A, the range is 45 – 25 = 20.
In Set B, the range is 65 – 10 = 55.
Measures of Variability FD 502
Measures of Variability FD 502
Mean Absolute Deviation (MAD)
$$ MAD = \frac{\sum |x_i - \bar{x}|}{N} $$
Where
x_i = score
x̄ = mean of the scores
N = total number of scores
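Applied to Set A and Set B from the earlier slide, the formula can be sketched as follows (the function name is illustrative):

```python
def mean_absolute_deviation(scores):
    """Average absolute distance of the scores from their mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(abs(x - mean) for x in scores) / n

set_a = [25, 28, 28, 30, 30, 33, 35, 40, 41, 45]
set_b = [10, 15, 23, 28, 28, 30, 39, 45, 52, 65]

mad_a = mean_absolute_deviation(set_a)
mad_b = mean_absolute_deviation(set_b)
print(mad_a, mad_b)
```

Both sets have a mean of 33.5, but Set B's larger MAD reflects its wider spread.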
Measures of Variability FD 502
Mean Absolute Deviation (MAD) Grouped Data
$$ MAD = \frac{\sum f\,|X - \bar{X}|}{N} $$
Where
X = class mark
X̄ = mean
f = frequency
N = total number of cases
Measures of Variability FD 502
Population Variance
Ungrouped Data
Given the finite population x1, x2, …,
xN, the population variance is
$$ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} $$
Measures of Variability FD 502
Sample Variance
Ungrouped Data
Given a random sample x1, x2, …,
xn, the sample variance is
Unbiased estimator:
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$
Biased estimator:
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} $$
Measures of Variability FD 502
Sample Standard Deviation
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
Measures of Variability FD 502
Computational Formula for the Sample
Variance (unbiased)
$$ s^2 = \frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)} $$
Measures of Variability FD 502
Example:
Set A
25, 28, 28, 30, 30, 33, 35, 40, 41, 45
We have
$$ s^2 = \frac{\sum_{i=1}^{10} (x_i - 33.5)^2}{10 - 1} = \frac{(25 - 33.5)^2 + \dots + (45 - 33.5)^2}{9} = \frac{390.5}{9} \approx 43.39 $$
$$ s = \sqrt{43.39} \approx 6.59 $$
Measures of Variability FD 502
Example:
Set B
10, 15, 23, 28, 28, 30, 39, 45, 52, 65
We have
$$ s^2 = \frac{\sum_{i=1}^{10} (x_i - 33.5)^2}{10 - 1} = \frac{(10 - 33.5)^2 + \dots + (65 - 33.5)^2}{9} = \frac{2574.5}{9} \approx 286.06 $$
$$ s = \sqrt{286.06} \approx 16.91 $$
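Both the definitional and the computational formula for the unbiased sample variance can be checked against Set A and Set B. This is a sketch with illustrative function names.

```python
import math

def sample_variance(xs):
    """Definitional formula: squared deviations from the mean over n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_variance_computational(xs):
    """Computational formula: (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))

set_a = [25, 28, 28, 30, 30, 33, 35, 40, 41, 45]
set_b = [10, 15, 23, 28, 28, 30, 39, 45, 52, 65]

for name, data in [("Set A", set_a), ("Set B", set_b)]:
    var = sample_variance(data)
    print(name, round(var, 2), round(math.sqrt(var), 2))
```

The two formulas agree, and the larger variance of Set B matches the earlier observation that Set A is more homogeneous.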
Measures of Variability FD 502
Sample Variance
Grouped Data
$$ s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n} $$
where
f_i = frequency of the class interval
x_i = class mark of the class interval
x̄ = mean
n = total number of observations
Measures of Variability FD 502
Measure of Relative Variation
To compare the variability of data
sets measured in different units, we use
the measure of relative variation called the
coefficient of variation. This index
expresses the standard deviation as a
percentage of the mean. Its
value is given by
$$ C.V. = \frac{s}{\bar{x}} \times 100\% $$
Measures of Variability FD 502
Example:
Determine which data set is more
spread out.
Measures of Variability FD 502
We first compute the means and
standard deviations of the sets of data.
Data Set 1: x̄ = 24 years and s = 3.742 years
Data Set 2: x̄ = P 8875 and s = P 2267.984
Measures of Variability FD 502
So, we have
Data Set 1:
$$ C.V. = \frac{s}{\bar{x}} \times 100\% = \frac{3.742 \text{ years}}{24 \text{ years}} \times 100\% = 15.59\% $$
Data Set 2:
$$ C.V. = \frac{s}{\bar{x}} \times 100\% = \frac{P\,2267.984}{P\,8875} \times 100\% = 25.55\% $$
Therefore, net take home pay is more
scattered with respect to the mean than
years of teaching experience of teachers.
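The comparison can be reproduced from the means and standard deviations given above. This is a minimal sketch; the function and variable names are illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

cv_experience = coefficient_of_variation(3.742, 24)  # years of teaching experience
cv_pay = coefficient_of_variation(2267.984, 8875)    # net take-home pay

more_spread = "net take-home pay" if cv_pay > cv_experience else "years of teaching experience"
print(round(cv_experience, 2), round(cv_pay, 2), more_spread)
```

Because the units cancel in each ratio, the two percentages can be compared even though one variable is in years and the other in pesos.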
Measures of Distribution FD 502
Skewness and Kurtosis
Skewness measures the degree and direction of asymmetry, that is, the
extent to which a distribution of values deviates from symmetry around
the mean. A symmetric distribution such as a normal distribution has a
skewness of 0, and a distribution that is skewed to the left, e.g. when
the mean is less than the median, has a negative skewness. A positive
skewness indicates a greater number of smaller values, and a negative
value indicates a greater number of larger values. Values for
acceptability for psychometric purposes (+/-1 to +/-2) are the same as
with kurtosis.
Measures of Variability FD 502
Measures of Distribution FD 502
Skewness and Kurtosis
Kurtosis measures the "peakedness" or "flatness" of a
distribution. A kurtosis value near zero indicates a shape close to
normal. A positive value indicates a distribution which is more
peaked than normal, and a negative kurtosis indicates a shape flatter
than normal. An extreme positive kurtosis indicates a distribution
where more of the values are located in the tails of the distribution
rather than around the mean. A kurtosis value of +/-1 is considered
very good for most psychometric uses, but +/-2 is also usually
acceptable.
Measures of Distribution FD 502
Measures of Variability FD 502
Interpretation of Skewness (Bulmer, 1979)
• If skewness is less than -1 or greater than 1, the
distribution is highly skewed.
• If skewness is between -1 and -0.5 or between 0.5
and 1, the distribution is moderately skewed.
• If skewness is between -0.5 and 0.5, the
distribution is approximately symmetric.
Interpretation of Kurtosis in SPSS
• A kurtosis value near zero indicates a shape close
to normal.
• A positive value indicates a distribution which is
more peaked than normal.
• A negative value indicates a shape flatter than
normal.
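The interpretation rules above can be tried on Set A from the variability slides. The deck does not state which formulas it uses, so this sketch assumes the bias-adjusted sample skewness and excess kurtosis (the forms computed by Excel's SKEW and KURT functions, under which a normal distribution scores 0 on both). Function names are illustrative.

```python
import math

def _mean_and_sd(xs):
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return n, mean, sd

def sample_skewness(xs):
    """Bias-adjusted sample skewness (assumed formula, see lead-in)."""
    n, mean, sd = _mean_and_sd(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in xs)

def sample_excess_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis (assumed formula, see lead-in)."""
    n, mean, sd = _mean_and_sd(xs)
    fourth = sum(((x - mean) / sd) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

def interpret_skewness(g):
    """Rule of thumb from the slide (Bulmer, 1979)."""
    if abs(g) > 1:
        return "highly skewed"
    if abs(g) >= 0.5:
        return "moderately skewed"
    return "approximately symmetric"

set_a = [25, 28, 28, 30, 30, 33, 35, 40, 41, 45]
skew_a = sample_skewness(set_a)
kurt_a = sample_excess_kurtosis(set_a)
print(round(skew_a, 3), interpret_skewness(skew_a), round(kurt_a, 3))
```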
Measures of Variability FD 502
Thank You…

Intro to Data Analysis and Descriptive Statistics FD 502 Presentation.pdf

  • 1. CHAPTER 2 Introduction to Data Analysis in MS Excel and SPSS FD 502 - Basic Statistics Christian G. Abalos Presenter
  • 2. Table and Graphs Measures of Central Tendency Measures of Relative Position Measures of Variability Skewness and Kurtosis Introduction to SPSS Introduction to Statistical Tools in MS Excel and SPSS Frequency Distribution
  • 3. Targets 1. Construct a frequency distribution table using SPSS. 2. Present MS Excel and SPSS results in graphs or tables using the APA format. 3. Use MS Excel and SPSS to compute descriptive statistics. 4. Interpret MS Excel and SPSS results. Basic Statistics FD 502
  • 4. Basic Statistics FD 502 Data Analysis with Excel is a comprehensive tutorial that provides insight into the latest and advanced features available in Microsoft Excel. It explains in detail how to perform various data analysis functions using the features available in MS Excel. What is the Analysis ToolPak in Excel?
  • 5. Introduction to Data Analysis FD 502 Analysis ToolPak in MS Excel These instructions apply only to Excel 2010 and later versions. • Click the File tab, click Options, and then click the Add-ins category. • In the Manage box, select Excel Add-ins and then click Go. • In the Add-ins available box, select the Analysis ToolPak check box and then click OK. Tip: If Analysis ToolPak is not listed in the Add-ins available box, click Browse to locate it. If you are prompted that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.
  • 6. Introduction to Data Analysis FD 502
  • 7. Introduction to Data Analysis FD 502
  • 8. Introduction to Data Analysis FD 502
  • 9. Introduction to Data Analysis FD 502
  • 10. Introduction to Data Analysis FD 502 Features of Data Analysis ToolPak in MS Excel
  • 11. Introduction to Data Analysis FD 502 Sample Data for Exploring the Analysis ToolPak
  • 12. Introduction to Data Analysis FD 502 IBM Statistical Package for Social Sciences (SPSS)
  • 13. Introduction to Data Analysis FD 502 Features of IBM Statistical Package for Social Sciences (SPSS) • Standard Package • Data Access and Management • Data Preparation • Graphs • Output • Data Editor Enhancement • Extended Programmability • Statistics • Multi Threaded Algorithm • Bootstrapping • Regression • Advanced Statistics
  • 14. Introduction to Data Analysis FD 502 Data Creation in IBM SPSS
  • 15. Introduction to Data Analysis FD 502
  • 16. Introduction to Data Analysis FD 502
  • 17. Introduction to Data Analysis FD 502 Sample Data in Exploring IBM SPSS
  • 18. Frequency Distribution A frequency distribution is the organization of raw data in table form, using classes and frequencies. There are three basic types of frequency distributions. The three types are categorical, ungrouped and grouped frequency distributions.
  • 19. Frequency Distribution FD 502 Categorical Frequency Distribution The categorical frequency distribution is used for data that can be placed in specific categories, such as nominal- or ordinal-level data. Examples include political affiliation, religious affiliation, and major field of study.
  • 20. Frequency Distribution FD 502 Example: Twenty-five army inductees were given a blood test to determine their blood type. The data set is as follows: A B B AB O O O B AB B B B O A O A O O O AB AB A O B A
  • 21. Frequency Distribution FD 502 The categorical frequency distribution is:
      Blood Type   Frequency   Percent
      A            5           20
      B            7           28
      O            9           36
      AB           4           16
      Total        N = 25      100
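The tally in this example can be checked with a short script. The slides themselves use MS Excel and SPSS; the Python sketch below is only an illustrative substitute, and the variable names are assumptions made for this example.

```python
from collections import Counter

# Blood types of the 25 army inductees, copied from the example.
blood_types = "A B B AB O O O B AB B B B O A O A O O O AB AB A O B A".split()

counts = Counter(blood_types)
n = len(blood_types)

# Frequency and percent for each category, in the order used on the slide.
table = {bt: (counts[bt], 100 * counts[bt] / n) for bt in ["A", "B", "O", "AB"]}

for bt, (freq, pct) in table.items():
    print(f"{bt:>2}  {freq:>2}  {pct:.0f}%")
```

Percent is computed as 100 × frequency / N, which is the same relative-frequency idea the deck later writes as rf.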
  • 22. Frequency Distribution FD 502 Ungrouped Frequency Distribution An ungrouped frequency distribution is used for numerical data when the range (the difference between the highest and the lowest values) is small.
  • 24. Frequency Distribution FD 502 The ungrouped frequency distribution is:
      Class Limits (in miles)   Frequency   Percentage
      12                        6           20
      13                        1           3
      14                        3           10
      15                        6           20
      16                        8           27
      17                        2           7
      18                        3           10
      19                        1           3
      Total                     N = 30      100
  • 25. Frequency Distribution FD 502 Grouped Frequency Distribution When the range of the data is large, the data must be grouped into classes that are more than one unit in width. To construct a frequency distribution, follow these rules: 1. There should be between 5 and 20 classes. 2. The class width should be an odd number. This ensures that the midpoint of each class has the same place value as the data.
  • 26. Frequency Distribution FD 502 3. The classes must be mutually exclusive. Mutually exclusive classes have nonoverlapping class limits so that data cannot be placed into two classes. 4. The classes must be continuous. There should be no gaps in a frequency distribution. 5. The classes must be exhaustive. There should be enough classes to accommodate all the data. 6. The classes must be equal in width. This avoids a distorted view of the data. Grouped Frequency Distribution
  • 27. CHAPTER 2 Presentation of Data FD 502 - Basic Statistics Christian G. Abalos Presenter
  • 28. Frequency Distribution FD 502 Constructing Grouped Frequency Distribution 1. Find the range: range = highest value – lowest value. 2. Decide on the number of class intervals or classes, denoted by k. • Sturges' formula: $k = 1 + \log_2 N$ • another formula: • 5 – 20 classes
  • 29. Frequency Distribution FD 502 Constructing Grouped Frequency Distribution 3. Determine the class size or class width of the interval, denoted by c (rounded to the nearest odd whole number). 4. Determine the lower limit LL and the upper limit UL of the lowest class interval. The lowest class interval should contain the lowest value in the data set. The value of UL is determined using the equation UL = LL + (c – 1).
  • 30. Frequency Distribution FD 502 Constructing Grouped Frequency Distribution 5. Determine the upper class intervals by consecutively adding the class size c to the values of LL and UL of the lowest class interval until we get the class interval with the highest value in the data set. 6. Tally the data, find the frequencies. Note: Other statistical information may be reflected in the table such as class boundaries, class marks or class midpoints, less than cumulative frequency (<cf), greater than cumulative frequency (>cf), and the relative frequency (rf)
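The six construction steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the deck's own procedure: the data are the 40 Mathematics scores from the percentile example later in this transcript; k is taken from Sturges' formula and rounded to the nearest whole number; the class width is assumed to be c = range / k rounded to the nearest odd whole number (the slide's own expression for c did not survive extraction); and the lowest class is assumed to start at the lowest score.

```python
import math

# 40 Mathematics scores (from the percentile example in this deck).
scores = [16, 26, 31, 32, 34, 37, 39, 43, 19, 29, 31, 33, 34, 37, 39, 44,
          22, 30, 31, 33, 35, 37, 41, 45, 25, 30, 32, 33, 35, 38, 41, 47,
          26, 31, 32, 34, 36, 38, 42, 47]


def nearest_odd(x):
    """Round x to the nearest odd whole number."""
    return 2 * round((x - 1) / 2) + 1


n = len(scores)
data_range = max(scores) - min(scores)   # Step 1: range
k = round(1 + math.log2(n))              # Step 2: Sturges' formula (rounding is an assumption)
c = nearest_odd(data_range / k)          # Step 3: class width (assumed c = range / k)

# Steps 4-6: build the class limits and tally the frequencies.
classes = []
ll = min(scores)                         # assumed starting lower limit
while True:
    ul = ll + (c - 1)                    # UL = LL + (c - 1)
    freq = sum(ll <= s <= ul for s in scores)
    classes.append((ll, ul, freq))
    if ul >= max(scores):
        break
    ll += c

# Extra columns mentioned in the note: class mark, <cf, and rf in percent.
cum = 0
for ll, ul, f in classes:
    cum += f
    print(f"{ll:>2}-{ul:<2}  f={f:>2}  mark={(ll + ul) / 2:>4}  <cf={cum:>2}  rf={100 * f / n:.1f}%")
```

With these assumptions the sketch yields seven classes of width 5 starting at 16–20; a different rounding rule for k or a different starting limit would give a different, equally valid table.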
  • 31. Frequency Distribution FD 502 Other Features of Grouped Frequency Distribution • The class boundaries are used to separate the classes so that there are no gaps in the frequency distribution.
  • 32. Frequency Distribution FD 502 Other Features of Grouped Frequency Distribution • The class midpoint is found by adding the upper and lower boundaries (or limits) and dividing by 2. • The cumulative frequencies are used to determine the number of cases falling below (for <cf) or above (for >cf) a particular value in a distribution. • The relative frequency (rf) of a class interval is the proportion of observations falling within the class and may be presented in percent. Thus, $rf = \frac{f}{n} \times 100$.
  • 33. Frequency Distribution FD 502 Distribution of scores of forty students in a Mathematics class. Example
  • 34. Frequency Distribution FD 502 Why do we construct frequency distribution?
  • 35. Graphical Presentations of Data The three most common statistical graphs are the bar graph (histogram), the frequency polygon, and the cumulative frequency or the ogive. The purpose of graphs in statistics is to convey the data to the viewer in pictorial form. Graphs are useful in getting the audience’s attention in a publication or a presentation.
  • 38. Graphs FD 502 Histogram The histogram is a graph that displays the data by using vertical bars of various heights to represent the frequencies.
  • 39. Graphs FD 502 Frequency Polygon The frequency polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes.
  • 40. Graphs FD 502 Ogive The ogive is the graph that represents the cumulative frequencies for the classes in a frequency distribution.
  • 41. Graphs FD 502 Other types of Graphs Pareto Chart A Pareto chart is used to represent a frequency distribution for categorical variable, and the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.
  • 42. Graphs FD 502 Other types of Graphs Pie Chart A pie chart is a circle that is divided into sections according to the percentage of frequencies in each category of the distribution.
  • 43. Graphs FD 502 A stem-and-leaf plot is a data plot that uses part of a data value as a stem and part of the data value as the leaf to form groups or classes. It has the advantage over grouped frequency distribution of retaining the actual data while showing them in graphic form. Other types of Graphs
  • 44. Graphs FD 502 Short History on Stem-Leaf Diagram
  • 46. CHAPTER 3 Descriptive Statistics FD 502 - Basic Statistics Christian G. Abalos Presenter
  • 47. Summary of Measures Summary Measures Central Tendency Mean Median Mode Other Locations Percentiles Quartiles Deciles Variation Range Variance Standard Deviation Coefficient of Variation
  • 48.
  • 49. Measures of Central Tendency FD 502 Measures of Central Tendency A measure of central tendency or measure of central location describes the “center” of a given set of data. This is a value about which observations tend to cluster. • A single value used to represent the “center” of the data or the typical value. • An index of the central location of a distribution. • Precise but simple. • The most representative value of the data.
  • 50. Measures of Central Tendency FD 502 Common measures of central tendency are the MEAN, MEDIAN, and MODE. Arithmetic Mean The arithmetic mean or simply the mean is the average of a given set of data. It is obtained by dividing the sum of all the observations by the total number of observations.
  • 51. Measures of Central Tendency FD 502 Arithmetic Mean The population mean, for a finite population with N elements, is denoted by the Greek letter $\mu$. The sample mean, for a finite sample with n elements, is denoted by $\bar{x}$. The population mean is a parameter while the sample mean is a statistic.
  • 52. Measures of Central Tendency FD 502 Example: A random sample of 5 BSED students about to take their final examination were asked how many hours they slept the night before the test. The data given are 5, 7, 3, 4, and 6. The mean number of hours of sleep is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5+7+3+4+6}{5} = 5$ hours.
  • 53. Measures of Central Tendency FD 502 Example: Using the data in the previous example, if the student reported 14 hours of sleep instead of 3 hours, then the new mean is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5+7+14+4+6}{5} = 7.2$ hours. Remark: The mean takes into account all observations in the data set. Thus, it is affected by extreme values.
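The two sleep-hours examples can be verified with Python's standard `statistics` module. This is only an illustrative check (the deck itself works in MS Excel and SPSS), and the variable names are assumptions.

```python
from statistics import mean

# Hours of sleep reported by the 5 BSED students.
hours = [5, 7, 3, 4, 6]
mean_original = mean(hours)

# Same data, but one student reports 14 hours instead of 3.
hours_with_extreme = [5, 7, 14, 4, 6]
mean_with_extreme = mean(hours_with_extreme)

print(mean_original, mean_with_extreme)
```

Changing a single observation from 3 to 14 moves the mean from 5 to 7.2 hours, which is the sensitivity to extreme values that the remark describes.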
  • 54. Measures of Central Tendency FD 502 Mean for Grouped Data $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$, where $f_i$ = frequency of the class interval and $x_i$ = class mark of the class interval.
  • 55. Measures of Central Tendency FD 502 Example: Given the frequency distribution table below, find its mean.
      Class Interval   Frequency
      19 – 21          3
      16 – 18          10
      13 – 15          4
      10 – 12          12
      7 – 9            6
  • 56. Measures of Central Tendency FD 502 We first solve for the class mark and the product of the class mark and the frequency.
      Class Interval   Frequency (f)   Class Mark (x)   fx
      19 – 21          3               20               60
      16 – 18          10              17               170
      13 – 15          4               14               56
      10 – 12          12              11               132
      7 – 9            6               8                48
      Total            Σf = 35                          Σfx = 466
  • 57. Measures of Central Tendency FD 502 Using the totals Σf = 35 and Σfx = 466, $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{466}{35} = 13.3$
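The grouped-mean computation can be reproduced with a few lines of Python. This is a minimal sketch; the tuple layout `(lower limit, upper limit, frequency)` is an assumption chosen for the illustration, and the class mark is taken as the midpoint of the class limits.

```python
# (lower limit, upper limit, frequency) for each class interval in the example.
classes = [(19, 21, 3), (16, 18, 10), (13, 15, 4), (10, 12, 12), (7, 9, 6)]

sum_f = sum(f for _, _, f in classes)
# Class mark = midpoint of the class limits.
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)

grouped_mean = sum_fx / sum_f
print(sum_f, sum_fx, round(grouped_mean, 1))
```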
  • 58. Measures of Central Tendency FD 502 Weighted Mean • Used when individual values have varying importance. • Weights are assigned to each observed value before the mean is computed.
  • 59. Measures of Central Tendency FD 502 Example: Find the GPA of a student with the corresponding grades below:
      Subject   Grade   Units   Grade × Units
      A         1.5     3       4.5
      B         1.1     2       2.2
      C         1.8     3       5.4
      D         2.0     4       8
      E         1.5     3       4.5
      Total             15      24.6
    $\bar{x} = \frac{24.6}{15} = 1.64$
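The GPA example is a weighted mean with the units as weights. A short Python sketch of that calculation follows; the list names are assumptions for the illustration.

```python
# Grades and unit weights for subjects A-E, from the example.
grades = [1.5, 1.1, 1.8, 2.0, 1.5]
units = [3, 2, 3, 4, 3]

# Weighted mean = sum(weight * value) / sum(weights).
weighted_sum = sum(g * u for g, u in zip(grades, units))
total_units = sum(units)
gpa = weighted_sum / total_units

print(round(weighted_sum, 1), total_units, round(gpa, 2))
```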
  • 60. Measures of Central Tendency FD 502 Properties of the Mean • The mean is the most common and widely understood measure of central tendency, and it uses all observed values in its calculation. • The mean can be computed for both grouped and ungrouped data; for grouped data, it may not be based on the actual observed values. • The mean is affected by extreme values. • The mean always exists and is unique. • The mean is appropriate when the distribution is approximately symmetrical and when all observed values are given equal importance; it also serves as a basis for other statistics.
  • 61. Measures of Central Tendency FD 502 Median • Divides an ordered set of observations into two equal parts; it is the positional middle of the array. • Half of the observations are below its value and the other half are above its value.
  • 62. Measures of Central Tendency FD 502 Steps in finding the Median 1. Arrange the set of scores in ascending order (from lowest to highest) 2. If n is odd, there will be a middle score. This middle score is the median. If n is even, there will be two middle scores. The median is taken as the arithmetic average of the two middle scores.
  • 63. Measures of Central Tendency FD 502 Example: Below are the scores of 6 students in their Mathematics test. Find the median. 35 20 12 30 25 50 Arranging the scores in increasing order, we have 12 20 25 30 35 50
  • 64. Measures of Central Tendency FD 502 Since n = 6, the median is the average of the $\left(\frac{n}{2}\right) = \left(\frac{6}{2}\right) = 3$rd and $\left(\frac{n}{2}+1\right) = \left(\frac{6}{2}+1\right) = 4$th observations. That is, $Md = \frac{x_3 + x_4}{2} = \frac{25+30}{2} = 27.5$
  • 65. Measures of Central Tendency FD 502 Example: Below are the scores of 7 students in their Mathematics test. Find the median. 35 20 12 30 25 50 26 Arranging the scores in increasing order, we have 12 20 25 26 30 35 50
  • 66. Measures of Central Tendency FD 502 Since n = 7, the median is the $\left(\frac{n+1}{2}\right) = \left(\frac{7+1}{2}\right) = 4$th observation. That is, $Md = x_4 = 26$
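The positional rule used in both median examples can be written out directly. The function below is a minimal Python sketch of that rule (the function name is an assumption); the standard library's `statistics.median` applies the same rule.

```python
def median(values):
    """Median by position: middle score if n is odd, mean of the two middle scores if n is even."""
    xs = sorted(values)              # Step 1: arrange in ascending order
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]  # the ((n + 1) / 2)-th observation
    # average of the (n / 2)-th and (n / 2 + 1)-th observations
    return (xs[n // 2 - 1] + xs[n // 2]) / 2


median_even = median([35, 20, 12, 30, 25, 50])
median_odd = median([35, 20, 12, 30, 25, 50, 26])
print(median_even, median_odd)
```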
  • 67. Measures of Central Tendency FD 502 Characteristics of the Median • The median is a positional measure. • Extreme values affect the median less than the mean. • The median is used when there are extreme observed values. • The median is also used when grouped data or a frequency distribution do not have a true zero point or have open-ended class intervals.
  • 68. Measures of Central Tendency FD 502 Mode The mode is the observed value that occurs most often, or with the greatest frequency, in a data set. It can be identified by counting the frequency of each observed value and locating the observed value with the highest frequency. The mode is a less popular measure of central tendency than the mean and the median, but it is the easiest to find and can be considered a quick estimate of central tendency.
  • 69. Measures of Central Tendency FD 502 Example: Find the mode of 7; 5; 5; 3; 1; 1; 3; 5. Since 5 appears more frequently than the rest of the observed values, the mode is 5. Find the mode of 7; 5; 5; 3; 1; 1; 3; 5; 1. Since 5 and 1 appear more frequently than the rest of the observed values, the modes are 1 and 5.
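Both mode examples can be checked with `statistics.multimode` from the Python standard library (available in Python 3.8 and later), which returns every value tied for the highest frequency. The variable names are assumptions for this illustration.

```python
from statistics import multimode

first = [7, 5, 5, 3, 1, 1, 3, 5]
second = [7, 5, 5, 3, 1, 1, 3, 5, 1]

# multimode returns a list, so a data set with several modes is handled naturally.
modes_first = sorted(multimode(first))
modes_second = sorted(multimode(second))

print(modes_first, modes_second)
```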
  • 70. Measures of Central Tendency FD 502 Crude Mode $Mo = 3 \times \text{Median} - 2 \times \text{Mean}$
  • 71. Measures of Central Tendency FD 502 Characteristics of the Mode • The mode is the most typical value of a set of observations. • A few low or high values do not easily affect the mode. • The mode is sometimes not unique, and sometimes it does not exist. There may be several modes for one data set. • The mode can be obtained for both quantitative and qualitative types of data.
  • 72. Measures of Central Tendency FD 502 Central Tendency in Relation to Levels of Measurement • Variables measured categorically are either nominal or ordinal data and are best represented using frequency counts. Statistical comparison of such data rests on differences in proportions. Specifically, nominal data can be summarized using the mode, and ordinal data using the median or the mode. As an implication, statistical tests that compare means are not permissible for categorical data, since the mean does not exist as a measure of centrality for such data. • Both interval and ratio data accommodate the mean as a measure of central tendency; interval data are less powerful than ratio data because their zero point is arbitrary.
  • 73. Measures of Relative Position FD 502 Summary of Measures Summary Measures Central Tendency Mean Median Mode Other Locations Percentiles Quartiles Deciles Variation Range Variance Standard Deviation Coefficient of Variation
  • 74. Measures of Relative Position FD 502 Measures of Relative Position Percentile • Per-centum • Divides the ordered observations into 100 equal parts. • There are 99 percentiles, denoted as P1, P2, P3, …, P99, with around 1% of the observations in each group. We interpret percentiles as follows: P1, the first percentile, is the value below which 1% of the ordered values fall. P2, the second percentile, is the value below which 2% of the ordered values fall. P99, the ninety-ninth percentile, is the value below which 99% of the ordered values fall.
  • 75. Measures of Relative Position FD 502 Computing the Percentile • Step 1: Arrange the data in ascending order. • Step 2: Assume that there is no missing data and that all values exist. • Step 3: Let X1, X2, X3, …, Xn be the observations arranged from lowest to highest. • Step 4: Denote the percentile of interest by k. • Step 5: Get the percentile using the formula (i) (ii)
  • 76. Measures of Relative Position FD 502 Another Formula for Computing the Percentile
  • 77. Measures of Relative Position FD 502 Example: Below are the scores of 40 students in their Mathematics test. Find P85.
      16 26 31 32 34 37 39 43
      19 29 31 33 34 37 39 44
      22 30 31 33 35 37 41 45
      25 30 32 33 35 38 41 47
      26 31 32 34 36 38 42 47
  • 78. Measures of Relative Position FD 502 Example: We seek the value below which $\frac{85}{100} \times 40 = 34$ observations fall. As seen from the table, P85 could be any value between 41 and 42. To have a unique value, we define $P_{85} = \frac{41+42}{2} = 41.5$
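The P85 example can be reproduced with the short Python sketch below. The two-case formula on the "Computing the Percentile" slide did not survive extraction, so the rule coded here is an assumption chosen to agree with the worked example: when nk/100 is a whole number, average that observation and the next one; otherwise round nk/100 up and take that observation. The function name is likewise hypothetical.

```python
import math

# Scores of the 40 students, from the example.
scores = [16, 26, 31, 32, 34, 37, 39, 43, 19, 29, 31, 33, 34, 37, 39, 44,
          22, 30, 31, 33, 35, 37, 41, 45, 25, 30, 32, 33, 35, 38, 41, 47,
          26, 31, 32, 34, 36, 38, 42, 47]


def percentile(values, k):
    """k-th percentile (k = 1..99) under the assumed two-case rule."""
    xs = sorted(values)
    n = len(xs)
    pos = n * k / 100
    if pos == int(pos):
        i = int(pos)
        # nk/100 is whole: average the i-th and (i + 1)-th ordered observations.
        return (xs[i - 1] + xs[i]) / 2
    # Otherwise round up and take that ordered observation.
    return xs[math.ceil(pos) - 1]


p85 = percentile(scores, 85)
print(p85)
```

Statistical packages implement several different percentile definitions, so the value reported by MS Excel or SPSS for the same data may differ slightly from this one.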
  • 79. Measures of Relative Position FD 502 Decile • Deciles are the nine (9) values that divide a set of ordered observations into 10 equal parts. These values, denoted by D1, D2, …, D9, are such that 10% of the data falls below D1, 20% falls below D2, …, and 90% falls below D9. • The ith decile, Di, is a value below which 10 × i % of the data lie. For example, D1, the first decile, is the value below which 10% of the ordered values fall.
  • 80. Measures of Relative Position FD 502 Quartile • Quartiles are the three (3) values that divide an ordered data set into 4 equal parts (quarters). • The ith quartile, Qi, is a value below which 25 × i % of the data lie.
  • 81. Measures of Relative Position FD 502 The lower quartile, denoted by Q1, divides the bottom 25% of the ordered observations from the top 75%. The middle quartile, denoted by Q2, divides the bottom 50% of the ordered observations from the top 50%. The upper quartile, denoted by Q3, divides the bottom 75% of the ordered observations from the top 25%. In general, Qk, k = 1, 2, 3, is a value in an ordered distribution such that 25k% of the ordered data in the distribution are < Qk.
  • 82. Measures of Relative Position FD 502 Box and Whiskers Plot In a box and whisker plot: • The ends of the box are the upper and lower quartiles, so the box spans the interquartile range. • The median is marked by a vertical line inside the box. • The whiskers are the two lines outside the box that extend to the highest and lowest observations.
  • 83. Measures of Relative Position FD 502 Box and Whiskers Plot
  • 84. Measures of Relative Position FD 502 Summary of Measures Summary Measures Central Tendency Mean Median Mode Other Locations Percentiles Quartiles Deciles Variation Range Variance Standard Deviation Coefficient of Variation
  • 85. Measures of Variability FD 502 Measures of Variability Absolute Dispersion: Necessary to compare two or more data sets with similar means and the same unit of measurement. Relative Dispersion: Necessary to compare two or more data sets with different means and different units of measurement.
  • 86. Consider the two sets of data below. Set A 25, 28, 28, 30, 30, 33, 35, 40, 41, 45 Set B 10, 15, 23, 28, 28, 30, 39, 45, 52, 65 They have the same mean (33.5) but Set A is more homogeneous than Set B. Measures of Variability FD 502
  • 87. Measures of Variability FD 502 Range The range of a set of data is the difference between the largest and smallest number in the set. Example: In Set A, the range is 45 – 25 = 20. In Set B, the range is 65 – 10 = 55.
  • 89. Measures of Variability FD 502 Mean Absolute Deviation (MAD) $MAD = \frac{\sum |x_i - \bar{x}|}{N}$, where $x_i$ = score, $\bar{x}$ = mean of the scores, and N = total number of scores.
  • 90. Measures of Variability FD 502 Mean Absolute Deviation (MAD) Grouped Data $MAD = \frac{\sum f\,|X - \bar{X}|}{N}$, where X = class mark, $\bar{X}$ = mean, f = frequency, and N = total number of cases.
  • 91. Measures of Variability FD 502 Population Variance Ungrouped Data Given the finite population x1, x2, …, xN, the population variance is $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
  • 92. Measures of Variability FD 502 Sample Variance Ungrouped Data Given a random sample x1, x2, …, xn, the sample variance is: Unbiased estimator: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$ Biased estimator: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$
  • 93. Measures of Variability FD 502 Sample Standard Deviation $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
  • 94. Measures of Variability FD 502 Computational Formula for the Sample Variance (unbiased) $s^2 = \frac{n \sum x^2 - \left(\sum x\right)^2}{n(n-1)}$
  • 95. Measures of Variability FD 502 Example: Set A 25, 28, 28, 30, 30, 33, 35, 40, 41, 45 We have $s^2 = \frac{\sum_{i=1}^{10} (x_i - 33.5)^2}{10-1} = \frac{(25-33.5)^2 + \dots + (45-33.5)^2}{9} = \frac{390.5}{9} \approx 43.39$ and $s \approx 6.59$
  • 96. Measures of Variability FD 502 Example: Set B 10, 15, 23, 28, 28, 30, 39, 45, 52, 65 We have $s^2 = \frac{\sum_{i=1}^{10} (x_i - 33.5)^2}{10-1} = \frac{(10-33.5)^2 + \dots + (65-33.5)^2}{9} = \frac{2574.5}{9} \approx 286.06$ and $s \approx 16.91$
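The range, sample variance, and sample standard deviation of Sets A and B can be checked with Python's `statistics` module, whose `variance` and `stdev` functions use the n − 1 (unbiased) divisor. The sketch also evaluates the computational formula as a cross-check; the helper name `computational_variance` is an assumption for this illustration.

```python
from statistics import mean, stdev, variance

set_a = [25, 28, 28, 30, 30, 33, 35, 40, 41, 45]
set_b = [10, 15, 23, 28, 28, 30, 39, 45, 52, 65]


def computational_variance(xs):
    """Sample variance via [n * sum(x^2) - (sum x)^2] / [n * (n - 1)]."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))


for name, data in (("A", set_a), ("B", set_b)):
    print(name,
          "mean =", mean(data),
          "range =", max(data) - min(data),
          "s^2 =", round(variance(data), 2),
          "s =", round(stdev(data), 2))
```

Both sets share the mean 33.5, yet the variance of Set B is several times that of Set A, which quantifies the statement that Set A is more homogeneous.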
  • 97. Measures of Variability FD 502 Sample Variance Grouped Data $s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n}$, where $f_i$ = frequency, $x_i$ = class mark, $\bar{x}$ = mean, k = number of classes, and n = total number of observations.
  • 98. Measures of Variability FD 502 Measure of Relative Variation To compare the variability of data sets measured in different units, we use the measure of relative variation called the coefficient of variation. This index expresses the standard deviation as a percentage relative to the mean. Its value is given by $C.V. = \frac{s}{\bar{x}} \times 100\%$
  • 99. Measures of Variability FD 502 Example: Determine which data set is more spread out.
  • 100. Measures of Variability FD 502 We first compute the means and standard deviations of the sets of data. Data Set 1: $\bar{x}$ = 24 years and s = 3.742 years. Data Set 2: $\bar{x}$ = P 8875 and s = P 2267.984.
  • 101. Measures of Variability FD 502 So, we have Data Set 1: $C.V. = \frac{s}{\bar{x}} \times 100\% = \frac{3.742 \text{ years}}{24 \text{ years}} \times 100\% = 15.59\%$ Data Set 2: $C.V. = \frac{s}{\bar{x}} \times 100\% = \frac{P\,2267.984}{P\,8875} \times 100\% = 25.55\%$ Therefore, net take-home pay is more scattered with respect to the mean than the years of teaching experience of teachers.
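The comparison above can be reproduced from the reported means and standard deviations. The raw data for the two sets are not included in this transcript, so the Python sketch below starts from the summary values given on the slide; the function name is an assumption.

```python
def coefficient_of_variation(s, x_bar):
    """C.V. = (s / mean) * 100, expressed in percent."""
    return s / x_bar * 100


# Summary values reported on the slide.
cv_experience = coefficient_of_variation(3.742, 24)    # years of teaching experience
cv_pay = coefficient_of_variation(2267.984, 8875)      # net take-home pay (pesos)

print(round(cv_experience, 2), round(cv_pay, 2))
```

Because the units cancel in s / x̄, the two percentages can be compared directly even though one data set is in years and the other in pesos.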
  • 102. Measures of Distribution FD 502 Skewness and Kurtosis Skewness measures the degree and direction of asymmetry, that is, the extent to which a distribution of values deviates from symmetry around the mean. A symmetric distribution such as a normal distribution has a skewness of 0, and a distribution that is skewed to the left, e.g. when the mean is less than the median, has a negative skewness. A value of zero means the distribution is symmetric, while a positive skewness indicates a greater number of smaller values (a longer right tail), and a negative value indicates a greater number of larger values (a longer left tail). Values for acceptability for psychometric purposes (+/-1 to +/-2) are the same as with kurtosis.
  • 104. Measures of Distribution FD 502 Skewness and Kurtosis Kurtosis measures the "peakedness" or "flatness" of a distribution. A kurtosis value near zero indicates a shape close to normal. A positive value indicates a distribution that is more peaked than normal, and a negative value indicates a shape flatter than normal. An extreme positive kurtosis indicates a distribution where more of the values are located in the tails of the distribution rather than around the mean. A kurtosis value of +/-1 is considered very good for most psychometric uses, but +/-2 is also usually acceptable.
  • 107. Interpretation of Skewness (Bulmer, 1979) • If skewness is less than -1 or greater than 1, the distribution is highly skewed. • If skewness is between -1 and -0.5 or between 0.5 and 1, the distribution is moderately skewed. • If skewness is between -0.5 and 0.5, the distribution is approximately symmetric.
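The three interpretation bands can be expressed as a small helper. This is a hypothetical Python sketch: the function name is invented for the illustration, and treating a value of exactly ±0.5 or ±1 as "moderately skewed" is an assumption, since the rule as stated does not say which band the boundary values belong to.

```python
def interpret_skewness(skew):
    """Classify a skewness coefficient using the bands attributed to Bulmer (1979)."""
    size = abs(skew)
    if size > 1:
        return "highly skewed"
    if size >= 0.5:              # boundary handling is an assumption
        return "moderately skewed"
    return "approximately symmetric"


for value in (-1.3, 0.7, 0.2):
    print(value, "->", interpret_skewness(value))
```

The sign of the coefficient is ignored here because the bands are symmetric; the direction of the skew still has to be read from the sign.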
  • 108. Interpretation of Kurtosis in SPSS • A kurtosis value near zero indicates a shape close to normal. • A positive value indicates a distribution which is more peaked than normal. • A negative value indicates a shape flatter than normal.
  • 109. Measures of Variability FD 502 Thank You…