2. What is Frequencies?
• The Frequencies procedure provides statistics and graphical displays
that are useful for describing many types of variables.
• For a frequency report and bar chart, you can arrange the distinct
values in ascending or descending order, or you can order the
categories by their frequencies.
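The three ordering choices can be illustrated with a small frequency table. This is a minimal sketch, not SPSS's implementation. The function name `frequency_table`, its order labels (`"ascending"`, `"descending"`, `"count_desc"`, `"count_asc"`) and the sample categories are hypothetical.

```python
from collections import Counter

def frequency_table(values, order="ascending"):
    """Return (value, count) pairs in one of four display orders.

    The order names are hypothetical labels for the options described above:
    "ascending"/"descending" sort by value, "count_desc"/"count_asc" sort by
    frequency (ties broken by value).
    """
    counts = Counter(values)
    if order == "ascending":
        return sorted(counts.items())
    if order == "descending":
        return sorted(counts.items(), reverse=True)
    if order == "count_desc":
        return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    if order == "count_asc":
        return sorted(counts.items(), key=lambda item: (item[1], item[0]))
    raise ValueError(f"unknown order: {order!r}")

colors = ["red", "blue", "red", "green", "blue", "red"]  # hypothetical categories
for value, count in frequency_table(colors, order="count_desc"):
    print(f"{value:6} {count}")
```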
4. Frequencies Statistics
• Percentile values. Values of a quantitative variable that divide the ordered
data into groups so that a certain percentage of the cases falls above and
another percentage falls below.
• Quartiles: divide the observations into four groups of equal size.
• Mean. A measure of central tendency. The arithmetic average, the sum
divided by the number of cases.
• Median. The value above and below which half of the cases fall, the
50th percentile. If there is an even number of cases, the median is the
average of the two middle cases when they are sorted in ascending or
descending order. The median is a measure of central tendency not
sensitive to outlying values (unlike the mean, which can be affected by a
few extremely high or low values).
• Mode. The most frequently occurring value. If several values share the
greatest frequency of occurrence, each of them is a mode. The
Frequencies procedure reports only the smallest of such multiple
modes.
• Sum. The sum or total of the values, across all cases with nonmissing
values.
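The four definitions above can be sketched with Python's standard `statistics` module. The `describe` helper and the data values are hypothetical. Following the slide, it reports only the smallest value when several values tie as the mode.

```python
import statistics

def describe(values):
    """Central-tendency summary following the definitions above (a sketch)."""
    modes = statistics.multimode(values)  # every value tied for the highest count
    return {
        "mean": statistics.mean(values),      # sum divided by the number of cases
        "median": statistics.median(values),  # averages the two middle values when n is even
        "mode": min(modes),                   # report only the smallest of multiple modes
        "sum": sum(values),
    }

data = [3, 7, 7, 2, 9, 2, 5, 1]  # hypothetical values; 2 and 7 are both modes
print(describe(data))
```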
5. Quartiles
• Quartiles (the 25th, 50th, and 75th percentiles) divide the observations into
four groups of equal size.
• The median is the middle value of the data set.
• The median is the ½(n + 1)th value, where n is the number of data values in the data set.
• The lower quartile (Q1) is the median of the lower half of the data set.
• The upper quartile (Q3) is the median of the upper half of the data set.
• The interquartile range (IQR) is the spread of the middle 50% of the data values.
So:
• Q1 = 25th percentile, Q2 (median) = 50th percentile, Q3 = 75th percentile
$$Q_2 = \text{Median} = \tfrac{1}{2}(n+1)\text{th value}$$
$$Q_1 = \tfrac{1}{4}(n+1)\text{th value}$$
$$Q_3 = \tfrac{3}{4}(n+1)\text{th value}$$
where n is the number of data values in the data set, and
$$IQR = Q_3 - Q_1$$
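The (n+1)p position rule can be made concrete as follows. A fractional position such as the 2.5th value is read as lying partway between two neighbouring sorted values. This is a sketch of the formulas above, not a claim about how SPSS computes percentiles internally. The helper names are hypothetical, and clamping positions outside 1..n to the minimum or maximum is our own assumption.

```python
def value_at_position(sorted_values, pos):
    """Return the 'pos-th value' (1-based), interpolating for fractional positions.

    A position of 2.5 means halfway between the 2nd and 3rd values.
    Positions outside 1..n are clamped to the smallest/largest value (an assumption).
    """
    n = len(sorted_values)
    if pos <= 1:
        return sorted_values[0]
    if pos >= n:
        return sorted_values[-1]
    lower = int(pos)                      # whole part of the 1-based position
    frac = pos - lower                    # how far to move toward the next value
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + frac * (above - below)

def quartiles(values):
    """Q1, Q2 (median), Q3 and IQR using the (n + 1)p positions above."""
    s = sorted(values)
    n = len(s)
    q1 = value_at_position(s, (n + 1) / 4)
    q2 = value_at_position(s, (n + 1) / 2)
    q3 = value_at_position(s, 3 * (n + 1) / 4)
    return q1, q2, q3, q3 - q1

print(quartiles([10, 2, 7, 4, 9, 5, 4]))  # hypothetical data set, n = 7
```

With n = 7 the positions are whole numbers (2, 4 and 6). With n = 4 they are 1.25, 2.5 and 3.75, so interpolation is needed. Other packages use other percentile definitions, so small differences from other tools are expected.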
6. Quartiles
• If you want an equal number of groups other than four, select Cut points for n
equal groups.
• You can also specify individual percentiles (for example, the 95th percentile, the
value below which 95% of the observations fall).
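The "Cut points for n equal groups" idea can be tried with Python's `statistics.quantiles`. Its default `method="exclusive"` places the i-th cut at position i(n + 1)/groups, which is the same (n + 1)p idea used in the quartile formulas. The `cut_points` wrapper and the data are hypothetical. We do not claim this reproduces SPSS output in every case.

```python
import statistics

def cut_points(values, groups):
    """Cut points splitting the data into `groups` groups of roughly equal size.

    statistics.quantiles sorts the data itself; its default 'exclusive'
    method uses (n + 1)-based positions, as in the quartile formulas.
    """
    return statistics.quantiles(values, n=groups)

data = [2, 4, 4, 5, 7, 9, 10, 12, 15, 18, 20]  # hypothetical values, n = 11
print(cut_points(data, 4))  # 4 groups: the quartiles
print(cut_points(data, 5))  # 5 groups: 20th, 40th, 60th, 80th percentiles
```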
7. EX:Quartiles
• Find the median, lower quartile, upper quartile and interquartile range of the
following data set of scores:
– 19 21 24 21 24 28 25 24 30
• Solution:
1. Arrange the score values in ascending order of magnitude:
19 21 21 24 24 24 25 28 30
2. There are 9 values in the data set.
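Now:
$$\text{Median} = \tfrac{1}{2}(n+1)\text{th value} = \tfrac{1}{2}(9+1)\text{th value} = 5\text{th value} = 24$$
$$\text{Lower quartile} = \tfrac{1}{4}(n+1)\text{th value} = \tfrac{1}{4}(9+1)\text{th value} = 2.5\text{th value} = \frac{21+21}{2} = 21$$
$$\text{Upper quartile} = \tfrac{3}{4}(n+1)\text{th value} = \tfrac{3}{4}(9+1)\text{th value} = 7.5\text{th value} = \frac{25+28}{2} = 26.5$$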
8. EX:Quartiles
• Interquartile range = Upper quartile - Lower quartile = 26.5 - 21 = 5.5
• This means the middle 50% of the data values range from 21 to 26.5.
• In SPSS: Analyze > Descriptive Statistics > Frequencies > Statistics
• Variable name: VAR0001
• Check Quartiles
• Click OK
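The hand calculation in this example can be checked with Python's standard library. `statistics.quantiles` with its default `"exclusive"` method uses (n + 1)p positions, so it should agree with the worked figures. Whether it matches your SPSS output is something to confirm in your own run.

```python
import statistics

scores = [19, 21, 24, 21, 24, 28, 25, 24, 30]  # data set from the example

# Default method="exclusive": cut points at the (n + 1)/4, (n + 1)/2, 3(n + 1)/4 positions
q1, median, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1
print(f"median={median}, Q1={q1}, Q3={q3}, IQR={iqr}")
```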
9. Note: Cut Points for n Equal Groups
• Setting the number of equal groups to 4 gives the same cut points as
Quartiles (25%, 50%, 75%).
11. Dispersion
• Dispersion. Statistics that measure the amount of variation or spread in the
data include the standard deviation, variance, range, minimum, maximum, and
standard error of the mean.
• Std. deviation. A measure of dispersion around the mean. In a normal
distribution, 68% of cases fall within one standard deviation of the mean and
95% of cases fall within two standard deviations. For example, if the mean age
is 45, with a standard deviation of 10, 95% of the cases would be between 25
and 65 in a normal distribution.
• Variance. A measure of dispersion around the mean, equal to the sum of
squared deviations from the mean divided by one less than the number of
cases. The variance is measured in units that are the square of those of the
variable itself.
• Range. The difference between the largest and smallest values of a numeric
variable, the maximum minus the minimum.
• Minimum. The smallest value of a numeric variable.
• Maximum. The largest value of a numeric variable.
• S. E. mean. A measure of how much the value of the mean may vary from
sample to sample taken from the same distribution. It can be used to roughly
compare the observed mean to a hypothesized value (that is, you can conclude
the two values are different if the ratio of the difference to the standard error is
less than -2 or greater than +2).
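The dispersion statistics above, the rough ±2 standard-error comparison, and the "mean 45, SD 10" normal example can be sketched with the `statistics` module. The data, the hypothesized mean of 3 and the `dispersion` helper are hypothetical. Variance and standard deviation use the n - 1 divisor described above.

```python
import math
import statistics

def dispersion(values):
    """Dispersion statistics as defined above (sample versions, n - 1 divisor)."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return {
        "std_dev": sd,
        "variance": statistics.variance(values),  # sum of squared deviations / (n - 1)
        "range": max(values) - min(values),       # maximum minus minimum
        "minimum": min(values),
        "maximum": max(values),
        "se_mean": sd / math.sqrt(n),             # standard error of the mean
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
stats = dispersion(data)
print(stats)

# Rough comparison with a hypothesized mean (3 is hypothetical):
# a ratio outside -2..+2 suggests the two values differ.
ratio = (statistics.mean(data) - 3) / stats["se_mean"]
print(round(ratio, 2), abs(ratio) > 2)

# The mean 45, SD 10 example: share of a normal distribution between 25 and 65.
ages = statistics.NormalDist(mu=45, sigma=10)
print(round(ages.cdf(65) - ages.cdf(25), 4))
```

The last line gives about 0.9545, so "95%" on the slide is the usual rounded figure.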
12. Dispersion
• The standard error of the mean (i.e., of using the sample mean as a
method of estimating the population mean) is the standard deviation of
those sample means over all possible samples (of a given size) drawn from
the population. The term can also refer to an estimate of that standard
deviation, computed from the sample of data being analyzed: the sample
standard deviation divided by the square root of the sample size (s/√n).
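Both meanings of the standard error can be compared with a small simulation. The first is the standard deviation of sample means over many samples. The second is the single-sample estimate s/√n. The population (normal, mean 50, SD 10), the sample size of 25 and the 5000 repetitions are hypothetical choices, and the seed only makes the run repeatable. Both numbers should land near σ/√n = 2.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
population = statistics.NormalDist(mu=50, sigma=10)  # hypothetical population
sample_size = 25

# Meaning 1: draw many samples of the same size and take the SD of their means.
means = [statistics.fmean(population.samples(sample_size)) for _ in range(5000)]
empirical_se = statistics.stdev(means)
theoretical_se = 10 / sample_size ** 0.5  # sigma / sqrt(n) = 2.0

# Meaning 2: estimate the standard error from one sample only, s / sqrt(n).
one_sample = population.samples(sample_size)
estimated_se = statistics.stdev(one_sample) / sample_size ** 0.5

print(round(empirical_se, 3), theoretical_se, round(estimated_se, 3))
```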
13. Distribution.
• Skewness and kurtosis are statistics that describe the shape and symmetry
of the distribution. These statistics are displayed with their standard errors.
• Skewness. A measure of the asymmetry of a distribution. The normal
distribution is symmetric and has a skewness value of 0. A distribution with
a significant positive skewness has a long right tail. A distribution with a
significant negative skewness has a long left tail. As a guideline, a skewness
value more than twice its standard error is taken to indicate a departure
from symmetry.
• Data can be "skewed", meaning it tends to have a long tail on one side or
the other:
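The skewness statistic and the "more than twice its standard error" guideline can be sketched as follows. It uses the common textbook formulas: adjusted Fisher-Pearson skewness G1, and SE = √(6n(n-1)/((n-2)(n+1)(n+3))). We assume these match what SPSS prints but have not verified that here. The function name and data are hypothetical.

```python
import math

def skewness_with_se(values):
    """Sample skewness (adjusted Fisher-Pearson, G1) and its standard error.

    Textbook formulas; assumed, not verified here, to match SPSS output.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # 2nd central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # 3rd central moment
    g1 = m3 / m2 ** 1.5
    G1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)     # small-sample adjustment
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return G1, se

skew, se = skewness_with_se([1, 1, 1, 2, 10])  # hypothetical data with a long right tail
print(round(skew, 3), round(se, 3), abs(skew) > 2 * se)
```

A positive value together with `abs(skew) > 2 * se` would be read as a significant long right tail under the guideline above.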
14. Distribution.
• Negative Skew
– It is called negative because the long "tail" is on the negative side of the peak.
– People sometimes say it is "skewed to the left" (the long tail is on the
left-hand side).
– The mean is also on the left of the peak.
• The Normal Distribution has No Skew
– A Normal Distribution is not skewed.
– It is perfectly symmetrical.
– And the Mean is exactly at the peak.
15. Distribution.
• Positive Skew
• Positive skew is when the long tail is on the positive side of the peak;
some people say it is "skewed to the right".
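Manual Formula (Pearson's median skewness)
$$\text{Skewness} \approx \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$$
• This is a quick hand approximation. The skewness value SPSS reports is computed
from the third moment of the data, so the two numbers usually differ.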
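The manual formula is a one-liner. The sample standard deviation (n - 1 divisor) is assumed here. The function name and data are hypothetical.

```python
import statistics

def pearson_skewness(values):
    """Pearson's median skewness: 3 * (mean - median) / sample standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.stdev(values)

print(round(pearson_skewness([1, 1, 1, 2, 10]), 3))  # hypothetical right-skewed data
print(pearson_skewness([1, 2, 3, 4, 5]))              # symmetric data: mean equals median
```

For the right-skewed data the mean (3) lies above the median (1), so the result is positive. This matches the "long right tail" description above.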
17. Kurtosis.
• A measure of the extent to which observations cluster around a central
point.
• For a normal distribution, the value of the kurtosis statistic is zero. Positive
kurtosis indicates that, relative to a normal distribution, the observations are
more clustered about the center of the distribution and have thinner tails
until the extreme values of the distribution, at which point the tails of the
leptokurtic distribution are thicker relative to a normal distribution.
• Negative kurtosis indicates that, relative to a normal distribution, the
observations cluster less and have thicker tails until the extreme values of
the distribution, at which point the tails of the platykurtic distribution are
thinner relative to a normal distribution.
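The kurtosis statistic, scaled so that a normal distribution gives 0, can be sketched with the common sample formula (G2). Its standard error is usually computed as 2·SE(skewness)·√((n²-1)/((n-3)(n+5))). We assume these are the formulas behind the SPSS figures but have not verified that here. The function name and both data sets are hypothetical: one is flat (platykurtic) and one has a heavy-tailed shape.

```python
import math

def excess_kurtosis_with_se(values):
    """Sample excess kurtosis (G2, 0 for a normal distribution) and its standard error.

    Textbook formulas; assumed, not verified here, to match SPSS output.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = sum(values) / n
    s2 = sum((x - mean) ** 2 for x in values) / (n - 1)       # sample variance
    standardized_4th = sum((x - mean) ** 4 for x in values) / s2 ** 2
    G2 = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * standardized_4th \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return G2, se

flat, flat_se = excess_kurtosis_with_se([1, 2, 3, 4, 5])                # hypothetical, evenly spread
peaked, _ = excess_kurtosis_with_se([-10, 0, 0, 0, 0, 0, 0, 0, 0, 10])  # hypothetical, clustered centre
print(round(flat, 3), round(flat_se, 3), round(peaked, 3))
```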
18. Kurtosis.
• The coefficient of kurtosis is a measure of the degree of
peakedness/flatness in the variable distribution.
19. Frequencies Charts
• Chart Type
• A pie chart displays the contribution of parts to a whole. Each slice of a pie
chart corresponds to a group that is defined by a single grouping variable.
• A bar chart displays the count for each distinct value or category as a
separate bar, allowing you to compare categories visually.
• A histogram also has bars, but they are plotted along an equal interval scale.
The height of each bar is the count of values of a quantitative variable falling
within the interval.
• A histogram shows the shape, center, and spread of the distribution. A
normal curve superimposed on a histogram helps you judge whether the
data are normally distributed.
• This feature requires the Statistics Base option.
• From the menus choose:
Analyze > Descriptive Statistics > Frequencies...
• In the Frequencies dialog box, click Charts.
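The histogram description (bars on an equal-interval scale, with each bar's height equal to the count of values in the interval) can be sketched as a text histogram. SPSS chooses its own intervals. Here the start point, width and number of bins are hypothetical arguments chosen by the caller, and the data are the scores from the quartile example.

```python
def histogram_counts(values, start, width, bins):
    """Count values in equal-width intervals [start + k*width, start + (k+1)*width).

    The last interval also includes its right edge so the maximum is counted.
    start, width and bins are chosen by the caller (hypothetical settings).
    """
    counts = [0] * bins
    end = start + width * bins
    for x in values:
        if x < start or x > end:
            continue  # value lies outside the plotted range
        k = min(int((x - start) // width), bins - 1)
        counts[k] += 1
    return counts

scores = [19, 21, 24, 21, 24, 28, 25, 24, 30]
for k, c in enumerate(histogram_counts(scores, start=15, width=5, bins=4)):
    low = 15 + 5 * k
    print(f"{low}-{low + 5}: {'#' * c}")
```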
20. Crosstabs
• The Crosstabs procedure forms two-way and multiway tables and provides a
variety of tests and measures of association for two-way tables. The
structure of the table and whether categories are ordered determine what
test or measure to use.
• Crosstabs' statistics and measures of association are computed for two-way
tables only. If you specify a row, a column, and a layer factor (control
variable), the Crosstabs procedure forms one panel of associated statistics
and measures for each value of the layer factor (or a combination of values
for two or more control variables). For example, if gender is a layer factor for
a table of married (yes, no) against life (is life exciting, routine, or dull), the
results for a two-way table for the females are computed separately from
those for the males and printed as panels following one another.
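The layer idea can be sketched as counting (row, column) pairs separately for each value of the layer variable. This gives one two-way table per layer value. The sketch only builds the tables and computes none of the Crosstabs tests or measures. The records, field names and the `crosstab` helper are hypothetical.

```python
from collections import Counter, defaultdict

def crosstab(records, row, col, layer=None):
    """Count (row value, column value) pairs, one table per layer value."""
    tables = defaultdict(Counter)
    for r in records:
        key = r[layer] if layer is not None else None
        tables[key][(r[row], r[col])] += 1
    return tables

# Hypothetical survey records (field names are illustrative only).
people = [
    {"gender": "female", "married": "yes", "life": "exciting"},
    {"gender": "female", "married": "no", "life": "routine"},
    {"gender": "female", "married": "yes", "life": "exciting"},
    {"gender": "male", "married": "yes", "life": "routine"},
    {"gender": "male", "married": "no", "life": "dull"},
]

tables = crosstab(people, "married", "life", layer="gender")
for gender, table in sorted(tables.items()):
    print(gender)  # one panel per layer value
    for life in ("exciting", "routine", "dull"):
        print(f"  {life:9}", [table[(m, life)] for m in ("yes", "no")])
```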