Trochim, W. M. K. (2006). Internal validity.
http://www.socialresearchmethods.net/kb/intval.php
Please follow the link above.
Social Work Research: Chi Square
Molly, an administrator with a regional organization that
advocates for alternatives to long-term prison sentences for
nonviolent offenders, asked a team of researchers to conduct an
outcome evaluation of a new vocational rehabilitation program
for recently paroled prison inmates. The primary goal of the
program is to promote full-time employment among its
participants.
To evaluate the program, the evaluators decided to use a quasi-
experimental research design. The program enrolled 30
individuals to participate in the new program. Additionally,
there was a waiting list of 30 other participants who planned to
enroll after the first group completed the program. After the
first group of 30 participants completed the vocational program
(the “intervention” group), the researchers compared those
participants’ levels of employment with the 30 on the waiting
list (the “comparison” group).
In order to collect data on employment levels, the probation
officers for each of the 60 people in the sample (those in both
the intervention and comparison groups) completed a short
survey on the status of each client in the sample. The survey
contained demographic questions that included an item that
inquired about the employment level of the client. This was
measured through variables identified as none, part-time, or
full-time. A hard copy of the survey was mailed to each
probation officer and a stamped, self-addressed envelope was
provided for return of the survey to the researchers.
After the surveys were returned, the researchers entered the data
into an SPSS program for statistical analysis. Because both the
independent variable (participation in the vocational
rehabilitation program) and dependent variable (employment
outcome) used nominal/categorical measurement, the bivariate
statistic selected to compare the outcome of the two groups was
the Pearson chi-square.
After all of the information was entered into the SPSS program,
the following output charts were generated:
TABLE 1. CASE PROCESSING SUMMARY

                                           Cases
                             Valid         Missing        Total
                           N   Percent    N   Percent    N   Percent
Program Participation
* Employment               59   98.3%     1    1.7%      60  100.0%
TABLE 2. PROGRAM PARTICIPATION * EMPLOYMENT CROSS TABULATION

                                                    Employment
Program Participation                        None   Part-Time  Full-Time   Total
Intervention Group  Count                      5        7         18         30
                    % within Program
                    Participation           16.7%    23.3%      60.0%     100.0%
Comparison Group    Count                     16        7          6         29
                    % within Program
                    Participation           55.2%    24.1%      20.7%     100.0%
Total               Count                     21       14         24         59
                    % within Program
                    Participation           35.6%    23.7%      40.7%     100.0%
TABLE 3. CHI-SQUARE TESTS

                                 Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square               11.748a    2   .003
Likelihood Ratio                 12.321     2   .002
Linear-by-Linear Association     11.548     1   .001
N of Valid Cases                 59

a. 0 cells (.0%) have expected count less than 5. The minimum
expected count is 6.88.
The first table, titled Case Processing Summary, provided the
sample size (N = 59). Information for one of the 60 participants
was not available, while the information was collected for all of
the other 59 participants.
The second table, Program Participation Employment Cross
Tabulation, provided the frequency table, which showed that
among participants in the intervention group, 18 or 60% were
found to be employed full time, while 7 or 23% were found to
be employed part time, and 5 or 17% were unemployed. The
corresponding numbers for the comparison group (parolees who
had not yet enrolled in the program but were on the waiting list
for admission) showed that only 6 or 21% were employed full-
time, while 7 or 24% were employed part time, and 16 or 55%
were unemployed.
The third table, which provided the outcome of the Pearson
chi-square test, showed that the difference between the intervention
and comparison groups was statistically significant, with a p value
of .003, which is well below the usual alpha level of .05 that most
researchers use to establish significance.
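The Pearson chi-square value reported in Table 3 can be reproduced by hand from the counts in Table 2. The following is a minimal sketch, not the SPSS procedure itself: the variable names are illustrative, and the p value uses the fact that for 2 degrees of freedom (and only then) the chi-square upper-tail probability reduces to exp(-x/2).

```python
import math

# Observed counts from Table 2 (rows: intervention, comparison;
# columns: none, part-time, full-time employment).
observed = [[5, 7, 18],
            [16, 7, 6]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count per cell if group and employment were unrelated:
# row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Shortcut valid for df = 2 only: P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-chi_square / 2)
min_expected = min(min(row) for row in expected)

print(round(chi_square, 3), df, round(p_value, 3), round(min_expected, 2))
```

The printed values should agree with the Pearson row and footnote a of Table 3 (11.748, 2, .003, and a minimum expected count of 6.88).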
These results indicate that the vocational rehabilitation
intervention program may be effective at promoting full-time
employment among recently paroled inmates. However, there
are multiple limitations to this study, including that 1) no
random assignment was used, and 2) it is possible that
differences between the groups were due to preexisting
differences among the participants (such as selection bias).
Potential future studies could include a matched comparison
group or, if possible, a control group. In addition, future studies
should assess not only whether or not a recently paroled
individual obtains employment but also the degree to which he
or she is able to maintain employment, earn a living wage, and
satisfy other conditions of probation.
(Plummer 63-65)
Plummer, Sara-Beth, Sara Makris, Sally Brocksen. Social Work
Case Studies: Concentration Year. Laureate Publishing,
10/21/13. VitalBook file.
Statistics for Social
Workers
J. Timothy Stocks
Statistics refers to a branch of mathematics dealing with the direct
description of sample or population characteristics and the analysis
of population characteristics by inference from samples. It covers a
wide range of content, including the collection, organization, and
interpretation of data. It is divided into two broad categories:
descriptive statistics and inferential statistics.

Descriptive statistics involves the computation of statistics or
parameters to describe a sample or a population. All the data are
available and used in computation of these aggregate characteristics.
This may involve reports of central tendency or variability of single
variables (univariate statistics). It also may involve enumeration of
the relationships between or among two or more variables (bivariate
or multivariate statistics). Descriptive statistics are used to
provide information about a large mass of data in a form that may be
easily understood. The defining characteristic of descriptive
statistics is that the product is a report, not an inference.

Inferential statistics involves the construction of a probable
description of the characteristics of a population based on sample
data. We compute statistics from a partial set of the population data
(a sample) to estimate the population parameters. These estimates are
not exact, but we can make reasonable judgments as to how precise our
estimates are. Included within inferential statistics is hypothesis
testing, a procedure for using mathematics to provide evidence for
the existence of relationships between or among variables. This
testing is a form of inferential argument.
Descriptive Statistics
Measures of Central Tendency
Measures of central tendency are individual numbers that typify the
total set of scores. The three most frequently used measures of
central tendency are the arithmetic mean, the mode, and the median.

Arithmetic Mean. The arithmetic mean usually is simply called the
mean. It also is called the average. It is computed by adding up all
of a set of scores and dividing by the number of scores in the set.
The algebraic representation of this is

$$\mu = \frac{\sum X}{n},$$

where $\mu$ represents the population mean, $X$ represents an
individual score, and $n$ is the number of scores being added.

The formula for the sample mean is the same except that the mean is
represented by the variable letter with a bar above it:

$$\bar{X} = \frac{\sum X}{n}.$$

Part I • Quantitative Approaches: Foundations of Data Collection

Following are the numbers of class periods skipped by 20
seventh-graders during 1 week: {1, 6, 2, 6, 15, 20, 3, 20, 17, 11,
15, 18, 8, 3, 17, 16, 14, 17, 0, 10}. We compute the mean by adding
up the class periods missed and dividing by 20:

$$\mu = \frac{\sum X}{n} = \frac{219}{20} = 10.95.$$
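The mean computation above can be checked with a few lines of Python. This is only an illustrative sketch; the list name `skipped` is an assumption, and the values are the 20 truancy scores given in the text.

```python
# Class periods skipped by the 20 seventh-graders in the text.
skipped = [1, 6, 2, 6, 15, 20, 3, 20, 17, 11,
           15, 18, 8, 3, 17, 16, 14, 17, 0, 10]

total = sum(skipped)   # sum of X
n = len(skipped)       # number of scores
mean = total / n       # mu = (sum of X) / n

print(total, n, mean)  # 219 20 10.95
```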
Mode. The mode is the most frequently appearing score. It really is
not so much a measure of centrality as it is a measure of
typicalness. It is found by organizing scores into a frequency
distribution and determining which score has the greatest frequency.
Table 6.1 displays the truancy scores arranged in a frequency
distribution.

TABLE 6.1 Truancy Scores

Score   Frequency
20      2
19      0
18      1
17      3
16      1
15      2
14      1
13      0
12      0
11      1
10      1
9       0
8       1
7       0
6       2
5       0
4       0
3       2
2       1
1       1
0       1
Because 17 is the most frequently appearing number, the mode (or
modal number) of class periods skipped is 17.

Unlike the mean or median, a distribution of scores can have more
than one mode.

Median. If we take all the scores in a set of scores, place them in
order from least to greatest, and count in to the middle, then the
score in the middle is the median. This is easy enough if there is an
odd number of scores. However, if there is an even number of scores,
then there is no single score in the middle. In this case, the two
middle scores are selected, and their average is the median.

There are 20 scores in the previous example. The median would be the
average of the 10th and 11th scores. We use the frequency table to
find these scores, which are 11 and 14. Thus, the median is 12.5.
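The mode and median of the truancy scores can be verified with Python's standard `statistics` module. This is a small sketch; the variable names are illustrative, and `multimode` is used because, as noted above, a distribution can have more than one mode.

```python
from collections import Counter
import statistics

skipped = [1, 6, 2, 6, 15, 20, 3, 20, 17, 11,
           15, 18, 8, 3, 17, 16, 14, 17, 0, 10]

# Frequency distribution: how often each score appears.
frequency = Counter(skipped)

# multimode returns every score tied for the highest frequency.
modes = statistics.multimode(skipped)

# With an even number of scores, the median is the average of the
# two middle scores (the 10th and 11th of 20 here).
ordered = sorted(skipped)
middle_pair = (ordered[9], ordered[10])
median = statistics.median(skipped)

print(modes, frequency[17], middle_pair, median)
```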
Measures of Variability

Whereas measures of central tendency are used to estimate a typical
score in a distribution, measures of variability may be thought of as
a way in which to measure departure from typicalness. They provide
information on how "spread out" scores in a distribution are.

Range. The range is the easiest measure of variability to calculate.
It is simply the distance from the minimum (lowest) score in a
distribution to the maximum (highest) score. It is obtained by
subtracting the minimum score from the maximum score.

Chapter 6 • Statistics for Social Workers

Let us compute the range for the following data set:
{2, 6, 10, 14, 18, 22}. The minimum is 2, and the maximum is 22:

$$\text{Range} = 22 - 2 = 20.$$

Sum of Squares. The sum of squares is a measure of the total amount
of variability in a set of scores. Its name tells how to compute it.
Sum of squares is short for sum of squared deviation scores. It is
represented by the symbol SS.

The formulas for sample and population sums of squares are the same
except for sample and population mean symbols:

$$SS = \sum (X - \mu)^2 \qquad SS = \sum (X - \bar{X})^2$$

Using the data set for the range, the sum of squares would be
computed as in Table 6.2.

TABLE 6.2 Computing the Sum of Squares

X     X - μ    (X - μ)²
2     -10      100
6     -6       36
10    -2       4
14    +2       4
18    +6       36
22    +10      100

NOTE: ΣX = 72; n = 6; μ = 12; Σ(X - μ)² = 280.

Variance. Another name for variance is mean square. This is short for
mean of squared deviation scores. This is obtained by dividing the
sum of squares by the number of scores (n). It is a measure of the
average amount of variability associated with each score in a set of
scores. The population variance formula is

$$\sigma^2 = \frac{SS}{n},$$

where $\sigma^2$ is the symbol for population variance, SS is the
symbol for sum of squares, and $n$ stands for the number of scores in
the population.

The variance for the example we used to compute sum of squares would
be

$$\sigma^2 = \frac{280}{6} = 46.67.$$

The sample variance is not an unbiased estimator of the population
variance. If we compute the variances for these samples using the
SS/n formula, then the sample variances will average out smaller than
the population variance. For this reason, the sample variance is
computed differently from the population variance:

$$s^2 = \frac{SS}{n - 1}.$$
The n - 1 is a correction factor for this tendency to underestimate.
It is called degrees of freedom. If our example were a sample, then
the variance would be

$$s^2 = \frac{280}{6 - 1} = \frac{280}{5} = 56.$$
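The range, sum of squares, and both variances for the example data set can be reproduced directly from their definitions. The sketch below uses illustrative variable names and only plain Python arithmetic.

```python
scores = [2, 6, 10, 14, 18, 22]

n = len(scores)
mean = sum(scores) / n

# Range: maximum score minus minimum score.
value_range = max(scores) - min(scores)

# Sum of squares: sum of squared deviations from the mean.
ss = sum((x - mean) ** 2 for x in scores)

# Population variance divides by n; sample variance divides by the
# degrees of freedom, n - 1.
population_variance = ss / n
sample_variance = ss / (n - 1)

print(value_range, ss, round(population_variance, 2), sample_variance)
```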
Standard Deviation. Although the variance is a measure of average
variability associated with each score, it is on a different scale
from the score itself. The variance measures average squared
deviation from the mean. To get a measure of average variability on
the same scale as the original scores, we take the square root of the
variance. The standard deviation is the square root of the variance.
The formulas are

$$\sigma = \sqrt{\sigma^2} \qquad s = \sqrt{s^2}$$

Using the same set of numbers as before, the population standard
deviation would be

$$\sigma = \sqrt{46.67} = 6.83,$$

and the sample standard deviation would be

$$s = \sqrt{56} = 7.48.$$

For a normally distributed set of scores, approximately 68% of all
scores will be within ±1 standard deviation of the mean.
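As a quick check, Python's standard `statistics` module provides both versions of the standard deviation: `pstdev` divides the sum of squares by n and `stdev` divides by n - 1. The sketch below applies them to the same six scores.

```python
import statistics

scores = [2, 6, 10, 14, 18, 22]

# pstdev treats the scores as a whole population (divides SS by n).
population_sd = statistics.pstdev(scores)

# stdev treats the scores as a sample (divides SS by n - 1).
sample_sd = statistics.stdev(scores)

print(round(population_sd, 2), round(sample_sd, 2))  # 6.83 7.48
```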
Measures of Relationship

Table 6.3 shows the relationship between number of stressors
experienced by a parent during a week and that parent's frequency of
use of corporal punishment during the same week.

One can use regression procedures to derive the line that best fits
the data. This line is referred to as a regression line (or line of
best fit or prediction line). Such a line has been calculated for the
example plot. It has a Y intercept of -3.555 and a slope of +1.279.
This gives us the prediction equation of

$$Y_{pred} = -3.555 + 1.279X,$$

where Y is frequency of corporal punishment and X is stressors. This
is graphically depicted in Figure 6.1.

Slope is the change in Y for a unit increase in X. So, the slope of
+1.279 means that an increase in stressors (X) of 1 will be
accompanied by an increase in predicted frequency of corporal
punishment (Y) of +1.279 incidents per week. If the slope were a
negative number, then an increase in X would be accompanied by a
predicted decrease in Y.
The equation does not give the actual value of Y (called the obtained
or observed score); rather, it gives a prediction of the value of Y
for a certain value of X. For example, if X were 3, then we would
predict that Y would be -3.555 + 1.279(3) = -3.555 + 3.837 = 0.282.

Figure 6.1 Frequency of Stressors and Use of Corporal Punishment
(scatterplot of Punishment against Stressors, with the regression
line $Y_{pred} = -3.555 + 1.279X$)

TABLE 6.3 Frequency of Stressors and Use of Corporal Punishment

Stressors   Punishment
3           0
4           1
4           2
5           3
6           4
7           5
8           6
7           7
9           8
10          9
The regression line is the line that predicts Y such that the error
of prediction is minimized. Error is defined as the difference
between the predicted score and the obtained score. The equation for
computing error is

$$E = Y - Y_{pred}.$$

When X = 4, there are two obtained values of Y: 1 and 2. The
predicted value of Y is

$$Y_{pred} = -3.555 + 1.279(4) = -3.555 + 5.116 = 1.561.$$

The error of prediction is E = 1 - 1.561 = -0.561 for Y = 1, and
E = 2 - 1.561 = +0.439 for Y = 2.

If we square each error difference score and sum the squares, then we
get a quantity called the error sum of squares, which is represented
by

$$SSE = \sum (Y - Y_{pred})^2.$$

The regression line is the one line that gives the smallest value for
SSE.
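The intercept and slope quoted above can be recovered from the Table 6.3 data with the usual least-squares formulas (slope = SSXY / SSX, intercept = mean of Y minus slope times mean of X). The sketch below is an illustration under that assumption; the function name `predict` is hypothetical. Because it keeps unrounded coefficients, its predictions differ from the text's hand calculations in the third decimal place.

```python
# Table 6.3: stressors (X) and corporal punishment incidents (Y).
stressors = [3, 4, 4, 5, 6, 7, 8, 7, 9, 10]
punishment = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

n = len(stressors)
mean_x = sum(stressors) / n
mean_y = sum(punishment) / n

ss_x = sum((x - mean_x) ** 2 for x in stressors)
ss_xy = sum((x - mean_x) * (y - mean_y)
            for x, y in zip(stressors, punishment))

# Least-squares estimates of the regression line.
slope = ss_xy / ss_x
intercept = mean_y - slope * mean_x


def predict(x):
    """Predicted punishment frequency for a given number of stressors."""
    return intercept + slope * x


# Prediction errors (E = Y - Y_pred) for the two parents with X = 4.
errors_at_4 = [y - predict(x)
               for x, y in zip(stressors, punishment) if x == 4]

print(round(intercept, 3), round(slope, 3))
print(round(predict(3), 3), [round(e, 3) for e in errors_at_4])
```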
The SSE is a measure of the total variability of obtained score
values around their predicted values. There are two other sums of
squares that are important to understanding correlation and
regression.

The total sum of squares (SST) is a measure of the total variability
of the obtained score values around the mean of the obtained scores.
The SST is represented by

$$SST = \sum (Y - \bar{Y})^2.$$

The remaining sum of squares is called the regression sum of squares
(SSR) or the explained sum of squares. If we square each of the
differences between predicted scores and the mean and then add them
up, we get the SSR, which is represented by

$$SSR = \sum (Y_{pred} - \bar{Y})^2.$$

The SSR is a measure of the total variability of the predicted score
values around the mean of the obtained scores.

An important and interesting feature of these three sums of squares
is that the sum of the SSR and SSE is equal to the SST:

$$SST = SSR + SSE.$$

This leads us to three other important statistics: the proportion of
variance explained (PVE), the correlation coefficient, and the
standard error of estimate.
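The identity SST = SSR + SSE can be confirmed numerically for the Table 6.3 data. The following sketch refits the least-squares line (an assumption about how the text's line was obtained) and computes each sum of squares from its definition; all names are illustrative.

```python
stressors = [3, 4, 4, 5, 6, 7, 8, 7, 9, 10]   # X
punishment = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # Y

n = len(stressors)
mean_x = sum(stressors) / n
mean_y = sum(punishment) / n

slope = (sum((x - mean_x) * (y - mean_y)
             for x, y in zip(stressors, punishment))
         / sum((x - mean_x) ** 2 for x in stressors))
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in stressors]

# Error, total, and regression sums of squares.
sse = sum((y - p) ** 2 for y, p in zip(punishment, predicted))
sst = sum((y - mean_y) ** 2 for y in punishment)
ssr = sum((p - mean_y) ** 2 for p in predicted)

print(round(sst, 2), round(ssr, 2), round(sse, 2))
```

For these data the three values come out near 82.5, 78.63, and 3.87, so the explained and error parts add back up to the total.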
Proportion of Variance Explained. The PVE is a measure of how well
the regression line predicts obtained scores. The values of PVE range
from 0 (no predictive value) to 1 (prediction with perfect accuracy).
The equation for PVE is

$$PVE = \frac{SSR}{SST}.$$

There also is a computational equation for the PVE, which is

$$PVE = \frac{(SSXY)^2}{SSX \cdot SSY},$$

where
SSXY is the "covariance" sum of squares: $\sum (X - \bar{X})(Y - \bar{Y})$,
SSX is the sum of squares for variable X: $\sum (X - \bar{X})^2$, and
SSY is the sum of squares for variable Y: $\sum (Y - \bar{Y})^2$.

The procedure for computing these sums of squares is outlined in
Table 6.4.

The proportion of variance in the frequency of corporal punishment
that may be explained by stressors experienced is

$$PVE = \frac{(61.5)^2}{(48.1)(82.5)} = \frac{3782.25}{3968.25} = 0.953.$$

TABLE 6.4 Computation of r² (PVE)

Y     Y - Ȳ   (Y - Ȳ)²   X    X - X̄   (X - X̄)²   (X - X̄)(Y - Ȳ)
3     -3.3    10.89      0    -4.5    20.25      +14.85
4     -2.3    5.29       1    -3.5    12.25      +8.05
4     -2.3    5.29       2    -2.5    6.25       +5.75
5     -1.3    1.69       3    -1.5    2.25       +1.95
6     -0.3    0.09       4    -0.5    0.25       +0.15
7     +0.7    0.49       5    +0.5    0.25       +0.35
8     +1.7    2.89       6    +1.5    2.25       +2.55
7     +0.7    0.49       7    +2.5    6.25       +1.75
9     +2.7    7.29       8    +3.5    12.25      +9.45
10    +3.7    13.69      9    +4.5    20.25      +16.65

NOTE: Ȳ = 6.3; SSY = 48.1; X̄ = 4.5; SSX = 82.5; SSXY = 61.5.

The PVE sometimes is called the coefficient of determination and is
represented by the symbol r².
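The column totals in Table 6.4 and the resulting PVE can be recomputed from the raw scores. This sketch follows the table's own column labels (Y for the scores 3 through 10, X for 0 through 9); the variable names are illustrative.

```python
# Scores as labelled in Table 6.4.
y = [3, 4, 4, 5, 6, 7, 8, 7, 9, 10]
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

mean_y = sum(y) / len(y)
mean_x = sum(x) / len(x)

ss_y = sum((v - mean_y) ** 2 for v in y)
ss_x = sum((v - mean_x) ** 2 for v in x)
ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

# Computational formula: PVE = SSXY^2 / (SSX * SSY).
pve = ss_xy ** 2 / (ss_x * ss_y)

print(round(ss_y, 1), round(ss_x, 1), round(ss_xy, 1), round(pve, 3))
```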
Correlation Coefficient. A correlation coefficient also is a measure
of the strength of relationship between two variables. The
correlation coefficient is represented by the letter r and can take
on values between -1 and +1 inclusive. The correlation coefficient
always has the same sign as the slope. If one squares a correlation
coefficient, then one will obtain the PVE. It is computed using the
following formula:

$$r = \frac{SSXY}{\sqrt{SSX \cdot SSY}}.$$

For our example data, the correlation coefficient would be

$$r = \frac{+61.5}{\sqrt{(48.1)(82.5)}} = \frac{+61.5}{\sqrt{3968.25}} = \frac{+61.5}{62.994} = +0.976.$$
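A short sketch of the same calculation from the three sums of squares reported in Table 6.4. It also shows the two properties stated above: r squared equals the PVE, and r carries the sign of SSXY (and therefore of the slope). The variable names are illustrative.

```python
import math

# Sums of squares from the note under Table 6.4.
ss_xy = 61.5
ss_x = 82.5
ss_y = 48.1

r = ss_xy / math.sqrt(ss_x * ss_y)
pve = r ** 2

# The sign of r follows the sign of SSXY, as does the slope SSXY / SSX.
same_sign_as_slope = (r > 0) == (ss_xy / ss_x > 0)

print(round(r, 3), round(pve, 3), same_sign_as_slope)  # 0.976 0.953 True
```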
Standard Error of Estimate. The standard error of estimate is the
standard deviation of the prediction errors. It is computed like any
other standard deviation: the square root of the SSE divided by the
degrees of freedom.

The first step is to compute the variance error ($s_E^2$):

$$s_E^2 = \frac{SSE}{n - 2}.$$

Notice that the value for degrees of freedom is n - 2 rather than
n - 1. The reason why we subtract 2 in this instance is that variance
error (and standard error of estimate) is a statistic describing
characteristics of two variables. They deal with the error involved
in the prediction of Y (one variable) from X (the other variable).

The standard error of estimate is the square root of the variance
error:

$$s_E = \sqrt{s_E^2}.$$

The standard error of estimate tells us how spread out scores are
with respect to their predicted values. If the error scores
($E = Y - Y_{pred}$) are normally distributed around the prediction
line, then about 68% of actual scores will fall between ±1 $s_E$ of
their predicted values.

We can calculate the standard error of estimate using the following
computing formula:

$$s_E = s_Y \sqrt{(1 - r^2)\left(\frac{n - 1}{n - 2}\right)},$$

where
$s_Y$ is the standard deviation of Y,
r is the correlation coefficient for X and Y, and
n is the sample size.
For the example data, this would be

$$s_E = 2.311\sqrt{(1 - .953)\left(\frac{10 - 1}{10 - 2}\right)} = 2.311\sqrt{(0.047)(1.125)} = 2.311\sqrt{0.053} = (2.311)(0.230) = 0.532.$$
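The computing formula is easy to evaluate directly. The sketch below plugs in the example's inputs as given in the text (a standard deviation of Y of 2.311, r squared of .953, and n = 10); with those inputs the expression evaluates to roughly 0.53. The variable names are illustrative.

```python
import math

s_y = 2.311        # standard deviation of Y, as given in the text
r_squared = 0.953  # PVE from the example
n = 10             # sample size

# s_E = s_Y * sqrt((1 - r^2) * (n - 1) / (n - 2))
standard_error = s_y * math.sqrt((1 - r_squared) * (n - 1) / (n - 2))

print(round(standard_error, 3))
```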
Inferential Statistics: Hypothesis Testing
The Null and Alternative Hypotheses
Classical statistical hypothesis testing is based on the evaluation
of two rival hypotheses: the null hypothesis and the alternative
hypothesis.

We try to detect relationships by identifying changes that are
unlikely to have occurred simply because of random fluctuations of
dependent measures. Statistical analysis is the usual procedure for
identifying such relationships.

The null hypothesis is the hypothesis that there is no relationship
between two variables. This implies that if the null hypothesis is
true, then any apparent relationship in samples is the result of
random fluctuations in the dependent measure or sampling error.

Statistical hypothesis tests are carried out on samples. For example,
in an experimental two-group posttest-only design, there would be a
sample whose members received an intervention and a sample whose
members did not. Both of these would be probability samples from a
larger population. The intervention sample would represent the
population of all individuals as if they had received the
intervention. The control sample would be representative of the same
population of individuals as if they had not received the
intervention.

If the intervention had no effect, then the populations would be
identical. However, it would be unlikely that two samples from two
identical populations would be identical. So, although the sample
means would be different, they would not represent any effect of the
independent variable. The apparent difference would be due to
sampling error.
Statistical hypothesis tests involve evaluating evidence from samples
to make inferences about populations. It is for this reason that the
null hypothesis is a statement about population parameters. For
example, one null hypothesis for the previous design could be stated
as

$$H_0: \mu_1 = \mu_2$$

or as

$$H_0: \mu_1 - \mu_2 = 0.$$

$H_0$ stands for the null hypothesis. It is a letter H with a zero
subscript. It is a statement that the means of the experimental
(Mean 1) and control (Mean 2) populations are equal.

To establish that a relationship exists between the intervention
(independent variable) and the outcome (measure of the dependent
variable), we must collect evidence that allows us to reject the null
hypothesis.
Strictly speaking, we do not make a decision as to whether the null
hypothesis is correct. We evaluate the evidence to determine the
extent to which it tends to confirm or disconfirm the null
hypothesis. If the evidence were such that it is unlikely that an
observed relationship would have occurred as the result of sampling
error, then we would reject the null hypothesis. If the evidence were
more ambiguous, then we would fail to reject the null hypothesis. The
terms reject and fail to reject carry the implicit understanding that
our decision might be in error. The truth is that we never really
know whether our decision is correct.

When we reject the null hypothesis and it is true, we have committed
a Type I error. By setting certain statistical criteria beforehand,
we can establish the probability that we will commit a Type I error.
We decide what proportion of the time we are willing to commit a
Type I error. This proportion (probability) is called alpha
($\alpha$). If we are willing to reject the null hypothesis when it
is true only 1 in 20 times, then we set our $\alpha$ level at .05. If
only 1 in 100 times, then we set it at .01.

The probability that we will fail to reject the null hypothesis when
it is true (correct decision) is $1 - \alpha$ (Figure 6.2).
Figure 6.2 The Null Hypothesis and Type I Error

Situation: NULL HYPOTHESIS TRUE

Decision             Result
Reject H0            Type I Error
                     α = the probability of rejecting the null
                     hypothesis when it is true.
Fail to Reject H0    Correct Decision
                     1 - α = the probability of not rejecting the
                     null hypothesis when it is true.
The following hypothesis would be evaluated by comparing the
difference between sample means:

$$H_0: \mu_1 - \mu_2 = 0.$$

If we drew multiple samples from populations with identical means
(the null hypothesis was true), then we would find that most of the
values for the differences between the sample means would not be 0.
Figure 6.3 represents a distribution of the differences between
sample means drawn from identical populations.

The mean difference for the total distribution of sample means is 0,
and the standard deviation is 5. If the differences are normally
distributed, then approximately 68% of these differences will be
between -5 (z = -1) and +5 (z = +1). Fully 95% of the differences in
the distribution will fall between -9.8 (z = -1.96) and +9.8
(z = +1.96). If we drew a random sample from each population, it
would not be unusual to find a difference between sample means of as
much as 9.8, even though the population means were the same.

On the other hand, we would expect to find a difference more than 9.8
about 1 in 20 times. If we set our criterion for rejecting the null
hypothesis such that a mean difference must be greater than +9.8 or
less than -9.8, then we would commit a Type I error only 1 in 20
times (.05) on average. Our $\alpha$ level (the probability of
committing a Type I error) would be set at .05.
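The "1 in 20" claim can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than part of the original example: it draws differences between sample means from a normal distribution with mean 0 and standard deviation 5 (the null hypothesis is true by construction) and counts how often a difference falls beyond ±9.8. The seed, trial count, and names are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

SD_OF_DIFFERENCES = 5.0     # standard deviation from the example
CRITICAL_DIFFERENCE = 9.8   # 1.96 standard deviations
TRIALS = 100_000

# Under a true null hypothesis, the mean of the differences is 0.
rejections = sum(
    1 for _ in range(TRIALS)
    if abs(random.gauss(0.0, SD_OF_DIFFERENCES)) > CRITICAL_DIFFERENCE
)

# Proportion of samples in which we would wrongly reject the null.
type_i_rate = rejections / TRIALS
print(type_i_rate)
```

The observed proportion should land close to .05, which is the Type I error rate the ±9.8 criterion was chosen to produce.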
The probability that a relationship or a difference of a certain size
would be seen in a sample if the null hypothesis were true is
represented by p. To reject the null hypothesis, p must be less than
or equal to $\alpha$. The probability of getting an effect this large
or larger if the null hypothesis were true is less than or equal to
the probability of making a Type I error that we have decided is
acceptable.
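Figure 6.3 The Null Hypothesis and α Level (distribution of the
differences between sample means, X̄1 - X̄2, from -20 to +20 with the
corresponding z scale; central region 1 - α = .95, tails α = .05)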
Rejecting the $H_0$: We believe that it is likely that the
relationship in the sample is generalizable to the population.

Not rejecting the $H_0$: We do not believe that we have sufficient
evidence to draw inferences about the population.

For the previous example, let us imagine that we have set
$\alpha$ = .05. Also, imagine that we obtained a difference between
the sample means of 10. The probability that we would obtain a
difference of +10 or -10 would be equivalent to the probability of a
z score greater than +2.0 plus the probability of a z score less than
-2.0 or .0228 + .0228 = .0456. This is our p value; p = .0456.
Because p < $\alpha$, we would reject the null hypothesis.
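The p value in this example can be computed without a z table. The sketch below uses the identity that the two-sided tail probability of a standard normal score z equals erfc(|z| / sqrt(2)); the helper name `two_sided_p` is hypothetical. The unrounded result is about .0455; the text's .0456 comes from doubling the rounded table value .0228.

```python
import math


def two_sided_p(z):
    """Probability of a standard normal score at least as extreme as z."""
    return math.erfc(abs(z) / math.sqrt(2))


difference = 10.0       # observed difference between sample means
sd_of_differences = 5.0
alpha = 0.05

z = difference / sd_of_differences
p = two_sided_p(z)
reject_null = p <= alpha

print(z, round(p, 4), reject_null)  # 2.0 0.0455 True
```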
Some texts create the impression that the alternative (or research or
experimental) hypothesis is simply the opposite of the null
hypothesis. In fact, sometimes this naive alternative hypothesis is
used. However, it generally is not particularly useful to
researchers. Usually, we are interested in detecting an intervention
effect of a particular size. On certain measures, we would be
interested in small effects (e.g., death rate), whereas on others,
only larger effects would be of interest.

When we are interested in an effect of a particular size, we use a
specific alternative hypothesis that takes the following form:

$$H_1: \mu_1 - \mu_2 \geq |d|,$$

where d is a difference of a particular size. If the test is a
nondirectional test, then the difference in the alternative
hypothesis would be expressed as an absolute value, |d|, to show that
either a positive or negative difference is involved.

It is customary to express the mean difference in an $H_1$ in units
of standard deviation. Such scores are called z scores. The
difference is called an effect size. Effect sizes frequently are used
in meta-analyses of outcome studies to compare the relative efficacy
of different types of interventions across studies.
Cohen (1988) groups effect sizes into small, medium, and large
categories. The criteria for each are as follows:

Small effect size (d = .2): It is approximately the effect size for
the average difference in height (i.e., 0.5 inches and s = 2.1)
between 15- and 16-year-old girls.

Medium effect size (d = .5): It is approximately the effect size for
the average difference in height (i.e., 1.0 inches and s = 2.0)
between 14- and 18-year-old girls.

Large effect size (d = .8): This is the same effect size (d = .8) as
the average difference in height for 13- and 18-year-old girls.
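The height examples above can be turned into effect sizes by dividing each mean difference by its standard deviation. The sketch below does this for the two examples that give both numbers; the helper names `effect_size` and `label` are hypothetical, and treating .2, .5, and .8 as lower cutoffs is an assumption made only for illustration.

```python
def effect_size(mean_difference, sd):
    """Mean difference expressed in standard deviation units."""
    return mean_difference / sd


def label(d):
    """Rough category, using .2, .5, and .8 as lower cutoffs."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


small_example = effect_size(0.5, 2.1)   # 15- vs 16-year-old girls
medium_example = effect_size(1.0, 2.0)  # 14- vs 18-year-old girls

print(round(small_example, 2), label(small_example))
print(round(medium_example, 2), label(medium_example))
```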
Intuitively, it would seem that we would want to detect even very
small effect sizes in our research. However, there is a practical
trade-off involved. All other things being equal, the consistent
detection of small effect sizes requires very large (n > 200) sample
sizes. Because very large sample sizes require resources that might
not be readily available, they might not be practical for all
studies. Furthermore, there are certain outcome variables for which
we would not be particularly interested in small effects.

If we reject the null hypothesis, then we implicitly have decided
that the evidence supports the alternative hypothesis. If the
alternative hypothesis is true and we reject the null hypothesis,
then we have made a correct decision. However, if we fail to reject
the null hypothesis and the alternative hypothesis is true, then we
have committed a Type II error. A Type II error involves the failure
to detect an existing effect (Figure 6.4).
Figure 6.4 The Null Hypothesis and Type II Error

Situation: ALTERNATIVE HYPOTHESIS TRUE

Decision             Result
Reject H0            Correct Decision
                     1 - β = the probability of rejecting the null
                     hypothesis when the alternative hypothesis is
                     true. The power of a test.
Fail to Reject H0    Type II Error
                     β = the probability of not rejecting the null
                     hypothesis when the alternative hypothesis is
                     true.
Beta ($\beta$) is the probability of committing a Type II error. This
probability is established when we set our criterion for rejecting
the null hypothesis. The probability of a correct decision
($1 - \beta$) is an important probability. It is so important that it
has a name: power. Power refers to the probability that we will
detect an effect of the size we have selected.

We should decide on the power ($1 - \beta$) as well as the $\alpha$
level before we carry out a statistical test. Just as with Type I
error, we should decide beforehand how often we are willing to make a
Type II error (fail to detect a certain effect size). This is our
$\beta$ level. The procedure for making such determinations is
discussed in Cohen (1988).
Assumptions for Statistical Hypothesis Tests
Although assumptions are different for different tests, all tests
of the null hypothesis share
two related assumptions: randomness and independence.
The randomness assumption is that sample members must be
randomly selected from
the population being evaluated. If the sample is being divided
into groups (e.g., treatment
and control), then assignment to groups also must be random.
This is referred to as random
selection and random assignment.
The mathematical models that underlie statistical hypothesis
testing depend on random
sampling. If the samples are not random, then we cannot
compute an accurate probability
(p) that the sample could have resulted if the null
hypothesis were true.
The independence assumption is that one member's score will
not influence another
member's score. The only common relationship among group
scores should be the intervention.
One implication of this is that members of a group
should not have any contact
with each other so as not to affect each other's scores.
Again, the mathematical models are dependent on the
independence of sample scores.
If the scores are not independent, then the probability (p) is,
as before, simply a number
that has little to do with the probability of a Type I error.
Parametric and Nonparametric Hypothesis Tests
Traditionally, hypothesis tests are grouped into parametric and
nonparametric tests. The
names are misleading given that one class of test has no more
or less to do with population
parameters than the other. The difference between the
two tests lies in the mathematical
assumptions used to compute the likelihood of a Type I
error.
Parametric tests are based on the assumption that the
populations from which the
samples are drawn are normally distributed. Nonparametric
tests do not have this rigid
assumption. Thus, a nonparametric test can be carried out on a
broader range of data
than can a parametric test. Nonparametric tests remain
serviceable even in circumstances
where parametric procedures collapse.
When the populations from which we sample are normally
distributed, and when all
the other assumptions of the parametric test are met, parametric
tests are slightly more
powerful than nonparametric tests. However, when the
parametric assumptions are not
met, nonparametric tests are more powerful.
Specific Hypothesis Tests
We now investigate several frequently used hypothesis tests
and issues surrounding their
appropriate use. Where appropriate, parametric and
nonparametric tests are presented
together for each type of design.
Single-Sample Hypothesis Tests
These are tests in which a single sample is drawn. Comparisons
are made between sample
values and population parameters to see whether the sample
differs in a statistically significant
way from the parent population. Occasionally, these
tests are used to determine
whether a sample differs from some theoretical population.
For example, we might wish to gather evidence as to whether a
particular population
was normally distributed. We would take a random sample from
this population and compare
the distribution of scores to an artificially constructed,
normally distributed set of
scores. If there were a statistically significant difference, then
we would reject the hypothesis
that our sample came from a normally distributed population
(the null hypothesis).
Typically, these tests are not used for experiments. They tend
to be used to demonstrate
that certain strata within populations differ from the population
as a whole.
Here, we investigate two single-sample tests:
1. Single-sample t test (interval or ratio scale)
2. χ² (chi-square) goodness-of-fit test (nominal scale)
The Single-Sample t Test. This test usually is used to see
whether a stratum of a population
is different on average from the population as a whole (e.g., are
the mean wages received
by social workers in Lansing different from the mean for all
social workers in Michigan?).
The null hypothesis for this test is that the mean wages for a
particular stratum
(Lansing social workers) of the population and the population as
a whole (Michigan
social workers) will be the same:
\[ H_0: \mu_1 = \mu_0 \]
where $\mu_0$ is the mean wage for the population and $\mu_1$ is the
mean wage for the stratum.
The assumptions of the single-sample t test are as follows:
Randomness: Sample members must be randomly drawn from
the population.
Independence: Sample (X) scores must be independent of each
other.
Scaling: The dependent measure (X scores) must be interval or
ratio.
Normal distribution: The population of X scores must be
normally distributed.
These assumptions are listed more or less in order of
importance. Violations of the first
three assumptions are essentially "fatal" ones. Even slight
violations of the first two
assumptions can introduce major error into the computation of p
values.
Violation of the assumption of a normal distribution will
introduce some error into
the computation of p values. Unless the population distribution
is markedly different
from a normal distribution, the errors will tend to be slight
(e.g., a reported p value of .042
actually will be a p value of .057). This is what is meant when
someone says that the t test
is a "robust" test.
The t statistic for the single-sample t test is computed by
subtracting the null hypothesis
(population) mean from the sample mean and dividing by
the standard error of the
mean.
The formula for $t_{obt}$ (pronounced "t obtained") is
\[ t_{obt} = \frac{\bar{X} - \mu_0}{s_{\bar{X}}}. \]
As the absolute value of $t_{obt}$ gets larger, the more unlikely it is
that such a difference
could occur if the null hypothesis is true. At a certain point,
the probability (p) of obtaining
a t so large becomes sufficiently small (reaches the α
level) that we reject the null
hypothesis.
The critical value of t (the value that $t_{obt}$ must equal or exceed
to reject the null hypothesis)
depends on the degrees of freedom. For a single-sample
t test, the degrees of freedom
are df = n - 1, where n is the sample size.
Let us look at how to compute $t_{obt}$.
We know from a statewide survey that the average time
taken to complete an outpatient
rehabilitation program for a certain injury, X, is 46.6
days. We wish to see whether
clients seen at our clinic are taking longer or shorter than the
state average.
We randomly sample 16 files from the past year. We review
these cases and determine
the length of program for each of the clients in the sample. The
mean number of days to
complete rehabilitation at our clinic is 29.875 days. This is
lower than the population
mean of 46.6 days. The question is whether this result is
statistically significant. Is it likely
that this sample could have been drawn from a population with
a mean of 46.6?
To determine this, we need to calculate $t_{obt}$. The first step in
calculating $t_{obt}$ was carried out
when we computed the sample mean. The next step is to
compute the standard error of the
mean. We begin this by computing the standard deviation,
which turns out to be s = 11.888.
The standard error of the mean is calculated by dividing the
standard deviation by the
square root of the sample size or
\[ s_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{11.888}{\sqrt{16}} = \frac{11.888}{4} = 2.972. \]
We take the formula for $t_{obt}$ and plug in our numbers to
obtain
\[ t_{obt} = \frac{29.875 - 46.6}{2.972} = \frac{-16.725}{2.972} = -5.628. \]
We look up the tabled t value ($t_{crit}$) at 15 degrees of freedom.
This turns out to be 2.131
for a nondirectional test at α = .05 (see a table of the critical
values for the t test, nondirectional,
found in most statistics texts). The absolute value of
$t_{obt}$ = 5.628. This is greater
than $t_{crit}$ = 2.131, so we reject the null hypothesis. The
evidence suggests that clients in our
clinic average fewer days in rehabilitation than is the case in
the statewide population.
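The arithmetic of the rehabilitation example can be retraced in a few lines. This is a minimal sketch that works from the summary figures quoted above (sample mean, standard deviation, and sample size); the function name is ours, and the critical value is simply the tabled 2.131 rather than something the code derives.

```python
import math


def single_sample_t(sample_mean, null_mean, s, n):
    """Return (standard error, t obtained, degrees of freedom)."""
    se = s / math.sqrt(n)                    # standard error of the mean
    t_obt = (sample_mean - null_mean) / se   # distance from the null mean in SE units
    return se, t_obt, n - 1


se, t_obt, df = single_sample_t(sample_mean=29.875, null_mean=46.6, s=11.888, n=16)

T_CRIT = 2.131   # tabled value: nondirectional, alpha = .05, df = 15
reject_null = abs(t_obt) >= T_CRIT

print(f"SE = {se:.3f}, t = {t_obt:.3f}, df = {df}, reject H0: {reject_null}")
```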
The effect size index for a test of means is d and is computed
as follows for a single-sample
t test:
\[ d = \frac{\bar{X} - \mu_0}{s}. \]
The effect size for our example would be as follows:
\[ d = \frac{29.875 - 46.6}{11.888} = \frac{-16.725}{11.888} = -1.4069, \]
which would be classified as a large effect.
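The effect size calculation can be sketched the same way. The cutoffs of .2, .5, and .8 used below are Cohen's (1988) conventional benchmarks for small, medium, and large values of d; treating them as lower bounds of each band is an assumption of this sketch, as are the function names.

```python
def cohens_d(sample_mean, null_mean, s):
    """Effect size for a single-sample test of means."""
    return (sample_mean - null_mean) / s


def classify_d(d):
    """Label |d| using Cohen's conventional benchmarks as lower bounds."""
    size = abs(d)   # the sign only shows direction
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


d = cohens_d(sample_mean=29.875, null_mean=46.6, s=11.888)
print(f"d = {d:.4f} ({classify_d(d)})")
```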
The χ² Goodness-of-Fit Test. The χ² goodness-of-fit test is a
single-sample test. It is used in
the evaluation of nominal (categorical) variables. The test
involves comparisons between
observed and expected frequencies within strata in a sample.
Expected frequencies are
derived from either population values or theoretical values.
Observed frequencies are
those derived from the sample.
The null hypothesis for the χ² test is that the population from
which the sample has
been drawn will have the same proportion of members in each
category as the empirical
or theoretical null hypothesis population:
\[ H_0: P_{O_k} = P_{E_k} \]
where
$P_{E_k}$ is the proportion of cases within category k in the null
hypothesis population
(expected), and
$P_{O_k}$ is the proportion of cases within category k in the
population from which the test
sample was drawn (observed).
The assumptions for the χ² goodness-of-fit test are as follows:
• Randomness: Sample members must be randomly drawn from
the population.
• Independence: Sample scores must be independent of each
other. One implication of
this is that categories must be mutually exclusive (no case may
appear in more than
one category).
• Scaling: The dependent measure (categories) must be
nominal.
• Expected frequencies: No expected frequency within a category
should be less than 1,
and no more than 20% of the expected frequencies should be
less than 5.
As with all tests of the null hypothesis, the χ² test begins with
the assumptions of randomness
and independence. Deriving from these assumptions
is the requirement that the
categories in the cross-tabulation must be mutually exclusive
and exhaustive.
Mutually exclusive means that an individual may not be in
more than one category per
variable. Exhaustive means that all categories of interest are
covered.
These assumptions are listed more or less in order of
importance. Violations of the first
three assumptions are essentially "fatal" ones. Even slight
violations of the first two
assumptions can introduce major errors into the computation of
p values.
The χ² goodness-of-fit test is basically a large-sample test.
When the expected frequencies
are small (an expected frequency less than 1 or more than 20%
of expected frequencies less
than 5), the probabilities associated with the χ² test will be
inaccurate.
The usual procedure in this case is either to increase expected
frequencies by collapsing
adjacent categories (also called cells) or to use another
test. Following is a concrete
example.
The workers at the Interdenominational Social Services Center
in St. Winifred
Township wanted to see whether they were serving people of
all faiths (and those of no
faith) equally. They had census figures indicating that
religious preferences in the township
were as follows: Christian (64%), Jewish (10%), Muslim (8%),
other religion/no preference
(14%), and agnostic/atheist (4%).
The workers randomly sampled 50 clients from those seen
during the previous year.
Before they drew the sample, they calculated the expected
frequency for each category. To
obtain the expected frequencies for the sample, they converted
the percentage for each
preference to a decimal proportion and multiplied it by 50.
Thus, the expected frequency
for Christians was 64% of 50 or .64 x 50 = 32, the Jewish
category was 10% of 50 or
.10 x 50 = 5, and so on. Table 6.5 depicts the expected
frequencies.
TABLE 6.5 Expected Frequencies for Religious Preferences
                     Christian  Jewish  Muslim  Other/No Preference  Agnostic/Atheist
Expected frequency   32         5       4       7                    2
Two (40%) of our expected frequencies (Muslim and
agnostic/atheist) are less than 5.
Given that the maximum allowable is 20%, we are violating a
test assumption. We can
remedy this by collapsing categories (merging two or more
categories into one) or by
increasing the sample size. However, there is no category that
we could reasonably combine
with agnostic/atheist. It would not work to combine this
category with any of the
other categories because the latter are religious individuals,
whereas atheists and agnostics
are not religious.
However, we could increase the sample size. To get a sample in
which only one (20%)
of the expected frequencies was less than 5, we would need a
sample large enough so that
8% (percentage of the population identifying as Muslim) of it
would equal 5:
\[ 0.08 \times n = 5, \qquad n = \frac{5}{0.08} = 62.5 \approx 63. \]
So, our sample size would need to be 63, giving us the
expected frequencies shown in
Table 6.6. Only one of five (20%) of the expected frequencies is
less than 5, and none of
them is less than 1, so the sample size assumption is met. The
results of a random sample
of 63 cases were as found in Table 6.7.
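The expected-frequency check and the search for an adequate sample size can be sketched as follows. The proportions are the census figures quoted above; the function names and the brute-force search are our own illustration, not a procedure taken from the text.

```python
# Census proportions for St. Winifred Township, as quoted in the example.
PROPORTIONS = {
    "Christian": 0.64,
    "Jewish": 0.10,
    "Muslim": 0.08,
    "Other/No preference": 0.14,
    "Agnostic/Atheist": 0.04,
}


def expected_frequencies(proportions, n):
    """Expected count per category for a sample of size n."""
    return {category: p * n for category, p in proportions.items()}


def meets_expected_frequency_rule(expected):
    """No expected frequency below 1, and at most 20% of them below 5."""
    values = list(expected.values())
    share_below_5 = sum(v < 5 for v in values) / len(values)
    return min(values) >= 1 and share_below_5 <= 0.20


def smallest_adequate_n(proportions, limit=10_000):
    """Try sample sizes in order until the rule is satisfied."""
    for n in range(1, limit):
        if meets_expected_frequency_rule(expected_frequencies(proportions, n)):
            return n
    return None


print(expected_frequencies(PROPORTIONS, 50))
print(meets_expected_frequency_rule(expected_frequencies(PROPORTIONS, 50)))
print(smallest_adequate_n(PROPORTIONS))
```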
TABLE 6.6 New Expected Frequencies for Religious Preferences
                     Christian  Jewish  Muslim  Other/No Preference  Agnostic/Atheist
Expected frequency   40.32      6.30    5.04    8.82                 2.52
TABLE 6.7 Observed and Expected Frequencies for Religious
Preferences
                     Christian  Jewish  Muslim  Other/No Preference  Agnostic/Atheist
Expected frequency   40.32      6.30    5.04    8.82                 2.52
Observed frequency   49         2       2       9                    1
The null hypothesis for this example is that the proportion of
people living in St.
Winifred Township who identify with each religious
category will be the same as the proportion
of people who have received services at the
Interdenominational Services Center
in St. Winifred Township who identify with each religious
category.
The null hypothesis expresses the expectation that observed and
expected frequencies
will not be different. Notice the similarity between the null
hypothesis and the numerator
of the $\chi^2_{obt}$ test statistic:
\[ \chi^2_{obt} = \sum \frac{(f_o - f_e)^2}{f_e}. \]
The formula tells us to subtract the expected score from the
observed score $(f_o - f_e)$ and
then to square the difference $[(f_o - f_e)^2]$ and divide by the
expected score $[(f_o - f_e)^2 / f_e]$ for
each observed and expected score pair. When we are finished,
we add the answers and
obtain the $\chi^2_{obt}$ test statistic (Table 6.8).
The $\chi^2_{obt}$ is evaluated by comparing it to a critical value ($\chi^2_{crit}$)
that is obtained from a
table of critical values of the χ² distribution. If $\chi^2_{obt}$ is greater
than or equal to $\chi^2_{crit}$, then
we reject the null hypothesis.
For a χ² goodness of fit, the degrees of freedom are equal to the
number of categories
(c) minus 1 or df = c - 1. In our case, we have five categories
(Christian, Jewish, Muslim,
other/no preference, and agnostic/atheist), so df = 5 - 1 = 4.
The critical value for χ² at α = .05 and df = 4 is $\chi^2_{crit}$ = 9.49.
We have calculated $\chi^2_{obt}$ as
7.5577. Because $\chi^2_{obt}$ is less than $\chi^2_{crit}$, we fail to reject
the null hypothesis. The evidence
is not strong enough to conclude that people of all faiths (and
those of no faith) are not
being seen proportionately to
their representations in the township.
TABLE 6.8 Computation of $\chi^2_{obt}$
Observed (f_o)  Expected (f_e)  f_o - f_e  (f_o - f_e)^2  (f_o - f_e)^2 / f_e
49              40.32           +8.68      75.3424        1.8686
2               6.30            -4.30      18.4900        2.9349
2               5.04            -3.04      9.2416         1.8337
9               8.82            +0.18      0.0324         0.0037
1               2.52            -1.52      2.3104         0.9168
NOTE: Sum of (f_o - f_e)^2 / f_e = 1.8686 + 2.9349 + 1.8337 + 0.0037 + 0.9168 = $\chi^2_{obt}$ = 7.5577.
Earlier, we discussed the use of the effect size measure d for
the t test. It is an appropriate
measure of effect size for a test of means. However, the
χ² test does not compare
means. It compares frequencies (or proportions). Therefore, a
different effect size index is
used for the χ² test: w. This measure of effect size ranges from 0
to 1. Cohen (1988) classifies
these effect sizes into three categories:
Small effect size: w = .10
Medium effect size: w = .30
Large effect size: w = .50
The effect size coefficient for a χ² goodness-of-fit test is
computed according to the following
formula:
\[ w = \sqrt{\frac{\chi^2_{obt}}{N}} \]
where N = the total sample size.
For the St. Winifred Township example,
\[ w = \sqrt{7.5577 / 63} = \sqrt{0.1200} = 0.3464, \]
which would be classified as a medium effect.
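The goodness-of-fit statistic and its effect size can be recomputed directly from the observed and expected frequencies in Table 6.7. This is a minimal sketch: the function names are ours, and the critical value is the tabled 9.49 for α = .05 and df = 4 rather than a value the code derives.

```python
import math

# Frequencies from Table 6.7 (Christian, Jewish, Muslim,
# other/no preference, agnostic/atheist).
observed = [49, 2, 2, 9, 1]
expected = [40.32, 6.30, 5.04, 8.82, 2.52]


def chi_square_gof(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def effect_size_w(chi2, n):
    """Effect size w = sqrt(chi-square / N)."""
    return math.sqrt(chi2 / n)


chi2_obt = chi_square_gof(observed, expected)
df = len(observed) - 1            # number of categories minus 1
CHI2_CRIT = 9.49                  # tabled value: alpha = .05, df = 4
reject_null = chi2_obt >= CHI2_CRIT
w = effect_size_w(chi2_obt, sum(observed))

print(f"chi-square = {chi2_obt:.4f}, df = {df}, reject H0: {reject_null}, w = {w:.4f}")
```

Printing each term of the sum separately is a quick way to check a hand-computed table row by row.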
Hypothesis Tests for Two Related Samples
These are tests in which either a single sample is drawn and
measurements are taken at
two times or two samples are drawn and members of the sample
are individually matched
on some attribute. Measurements are taken for each member
of the matched groups.
We investigate three examples of two related sample tests in
this section:
1. Dependent (matched, paired, correlated) samples t test
(interval or ratio scale)
2. Wilcoxon matched pairs, signed ranks test (ordinal scale)
3. McNemar change test (nominal scale)
Difference Scores. The dependent t test and the Wilcoxon
matched pairs, signed ranks test
evaluate difference scores. These may be differences between
scores from measurements
taken at two different times on the same individual (pretest and
posttest) or differences
between scores taken on two different individuals who have
been paired or matched with
each other based on their similarity on some variable or variable
cluster (e.g., gender,
race/ethnicity, socioeconomic status). The formula for a
difference score is
\[ X_2 - X_1 = X_D, \]
where
$X_1$ is the first of a pair of scores,
$X_2$ is the second of a pair of scores, and
$X_D$ is the difference between the two.
The null hypothesis for all these tests is that the samples came
from populations in
which the expected differences are zero.
The Dependent Samples t Test. This also is called the
correlated, paired, or matched t test.
The null hypothesis for this test is that the mean of the
differences between the paired
scores is 0:
\[ H_0: \mu_{X_D} = \mu_{D_0} \]
where
$\mu_{X_D}$ = the mean difference between the populations from
which the samples were
drawn, and
$\mu_{D_0}$ = the mean difference between the populations specified
by the null hypothesis.
Because the null hypothesis typically specifies no difference
($\mu_{D_0} = 0$), the null hypothesis
usually is written as
\[ H_0: \mu_{X_D} = 0. \]
The t statistic for the dependent t test is the mean of the sample
differences divided by
the standard error of the mean difference or
\[ t_{obt} = \frac{\bar{X}_D - \mu_{D_0}}{s_{\bar{X}_D}}. \]
As the absolute value of $t_{obt}$ gets larger, the more unlikely it is
that such a difference could
occur if the null hypothesis is true. At a certain point, the
probability (p) of obtaining a t so
large becomes sufficiently small (reaches the alpha level) that
we reject the null hypothesis.
The assumptions of the dependent t test are as follows:
Randomness: Sample members must be randomly drawn from
the population.
Independence: $X_D$ scores must be independent of each other.
Scaling: The dependent measure ($X_D$ scores) must be interval or ratio.
Normal distribution: The population of $X_D$ scores must be normally distributed.
These assumptions are listed more or less in order of
importance. Violations of the first
three assumptions are essentially "death penalty" violations.
Even slight violations of the
first two assumptions can introduce major error into the
computation of p values. Similarly,
difference scores computed from two sets of ordinal data
may incorporate major error.
Violation of the assumption of a normal distribution will
introduce some error into
the computation of p values. However, unless the population
distribution is markedly different
from a normal distribution, the errors will tend to be
slight (e.g., a reported p value
of .042 actually will be a p value of .057). This is what is
meant when someone says that
the t test is a "robust" test.
Still, even though the error is slight, the nonparametric
Wilcoxon matched pairs,
signed ranks test (discussed in the next section) probably will
yield a more accurate test
when there are violations of this normal distribution
assumption.
Let us look at the procedure for computing the dependent
groups t statistic. We use an
evaluation of an intervention for individuals with depression
problems. The dependent
measure is the Beck Depression Inventory (BDI), a reliable and
well-validated measure of
depression.
Ten clients were randomly selected from clients seen for
depression problems at a community
center. They were pretested ($X_1$) with the BDI, received
the treatment, and then
were posttested ($X_2$) with the same instrument. The mean
of the difference scores ($\bar{X}_D$)
was -1. This means that the average change in BDI
scores from pretest to posttest was a
decrease of 1 point. The standard deviation of the difference
scores was 1.33.
The next step is the computation of the standard error of the
mean. We divide the standard
deviation by the square root of the sample size to get the
standard error of the mean:
\[ s_{\bar{X}_D} = 1.33 / \sqrt{10} = 1.33 / 3.16 = 0.42. \]
We plug the values into the formula for $t_{obt}$:
\[ t_{obt} = \frac{\bar{X}_D}{s_{\bar{X}_D}} = \frac{-1}{0.42} = -2.38. \]
For α = .05 and df = n - 1 = 10 - 1 = 9, $t_{crit}$ = 2.262 (see a
table of critical values for the
t test, nondirectional, found in most statistics texts). Because
$|t_{obt}|$ = 2.38 is greater than or
equal to the critical value, we reject the null hypothesis at α =
.05.
The effect size index for this test is d and is computed as
follows:
\[ d = \frac{\bar{X}_D - \mu_{D_0}}{s_D}. \]
For the depression intervention example,
\[ d = \frac{-1 - 0}{1.33} = \frac{-1}{1.33} = -0.752, \]
which would be classified as a medium effect.
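The dependent t calculation can be retraced from the summary figures for the depression example (mean difference of -1, standard deviation of 1.33, and 10 clients). The sketch below uses our own function name and takes the critical value 2.262 from the table rather than deriving it. Because it does not round the standard error to 0.42 before dividing, its unrounded t differs slightly from the hand calculation but rounds to the same value.

```python
import math


def dependent_t(mean_diff, sd_diff, n, null_diff=0.0):
    """Return (standard error, t obtained, effect size d) for difference scores."""
    se = sd_diff / math.sqrt(n)           # standard error of the mean difference
    t_obt = (mean_diff - null_diff) / se
    d = (mean_diff - null_diff) / sd_diff
    return se, t_obt, d


se, t_obt, d = dependent_t(mean_diff=-1.0, sd_diff=1.33, n=10)

T_CRIT = 2.262   # tabled value: nondirectional, alpha = .05, df = 9
reject_null = abs(t_obt) >= T_CRIT

print(f"SE = {se:.2f}, t = {t_obt:.2f}, d = {d:.3f}, reject H0: {reject_null}")
```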
Wilcoxon Matched Pairs, Signed Ranks Test. The Wilcoxon
matched pairs, signed ranks test
is a nonparametric test for the evaluation of difference scores.
The test involves ranking
difference scores as to how far they are from 0. The difference
score closest to 0 receives
the rank of 1, the next score receives the rank of 2, and so on.
The ranks for difference
scores below 0 are given a negative sign, whereas those above 0
are given a positive sign.
The null hypothesis is that the sample comes from a
population of difference scores in
which the expected difference score is 0.
The assumptions for the Wilcoxon matched pairs, signed ranks
test are as follows:
• Randomness: Sample members must be randomly drawn from
the population.
• Independence: $X_D$ scores must be independent of each
other.
• Scaling: The dependent measure ($X_D$ scores) must be ordinal
(interval or ratio differences
must be converted to ranks).
Let us look at the procedure for computing the Wilcoxon
matched pairs, signed ranks
test statistic. We use the same example as for the t test. The
dependent measure is the BDI,
a measure of depression. Scores on the BDI are not normally
distributed, tending to be
positively skewed.
Ten clients were randomly selected from clients seen for
depression problems at a community
center. They were pretested with the BDI, received the
treatment, and then were
posttested with the same instrument. We compute the
difference scores (post - pre) for
each individual. We assign a rank to each difference score
based on its closeness to 0.
Difference scores of 0 do not receive a rank. Tied ranks receive
the average rank for the tie.
So, if we look at Table 6.9, we see that there is one difference
score of 0 that goes
unranked. There are five difference scores of either -1 or +1.
These cover the first five
ranks (1, 2, 3, 4, 5), giving an average rank of 3. There are
three difference scores of -2
(and none of +2). These cover the next three ranks (6, 7, 8),
giving an average rank of 7.
The final score is -3, which is given the rank of 9.
TABLE 6.9 Computation of the Wilcoxon $T_{obt}$
                                                 Signed Ranks
ID Number  Pretest  Posttest  Difference  Rank   Positive  Negative
1          17       16        -1          3                3
2          19       18        -1          3                3
3          18       15        -3          9                9
4          18       17        -1          3                3
5          16       16         0
6          16       17        +1          3      3
7          18       16        -2          7                7
8          21       19        -2          7                7
9          18       19        +1          3      3
10         18       16        -2          7                7
NOTE: Sum of ranks for less frequent sign = 6.
The next step is to "sign" the rank. This means to place the
rank in either the positive
or the negative column in the table, depending on whether the
difference score was positive
or negative.
We then determine which sign (positive or negative) appeared
less frequently and add
up the ranks for this sign. Because the positive sign appeared
only twice (compared to
seven times for the negative sign), we add up the ranks in
the positive column and obtain
6. This is the test statistic value for the Wilcoxon matched
pairs, signed ranks test.
The test statistic is called $T_{obt}$. This is an uppercase T
and is not the same as the statistic
used with the (lowercase) t distribution.
There are two other issues with respect to the Wilcoxon $T_{obt}$
that should be addressed:
1. The Wilcoxon $T_{obt}$ is evaluated according to the number of
nonzero difference
scores. So, we should subtract 1 from the original n for each
difference score that is
0 to obtain a corrected n to use for the critical value table.
2. Unlike most other test statistics, the Wilcoxon $T_{obt}$ must be
less than or equal to the
critical value to reject the null hypothesis.
We consult a table of critical values for the Wilcoxon T (see
table of critical values for
Wilcoxon T in any general statistics book) and see whether the
result ($T_{obt}$ = 6) was significant
at α = .05. Because there was one difference score
equal to 0, the corrected n = 9.
The critical value for the Wilcoxon T at n = 9 and α = .05 is $T_{crit}$ =
5. $T_{obt}$ = 6 is not less than
or equal to the critical value, so we fail to reject the null
hypothesis at α = .05.
There is no well-accepted post hoc measure of effect size for
ordinal tests of related
scores. One possible measure would be proportion of
nonoverlapping scores as a measure
of effect. Cohen (1988) briefly discusses this measure, called U.
The procedure begins with computing the minimum and
maximum scores for each of
the two related groups. We choose the least maximum and the
greatest minimum. This
establishes the end points for the overlap range.
We count the number of scores in both groups within this
range (including the end
points) and divide by the total number of scores. This gives a
proportion of overlapping
scores. Subtract this number from 1, and we obtain the
proportion of nonoverlapping
scores. This index ranges from 0 to 1. Lower proportions are
indicative of smaller effects,
and higher ones are indicative of larger effects.
Cohen (1988) calculates equivalents between U and d, which
would imply the following
definitions of strength of effect:
Small effect size    d = .2   U = .15
Medium effect size   d = .5   U = .33
Large effect size    d = .8   U = .47
For the example data, the minimum score for the pretest was
16, and the maximum
score was 21. The posttest minimum and maximum scores
were 15 and 19, respectively.
The greatest minimum is 16, and the least maximum is 19.
Of 20 total scores, 18 fall within this overlap range. The
proportion of overlap is 18/20 (.90).
The proportion of nonoverlapping scores is thus 1 - .90 = .10,
which would be a small effect.
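The overlap procedure is simple enough to express directly. This sketch applies the steps as described here to the pretest and posttest scores from Table 6.9; the function name is ours.

```python
pretest = [17, 19, 18, 18, 16, 16, 18, 21, 18, 18]
posttest = [16, 18, 15, 17, 16, 17, 16, 19, 19, 16]


def nonoverlap_u(group_a, group_b):
    """Proportion of scores falling outside the range shared by both groups."""
    low = max(min(group_a), min(group_b))    # greatest minimum
    high = min(max(group_a), max(group_b))   # least maximum
    scores = list(group_a) + list(group_b)
    inside = sum(low <= x <= high for x in scores)   # end points count as overlap
    return 1 - inside / len(scores)


u = nonoverlap_u(pretest, posttest)
print(f"U = {u:.2f}")
```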
McNemar Change Test. The McNemar change test is used for
pre- and postintervention
designs where the variables in the analysis are dichotomously
scored (e.g., improved vs.
not improved, same vs. different, increase vs. decrease).
The layout for the McNemar change test is shown in Figure
6.5. Cell A contains the
number of individuals who changed from + to -. Cell B contains
the number of individuals
who received + on both measurements. Cell C contains the
number of individuals
who received - on both measurements. Cell D contains the
number of individuals who
changed from - to +. The null hypothesis is expressed as
\[ H_0: P_A = P_D \]
where
$P_A$ is the proportion of cases shifting from + to - (decreasing) in
the null hypothesis
population, and
$P_D$ is the proportion of cases shifting from - to + (increasing) in the
null hypothesis
population.
The assumptions for the McNemar change test are similar to
those for the χ² test:
Randomness: Sample members must be randomly drawn from
the population.
Independence: Within-group sample scores must be
independent of each other (although
between-group scores [pre- and posttest scores] will
necessarily be dependent).
Scaling: The dependent measure (categories) must be nominal.
Expected frequencies: No expected frequency within a
category should be less than 5.
A special case of $\chi^2_{obt}$ is the test statistic for the McNemar
change test:
\[ \chi^2_{obt} = \frac{(|f_A - f_D| - 1)^2}{f_A + f_D} \]
where
$f_A$ = the frequency in Cell A, and
$f_D$ = the frequency in Cell D.
This is a test statistic with df = 1. For df = 1, we need to
include something called the
Yates correction for continuity in the equation. This is -1,
which appears in the numerator
of the test statistic.
Figure 6.5 McNemar Change Test Layout
                 After
                 -      +
Before    +      A      B
          -      C      D
Let us imagine that we are interested in marijuana use among
high school students. We
also are interested in change in marijuana use over time.
Imagine that we collected survey
data on a random sample of ninth-graders in 2007. In 2009, we
surveyed the same sample
that had been in ninth grade in 2007. We found that 32 of 65
students said that they used
marijuana during the previous year, as compared to 23 of 65 in
2007. The results are summarized
in Table 6.10.
TABLE 6.10 Observed and Expected Frequencies for the
McNemar Change Test
                        2009
2007         None           Marijuana      Total
Marijuana    2 (Cell A)     21 (Cell B)    23
None         31 (Cell C)    11 (Cell D)    42
Total        33             32             65
Cell A represents those students who had used marijuana in
2007 but who had not used
it in 2009. Cell B shows the number of students who had used
marijuana in both 2007 and
2009. Cell C shows the number of students who did not use
marijuana either in 2007 or in
2009. Cell D shows the number of students who did not use
marijuana in 2007 but who did
use it in 2009.
So, the sum of Cells A and D is the total number of students
whose patterns of marijuana
use changed. The null hypothesis for the McNemar
change test is that changing from
nonuse to use would be just as likely as changing from use to
nonuse.
In other words, of the 13 individuals who changed their
pattern of marijuana use, we
would expect half (6.5) to go from not using to using and the
other half (6.5) to go from
using to not using if the null hypothesis were true.
The calculation of the McNemar change test statistic is shown
in Table 6.11.
For df = 1 and α = .05, $\chi^2_{crit}$ = 3.84 (see a table of critical
values of χ² found in most statistics
texts). Because $\chi^2_{obt}$ = 4.92, we would reject the null
hypothesis at α = .05. We would
conclude that there was in fact an increase in marijuana use
between 2007 and 2009.
TABLE 6.11 Computation of the McNemar Change Test
Statistic
f_A   f_D   |f_A - f_D| - 1   (|f_A - f_D| - 1)^2   (|f_A - f_D| - 1)^2 / (f_A + f_D)
2     11    8                 64                    4.9231
NOTE: $\chi^2_{obt}$ = 4.923.
The effect size coefficient for a McNemar change test is w and
is computed according
to the following formula:
\[ w = \sqrt{\frac{\chi^2_{obt}}{N}}. \]
For the high school survey,
\[ w = \sqrt{4.923 / 65} = \sqrt{0.0757} = 0.2752, \]
which would be classified as a medium effect.
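The McNemar statistic and its effect size depend only on the two "change" cells and the total sample size, so they are easy to recompute. This is a minimal sketch using the counts from Table 6.10; the function name is ours, and the critical value 3.84 is the tabled value for df = 1 and α = .05.

```python
import math


def mcnemar_chi2(f_a, f_d):
    """McNemar change statistic with the Yates correction for continuity."""
    return (abs(f_a - f_d) - 1) ** 2 / (f_a + f_d)


f_a, f_d, n = 2, 11, 65          # Cell A, Cell D, and total sample size
expected_each = (f_a + f_d) / 2  # changers expected in each direction under H0

chi2_obt = mcnemar_chi2(f_a, f_d)
CHI2_CRIT = 3.84                 # tabled value: alpha = .05, df = 1
reject_null = chi2_obt >= CHI2_CRIT
w = math.sqrt(chi2_obt / n)      # effect size w

print(f"expected per cell = {expected_each}, chi-square = {chi2_obt:.3f}, "
      f"reject H0: {reject_null}, w = {w:.4f}")
```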
Hypothesis Tests for Two Independent Samples
These are tests in which a sample is randomly drawn and
individuals from the sample are
randomly assigned to one of two experimental conditions.
We investigate three examples of two independent samples
tests:
1. Independent samples (group) t test (interval or ratio scale)
2. Wilcoxon/Mann-Whitney (W/M-W) test (ordinal scale)
3. χ² test of independence (2 x k) (nominal scale)
Independent Samples t Test. This sometimes is called the
group t test. It is a test of means
whose null hypothesis is formally stated as follows:
\[ H_0: \mu_1 = \mu_2 \]
Following are the assumptions of the independent t test:
Randomness: Sample members must be randomly drawn from
the population and randomly
assigned to one of the two groups.
Independence: Scores must be independent of each other.
Scaling: The dependent measure must be interval or ratio.
Normal distribution: The populations from which the
individuals in the samples were
drawn must be normally distributed.
Homogeneity of variances ($\sigma_1^2 = \sigma_2^2$): The samples must be
drawn from populations
whose variances are equal.
Equality of sample sizes ($n_1 = n_2$): The samples must be of
the same size.
As before, these assumptions are listed more or less in order of
importance. The first three assumptions are the "fatal" assumptions.
Violation of the normality assumption will make for less
accurate p values. However, unless the population distribution is
markedly different from a normal distribution, the errors will tend
to be slight. Still, even though the error is slight, the
nonparametric W/M-W test probably will be more accurate when the
normality assumption is violated.
The independent groups t test also is fairly robust with respect
to violation of the homogeneity of variances assumption and the
equal sample size assumption. A problem may arise when both of
these assumptions are violated at the same time.
If the smaller variance is in the smaller sample, then the
probability of a Type II error (not detecting an existing
difference) increases. If the larger variance is in the smaller
sample, then the probability of a Type I error (rejecting the null
hypothesis when it is true) increases.
If there is no association between sample size and variance, then
violation of each of these assumptions is not particularly
problematic. There may be fairly substantial discrepancies between
sample sizes without much effect on the accuracy of our p estimates.
Similarly, if every other assumption is met, then a slight
difference in variances will not have a large effect on probability
estimates.
The t statistic for the independent t test is the difference
between the sample means divided by the standard error of the
differences between means, or
$t_{obt} = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$
Because two sample means are computed, 2 degrees of freedom
are lost:
$df = n_1 + n_2 - 2$,
where
$n_1$ = number of scores for the first group, and
$n_2$ = number of scores for the second group.
Following is an example of the use of the independent t test
statistic. We wish to see whether there is a difference in level of
social activity in children depending on whether they are in
after-school care or home care. Because more children attended the
after-school program, a proportionate stratified sample of 16
children in after-school care (Group 1) and 14 children in home care
(Group 2) was drawn. The dependent measure was a score on a social
activity scale in which lower scores represent less social activity
and higher scores represent more social activity.
We evaluate this with an independent t test. The first step in
calculating $t_{obt}$ is to compute the sample mean for each group.
The next step is to compute the standard error of the mean. However,
the procedure for doing this is a little different from that used
before. As you might recall, the standard error of the mean is the
standard deviation divided by the square root of the sample size:
$s_{\bar{X}} = \dfrac{s}{\sqrt{n}} = \sqrt{s^2 (1/n)}$
This also is equivalent to the square root of the variance
times the inverse of the sample size (1/n).
Unfortunately, we cannot use this formula for the standard
error of the mean. It is the standard error for a single sample.
Because we have two samples in an independent groups test, the
formula has to be altered a bit.
The first difference is in the formula for the variance. The
variance is the sum of squares divided by the degrees of freedom.
It is the same here except that we have two sums of squares (one
for Group 1 and one for Group 2), and our degrees of freedom are
$n_1 + n_2 - 2$. This gives us the following equation:
$s_p^2 = \dfrac{SS_1 + SS_2}{n_1 + n_2 - 2}$
where
$s_p^2$ is the pooled estimate of the variance based on two groups,
$SS_1$ is the sum of squares for Group 1,
$SS_2$ is the sum of squares for Group 2,
$n_1$ is the number of scores in Group 1, and
$n_2$ is the number of scores in Group 2.
Because there are two groups, we do not multiply $s_p^2$ times
(1/n); rather, we multiply it by $(1/n_1 + 1/n_2)$. We take the
square root of this and obtain the pooled standard error of
the mean:
$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}$
The means and sums of squares for our example are presented in
Table 6.12. Now, let us try computing $t_{obt}$.

TABLE 6.12 Group Statistics

Group               Mean     Sum of Squares   n
After-school care   27.88    4330.40          16
Home care           21.36    1707.16          14
First, we compute the pooled standard error of the mean (also
called the standard error of the mean difference). We begin by
calculating the pooled variance:
$s_p^2 = \dfrac{SS_1 + SS_2}{n_1 + n_2 - 2} = \dfrac{4330.40 + 1707.16}{16 + 14 - 2} = \dfrac{6037.56}{28} = 215.63$
From the estimate for the pooled variance, we may calculate
the standard error of the mean difference:
$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)} = \sqrt{215.63 \left( \dfrac{1}{16} + \dfrac{1}{14} \right)} = \sqrt{28.88} = 5.37$
We calculate $t_{obt}$:
$t_{obt} = \dfrac{27.88 - 21.36}{5.37} = \dfrac{6.52}{5.37} = 1.213$
(the value 1.213 uses the unrounded standard error, 5.374).
For $\alpha$ = .05 and $df = n_1 + n_2 - 2 = 16 + 14 - 2 = 28$,
$t_{crit}$ = 2.048. Because $t_{obt}$ = 1.213 is less than the
critical value, we fail to reject the null hypothesis
at $\alpha$ = .05.
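The steps of the independent t test can be sketched in Python from the summary statistics in Table 6.12 alone. The function name `pooled_t` is illustrative, and the critical value 2.048 is copied from the text rather than computed.

```python
import math

def pooled_t(mean1, ss1, n1, mean2, ss2, n2):
    """Independent-samples t from group means, sums of squares, and sizes."""
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df                        # pooled variance
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of mean difference
    t_obt = (mean1 - mean2) / se_diff
    return pooled_var, se_diff, t_obt, df

# Group 1: after-school care; Group 2: home care (Table 6.12)
pooled_var, se_diff, t_obt, df = pooled_t(27.88, 4330.40, 16, 21.36, 1707.16, 14)

T_CRIT = 2.048  # two-tailed, alpha = .05, df = 28 (from the text)
reject_null = abs(t_obt) > T_CRIT
print(round(pooled_var, 2), round(se_diff, 2), round(t_obt, 3), df, reject_null)
```

Working from sums of squares rather than raw scores mirrors the hand calculation, which is why no raw data are needed here.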
There are two post hoc effect size measures for an independent
t test. The first of these (d) already has been discussed:
$d = \dfrac{\bar{X}_1 - \bar{X}_2}{s_p}$
Note that the numerator is the difference between the two sample
means and that the denominator is the pooled estimate of the
standard deviation. The pooled standard deviation is the square root
of the pooled variance that we calculated earlier:
$s_p = \sqrt{s_p^2} = \sqrt{215.63} = 14.68$.
The effect size for the example would be
$d = \dfrac{27.88 - 21.36}{14.68} = \dfrac{6.52}{14.68} = 0.44$,
which would be classified as a small to medium effect size.
The other measure is $\eta^2$ (eta-square). $\eta^2$ is the
proportion of variance explained (PVE). This is equivalent to the
squared point-biserial correlation coefficient and is computed by
$\eta^2 = \dfrac{t_{obt}^2}{t_{obt}^2 + df}$
We were comparing social activity in children in after-school
care versus those in home care. Children in after-school care scored
higher on social activity than did children in home care. The
difference was not statistically significant for our chosen
$\alpha$ = .05.
$t_{obt}$ was 1.213 with df = 28. Putting these numbers in the
formula, we obtain the following:
$\eta^2 = \dfrac{(1.213)^2}{(1.213)^2 + 28} = \dfrac{1.471}{29.471} = 0.0499$
So, a little less than 5% of the variability in social activity
among the children was potentially explained by whether they were
in after-school care or home care.
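Both post hoc effect sizes follow directly from numbers already obtained, as the short sketch below illustrates. The function names are illustrative; the inputs are the pooled variance, the group means, and the obtained t reported in the text.

```python
import math

def cohens_d(mean1: float, mean2: float, pooled_var: float) -> float:
    """Mean difference divided by the pooled standard deviation."""
    return (mean1 - mean2) / math.sqrt(pooled_var)

def eta_squared_from_t(t_obt: float, df: int) -> float:
    """Proportion of variance explained, computed from t and its df."""
    return t_obt ** 2 / (t_obt ** 2 + df)

d = cohens_d(27.88, 21.36, 215.63)
eta2 = eta_squared_from_t(1.213, 28)
print(round(d, 2), round(eta2, 4))
```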
Wilcoxon/Mann-Whitney Test. Statistics texts used to refer to
this test as the Mann-Whitney test. Recently, the name of Wilcoxon
has been added to it. The reason that Wilcoxon's name has been added
is that he developed the test first and published it first
(Wilcoxon, 1945). Unfortunately, more folks noticed the article
published by Mann and Whitney (1947) 2 years later.
The W/M-W test is a nonparametric test that involves initially
treating both samples as one group and ranking scores from least to
most. After this is done, the frequencies of low and high ranks
between groups are compared.
The assumptions of the W/M-W test are as follows:
Randomness: Sample members must be randomly drawn from
the population of interest and randomly assigned to one of the two
groups.
Independence: Scores must be independent of each other.
Scaling: The dependent measure must be ordinal (interval or
ratio scores must be converted to ranks).
When the assumptions of the t test are met, the t test will be
slightly more powerful than the W/M-W test. However, if the
distribution of population scores is even slightly different from
normal, then the W/M-W test may be the more powerful test.
Let us look at the procedure for computing the W/M-W test
statistic. We use the same example as we did for the independent
t test. We evaluated level of social activity in children in
after-school care and in home care. The dependent measure was a
score on a social activity scale in which lower scores represent
less social activity and higher scores represent more social
activity.
The first step in carrying out the W/M-W test is to assign ranks
to the scores without respect to which group individuals were in.
The rank of 1 goes to the highest score, the rank of 2 to the next
highest score, and so on. Tied ranks receive the average rank. We
then sum the ranks within each group. The summed ranks are called
$W_1$ for Group 1 and $W_2$ for Group 2 and are found in Table 6.13.
TABLE 6.13 Summed Ranks for the Wilcoxon/Mann-Whitney Test

                After-School Care   Home Care
n               n_1 = 16            n_2 = 14
Summed ranks    W_1 = 218           W_2 = 247
The test statistic for the W/M-W test is $U_{obt}$. We begin by
calculating U statistics for each group according to the following
equations:
$U_1 = n_1 n_2 + \dfrac{n_1 (n_1 + 1)}{2} - W_1$
$U_2 = n_1 n_2 + \dfrac{n_2 (n_2 + 1)}{2} - W_2$
For our example,
$U_1 = (16)(14) + \dfrac{(16)(16 + 1)}{2} - 218 = 224 + 136 - 218 = 142$
$U_2 = (16)(14) + \dfrac{(14)(14 + 1)}{2} - 247 = 224 + 105 - 247 = 82$
As a check, $U_1 + U_2 = 142 + 82 = 224 = n_1 n_2$.
We choose the smaller U as $U_{obt}$. In this instance,
$U_{obt} = U_2 = 82$.
$U_{obt}$ must be less than or equal to the critical value to reject
the null hypothesis. The critical value for the W/M-W U at
$n_1$ = 16, $n_2$ = 14, and $\alpha$ = .05 is $U_{crit}$ = 64.
$U_{obt}$ = 82 is not less than or equal to the critical value, so
we fail to reject the null hypothesis at $\alpha$ = .05.
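The U statistics can be recomputed from the summed ranks in Table 6.13 with the standard formulas $U = n_1 n_2 + n(n + 1)/2 - W$. The sketch below uses an illustrative function name and takes the critical value 64 from the text. A useful sanity check is that the two U values must sum to $n_1 n_2$.

```python
def mann_whitney_u(n1: int, w1: float, n2: int, w2: float):
    """U statistics for both groups from group sizes and summed ranks."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - w1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - w2
    return u1, u2

# Table 6.13: after-school care (n = 16, W = 218), home care (n = 14, W = 247)
u1, u2 = mann_whitney_u(16, 218, 14, 247)
u_obt = min(u1, u2)  # the smaller U is the test statistic

U_CRIT = 64          # n1 = 16, n2 = 14, alpha = .05 (from the text)
reject_null = u_obt <= U_CRIT
print(u1, u2, u_obt, reject_null)
```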
As before, there is no well-established effect size measure for
the W/M-W test. The U measure of nonoverlap probably would be the
best bet.
For our example data, the minimum and maximum for the
after-school care group were 2 and 55, whereas they were 7 and 40
for the home care group. The greatest minimum is 7, and the least
maximum is 40. All 14 scores in the home care group are within
the overlap range, and 12 of 16 scores in the after-school care
group are in the overlap range. This gives us a proportion of
overlap of 26/30 = .867. The proportion of nonoverlap is
U = 1 - .867 = .133. This would be a small effect.
$\chi^2$ Test of Independence (2 × k). The assumptions for the
$\chi^2$ test of independence are as follows:
Randomness: Sample members must be randomly drawn from
the population.
Independence: Sample scores must be independent of each
other. One implication of this is that categories must be mutually
exclusive (no case may appear in more than one category).
Scaling: The dependent measure (categories) must be nominal.
Expected frequencies: No expected frequency within a category
should be less than 1, and no more than 20% of the expected
frequencies should be less than 5.
As with all tests of the null hypothesis, the $\chi^2$ test begins
with the assumptions of randomness and independence. Deriving from
these assumptions is the requirement that the categories in the
cross-tabulation be mutually exclusive and exhaustive.
Mutually exclusive means that an individual may not be in more
than one category per variable. Exhaustive means that all possible
categories are covered.
Let us imagine that we are interested in marijuana use among
high school students and specifically whether there are any
differences in such use between 9th- and 12th-graders in our school
district. We conduct a proportionate stratified sample in which we
randomly sample sixty-five 9th-graders and fifty-five 12th-graders
from all students in the district. The students are surveyed on
their use of drugs over the past year under conditions guaranteeing
confidentiality of response. Table 6.14 depicts reported marijuana
use for the students in the sample over the past year.
TABLE 6.14 Marijuana Use

                  Grade
             9th    12th   Total
None         42     33     75
Marijuana    23     22     45
Total        65     55     120
A higher proportion of 12th-graders than 9th-graders in this
sample used marijuana at least once during the past year. The
question we are interested in is whether it is likely that such a
sample could have come from a population in which the proportions
of 9th- and 12th-graders using marijuana were identical.
The usual test used to evaluate such data is the $\chi^2$ test of
independence. The $\chi^2$ test evaluates the likelihood that a
perceived relationship between proportions in categories (called
being dependent) could have come from a population in which no such
relationship existed (called independence).
The null hypothesis for this example would be that the same
proportion of 9th-graders as 12th-graders used marijuana during the
past year. The null hypothesis values for this test are called the
expected frequencies. These expected frequencies for marijuana are
calculated so as to be proportionately equal for both 9th- and
12th-graders.
Because 45 of 120 of the total sample (9th- and 12th-graders)
used marijuana during the past year, the proportion for the total
sample is 45/120 = .375. The expected frequency of marijuana use
for the sixty-five 9th-graders would be .375(65) = 24.375. The
expected marijuana use for the fifty-five 12th-graders would be
.375(55) = 20.625. Table 6.15 shows the expected frequencies in
parentheses.
The $\chi^2$ test evaluates the likelihood of the observed
frequency departing from the expected frequency. The null
hypothesis is
$H_0: P_{0k} - P_k = 0$,
where $P_{0k}$ is the proportion of cases within category k in the
null hypothesis population (expected; in this case, this is the
expected proportion of students in each of the two grade levels
[9th and 12th] who fell into one or the other use category
[marijuana use or no marijuana use]); and $P_k$ is the proportion
of cases within category k drawn from the actual population
(observed; in this case, this is the observed [or obtained]
proportion of students in each of the two grade levels [9th and
12th] who fell into one or the other use category [marijuana use or
no marijuana use]).
The $\chi^2_{obt}$ test statistic is
$\chi^2_{obt} = \sum \dfrac{(f_o - f_e)^2}{f_e}$
Degrees of freedom for a $\chi^2$ test of independence are computed
by multiplying the number of rows minus 1 times the number of
columns minus 1, or
df = (Rows - 1)(Columns - 1)
TABLE 6.15 Observed and Expected Frequencies for Marijuana Use

                        Grade
             9th            12th           Total
None         42 (40.625)    33 (34.375)    75
Marijuana    23 (24.375)    22 (20.625)    45
Total        65             55             120

NOTE: Expected frequencies are in parentheses.
For our example, this would be
df = (2 - 1)(2 - 1) = (1)(1) = 1
Recall from our discussion of the McNemar change test that
we include the Yates correction for continuity in the formula when
df = 1. The equation for the corrected test statistic is as follows:
$\chi^2_{obt} = \sum \dfrac{(|f_o - f_e| - 0.5)^2}{f_e}$
The form of the equation tells us to subtract the expected score
from the observed score and take the absolute value of the
difference (make the difference positive). Then, subtract 0.5 from
the absolute difference ($|f_o - f_e| - 0.5$) and square the result.
Next, divide by the expected score. This is repeated for each
observed and expected score pair. When we are finished, we sum the
answers and obtain the corrected $\chi^2_{obt}$ test statistic.
The reader might have noticed that the correction for the
McNemar change test was 1.0, whereas the correction for the
$\chi^2$ test of independence (and the goodness-of-fit test) was
0.5. I will not go into any detail beyond saying that this is
because the McNemar change test uses only half of the available
cross-tabulation cells (two of four) to compute its
$\chi^2_{obt}$, whereas all cells are used to compute
$\chi^2_{obt}$ in the independence and goodness-of-fit tests.
Table 6.16 shows how to work out the marijuana survey data.
For df = 1 and $\alpha$ = .05, the critical value for
$\chi^2_{obt}$ is 3.84. Our calculated value ($\chi^2_{obt}$) was
0.109. Because the obtained (calculated) value did not exceed the
critical value, we would not reject the null hypothesis at
$\alpha$ = .05.
As before, the effect size measure is w, which is computed as a
post hoc measure by
$w = \sqrt{\chi^2_{obt} / N}$.
For a 2 × 2 table, w is equal to the absolute value of $\varphi$
(phi), which is a true correlation coefficient. If we square w,
then we obtain $\varphi^2$, which is the proportion of variance
explained (PVE).
TABLE 6.16 Computation of $\chi^2_{obt}$

Observed (f_o)   Expected (f_e)   |f_o - f_e| - 0.5   (|f_o - f_e| - 0.5)^2   (|f_o - f_e| - 0.5)^2 / f_e
42               40.625           0.875               0.765625                0.019
33               34.375           0.875               0.765625                0.022
23               24.375           0.875               0.765625                0.031
22               20.625           0.875               0.765625                0.037

NOTE: $\chi^2_{obt}$ = 0.019 + 0.022 + 0.031 + 0.037 = 0.109.
For our example,
$w = \sqrt{0.109/120} = \sqrt{0.0009083} = 0.0301$
and
$w^2 = PVE = .0009$.
This is an extremely small effect size.
For a 2 × k tabulation, we cannot convert w to PVE.
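The whole $\chi^2$ calculation for the marijuana survey, from expected frequencies to effect size, can be sketched as follows. The variable names are illustrative and the critical value 3.84 is taken from the text. Because the code does not round each cell's contribution before summing, it yields roughly 0.110 rather than the hand-computed 0.109; the conclusion is unchanged.

```python
import math

# Observed counts from Table 6.14: rows = None, Marijuana; columns = 9th, 12th
observed = [[42, 33], [23, 22]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for a cell = (row total)(column total) / N
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Yates-corrected chi-square, appropriate here because df = 1
chi2_obt = sum(
    (abs(o - e) - 0.5) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
w = math.sqrt(chi2_obt / n)

CHI2_CRIT = 3.84  # df = 1, alpha = .05 (from the text)
reject_null = chi2_obt > CHI2_CRIT
print(expected, round(chi2_obt, 3), round(w, 4), reject_null)
```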
Hypothesis Tests for k > 2 Independent Samples
Imagine that we were interested in ageist attitudes among
social workers. Specifically, we are interested in whether there
are any differences in the magnitudes of ageist attitudes among
(a) hospital social workers, (b) nursing home social workers, and
(c) adult protective services social workers.
We could conduct independent group tests among all possible
pairings: hospital (a) with nursing home (b), hospital (a) with
protective services (c), and nursing home (b) with protective
services (c).
This gives us three tests. When we conduct one test at the
$\alpha$ = .05 level, we have a .05 chance of committing a Type I
error (rejecting the null hypothesis when it is true) and a .95
chance of making a correct decision (not rejecting the null
hypothesis when it is true). If we conduct three tests at
$\alpha$ = .05, our chance of committing at least one Type I error
increases to about .15 (the precise probability is
1 - .95^3 = .142625). So, we actually are testing at around
$\alpha$ = .15.
As the number of comparisons increases, the likelihood of
rejecting the null hypothesis when it is true increases. We are
"capitalizing on chance."
One way of dealing with capitalization on chance would be to
use a stricter alpha level. For three comparisons, we might conduct
our tests at $\alpha$ = .05/3 = .0167. Unfortunately, if we do this,
then we will reduce the power ($1 - \beta$) of our test to detect a
possible existing effect.
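The inflation of the Type I error rate and the stricter per-test alpha can both be computed directly. In this sketch the function names are illustrative, and the familywise formula assumes the tests are independent; dividing alpha by the number of tests is commonly called the Bonferroni adjustment, a name the text itself does not use.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def per_test_alpha(alpha: float, n_tests: int) -> float:
    """Stricter per-comparison alpha obtained by dividing by the test count."""
    return alpha / n_tests

fw = familywise_error(0.05, 3)
strict = per_test_alpha(0.05, 3)
print(round(fw, 6), round(strict, 4))
```

Raising `n_tests` shows how quickly the familywise rate grows, which is the motivation for the screening tests described next.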
However, there are tests that allow one to detect whether there
are any differences among groups without compromising power. This
is done by simultaneously evaluating all groups for any differences.
If no differences are detected, then we fail to reject the null
hypothesis and stop. No further tests are conducted because we
already have our answer. The differences among all groups are not
sufficiently large that we can reject the notion that all of the
samples come from the same population.
If significant differences are detected, then further pair
comparisons are conducted to determine which pairs are different.
The screening tests do not tell us whether only one pair, two
pairs, or all pairs show statistically significant differences.
Screening tests show only that there are some differences among all
possible comparisons.
If we conduct our screening test at $\alpha$ = .05, then we will
carry out the pair comparisons when the null hypothesis is true
1 out of 20 times (commit a Type I error). By conducting the
initial overall screening in a single test, we protect against the
compounding of the alpha level brought on by multiple comparisons.
We look at three examples of screening tests for k > 2
independent samples:
1. One-way analysis of variance (ANOVA) (interval or ratio
scale)
2. Kruskal-Wallis (K-W) test (ordinal scale)
3. $\chi^2$ test of independence (k × k) (nominal scale)
One-Way Analysis of Variance. The ANOVA is a test of
means. The null hypothesis is
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$,
where k is the number of population means being estimated.
If all of the means are equal, then it follows that the variance
of the means is 0, or
$H_0: \sigma_\mu^2 = 0$.
The test statistic used in ANOVA is called F and is calculated
as follows:
$F = \dfrac{n s_{\bar{X}}^2}{s_p^2}$,
where the numerator is the variance of the sample means multiplied
by the sample size, and the denominator is a pooled estimate of the
score variances within the samples.
The assumptions underlying one-way ANOVA are as follows:
Randomness: Sample members must be randomly drawn from
the population and randomly assigned to one of the k groups.
Independence: Scores must be independent of each other.
Scaling: The dependent measure must be interval or ratio.
Normal distribution: The populations from which the
individuals in the samples were drawn must be normally distributed.
Homogeneity of variances
($\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$): The samples
must be drawn from populations whose variances are equal.
Equality of sample sizes ($n_1 = n_2 = \cdots = n_k$): The samples
must be of the same size.
ANOVA involves taking the variability among scores and
determining which is variability due to membership in a particular
group (variability associated with group means or between-group
variance) and which is variability associated with unexplained
fluctuations (within-group variance).
The total variability of scores is divided into one component
representing the variability of treatment group means around an
overall mean (sometimes called a grand mean) and another component
representing the variability of group scores around their own
individual group means. The variability of group means around the
grand mean is called between-group variance. The variability of
individual scores around their own group means is called
within-group variance. This division is represented by the
following equation:
$(X - \bar{\bar{X}}) = (X - \bar{X}) + (\bar{X} - \bar{\bar{X}})$
Total = Within + Between
The X with two bars represents the grand mean, which is the
mean of all scores without respect to which group they are in. X is
a particular score, and the X with one bar is the mean of the group
to which that score belongs.
This equation illustrates that the deviation of the particular
score from the grand mean is the sum of the deviation of the score
from its group mean and the deviation of the group mean from the
grand mean. This might be a little clearer if we look at a simple
data set. Let us take the example about ageist attitudes among
hospital social workers (Group 1), nursing home social workers
(Group 2), and adult protective services social workers (Group 3).
The dependent measure quantifies ageist attitudes (higher scores
represent more ageist sentiment).
There are k = 3 groups, with each containing n = 4 scores. The
total number of scores is N = 12. The group means are 3 (Group 1),
5 (Group 2), and 9 (Group 3), and the grand mean is 5.67.
There are three types of sum of squares calculated in ANOVA.
The formulas for the sums of squares are derived from the deviation
score equations.
$SS_{total}$ is calculated by subtracting the grand mean from each
score, squaring the differences, and adding up (summing) the
squared differences:
$SS_{total} = \sum (X - \bar{\bar{X}})^2$.
$SS_{within}$ is calculated by subtracting the group mean from each
score within a group, squaring the differences, and adding up
(summing) the squared differences for each group. This gives us
three sums of squares: $SS_{group 1}$, $SS_{group 2}$, and
$SS_{group 3}$. These are added up to give us $SS_{within}$:
$SS_{within} = \sum (X - \bar{X}_1)^2 + \sum (X - \bar{X}_2)^2 + \sum (X - \bar{X}_3)^2$.
$SS_{between}$ is calculated by subtracting the grand mean from
each group mean, squaring the differences, and adding up (summing)
the squared differences. Then, we multiply the total by the sample
size. This is because this sum of squares needs to be weighted.
Whereas N = 12 scores went to make up $SS_{total}$, and
(k)(n) = (3)(4) = 12 scores went to make up $SS_{within}$, only the
k = 3 group means went to make up $SS_{between}$. We multiply by
n = 4 so that $SS_{between}$ will have the same weight as the other
two sums of squares:
$SS_{between} = n \sum (\bar{X} - \bar{\bar{X}})^2$.
The sums of squares are as follows:
$SS_{within}$ = 20 + 20 + 20 = 60
$SS_{between}$ = (4)(18.667) = 74.667
$SS_{total}$ = 134.667.
The total sum of squares ($SS_{total}$) is the sum of the
within-group sum of squares ($SS_{within}$) and the between-group
sum of squares ($SS_{between}$):
$SS_{total} = SS_{within} + SS_{between}$
or
134.667 = 60.00 + 74.667.
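The partition of the total sum of squares can be verified numerically. The raw scores for the ageism example are not listed in this section, so the data below are hypothetical: they were chosen only so that each group has n = 4, the group means are 3, 5, and 9, and each within-group sum of squares is 20, matching the summary figures above. The function name is illustrative.

```python
def sums_of_squares(groups):
    """Return (SS_within, SS_between, SS_total) for equal-sized groups."""
    scores = [x for group in groups for x in group]
    grand_mean = sum(scores) / len(scores)
    group_means = [sum(group) / len(group) for group in groups]
    n = len(groups[0])  # common group size (equal n assumed)

    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )
    ss_between = n * sum((mean - grand_mean) ** 2 for mean in group_means)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    return ss_within, ss_between, ss_total

# Hypothetical scores reproducing the reported means (3, 5, 9) and SS (20 each)
groups = [[0, 2, 4, 6], [2, 4, 6, 8], [6, 8, 10, 12]]
ss_within, ss_between, ss_total = sums_of_squares(groups)
print(ss_within, round(ss_between, 3), round(ss_total, 3))
```

Any other data set with the same group means and within-group sums of squares would give the same three totals.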
Each of these sums of squares is a component of a different
variance. In ANOVA jargon, a variance is called a mean square. Each
particular mean square (variance) has its own degrees of freedom.
Because the total sum of squares ($SS_{total}$) involves the
variability of all scores around one grand mean, the degrees of
freedom are N - 1. The within-groups sum of squares
($SS_{within}$) involves the variability of all scores within
groups around k group means, where k is the number of groups. So,
the within-groups degrees of freedom are N - k. The between-groups
sum of squares ($SS_{between}$) involves the variability of k group
means around the grand mean. So, the between-groups degrees of
freedom are k - 1.
Because a variance (mean square) is a sum of squares divided by
degrees of freedom, the formula for a mean square would be
MS = SS/df.
Two mean squares are used to calculate the $F_{obt}$ statistic:
$MS_{within}$ and $MS_{between}$. Their specific formulas are as
follows:
$MS_{between} = \dfrac{SS_{between}}{df_{between}}$
$MS_{within} = \dfrac{SS_{within}}{df_{within}}$
There are k = 3 groups, so $df_{between}$ = k - 1 = 3 - 1 = 2. We
may now compute
$MS_{between}$ = 74.667/2 = 37.333.
There are a total of N = 12 scores within k = 3 groups, so
$df_{within}$ = 12 - 3 = 9 and
$MS_{within}$ = 60/9 = 6.667.
These are the two variances used to make up the F ratio
($F_{obt}$): $MS_{between}$ and $MS_{within}$.
The formula for $F_{obt}$ is
$F_{obt} = \dfrac{MS_{between}}{MS_{within}}$.
If we plug in the values from our example, then we obtain
$F_{obt} = \dfrac{MS_{between}}{MS_{within}} = \dfrac{37.333}{6.667} = 5.60$.
This is a bit confusing when presented in bits and pieces. The
ANOVA summary table is a way of presenting the information about
the sums of squares, degrees of freedom, mean squares, and F
statistics in a more easily understood fashion. Table 6.17 uses the
example data.
Once we have computed the $F_{obt}$, it is compared to a critical
F. Because two variances were used to calculate our $F_{obt}$,
there are two types of degrees of freedom associated with it:
numerator degrees of freedom (between groups) and denominator
degrees of freedom (within groups). These are used either to look
up values in a table of the F distribution or by computer programs
to compute p values.
For our example, the numerator degrees of freedom are df = 2
because 2 degrees of freedom were used in the calculation of
$MS_{between}$. The denominator degrees of freedom are df = 9
because 9 degrees of freedom were used in the calculation of
$MS_{within}$. The critical value for F at 2 and 9 degrees of
freedom is $F_{crit}$ = 4.26. Because $F_{obt}$ = 5.60 is greater
than the critical value, we reject the null hypothesis at
$\alpha$ = .05.

TABLE 6.17 ANOVA Summary Table

Source    Sum of Squares   Degrees of Freedom   Mean Square          F_obt
Between   74.667           3 - 1 = 2            74.667/2 = 37.333    37.333/6.667 = 5.60
Within    60.00            12 - 3 = 9           60.00/9 = 6.667
Total     134.667          12 - 1 = 11
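The entries of the ANOVA summary table follow mechanically from the two sums of squares, as this short sketch shows. Variable names are illustrative, and the critical value 4.26 is copied from the text rather than computed.

```python
# Sums of squares and design sizes from the ageism example
ss_between, ss_within = 74.667, 60.0
k, n_total = 3, 12

df_between = k - 1        # numerator degrees of freedom
df_within = n_total - k   # denominator degrees of freedom
df_total = n_total - 1

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_obt = ms_between / ms_within

F_CRIT = 4.26             # F at 2 and 9 df, alpha = .05 (from the text)
reject_null = f_obt > F_CRIT
print(df_between, df_within, round(ms_between, 3),
      round(ms_within, 3), round(f_obt, 2), reject_null)
```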
Based on these findings, it is likely that at least one pair of
means come from different populations. Because we already have
screened out other opportunities to commit Type I error, further
testing would not be capitalizing on chance. Thus, we may carry out
the following pair comparisons:
Group 1 versus Group 2
Group 1 versus Group 3
Group 2 versus Group 3
The individual pair comparisons may be carried out using any of
a number of multiple comparison tests. One of the more frequently
used is the least significant difference (LSD) test. The LSD test
is a variant on the t test. However, the standard error of the mean
is calculated from the within-groups mean square (variance) from
the ANOVA:
$s_{\bar{X}_i - \bar{X}_j} = \sqrt{MS_{within} \left( \dfrac{1}{n_i} + \dfrac{1}{n_j} \right)}$,
where
$n_i$ is the number of scores in Group i, and
$n_j$ is the number of scores in Group j.
If the group ns are equal, then this becomes
$s_{\bar{X}_i - \bar{X}_j} = \sqrt{\dfrac{2 MS_{within}}{n}}$.
For our example,
$s_{\bar{X}_i - \bar{X}_j} = \sqrt{(2)(6.667)/4} = \sqrt{3.333} = 1.826$.
We now may carry out our comparisons evaluating t at
df = N - k = 12 - 3 = 9 (Figure 6.6). Only the comparison of Group 1
with Group 3 exceeds the critical value, so we reject the null
hypothesis at $\alpha$ = .05 for that pair and fail to reject it
for the other two pairs.

Figure 6.6 Multiple Comparisons

Comparison                                                 t_obt                     df = 9, alpha = .05   Decision
Hospital (Group 1) vs. Nursing Home (Group 2)              (3 - 5)/1.826 = -1.095    t_crit = 2.262        Fail to reject H_0
Hospital (Group 1) vs. Adult Protective Services (Group 3) (3 - 9)/1.826 = -3.286    t_crit = 2.262        Reject H_0
Nursing Home (Group 2) vs. Adult Protective Services (Group 3) (5 - 9)/1.826 = -2.191  t_crit = 2.262      Fail to reject H_0
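The LSD pair comparisons can be recomputed directly from $MS_{within}$ = 6.667, n = 4 scores per group, and the three group means. In this sketch the dictionary keys and variable names are illustrative and the critical value 2.262 is taken from the text. Note that $\sqrt{3.333}$ is approximately 1.826.

```python
import math
from itertools import combinations

means = {
    "Hospital": 3.0,
    "Nursing home": 5.0,
    "Adult protective services": 9.0,
}
ms_within, n = 6.667, 4
T_CRIT = 2.262  # two-tailed, alpha = .05, df = N - k = 9 (from the text)

# Equal group sizes: standard error of a mean difference = sqrt(2 * MS_within / n)
se_diff = math.sqrt(2 * ms_within / n)

results = {}
for (name_a, mean_a), (name_b, mean_b) in combinations(means.items(), 2):
    t_obt = (mean_a - mean_b) / se_diff
    results[(name_a, name_b)] = (t_obt, abs(t_obt) > T_CRIT)

for pair, (t_obt, reject) in results.items():
    print(pair, round(t_obt, 3), "reject H0" if reject else "fail to reject H0")
```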
There are a number of measures for effect size for ANOVA. For
the sake of simplicity, we deal with two: Cohen's (1988) f and
$\eta^2$.
The f effect size measure is equal to the standard deviation of
the sample means divided by the pooled within-group standard
deviation. It ranges from a minimum of 0 to an indefinitely large
upper limit. It may be estimated from $F_{obt}$ by using the
following formula:
$f = \sqrt{F_{obt} / n}$.
$\eta^2$ was discussed earlier and defined as a proportion of
variance explained. It is calculated by the following formula:
$\eta^2 = \dfrac{SS_{between}}{SS_{total}}$.
It also may be calculated from an $F_{obt}$:
$\eta^2 = \dfrac{(df_{between})(F_{obt})}{(df_{between})(F_{obt}) + df_{within}}$.
Cohen (1988) categorizes these effect sizes into small,
medium, and large categories. The criteria for each are as follows:

Small effect size:    f = .10    $\eta^2$ = .01
Medium effect size:   f = .25    $\eta^2$ = .06
Large effect size:    f = .40    $\eta^2$ = .14

Using the example data, $\eta^2$ is
$\eta^2 = \dfrac{SS_{between}}{SS_{total}} = \dfrac{74.667}{134.667} = 0.554$,
which is a very large effect.
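As a brief check, $\eta^2$ can be obtained either from the sums of squares or from $F_{obt}$ and its degrees of freedom, and the two routes should agree. The sketch below uses illustrative function names; the second function relies on the identity $\eta^2 = (df_{between} F) / (df_{between} F + df_{within})$, which follows from the definitions of the mean squares.

```python
def eta_squared_from_ss(ss_between: float, ss_total: float) -> float:
    """Proportion of variance explained from the sums of squares."""
    return ss_between / ss_total

def eta_squared_from_f(f_obt: float, df_between: int, df_within: int) -> float:
    """Proportion of variance explained from F and its degrees of freedom."""
    return df_between * f_obt / (df_between * f_obt + df_within)

eta2_ss = eta_squared_from_ss(74.667, 134.667)
eta2_f = eta_squared_from_f(5.60, 2, 9)
print(round(eta2_ss, 3), round(eta2_f, 3))
```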
Kmskal-Wal!is Test. The K-W test is the k > 2 groups
equivalent o f the W/M -W test.
The test involves iniliall y treating all samples as one gro up
and ranking scores from
least to most. After this is done, the frequenc ies of low and
high ranks among groups <1re
compared.
The assumptions of the K-W test are as follows:
Rat~donmess: Sample members must be randomly drawn from
the population of inter-
est and randomly assigned to one of the k groups.
Independence: Scores must be independent of each other.
Scali?Jg: The dependent measure must be ordi nal (interval or
ratio scores must be con-
verted to ranks).
When the assumptions of ANOVA arc mer, the analysis of
variance will be sligh tly
more po<,•erful than the K -W test. However, if the distribution
of population scores is not
normal and/or the population variances are not equal. then the
K-W test might be the
more powerful test.
The K-W test is a screening test. If th ere is no significant
difference foun d, then we stop
testing. If a significant difference is fo und, then we proceed to
test ind ividual pairs with
the W/M -W test.
Our example involves the evaluation of three interven tion
techniques being used with
clients who wish to stop making negative self-statements: (a)
self-disputation,
(b) thought stopping, and (c) identifying the source of the
negative statement (insight). A
total o r 27 clients with this concern were randomly selected
and assigned to one of the
three intervention conditions. On the 28th day of the
intervention, each client counted
the n umber of negative self-statementS that he or she had
made.
The procedure for the K-W test is similar to that for the W/M-W test. We begin by
assigning ranks to the scores without regard to which group individuals were in. We then
sum the ranks within each group. The summed ranks are called W1 for Group 1, W2 for
Group 2, and W3 for Group 3 (Table 6.18).
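The ranking step described above can be sketched in Python. The scores below are hypothetical (they are not the values in Table 6.18), the function names are made up for illustration, and tied scores are given the average of the ranks they span, which is a common convention assumed here rather than taken from this passage.

```python
def midranks(values):
    """Rank values from least to most; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sums(groups):
    """Pool all groups, rank once, then sum the ranks within each group."""
    pooled = [x for g in groups for x in g]
    ranks = midranks(pooled)
    sums, start = [], 0
    for g in groups:
        sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    return sums

# Hypothetical counts of negative self-statements, three clients per group.
groups = [[2, 4, 5], [3, 6, 8], [7, 9, 9]]
W = rank_sums(groups)
print(W)  # W1, W2, W3
```

A quick sanity check is that the rank sums always total N(N + 1)/2, where N is the pooled sample size.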
  • 1. Trochim, W. M. K. (2006). Internal validity. http://www.socialresearchmethods.net/kb/intval.php Please follow link:^^^^^ Social Work Research: Chi Square Molly, an administrator with a regional organization that advocates for alternatives to long-term prison sentences for nonviolent offenders, asked a team of researchers to conduct an outcome evaluation of a new vocational rehabilitation program for recently paroled prison inmates. The primary goal of the program is to promote full-time employment among its participants. To evaluate the program, the evaluators decided to use a quasi- experimental research design. The program enrolled 30 individuals to participate in the new program. Additionally, there was a waiting list of 30 other participants who planned to enroll after the first group completed the program. After the first group of 30 participants completed the vocational program (the “intervention” group), the researchers compared those participants’ levels of employment with the 30 on the waiting list (the “comparison” group). In order to collect data on employment levels, the probation officers for each of the 60 people in the sample (those in both the intervention and comparison groups) completed a short survey on the status of each client in the sample. The survey contained demographic questions that included an item that inquired about the employment level of the client. This was measured through variables identified as none, part-time, or full-time. A hard copy of the survey was mailed to each probation officer and a stamped, self-addressed envelope was provided for return of the survey to the researchers. After the surveys were returned, the researchers entered the data into an SPSS program for statistical analysis. Because both the independent variable (participation in the vocational
  • 2. rehabilitation program) and dependent variable (employment outcome) used nominal/categorical measurement, the bivariate statistic selected to compare the outcome of the two groups was the Pearson chi-square. After all of the information was entered into the SPSS program, the following output charts were generated: TABLE 1. CASE PROCESSING SUMMARY Cases Valid Missing Total N Percent N Percent N Percent Program Participation *Employment 59 98.3% 1 1.7% 60 100.0% TABLE 2. PROGRAM PARTICIPATION *EMPLOYMENT CROSS TABULATION Employment Total None
  • 3. Part-Time Full-Time Program Participation Intervention Group Count % within Program Participation 5 16.7% 7 23.3% 18 60.0% 30 100.0% Comparison Group Count % within Program Participation 16 55.2% 7 24.1% 6 20.7% 29 100.0% Total Count % within Program Participation 21 35.6% 14 23.7% 24
  • 4. 40.7% 59 100.0% TABLE 3. CHI-SQUARE TESTS Value df Asymp. Sig. (2-sided) Pearson Chi-Square 11.748a 2 .003 Likelihood Ratio 12.321 2 .002 Linear-by-Linear Association 11.548 1 .001 N of Valid Cases 59 a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 6.88. The first table, titled Case Processing Summary, provided the sample size (N = 59). Information for one of the 60 participants was not available, while the information was collected for all of the other 59 participants. The second table, Program Participation Employment Cross Tabulation, provided the frequency table, which showed that among participants in the intervention group, 18 or 60% were found to be employed full time, while 7 or 23% were found to be employed part time, and 5 or 17% were unemployed. The corresponding numbers for the comparison group (parolees who
  • 5. had not yet enrolled in the program but were on the waiting list for admission) showed that only 6 or 21% were employed full- time, while 7 or 24% were employed part time, and 16 or 55% were unemployed. The third table, which provided the outcome of the Pearson chi- square test, found that the difference between the intervention and comparison groups were highly significant, with a p value of .003, which is significantly beyond the usual alpha-level of .05 that most researchers use to establish significance. These results indicate that the vocational rehabilitation intervention program may be effective at promoting full-time employment among recently paroled inmates. However, there are multiple limitations to this study, including that 1) no random assignment was used, and 2) it is possible that differences between the groups were due to preexisting differences among the participants (such as selection bias). Potential future studies could include a matched comparison group or, if possible, a control group. In addition, future studies should assess not only whether or not a recently paroled individual obtains employment but also the degree to which he or she is able to maintain employment, earn a living wage, and satisfy other conditions of probation. (Plummer 63-65) Plummer, Sara-Beth, Sara Makris, Sally Brocksen. Social Work Case Studies: Concentration Year. Laureate Publishing, 10/21/13. VitalBook file. The citation provided is a guideline. Please check each citation for accuracy before use. Statistics for Social Workers J. Timothy Stocks
  • 6. Statistics refers to a branch of mathematics dealing with the direct description of sample or population characteristics and the analysis of population characteristics by inference from samples. It covers a wide range of content, including the collection, organization, and interpretation of data. It is divided into two broad categories: descriptive statistics and inferential statistics. Descriptive statistics involves the computation of statistics or parameters to describe a sample or a population. All the data are available and used in computation of these aggregate characteristics. This may involve reports of central tendency or variability of single variables (univariate statistics). It also may involve enumeration of the relationships between or among two or more variables (bivariate or multivariate statistics). Descriptive statistics are used to provide information about a large mass of data in a form that may be easily understood. The defining characteristic of descriptive statistics is that the product is a report, not an inference. Inferential statistics involves the construction of a probable description of the characteristics of a population based on sample data. We compute statistics from a partial set of the population data (a sample) to estimate the population parameters. These estimates are not exact, but we can make reasonable judgments as to
  • 7. how precise our estimates are. Included within inferential statistics is hypothesis testing, a procedure for using mathematics to provide evidence for the existence of relationships between or among variables. This testing is a form of inferential argument.
    Descriptive Statistics
    Measures of Central Tendency
    Measures of central tendency are individual numbers that typify the total set of scores. The three most frequently used measures of central tendency are the arithmetic mean, the mode, and the median. Arithmetic Mean. The arithmetic mean usually is simply called the mean. It also is called the average. It is computed by adding up all of a set of scores and dividing by the number of scores in the set. The algebraic representation of this is $\mu = \frac{\sum X}{n}$, where $\mu$ represents the population mean, $X$ represents an individual score, and $n$ is the number of scores being added.
    [Part I • Quantitative Approaches: Foundations of Data Collection]
  • 8. The formula for the sample mean is the same except that the mean is represented by the variable letter with a bar above it: $\bar{X} = \frac{\sum X}{n}$. Following are the numbers of class periods skipped by 20 seventh-graders during 1 week: {1, 6, 2, 6, 15, 20, 3, 20, 17, 11, 15, 18, 8, 3, 17, 16, 14, 17, 0, 10}. We compute the mean by adding up the class periods missed and dividing by 20: $\mu = \frac{\sum X}{n} = \frac{219}{20} = 10.95$. Mode. The mode is the most frequently appearing score. It really is not so much a measure of centrality as it is a measure of typicalness. It is found by organizing scores into a frequency distribution and determining which score has the greatest frequency. Table 6.1 displays the truancy scores arranged in a frequency distribution.
    TABLE 6.1 Truancy Scores
    Score   Frequency
    20      2
    19      0
    18      1
    17      3
    16      1
    15      2
    14      1
    13      0
    12      0
    11      1
    10      1
    9       0
    8       1
    7       0
    6       2
    5       0
    4       0
    3       2
    2       1
    1       1
    0       1
  • 10. Because 17 is the most frequently appearing number, the mode (or modal number) of class periods skipped is 17. Unlike the mean or median, a distribution of scores can have more than one mode. Median. If we take all the scores in a set of scores, place them in order from least to greatest, and count in to the middle, then the score in the middle is the median. This is easy enough if there is an odd number of scores. However, if there is an even number of scores, then there is no single score in the middle. In this case, the two middle scores are selected, and their average is the median.
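The three definitions can be checked against the 20 truancy scores listed earlier. The following is a minimal sketch in Python using only the standard library; the helper name `modes` is an illustrative choice, not something from the text.

```python
from collections import Counter
from statistics import mean, median

# Class periods skipped by the 20 seventh-graders, as listed in the text.
truancy = [1, 6, 2, 6, 15, 20, 3, 20, 17, 11,
           15, 18, 8, 3, 17, 16, 14, 17, 0, 10]


def modes(scores):
    """Return every score tied for the highest frequency (there may be several)."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(score for score, freq in counts.items() if freq == top)


truancy_mean = mean(truancy)      # sum of the scores divided by n
truancy_modes = modes(truancy)    # most frequently appearing score(s)
truancy_median = median(truancy)  # average of the two middle scores when n is even

print(truancy_mean, truancy_modes, truancy_median)
```

Sorting the listed scores puts 11 and 14 in the 10th and 11th positions, so the script reports a median of 12.5 alongside a mean of 10.95 and a single mode of 17.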
  • 11. There are 20 scores in the previous example. The median would be the average of the 10th and 11th scores. We use the frequency table to find these scores, which are 11 and 14. Thus, the median is 12.5.
    [Chapter 6 • Statistics for Social Workers]
    Measures of Variability
    Whereas measures of central tendency are used to estimate a typical score in a distribution, measures of variability may be thought of as a way in which to measure departure from typicalness. They provide information on how "spread out" scores in a distribution are. Range. The range is the easiest measure of variability to calculate. It is simply the distance from the minimum (lowest) score in a distribution
  • 12. to the maximum (highest) score. It is obtained by subtracting the minimum score from the maximum score. Let us compute the range for the following data set: {2, 6, 10, 14, 18, 22}. The minimum is 2, and the maximum is 22: Range = 22 - 2 = 20. Sum of Squares. The sum of squares is a measure of the total amount of variability in a set of scores. Its name tells how to compute it. Sum of squares is short for sum of squared deviation scores. It is represented by the symbol SS. The formulas for sample and population sums of squares are the same except for sample and population mean symbols: $SS = \sum (X - \mu)^2$ for a population and $SS = \sum (X - \bar{X})^2$ for a sample. Using the data set for the range, the sum of squares would be computed as in Table 6.2. Variance. Another name for variance is mean square. This is short for mean of squared deviation scores. This is obtained by dividing the sum of squares by the number of scores (n). It is a measure of the average amount of variability associated with each score in a set of scores. The population variance formula is
  • 13. $\sigma^2 = \frac{SS}{n}$, where $\sigma^2$ is the symbol for population variance, SS is the symbol for sum of squares, and n stands for the number of scores in the population. The variance for the example we used to compute sum of squares would be $\sigma^2 = \frac{280}{6} = 46.67$.
    TABLE 6.2 Computing the Sum of Squares
    $X$     $X - \mu$    $(X - \mu)^2$
    2       -10          100
    6       -6           36
    10      -2           4
    14      +2           4
    18      +6           36
    22      +10          100
    NOTE: $\sum X = 72$; $n = 6$; $\mu = 12$; $\sum (X - \mu)^2 = 280$.
  • 14. Computed with the $SS/n$ formula, the sample variance is not an unbiased estimator of the population variance. If we draw many samples from a population and compute their variances using the $SS/n$ formula, then the sample variances will average out smaller than the population variance. For this reason, the sample variance is computed differently from the population variance: $s^2 = \frac{SS}{n - 1}$.
  • 17. The $n - 1$ is a correction factor for this tendency to underestimate. It is called degrees of freedom. If our example were a sample, then the variance would be $s^2 = \frac{280}{6 - 1} = \frac{280}{5} = 56$.
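The range, sum of squares, and both variance formulas can be reproduced in a few lines. This is a minimal sketch in Python for the six-score data set of Table 6.2; the variable names are illustrative.

```python
# Data set used in the text for the range and for Table 6.2.
scores = [2, 6, 10, 14, 18, 22]

n = len(scores)
score_range = max(scores) - min(scores)        # maximum minus minimum
mean_score = sum(scores) / n                   # 72 / 6

# Sum of squared deviation scores (SS).
sum_of_squares = sum((x - mean_score) ** 2 for x in scores)

population_variance = sum_of_squares / n       # SS / n
sample_variance = sum_of_squares / (n - 1)     # SS / (n - 1), degrees of freedom

print(score_range, sum_of_squares)
print(round(population_variance, 2), sample_variance)
```

Dividing the same SS by n - 1 instead of n is the only difference between the two variances, which is why the sample value (56) is larger than the population value (46.67).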
  • 18. Standard Deviation. Although the variance is a measure of average variability associated with each score, it is on a different scale from the score itself. The variance measures average squared deviation from the mean. To get a measure of average variability on the same scale as the original scores, we take the square root of the variance. The standard deviation is the square root of the variance. The formulas are $\sigma = \sqrt{\frac{SS}{n}}$ for a population and $s = \sqrt{\frac{SS}{n - 1}}$ for a sample. Using the same set of numbers as before, the population standard deviation would be $\sigma = \sqrt{46.67} = 6.83$, and the sample standard deviation would be $s = \sqrt{56} = 7.48$. For a normally distributed set of scores, approximately 68% of all scores will be within ±1 standard deviation of the mean.
    Measures of Relationship
    Table 6.3 shows the relationship between number of stressors experienced by a parent during a week and that parent's frequency of use of corporal punishment during the same week. One can use regression procedures to derive the line that best fits the data. This line is referred to as a regression line (or line of best fit or prediction line). Such a line has been calculated for the example plot. It has a Y intercept of -3.555 and a slope of +1.279. This gives us the prediction equation of
  • 19. $Y_{pred} = -3.555 + 1.279X$, where Y is frequency of corporal punishment and X is stressors. This is graphically depicted in Figure 6.1. Slope is the change in Y for a unit increase in X. So, the slope of +1.279 means that an increase in stressors (X) of 1 will be accompanied by an increase in predicted frequency of corporal punishment (Y) of +1.279 incidents per week. If the slope were a negative number, then an increase in X would be accompanied by a predicted decrease in Y. The equation does not give the actual value of Y (called the obtained or observed score); rather, it gives a prediction of the value of Y for a certain value of X. For example, if X were 3, then we would predict that Y would be -3.555 + 1.279(3) = -3.555 + 3.837 = 0.282.
    [FIGURE 6.1 Frequency of Stressors and Use of Corporal Punishment: scatterplot with the regression line $Y_{pred} = -3.555 + 1.279X$; horizontal axis: Stressors]
  • 20. TABLE 6.3 Frequency of Stressors and Use of Corporal Punishment
    Stressors   Punishment
    3           0
    4           1
    4           2
    5           3
    6           4
    7           5
    8           6
    7           7
    9           8
    10          9
  • 21. The regression line is the line that predicts Y such that the error of prediction is minimized. Error is defined as the difference between the predicted score and the obtained score. The equation for computing error is $E = Y - Y_{pred}$. When X = 4, there are two obtained values of Y: 1 and 2. The predicted value of Y is $Y_{pred} = -3.555 + 1.279(4) = -3.555 + 5.116 = 1.561$. The error of prediction is E = 1 - 1.561 = -0.561 for Y = 1, and E = 2 - 1.561 = +0.439 for Y = 2. If we square each error difference score and sum the squares, then we get a quantity called the error sum of squares, which is represented by
  • 22. $SSE = \sum (Y - Y_{pred})^2$. The regression line is the one line that gives the smallest value for SSE. The SSE is a measure of the total variability of obtained score values around their predicted values. There are two other sums of squares that are important to understanding correlation and regression. The total sum of squares (SST) is a measure of the total variability of the obtained score values around the mean of the obtained scores. The SST is represented by $SST = \sum (Y - \bar{Y})^2$. The remaining sum of squares is called the regression sum of squares (SSR) or the explained sum of squares. If we square each of the differences between predicted scores and the mean and then add them up, we get the SSR, which is represented by $SSR = \sum (Y_{pred} - \bar{Y})^2$. The SSR is a measure of the total variability of the predicted score values around the mean of the obtained scores.
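The prediction equation and the three sums of squares defined above can be sketched numerically. The Python example below reads Table 6.3 as (stressors, punishment) pairs with X = stressors and Y = punishment, and it fits the line with the standard least-squares formulas (slope = SSXY / SSX, intercept = mean of Y minus slope times mean of X). That fitting step is an assumption, since the text quotes the fitted line without showing how it was derived.

```python
# Table 6.3 read as (stressors, punishment) pairs: X = stressors, Y = punishment.
stressors = [3, 4, 4, 5, 6, 7, 8, 7, 9, 10]
punishment = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

n = len(stressors)
mean_x = sum(stressors) / n
mean_y = sum(punishment) / n

# Standard least-squares estimates (assumed; not spelled out in the text).
ss_x = sum((x - mean_x) ** 2 for x in stressors)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(stressors, punishment))
slope = ss_xy / ss_x
intercept = mean_y - slope * mean_x


def predict(x):
    """Predicted punishment frequency for x stressors."""
    return intercept + slope * x


predicted = [predict(x) for x in stressors]
sse = sum((y - p) ** 2 for y, p in zip(punishment, predicted))  # error SS
ssr = sum((p - mean_y) ** 2 for p in predicted)                 # regression SS
sst = sum((y - mean_y) ** 2 for y in punishment)                # total SS

print(f"Y_pred = {intercept:.3f} + {slope:.3f}X")
print(f"prediction at X = 3: {predict(3):.3f}")
print(f"SSE = {sse:.3f}, SSR = {ssr:.3f}, SST = {sst:.3f}")
```

The fitted coefficients round to the intercept and slope quoted in the text. Predictions can differ from the worked examples in the third decimal place because the text rounds the coefficients before substituting.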
  • 23. An important and interesting feature of these three sums of squares is that the sum of the SSR and SSE is equal to the SST: $SST = SSR + SSE$. This leads us to three other important statistics: the proportion of variance explained (PVE), the correlation coefficient, and the standard error of estimate. Proportion of Variance Explained. The PVE is a measure of how well the regression line predicts obtained scores. The values of PVE range from 0 (no predictive value) to 1 (prediction with perfect accuracy). The equation for PVE is $PVE = \frac{SSR}{SST}$. There also is a computational equation for the PVE, which is $PVE = \frac{(SSXY)^2}{SSX \cdot SSY}$, where SSXY is the "covariance" sum of squares: $\sum (X - \bar{X})(Y - \bar{Y})$, SSX is the sum of squares for variable X: $\sum (X - \bar{X})^2$, and SSY is the sum of squares for variable Y: $\sum (Y - \bar{Y})^2$. The procedure for computing these sums of squares is outlined in Table 6.4.
  • 24. The proportion of variance in the frequency of corporal punishment that may be explained by stressors experienced is $PVE = \frac{(+61.5)^2}{(48.1)(82.5)} = \frac{3782.25}{3968.25} = 0.953$.
    TABLE 6.4 Computation of $r^2$ (PVE)
    $Y$    $Y - \bar{Y}$    $(Y - \bar{Y})^2$    $X$    $X - \bar{X}$    $(X - \bar{X})^2$    $(X - \bar{X})(Y - \bar{Y})$
    3      -3.3             10.89                0      -4.5             20.25                +14.85
    4      -2.3             5.29                 1      -3.5             12.25                +8.05
    4      -2.3             5.29                 2      -2.5             6.25                 +5.75
    5      -1.3             1.69                 3      -1.5             2.25                 +1.95
    6      -0.3             0.09                 4      -0.5             0.25                 +0.15
    7      +0.7             0.49                 5      +0.5             0.25                 +0.35
    8      +1.7             2.89                 6      +1.5             2.25                 +2.55
    7      +0.7             0.49                 7      +2.5             6.25                 +1.75
    9      +2.7             7.29                 8      +3.5             12.25                +9.45
    10     +3.7             13.69                9      +4.5             20.25                +16.65
    NOTE: $\bar{Y} = 6.3$; $SSY = 48.1$; $\bar{X} = 4.5$; $SSX = 82.5$; $SSXY = +61.5$.
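The column totals in Table 6.4 and the resulting PVE can be recomputed directly. The following minimal Python sketch uses the two data columns as they are labelled in the table; the variable names are illustrative.

```python
# Data columns of Table 6.4, as labelled there.
y = [3, 4, 4, 5, 6, 7, 8, 7, 9, 10]
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

ss_x = sum((xi - mean_x) ** 2 for xi in x)                          # SSX
ss_y = sum((yi - mean_y) ** 2 for yi in y)                          # SSY
ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # SSXY

# Computational equation for the proportion of variance explained.
pve = ss_xy ** 2 / (ss_x * ss_y)

print(round(ss_x, 2), round(ss_y, 2), round(ss_xy, 2))
print(round(pve, 3))
```

Because the computational equation is symmetric in X and Y, the PVE is the same whichever of the two variables is treated as the predictor.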
  • 25. The PVE sometimes is called the coefficient of determination and is represented by the symbol $r^2$. Correlation Coefficient. A correlation coefficient also is a measure of the strength of relationship between two variables. The correlation coefficient is represented by the letter r and can take on values between -1 and +1 inclusive. The correlation coefficient always has the same sign as the slope. If one squares a correlation coefficient, then one will obtain the PVE. It is computed using the following formula: $r = \frac{SSXY}{\sqrt{SSX \cdot SSY}}$. For our example data, the correlation coefficient would be $r = \frac{+61.5}{\sqrt{(48.1)(82.5)}} = \frac{+61.5}{\sqrt{3968.25}} = \frac{+61.5}{62.994} = +0.976$. Standard Error of Estimate. The standard error of estimate is the standard deviation of the prediction errors. It is computed like any other standard deviation: the square root of the SSE divided by the degrees of freedom. The first step is to compute the variance error ($s_E^2$):
  • 26. $s_E^2 = \frac{SSE}{n - 2}$. Notice that the value for degrees of freedom is $n - 2$ rather than $n - 1$. The reason why we subtract 2 in this instance is that variance error (and standard error of estimate) is a statistic describing characteristics of two variables. They deal with the error involved in the prediction of Y (one variable) from X (the other variable). The standard error of estimate is the square root of the variance error: $s_E = \sqrt{s_E^2}$. The standard error of estimate tells us how spread out scores are with respect to their predicted values. If the error scores ($E = Y - Y_{pred}$) are normally distributed around the prediction line, then about 68% of actual scores will fall between ±1 $s_E$ of their predicted values. We can calculate the standard error of estimate using the following computing formula: $s_E = s_Y \sqrt{(1 - r^2)\left(\frac{n - 1}{n - 2}\right)}$, where $s_Y$ is the standard deviation of Y, r is the correlation coefficient for X and Y, and n is the sample size.
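The computing formula follows from the definitions already given. The short derivation below is a sketch under two stated assumptions: that $s_Y$ is the sample standard deviation of Y, so that $s_Y^2 = SST/(n - 1)$, and that $r^2$ equals the PVE, $SSR/SST$.

```latex
\begin{aligned}
SSE &= SST - SSR = SST\,(1 - r^2) = (n - 1)\,s_Y^2\,(1 - r^2), \\
s_E &= \sqrt{\frac{SSE}{n - 2}}
     = \sqrt{\frac{(n - 1)\,s_Y^2\,(1 - r^2)}{n - 2}}
     = s_Y \sqrt{(1 - r^2)\left(\frac{n - 1}{n - 2}\right)}.
\end{aligned}
```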
  • 27. For the example data, this would be $s_E = 2.311\sqrt{(1 - .953)\left(\frac{10 - 1}{10 - 2}\right)} = 2.311\sqrt{(.047)(1.125)} = 2.311\sqrt{0.053} = (2.311)(0.230) = 0.532$.
    Inferential Statistics: Hypothesis Testing
    The Null and Alternative Hypotheses
    Classical statistical hypothesis testing is based on the evaluation of two rival hypotheses: the null hypothesis and the alternative hypothesis. We try to detect relationships by identifying changes that are unlikely to have occurred simply because of random fluctuations of dependent measures. Statistical analysis is the usual procedure for identifying such relationships. The null hypothesis is the hypothesis that there is no relationship between two variables. This implies that if the null hypothesis is true, then any apparent relationship in samples is the result of random fluctuations in the dependent measure or sampling error. Statistical hypothesis tests are carried out on samples. For example, in an experimental two-group posttest-only design, there would be a sample whose members received an intervention and a sample whose members did not. Both of these would be probability samples from a larger population. The intervention sample would represent
  • 28. the population of all individuals as if they had received the intervention. The control sample would be representative of the same population of individuals as if they had not received the intervention. If the intervention had no effect, then the populations would be identical. However, it would be unlikely that two samples from two identical populations would be identical. So, although the sample means would be different, they would not represent any effect of the independent variable. The apparent difference would be due to sampling error. Statistical hypothesis tests involve evaluating evidence from samples to make inferences about populations. It is for this reason that the null hypothesis is a statement about population parameters. For example, one null hypothesis for the previous design could be stated as $H_0: \mu_1 = \mu_2$ or as $H_0: \mu_1 - \mu_2 = 0$. $H_0$ stands for the null hypothesis. It is a letter H with a zero
  • 29. subscript. It is a statement that the means of the experimental (Mean 1) and control (Mean 2) populations are equal. To establish that a relationship exists between the intervention (independent variable) and the outcome (measure of the dependent variable), we must collect evidence that allows us to reject the null hypothesis. Strictly speaking, we do not make a decision as to whether the null hypothesis is correct. We evaluate the evidence to determine the extent to which it tends to confirm or disconfirm the null hypothesis. If the evidence were such that it is unlikely that an observed relationship would have occurred as the result of sampling error, then we would reject the null hypothesis. If the evidence were more ambiguous, then we would fail to reject the null hypothesis. The terms reject and fail to reject carry the implicit understanding that our decision might be in error. The truth is that we never really know whether our decision is correct. When we reject the null hypothesis and it is true, we have committed a Type I error. By setting certain statistical criteria beforehand, we can establish the probability that we will commit a Type I error. We decide what proportion of the time we are willing to commit a Type I error. This proportion (probability) is called alpha ($\alpha$). If we are willing to reject the null hypothesis when it is true only 1 in 20 times, then we set our $\alpha$ level at .05. If only 1
  • 30. in 100 times, then we set it at .01. The probability that we will fail to reject the null hypothesis when it is true (correct decision) is $1 - \alpha$ (Figure 6.2).
    FIGURE 6.2 The Null Hypothesis and Type I Error
    Situation: NULL HYPOTHESIS TRUE
    Decision               Result
    Reject $H_0$           Type I error. $\alpha$ = the probability of rejecting the null hypothesis when it is true.
    Fail to reject $H_0$   Correct decision. $1 - \alpha$ = the probability of not rejecting the null hypothesis when it is true.
    [FIGURE 6.3 The Null Hypothesis and $\alpha$ Level]
    The following hypothesis would be evaluated by comparing the difference between sample means: $H_0: \mu_1 - \mu_2 = 0$. If we carried out multiple samples from populations with identical means (the null hypothesis was true), then we would find that most of the values for the differences
  • 31. between the sample means would not be 0. Figure 6.3 represents a distribution of the differences between sample means drawn from identical populations. The mean difference for the total distribution of sample means is 0, and the standard deviation is 5. If the differences are normally distributed, then approximately 68% of these differences will be between -5 (z = -1) and +5 (z = +1). Fully 95% of the differences in the distribution will fall between -9.8 (z = -1.96) and +9.8 (z = +1.96). If we drew a random sample from each population, it would not be unusual to find a difference between sample means of as much as 9.8, even though the population means were the same. On the other hand, we would expect to find a difference more than 9.8 about 1 in 20 times. If we set our criterion for rejecting the null hypothesis such that a mean difference must be greater than +9.8 or less than -9.8, then we would commit a Type I error only 1 in 20 times (.05) on average. Our $\alpha$ level (the probability of committing a Type I error) would be set at .05. The probability that a relationship or a difference of a certain size would be seen in a sample if the null hypothesis were true is represented by p. To reject the null hypothesis, p must be less than or equal to $\alpha$. The probability of getting an effect this large or larger if the null hypothesis were true is less than or equal to the
  • 32. probability of making a Type I error that we have decided is acceptable.
    [Figure 6.3: normal curve of the differences between sample means; horizontal axes $\bar{X}_1 - \bar{X}_2$ (-20 to +20) and z (-4 to +4), with $1 - \alpha = .95$ and $\alpha = .05$ marked]
    Rejecting the $H_0$: We believe that it is likely that the relationship in the sample is generalizable to the population.
    Not rejecting the $H_0$: We do not believe that we have sufficient evidence to draw inferences about the population.
    For the previous example, let us imagine that we have set $\alpha = .05$. Also, imagine that we obtained a difference between the sample means of 10. The probability that we would obtain a difference at least as extreme as +10 or -10 would be equivalent to the probability of a z score greater than +2.0 plus the probability of a z score less than -2.0, or .0228 + .0228 = .0456. This is our p value; p = .0456. Because $p < \alpha$, we would reject the null hypothesis. Some texts create the impression that the alternative (or
  • 33. research or experimental) hypothesis is simply the opposite of the null hypothesis. In fact, sometimes this naive alternative hypothesis is used. However, it generally is not particularly useful to researchers. Usually, we are interested in detecting an intervention effect of a particular size. On certain measures, we would be interested in small effects (e.g., death rate), whereas on others, only larger effects would be of interest. When we are interested in an effect of a particular size, we use a specific alternative hypothesis that takes the following form: $H_1: \mu_1 - \mu_2 \geq |d|$, where d is a difference of a particular size. If the test is a nondirectional test, then the difference in the alternative hypothesis would be expressed as an absolute value, $|d|$, to show that either a positive or negative difference is involved. It is customary to express the mean difference in an $H_1$ in units of standard deviation. Such scores are called z scores. The difference is called an effect size. Effect sizes frequently are used in meta-analyses of outcome studies to compare the relative efficacy of different types of interventions across studies. Cohen (1988) groups effect sizes into small, medium, and large categories. The criteria for each are as follows: Small effect size (d = .2): It is approximately the effect size for
  • 34. the average difference in height (i.e., 0.5 inches and s = 2.1) between 15- and 16-year-old girls. Medium effect size (d = .5): It is approximately the effect size for the average difference in height (i.e., 1.0 inches and s = 2.0) between 14- and 18-year-old girls. Large effect size (d = .8): This is the same effect size (d = .8) as the average difference in height for 13- and 18-year-old girls. Intuitively, it would seem that we would want to detect even very small effect sizes in our research. However, there is a practical trade-off involved. All other things being equal, the consistent detection of small effect sizes requires very large (n > 200) sample sizes. Because very large sample sizes require resources that might not be readily available, they might not be practical for all studies. Furthermore, there are certain outcome variables for which we would not be particularly interested in small effects. If we reject the null hypothesis, then we implicitly have decided that the evidence supports the alternative hypothesis. If the alternative hypothesis is true and we reject the null hypothesis, then we have made a correct decision. However, if we fail to reject the null hypothesis and the alternative hypothesis is true, then we have committed a Type II error. A Type II error involves the failure to detect an existing effect
  • 35. (Figure 6.4).
    FIGURE 6.4 The Null Hypothesis and Type II Error
    Situation: ALTERNATIVE HYPOTHESIS TRUE
    Decision               Result
    Reject $H_0$           Correct decision. $1 - \beta$ = the probability of rejecting the null hypothesis when the alternative hypothesis is true. The power of a test.
    Fail to reject $H_0$   Type II error. $\beta$ = the probability of not rejecting the null hypothesis when the alternative hypothesis is true.
  • 36. Beta ($\beta$) is the probability of committing a Type II error. This probability is established when we set our criterion for rejecting the null hypothesis. The probability of a correct decision ($1 - \beta$) is an important probability. It is so important that it has a name: power. Power refers to the probability that we will detect an effect of the size we have selected. We should decide on the power ($1 - \beta$) as well as the $\alpha$ level before we carry out a statistical test. Just as with Type I error, we should decide beforehand how often we are willing to make a Type II error (fail to detect a certain effect size). This is our $\beta$ level. The procedure for making such determinations is discussed in Cohen (1988).
    Assumptions for Statistical Hypothesis Tests
    Although assumptions are different for different tests, all tests of the null hypothesis share two related assumptions: randomness and independence. The randomness assumption is that sample members must be randomly selected from the population being evaluated. If the sample is being divided into groups (e.g., treatment and control), then assignment to groups also must be random. This is referred to as random selection and random assignment. The mathematical models that underlie statistical hypothesis
  • 37. testing depend on random sampling. If the samples are not random, then we cannot compute an accurate probability (p) that the sample could have resulted if the null hypothesis were true. The independence assumption is that one member's score will not influence another member's score. The only common relationship among group scores should be the intervention. One implication of this is that members of a group should not have any contact with each other so as not to affect each other's scores. Again, the mathematical models are dependent on the independence of sample scores. If the scores are not independent, then the probability (p) is, as before, simply a number that has little to do with the probability of a Type I error.
    Parametric and Nonparametric Hypothesis Tests
    Traditionally, hypothesis tests are grouped into parametric and nonparametric tests. The names are misleading given that one class of test has no more or less to do with population parameters than the other. The difference between the two tests lies in the mathematical assumptions used to compute the likelihood of a Type I error. Parametric tests are based on the assumption that the populations from which the samples are drawn are normally distributed. Nonparametric tests do not have this rigid
  • 38. assumption. Thus, a nonparametric test can be carried out on a broader range of data than can a parametric test. Nonparametric tests remain serviceable even in circumstances where parametric procedures collapse. When the populations from which we sample are normally distributed, and when all the other assumptions of the parametric test are met, parametric tests are slightly more powerful than nonparametric tests. However, when the parametric assumptions are not met, nonparametric tests are more powerful.
    Specific Hypothesis Tests
    We now investigate several frequently used hypothesis tests and issues surrounding their appropriate use. Where appropriate, parametric and nonparametric tests are presented together for each type of design.
    Single-Sample Hypothesis Tests
    These are tests in which a single sample is drawn. Comparisons are made between sample values and population parameters to see whether the sample differs in a statistically significant way from the parent population. Occasionally, these tests are used to determine whether a sample differs from some theoretical population.
  • 39. For example, we might wish to gather evidence as to whether a particular population was normally distributed. We would take a random sample from this population and compare the distribution of scores to an artificially constructed, normally distributed set of scores. If there were a statistically significant difference, then we would reject the hypothesis that our sample came from a normally distributed population (the null hypothesis). Typically, these tests are not used for experiments. They tend to be used to demonstrate that certain strata within populations differ from the population as a whole. Here, we investigate two single-sample tests:
    1. Single-sample t test (interval or ratio scale)
    2. $\chi^2$ (chi-square) goodness-of-fit test (nominal scale)
    The Single-Sample t Test. This test usually is used to see whether a stratum of a population is different on average from the population as a whole (e.g., are the mean wages received by social workers in Lansing different from the mean for all social workers in Michigan?). The null hypothesis for this test is that the mean wages for a particular stratum (Lansing social workers) of the population and the population as a whole (Michigan social workers) will be the same: $H_0: \mu_0 = \mu_1$, where $\mu_0$ is the mean wage for the population and $\mu_1$ is the
  • 40. mean wage for the stratum. The assumptions of the single-sample t test are as follows:
    Randomness: Sample members must be randomly drawn from the population.
    Independence: Sample (X) scores must be independent of each other.
    Scaling: The dependent measure (X scores) must be interval or ratio.
    Normal distribution: The population of X scores must be normally distributed.
    These assumptions are listed more or less in order of importance. Violations of the first three assumptions are essentially "fatal" ones. Even slight violations of the first two assumptions can introduce major error into the computation of p values. Violation of the assumption of a normal distribution will introduce some error into the computation of p values. Unless the population distribution is markedly different from a normal distribution, the errors will tend to be slight (e.g., a reported p value of .042 actually will be a p value of .057). This is what is meant when someone says that the t test is a "robust" test.
  • 41. The t statistic for the single-sample t test is computed by subtracting the null hypothesis (population) mean from the sample mean and dividing by the standard error of the mean. The formula for $t_{obt}$ (pronounced "t obtained") is $t_{obt} = \frac{\bar{X} - \mu_0}{s_{\bar{X}}}$. As the absolute value of $t_{obt}$ gets larger, it becomes more unlikely that such a difference could occur if the null hypothesis is true. At a certain point, the probability (p) of obtaining a t so large becomes sufficiently small (reaches the $\alpha$ level) that we reject the null hypothesis. The critical value of t (the value that $t_{obt}$ must equal or exceed to reject the null hypothesis) depends on the degrees of freedom. For a single-sample t test, the degrees of freedom are df = n - 1, where n is the sample size. Let us look at how to compute $t_{obt}$. We know from a statewide survey that the average time taken to complete an outpatient rehabilitation program for a certain injury, X, is 46.6 days. We wish to see whether clients seen at our clinic are taking longer or shorter than the state average. We randomly sample 16 files from the past year. We review these cases and determine the length of program for each of the clients in the sample. The mean number of days to
  • 42. complete rehabilitation at our clinic is 29.875 days. This is lower than the population mean of 46.6 days. The question is whether this result is statistically significant. Is it likely that this sample could have been drawn from a population with a mean of 46.6? To determine this, we need to calculate $t_{obt}$. The first step in calculating $t_{obt}$ was carried out when we computed the sample mean. The next step is to compute the standard error of the mean. We begin this by computing the standard deviation, which turns out to be s = 11.888. The standard error of the mean is calculated by dividing the standard deviation by the square root of the sample size: $s_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{11.888}{\sqrt{16}} = \frac{11.888}{4} = 2.972$. We take the formula for $t_{obt}$ and plug in our numbers to obtain $t_{obt} = \frac{29.875 - 46.6}{2.972} = \frac{-16.725}{2.972} = -5.628$.
  • 43. We look up the tabled t value ($t_{crit}$) at 15 degrees of freedom. This turns out to be 2.131 for a nondirectional test at $\alpha = .05$ (see a table of the critical values for the t test, nondirectional, found in most statistics texts). The absolute value of $t_{obt}$ is 5.628. This is greater than $t_{crit} = 2.131$, so we reject the null hypothesis. The evidence suggests that clients in our clinic average fewer days in rehabilitation than is the case in the statewide population. The effect size index for a test of means is d and is computed as follows for a single-sample t test: $d = \frac{\bar{X} - \mu_0}{s}$. The effect size for our example would be as follows: $d = \frac{29.875 - 46.6}{11.888} = \frac{-16.725}{11.888} = -1.4069$, which would be classified as a large effect. The $\chi^2$ Goodness-of-Fit Test. The $\chi^2$ goodness-of-fit test is a single-sample test. It is used in
  • 44. the evaluation of nominal (categorical) variables. The test involves comparisons between observed and expected frequencies within strata in a sample. Expected frequencies are derived from either population values or theoretical values. Observed frequencies are those derived from the sample. The null hypothesis for the $\chi^2$ test is that the population from which the sample has been drawn will have the same proportion of members in each category as the empirical or theoretical null hypothesis population: $H_0: P_{Ok} = P_{Ek}$, where $P_{Ek}$ is the proportion of cases within category k in the null hypothesis population (expected), and $P_{Ok}$ is the proportion of cases within category k in the population from which the test sample was drawn (observed). The assumptions for the $\chi^2$ goodness-of-fit test are as follows: • Randomness: Sample members must be randomly drawn from the population. • Independence: Sample scores must be independent of each other. One implication of this is that categories must be mutually exclusive (no case may appear in more than one category). • Scaling: The dependent measure (categories) must be
  • 45. nominal. • Expected frequencies: No expected frequency within a category should be less than 1, and no more than 20% of the expected frequencies should be less than 5. As with all tests of the null hypothesis, the $\chi^2$ test begins with the assumptions of randomness and independence. Deriving from these assumptions is the requirement that the categories in the cross-tabulation must be mutually exclusive and exhaustive. Mutually exclusive means that an individual may not be in more than one category per variable. Exhaustive means that all categories of interest are covered. These assumptions are listed more or less in order of importance. Violations of the first three assumptions are essentially "fatal" ones. Even slight violations of the first two assumptions can introduce major errors into the computation of p values. The $\chi^2$ goodness-of-fit test is basically a large-sample test. When the expected frequencies are small (an expected frequency less than 1, or more than 20% of expected frequencies less than 5), the probabilities associated with the $\chi^2$ test will be inaccurate.
  • 46. The usual procedure in this case is either to increase expected frequencies by collapsing adjacent categories (also called cells) or to use another test. Following is a concrete example. The workers at the Interdenominational Social Services Center in St. Winifred Township wanted to see whether they were serving people of all faiths (and those of no faith) equally. They had census figures indicating that religious preferences in the township were as follows: Christian (64%), Jewish (10%), Muslim (8%), other religion/no preference (14%), and agnostic/atheist (4%). The workers randomly sampled 50 clients from those seen during the previous year. Before they drew the sample, they calculated the expected frequency for each category. To obtain the expected frequencies for the sample, they converted the percentage for each preference to a decimal proportion and multiplied it by 50. Thus, the expected frequency for Christians was 64% of 50 or .64 x 50 = 32, the Jewish category was 10% of 50 or .10 x 50 = 5, and so on. Table 6.5 depicts the expected frequencies.
      TABLE 6.5 Expected Frequencies for Religious Preferences
      Category               Expected frequency
      Christian              32
      Jewish                 5
      Muslim                 4
      Other/No Preference    7
      Agnostic/Atheist       2
  • 47. Two (40%) of our expected frequencies (Muslim and agnostic/atheist) are less than 5. Given that the maximum allowable is 20%, we are violating a test assumption. We can remedy this by collapsing categories (merging two or more categories into one) or by increasing the sample size. However, there is no category that we could reasonably combine with agnostic/atheist. It would not work to combine this category with any of the other categories because the latter are religious individuals, whereas atheists and agnostics are not religious. However, we could increase the sample size. To get a sample in which only one (20%) of the expected frequencies was less than 5, we would need a sample large enough so that 8% (the percentage of the population identifying as Muslim) of it would equal 5: $0.08 \times n = 5$, so $n = \frac{5}{0.08} = 62.5 \approx 63$.
  • 48. So, our sample size would need to be 63, giving us the expected frequencies shown in Table 6.6. Only one of five (20%) of the expected frequencies is less than 5, and none of them is less than 1, so the sample size assumption is met. The results of a random sample of 63 cases were as found in Table 6.7.
      TABLE 6.6 New Expected Frequencies for Religious Preferences
      Category               Expected frequency
      Christian              40.32
      Jewish                 6.30
      Muslim                 5.04
      Other/No Preference    8.82
      Agnostic/Atheist       2.52
      TABLE 6.7 Observed and Expected Frequencies for Religious Preferences
      Category               Expected frequency   Observed frequency
      Christian              40.32                49
      Jewish                 6.30                 2
      Muslim                 5.04                 2
      Other/No Preference    8.82                 9
      Agnostic/Atheist       2.52                 1
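The expected-frequency reasoning above can be sketched in a few lines of Python. This is a minimal illustration, not part of the source: the function names (`expected_frequencies`, `meets_expected_frequency_rule`, `smallest_n_for_count`) are hypothetical, and the proportions are the census figures quoted in the text.

```python
import math

# Census proportions quoted in the text (they sum to 1.0).
PROPORTIONS = {
    "Christian": 0.64,
    "Jewish": 0.10,
    "Muslim": 0.08,
    "Other/no preference": 0.14,
    "Agnostic/atheist": 0.04,
}


def expected_frequencies(n, proportions=PROPORTIONS):
    """Expected count per category for a sample of size n."""
    return {name: p * n for name, p in proportions.items()}


def meets_expected_frequency_rule(expected):
    """Rule from the text: no expected count below 1, and no more
    than 20% of the expected counts below 5."""
    values = list(expected.values())
    below_five = sum(1 for v in values if v < 5)
    return min(values) >= 1 and below_five <= 0.20 * len(values)


def smallest_n_for_count(proportion, minimum=5):
    """Smallest sample size whose expected count reaches `minimum`."""
    return math.ceil(minimum / proportion)


for n in (50, 63):
    print(n, meets_expected_frequency_rule(expected_frequencies(n)))
print(smallest_n_for_count(0.08))
```

With n = 50 the rule fails because two of five categories fall below 5; with n = 63 only the agnostic/atheist category does, which is exactly the 20% the rule allows.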
  • 49. The null hypothesis for this example is that the proportion of people living in St. Winifred Township who identify with each religious category will be the same as the proportion of people who have received services at the Interdenominational Services Center in St. Winifred Township who identify with each religious category. The null hypothesis expresses the expectation that observed and expected frequencies will not be different. Notice the similarity between the null hypothesis and the numerator of the $\chi^2_{obt}$ test statistic: $\chi^2_{obt} = \sum \frac{(f_O - f_E)^2}{f_E}$. The formula tells us to subtract the expected score from the observed score ($f_O - f_E$), then to square the difference ($[f_O - f_E]^2$) and divide by the expected score ($[f_O - f_E]^2 / f_E$) for each observed and expected score pair. When we are finished, we add the answers and obtain the $\chi^2_{obt}$ test statistic (Table 6.8). The $\chi^2_{obt}$ is evaluated by comparing it to a critical value ($\chi^2_{crit}$) that is obtained from a table of critical values of the $\chi^2$ distribution. If $\chi^2_{obt}$ is greater than or equal to $\chi^2_{crit}$, then we reject the null hypothesis. For a $\chi^2$ goodness of fit, the degrees of freedom are equal to the number of categories
  • 50. (c) minus 1 or df = c - 1. In our case, we have five categories (Christian, Jewish, Muslim, other/no preference, and agnostic/atheist), so df = 5 - 1 = 4. The critical value for $\chi^2$ at $\alpha = .05$ and df = 4 is $\chi^2_{crit} = 9.49$. We have calculated $\chi^2_{obt}$ as 7.5577. Because $\chi^2_{obt}$ is less than $\chi^2_{crit}$, we fail to reject the null hypothesis. The evidence is not sufficient to conclude that people of all faiths (and those of no faith) are being seen out of proportion to their representations in the township.
      TABLE 6.8 Computation of $\chi^2_{obt}$
      Observed (f_O)   Expected (f_E)   f_O - f_E   (f_O - f_E)^2   (f_O - f_E)^2 / f_E
      49               40.32            +8.68       75.3424         1.8686
      2                6.30             -4.30       18.4900         2.9349
      2                5.04             -3.04       9.2416          1.8337
      9                8.82             +0.18       0.0324          0.0037
      1                2.52             -1.52       2.3104          0.9168
      NOTE: $\sum \frac{(f_O - f_E)^2}{f_E}$ = 1.8686 + 2.9349 + 1.8337 + 0.0037 + 0.9168 = $\chi^2_{obt}$ = 7.5577.
  • 51. Earlier, we discussed the use of the effect size measure d for the t test. It is an appropriate measure of effect size for a test of means. However, the $\chi^2$ test does not compare means. It compares frequencies (or proportions). Therefore, a different effect size index is used for the $\chi^2$ test: w. This measure of effect size ranges from 0 to 1. Cohen (1988) classifies these effect sizes into three categories: Small effect size: w = .10; Medium effect size: w = .30; Large effect size: w = .50. The effect size coefficient for a $\chi^2$ goodness-of-fit test is computed according to the following formula: $w = \sqrt{\chi^2_{obt} / N}$, where N = the total sample size. For the St. Winifred Township example, $w = \sqrt{7.5577 / 63} = \sqrt{0.1200} = 0.3464$, which would be classified as a medium effect. Hypothesis Tests for Two Related Samples. These are tests in which either a single sample is drawn and
  • 52. measurements are taken at two times or two samples are drawn and members of the samples are individually matched on some attribute. Measurements are taken for each member of the matched groups. We investigate three examples of two related sample tests in this section: 1. Dependent (matched, paired, correlated) samples t test (interval or ratio scale) 2. Wilcoxon matched pairs, signed ranks test (ordinal scale) 3. McNemar change test (nominal scale) Difference Scores. The dependent t test and the Wilcoxon matched pairs, signed ranks test evaluate difference scores. These may be differences between scores from measurements taken at two different times on the same individual (pretest and posttest) or differences between scores taken on two different individuals who have been paired or matched with each other based on their similarity on some variable or variable cluster (e.g., gender, race/ethnicity, socioeconomic status). The formula for a difference score is $X_2 - X_1 = X_D$, where $X_1$ is the first of a pair of scores,
  • 53. $X_2$ is the second of a pair of scores, and $X_D$ is the difference between the two. The null hypothesis for all these tests is that the samples came from populations in which the expected differences are zero. The Dependent Samples t Test. This also is called the correlated, paired, or matched t test. The null hypothesis for this test is that the mean of the differences between the paired scores is 0: $H_0: \mu_{X_D} = \mu_{D0}$, where $\mu_{X_D}$ = the mean difference between the populations from which the samples were drawn, and $\mu_{D0}$ = the mean difference between the populations specified by the null hypothesis. Because the null hypothesis typically specifies no difference ($\mu_{D0} = 0$), the null hypothesis usually is written as $H_0: \mu_{X_D} = 0$. The t statistic for the dependent t test is the mean of the sample differences divided by the standard error of the mean difference, or $t_{obt} = \frac{\bar{X}_D - \mu_{D0}}{s_{\bar{X}_D}}$.
  • 54. As the absolute value of $t_{obt}$ gets larger, it becomes more unlikely that such a difference could occur if the null hypothesis is true. At a certain point, the probability (p) of obtaining a t so large becomes sufficiently small (reaches the alpha level) that we reject the null hypothesis. The assumptions of the dependent t test are as follows: Randomness: Sample members must be randomly drawn from the population. Independence: $X_D$ scores must be independent of each other. Scaling: The dependent measure ($X_D$ scores) must be interval or ratio. Normal distribution: The population of $X_D$ scores must be normally distributed. These assumptions are listed more or less in order of importance. Violations of the first three assumptions are essentially "death penalty" violations. Even slight violations of the first two assumptions can introduce major error into the computation of p values. Similarly, difference scores computed from two sets of ordinal data may incorporate major error.
  • 55. Violation of the assumption of a normal distribution will introduce some error into the computation of p values. However, unless the population distribution is markedly different from a normal distribution, the errors will tend to be slight (e.g., a reported p value of .042 actually will be a p value of .057). This is what is meant when someone says that the t test is a "robust" test. Still, even though the error is slight, the nonparametric Wilcoxon matched pairs, signed ranks test (discussed in the next section) probably will yield a more accurate test when there are violations of this normal distribution assumption. Let us look at the procedure for computing the dependent groups t statistic. We use an evaluation of an intervention for individuals with depression problems. The dependent measure is the Beck Depression Inventory (BDI), a reliable and well-validated measure of depression. Ten clients were randomly selected from clients seen for depression problems at a community center. They were pretested ($X_1$) with the BDI, received the treatment, and then were posttested ($X_2$) with the same instrument. The mean of the difference scores ($\bar{X}_D$) was -1. This means that the average change in BDI scores from pretest to posttest was a decrease of 1 point. The standard deviation of the difference scores was 1.33.
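Given only these summary values (mean difference of -1, standard deviation of 1.33, n = 10), the dependent t statistic defined earlier can be sketched as follows. The function name `dependent_t_from_summary` is hypothetical and the null-hypothesis mean difference is assumed to be 0, as in the text.

```python
import math


def dependent_t_from_summary(mean_diff, sd_diff, n, null_mean=0.0):
    """Return (standard error, t) for a dependent samples t test,
    computed from summary statistics of the difference scores."""
    standard_error = sd_diff / math.sqrt(n)
    t_obt = (mean_diff - null_mean) / standard_error
    return standard_error, t_obt


# Summary values from the BDI example in the text.
se, t_obt = dependent_t_from_summary(mean_diff=-1.0, sd_diff=1.33, n=10)
print(f"SE = {se:.2f}, t = {t_obt:.2f}, df = {10 - 1}")
```

The absolute value of the resulting t is then compared with the tabled critical value at df = n - 1.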
  • 56. The next step is the computation of the standard error of the mean. We divide the standard deviation by the square root of the sample size to get the standard error of the mean: $s_{\bar{X}_D} = 1.33 / \sqrt{10} = 1.33 / 3.16 = 0.42$. We plug the values into the formula for $t_{obt}$: $t_{obt} = \frac{\bar{X}_D}{s_{\bar{X}_D}} = \frac{-1}{0.42} = -2.38$. For $\alpha = .05$ and df = n - 1 = 10 - 1 = 9, $t_{crit} = 2.262$ (see a table of critical values for the t test, nondirectional, found in most statistics texts). Because $|t_{obt}| = 2.38$ is greater than the critical value, we reject the null hypothesis at $\alpha = .05$. The effect size index for this test is d and is computed as follows: $d = \frac{\bar{X}_D - \mu_{D0}}{s_D}$, where $s_D$ is the standard deviation of the difference scores. For the depression intervention example, $d = \frac{-1 - 0}{1.33} = \frac{-1}{1.33} = -0.752$,
  • 57. which would be classified as a medium effect. Wilcoxon Matched Pairs, Signed Ranks Test. The Wilcoxon matched pairs, signed ranks test is a nonparametric test for the evaluation of difference scores. The test involves ranking difference scores as to how far they are from 0. The difference score closest to 0 receives the rank of 1, the next score receives the rank of 2, and so on. The ranks for difference scores below 0 are given a negative sign, whereas those above 0 are given a positive sign. The null hypothesis is that the sample comes from a population of difference scores in which the expected difference score is 0. The assumptions for the Wilcoxon matched pairs, signed ranks test are as follows: • Randomness: Sample members must be randomly drawn from the population. • Independence: $X_D$ scores must be independent of each other. • Scaling: The dependent measure ($X_D$ scores) must be ordinal (interval or ratio differences must be converted to ranks). Let us look at the procedure for computing the Wilcoxon matched pairs, signed ranks
  • 58. test statistic. We use the same example as for the t test. The dependent measure is the BDI, a measure of depression. Scores on the BDI are not normally distributed, tending to be positively skewed. Ten clients were randomly selected from clients seen for depression problems at a community center. They were pretested with the BDI, received the treatment, and then were posttested with the same instrument. We compute the difference scores (post - pre) for each individual. We assign a rank to each difference score based on its closeness to 0. Difference scores of 0 do not receive a rank. Tied ranks receive the average rank for the tie. So, if we look at Table 6.9, we see that there is one difference score of 0 that goes unranked. There are five difference scores of either -1 or +1. These cover the first five ranks (1, 2, 3, 4, 5), giving an average rank of 3. There are three difference scores of -2 (and none of +2). These cover the next three ranks (6, 7, 8), giving an average rank of 7. The final score is -3, which is given the rank of 9.
      TABLE 6.9 Computation of the Wilcoxon $T_{obt}$
                                                            Signed Ranks
      ID Number   Pretest   Posttest   Difference   Rank   Positive   Negative
      1           17        16         -1           3                 3
      2           19        18         -1           3                 3
      3           18        15         -3           9                 9
      4           18        17         -1           3                 3
      5           16        16          0
      6           16        17         +1           3      3
      7           18        16         -2           7                 7
      8           21        19         -2           7                 7
      9           18        19         +1           3      3
      10          18        16         -2           7                 7
      NOTE: Sum of ranks for less frequent sign = 6.
  • 59. The next step is to "sign" the ranks. This means to place the rank in either the positive or the negative column in the table, depending on whether the difference score was positive or negative. We then determine which sign (positive or negative) appeared less frequently and add up the ranks for this sign. Because the positive sign appeared only twice (compared to seven times for the negative sign), we add up the ranks in the positive column and obtain 6. This is the test statistic value for the Wilcoxon matched
  • 60. pairs, signed ranks test. The test statistic is called $T_{obt}$. This is an uppercase T and is not the same as the statistic used with the (lowercase) t distribution. There are two other issues with respect to the Wilcoxon $T_{obt}$ that should be addressed: 1. The Wilcoxon $T_{obt}$ is evaluated according to the number of nonzero difference scores. So, we should subtract 1 from the original n for each difference score that is 0 to obtain a corrected n to use for the critical value table. 2. Unlike most other test statistics, the Wilcoxon $T_{obt}$ must be less than or equal to the critical value to reject the null hypothesis. We consult a table of critical values for the Wilcoxon T (see the table of critical values for the Wilcoxon T in any general statistics book) and see whether the result ($T_{obt} = 6$) was significant at $\alpha = .05$. Because there was one difference score equal to 0, the corrected n = 9. The critical value for the Wilcoxon T at n = 9 and $\alpha = .05$ is $T_{crit} = 5$. $T_{obt} = 6$ is not less than or equal to the critical value, so we fail to reject the null hypothesis at $\alpha = .05$. There is no well-accepted post hoc measure of effect size for ordinal tests of related scores. One possible measure would be the proportion of nonoverlapping scores as a measure of effect. Cohen (1988) briefly discusses this measure, called U.
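The hand computation of the Wilcoxon T from Table 6.9 can be reproduced with a short Python sketch. The function name `wilcoxon_signed_rank_t` is hypothetical; the pretest and posttest scores are those listed in Table 6.9. When both signs occur equally often the text gives no rule, so the choice of the positive column in that case is an assumption of this sketch.

```python
# Pretest and posttest BDI scores for the ten clients in Table 6.9.
PRETEST = [17, 19, 18, 18, 16, 16, 18, 21, 18, 18]
POSTTEST = [16, 18, 15, 17, 16, 17, 16, 19, 19, 16]


def wilcoxon_signed_rank_t(pre, post):
    """Return (T, corrected n) following the hand procedure in the text:
    drop zero differences, rank the rest by absolute size, give ties the
    average rank, and sum the ranks of the less frequent sign."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    ordered = sorted(diffs, key=abs)

    # Map each absolute difference to the average rank of its tied run.
    ranks = {}
    position = 0
    while position < len(ordered):
        size = abs(ordered[position])
        run_end = position
        while run_end < len(ordered) and abs(ordered[run_end]) == size:
            run_end += 1
        # The run occupies ranks position+1 .. run_end.
        ranks[size] = (position + 1 + run_end) / 2
        position = run_end

    positive = [ranks[abs(d)] for d in diffs if d > 0]
    negative = [ranks[abs(d)] for d in diffs if d < 0]
    # Assumption: if both signs are equally frequent, use the positive column.
    less_frequent = positive if len(positive) <= len(negative) else negative
    return sum(less_frequent), len(diffs)


t_obt, corrected_n = wilcoxon_signed_rank_t(PRETEST, POSTTEST)
print(t_obt, corrected_n)
```

The returned pair gives the test statistic and the corrected n needed to look up the critical value.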
  • 61. The procedure begins with computing the minimum and maximum scores for each of the two related groups. We choose the least maximum and the greatest minimum. This establishes the end points for the overlap range. We count the number of scores in both groups within this range (including the end points) and divide by the total number of scores. This gives a proportion of overlapping scores. Subtract this number from 1, and we obtain the proportion of nonoverlapping scores. This index ranges from 0 to 1. Lower proportions are indicative of smaller effects, and higher ones are indicative of larger effects. Cohen (1988) calculates equivalents between U and d, which would imply the following definitions of strength of effect: Small effect size: d = .2, U = .15; Medium effect size: d = .5, U = .33; Large effect size: d = .8, U = .47. For the example data, the minimum score for the pretest was 16, and the maximum score was 21. The posttest minimum and maximum scores were 15 and 19, respectively.
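The overlap procedure just described is easy to express in code. The sketch below is illustrative only: `proportion_nonoverlap` is a hypothetical name, and the scores are the pretest and posttest values from Table 6.9.

```python
# Pretest and posttest BDI scores from Table 6.9.
PRETEST = [17, 19, 18, 18, 16, 16, 18, 21, 18, 18]
POSTTEST = [16, 18, 15, 17, 16, 17, 16, 19, 19, 16]


def proportion_nonoverlap(group_a, group_b):
    """Proportion of scores lying outside the range shared by both groups."""
    low = max(min(group_a), min(group_b))    # greatest minimum
    high = min(max(group_a), max(group_b))   # least maximum
    scores = list(group_a) + list(group_b)
    # End points count as inside the overlap range.
    inside = sum(1 for s in scores if low <= s <= high)
    return 1 - inside / len(scores)


u = proportion_nonoverlap(PRETEST, POSTTEST)
print(round(u, 2))
```

If the two groups share no range at all, no score falls inside the (empty) overlap and the function returns 1.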
  • 62. The greatest minimum is 16, and the least maximum is 19. Of 20 total scores, 18 fall within this overlap range. The proportion of overlap is 18/20 (.90). The proportion of nonoverlapping scores is U = 1 - .90 = .10, which would be a small effect. McNemar Change Test. The McNemar change test is used for pre- and postintervention designs where the variables in the analysis are dichotomously scored (e.g., improved vs. not improved, same vs. different, increase vs. decrease). The layout for the McNemar change test is shown in Figure 6.5. Cell A contains the number of individuals who changed from + to -. Cell B contains the number of individuals who received + on both measurements. Cell C contains the number of individuals who received - on both measurements. Cell D contains the number of individuals who changed from - to +. The null hypothesis is expressed as $H_0: P_A = P_D$, where $P_A$ is the proportion of cases shifting from + to - (decreasing) in the null hypothesis population, and $P_D$
  • 63. is the proportion of cases shifting from - to + (increasing) in the null hypothesis population. The assumptions for the McNemar change test are similar to those for the $\chi^2$ test: Randomness: Sample members must be randomly drawn from the population. Independence: Within-group sample scores must be independent of each other (although between-group scores [pre- and posttest scores] will necessarily be dependent). Scaling: The dependent measure (categories) must be nominal. Expected frequencies: No expected frequency within a category should be less than 5. A special case of $\chi^2_{obt}$ is the test statistic for the McNemar change test: $\chi^2_{obt} = \frac{(|f_A - f_D| - 1)^2}{f_A + f_D}$, where $f_A$ = the frequency in Cell A, and $f_D$ = the frequency in Cell D. This is a test statistic with df = 1. For df = 1, we need to include something called the Yates correction for continuity in the equation. This is the -1 that appears in the numerator of the test statistic.
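As a minimal sketch, the McNemar statistic above translates directly into a small Python function. The name `mcnemar_chi_square` is hypothetical; the counts 2 and 11 passed in the usage line are the Cell A and Cell D values from the source's marijuana-use example.

```python
def mcnemar_chi_square(f_a, f_d):
    """McNemar change test statistic with the Yates continuity correction.

    f_a: number of cases that changed from + to - (Cell A)
    f_d: number of cases that changed from - to + (Cell D)
    """
    changed = f_a + f_d
    if changed == 0:
        raise ValueError("no cases changed category; the statistic is undefined")
    return (abs(f_a - f_d) - 1) ** 2 / changed


chi_square = mcnemar_chi_square(2, 11)
print(round(chi_square, 3))
```

Only the two "changer" cells enter the statistic; Cells B and C do not affect it. The result is compared with the tabled critical value of chi-square at df = 1.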
  • 64. Figure 6.5 McNemar Change Test Layout
                     After -    After +
      Before +       A          B
      Before -       C          D
      Let us imagine that we are interested in marijuana use among high school students. We also are interested in change in marijuana use over time. Imagine that we collected survey data on a random sample of ninth-graders in 2007. In 2009, we surveyed the same sample that had been in ninth grade in 2007. We found that 23 of 65 students said in 2007 that they used marijuana during the previous year, as compared to 32 of 65 in 2009. The results are summarized in Table 6.10.
  • 65. TABLE 6.10 Observed and Expected Frequencies for the McNemar Change Test
                            2009 None       2009 Marijuana   Total
      2007 Marijuana        2 (Cell A)      21 (Cell B)      23
      2007 None             31 (Cell C)     11 (Cell D)      42
      Total                 33              32               65
      Cell A represents those students who had used marijuana in 2007 but who had not used it in 2009. Cell B shows the number of students who had used marijuana in both 2007 and 2009. Cell C shows the number of students who did not use marijuana either in 2007 or in 2009. Cell D shows the number of students who did not use marijuana in 2007 but who did use it in 2009. So, the sum of Cells A and D is the total number of students whose patterns of marijuana use changed. The null hypothesis for the McNemar change test is that changing from nonuse to use would be just as likely as changing from use to nonuse. In other words, of the 13 individuals who changed their
  • 66. pattern of marijuana use, we would expect half (6.5) to go from not using to using and the other half (6.5) to go from using to not using if the null hypothesis were true. The calculation of the McNemar change test statistic is shown in Table 6.11. For df = 1 and $\alpha = .05$, $\chi^2_{crit} = 3.84$ (see a table of critical values of $\chi^2$ found in most statistics texts). Because $\chi^2_{obt} = 4.92$, we would reject the null hypothesis at $\alpha = .05$. We would conclude that there was in fact an increase in marijuana use between 2007 and 2009.
      TABLE 6.11 Computation of the McNemar Change Test Statistic
      f_A   f_D   |f_A - f_D| - 1   (|f_A - f_D| - 1)^2   (|f_A - f_D| - 1)^2 / (f_A + f_D)
      2     11    8                 64                    4.9231
      NOTE: $\chi^2_{obt}$ = 4.923.
      The effect size coefficient for a McNemar change test is w and
  • 67. is computed according to the following formula: $w = \sqrt{\chi^2_{obt} / N}$. For the high school survey, $w = \sqrt{4.923 / 65} = \sqrt{0.0757} = 0.2752$, which would be classified as a medium effect. Hypothesis Tests for Two Independent Samples. These are tests in which a sample is randomly drawn and individuals from the sample are randomly assigned to one of two experimental conditions. We investigate three examples of two independent samples tests: 1. Independent samples (group) t test (interval or ratio scale) 2. Wilcoxon/Mann-Whitney (W/M-W) test (ordinal scale) 3. $\chi^2$ test of independence (2 x k) (nominal scale) Independent Samples t Test. This sometimes is called the group t test. It is a test of means whose null hypothesis is formally stated as follows: $H_0: \mu_1 = \mu_2$. Following are the assumptions of the independent t test: Randomness: Sample members must be randomly drawn from the population and randomly assigned to one of the two groups. Independence: Scores must be independent of each other.
  • 68. Scaling: The dependent measure must be interval or ratio. Normal distribution: The populations from which the individuals in the samples were drawn must be normally distributed. Homogeneity of variances ($\sigma_1^2 = \sigma_2^2$): The samples must be drawn from populations whose variances are equal. Equality of sample sizes ($n_1 = n_2$): The samples must be of the same size. As before, these assumptions are listed more or less in order of importance. The first three assumptions are the "fatal" assumptions. Violation of the normality assumption will make for less accurate p values. However, unless the population distribution is markedly different from a normal distribution, the errors will tend to be slight. Still, even though the error is slight, the nonparametric W/M-W test probably will be more accurate when the normality assumption is violated. The independent groups t test also is fairly robust with respect to violation of the homogeneity of variances assumption and the equal sample size assumption. A problem may arise when both of these assumptions are violated at the same time.
  • 69. If the smaller variance is in the smaller sample, then the probability of a Type II error (not detecting an existing difference) increases. If the larger variance is in the smaller sample, then the probability of a Type I error (rejecting the null hypothesis when it is true) increases. If there is no association between sample size and variance, then violation of each of these assumptions is not particularly problematic. There may be fairly substantial discrepancies between sample sizes without much effect on the accuracy of our p estimates. Similarly, if every other assumption is met, then a slight difference in variances will not have a large effect on probability estimates. The t statistic for the independent t test is the difference between the sample means divided by the standard error of the difference between means, or $t_{obt} = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$. Because two sample means are computed, 2 degrees of freedom are lost: $df = n_1 + n_2 - 2$, where $n_1$ = number of scores for the first group, and
  • 70. $n_2$ = number of scores for the second group. Following is an example of the use of the independent t test statistic. We wish to see whether there is a difference in level of social activity in children depending on whether they are in after-school care or home care. Because more children attended the after-school program, a proportionate stratified sample of 16 children in after-school care (Group 1) and 14 children in home care (Group 2) was drawn. The dependent measure was a score on a social activity scale in which lower scores represent less social activity and higher scores represent more social activity. We evaluate this with an independent t test. The first step in calculating $t_{obt}$ is to compute the sample mean for each group. The next step is to compute the standard error of the mean. However, the procedure for doing this is a little different from that used before. As you might recall, the standard error of the mean is the standard deviation divided by the square root of the sample size: $s_{\bar{X}} = \frac{s}{\sqrt{n}} = \sqrt{\frac{s^2}{n}}$. This also is equivalent to the square root of the variance times the inverse of the sample size (1/n).
  • 71. Unfortunately, we cannot use this formula for the standard error of the mean. It is the standard error for a single sample. Because we have two samples in an independent groups test, the formula has to be altered a bit. The first difference is in the formula for the variance. The variance is the sum of squares divided by the degrees of freedom. It is the same here except that we have two sums of squares (one for Group 1 and one for Group 2), and our degrees of freedom are $n_1 + n_2 - 2$. This gives us the following equation: $s_p^2 = \frac{SS_1 + SS_2}{n_1 + n_2 - 2}$, where $s_p^2$ is the pooled estimate of the variance based on two groups, $SS_1$ is the sum of squares for Group 1, $SS_2$ is the sum of squares for Group 2, $n_1$ is the number of scores in Group 1, and
  • 72. $n_2$ is the number of scores in Group 2. Because there are two groups, we do not multiply $s_p^2$ times (1/n); rather, we multiply it by ($1/n_1 + 1/n_2$). We take the square root of this and obtain the pooled standard error of the mean: $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$. The means and sums of squares for our example are presented in Table 6.12. Now, let us try computing $t_{obt}$.
      TABLE 6.12 Group Statistics
      Group               Mean     Sum of Squares   n
      After-school care   27.88    4330.40          16
      Home care           21.36    1707.16          14
      First, we compute the pooled standard error of the mean (also called the standard error of the mean difference). We begin by calculating the pooled variance: $s_p^2 = \frac{SS_1 + SS_2}{n_1 + n_2 - 2} = \frac{4330.40 + 1707.16}{16 + 14 - 2} = \frac{6037.56}{28} = 215.63$. From the estimate for the pooled variance, we may calculate the standard error of the mean difference: $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = \sqrt{215.63 \left( \frac{1}{16} + \frac{1}{14} \right)} = \sqrt{28.88} = 5.37$. We calculate $t_{obt}$: $t_{obt} = \frac{27.88 - 21.36}{5.37} = \frac{6.52}{5.37} = 1.213$.
  • 73. For $\alpha = .05$ and $df = n_1 + n_2 - 2 = 16 + 14 - 2 = 28$, $t_{crit} = 2.048$. Because $|t_{obt}| = 1.213$ is less than the critical value, we fail to reject the null hypothesis at $\alpha = .05$. There are two post hoc effect size measures for an independent t test. The first of these (d) already has been discussed: $d = \frac{\bar{X}_1 - \bar{X}_2}{s_p}$. Note that the numerator is the difference between the two sample means and that the denominator is the pooled estimate of the standard deviation. The pooled standard deviation is the square root of the pooled variance that we calculated earlier: $s_p = \sqrt{s_p^2} = \sqrt{215.63} = 14.68$. The effect size for the example would be $d = \frac{27.88 - 21.36}{14.68} = \frac{6.52}{14.68} = 0.44$,
  • 74. which would be classified as a small to medium effect size. The other measure is $\eta^2$ (eta-squared). $\eta^2$ is the proportion of variance explained (PVE). This is equivalent to the squared point-biserial correlation coefficient and is computed by $\eta^2 = \frac{t_{obt}^2}{t_{obt}^2 + df}$. We were comparing social activity in children in after-school care versus those in home care. Children in after-school care scored higher on social activity than did children in home care. The difference was not statistically significant for our chosen $\alpha = .05$. $t_{obt}$ was 1.213 with $df = 28$. Putting these numbers in the formula, we obtain the following: $\eta^2 = \frac{(1.213)^2}{(1.213)^2 + 28} = \frac{1.471}{29.471} = 0.0499$. So, a little less than 5% of the variability in social activity among the children was potentially explained by whether they were in after-school care
  • 75. or home care. Wilcoxon/Mann-Whitney Test. Statistics texts used to refer to this test as the Mann-Whitney test. Recently, the name of Wilcoxon has been added to it. The reason that Wilcoxon's name has been added is that he developed the test first and published it first (Wilcoxon, 1945). Unfortunately, more folks noticed the article published by Mann and Whitney (1947) 2 years later. The W/M-W test is a nonparametric test that involves initially treating both samples as one group and ranking scores from least to most. After this is done, the frequencies of low and high ranks between groups are compared. The assumptions of the W/M-W test are as follows: Randomness: Sample members must be randomly drawn from the population of interest and randomly assigned to one of the two groups. Independence: Scores must be independent of each other. Scaling: The dependent measure must be ordinal (interval or ratio scores must be converted to ranks). When the assumptions of the t test are met, the t test will be slightly more powerful
  • 76. than the W/M-W test. However, if the distribution of population scores is even slightly different from normal, then the W/M-W test may be the more powerful test. Let us look at the procedure for computing the W/M-W test statistic. We use the same example as we did for the independent t test. We evaluated level of social activity in children in after-school care and in home care. The dependent measure was a score on a social activity scale in which lower scores represent less social activity and higher scores represent more social activity. The first step in carrying out the W/M-W test is to assign ranks to the scores without respect to which group individuals were in. The rank of 1 goes to the highest score, the rank of 2 to the next highest score, and so on. Tied scores receive the average rank. We then sum the ranks within each group. The summed ranks are called $W_1$ for Group 1 and $W_2$ for Group 2 and are found in Table 6.13.
    TABLE 6.13 Summed Ranks for the Wilcoxon/Mann-Whitney Test
    Group               n            Summed ranks
    After-school care   n_1 = 16     W_1 = 218
    Home care           n_2 = 14     W_2 = 247
  • 77. The test statistic for the W/M-W test is $U_{obt}$. We begin by calculating U statistics for each group according to the following equations: $U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - W_1$ and $U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - W_2$. For our example, $U_1 = (16)(14) + \frac{(16)(16 + 1)}{2} - 218 = 224 + 136 - 218 = 142$ and $U_2 = (16)(14) + \frac{(14)(14 + 1)}{2} - 247 = 224 + 105 - 247 = 82$. As a check, $U_1 + U_2 = 142 + 82 = 224 = n_1 n_2$.
  • 78. We choose the smaller U as $U_{obt}$. In this instance, $U_{obt} = U_2 = 82$. $U_{obt}$ must be less than or equal to the critical value to reject the null hypothesis. The critical value for the W/M-W U at $n_1 = 16$, $n_2 = 14$, and $\alpha = .05$ is $U_{crit} = 64$. $U_{obt} = 82$ is not less than or equal to the critical value, so we fail to reject the null hypothesis at $\alpha = .05$. As before, there is no well-established effect size measure for the W/M-W test. The U measure of nonoverlap probably would be the best bet. For our example data, the minimum and maximum for the after-school care group were 2 and 55, whereas they were 7 and 40 for the home care group. The greatest minimum is 7, and the least maximum is 40. All 14 scores in the home care group are within
  • 79. the overlap range, and 12 of 16 scores in the after-school care group are in the overlap range. This gives us a proportion of overlap of 26/30 = .867. The proportion of nonoverlap is U = 1 - .867 = .133. This would be a small effect. $\chi^2$ Test of Independence (2 x k). The assumptions for the $\chi^2$ test of independence are as follows: Randomness: Sample members must be randomly drawn from the population. Independence: Sample scores must be independent of each other. One implication of this is that categories must be mutually exclusive (no case may appear in more than one category). Scaling: The dependent measure (categories) must be nominal. Expected frequencies: No expected frequency within a category should be less than 1, and no more than 20% of the expected frequencies should be less than 5. As with all tests of the null hypothesis, the $\chi^2$ test begins with the assumptions of randomness and independence. Deriving from these assumptions is the requirement that the categories in the cross-tabulation be mutually exclusive and exhaustive. Mutually exclusive means that an individual may not be in more than one category per variable. Exhaustive means that all possible categories are
  • 80. covered. Let us imagine that we are interested in marijuana use among high school students and specifically whether there are any differences in such use between 9th- and 12th-graders in our school district. We conduct a proportionate stratified sample in which we randomly sample sixty-five 9th-graders and fifty-five 12th-graders from all students in the district. The students are surveyed on their use of drugs over the past year under conditions guaranteeing confidentiality of response. Table 6.14 depicts reported marijuana use for the students in the sample over the past year.
    TABLE 6.14 Marijuana Use
    Marijuana Use   9th Grade   12th Grade   Total
    None            42          33           75
    Marijuana       23          22           45
    Total           65          55           120
  • 81. A higher proportion of 12th-graders than 9th-graders in this sample used marijuana at least once during the past year. The question we are interested in is whether it is likely that such a sample could have come from a population in which the proportions of 9th- and 12th-graders using marijuana were identical. The usual test used to evaluate such data is the $\chi^2$ test of independence. The $\chi^2$ test evaluates the likelihood that a perceived relationship between proportions in categories (called being dependent) could have come from a population in which no such relationship existed (called independence). The null hypothesis for this example would be that the same proportion of 9th-graders as 12th-graders used marijuana during the past year. The null hypothesis values for this test are called the expected frequencies. These expected frequencies for marijuana are calculated so as to be proportionately equal for both 9th- and 12th-graders.
  • 82. Because 45 of 120 of the total sample (9th- and 12th-graders) used marijuana during the past year, the proportion for the total sample is 45/120 = .375. The expected frequency of marijuana use for the sixty-five 9th-graders would be .375(65) = 24.375. The expected marijuana use for the fifty-five 12th-graders would be .375(55) = 20.625. Table 6.15 shows the expected frequencies in parentheses. The $\chi^2$ test evaluates the likelihood of the observed frequency departing from the expected frequency. The null hypothesis is $H_0: P_{o_k} - P_{e_k} = 0$, where $P_{e_k}$ is the proportion of cases within category k in the null hypothesis population (expected; in this case, this is the expected proportion of students in each of the two grade levels [9th and 12th] who fell into one or the other use category [marijuana use or no marijuana use]); and $P_{o_k}$ is the proportion of cases within category k drawn from the actual population (observed; in this case, this is the observed [or obtained] proportion of students in each of the two grade levels [9th and 12th] who fell into one or the other use category [marijuana use or no marijuana use]). The $\chi^2_{obt}$ test statistic is $\chi^2_{obt} = \sum \frac{(f_o - f_e)^2}{f_e}$, where $f_o$ is an observed frequency and $f_e$ is the corresponding expected frequency. Degrees of freedom for a $\chi^2$ test of independence are computed
  • 83. by multiplying the number of rows minus 1 times the number of columns minus 1, or df = (Rows - 1)(Columns - 1).
    TABLE 6.15 Observed and Expected Frequencies for Marijuana Use
    Marijuana Use   9th Grade     12th Grade    Total
    None            42 (40.625)   33 (34.375)   75
    Marijuana       23 (24.375)   22 (20.625)   45
    Total           65            55            120
    NOTE: Expected frequencies are in parentheses.
  • 84. For our example, this would be df = (2 - 1)(2 - 1) = (1)(1) = 1. Recall from our discussion of the McNemar change test that we include the Yates correction for continuity in the formula when df = 1. The equation for the corrected test statistic is as follows: $\chi^2_{obt} = \sum \frac{(|f_o - f_e| - 0.5)^2}{f_e}$. The form of the equation tells us to subtract the expected score from the observed score and take the absolute value of the difference (make the difference positive). Then, subtract 0.5 from the absolute difference ($|f_o - f_e| - 0.5$) and square the result. Next, divide by the expected score. This is repeated for each observed and expected score pair. When we are finished, we sum the answers and obtain the corrected $\chi^2_{obt}$ test statistic.
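The corrected statistic described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the chapter's own procedure: the function names (`expected_frequencies`, `yates_chi_square`) are made up for this example, and the counts are the observed frequencies from the marijuana-use cross-tabulation (Table 6.14).

```python
# Observed counts from Table 6.14: rows = use category, columns = grade.
observed = [
    [42, 33],  # no marijuana use: 9th grade, 12th grade
    [23, 22],  # marijuana use:    9th grade, 12th grade
]


def expected_frequencies(table):
    """Expected count for each cell = row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def yates_chi_square(table):
    """Sum of (|f_o - f_e| - 0.5)^2 / f_e over every cell (Yates correction)."""
    expected = expected_frequencies(table)
    return sum(
        (abs(f_o - f_e) - 0.5) ** 2 / f_e
        for obs_row, exp_row in zip(table, expected)
        for f_o, f_e in zip(obs_row, exp_row)
    )


expected = expected_frequencies(observed)
chi_square_obt = yates_chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)
print(round(chi_square_obt, 4), df)
```

Because the sketch carries full precision through every cell, its result may differ in the third decimal place from a hand calculation that rounds each cell's contribution before summing.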
  • 85. The reader might have noticed that the correction for the McNemar change test was 1.0, whereas the correction for the $\chi^2$ test of independence (and the goodness-of-fit test) was 0.5. I will not go into any detail beyond saying that this is because the McNemar change test uses only half of the available cross-tabulation cells (two of four) to compute its $\chi^2_{obt}$, whereas all cells are used to compute $\chi^2_{obt}$ in the independence and goodness-of-fit tests. Table 6.16 shows how to work out the marijuana survey data. For df = 1 and $\alpha = .05$, the critical value for $\chi^2$ is 3.84. Our calculated value ($\chi^2_{obt}$) was 0.109. Because the obtained (calculated) value did not exceed the critical value, we would not reject the null hypothesis at $\alpha = .05$. As before, the effect size measure is w, which is computed as a post hoc measure by $w = \sqrt{\chi^2 / N}$. For a 2 x 2 table, w is equal to the absolute value of $\phi$ (phi), which is a true correlation coefficient. If we square w, then we obtain $\phi^2$, which is the proportion of variance explained (PVE).
    TABLE 6.16 Computation of $\chi^2_{obt}$
    Observed (f_o)   Expected (f_e)   |f_o - f_e| - 0.5   (|f_o - f_e| - 0.5)^2   (|f_o - f_e| - 0.5)^2 / f_e
    42               40.625           0.875               0.765625                0.019
    33               34.375           0.875               0.765625                0.022
    23               24.375           0.875               0.765625                0.031
    22               20.625           0.875               0.765625                0.037
    NOTE: $\chi^2_{obt}$ = 0.019 + 0.022 + 0.031 + 0.037 = 0.109.
  • 86. For our example, $w = \sqrt{0.109 / 120} = \sqrt{0.0009083} = .0301$ and $w^2 = PVE = .0009$. This is an extremely small effect size. For a 2 x k tabulation with more than two categories, we cannot convert w to PVE.
    Hypothesis Tests for k > 2 Independent Samples
  • 87. Imagine that we were interested in ageist attitudes among social workers. Specifically, we are interested in whether there are any differences in the magnitudes of ageist attitudes among (a) hospital social workers, (b) nursing home social workers, and (c) adult protective services social workers. We could conduct independent group tests among all possible pairings: hospital (a) with nursing home (b), hospital (a) with protective services (c), and nursing home (b) with protective services (c). This gives us three tests. When we conduct one test at the $\alpha = .05$ level, we have a .05 chance of committing a Type I error (rejecting the null hypothesis when it is true) and a .95 chance of making a correct decision (not rejecting the null hypothesis when it is true). If we conduct three tests at $\alpha = .05$, our chance of committing at least one Type I error increases to about .15 (the precise probability is $1 - (.95)^3 = .142625$). So, we actually are testing at around $\alpha = .15$. As the number of comparisons increases, the likelihood of rejecting the null hypothesis when it is true increases. We are "capitalizing on chance." One way of dealing with capitalization on chance would be to use a stricter alpha level. For three comparisons, we might conduct our tests at $\alpha = .05/3 = .0167$.
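The arithmetic behind "capitalizing on chance" can be checked with a short Python sketch. The function names are hypothetical, and the calculation assumes the tests are independent of one another, which is the simplification behind the .142625 figure quoted above.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


def per_test_alpha(alpha, n_tests):
    """Stricter per-test alpha: the overall alpha divided by the number of tests."""
    return alpha / n_tests


rate = familywise_error_rate(0.05, 3)
strict_alpha = per_test_alpha(0.05, 3)

print(round(rate, 6), round(strict_alpha, 4))
```

Calling `familywise_error_rate(0.05, 6)` in the same way shows how quickly the error rate grows when a fourth group raises the number of pairings to six.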
  • 88. Unfortunately, if we do this, then we will reduce the power ($1 - \beta$) of our test to detect a possible existing effect. However, there are tests that allow one to detect whether there are any differences among groups without compromising power. This is done by simultaneously evaluating all groups for any differences. If no differences are detected, then we fail to reject the null hypothesis and stop. No further tests are conducted because we already have our answer. The differences among all groups are not sufficiently large that we can reject the notion that all of the samples come from the same population. If significant differences are detected, then further pair comparisons are conducted to determine which pairs are different. The screening tests do not tell us whether only one pair, two pairs, or all pairs show statistically significant differences. Screening tests show only that there are some differences among all possible comparisons. If we conduct our screening test at $\alpha = .05$, then we will carry out the pair comparisons when the null hypothesis is true 1 out of 20 times (commit a Type I error). By conducting the initial overall screening in a single test, we protect against the compounding of the alpha level brought on by multiple comparisons. We look at three examples of screening tests for k > 2 independent samples:
  • 89. 1. One-way analysis of variance (ANOVA) (interval or ratio scale)
    2. Kruskal-Wallis (K-W) test (ordinal scale)
    3. $\chi^2$ test of independence (k x k) (nominal scale)
    One-Way Analysis of Variance. The ANOVA is a test of means. The null hypothesis is $H_0: \mu_1 = \mu_2 = \dots = \mu_k$, where k is the number of population means being estimated. If all of the means are equal, then it follows that the variance of the means is 0, or $H_0: \sigma_\mu^2 = 0$. The test statistic used in ANOVA is called F and is calculated as follows: $F = \frac{n s_{\bar{X}}^2}{s_p^2}$, where the numerator is the variance of the sample means multiplied by the sample size, and the denominator is a pooled estimate of the score variances within the samples. The assumptions underlying one-way ANOVA are as follows: Randomness: Sample members must be randomly drawn from the population and randomly
  • 90. assigned to one of the k groups. Independence: Scores must be independent of each other. Scaling: The dependent measure must be interval or ratio. Normal distribution: The populations from which the individuals in the samples were drawn must be normally distributed. Homogeneity of variances ($\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$): The samples must be drawn from populations whose variances are equal. Equality of sample sizes ($n_1 = n_2 = \dots = n_k$): The samples must be of the same size. ANOVA involves taking the variability among scores and determining which is variability due to membership in a particular group (variability associated with group means or between-group variance) and which is variability associated with unexplained fluctuations (within-group variance). The total variability of scores is divided into one component representing the variability of treatment group means around an overall mean (sometimes called a grand mean) and another component representing the variability of group scores around their own individual group means. The variability of group means around the grand mean is called between-group variance. The variability of individual scores around their own group means is called within-group variance. This division is represented by the
  • 91. following equation: $(X - \bar{\bar{X}}) = (X - \bar{X}) + (\bar{X} - \bar{\bar{X}})$, that is, Total = Within + Between. The X with two bars represents the grand mean, which is the mean of all scores without respect to which group they are in. X is a particular score, and the X with one bar is the mean of the group to which that score belongs. This equation illustrates that the deviation of the particular score from the grand mean is the sum of the deviation of the score from its group mean and the deviation of the group mean from the grand mean. This might be a little clearer if we look at a simple data set. Let us take the example about ageist attitudes among hospital social workers (Group 1), nursing home social workers (Group 2), and adult protective services social workers (Group 3). The dependent measure quantifies ageist attitudes (higher scores represent more ageist sentiment). There are k = 3 groups, with each containing n = 4 scores. The total number of scores is N = 12. The group means are 3 (Group 1), 5 (Group 2), and 9 (Group 3), and the grand mean is 5.67. There are three types of sum of squares calculated in ANOVA.
  • 92. The formulas for the sums of squares are derived from the deviation score equations. $SS_{total}$ is calculated by subtracting the grand mean from each score, squaring the differences, and adding up (summing) the squared differences: $SS_{total} = \sum (X - \bar{\bar{X}})^2$. $SS_{within}$ is calculated by subtracting the group mean from each score within a group, squaring the differences, and adding up (summing) the squared differences for each group. This gives us three sums of squares: $SS_{Group 1}$, $SS_{Group 2}$, and $SS_{Group 3}$. These are added up to give us $SS_{within}$: $SS_{within} = \sum (X - \bar{X}_1)^2 + \sum (X - \bar{X}_2)^2 + \sum (X - \bar{X}_3)^2$. $SS_{between}$ is calculated by subtracting the grand mean from each group mean, squaring the differences, and adding up (summing) the squared differences. Then, we multiply the total by the sample size. This is because this sum of squares needs to be weighted. Whereas N = 12 scores went to make up $SS_{total}$, and (k)(n) = (3)(4) = 12 scores went to make up $SS_{within}$, only the k = 3 group means went to make up $SS_{between}$. We multiply by n = 4 so that $SS_{between}$ will have the same weight as the other two sums of
  • 93. squares: $SS_{between} = n \sum (\bar{X} - \bar{\bar{X}})^2$. The sums of squares are as follows: $SS_{within} = 20 + 20 + 20 = 60$, $SS_{between} = (4)(18.667) = 74.667$, and $SS_{total} = 134.667$. The total sum of squares ($SS_{total}$) is the sum of the within-group sum of squares ($SS_{within}$) and the between-group sum of squares ($SS_{between}$): $SS_{total} = SS_{within} + SS_{between}$, or 134.667 = 60.00 + 74.667. Each of these sums of squares is a component of a different variance. In ANOVA jargon, a variance is called a mean square. Each particular mean square (variance) has its own degrees of freedom. Because the total sum of squares ($SS_{total}$) involves the variability of all scores around one grand mean, the degrees of freedom are N - 1. The within-groups sum of squares ($SS_{within}$) involves the variability of all scores within groups around k group means, where
  • 94. k is the number of groups. So, the within-groups degrees of freedom are N - k. The between-groups sum of squares ($SS_{between}$) involves the variability of k group means around the grand mean. So, the between-groups degrees of freedom are k - 1. Because a variance (mean square) is a sum of squares divided by degrees of freedom, the formula for a mean square would be $MS = SS / df$. Two mean squares are used to calculate the $F_{obt}$ statistic: $MS_{within}$ and $MS_{between}$. Their specific formulas are as follows: $MS_{between} = SS_{between} / df_{between}$ and $MS_{within} = SS_{within} / df_{within}$. There are k = 3 groups, so $df_{between} = k - 1 = 3 - 1 = 2$. We may now compute $MS_{between} = 74.667 / 2 = 37.333$. There are a total of N = 12 scores within k = 3 groups, so $df_{within} = 12 - 3 = 9$ and $MS_{within} = 60 / 9 = 6.667$. These are the two variances used to make up the F ratio ($F_{obt}$): $MS_{between}$ and $MS_{within}$. The formula for $F_{obt}$ is $F_{obt} = \frac{MS_{between}}{MS_{within}}$. If we plug in the values from our example, then we obtain $F_{obt} = \frac{MS_{between}}{MS_{within}} = \frac{37.333}{6.667} = 5.60$.
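The partition of the sums of squares and the F ratio can be traced end to end with a small Python sketch. The raw scores below are hypothetical: the chapter reports only the group means (3, 5, 9) and the within-group sums of squares (20 each), so the scores were chosen purely to reproduce those summary values. The function name `one_way_anova` and the group labels are likewise made up for this illustration.

```python
# Hypothetical raw scores, chosen only so that the group means (3, 5, 9) and
# the within-group sums of squares (20 each) match the chapter's summary values.
groups = {
    "hospital": [0, 2, 4, 6],
    "nursing_home": [2, 4, 6, 8],
    "protective_services": [6, 8, 10, 12],
}


def mean(values):
    return sum(values) / len(values)


def one_way_anova(groups):
    """Return sums of squares, mean squares, and F for equal or unequal groups."""
    scores = [x for g in groups.values() for x in g]
    grand_mean = mean(scores)

    # Within: each score's deviation from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    # Between: each group mean's deviation from the grand mean, weighted by n.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Total: each score's deviation from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in scores)

    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_within": ss_within,
        "ss_between": ss_between,
        "ss_total": ss_total,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_obt": ms_between / ms_within,
    }


result = one_way_anova(groups)
for name, value in result.items():
    print(name, round(value, 3))
```

Any other set of scores with the same group means and within-group sums of squares would give the same mean squares and the same F ratio, which is why the hypothetical data are sufficient for checking the arithmetic.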
  • 95. This is a bit confusing when presented in bits and pieces. The ANOVA summary table is a way of presenting the information about the sums of squares, degrees of freedom, mean squares, and F statistics in a more easily understood fashion. Table 6.17 uses the example data. Once we have computed the $F_{obt}$, it is compared to a critical F. Because two variances were used to calculate our $F_{obt}$, there are two types of degrees of freedom associated with it: numerator degrees of freedom (between groups) and denominator degrees of freedom (within groups). These are used either to look up values in a table of the F distribution or by computer programs to compute p values. For our example, the numerator degrees of freedom are df = 2 because 2 degrees of freedom were used in the calculation of $MS_{between}$. The denominator degrees of freedom are df = 9 because 9 degrees of freedom were used in the calculation of $MS_{within}$.
    TABLE 6.17 ANOVA Summary Table
    Source    Sum of Squares   Degrees of Freedom   Mean Square          F_obt
    Between   74.667           3 - 1 = 2            74.667/2 = 37.333    37.333/6.667 = 5.60
    Within    60.00            12 - 3 = 9           60.00/9 = 6.667
    Total     134.667          12 - 1 = 11
  • 96. The critical value for F at 2 and 9 degrees of freedom is $F_{crit} = 4.26$. Because $F_{obt} = 5.60$ is greater than the critical value, we reject the null hypothesis at $\alpha = .05$. Based on these findings, it is likely that at least one pair of means come from different populations. Because we already have screened out other opportunities to commit Type I error, further testing would not be capitalizing on chance. Thus, we may carry out the following pair comparisons: Group 1 versus Group 2, Group 1 versus Group 3, and Group 2 versus Group 3. The individual pair comparisons may be carried out using any of a number of multiple comparison tests. One of the more frequently used is the least significant difference (LSD) test. The LSD test is a variant on the t test. However, the
  • 97. standard error of the mean is calculated from the within-groups mean square (variance) from the ANOVA: $s_{\bar{X}_i - \bar{X}_j} = \sqrt{MS_{within} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$, where $n_i$ is the number of scores in Group i, and $n_j$ is the number of scores in Group j. If the group ns are equal, then this becomes $s_{\bar{X}_i - \bar{X}_j} = \sqrt{\frac{2 MS_{within}}{n}}$. For our example, $s_{\bar{X}_i - \bar{X}_j} = \sqrt{(2)(6.667)/4} = \sqrt{3.333} = 1.826$. We now may carry out our comparisons, evaluating t at df = N - k = 12 - 3 = 9 (Figure 6.6). Only the comparison of Group 1 with Group 3 exceeds the critical value, so only in that instance do we reject the null hypothesis at $\alpha = .05$.
    Figure 6.6 Multiple Comparisons
    Comparison                                                    t_obt                       df, alpha         t_crit   Decision
    Hospital (Group 1) vs. Nursing Home (Group 2)                 (3 - 5)/1.826 = -1.095      df = 9, a = .05   2.262    Fail to reject H_0
    Hospital (Group 1) vs. Adult Protective Services (Group 3)    (3 - 9)/1.826 = -3.286      df = 9, a = .05   2.262    Reject H_0
    Nursing Home (Group 2) vs. Adult Protective Services (Group 3) (5 - 9)/1.826 = -2.191     df = 9, a = .05   2.262    Fail to reject H_0
  • 98. There are a number of measures for effect size for ANOVA. For the sake of simplicity, we deal with two: Cohen's (1988) f and $\eta^2$. The f effect size measure is equal to the standard deviation of the sample means divided by the pooled within-group standard deviation. It ranges from a minimum of 0 to an indefinitely large upper limit. It may be estimated from $F_{obt}$ by using the following formula: $f = \sqrt{F_{obt} / n}$. $\eta^2$ was discussed earlier and defined as a proportion of variance explained. It is calculated by the following formula: $\eta^2 = \frac{SS_{between}}{SS_{total}}$. It also may be calculated from an $F_{obt}$: $\eta^2 = \frac{(df_{between})(F_{obt})}{(df_{between})(F_{obt}) + df_{within}}$. Cohen (1988) categorizes these effect sizes into small, medium, and large categories. The criteria for each are as follows:
    Small effect size:    f = .10   $\eta^2$ = .01
    Medium effect size:   f = .25   $\eta^2$ = .06
    Large effect size:    f = .40   $\eta^2$ = .14
  • 99. Using the example data, $\eta^2$ is $\eta^2 = \frac{SS_{between}}{SS_{total}} = \frac{74.667}{134.667} = 0.554$, which is a very large effect. Kruskal-Wallis Test. The K-W test is the k > 2 groups equivalent of the W/M-W test. The test involves initially treating all samples as one group and ranking scores from least to most. After this is done, the frequencies of low and high ranks among groups are compared. The assumptions of the K-W test are as follows: Randomness: Sample members must be randomly drawn from the population of interest and randomly assigned to one of the k groups. Independence: Scores must be independent of each other. Scaling: The dependent measure must be ordinal (interval or ratio scores must be converted to ranks).
  • 100. When the assumptions of ANOVA are met, the analysis of variance will be slightly more powerful than the K-W test. However, if the distribution of population scores is not normal and/or the population variances are not equal, then the K-W test might be the more powerful test. The K-W test is a screening test. If there is no significant difference found, then we stop testing. If a significant difference is found, then we proceed to test individual pairs with the W/M-W test. Our example involves the evaluation of three intervention techniques being used with clients who wish to stop making negative self-statements: (a) self-disputation, (b) thought stopping, and (c) identifying the source of the negative statement (insight). A total of 27 clients with this concern were randomly selected and assigned to one of the three intervention conditions. On the 28th day of the intervention, each client counted the number of negative self-statements that he or she had made. The procedure for the K-W test is similar to that for the W/M-W test. We begin by assigning ranks to the scores without regard to which group individuals were in. We then sum the ranks within each group. The summed ranks are called $W_1$ for Group 1, $W_2$ for Group 2, and $W_3$ for Group 3 (Table 6.18).
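The ranking step can be sketched in Python as follows. The scores are hypothetical (three tiny groups rather than the 27 clients in the example, whose raw counts are not reproduced here), the function names are made up for the illustration, and ranks are assigned from least (rank 1) to most, with tied scores sharing the average of the ranks they occupy.

```python
def average_ranks(scores):
    """Rank scores from least (rank 1) to most; tied scores share the average rank."""
    ordered = sorted(scores)
    # Collect the 1-based positions that each distinct value occupies when sorted.
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[value]) / len(positions[value]) for value in scores]


def summed_ranks(groups):
    """Pool all groups, rank the pooled scores, then sum the ranks within each group."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    sums, start = [], 0
    for g in groups:
        sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    return sums


# Hypothetical counts of negative self-statements for three small groups.
hypothetical_groups = [[4, 7, 7], [2, 9, 12], [1, 3, 5]]
w_sums = summed_ranks(hypothetical_groups)
print(w_sums)
```

A quick check on any such calculation is that the summed ranks across all groups must equal N(N + 1)/2, where N is the total number of scores.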