This document discusses reliability, validity, generalizability, and the use of multi-item scales in research. It describes how to evaluate scales for internal consistency reliability using Cronbach's alpha, test-retest reliability, and construct validity through convergent and discriminant validity testing. The document provides an example of how to develop a multi-item scale and assess its psychometric properties using statistical tools like structural equation modeling in Amos.
4. How to use a questionnaire from published work
• Appendix with items
• Methodology section
5. Existing multi-item scales
• Used by many
• Reliability and validity may be known
• Good starting block
• Basis to compare / contrast results
6. Development of a Multi-item Scale
(Doing it the HARD way!! See Malhotra & Birks, 2007)
• Develop Theory
• Generate Initial Pool of Items: Theory, Secondary Data, and Qualitative Research
• Select a Reduced Set of Items Based on Qualitative Judgment
• Collect Data from a Large Pretest Sample
• Statistical Analysis
• Develop Purified Scale
• Collect More Data from a Different Sample
• Evaluate Scale Reliability, Validity, and Generalizability
• Final Scale
7. Example of Scale Development
• See Richins & Dawson (1992) “A Consumer
Values Orientation for Materialism and its
Measurement: Scale Development and
Validation,” Journal of Consumer Research, 19
(December), 303-316.
• Materialism scale (7 items)
– Marketing Scales Handbook (Vol IV) p. 352.
1. It is important to me to have really nice things.
2. I would like to be rich enough to buy anything I want.
3. I'd be happier if I could afford to buy more things.
4. ......
• Note: published scales are not always perfect!!!
8. Scale Evaluation
(See Malhotra & Birks, 2007)
• Reliability
– Test/Retest
– Alternative Forms
– Internal Consistency
• Validity
– Content
– Criterion
– Construct: Convergent, Discriminant, Nomological
• Generalizability
9. Reliability & Validity
• Reliability - extent to which a measuring
procedure yields consistent results on
repeated administrations of the scale
• Validity - degree to which a measuring
procedure accurately reflects, assesses,
or captures the specific concept that the
researcher is attempting to measure
Reliable ≠ Valid
10. Reliability
• Internal consistency reliability
DO THE ITEMS IN THE SCALE GEL WELL TOGETHER?
• Split-half reliability: the items on the scale are divided
into two halves and the resulting half scores are
correlated
• Cronbach alpha (α)
– average of all possible 'split-half' coefficients resulting
from different ways of splitting the scale items
– value typically ranges from 0 to 1
– α < 0.6 indicates unsatisfactory internal consistency reliability
(see Malhotra & Birks, 2007, p.358)
– Note: alpha tends to increase with an increase in the number of
items in scale
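To make the two internal-consistency measures above concrete, here is a minimal sketch in Python with NumPy, assuming a complete respondents × items matrix with no missing values. The function names are hypothetical, and the split-half function uses just one particular split (odd vs even items) stepped up with the Spearman-Brown formula, whereas alpha reflects all possible splits. The last helper uses the standardized-alpha formula to illustrate why alpha tends to rise as items are added. In this session the actual calculation is done in SPSS; this is only an illustration.

```python
import numpy as np


def split_half_reliability(items):
    """Correlate odd-item and even-item half scores, then apply Spearman-Brown."""
    x = np.asarray(items, dtype=float)          # rows = respondents, columns = items
    odd_half = x[:, 0::2].sum(axis=1)           # items 1, 3, 5, ...
    even_half = x[:, 1::2].sum(axis=1)          # items 2, 4, 6, ...
    r = np.corrcoef(odd_half, even_half)[0, 1]  # correlation between the half scores
    return 2 * r / (1 + r)                      # step up to full-length reliability


def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)      # sample variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)  # sample variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)


def standardized_alpha(k, mean_inter_item_r):
    """Standardized alpha for k items with average inter-item correlation r."""
    return k * mean_inter_item_r / (1 + (k - 1) * mean_inter_item_r)


if __name__ == "__main__":
    # Hypothetical 7-point responses: 6 respondents x 4 items (not real data).
    data = [
        [7, 6, 7, 6],
        [5, 5, 4, 5],
        [3, 4, 3, 3],
        [6, 6, 5, 6],
        [2, 3, 2, 2],
        [4, 4, 5, 4],
    ]
    print(f"split-half: {split_half_reliability(data):.3f}")
    print(f"alpha:      {cronbach_alpha(data):.3f}")
    # Same average inter-item correlation, more items -> higher alpha.
    for k in (4, 8, 12):
        print(k, round(standardized_alpha(k, 0.3), 3))
```

Holding the average inter-item correlation fixed at 0.3, standardized alpha climbs as k grows, which is the effect the note above warns about.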
11. • Test-retest reliability
– identical scale items administered at two different
times to the same set of respondents
– assess (via correlation) whether respondents give similar
answers
• Alternative-forms reliability
– two equivalent forms of the scale are constructed
– the same respondents are measured at two different
times, with a different form used each time
– assess (via correlation) whether respondents give similar
answers
– Note: hardly ever practical
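Both test-retest and alternative-forms reliability come down to correlating the same respondents' scores from two administrations. A minimal sketch, assuming each respondent's summed scale score has already been computed and that both lists are in the same respondent order; the function name and scores are hypothetical.

```python
import numpy as np


def retest_reliability(scores_time1, scores_time2):
    """Pearson correlation between the same respondents' scores on two occasions
    (or, for alternative-forms reliability, on form A and form B)."""
    t1 = np.asarray(scores_time1, dtype=float)
    t2 = np.asarray(scores_time2, dtype=float)
    if t1.shape != t2.shape:
        raise ValueError("both administrations must cover the same respondents")
    return np.corrcoef(t1, t2)[0, 1]


if __name__ == "__main__":
    # Hypothetical summed scale scores for 6 respondents, two weeks apart.
    time1 = [26, 19, 13, 23, 9, 17]
    time2 = [25, 20, 14, 22, 10, 18]
    print(f"test-retest r = {retest_reliability(time1, time2):.3f}")
```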
12. Construct Validity
• Construct validity is evidenced if we can establish
– convergent validity, discriminant validity and nomological validity
• Convergent validity: extent to which the scale correlates
positively with other measures of the same construct
• Discriminant validity: extent to which the scale does not
correlate with other conceptually distinct constructs
• Nomological validity: extent to which the scale correlates in
theoretically predicted ways with other distinct but
related constructs
• Also read Malhotra & Birks, 2007, pp. 358-359 on
– content (or face) validity and criterion (concurrent & predictive)
validity
13. Generalizability
• Refers to extent you can generalise from
your specific observations to beyond your
limited study, situation, items used,
method of administration, context, etc.
• Hardly ever possible!!!
14. Fun time
• Now on to the data (COCB.sav)!!!
• Read my forthcoming JBR article for
background on COCB and the scale
• First, SPSS and Cronbach alpha
• Next, Amos and CFA
• Followed by Excel to calculate
composite/construct reliability and AVE, as
well as establish discriminant validity
17. SPSS output for α
The alpha value for the Credibility dimension is 0.894 (> 0.7), hence satisfactory.
18. SPSS further output for α
• We note that alpha value for the Credibility
dimension would increase in value (from 0.894
to 0.902) if item cred4 is removed.
• However, unless the improvement is dramatic
AND there are separate reasons (e.g. similar
findings from other studies), we should
leave the item as part of the dimension.
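The check above corresponds to SPSS's "Cronbach's Alpha if Item Deleted" column. As a rough sketch of what that column computes, the code below drops each item in turn and recomputes alpha. It assumes NumPy, complete data, and at least three items; the data are hypothetical, and the item labels (including cred4) are used only for illustration.

```python
import numpy as np


def cronbach_alpha(x):
    """Cronbach's alpha for a respondents x items array (no missing values)."""
    k = x.shape[1]
    item_var_sum = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def alpha_if_item_deleted(items, names):
    """Return overall alpha and alpha recomputed with each item dropped in turn."""
    x = np.asarray(items, dtype=float)
    overall = cronbach_alpha(x)
    dropped = {
        name: cronbach_alpha(np.delete(x, j, axis=1))  # remove column j only
        for j, name in enumerate(names)
    }
    return overall, dropped


if __name__ == "__main__":
    # Hypothetical data: the fourth item is deliberately noisier than the others.
    data = [
        [1, 1, 1, 3],
        [2, 2, 2, 1],
        [3, 3, 3, 4],
        [4, 4, 4, 1],
        [5, 5, 5, 5],
    ]
    names = ["cred1", "cred2", "cred3", "cred4"]
    overall, dropped = alpha_if_item_deleted(data, names)
    print(f"alpha with all items: {overall:.3f}")
    for name, value in dropped.items():
        flag = "  <- alpha rises" if value > overall else ""
        print(f"  without {name}: {value:.3f}{flag}")
```

As the slide stresses, a flagged item is a prompt for judgment, not an automatic deletion.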
19. Limitations of Cronbach alpha
• We should employ multiple measures of
reliability (Cronbach alpha, composite/construct
reliability CR & Average Variance Extracted
AVE)
– Alpha and CR values are often very similar,
but AVEs can differ much more from alpha
values
– AVEs are also used to assess construct
discriminant validity
20. Composite/Construct Reliability
• CR = $\left(\sum_i \lambda_i\right)^2 \Big/ \left[\left(\sum_i \lambda_i\right)^2 + \sum_i \varepsilon_i\right]$
where $\lambda_i$ are the standardized loadings and $\varepsilon_i$ the indicator
measurement errors (error variances; for standardized loadings, $\varepsilon_i = 1 - \lambda_i^2$)
• AVE = Average Variance Extracted = Variance Extracted
= $\sum_i \lambda_i^2 \Big/ \left[\sum_i \lambda_i^2 + \sum_i \varepsilon_i\right]$
• Note: Recommended thresholds: if CR > 0.6 and AVE > 0.5,
then construct internal consistency is evidenced (Fornell
& Larcker, 1981).
Ref: Fornell, Claes and David G. Larcker (1981). “Evaluating Structural
Equation Models with Unobservable Variables and Measurement
Error,” Journal of Marketing Research, 18(1, February): 39-50.
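The slides do this step in Excel; as an alternative illustration, the two formulas above can be sketched in a few lines of Python. This assumes standardized loadings copied from the CFA output, uncorrelated measurement errors, and error variances of 1 − λ², as noted in the formulas. The loadings in the example are hypothetical.

```python
def composite_reliability(loadings):
    """CR = (sum lambda)^2 / ((sum lambda)^2 + sum of error variances)."""
    errors = [1 - lam ** 2 for lam in loadings]  # standardized error variance
    squared_sum = sum(loadings) ** 2
    return squared_sum / (squared_sum + sum(errors))


def average_variance_extracted(loadings):
    """AVE = sum(lambda^2) / (sum(lambda^2) + sum of error variances)."""
    squared = [lam ** 2 for lam in loadings]
    errors = [1 - s for s in squared]
    return sum(squared) / (sum(squared) + sum(errors))


if __name__ == "__main__":
    # Hypothetical standardized loadings for one construct.
    loadings = [0.82, 0.78, 0.85, 0.71]
    cr = composite_reliability(loadings)
    ave = average_variance_extracted(loadings)
    print(f"CR  = {cr:.3f} ({'ok' if cr > 0.6 else 'below 0.6'})")
    print(f"AVE = {ave:.3f} ({'ok' if ave > 0.5 else 'below 0.5'})")
```

With standardized loadings the AVE denominator equals the number of items, so AVE reduces to the mean squared loading. This is also why AVE can sit well below alpha and CR for the same items.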
21. Discriminant validity
• Discriminant validity is assessed by comparing
the shared variance (squared correlation)
between each pair of constructs against the
minimum of the AVEs for these two constructs.
• If, for every possible pair of constructs, the
shared variance is lower than the smaller of the
two AVEs, then discriminant validity
is evidenced (Fornell and Larcker, 1981).
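The pairwise comparison above can be sketched as a small loop over construct pairs. This is an illustrative Python version of the check, assuming you already have each construct's AVE and the estimated inter-construct correlations (e.g. from the CFA); the function name and construct labels are hypothetical.

```python
from itertools import combinations


def fornell_larcker(ave, corr):
    """Check shared variance (r^2) < min(AVE_a, AVE_b) for every construct pair.

    ave:  {construct: AVE}
    corr: {(construct_a, construct_b): correlation between the two constructs}
    Returns {(a, b): (shared_variance, min_ave, passes)}.
    """
    results = {}
    for a, b in combinations(sorted(ave), 2):
        r = corr.get((a, b), corr.get((b, a)))  # accept either key order
        if r is None:
            raise KeyError(f"missing correlation for {a} and {b}")
        shared = r ** 2
        min_ave = min(ave[a], ave[b])
        results[(a, b)] = (shared, min_ave, shared < min_ave)
    return results


if __name__ == "__main__":
    # Hypothetical AVEs and correlations for three constructs.
    ave = {"A": 0.60, "B": 0.70, "C": 0.55}
    corr = {("A", "B"): 0.50, ("A", "C"): 0.85, ("B", "C"): 0.30}
    for pair, (shared, min_ave, ok) in fornell_larcker(ave, corr).items():
        print(pair, f"r^2={shared:.3f} min AVE={min_ave:.2f}", "ok" if ok else "FAIL")
    # Discriminant validity is evidenced only if every pair passes.
```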
23. CFA and goodness of fit
• See Hair et al.'s book
• E.g.,
• The CFA resulted in an acceptable overall fit
(GFI=.90, CFI=.94, TLI=.92, RMSEA=.068, and
χ²=524.64, df=160, p<.001). All indicators load
significantly (p<.001) and substantively
(standardized coef >.5) on to their respective
constructs; thus providing evidence of
convergent validity.
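Amos reports RMSEA directly, but a short sketch can show how it relates to χ² and df. This uses the common form RMSEA = √(max(χ² − df, 0) / (df·(N − 1))); some programs and texts use N rather than N − 1, so treat this as an assumption. The sample size is not given in the example, so the value of N below is purely hypothetical and the result will not necessarily match the reported .068.

```python
import math


def chi_square_ratio(chi2, df):
    """Normed chi-square: chi2 divided by degrees of freedom."""
    return chi2 / df


def rmsea(chi2, df, n):
    """RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1))), n = sample size (assumed form)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


if __name__ == "__main__":
    chi2, df = 524.64, 160  # chi-square and df from the example write-up
    print(f"chi2/df = {chi_square_ratio(chi2, df):.2f}")
    n = 500  # HYPOTHETICAL sample size, not taken from the study
    print(f"RMSEA (n = {n}) = {rmsea(chi2, df, n):.3f}")
```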
24. Refs
• Baumgartner, Hans and Christian Homburg (1996). “Applications of Structural
Equation Modeling in Marketing and Consumer Research: A Review,”
International Journal of Research in Marketing, 13(2): 139-161.
• Churchill, Gilbert A., Jr. (1979). “A Paradigm for Developing Better
Measures of Marketing Constructs,” Journal of Marketing Research,
16(1, February): 64-73.
• Fornell, Claes and David G. Larcker (1981). “Evaluating Structural
Equation Models with Unobservable Variables and Measurement
Error,” Journal of Marketing Research, 18(1, February): 39-50.
• Hair, Joseph F., Jr., Rolph E. Anderson, Ronald L. Tatham, and
William C. Black (1998). Multivariate Data Analysis, 5th ed.
Englewood Cliffs, NJ: Prentice Hall.
• Nunnally, Jum C. and Ira H. Bernstein (1994). Psychometric Theory, 3rd ed.
New York: McGraw-Hill.