2. VALIDATION TASK
To establish whether the interpretation and uses
of the VSTEP test scores were valid for measuring the
English language competence of test-takers
from level 3 to level 5 on the Vietnamese English
language competence scale
25/12/2015
3. VALIDITY & VALIDATION
Validity is an integrated evaluative judgment of the degree to
which empirical evidence and theoretical rationales support the
adequacy and appropriateness of inferences and actions based
on test scores or other models of assessment.
(Messick, 1989)
Validation is the process of marshalling evidence and arguments in support of,
or counter to, proposed interpretations and uses of test scores.
(Messick, 1989)
6. MESSICK (1989)’S ASPECTS OF VALIDITY
The content aspect
Content relevance
Representativeness
Technical quality
The substantive aspect
Theoretical rationales for observed consistencies in responses
Process of performance
Empirical evidence of process
7. MESSICK (1989)’S ASPECTS OF VALIDITY
The structural aspect
The fidelity of the scoring structure to the construct structure.
The generalizability aspect
The extent to which score properties and interpretations
generalize to and across groups, settings and tasks
Reliability
Content representativeness
8. MESSICK (1989)’S ASPECTS OF VALIDITY
The external aspect
Convergent and discriminant evidence
Criterion relevance
Applied utility
The consequential aspect
Value implications as a basis for action/consequences
Bias
Fairness
9. MESSICK (1989)’S VALIDITY FRAMEWORK
Value
The most influential framework of validity
Criticisms
Abstract
Difficult for a single researcher to apply in full
No specific guidance for particular validation contexts
10. VALIDITY THEORIES
Kane (1992)’s article and (2006)’s Validation chapter
Argument-based Approach to Validation
Interpretive Argument (the development stage)
The network of inferences and assumptions
Validity Argument (the appraisal stage)
Logical evidence
Empirical evidence
11. KANE (1992)’S VALIDITY FRAMEWORK
Values
The most practical, objective framework of validity
Unique interpretive argument, consistent validity argument
steps (Bachman, 2004)
Criticisms
No attention to the structural aspect (Messick, 1995)
Inadequate attention/method to policy context and
consequences of tests (McNamara, 2006).
12. LANGUAGE TEST VALIDATION
Bachman (1990)’s framework, after Messick (1989)’s
Bachman (2004)’s framework, after Kane (1992)’s
14. VALIDATION QUESTIONS
1. To what extent was the test content relevant to and
representative of the domain of English language ability?
2. To what extent was each sub-test successful in measuring
students’ English language ability?
3. How well did the test-takers’ test scores on the VSTEP
correlate with their test scores on the IELTS?
4. What were the consequences of the VSTEP test scores’
interpretation and use?
16. CONTENT: RELEVANCE
• Topical content
• Typical behavior
• Underlying process
• Test specifications
17. CONTENT: TECHNICAL QUALITY
Empirical Evidence
• difficulty level
• discriminating power
Expert Judgment
• readability level
• freedom from ambiguity/irrelevancy
• appropriateness of keyed answers & distractors
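The two empirical indices above can be illustrated with a classical item analysis sketch. This is a minimal example on simulated 0/1-scored data (the response matrix, sample size, and item count are all invented for illustration), not the study's actual analysis: item difficulty as the proportion correct, and discrimination as the point-biserial correlation between an item and the rest-of-test score.

```python
# Classical item analysis on a 0/1-scored response matrix.
# Difficulty (facility) = proportion of test-takers answering correctly;
# discrimination = point-biserial correlation of the item with the
# total score on the remaining items. Data here are simulated.
import numpy as np

rng = np.random.default_rng(0)
ability = rng.normal(size=200)                       # hypothetical test-takers
item_b = np.linspace(-2, 2, 10)                      # hypothetical items
prob = 1 / (1 + np.exp(-(ability[:, None] - item_b[None, :])))
responses = (rng.random((200, 10)) < prob).astype(int)

p_values = responses.mean(axis=0)                    # item difficulty

def point_biserial(item, matrix):
    """Correlate one item with the total score of the remaining items."""
    rest = matrix.sum(axis=1) - matrix[:, item]
    return np.corrcoef(matrix[:, item], rest)[0, 1]

disc = np.array([point_biserial(i, responses) for i in range(10)])
for i, (p, d) in enumerate(zip(p_values, disc)):
    print(f"item {i}: difficulty={p:.2f}, discrimination={d:.2f}")
```

Items with very high or very low facility, or with low discrimination, would be flagged for the expert-judgment review described above.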
18. CONTENT: REPRESENTATIVENESS
“The breadth of the content specifications for a test should
reflect the breadth of the construct invoked in score
interpretation” (Messick, 1989, p. 35).
All essential components of the construct domain are
covered (Messick, 1994, p. 12).
19. CONTENT: CONTENT ANALYSIS BY EXPERTS
• What knowledge and skills are needed to do each
item correctly?
• How relevant are the items to their assigned
objectives and domain?
Domain
• English secondary school curricula
• English program at the college
21. CONTENT: ITEM FIT STATISTICS
Smith (2004) suggested using item fit statistics to evaluate the
extent to which items tap into the same construct and place
test-takers in the same order.
- the extent to which the use of each item is consistent with the
way people have responded to the other items
- does the item rank order the individuals in a manner similar to
other items? (p. 106)
Smith (2004) argued that test-takers should be ranked
consistently by items measuring the same construct. If not, the
items that misfit the Rasch model, i.e. the items that measure
a different construct, should be revised or eliminated (p. 107).
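The fit statistics Smith (2004) describes can be sketched as follows, assuming person abilities and item difficulties have already been estimated (operational analyses would use dedicated software such as ConQuest; the data here are simulated for illustration). Infit and outfit are mean-square statistics on standardized residuals; values near 1 indicate fit, and conventional flags are roughly below 0.7 or above 1.3.

```python
# Rasch item-fit sketch: infit (information-weighted) and outfit
# (unweighted) mean squares from standardized residuals, given
# already-estimated person abilities (theta) and item difficulties (b).
import numpy as np

def rasch_fit(responses, theta, b):
    """responses: (n_persons, n_items) 0/1 matrix; returns (infit, outfit)."""
    p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))   # expected scores
    var = p * (1 - p)                                      # model variance
    z2 = (responses - p) ** 2 / var                        # squared std. residuals
    outfit = z2.mean(axis=0)                               # unweighted MSQ
    infit = ((responses - p) ** 2).sum(axis=0) / var.sum(axis=0)
    return infit, outfit

rng = np.random.default_rng(1)
theta = rng.normal(size=300)                               # hypothetical persons
b = np.linspace(-1.5, 1.5, 8)                              # hypothetical items
p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
x = (rng.random((300, 8)) < p).astype(int)                 # data fit the model
infit, outfit = rasch_fit(x, theta, b)
print(np.round(infit, 2), np.round(outfit, 2))             # both near 1
```

Because the simulated responses are generated from the model itself, all items should show mean squares close to 1; an item measuring a different construct would drift away from 1.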
22. SUBSTANTIVE & STRUCTURAL
To what extent were the VSTEP sub-tests successful in
measuring students’ English language competence?
ITEM RESPONSE THEORY (RASCH MODEL)
• item fit
• item discrimination
• item cluster
DESCRIPTIVE STATISTICS
• choice response analysis
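The choice response (distractor) analysis listed above can be sketched as a comparison of option choices between high- and low-scoring groups. Everything in this example is hypothetical (the option labels A-D, the keyed answer "B", the group cut-offs at the upper and lower 27%): a functioning key should attract the high group, while distractors should not.

```python
# Distractor analysis sketch: for each option of one multiple-choice item,
# compare the proportion of high- vs low-scoring test-takers choosing it.
# All data are simulated; "B" is an assumed keyed answer.
import numpy as np

rng = np.random.default_rng(2)
n = 240
total = rng.integers(10, 41, size=n)                  # hypothetical total scores
# Simulate one item: stronger students pick the key "B" more often.
p_key = (total - 10) / 30 * 0.6 + 0.25
choices = np.where(rng.random(n) < p_key, "B",
                   rng.choice(np.array(["A", "C", "D"]), size=n))

cut_hi, cut_lo = np.quantile(total, [0.73, 0.27])     # upper/lower 27% groups
hi, lo = choices[total >= cut_hi], choices[total <= cut_lo]
for opt in "ABCD":
    print(f"{opt}: high {np.mean(hi == opt):.2f}  low {np.mean(lo == opt):.2f}")
```

A distractor chosen as often (or more often) by the high group as by the low group would be a candidate for revision.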
23. CRITERION-RELATED
How well did the test-takers’ VSTEP overall and
sub-test scores correlate with the test-takers’
overall and sub-test IELTS scores?
24. CONSEQUENCES
• The value implications of score interpretation
• The actual and potential consequences of score uses
(Messick, 1989)
FOCUS: the validity of test score interpretation and use,
i.e. construct under-representation or construct-irrelevant
variance
25. CONSEQUENCES
Sources of evidence
• Content relevance and representativeness
• Item bias
• Technical quality of the test
• Expert judgment
26. References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards
for Educational and Psychological Testing. Washington, DC: Authors.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards
for Educational and Psychological Testing. Washington, DC: American Educational Research Association.
Andrich, D., & Mercer, A. (1997). International perspectives on selection methods of entry into higher education. Canberra: National Board of
Employment, Education and Training [and] Higher Education Council.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bachman, L. F. (2004). Statistical analyses for language assessment. Cambridge: Cambridge University Press.
Berk, R. A. (1980). Item Analysis. In R. A. Berk (Ed.), Criterion-referenced measurement: the state of the art. Baltimore and London: The Johns Hopkins
University Press.
Cureton, E. E. (1951). Validity. In E. F. Lindquist (Ed.), Educational measurement (pp. 621-694). Washington, D.C.: American Council on Education.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, California: Sage Publications.
Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527-535.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Westport, CT: American Council on
Education/Praeger.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(4), 635-694.
McNamara, T., & Roever, C. (2006). Language testing: the social dimension. Malden, MA: Blackwell Publishing.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: American Council on Education/Macmillan.
MOET. (2006). Secondary Education Curriculum: English. Hanoi: Education Publisher.
Moss, P. A. (2007). Reconstructing Validity. Educational Researcher, 36(8), 470-476.
Popham, W. J. (1997). Consequential Validity: Right Concern--Wrong Concept. Educational Measurement: Issues and Practice, 16(2), 9-13.
Purpura, J. E. (1999). Learner strategy use and performance on language tests: a structural equation modeling approach. Cambridge: Cambridge
University Press.
Smith, E. V. (2004). Evidence for Reliability of Measures and Validity of Measure Interpretation: A Rasch Measurement Perspective. In E. V. Smith & R.
M. Smith (Eds.), Introduction to Rasch Measurement: Theory, Models and Applications. Maple Grove: JAM Press.
Wu, M. L., Adams, R. J., & Haldane, S. (2008). ConQuest: Generalised Item Response Modelling Software [computer program]. Camberwell: Australian
Council for Educational Research.